[PATCH 00/28] check-parallel: Running tests without check

* [PATCH 00/28] check-parallel: Running tests without check
@ 2025-04-17  3:00 Dave Chinner
  2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
                   ` (27 more replies)
  0 siblings, 28 replies; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

Hi folks,

This set of patches is intended to move check-parallel away from
using check to execute tests.

To do this, we need to share a bunch of check code between check and
check-parallel. This is mainly the code that parses and builds the
test list, the config section parsing and iteration, and the test
execution loop itself.

To do this, test list parsing and building is factored out of check
into common/test_list. check is converted to use the test list
functions at the same time, and then check-parallel is converted to
use the factored code to directly build its test list rather than
the open coded grep hack it currently uses.  This allows
check-parallel CLI to use the same group selection interface as
check.

The next change is to factor the config section parsing out of
common/config and move it to common/config-sections. This allows
check-parallel to parse and implement section iteration itself
without needing to run all the environment setup code in
common/config. This also allows check-parallel to implement its own
config section to define the device sizes that it will use
independently of the sections that run tests.

Next, we change check-parallel to use a global test list that runner
scripts can safely dequeue the next test to run. This uses a test
list file and a lock file to serialise access to the file. Hence a
runner can dequeue the next test and remove it from the test list
file without racing with any other runner trying to dequeue the next
test to run. This means we get rid of the static per-runner test
lists that result in many runners finishing and going idle while
other test runners have pending tests still to run. i.e. all test
runners keep executing tests until there are no tests left in the
queue, hence keeping utilisation as high as possible across the test
run.
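
As a rough illustration of the dequeue scheme described above (this is
not the code from the patch: the function and variable names are made
up, and the use of flock(1) from util-linux is an assumption about the
locking tool), two runners can safely share one test list file like
this:

```shell
#!/bin/bash
# Hypothetical sketch: QUEUE, LOCK, dequeue_test and runner are
# illustrative names, not taken from the patch series.

QUEUE=$(mktemp)
LOCK="$QUEUE.lock"

# Pop the first line of the queue file while holding an exclusive lock
# on a separate lock file. Prints the test name, or returns non-zero
# when the queue is empty.
dequeue_test()
{
	local next

	exec 9>"$LOCK"
	flock -x 9
	next=$(head -n 1 "$QUEUE")
	if [ -n "$next" ]; then
		sed -i '1d' "$QUEUE"
	fi
	flock -u 9
	exec 9>&-

	[ -n "$next" ] && echo "$next"
}

# Each runner keeps pulling tests until the shared queue is empty.
runner()
{
	local id=$1 t

	while t=$(dequeue_test); do
		echo "runner $id: $t"
	done
}

printf '%s\n' generic/001 generic/002 xfs/001 xfs/002 > "$QUEUE"
runner 1 > "$QUEUE.out.1" &
runner 2 > "$QUEUE.out.2" &
wait

results=$(cat "$QUEUE.out.1" "$QUEUE.out.2" | sort)
echo "$results"
```

The lock is taken on a separate file because "sed -i" replaces the
queue file with a new one, so a lock held on the queue file itself
would not protect the next reader. Which runner gets which test varies
from run to run, but each test is handed out exactly once.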

Then we factor the test execution loop out of check and put it in
common/test_exec. This abstraction makes the results array part of
the test execution, as well as using a context defined helper
"_run_seq" to do the actual execution of the test. This allows the
test execution loop to be completely generic, whilst allowing check
and check-parallel to do completely independent things with
individual test execution and overall results reporting.
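
A minimal sketch of that split, assuming a bash associative array for
the results (only the "_run_seq" name comes from the description above;
_exec_tests, _results and the pass/fail strings are hypothetical):

```shell
#!/bin/bash
# Hypothetical sketch; only the _run_seq name is from the cover letter.

declare -A _results

# Generic loop: it records a result per test but knows nothing about
# how a test is actually executed.
_exec_tests()
{
	local seq

	for seq in "$@"; do
		if _run_seq "$seq"; then
			_results[$seq]=pass
		else
			_results[$seq]=fail
		fi
	done
}

# Context-specific helper supplied by the caller (check or
# check-parallel). This stand-in "fails" any test whose name ends in 3.
_run_seq()
{
	case "$1" in
	*3)	return 1 ;;
	*)	return 0 ;;
	esac
}

_exec_tests generic/001 generic/002 generic/003
for seq in generic/001 generic/002 generic/003; do
	echo "$seq ${_results[$seq]}"
done
```

Because the loop only calls _run_seq and fills in the results array,
each caller can define its own _run_seq and report the results however
it likes.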

Finally, we change check-parallel to run tests directly via the
common/test_exec infrastructure rather than executing them via
check. This requires a new helper function that does the test
environment setup in the private mount+pid namespace, but this is
much simpler and faster than using check itself to execute
individual tests. This last bit of functionality is still a work in
progress, so this specific patch is still tagged with [RFC].

There are lots of other bits of changes. The way common/rc and
common/config are used is changed. common/config only sets up the
execution environment now, and should not contain any code that
needs to be executed outside of environment setup. It should only be
sourced once at the highest level to set up the environment, and
never called again.

common/rc is similar - all directly executed code has been removed
from it, and that is now called from the high level code that needs
initialisation work done.  It no longer sources common/config,
either. The test preamble does not need to run init_rc() any more;
they just need to source the generic and fs specific functions the
tests may run. Also, because check does some weird things and lots
of _requires....() functions assume the TEST_DEV is mounted without
first running _require_test(), it also needs to ensure the TEST_DEV
is mounted...

check-parallel can now take a "-t N" parameter to specify how many
execution threads it will use. If this is not specified, it will
default to the number of CPUs in the machine. Testing with 4p
restrictions shows that check-parallel will run the quick group 3.5x
faster on a 4p system with 8 execution threads than it will with a
single execution thread. IOWs, even on small test systems,
check-parallel can result in dramatic reductions in test runtime
over check.
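
The "-t N" default could be sketched roughly as follows. This is an
assumption about how such an option might be parsed, not the code in
the series: parse_threads is a hypothetical name, and nproc(1) from
coreutils stands in for whatever CPU count the script really uses.

```shell
#!/bin/bash
# Hypothetical option parsing; only "-t N" and the CPU-count default
# come from the cover letter.

parse_threads()
{
	local opt threads="" OPTIND=1

	while getopts "t:" opt "$@"; do
		case "$opt" in
		t)	threads="$OPTARG" ;;
		*)	return 1 ;;
		esac
	done

	# No -t given: fall back to the number of available CPUs.
	if [ -z "$threads" ]; then
		threads=$(nproc)
	fi
	echo "$threads"
}

parse_threads -t 8
parse_threads
```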

On a 64 p machine, testing XFS with the quick group drops from 61
minutes to just under 4 minutes. Testing XFS with the auto group
drops from 246 minutes to just under 8 minutes.

Other miscellaneous stuff in the series:

	- kill non-numeric test name support
	- creating common/exit for all the general test exit
	  functions to fix circular dependencies between common/rc
	  and common/config
	- fix _scratch_mkfs_sized to make USE_EXTERNAL on XFS work
	  the same as ext4.
	- dm-logwrites devices are now created by check-parallel
	- several test conversions from sync() to syncfs()
	- removal of a couple of stale .c test source files.
	- address poor CPU count scaling in a couple of tests

I have tried not to cause any regressions for people running plain
check. I've tested that a bit with XFS and ext4, but I can't
guarantee that there aren't issues I haven't uncovered. e.g. btrfs,
as yet, is untested. It is unfortunate that the problem I seek to
address - running exhaustive check testing across many filesystem
types and configurations is prohibitively expensive in terms of time
- is the very reason I can't really adequately test check for
regressions as I develop check-parallel functionality...

Thoughts, comments and code review all welcome!

-Dave.

 .gitignore                          |   1 -
 check                               | 727 ++++--------------------------------
 check-parallel                      | 351 ++++++++++++++---
 common/config                       | 612 +-----------------------------
 common/config-sections              | 461 +++++++++++++++++++++++
 common/dmlogwrites                  |   5 +-
 common/exit                         |  48 +++
 common/preamble                     |  19 +-
 common/rc                           | 253 +++++++++++--
 common/report                       |   2 +-
 common/test_exec                    | 377 +++++++++++++++++++
 common/test_list                    | 308 +++++++++++++++
 common/test_names                   |   8 +-
 new                                 |  24 --
 src/Makefile                        |   4 +-
 src/bulkstat_unlink_test.c          |  12 +-
 src/bulkstat_unlink_test_modified.c | 193 ----------
 src/fsync-tester.c                  |   2 +-
 src/open_by_handle.c                |   6 +-
 src/scaleread.c                     | 224 -----------
 src/scaleread.sh                    |  64 ----
 src/stale_handle.c                  |  15 +-
 tests/generic/531                   |   8 +-
 tests/xfs/259                       |   1 -
 tests/xfs/271                       |   2 -
 tools/run_test.sh                   | 116 ++++++
 26 files changed, 1954 insertions(+), 1889 deletions(-)


* [PATCH 01/28] fstests: remove support for non-numeric test names
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-04-30  9:17   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems Dave Chinner
                   ` (26 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

We haven't had any tests using the "999-the-mark-of-fstests" name
format for a long time. The only test that used this format was
xfs/191-input-validation, and that got removed in 2022 by commit
c1941d6f5 ("xfs/191: remove broken test").

However, the infrastructure for this naming convention still exists,
so let's get rid of that dead code so we don't have to carry it
anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check             | 15 ---------------
 common/test_names |  8 +-------
 new               | 24 ------------------------
 3 files changed, 1 insertion(+), 46 deletions(-)

diff --git a/check b/check
index 9451c350b..d6bab8b5f 100755
--- a/check
+++ b/check
@@ -856,21 +856,6 @@ function run_section()
 	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do
 		seq="${_list[$ix]}"
 
-		if [ ! -f $seq ]; then
-			# Try to get full name in case the user supplied only
-			# seq id and the test has a name. A bit of hassle to
-			# find really the test and not its sample output or
-			# helping files.
-			bname=$(basename $seq)
-			full_seq=$(find $(dirname $seq) -name $bname* -executable |
-				awk '(NR == 1 || length < length(shortest)) { shortest = $0 }\
-				     END { print shortest }')
-			if [ -f $full_seq ] && \
-			   [ x$(echo $bname | grep -o "^$VALID_TEST_ID") != x ]; then
-				seq=$full_seq
-			fi
-		fi
-
 		# the filename for the test and the name output are different.
 		# we don't include the tests/ directory in the name output.
 		export seqnum=${seq#$SRC_DIR/}
diff --git a/common/test_names b/common/test_names
index 98af40cdb..b18fc9e36 100644
--- a/common/test_names
+++ b/common/test_names
@@ -2,11 +2,5 @@
 
 # Valid test names start with 3 digits "NNN":
 #  "[0-9]\{3\}"
-# followed by an optional "-":
-#  "-\?"
-# followed by an optional combination of alphanumeric and "-" chars:
-#  "[[:alnum:]-]*"
-# e.g. 999-the-mark-of-fstests
-#
 VALID_TEST_ID="[0-9]\{3\}"
-VALID_TEST_NAME="$VALID_TEST_ID-\?[[:alnum:]-]*"
+VALID_TEST_NAME="$VALID_TEST_ID"
diff --git a/new b/new
index 6b50ffeda..c786a9dbb 100755
--- a/new
+++ b/new
@@ -50,30 +50,6 @@ export AWK_PROG="$(type -P awk)"
 echo "Next test id is $id"
 shift
 
-read -p "Append a name to the ID? Test name will be $id-\$name. y,[n]: " -r
-if [[ $REPLY = [Yy] ]]; then
-	# get the new name from user
-	name=""
-	while [ "$name" = "" ]; do
-		read -p "Enter the name: "
-		if [ "$REPLY" = "" ]; then
-			echo "For canceling, use ctrl+c."
-		elif echo "$id-$REPLY" | grep -q "^$VALID_TEST_NAME$"; then
-			if [ -e "$tdir/$id-$REPLY" ]; then
-				echo "File '$id-$REPLY' already exists, use another one."
-				echo
-			else
-				name="$REPLY"
-			fi
-		else
-			echo "A name can contain only alphanumeric symbols and dash!"
-			echo
-		fi
-	done
-
-	id="$id-$name"
-fi
-
 echo "Creating test file '$id'"
 
 if [ -f $tdir/$id ]
-- 
2.45.2



* Re: [PATCH 01/28] fstests: remove support for non-numeric test names
  2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
@ 2025-04-30  9:17   ` Nirjhar Roy (IBM)
  2025-05-21  2:39     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  9:17 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We haven't had any tests using the "999-the-mark-of-fstests" name
> format for a long time. The only test that used this format was
> xfs/191-input-validation, and that got removed in 2022 by commit
> c1941d6f5 ("xfs/191: remove broken test").
> 
> However, the infrastructure for this naming convention still exists,
> so let's get rid of that dead code so we don't have to carry it
> anymore.
Is there any other reason to remove this convention, apart from the
fact that it has not been used for a long time? But yes, I agree that
purely numeric names are easier to refer to, and we can also use
one-liner shell tricks to run several tests, something like
./check xfs/{001..100} to run all the tests from xfs/001 to xfs/100
(of course, assuming tests with all these numbers exist).
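
A small sketch of the brace expansion behind that kind of one-liner,
assuming bash 4 or later (the array name is made up for illustration).
Bash sequence expressions take exactly two dots, and zero-padded bounds
produce zero-padded, three-digit names; a three-dot form is not a valid
sequence and is left as literal text:

```shell
#!/bin/bash
# Two dots and zero-padded bounds expand to xfs/001 ... xfs/100.
tests=(xfs/{001..100})
echo "${#tests[@]} tests, first ${tests[0]}, last ${tests[99]}"

# Three dots is not a sequence expression, so nothing is expanded.
literal=$(echo xfs/{1...100})
echo "three dots stays literal: $literal"
```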
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check             | 15 ---------------
>  common/test_names |  8 +-------
>  new               | 24 ------------------------
>  3 files changed, 1 insertion(+), 46 deletions(-)
> 
> diff --git a/check b/check
> index 9451c350b..d6bab8b5f 100755
> --- a/check
> +++ b/check
> @@ -856,21 +856,6 @@ function run_section()
>  	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++));
> do
>  		seq="${_list[$ix]}"
>  
> -		if [ ! -f $seq ]; then
> -			# Try to get full name in case the user
> supplied only
> -			# seq id and the test has a name. A bit of
> hassle to
> -			# find really the test and not its sample
> output or
> -			# helping files.
> -			bname=$(basename $seq)
> -			full_seq=$(find $(dirname $seq) -name $bname*
> -executable |
> -				awk '(NR == 1 || length <
> length(shortest)) { shortest = $0 }\
> -				     END { print shortest }')
> -			if [ -f $full_seq ] && \
> -			   [ x$(echo $bname | grep -o
> "^$VALID_TEST_ID") != x ]; then
> -				seq=$full_seq
> -			fi
> -		fi
> -
>  		# the filename for the test and the name output are
> different.
>  		# we don't include the tests/ directory in the name
> output.
>  		export seqnum=${seq#$SRC_DIR/}
> diff --git a/common/test_names b/common/test_names
> index 98af40cdb..b18fc9e36 100644
> --- a/common/test_names
> +++ b/common/test_names
> @@ -2,11 +2,5 @@
>  
>  # Valid test names start with 3 digits "NNN":
>  #  "[0-9]\{3\}"
> -# followed by an optional "-":
> -#  "-\?"
> -# followed by an optional combination of alphanumeric and "-" chars:
> -#  "[[:alnum:]-]*"
> -# e.g. 999-the-mark-of-fstests
> -#
>  VALID_TEST_ID="[0-9]\{3\}"
> -VALID_TEST_NAME="$VALID_TEST_ID-\?[[:alnum:]-]*"
> +VALID_TEST_NAME="$VALID_TEST_ID"
> diff --git a/new b/new
> index 6b50ffeda..c786a9dbb 100755
> --- a/new
> +++ b/new
> @@ -50,30 +50,6 @@ export AWK_PROG="$(type -P awk)"
>  echo "Next test id is $id"
>  shift
>  
> -read -p "Append a name to the ID? Test name will be $id-\$name.
> y,[n]: " -r
> -if [[ $REPLY = [Yy] ]]; then
> -	# get the new name from user
> -	name=""
> -	while [ "$name" = "" ]; do
> -		read -p "Enter the name: "
> -		if [ "$REPLY" = "" ]; then
> -			echo "For canceling, use ctrl+c."
> -		elif echo "$id-$REPLY" | grep -q "^$VALID_TEST_NAME$";
> then
> -			if [ -e "$tdir/$id-$REPLY" ]; then
> -				echo "File '$id-$REPLY' already exists,
> use another one."
> -				echo
> -			else
> -				name="$REPLY"
> -			fi
> -		else
> -			echo "A name can contain only alphanumeric
> symbols and dash!"
> -			echo
> -		fi
> -	done
> -
> -	id="$id-$name"
> -fi
> -
>  echo "Creating test file '$id'"
>  
>  if [ -f $tdir/$id ]



* Re: [PATCH 01/28] fstests: remove support for non-numeric test names
  2025-04-30  9:17   ` Nirjhar Roy (IBM)
@ 2025-05-21  2:39     ` Dave Chinner
  2025-05-26  5:14       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  2:39 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, Apr 30, 2025 at 02:47:00PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We haven't had any tests using the "999-the-mark-of-fstests" name
> > format for a long time. The only test that used this format was
> > xfs/191-input-validation, and that got removed in 2022 by commit
> > c1941d6f5 ("xfs/191: remove broken test").
> > 
> > However, the infrastructure for this naming convention still exists,
> > so let's get rid of that dead code so we don't have to carry it
> > anymore.
> Is there any other reason to remove this convention, apart from the
> fact that it has not been used for a long time?

It hasn't been used because nobody has ever really seen much value in trying
to describe the test in the test filename. It would be used if
people valued it, right? Mostly, though, people complained about
that one test with a weird name....

> But yes, I
> agree that purely numeric names are easier to refer to, and we can
> also use one-liner shell tricks to run several tests, something like
> ./check xfs/{001..100} to run all the tests from xfs/001 to xfs/100
> (of course, assuming tests with all these numbers exist).

Right, that becomes more complex as soon as names have free-form
components.

If you want to know what all the tests do, use lsqa.pl to extract
the initial comment in each test that describes what the test is
exercising.

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/28] fstests: remove support for non-numeric test names
  2025-05-21  2:39     ` Dave Chinner
@ 2025-05-26  5:14       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  5:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 08:09, Dave Chinner wrote:
> On Wed, Apr 30, 2025 at 02:47:00PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> We haven't had any tests using the "999-the-mark-of-fstests" name
>>> format for a long time. The only test that used this format was
>>> xfs/191-input-validation, and that got removed in 2022 by commit
>>> c1941d6f5 ("xfs/191: remove broken test").
>>>
>>> However, the infrastructure for this naming convention still exists,
>>> so let's get rid of that dead code so we don't have to carry it
>>> anymore.
>> Is there any other reason to remove this convention, apart from the
>> fact that it has not been used for a long time?
> It hasn't been used because nobody has ever really seen much value in trying
> to describe the test in the test filename. It would be used if
> people valued it, right? Mostly, though, people complained about
> that one test with a weird name....
Yeah.
>
>> But yes, I
>> agree that purely numeric names are easier to refer to, and we can
>> also use one-liner shell tricks to run several tests, something like
>> ./check xfs/{001..100} to run all the tests from xfs/001 to xfs/100
>> (of course, assuming tests with all these numbers exist).
> Right, that becomes more complex as soon as names have free-form
> components.
>
> If you want to know what all the tests do, use lsqa.pl to extract
> the initial comment in each test that describes what the test is
> exercising.

Thanks. I wasn't aware of this. This is useful.

I feel this change is useful. Feel free to add

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

--NR

>
> -Dave.
>
-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
  2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-05  6:14   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 03/28] fstests: move test exit functions to common/exit Dave Chinner
                   ` (25 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

_scratch_mkfs_sized is failing on my check-parallel setup because it
has an RTDEV defined but USE_EXTERNAL is not set.  Hence
_scratch_mkfs_sized sees the RTDEV, adds a size parameter for it,
then passes that to _try_scratch_mkfs_xfs() which looks at
USE_EXTERNAL and so doesn't add a RTDEV to the filesystem.

The result is that _scratch_mkfs_sized -always- fails with this sort
of error:

.....
** mkfs failed with extra mkfs options added to "-m rmapbt=1 -i exchange=1 " by test 300 **
** attempting to mkfs using only test 300 options: -d size=536870912 -r size=536870912 -b size=4096 **
size specified for non-existent rt subvolume
Usage: mkfs.xfs
.....

Make the XFS code in _scratch_mkfs_sized look at USE_EXTERNAL like
the ext4 code does.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 common/rc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/rc b/common/rc
index 9bed6dad9..12a05cb96 100644
--- a/common/rc
+++ b/common/rc
@@ -1264,7 +1264,7 @@ _try_scratch_mkfs_sized()
 	xfs)
 		local rt_ops
 
-		if [ -b "$SCRATCH_RTDEV" ]; then
+		if [ "$USE_EXTERNAL" = yes -a -b "$SCRATCH_RTDEV" ]; then
 			local rtdevsize=`blockdev --getsize64 $SCRATCH_RTDEV`
 			if [ "$fssize" -gt "$rtdevsize" ]; then
 				_notrun "Scratch rt device too small"
-- 
2.45.2



* Re: [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems
  2025-04-17  3:00 ` [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems Dave Chinner
@ 2025-05-05  6:14   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-05  6:14 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> _scratch_mkfs_sized is failing on my check-parallel setup because it
> has an RTDEV defined but USE_EXTERNAL is not set.  Hence
> _scratch_mkfs_sized sees the RTDEV, adds a size parameter for it,
> then passes that to _try_scratch_mkfs_xfs() which looks at
> USE_EXTERNAL and so doesn't add a RTDEV to the filesystem.
Okay, so the function flow is like:
_try_scratch_mkfs_xfs() --> _scratch_mkfs_xfs_opts -->
_scratch_xfs_options() --> _scratch_options mkfs -->
_scratch_xfs_options()
In _scratch_xfs_options() we have

   [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_RTDEV" ] && \
	SCRATCH_OPTIONS="$SCRATCH_OPTIONS ${rt_opt}rtdev=$SCRATCH_RTDEV"

_scratch_mkfs_xfs_opts() adds $mkfs_opts, which already holds
rt_ops="-r size=$fssize", but _scratch_xfs_options() doesn't add
"-r rtdev=<realtime_device>" because USE_EXTERNAL is not set, hence the
error. This makes sense. We should fill in rt_ops only when USE_EXTERNAL
is set. This looks good to me.
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> 
> The result is that _scratch_mkfs_sized -always- fails with this sort
> of error:
> 
> .....
> ** mkfs failed with extra mkfs options added to "-m rmapbt=1 -i exchange=1 " by test 300 **
> ** attempting to mkfs using only test 300 options: -d size=536870912 -r size=536870912 -b size=4096 **
> size specified for non-existent rt subvolume
> Usage: mkfs.xfs
> .....
> 
> Make the XFS code in _scratch_mkfs_sized look at USE_EXTERNAL like
> the ext4 code does.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  common/rc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/common/rc b/common/rc
> index 9bed6dad9..12a05cb96 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -1264,7 +1264,7 @@ _try_scratch_mkfs_sized()
>  	xfs)
>  		local rt_ops
>  
> -		if [ -b "$SCRATCH_RTDEV" ]; then
> +		if [ "$USE_EXTERNAL" = yes -a -b "$SCRATCH_RTDEV" ]; then
>  			local rtdevsize=`blockdev --getsize64 $SCRATCH_RTDEV`
>  			if [ "$fssize" -gt "$rtdevsize" ]; then
>  				_notrun "Scratch rt device too small"



* [PATCH 03/28] fstests: move test exit functions to common/exit
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
  2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
  2025-04-17  3:00 ` [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-04-17  3:00 ` [PATCH 04/28] check-parallel: report how many tests were _notrun Dave Chinner
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Functions like _fatal() and _exit() need to be called from
common/config as well as general common and test code. Having some
of these test exit functions defined in common/config requires all
contexts to source this file and run all the environment setup
just to gain access to these functions.

There is a catch-22 with these functions - they cannot be defined in
common/rc for all contexts to pick up automatically, because
common/config needs them and some functions in common/rc depend on
common/config being sourced first. And we can't define them all in
common/config, because there are contexts where that hasn't been
sourced that need a specific test exit function.

Solve this by moving all the test exit functions to a new common/exit
file and source that explicitly in the contexts that need it, hence
removing the circular dependency between common/config and common/rc
for defining these functions...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check           |  2 ++
 common/config   | 16 +---------------
 common/exit     | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 common/preamble |  1 +
 common/rc       | 22 ----------------------
 5 files changed, 52 insertions(+), 37 deletions(-)
 create mode 100644 common/exit

diff --git a/check b/check
index d6bab8b5f..69866b14b 100755
--- a/check
+++ b/check
@@ -52,6 +52,8 @@ rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 SRC_GROUPS="generic"
 export SRC_DIR="tests"
 
+. ./common/exit
+
 usage()
 {
     echo "Usage: $0 [options] [testlist]"'
diff --git a/common/config b/common/config
index eada39717..5081c300a 100644
--- a/common/config
+++ b/common/config
@@ -40,6 +40,7 @@
 #
 
 . common/test_names
+. common/exit
 
 # all tests should use a common language setting to prevent golden
 # output mismatches.
@@ -96,15 +97,6 @@ export LOCAL_CONFIGURE_OPTIONS=${LOCAL_CONFIGURE_OPTIONS:=--enable-readline=yes}
 
 export RECREATE_TEST_DEV=${RECREATE_TEST_DEV:=false}
 
-# This functions sets the exit code to status and then exits. Don't use
-# exit directly, as it might not set the value of "$status" correctly, which is
-# used as an exit code in the trap handler routine set up by the check script.
-_exit()
-{
-	test -n "$1" && status="$1"
-	exit "$status"
-}
-
 # Handle mkfs.$fstyp which does (or does not) require -f to overwrite
 set_mkfs_prog_path_with_opts()
 {
@@ -121,12 +113,6 @@ set_mkfs_prog_path_with_opts()
 	fi
 }
 
-_fatal()
-{
-    echo "$*"
-    _exit 1
-}
-
 export MKFS_PROG="$(type -P mkfs)"
 [ "$MKFS_PROG" = "" ] && _fatal "mkfs not found"
 
diff --git a/common/exit b/common/exit
new file mode 100644
index 000000000..16777507a
--- /dev/null
+++ b/common/exit
@@ -0,0 +1,48 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright (c) 2000-2006 Silicon Graphics, Inc.  All Rights Reserved.
+#
+# Test termination functions that need to be independent of the context that the
+# test is running in. This must not have any dependencies on common/config or
+# common/rc, as they both require these functions to be defined before they are
+# sourced.
+
+# Exit a context, setting up the test exit status if appropriate.
+#
+# "$status" is the exit code for the trap handler routine set up by the check
+# script, hence we cannot call exit directly from contexts that require $status
+# to be set.
+_exit()
+{
+	test -n "$1" && status="$1"
+	exit "$status"
+}
+
+# Bail out of a test context, setting up .notrun file. Need to kill the
+# filesystem check files here, otherwise they are set incorrectly for the next
+# test.
+_notrun()
+{
+    echo "$*" > $seqres.notrun
+    echo "$seq not run: $*"
+    rm -f ${RESULT_DIR}/require_test*
+    rm -f ${RESULT_DIR}/require_scratch*
+
+    _exit 0
+}
+
+# Exit a test immediately with a fatal error.
+_fail()
+{
+    echo "$*" | tee -a $seqres.full
+    echo "(see $seqres.full for details)"
+    _exit 1
+}
+
+# Exit any context with a fatal error.
+_fatal()
+{
+    echo "$*"
+    _exit 1
+}
+
diff --git a/common/preamble b/common/preamble
index ba029a347..0b684cc33 100644
--- a/common/preamble
+++ b/common/preamble
@@ -49,6 +49,7 @@ _begin_fstest()
 
 	_register_cleanup _cleanup
 
+	. ./common/exit
 	. ./common/rc
 	init_rc
 
diff --git a/common/rc b/common/rc
index 12a05cb96..94c00d890 100644
--- a/common/rc
+++ b/common/rc
@@ -1798,28 +1798,6 @@ _do()
     return $ret
 }
 
-# bail out, setting up .notrun file. Need to kill the filesystem check files
-# here, otherwise they are set incorrectly for the next test.
-#
-_notrun()
-{
-    echo "$*" > $seqres.notrun
-    echo "$seq not run: $*"
-    rm -f ${RESULT_DIR}/require_test*
-    rm -f ${RESULT_DIR}/require_scratch*
-
-    _exit 0
-}
-
-# just plain bail out
-#
-_fail()
-{
-    echo "$*" | tee -a $seqres.full
-    echo "(see $seqres.full for details)"
-    _exit 1
-}
-
 #
 # Tests whether $FSTYP should be exclude from this test.
 #
-- 
2.45.2



* [PATCH 04/28] check-parallel: report how many tests were _notrun
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (2 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 03/28] fstests: move test exit functions to common/exit Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-05  9:58   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 05/28] check: factor out test list building code Dave Chinner
                   ` (23 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

We already count tests "run" and tests that failed, but there is no
indication of how many tests were _notrun. With tests actually
failing to run because of things like bugs in _scratch_mkfs_sized
being reported as "_notrun" instead of failing, visibility of
changing numbers of _notrun tests is needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/check-parallel b/check-parallel
index c85437252..d68d76e55 100755
--- a/check-parallel
+++ b/check-parallel
@@ -188,7 +188,10 @@ done;
 wait
 
 echo -n "Tests run: "
-grep Ran /mnt/xfs/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
+grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
+
+echo -n "Tests _notrun: "
+grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
 
 echo -n "Failure count: "
 grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
-- 
2.45.2



* Re: [PATCH 04/28] check-parallel: report how many tests were _notrun
  2025-04-17  3:00 ` [PATCH 04/28] check-parallel: report how many tests were _notrun Dave Chinner
@ 2025-05-05  9:58   ` Nirjhar Roy (IBM)
  2025-05-21  2:53     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-05  9:58 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> We already count tests "run" and tests that failed, but there is no
> indication of how many tests were _notrun. With tests actually
> failing to run because of things like bugs in _scratch_mkfs_sized
> being reported as "_notrun" instead of failing, visibility of
> changing numbers of _notrun tests is needed.
This looks okay to me. 2 questions below. Apart from that, this looks
good to me.

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/check-parallel b/check-parallel
> index c85437252..d68d76e55 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -188,7 +188,10 @@ done;
>  wait
>  
>  echo -n "Tests run: "
> -grep Ran /mnt/xfs/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> +grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
Nit: /mnt/xfs --> $basedir is not related to this change, right? Should
we move it to a different commit/patch?

Not related to this change: can we change the above statement to
something like what is done below for _notrun, i.e.
grep Ran $basedir/*/log | uniq | sed ... | wc -l
so that we can get rid of the sort command?
--NR

> +
> +echo -n "Tests _notrun: "
> +grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
Don't we need to sort before uniq?
>  
>  echo -n "Failure count: "
>  grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l



* Re: [PATCH 04/28] check-parallel: report how many tests were _notrun
  2025-05-05  9:58   ` Nirjhar Roy (IBM)
@ 2025-05-21  2:53     ` Dave Chinner
  2025-05-26  6:09       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  2:53 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Mon, May 05, 2025 at 03:28:10PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > We already count tests "run" and tests that failed, but there is no
> > indication of how many tests were _notrun. With tests actually
> > failing to run because of things like bugs in _scratch_mkfs_sized
> > being reported as "_notrun" instead of failing, visibility of
> > changing numbers of _notrun tests is needed.
> This looks okay to me. 2 questions below. Apart from that, this looks
> good to me.
> 
> Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> 
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  check-parallel | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/check-parallel b/check-parallel
> > index c85437252..d68d76e55 100755
> > --- a/check-parallel
> > +++ b/check-parallel
> > @@ -188,7 +188,10 @@ done;
> >  wait
> >  
> >  echo -n "Tests run: "
> > -grep Ran /mnt/xfs/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> > +grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> Nit: /mnt/xfs --> $basedir is not related to this change, right? Should
> we move it to a different commit/patch?

I don't see much point in separating it out into a different patch.
That makes more work for everyone and it's a simple, obvious fix to
reporting code this patch is adding to...

> Not related to this change - Can we change the above statement to
> something like it is done below for _notrun i.e 
> grep "Ran $basedir/*/log" | uniq | sed ... | wc - l
> in this way we can get rid of the sort command?

Don't think so - uniq only filters adjacent lines. You have to sort
the input to put all the duplicates together in the stream before
uniq will detect and remove them. I suspect that it is possible to
use 'sort -u' rather than 'sort | uniq' but I've always used the
latter because not all implementations of sort support the '-u'
option.
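
As a quick illustration of the adjacency behaviour described above, here
is a minimal sketch using made-up test names (the sample input is
hypothetical, not taken from a real log). It compares plain uniq against
'sort | uniq' and 'sort -u' on input whose duplicates are not adjacent:

```shell
#!/bin/bash
# Hypothetical sample: generic/001 appears twice, but not on adjacent lines.
tests='generic/001
generic/002
generic/001'

# uniq alone only collapses adjacent duplicates, so all three lines survive.
uniq_only=$(( $(echo "$tests" | uniq | wc -l) ))

# Sorting first groups the duplicates together so uniq can drop one of them.
sort_uniq=$(( $(echo "$tests" | sort | uniq | wc -l) ))

# Where sort supports -u, it gives the same result in a single command.
sort_u=$(( $(echo "$tests" | sort -u | wc -l) ))

echo "uniq: $uniq_only, sort | uniq: $sort_uniq, sort -u: $sort_u"
```

The arithmetic expansion around wc -l is only there to strip any padding
that some wc implementations add to the count.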

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 04/28] check-parallel: report how many tests were _notrun
  2025-05-21  2:53     ` Dave Chinner
@ 2025-05-26  6:09       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  6:09 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 08:23, Dave Chinner wrote:
> On Mon, May 05, 2025 at 03:28:10PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> We already count tests "run" and tests that failed, but there is no
>>> indication of how many tests were _notrun. With tests actually
>>> failing to run because of things like bugs in _scratch_mkfs_sized
>>> being reported as "_notrun" instead of failing, visibility of
>>> changing numbers of _notrun tests is needed.
>> This looks okay to me. 2 questions below. Apart from that, this looks
>> good to me.
>>
>> Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>   check-parallel | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/check-parallel b/check-parallel
>>> index c85437252..d68d76e55 100755
>>> --- a/check-parallel
>>> +++ b/check-parallel
>>> @@ -188,7 +188,10 @@ done;
>>>   wait
>>>   
>>>   echo -n "Tests run: "
>>> -grep Ran /mnt/xfs/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
>>> +grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
>> Nit: /mnt/xfs --> $basedir is not related to this change, right? Should
>> we move it to a different commit/patch?
> I don't see much point in separating it out into a different patch.
> That makes more work for everyone and it's a simple, obvious fix to
> reporting code this patch is adding to...
Okay.
>
>> Not related to this change - Can we change the above statement to
>> something like it is done below for _notrun i.e
>> grep "Ran $basedir/*/log" | uniq | sed ... | wc - l
>> in this way we can get rid of the sort command?
> Don't think so - uniq only filters adjacent lines. You have to sort
> the input to put all the duplicates together in the stream before
> uniq will detect and remove them. I suspect that it is possible to
> use 'sort -u' rather than 'sort | uniq' but I've always used the
> latter because not all implementations of sort support the '-u'
> option.

Yes, uniq only considers adjacent lines. So for getting the number of
_notrun tests we are using:

grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l

Why aren't we using sort there? I suggested the above change after
looking at this command.
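
To make the question concrete, here is a rough sketch of what that
pipeline produces on sample data. The directory names (runner-0,
runner-1) and the exact "Ran:" / "Not run:" log line format are
assumptions for illustration only, and the '\n' in the sed replacement
relies on GNU sed:

```shell
#!/bin/bash
# Hypothetical layout: one log file per runner directory under $basedir.
basedir=$(mktemp -d)
mkdir -p "$basedir/runner-0" "$basedir/runner-1"

printf 'Ran: generic/001 generic/002 generic/003\n' > "$basedir/runner-0/log"
printf 'Not run: generic/002 generic/003\n' >> "$basedir/runner-0/log"
printf 'Ran: generic/004 generic/005\n' > "$basedir/runner-1/log"
printf 'Not run: generic/005\n' >> "$basedir/runner-1/log"

# Same pipeline as in the patch: strip everything up to the last colon,
# put each test name on its own line, drop the leading empty line, count.
count=$(grep "^Not run" $basedir/*/log | uniq |
	sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l)
notrun=$((count))

echo "Tests _notrun: $notrun"
rm -rf "$basedir"
```

Note that uniq here sees whole grep output lines, each prefixed with its
log file path, rather than individual test names.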

--NR

>
> -Dave.
>
-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 05/28] check: factor out test list building code
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (3 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 04/28] check-parallel: report how many tests were _notrun Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-06 11:32   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 06/28] check-parallel: use common group list parsing code Dave Chinner
                   ` (22 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Factor out all the test list parsing and building code to
common/test_list so that it can be used by both check and
check-parallel.

This also namespaces all the test list code to use _tl_ prefixes,
and adds wrappers to set up test list parsing parameters.

Note: there is still some future work to convert externally visible
parameters like SRC_DIR to the new namespace.
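
As an illustration of the wrapper style, the sketch below copies the
bodies of _tl_setup_group and _tl_setup_exclude_group from
common/test_list in this patch into a standalone script and calls them
the way the -g and -x option handlers do. The group names passed in are
arbitrary examples:

```shell
#!/bin/bash
# Standalone copies of two wrappers from common/test_list (this patch),
# so the sketch runs without the rest of fstests.
_GROUP_LIST=
_XGROUP_LIST=

_tl_setup_group()
{
	local group="$1"

	# Commas become spaces; each call appends with a leading space.
	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
}

_tl_setup_exclude_group()
{
	local xgroup="$1"

	_XGROUP_LIST="$_XGROUP_LIST ${xgroup//,/ }"
}

# Equivalent of: -g quick,rw -g auto -x dangerous
_tl_setup_group quick,rw
_tl_setup_group auto
_tl_setup_exclude_group dangerous

echo "include:[$_GROUP_LIST] exclude:[$_XGROUP_LIST]"
```

The leading space in the accumulated list is why _tl_prepare_test_list
compares against " all" rather than "all".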

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check            | 270 +++----------------------------------------
 common/report    |   2 +-
 common/test_list | 295 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 314 insertions(+), 253 deletions(-)
 create mode 100644 common/test_list

diff --git a/check b/check
index 69866b14b..900ea2ba4 100755
--- a/check
+++ b/check
@@ -15,19 +15,13 @@ notrun=()
 interrupt=true
 diff="diff -u"
 showme=false
-have_test_arg=false
-randomize=false
-exact_order=false
 export here=`pwd`
-xfile=""
-subdir_xfile=""
 brief_test_summary=false
 do_report=false
 DUMP_OUTPUT=false
 iterations=1
 istop=false
 loop_on_fail=0
-exclude_tests=()
 
 # This is a global variable used to pass test failure text to reporting gunk
 _err_msg=""
@@ -49,8 +43,9 @@ timestamp=${TIMESTAMP:=false}
 
 rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 
-SRC_GROUPS="generic"
-export SRC_DIR="tests"
+# We need to include the test list processing first as argument parsing
+# requires test list parsing and setup.
+. ./common/test_list
 
 . ./common/exit
 
@@ -126,153 +121,12 @@ examples:
 	    exit 1
 }
 
-get_sub_group_list()
-{
-	local d=$1
-	local grp=$2
-
-	test -s "$SRC_DIR/$d/group.list" || return 1
-
-	local grpl=$(sed -n < $SRC_DIR/$d/group.list \
-		-e 's/#.*//' \
-		-e 's/$/ /' \
-		-e "s;^\($VALID_TEST_NAME\).* $grp .*;$SRC_DIR/$d/\1;p")
-	echo $grpl
-}
-
-get_group_list()
-{
-	local grp=$1
-	local grpl=""
-	local sub=$(dirname $grp)
-	local fsgroup="$FSTYP"
-
-	if [ -n "$sub" -a "$sub" != "." -a -d "$SRC_DIR/$sub" ]; then
-		# group is given as <subdir>/<group> (e.g. xfs/quick)
-		grp=$(basename $grp)
-		get_sub_group_list $sub $grp
-		return
-	fi
-
-	if [ "$FSTYP" = ext2 -o "$FSTYP" = ext3 ]; then
-	    fsgroup=ext4
-	fi
-	for d in $SRC_GROUPS $fsgroup; do
-		if ! test -d "$SRC_DIR/$d" ; then
-			continue
-		fi
-		grpl="$grpl $(get_sub_group_list $d $grp)"
-	done
-	echo $grpl
-}
-
-# Find all tests, excluding files that are test metadata such as group files.
-# It matches test names against $VALID_TEST_NAME defined in common/rc
-get_all_tests()
-{
-	touch $tmp.list
-	for d in $SRC_GROUPS $FSTYP; do
-		if ! test -d "$SRC_DIR/$d" ; then
-			continue
-		fi
-		ls $SRC_DIR/$d/* | \
-			grep -v "\..*" | \
-			grep "^$SRC_DIR/$d/$VALID_TEST_NAME"| \
-			grep -v "group\|Makefile" >> $tmp.list 2>/dev/null
-	done
-}
-
-# takes the list of tests to run in $tmp.list, and removes the tests passed to
-# the function from that list.
-trim_test_list()
-{
-	local test_list="$*"
-
-	rm -f $tmp.grep
-	local numsed=0
-	for t in $test_list
-	do
-	    if [ $numsed -gt 100 ]; then
-		grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
-		mv $tmp.tmp $tmp.list
-		numsed=0
-		rm -f $tmp.grep
-	    fi
-	    echo "^$t\$" >>$tmp.grep
-	    numsed=`expr $numsed + 1`
-	done
-	grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
-	mv $tmp.tmp $tmp.list
-	rm -f $tmp.grep
-}
-
 _timestamp()
 {
     local now=`date "+%T"`
     echo -n " [$now]"
 }
 
-_prepare_test_list()
-{
-	unset list
-	# Tests specified on the command line
-	if [ -s $tmp.arglist ]; then
-		cat $tmp.arglist > $tmp.list
-	else
-		touch $tmp.list
-	fi
-
-	# Specified groups to include
-	# Note that the CLI processing adds a leading space to the first group
-	# parameter, so we have to catch that here checking for "all"
-	if ! $have_test_arg && [ "$GROUP_LIST" == " all" ]; then
-		# no test numbers, do everything
-		get_all_tests
-	else
-		for group in $GROUP_LIST; do
-			list=$(get_group_list $group)
-			if [ -z "$list" ]; then
-				echo "Group \"$group\" is empty or not defined?"
-				exit 1
-			fi
-
-			for t in $list; do
-				grep -s "^$t\$" $tmp.list >/dev/null || \
-							echo "$t" >>$tmp.list
-			done
-		done
-	fi
-
-	# Specified groups to exclude
-	for xgroup in $XGROUP_LIST; do
-		list=$(get_group_list $xgroup)
-		if [ -z "$list" ]; then
-			echo "Group \"$xgroup\" is empty or not defined?"
-			continue
-		fi
-
-		trim_test_list $list
-	done
-
-	# sort the list of tests into numeric order unless we're running tests
-	# in the exact order specified
-	if ! $exact_order; then
-		if $randomize; then
-			if type shuf >& /dev/null; then
-				sorter="shuf"
-			else
-				sorter="awk -v seed=$RANDOM -f randomize.awk"
-			fi
-		else
-			sorter="cat"
-		fi
-		list=`sort -n $tmp.list | uniq | $sorter`
-	else
-		list=`cat $tmp.list`
-	fi
-	rm -f $tmp.list
-}
-
 # Process command arguments first.
 while [ $# -gt 0 ]; do
 	case "$1" in
@@ -287,48 +141,20 @@ while [ $# -gt 0 ]; do
 		export OVERLAY=true
 		;;
 
-	-g)	group=$2 ; shift ;
-		GROUP_LIST="$GROUP_LIST ${group//,/ }"
-		;;
+	-g)	_tl_setup_group $2 ; shift ;;
+	-e)	_tl_setup_exclude_tests $2 ; shift ;;
+	-E)	_tl_setup_exclude_file $2 ; shift ;;
+	-x)	_tl_setup_exclude_group $2; shift ;;
+	-X)	_tl_setup_exclude_subdir $2; shift ;;
+	-r)	_tl_setup_randomise ;;
+	--exact-order) _tl_setup_ordered ;;
 
-	-x)	xgroup=$2 ; shift ;
-		XGROUP_LIST="$XGROUP_LIST ${xgroup//,/ }"
-		;;
-
-	-X)	subdir_xfile=$2; shift ;
-		;;
-	-e)
-		xfile=$2; shift ;
-		readarray -t -O "${#exclude_tests[@]}" exclude_tests < \
-			<(echo "$xfile" | tr ', ' '\n\n')
-		;;
-
-	-E)	xfile=$2; shift ;
-		if [ -f $xfile ]; then
-			readarray -t -O ${#exclude_tests[@]} exclude_tests < \
-				<(sed "s/#.*$//" $xfile)
-		fi
-		;;
 	-s)	RUN_SECTION="$RUN_SECTION $2"; shift ;;
 	-S)	EXCLUDE_SECTION="$EXCLUDE_SECTION $2"; shift ;;
 	-l)	diff="diff" ;;
 	-udiff)	diff="$diff -u" ;;
 
 	-n)	showme=true ;;
-	-r)
-		if $exact_order; then
-			echo "Cannot specify -r and --exact-order."
-			exit 1
-		fi
-		randomize=true
-		;;
-	--exact-order)
-		if $randomize; then
-			echo "Cannnot specify --exact-order and -r."
-			exit 1
-		fi
-		exact_order=true
-		;;
 	-i)	iterations=$2; shift ;;
 	-I) 	iterations=$2; istop=true; shift ;;
 	-T)	timestamp=true ;;
@@ -346,13 +172,13 @@ while [ $# -gt 0 ]; do
 
 	-*)	usage ;;
 	*)	# not an argument, we've got tests now.
-		have_test_arg=true ;;
+		_tl_setup_cli $*
 	esac
 
 	# if we've found a test specification, the break out of the processing
 	# loop before we shift the arguments so that this is the first argument
 	# that we process in the test arg loop below.
-	if $have_test_arg; then
+	if $_tl_have_test_args; then
 		break;
 	fi
 
@@ -392,51 +218,6 @@ if [ -n "$FUZZ_REWRITE_DURATION" ]; then
 	fi
 fi
 
-if [ -n "$subdir_xfile" ]; then
-	for d in $SRC_GROUPS $FSTYP; do
-		[ -f $SRC_DIR/$d/$subdir_xfile ] || continue
-		for f in `sed "s/#.*$//" $SRC_DIR/$d/$subdir_xfile`; do
-			exclude_tests+=($d/$f)
-		done
-	done
-fi
-
-# Process tests from command line now.
-if $have_test_arg; then
-	while [ $# -gt 0 ]; do
-		case "$1" in
-		-*)	echo "Arguments before tests, please!"
-			status=1
-			exit $status
-			;;
-		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
-			list=$(cd $SRC_DIR; echo $1)
-			for t in $list; do
-				t=${t#$SRC_DIR/}
-				test_dir=${t%%/*}
-				test_name=${t##*/}
-				group_file=$SRC_DIR/$test_dir/group.list
-
-				if grep -Eq "^$test_name" $group_file; then
-					# in group file ... OK
-					echo $SRC_DIR/$test_dir/$test_name \
-						>>$tmp.arglist
-				else
-					# oops
-					echo "$t - unknown test, ignored"
-				fi
-			done
-			;;
-		esac
-
-		shift
-	done
-elif [ -z "$GROUP_LIST" ]; then
-	# default group list is the auto group. If any other group or test is
-	# specified, we use that instead.
-	GROUP_LIST="auto"
-fi
-
 if [ `id -u` -ne 0 ]
 then
     echo "check: QA must be run as root"
@@ -597,21 +378,6 @@ _check_filesystems()
 	return $ret
 }
 
-_expunge_test()
-{
-	local TEST_ID="$1"
-
-	for f in "${exclude_tests[@]}"; do
-		# $f may contain traling spaces and comments
-		local id_regex="^${TEST_ID}\b"
-		if [[ "$f" =~ ${id_regex} ]]; then
-			echo "       [expunged]"
-			return 0
-		fi
-	done
-	return 1
-}
-
 # retain files which would be overwritten in subsequent reruns of the same test
 _stash_fail_loop_files() {
 	local seq_prefix="${REPORT_DIR}/${1}"
@@ -719,7 +485,7 @@ _run_seq() {
 }
 
 _detect_kmemleak
-_prepare_test_list
+_tl_prepare_test_list
 fstests_start_time="$(date +"%F %T")"
 
 if $OPTIONS_HAVE_SECTIONS; then
@@ -793,7 +559,7 @@ function run_section()
 		# common/rc again with correct FSTYP to get FSTYP specific configs,
 		# e.g. common/xfs
 		. common/rc
-		_prepare_test_list
+		_tl_prepare_test_list
 	elif [ "$OLD_TEST_FS_MOUNT_OPTS" != "$TEST_FS_MOUNT_OPTS" ]; then
 		# Unmount TEST_DEV to apply the updated mount options.
 		# It will be mounted again by init_rc(), called shortly after.
@@ -854,13 +620,13 @@ function run_section()
 
 	loop_status=()	# track rerun-on-failure state
 	local tc_status ix
-	local -a _list=( $list )
+	local -a _list=( $_tl_tests )
 	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do
 		seq="${_list[$ix]}"
 
 		# the filename for the test and the name output are different.
 		# we don't include the tests/ directory in the name output.
-		export seqnum=${seq#$SRC_DIR/}
+		export seqnum=$(_tl_strip_src_dir $seq)
 		group=${seqnum%%/*}
 		if $OPTIONS_HAVE_SECTIONS; then
 			REPORT_DIR="$RESULT_BASE/$section"
@@ -882,7 +648,7 @@ function run_section()
 		echo -n "$seqnum"
 
 		if $showme; then
-			if _expunge_test $seqnum; then
+			if _tl_expunge_test $seqnum; then
 				tc_status="expunge"
 			else
 				echo
@@ -908,7 +674,7 @@ function run_section()
 		rm -f $seqres.out.bad $seqres.hints
 
 		# check if we really should run it
-		if _expunge_test $seqnum; then
+		if _tl_expunge_test $seqnum; then
 			tc_status="expunge"
 			_stash_test_status "$seqnum" "$tc_status"
 			continue
diff --git a/common/report b/common/report
index 7128bbeba..5697d2540 100644
--- a/common/report
+++ b/common/report
@@ -196,7 +196,7 @@ _xunit_make_testcase_report()
 		echo -e "\t\t<skipped/>" >> $report
 		;;
 	"fail")
-		local out_src="${SRC_DIR}/${test_name}.out"
+		local out_src="${_tl_src_dir}/${test_name}.out"
 		local full_file="${REPORT_DIR}/${test_name}.full"
 		local dmesg_file="${REPORT_DIR}/${test_name}.dmesg"
 		local outbad_file="${REPORT_DIR}/${test_name}.out.bad"
diff --git a/common/test_list b/common/test_list
new file mode 100644
index 000000000..2432be6f7
--- /dev/null
+++ b/common/test_list
@@ -0,0 +1,295 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
+# Copyright (c) 2024 Red Hat, Inc.  All Rights Reserved.
+#
+# Test list parsing and building functions
+#
+# Note: this file must stand alone and not be dependent on any other includes,
+# most especially common/rc and common/config. This is because we have to
+# include this file before option parsing, whilst the rc/config includes need to
+# be included -after- option parsing.
+#
+# Any function or variable that is public should have a "_tl_" prefix.
+
+export _tl_src_dir="tests"
+
+_SRC_GROUPS="generic"
+_GROUP_LIST=
+_XGROUP_LIST=
+_tl_exact_order=false
+_tl_randomise=false
+_tl_have_test_args=false
+_tl_file="$tmp.test_list"
+_tl_exclude_tests=()
+_tl_tests=
+
+_tl_strip_src_dir()
+{
+	local test="$1"
+
+	echo ${test#$_tl_src_dir/}
+}
+
+get_sub_group_list()
+{
+	local d=$1
+	local grp=$2
+
+	test -s "$_tl_src_dir/$d/group.list" || return 1
+
+	local grpl=$(sed -n < $_tl_src_dir/$d/group.list \
+		-e 's/#.*//' \
+		-e 's/$/ /' \
+		-e "s;^\($VALID_TEST_NAME\).* $grp .*;$_tl_src_dir/$d/\1;p")
+	echo $grpl
+}
+
+get_group_list()
+{
+	local grp=$1
+	local grpl=""
+	local sub=$(dirname $grp)
+	local fsgroup="$FSTYP"
+
+	if [ -n "$sub" -a "$sub" != "." -a -d "$_tl_src_dir/$sub" ]; then
+		# group is given as <subdir>/<group> (e.g. xfs/quick)
+		grp=$(basename $grp)
+		get_sub_group_list $sub $grp
+		return
+	fi
+
+	if [ "$FSTYP" = ext2 -o "$FSTYP" = ext3 ]; then
+	    fsgroup=ext4
+	fi
+	for d in $_SRC_GROUPS $fsgroup; do
+		if ! test -d "$_tl_src_dir/$d" ; then
+			continue
+		fi
+		grpl="$grpl $(get_sub_group_list $d $grp)"
+	done
+	echo $grpl
+}
+
+# Find all tests, excluding files that are test metadata such as group files.
+# It matches test names against $VALID_TEST_NAME defined in common/rc
+get_all_tests()
+{
+	touch $tmp.list
+	for d in $_SRC_GROUPS $FSTYP; do
+		if ! test -d "$_tl_src_dir/$d" ; then
+			continue
+		fi
+		ls $_tl_src_dir/$d/* | \
+			grep -v "\..*" | \
+			grep "^$_tl_src_dir/$d/$VALID_TEST_NAME"| \
+			grep -v "group\|Makefile" >> $tmp.list 2>/dev/null
+	done
+}
+
+# takes the list of tests to run in $tmp.list, and removes the tests passed to
+# the function from that list.
+trim_test_list()
+{
+	local test_list="$*"
+
+	rm -f $tmp.grep
+	local numsed=0
+	for t in $test_list
+	do
+	    if [ $numsed -gt 100 ]; then
+		grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
+		mv $tmp.tmp $tmp.list
+		numsed=0
+		rm -f $tmp.grep
+	    fi
+	    echo "^$t\$" >>$tmp.grep
+	    numsed=`expr $numsed + 1`
+	done
+	grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
+	mv $tmp.tmp $tmp.list
+	rm -f $tmp.grep
+}
+
+_tl_prepare_test_list()
+{
+	unset _tl_tests
+	# Tests specified on the command line
+	if [ -s $_tl_file ]; then
+		cat $_tl_file > $tmp.list
+	else
+		touch $tmp.list
+	fi
+
+	# Specified groups to include
+	# Note that the CLI processing adds a leading space to the first group
+	# parameter, so we have to catch that here checking for "all"
+	if ! $_tl_have_test_args && [ "$_GROUP_LIST" == " all" ]; then
+		# no test numbers, do everything
+		get_all_tests
+	else
+		for group in $_GROUP_LIST; do
+			list=$(get_group_list $group)
+			if [ -z "$list" ]; then
+				echo "Group \"$group\" is empty or not defined?"
+				exit 1
+			fi
+
+			for t in $list; do
+				grep -s "^$t\$" $tmp.list >/dev/null || \
+							echo "$t" >>$tmp.list
+			done
+		done
+	fi
+
+	# Specified groups to exclude
+	for xgroup in $_XGROUP_LIST; do
+		list=$(get_group_list $xgroup)
+		if [ -z "$list" ]; then
+			echo "Group \"$xgroup\" is empty or not defined?"
+			continue
+		fi
+
+		trim_test_list $list
+	done
+
+	# sort the list of tests into numeric order unless we're running tests
+	# in the exact order specified
+	if ! $_tl_exact_order; then
+		if $_tl_randomise; then
+			if type shuf >& /dev/null; then
+				sorter="shuf"
+			else
+				sorter="awk -v seed=$RANDOM -f randomize.awk"
+			fi
+		else
+			sorter="cat"
+		fi
+		_tl_tests=`sort -n $tmp.list | uniq | $sorter`
+	else
+		_tl_tests=`cat $tmp.list`
+	fi
+	rm -f $tmp.list
+}
+
+_tl_expunge_test()
+{
+	local TEST_ID="$1"
+
+	for f in "${_tl_exclude_tests[@]}"; do
+		# $f may contain traling spaces and comments
+		local id_regex="^${TEST_ID}\b"
+		if [[ "$f" =~ ${id_regex} ]]; then
+			echo "       [expunged]"
+			return 0
+		fi
+	done
+	return 1
+}
+
+_tl_setup_exclude_tests()
+{
+	local list="$1"
+
+	readarray -t -O "${#_tl_exclude_tests[@]}" _tl_exclude_tests < \
+		<(echo "$list" | tr ', ' '\n\n')
+}
+
+_tl_setup_exclude_file()
+{
+	local xfile="$1"
+
+	if [ -f $xfile ]; then
+		readarray -t -O ${#_tl_exclude_tests[@]} _tl_exclude_tests < \
+			<(sed "s/#.*$//" $xfile)
+	fi
+}
+
+_tl_setup_exclude_subdir()
+{
+	local xfile="$1"
+	local d
+	local f
+
+	[ -z "$xfile" ] && return
+
+	for d in $_SRC_GROUPS $FSTYP; do
+		[ -f $_tl_src_dir/$d/$xfile ] || continue
+		for f in `sed "s/#.*$//" $_tl_src_dir/$d/$xfile`; do
+			_tl_exclude_tests+=($d/$f)
+		done
+	done
+}
+
+_tl_setup_exclude_group()
+{
+	local xgroup="$1"
+
+	_XGROUP_LIST="$_XGROUP_LIST ${xgroup//,/ }"
+}
+
+_tl_setup_group()
+{
+	local group="$1"
+
+	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
+}
+
+_tl_setup_randomise()
+{
+	if $_tl_exact_order; then
+		echo "Cannot specify -r and --exact-order."
+		exit 1
+	fi
+	_tl_randomise=true
+}
+
+_tl_setup_ordered()
+{
+	if $_tl_randomise; then
+		echo "Cannnot specify --exact-order and -r."
+		exit 1
+	fi
+	_tl_exact_order=true
+}
+
+_tl_setup_cli()
+{
+	while [ $# -gt 0 ]; do
+		case "$1" in
+		-*)	echo "Arguments before tests, please!"
+			status=1
+			exit $status
+			;;
+		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
+			local list=$(cd $_tl_src_dir; echo $1)
+			local t
+
+			for t in $list; do
+				t=${t#$_tl_src_dir/}
+				local test_dir=${t%%/*}
+				local test_name=${t##*/}
+				local group_file=$_tl_src_dir/$test_dir/group.list
+
+				if grep -Eq "^$test_name" $group_file; then
+					# in group file ... OK
+					echo $_tl_src_dir/$test_dir/$test_name \
+						>> $_tl_file
+					_tl_have_test_args=true
+				else
+					# oops
+					echo "$t - unknown test, ignored"
+				fi
+			done
+			;;
+		esac
+
+		shift
+	done
+
+	if ! $_tl_have_test_args && [ -z "$_GROUP_LIST" ]; then
+		# default group list is the auto group. If any other group or
+		# test is specified, we use that instead.
+		_GROUP_LIST="auto"
+	fi
+}
-- 
2.45.2



* Re: [PATCH 05/28] check: factor out test list building code
  2025-04-17  3:00 ` [PATCH 05/28] check: factor out test list building code Dave Chinner
@ 2025-05-06 11:32   ` Nirjhar Roy (IBM)
  2025-05-21  3:55     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-06 11:32 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Factor out all the test list parsing and building code to
> common/test_list so that it can be used by both check and
> check-parallel.
> 
> This also namespaces all the test list code to use _tl_ prefixes,
> and adds wrappers to set up test list parsing parameters.
> 
> Note: there is still some future work to convert externally visible
> parameters like SRC_DIR to the new namespace.
I have gone through this patch. I did some basic testing too with
./check and I haven't found any obvious issues. I have given some minor
feedback below:
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check            | 270 +++----------------------------------------
>  common/report    |   2 +-
>  common/test_list | 295 +++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 314 insertions(+), 253 deletions(-)
>  create mode 100644 common/test_list
> 
> diff --git a/check b/check
> index 69866b14b..900ea2ba4 100755
> --- a/check
> +++ b/check
> @@ -15,19 +15,13 @@ notrun=()
>  interrupt=true
>  diff="diff -u"
>  showme=false
> -have_test_arg=false
> -randomize=false
> -exact_order=false
>  export here=`pwd`
> -xfile=""
> -subdir_xfile=""
>  brief_test_summary=false
>  do_report=false
>  DUMP_OUTPUT=false
>  iterations=1
>  istop=false
>  loop_on_fail=0
> -exclude_tests=()
>  
>  # This is a global variable used to pass test failure text to reporting gunk
>  _err_msg=""
> @@ -49,8 +43,9 @@ timestamp=${TIMESTAMP:=false}
>  
>  rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
>  
> -SRC_GROUPS="generic"
> -export SRC_DIR="tests"
> +# We need to include the test list processing first as argument parsing
> +# requires test list parsing and setup.
> +. ./common/test_list
>  
>  . ./common/exit
>  
> @@ -126,153 +121,12 @@ examples:
>  	    exit 1
>  }
>  
> -get_sub_group_list()
> -{
> -	local d=$1
> -	local grp=$2
> -
> -	test -s "$SRC_DIR/$d/group.list" || return 1
> -
> -	local grpl=$(sed -n < $SRC_DIR/$d/group.list \
> -		-e 's/#.*//' \
> -		-e 's/$/ /' \
> -		-e "s;^\($VALID_TEST_NAME\).* $grp .*;$SRC_DIR/$d/\1;p")
> -	echo $grpl
> -}
> -
> -get_group_list()
> -{
> -	local grp=$1
> -	local grpl=""
> -	local sub=$(dirname $grp)
> -	local fsgroup="$FSTYP"
> -
> -	if [ -n "$sub" -a "$sub" != "." -a -d "$SRC_DIR/$sub" ]; then
> -		# group is given as <subdir>/<group> (e.g. xfs/quick)
> -		grp=$(basename $grp)
> -		get_sub_group_list $sub $grp
> -		return
> -	fi
> -
> -	if [ "$FSTYP" = ext2 -o "$FSTYP" = ext3 ]; then
> -	    fsgroup=ext4
> -	fi
> -	for d in $SRC_GROUPS $fsgroup; do
> -		if ! test -d "$SRC_DIR/$d" ; then
> -			continue
> -		fi
> -		grpl="$grpl $(get_sub_group_list $d $grp)"
> -	done
> -	echo $grpl
> -}
> -
> -# Find all tests, excluding files that are test metadata such as group files.
> -# It matches test names against $VALID_TEST_NAME defined in common/rc
> -get_all_tests()
> -{
> -	touch $tmp.list
> -	for d in $SRC_GROUPS $FSTYP; do
> -		if ! test -d "$SRC_DIR/$d" ; then
> -			continue
> -		fi
> -		ls $SRC_DIR/$d/* | \
> -			grep -v "\..*" | \
> -			grep "^$SRC_DIR/$d/$VALID_TEST_NAME"| \
> -			grep -v "group\|Makefile" >> $tmp.list 2>/dev/null
> -	done
> -}
> -
> -# takes the list of tests to run in $tmp.list, and removes the tests passed to
> -# the function from that list.
> -trim_test_list()
> -{
> -	local test_list="$*"
> -
> -	rm -f $tmp.grep
> -	local numsed=0
> -	for t in $test_list
> -	do
> -	    if [ $numsed -gt 100 ]; then
> -		grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
> -		mv $tmp.tmp $tmp.list
> -		numsed=0
> -		rm -f $tmp.grep
> -	    fi
> -	    echo "^$t\$" >>$tmp.grep
> -	    numsed=`expr $numsed + 1`
> -	done
> -	grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
> -	mv $tmp.tmp $tmp.list
> -	rm -f $tmp.grep
> -}
> -
>  _timestamp()
>  {
>      local now=`date "+%T"`
>      echo -n " [$now]"
>  }
>  
> -_prepare_test_list()
> -{
> -	unset list
> -	# Tests specified on the command line
> -	if [ -s $tmp.arglist ]; then
> -		cat $tmp.arglist > $tmp.list
> -	else
> -		touch $tmp.list
> -	fi
> -
> -	# Specified groups to include
> -	# Note that the CLI processing adds a leading space to the first group
> -	# parameter, so we have to catch that here checking for "all"
> -	if ! $have_test_arg && [ "$GROUP_LIST" == " all" ]; then
> -		# no test numbers, do everything
> -		get_all_tests
> -	else
> -		for group in $GROUP_LIST; do
> -			list=$(get_group_list $group)
> -			if [ -z "$list" ]; then
> -				echo "Group \"$group\" is empty or not defined?"
> -				exit 1
> -			fi
> -
> -			for t in $list; do
> -				grep -s "^$t\$" $tmp.list >/dev/null || \
> -							echo "$t" >>$tmp.list
> -			done
> -		done
> -	fi
> -
> -	# Specified groups to exclude
> -	for xgroup in $XGROUP_LIST; do
> -		list=$(get_group_list $xgroup)
> -		if [ -z "$list" ]; then
> -			echo "Group \"$xgroup\" is empty or not defined?"
> -			continue
> -		fi
> -
> -		trim_test_list $list
> -	done
> -
> -	# sort the list of tests into numeric order unless we're running tests
> -	# in the exact order specified
> -	if ! $exact_order; then
> -		if $randomize; then
> -			if type shuf >& /dev/null; then
> -				sorter="shuf"
> -			else
> -				sorter="awk -v seed=$RANDOM -f randomize.awk"
> -			fi
> -		else
> -			sorter="cat"
> -		fi
> -		list=`sort -n $tmp.list | uniq | $sorter`
> -	else
> -		list=`cat $tmp.list`
> -	fi
> -	rm -f $tmp.list
> -}
> -
>  # Process command arguments first.
>  while [ $# -gt 0 ]; do
>  	case "$1" in
> @@ -287,48 +141,20 @@ while [ $# -gt 0 ]; do
>  		export OVERLAY=true
>  		;;
>  
> -	-g)	group=$2 ; shift ;
> -		GROUP_LIST="$GROUP_LIST ${group//,/ }"
> -		;;
> +	-g)	_tl_setup_group $2 ; shift ;;
> +	-e)	_tl_setup_exclude_tests $2 ; shift ;;
> +	-E)	_tl_setup_exclude_file $2 ; shift ;;
> +	-x)	_tl_setup_exclude_group $2; shift ;;
> +	-X)	_tl_setup_exclude_subdir $2; shift ;;
> +	-r)	_tl_setup_randomise ;;
> +	--exact-order) _tl_setup_ordered ;;
>  
> -	-x)	xgroup=$2 ; shift ;
> -		XGROUP_LIST="$XGROUP_LIST ${xgroup//,/ }"
> -		;;
> -
> -	-X)	subdir_xfile=$2; shift ;
> -		;;
> -	-e)
> -		xfile=$2; shift ;
> -		readarray -t -O "${#exclude_tests[@]}" exclude_tests < \
> -			<(echo "$xfile" | tr ', ' '\n\n')
> -		;;
> -
> -	-E)	xfile=$2; shift ;
> -		if [ -f $xfile ]; then
> -			readarray -t -O ${#exclude_tests[@]} exclude_tests < \
> -				<(sed "s/#.*$//" $xfile)
> -		fi
> -		;;
>  	-s)	RUN_SECTION="$RUN_SECTION $2"; shift ;;
>  	-S)	EXCLUDE_SECTION="$EXCLUDE_SECTION $2"; shift ;;
>  	-l)	diff="diff" ;;
>  	-udiff)	diff="$diff -u" ;;
>  
>  	-n)	showme=true ;;
> -	-r)
> -		if $exact_order; then
> -			echo "Cannot specify -r and --exact-order."
> -			exit 1
> -		fi
> -		randomize=true
> -		;;
> -	--exact-order)
> -		if $randomize; then
> -			echo "Cannnot specify --exact-order and -r."
> -			exit 1
> -		fi
> -		exact_order=true
> -		;;
>  	-i)	iterations=$2; shift ;;
>  	-I) 	iterations=$2; istop=true; shift ;;
>  	-T)	timestamp=true ;;
> @@ -346,13 +172,13 @@ while [ $# -gt 0 ]; do
>  
>  	-*)	usage ;;
>  	*)	# not an argument, we've got tests now.
> -		have_test_arg=true ;;
> +		_tl_setup_cli $*
>  	esac
>  
>  	# if we've found a test specification, the break out of the processing
>  	# loop before we shift the arguments so that this is the first argument
>  	# that we process in the test arg loop below.
> -	if $have_test_arg; then
> +	if $_tl_have_test_args; then
>  		break;
>  	fi
>  
> @@ -392,51 +218,6 @@ if [ -n "$FUZZ_REWRITE_DURATION" ]; then
>  	fi
>  fi
>  
> -if [ -n "$subdir_xfile" ]; then
> -	for d in $SRC_GROUPS $FSTYP; do
> -		[ -f $SRC_DIR/$d/$subdir_xfile ] || continue
> -		for f in `sed "s/#.*$//" $SRC_DIR/$d/$subdir_xfile`; do
> -			exclude_tests+=($d/$f)
> -		done
> -	done
> -fi
> -
> -# Process tests from command line now.
> -if $have_test_arg; then
> -	while [ $# -gt 0 ]; do
> -		case "$1" in
> -		-*)	echo "Arguments before tests, please!"
> -			status=1
> -			exit $status
> -			;;
> -		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
> -			list=$(cd $SRC_DIR; echo $1)
> -			for t in $list; do
> -				t=${t#$SRC_DIR/}
> -				test_dir=${t%%/*}
> -				test_name=${t##*/}
> -				group_file=$SRC_DIR/$test_dir/group.list
> -
> -				if grep -Eq "^$test_name" $group_file; then
> -					# in group file ... OK
> -					echo $SRC_DIR/$test_dir/$test_name \
> -						>>$tmp.arglist
> -				else
> -					# oops
> -					echo "$t - unknown test, ignored"
> -				fi
> -			done
> -			;;
> -		esac
> -
> -		shift
> -	done
> -elif [ -z "$GROUP_LIST" ]; then
> -	# default group list is the auto group. If any other group or test is
> -	# specified, we use that instead.
> -	GROUP_LIST="auto"
> -fi
> -
>  if [ `id -u` -ne 0 ]
>  then
>      echo "check: QA must be run as root"
> @@ -597,21 +378,6 @@ _check_filesystems()
>  	return $ret
>  }
>  
> -_expunge_test()
> -{
> -	local TEST_ID="$1"
> -
> -	for f in "${exclude_tests[@]}"; do
> -		# $f may contain traling spaces and comments
> -		local id_regex="^${TEST_ID}\b"
> -		if [[ "$f" =~ ${id_regex} ]]; then
> -			echo "       [expunged]"
> -			return 0
> -		fi
> -	done
> -	return 1
> -}
> -
>  # retain files which would be overwritten in subsequent reruns of the same test
>  _stash_fail_loop_files() {
>  	local seq_prefix="${REPORT_DIR}/${1}"
> @@ -719,7 +485,7 @@ _run_seq() {
>  }
>  
>  _detect_kmemleak
> -_prepare_test_list
> +_tl_prepare_test_list
>  fstests_start_time="$(date +"%F %T")"
>  
>  if $OPTIONS_HAVE_SECTIONS; then
> @@ -793,7 +559,7 @@ function run_section()
>  		# common/rc again with correct FSTYP to get FSTYP specific configs,
>  		# e.g. common/xfs
>  		. common/rc
> -		_prepare_test_list
> +		_tl_prepare_test_list
>  	elif [ "$OLD_TEST_FS_MOUNT_OPTS" != "$TEST_FS_MOUNT_OPTS" ]; then
>  		# Unmount TEST_DEV to apply the updated mount options.
>  		# It will be mounted again by init_rc(), called shortly after.
> @@ -854,13 +620,13 @@ function run_section()
>  
>  	loop_status=()	# track rerun-on-failure state
>  	local tc_status ix
> -	local -a _list=( $list )
> +	local -a _list=( $_tl_tests )
>  	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do
>  		seq="${_list[$ix]}"
>  
>  		# the filename for the test and the name output are different.
>  		# we don't include the tests/ directory in the name output.
> -		export seqnum=${seq#$SRC_DIR/}
> +		export seqnum=$(_tl_strip_src_dir $seq)
>  		group=${seqnum%%/*}
>  		if $OPTIONS_HAVE_SECTIONS; then
>  			REPORT_DIR="$RESULT_BASE/$section"
> @@ -882,7 +648,7 @@ function run_section()
>  		echo -n "$seqnum"
>  
>  		if $showme; then
> -			if _expunge_test $seqnum; then
> +			if _tl_expunge_test $seqnum; then
>  				tc_status="expunge"
>  			else
>  				echo
> @@ -908,7 +674,7 @@ function run_section()
>  		rm -f $seqres.out.bad $seqres.hints
>  
>  		# check if we really should run it
> -		if _expunge_test $seqnum; then
> +		if _tl_expunge_test $seqnum; then
>  			tc_status="expunge"
>  			_stash_test_status "$seqnum" "$tc_status"
>  			continue
> diff --git a/common/report b/common/report
> index 7128bbeba..5697d2540 100644
> --- a/common/report
> +++ b/common/report
> @@ -196,7 +196,7 @@ _xunit_make_testcase_report()
>  		echo -e "\t\t<skipped/>" >> $report
>  		;;
>  	"fail")
> -		local out_src="${SRC_DIR}/${test_name}.out"
> +		local out_src="${_tl_src_dir}/${test_name}.out"
>  		local full_file="${REPORT_DIR}/${test_name}.full"
>  		local dmesg_file="${REPORT_DIR}/${test_name}.dmesg"
>  		local outbad_file="${REPORT_DIR}/${test_name}.out.bad"
> diff --git a/common/test_list b/common/test_list
> new file mode 100644
> index 000000000..2432be6f7
> --- /dev/null
> +++ b/common/test_list
> @@ -0,0 +1,295 @@
> +##/bin/bash
> +# SPDX-License-Identifier: GPL-2.0+
> +# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
> +# Copyright (c) 2024 Red Hat, Inc.  All Rights Reserved.
> +#
> +# Test list parsing and building functions
> +#
> +# Note: this file must stand alone and not be dependent on any other includes,
We can include common/exit, right? That file only has a
couple of exit-related functions (with no executable code) and no
further dependencies, correct?
> +# most especially common/rc and common/config. This is because we have to
> +# include this file before option parsing, whilst the rc/config includes need to
> +# be included -after- option parsing.
> +#
> +# Any function or variable that is public should have a "_tl_" prefix.
> +
> +export _tl_src_dir="tests"
Minor: There was a change[1] which converted some of global variables
in small case to upper case. Should we follow the same convention here,
i.e, _TL_SRC_DIR or maybe _tl_SRC_DIR?
[1] 
https://web.git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=ab459c67c5e0347ab4a5c28eadfe5ee4e3fd2f01
> +
> +_SRC_GROUPS="generic"
> +_GROUP_LIST=
> +_XGROUP_LIST=
> +_tl_exact_order=false
> +_tl_randomise=false
> +_tl_have_test_args=false
> +_tl_file="$tmp.test_list"
> +_tl_exclude_tests=()
> +_tl_tests=
Similar comments for all the above lower cased global variables?
> +
> +_tl_strip_src_dir()
> +{
> +	local test="$1"
> +
> +	echo ${test#$_tl_src_dir/}
> +}
> +
> +get_sub_group_list()
> +{
> +	local d=$1
> +	local grp=$2
> +
> +	test -s "$_tl_src_dir/$d/group.list" || return 1
> +
> +	local grpl=$(sed -n < $_tl_src_dir/$d/group.list \
> +		-e 's/#.*//' \
> +		-e 's/$/ /' \
> +		-e "s;^\($VALID_TEST_NAME\).* $grp .*;$_tl_src_dir/$d/\1;p")
> +	echo $grpl
> +}
> +
> +get_group_list()
> +{
> +	local grp=$1
> +	local grpl=""
> +	local sub=$(dirname $grp)
> +	local fsgroup="$FSTYP"
> +
> +	if [ -n "$sub" -a "$sub" != "." -a -d "$_tl_src_dir/$sub" ]; then
> +		# group is given as <subdir>/<group> (e.g. xfs/quick)
> +		grp=$(basename $grp)
> +		get_sub_group_list $sub $grp
> +		return
> +	fi
> +
> +	if [ "$FSTYP" = ext2 -o "$FSTYP" = ext3 ]; then
> +	    fsgroup=ext4
> +	fi
> +	for d in $_SRC_GROUPS $fsgroup; do
> +		if ! test -d "$_tl_src_dir/$d" ; then
> +			continue
> +		fi
> +		grpl="$grpl $(get_sub_group_list $d $grp)"
> +	done
> +	echo $grpl
> +}
> +
> +# Find all tests, excluding files that are test metadata such as group files.
> +# It matches test names against $VALID_TEST_NAME defined in common/rc
> +get_all_tests()
> +{
> +	touch $tmp.list
> +	for d in $_SRC_GROUPS $FSTYP; do
> +		if ! test -d "$_tl_src_dir/$d" ; then
> +			continue
> +		fi
> +		ls $_tl_src_dir/$d/* | \
> +			grep -v "\..*" | \
> +			grep "^$_tl_src_dir/$d/$VALID_TEST_NAME"| \
> +			grep -v "group\|Makefile" >> $tmp.list 2>/dev/null
> +	done
> +}
> +
> +# takes the list of tests to run in $tmp.list, and removes the tests passed to
> +# the function from that list.
> +trim_test_list()
> +{
> +	local test_list="$*"
> +
> +	rm -f $tmp.grep
> +	local numsed=0
> +	for t in $test_list
> +	do
> +	    if [ $numsed -gt 100 ]; then
> +		grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
> +		mv $tmp.tmp $tmp.list
> +		numsed=0
> +		rm -f $tmp.grep
> +	    fi
> +	    echo "^$t\$" >>$tmp.grep
> +	    numsed=`expr $numsed + 1`
> +	done
> +	grep -v -f $tmp.grep <$tmp.list >$tmp.tmp
> +	mv $tmp.tmp $tmp.list
> +	rm -f $tmp.grep
> +}
> +
> +_tl_prepare_test_list()
> +{
> +	unset _tl_tests
> +	# Tests specified on the command line
> +	if [ -s $_tl_file ]; then
> +		cat $_tl_file > $tmp.list
> +	else
> +		touch $tmp.list
> +	fi
> +
> +	# Specified groups to include
> +	# Note that the CLI processing adds a leading space to the first group
> +	# parameter, so we have to catch that here checking for "all"
> +	if ! $_tl_have_test_args && [ "$_GROUP_LIST" == " all" ]; then
> +		# no test numbers, do everything
> +		get_all_tests
> +	else
> +		for group in $_GROUP_LIST; do
> +			list=$(get_group_list $group)
> +			if [ -z "$list" ]; then
> +				echo "Group \"$group\" is empty or not defined?"
> +				exit 1
_fatal "Group \"$group\" is empty or not defined?" ?
> +			fi
> +
> +			for t in $list; do
> +				grep -s "^$t\$" $tmp.list >/dev/null || \
> +							echo "$t" >>$tmp.list
> +			done
> +		done
> +	fi
> +
> +	# Specified groups to exclude
> +	for xgroup in $_XGROUP_LIST; do
> +		list=$(get_group_list $xgroup)
> +		if [ -z "$list" ]; then
> +			echo "Group \"$xgroup\" is empty or not defined?"
> +			continue
> +		fi
> +
> +		trim_test_list $list
> +	done
> +
> +	# sort the list of tests into numeric order unless we're running tests
> +	# in the exact order specified
> +	if ! $_tl_exact_order; then
> +		if $_tl_randomise; then
> +			if type shuf >& /dev/null; then
> +				sorter="shuf"
> +			else
> +				sorter="awk -v seed=$RANDOM -f randomize.awk"
> +			fi
> +		else
> +			sorter="cat"
> +		fi
> +		_tl_tests=`sort -n $tmp.list | uniq | $sorter`
> +	else
> +		_tl_tests=`cat $tmp.list`
> +	fi
> +	rm -f $tmp.list
> +}
> +
> +_tl_expunge_test()
> +{
> +	local TEST_ID="$1"
> +
> +	for f in "${_tl_exclude_tests[@]}"; do
> +		# $f may contain traling spaces and comments
> +		local id_regex="^${TEST_ID}\b"
> +		if [[ "$f" =~ ${id_regex} ]]; then
> +			echo "       [expunged]"
> +			return 0
> +		fi
> +	done
> +	return 1
> +}
> +
> +_tl_setup_exclude_tests()
> +{
> +	local list="$1"
> +
> +	readarray -t -O "${#_tl_exclude_tests[@]}" _tl_exclude_tests < \
> +		<(echo "$list" | tr ', ' '\n\n')
> +}
> +
> +_tl_setup_exclude_file()
> +{
> +	local xfile="$1"
> +
> +	if [ -f $xfile ]; then
> +		readarray -t -O ${#_tl_exclude_tests[@]} _tl_exclude_tests < \
> +			<(sed "s/#.*$//" $xfile)
> +	fi
> +}
> +
> +_tl_setup_exclude_subdir()
> +{
> +	local xfile="$1"
> +	local d
> +	local f
> +
> +	[ -z "$xfile" ] && return
> +
> +	for d in $_SRC_GROUPS $FSTYP; do
> +		[ -f $_tl_src_dir/$d/$xfile ] || continue
> +		for f in `sed "s/#.*$//" $_tl_src_dir/$d/$xfile`; do
> +			_tl_exclude_tests+=($d/$f)
> +		done
> +	done
> +}
> +
> +_tl_setup_exclude_group()
> +{
> +	local xgroup="$1"
> +
> +	_XGROUP_LIST="$_XGROUP_LIST ${xgroup//,/ }"
> +}
> +
> +_tl_setup_group()
> +{
> +	local group="$1"
Minor: Since this function is public, do you think it is helpful to
have a quick sanitization that makes sure that $1 is not empty and
prints a message otherwise? Similar comment for
_tl_setup_exclude_group(). 

> +
> +	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
> +}
> +
> +_tl_setup_randomise()
> +{
> +	if $_tl_exact_order; then
> +		echo "Cannot specify -r and --exact-order."
> +		exit 1
Use _fatal?
> +	fi
> +	_tl_randomise=true
> +}
> +
> +_tl_setup_ordered()
> +{
> +	if $_tl_randomise; then
> +		echo "Cannnot specify --exact-order and -r."
Minor/typo: "Cannnot" -> "Cannot"
> +		exit 1
use _fatal?
> +	fi
> +	_tl_exact_order=true
> +}
> +
> +_tl_setup_cli()
Nit: This function mostly sets up test list related variables - so
maybe a name that suggests that. Maybe something like
_tl_setup_test_list_vars or _tl_setup_test_vars - just a suggestion, no
hard preferences here. 
> +{
> +	while [ $# -gt 0 ]; do
> +		case "$1" in
> +		-*)	echo "Arguments before tests, please!"
> +			status=1
> +			exit $status
use _fatal?
> +			;;
> +		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
> +			local list=$(cd $_tl_src_dir; echo $1)
> +			local t
> +
> +			for t in $list; do
> +				t=${t#$_tl_src_dir/}
> +				local test_dir=${t%%/*}
> +				local test_name=${t##*/}
> +				local group_file=$_tl_src_dir/$test_dir/group.list
> +
> +				if grep -Eq "^$test_name" $group_file; then
> +					# in group file ... OK
> +					echo $_tl_src_dir/$test_dir/$test_name \
> +						>> $_tl_file
> +					_tl_have_test_args=true
> +				else
> +					# oops
> +					echo "$t - unknown test, ignored"
Minor: Sometimes it has happened with me that I have added a new test
(with ./new ) but forgot to run make and hence the test wasn't getting
run and was reported to be an invalid test. Should we add a suggestion
to run make, in case we encounter an unidentified test name?
--NR
> +				fi
> +			done
> +			;;
> +		esac
> +
> +		shift
> +	done
> +
> +	if ! $_tl_have_test_args && [ -z "$_GROUP_LIST" ]; then
> +		# default group list is the auto group. If any other group or
> +		# test is specified, we use that instead.
> +		_GROUP_LIST="auto"
> +	fi
> +}



* Re: [PATCH 05/28] check: factor out test list building code
  2025-05-06 11:32   ` Nirjhar Roy (IBM)
@ 2025-05-21  3:55     ` Dave Chinner
  2025-05-26  6:48       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  3:55 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Tue, May 06, 2025 at 05:02:54PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Factor out all the test list parsing and building code to
> > common/test_list so that it can be used by both check and
> > check-parallel.
> > 
> > This also namespaces all the test list code to use _tl_ prefixes,
> > and adds wrappers to set up test list parsing parameters.
> > 
> > Note: there is still some future work to convert externally visible
> > parameters like SRC_DIR to the new namespace.
> I have gone through this patch. I did some basic testing too with
> ./check and I haven't found any obvious issues. There is some minor
> feedback that I have given below:

Can you trim the stuff you aren't commenting on out? It makes it
much easier to find what you are commenting on....

....
> > diff --git a/common/test_list b/common/test_list
> > new file mode 100644
> > index 000000000..2432be6f7
> > --- /dev/null
> > +++ b/common/test_list
> > @@ -0,0 +1,295 @@
> > +##/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0+
> > +# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
> > +# Copyright (c) 2024 Red Hat, Inc.  All Rights Reserved.
> > +#
> > +# Test list parsing and building functions
> > +#
> > +# Note: this file must stand alone and not be dependent on any other includes,
> We can include common/exit, right? That file only has a
> couple of exit-related functions (with no executable code) and no
> further dependencies, correct?

I wrote that before common/exit was really a thing.

As it is, I don't think we should be sourcing files from sourced
files. It hides all the dependencies, and results in the same file
being sourced multiple times in the same process context.

i.e. if the top level is doing:

. common/exit
.....
. common/test_list
.....

Then we don't need to source common/exit from common/test_list
because it already has been sourced.

That's what the comment is trying to say, and when I wrote it I was
specifically thinking about the mess that common/rc and
common/config was causing.
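
A minimal sketch of that include ordering, for illustration only. The
two files written below are hypothetical stand-ins for common/exit and
common/test_list, and _demo_fatal/_demo_tl_check_group are made-up
names, not functions from fstests:

```shell
#!/bin/bash
# Hypothetical stand-ins for common/exit and common/test_list, written
# to a scratch directory so the sketch is self-contained.
demo=$(mktemp -d)

cat > "$demo/exit" <<'EOF'
_demo_fatal() { echo "fatal: $*"; return 1; }
EOF

cat > "$demo/test_list" <<'EOF'
# No includes here: this file relies on the top level script having
# already sourced the file that defines _demo_fatal.
_demo_tl_check_group() {
	[ -n "$1" ] || _demo_fatal "empty group"
}
EOF

# The top level script lists every dependency once, in order.
. "$demo/exit"
. "$demo/test_list"

_demo_tl_check_group "" || echo "caller saw the failure"
rm -rf "$demo"
```

The dependency is visible in exactly one place (the top level script),
and neither file is sourced twice.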

> > +# most especially common/rc and common/config. This is because we have to
> > +# include this file before option parsing, whilst the rc/config includes need to
> > +# be included -after- option parsing.
> > +#
> > +# Any function or variable that is public should have a "_tl_" prefix.
> > +
> > +export _tl_src_dir="tests"
> Minor: There was a change[1] which converted some of global variables
> in small case to upper case. Should we follow the same convention here,
> i.e, _TL_SRC_DIR or maybe _tl_SRC_DIR?

I don't think so, I very much dislike the semi-random capitalisation
of namespace-less variables across fstests. I find code full of upper
case variables much more difficult to read and follow than properly
namespaced lower case variables.

That's the other thing that is important - lots of the global scope
variables sourced from common files are namespaceless and it makes
it easy for tests to accidentally step on such variables. Once
the variables are namespaced, there is no need for capitalisation
to indicate that it might be a global variable - the namespace
prefix tells you that as well as where it comes from.
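
As a rough illustration of the collision problem (a sketch, not code
from the series; demo_test_body is a hypothetical test function, and
the variable values are made up):

```shell
#!/bin/bash
# Two globals as a sourced common file might define them: one without
# a namespace, one with the _tl_ prefix.
list="generic/001 generic/002"
_tl_tests="generic/001 generic/002"

# A hypothetical test body that uses "list" as a scratch variable
# without declaring it local.
demo_test_body() {
	list="scratch files"
}

demo_test_body
echo "list=$list"
echo "_tl_tests=$_tl_tests"
```

After the call, $list holds "scratch files" while $_tl_tests is
untouched; a test is far less likely to pick a _tl_ prefixed name for
its own scratch variable by accident.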

> [1] 
> https://web.git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=ab459c67c5e0347ab4a5c28eadfe5ee4e3fd2f01

Yeah, that was a case of choosing your battles.

It wasn't a bug, but it was buried in a long series of bug fixes and
at the time people were shouting and being really obnoxious about
the check-parallel changes. There was no point in dying on an
unimportant hill by objecting to that change.

I still do think it was the wrong thing to do because it makes that
code be inconsistent with every other local file loop device
variable. Those devices are not used outside of common/metadump -
it is an internal implementation variable that the code was written
specifically to be hidden from callers wanting to manipulate and
test metadumps...

> > +_SRC_GROUPS="generic"
> > +_GROUP_LIST=
> > +_XGROUP_LIST=
> > +_tl_exact_order=false
> > +_tl_randomise=false
> > +_tl_have_test_args=false
> > +_tl_file="$tmp.test_list"
> > +_tl_exclude_tests=()
> > +_tl_tests=
> Similar comments for all the above lower cased global variables?

I didn't change the GROUP names because they are internal to the
test list implementation. I was largely moving the code and didn't
want to change anything that was internal that I didn't need to
touch. I'll have a look at cleaning that up.

.....
> > +	# Specified groups to include
> > +	# Note that the CLI processing adds a leading space to the first group
> > +	# parameter, so we have to catch that here checking for "all"
> > +	if ! $_tl_have_test_args && [ "$_GROUP_LIST" == " all" ]; then
> > +		# no test numbers, do everything
> > +		get_all_tests
> > +	else
> > +		for group in $_GROUP_LIST; do
> > +			list=$(get_group_list $group)
> > +			if [ -z "$list" ]; then
> > +				echo "Group \"$group\" is empty or not defined?"
> > +				exit 1
> _fatal "Group \"$group\" is empty or not defined?" ?

Remember that this was posted before those changes were made. This is
the sort of thing that gets fixed on rebase...

> > +_tl_setup_exclude_group()
> > +{
> > +	local xgroup="$1"
> > +
> > +	_XGROUP_LIST="$_XGROUP_LIST ${xgroup//,/ }"
> > +}
> > +
> > +_tl_setup_group()
> > +{
> > +	local group="$1"
> Minor: Since this function is public, do you think it is helpful to
> have a quick sanitization that makes sure that $1 is not empty and
> prints a message otherwise? Similar comment for
> _tl_setup_exclude_group(). 
> 
> > +
> > +	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
> > +}

I don't think it needs it. It should be checked by the caller
if necessary because it is the one doing option parsing to set up
the group.
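
A sketch of what a caller-side check could look like, assuming the
caller wants one at all. _tl_setup_group is copied from the patch;
demo_parse_args and its error message are hypothetical and only exist
for this example:

```shell
#!/bin/bash
_GROUP_LIST=

# As in the patch: append the comma-separated groups to the list.
_tl_setup_group()
{
	local group="$1"

	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
}

# Hypothetical option parser: the caller validates the option argument
# before handing it to the test list code.
demo_parse_args()
{
	while [ $# -gt 0 ]; do
		case "$1" in
		-g)	if [ -z "$2" ]; then
				echo "-g requires a group argument"
				return 1
			fi
			_tl_setup_group "$2"; shift ;;
		esac
		shift
	done
}

demo_parse_args -g quick,auto
echo "[$_GROUP_LIST]"
```

This prints "[ quick auto]". The leading space is the one that the
" all" comparison in _tl_prepare_test_list has to account for.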

> > +
> > +_tl_setup_cli()
> Nit: This function mostly sets up test list related variables - so
> maybe a name that suggests that. Maybe something like
> _tl_setup_test_list_vars or _tl_setup_test_vars - just a suggestion, no
> hard preferences here. 

Ok, I'll come up with a new name for it, but keep in mind that this
function is actually parsing arguments that have come directly from
the CLI.

> > +{
> > +	while [ $# -gt 0 ]; do
> > +		case "$1" in
> > +		-*)	echo "Arguments before tests, please!"
> > +			status=1
> > +			exit $status
> use _fatal?
> > +			;;
> > +		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
> > +			local list=$(cd $_tl_src_dir; echo $1)
> > +			local t
> > +
> > +			for t in $list; do
> > +				t=${t#$_tl_src_dir/}
> > +				local test_dir=${t%%/*}
> > +				local test_name=${t##*/}
> > +				local group_file=$_tl_src_dir/$test_dir/group.list
> > +
> > +				if grep -Eq "^$test_name" $group_file; then
> > +					# in group file ... OK
> > +					echo $_tl_src_dir/$test_dir/$test_name \
> > +						>> $_tl_file
> > +					_tl_have_test_args=true
> > +				else
> > +					# oops
> > +					echo "$t - unknown test, ignored"
> Minor: Sometimes it has happened with me that I have added a new test
> (with ./new ) but forgot to run make and hence the test wasn't getting
> run and was reported to be an invalid test. Should we add a suggestion
> to run make, in case we encounter an unidentified test name?

Not in this patch set. If you want these sorts of process reminders,
please send them as separate feature additions.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 05/28] check: factor out test list building code
  2025-05-21  3:55     ` Dave Chinner
@ 2025-05-26  6:48       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  6:48 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 09:25, Dave Chinner wrote:
> On Tue, May 06, 2025 at 05:02:54PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Factor out all the test list parsing and building code to
>>> common/test_list so that it can be used by both check and
>>> check-parallel.
>>>
>>> This also namespaces all the test list code to use _tl_ prefixes,
>>> and adds wrappers to set up test list parsing parameters.
>>>
>>> Note: there is still some future work to convert externally visible
>>> parameters like SRC_DIR to the new namespace.
>> I have gone through this patch. I did some basic testing too with
>> ./check and I haven't found any obvious issues. There is some minor
>> feedback that I have given below:
> Can you trim the stuff you aren't commenting on out? It makes it
> much easier to find what you are commenting on....
Yeah, sorry. I think there was some issue with the editor of my email 
client.
>
> ....
>>> diff --git a/common/test_list b/common/test_list
>>> new file mode 100644
>>> index 000000000..2432be6f7
>>> --- /dev/null
>>> +++ b/common/test_list
>>> @@ -0,0 +1,295 @@
>>> +##/bin/bash
>>> +# SPDX-License-Identifier: GPL-2.0+
>>> +# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
>>> +# Copyright (c) 2024 Red Hat, Inc.  All Rights Reserved.
>>> +#
>>> +# Test list parsing and building functions
>>> +#
>>> +# Note: this file must stand alone and not be dependent on any other includes,
>> We can include common/exit, right? That file only has a
>> couple of exit-related functions (with no executable code) and no
>> further dependencies, correct?
> I wrote that before common/exit was really a thing.
>
> As it is, I don't think we should be sourcing files from sourced
> files. It hides all the dependencies, and results in the same file
> being sourced multiple times in the same process context.
Yeah, right.
>
> i.e. if the top level is doing:
>
> . common/exit
> .....
> . common/test_list
> .....
>
> Then we don't need to source common/exit from common/test_list
> because it already has been sourced.
>
> That's what the comment is trying to say, and when I wrote it I was
> specifically thinking about the mess that common/rc and
> common/config was causing.
Okay.
>
>
>>> +# most especially common/rc and common/config. This is because we have to
>>> +# include this file before option parsing, whilst the rc/config includes need to
>>> +# be included -after- option parsing.
>>> +#
>>> +# Any function or variable that is public should have a "_tl_" prefix.
>>> +
>>> +export _tl_src_dir="tests"
>> Minor: There was a change[1] which converted some of global variables
>> in small case to upper case. Should we follow the same convention here,
>> i.e, _TL_SRC_DIR or maybe _tl_SRC_DIR?
> I don't think so, I very much dislike the semi-random capitalisation
> of namespace-less variables across fstests. I find code full of upper
> case variables much more difficult to read and follow than properly
> namespaced lower case variables.
>
> That's the other thing that is important - lots of the global scope
> variables sourced from common files are namespaceless and it makes
> it easy for tests to accidentally step on such variables. Once
Yeah, accidental re-use and modification of global variable can happen. 
I agree.
> the variables are namespaced, there is no need for capitalisation
> to indicate that it might be a global variable - the namespace
> prefix tells you that as well as where it comes from.
>
>> [1]
>> https://web.git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=ab459c67c5e0347ab4a5c28eadfe5ee4e3fd2f01
> Yeah, that was a case of choosing your battles.
>
> It wasn't a bug, but it was buried in a long series of bug fixes and
> at the time people were shouting and being really obnoxious about
> the check-parallel changes. There was no point in dying on an
> unimportant hill by objecting to that change.
>
> I still do think it was the wrong thing to do because it makes that
> code be inconsistent with every other local file loop device
> variable. Those devices are not used outside of common/metadump -
> it is an internal implementation variable that the code was written
> specifically to be hidden from callers wanting to manipulate and
> test metadumps...

Okay, got it.

>
>>> +_SRC_GROUPS="generic"
>>> +_GROUP_LIST=
>>> +_XGROUP_LIST=
>>> +_tl_exact_order=false
>>> +_tl_randomise=false
>>> +_tl_have_test_args=false
>>> +_tl_file="$tmp.test_list"
>>> +_tl_exclude_tests=()
>>> +_tl_tests=
>> Similar comments for all the above lower cased global variables?
> I didn't change the GROUP names because they are internal to the
> test list implementation. I was largely moving the code and didn't
> want to change anything that was internal that I didn't need to
> touch. I'll have a look at cleaning that up.
Okay.
>
> .....
>>> +	# Specified groups to include
>>> +	# Note that the CLI processing adds a leading space to the first group
>>> +	# parameter, so we have to catch that here checking for "all"
>>> +	if ! $_tl_have_test_args && [ "$_GROUP_LIST" == " all" ]; then
>>> +		# no test numbers, do everything
>>> +		get_all_tests
>>> +	else
>>> +		for group in $_GROUP_LIST; do
>>> +			list=$(get_group_list $group)
>>> +			if [ -z "$list" ]; then
>>> +				echo "Group \"$group\" is empty or not defined?"
>>> +				exit 1
>> _fatal "Group \"$group\" is empty or not defined?" ?
> Remember that this was posted before those changes were made. This is
> the sort of thing that gets fixed on rebase...
Yeah, right.
>
>>> +_tl_setup_exclude_group()
>>> +{
>>> +	local xgroup="$1"
>>> +
>>> +	_XGROUP_LIST="$_XGROUP_LIST ${xgroup//,/ }"
>>> +}
>>> +
>>> +_tl_setup_group()
>>> +{
>>> +	local group="$1"
>> Minor: Since this function is public, do you think it is helpful to
>> have a quick sanitization that makes sure that $1 is not empty and
>> prints a message otherwise? Similar comment for
>> _tl_setup_exclude_group().
>>
>>> +
>>> +	_GROUP_LIST="$_GROUP_LIST ${group//,/ }"
>>> +}
> I don't think it needs it. It should be checked by the caller
> if necessary because it is the one doing option parsing to set up
> the group.
Okay.
>
>>> +
>>> +_tl_setup_cli()
>> Nit: This function mostly sets up test list related variables - so
>> maybe a name that suggests that. Maybe something like
>> _tl_setup_test_list_vars or _tl_setup_test_vars - just a suggestion, no
>> hard preferences here.
> Ok, I'll come up with a new name for it, but keep in mind that this
> function is actually parsing arguments that have come directly from
> the CLI.
Yes, I have got that.
>>> +{
>>> +	while [ $# -gt 0 ]; do
>>> +		case "$1" in
>>> +		-*)	echo "Arguments before tests, please!"
>>> +			status=1
>>> +			exit $status
>> use _fatal?
>>> +			;;
>>> +		*)	# Expand test pattern (e.g. xfs/???, *fs/001)
>>> +			local list=$(cd $_tl_src_dir; echo $1)
>>> +			local t
>>> +
>>> +			for t in $list; do
>>> +				t=${t#$_tl_src_dir/}
>>> +				local test_dir=${t%%/*}
>>> +				local test_name=${t##*/}
>>> +				local group_file=$_tl_src_dir/$test_dir/group.list
>>> +
>>> +				if grep -Eq "^$test_name" $group_file; then
>>> +					# in group file ... OK
>>> +					echo $_tl_src_dir/$test_dir/$test_name \
>>> +						>> $_tl_file
>>> +					_tl_have_test_args=true
>>> +				else
>>> +					# oops
>>> +					echo "$t - unknown test, ignored"
>> Minor: Sometimes it has happened with me that I have added a new test
>> (with ./new ) but forgot to run make and hence the test wasn't getting
>> run and was reported to be an invalid test. Should we add a suggestion
>> to run make, in case we encounter an unidentified test name?
> Not in this patch set. If you want these sorts of process reminders,
> please send them as separate feature additions.

Yeah.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 06/28] check-parallel: use common group list parsing code
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (4 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 05/28] check: factor out test list building code Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-06 15:56   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 07/28] check-parallel: adjust concurrency according to CPU count Dave Chinner
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Build the test list directly from command line prompts, rather
than hard coding the tests and using the check infrastructure to
filter that list.

We still pass exact test lists to check to execute the tests that
each runner needs to execute, but all other test list commands
are no longer passed to check.

As a result of this change, check-parallel no longer passes unknown
CLI parameters through to the internal check invocations. At this
point, the only non test-list related option is config file section
selection; more of the check options will be brought across as
needed in future patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check            |   6 +-
 check-parallel   | 156 ++++++++++++++++++++++++++++++++++++++++-------
 common/test_list |   7 +++
 3 files changed, 143 insertions(+), 26 deletions(-)

diff --git a/check b/check
index 900ea2ba4..0b489cb4b 100755
--- a/check
+++ b/check
@@ -43,11 +43,9 @@ timestamp=${TIMESTAMP:=false}
 
 rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 
-# We need to include the test list processing first as argument parsing
-# requires test list parsing and setup.
-. ./common/test_list
-
 . ./common/exit
+. ./common/test_names
+. ./common/test_list
 
 usage()
 {
diff --git a/check-parallel b/check-parallel
index d68d76e55..cb5d6aedf 100755
--- a/check-parallel
+++ b/check-parallel
@@ -9,18 +9,114 @@
 # for them and runs the test in the background. When it completes, it tears down
 # the loop devices.
 
-export SRC_DIR="tests"
-basedir=$1
-shift
-check_args="$*"
+basedir=""
 runners=64
 runner_list=()
 runtimes=()
+show_test_list=
+run_section=""
 
+tmp=/tmp/check-parallel.$$
 
-# tests in auto group
-test_list=$(awk '/^[0-9].*auto/ { print "generic/" $1 }' tests/generic/group.list)
-test_list+=$(awk '/^[0-9].*auto/ { print "xfs/" $1 }' tests/xfs/group.list)
+export FSTYP=xfs
+
+. ./common/exit
+. ./common/test_names
+. ./common/test_list
+
+usage()
+{
+    echo "Usage: $0 [options] [testlist]"'
+
+check options
+    -D <dir>		Directory to run in
+    -n			Output test list, do not run tests
+    -r			randomize test order
+    --exact-order	run tests in the exact order specified
+    -s section		run only specified section from config file
+
+testlist options
+    -g group[,group...]	include tests from these groups
+    -x group[,group...]	exclude tests from these groups
+    -X exclude_file	exclude individual tests
+    -e testlist         exclude a specific list of tests
+    -E external_file	exclude individual tests
+    [testlist]		include tests matching names in testlist
+
+testlist argument is a list of tests in the form of <test dir>/<test name>.
+
+<test dir> is a directory under tests that contains a group file,
+with a list of the names of the tests in that directory.
+
+<test name> may be either a specific test file name (e.g. xfs/001) or
+a test file name match pattern (e.g. xfs/*).
+
+group argument is either a name of a tests group to collect from all
+the test dirs (e.g. quick) or a name of a tests group to collect from
+a specific tests dir in the form of <test dir>/<group name> (e.g. xfs/quick).
+If you want to run all the tests in the test suite, use "-g all" to specify all
+groups.
+
+exclude_file argument refers to a name of a file inside each test directory.
+for every test dir where this file is found, the listed test names are
+excluded from the list of tests to run from that test dir.
+
+external_file argument is a path to a single file containing a list of tests
+to exclude in the form of <test dir>/<test name>.
+
+examples:
+ check-parallel -D /mnt xfs/001
+ check-parallel -D /mnt -g quick
+ check-parallel -D /mnt -g xfs/quick
+ check-parallel -D /mnt -x stress xfs/*
+ check-parallel -D /mnt -X .exclude -g auto
+ check-parallel -D /mnt -E ~/.xfstests.exclude
+'
+	    exit 1
+}
+
+# Process command arguments first.
+while [ $# -gt 0 ]; do
+	case "$1" in
+	-\? | -h | --help) usage ;;
+
+	-D)	basedir=$2; shift ;;
+	-g)	_tl_setup_group $2 ; shift ;;
+	-e)	_tl_setup_exclude_tests $2 ; shift ;;
+	-E)	_tl_setup_exclude_file $2 ; shift ;;
+	-x)	_tl_setup_exclude_group $2; shift ;;
+	-X)	_tl_setup_exclude_subdir $2; shift ;;
+	-r)	_tl_setup_randomise ;;
+	--exact-order) _tl_setup_ordered ;;
+	-n)	show_test_list="yes" ;;
+
+	-s)	run_section="$run_section -s $2"; shift ;;
+
+	-*)	usage ;;
+	*)	# not an argument, we've got tests now.
+		_tl_setup_cli $*
+	esac
+
+	# if we've found a test specification, then break out of the processing
+	# loop before we shift the arguments so that this is the first argument
+	# that we process in the test arg loop below.
+	if $_tl_have_test_args; then
+		break;
+	fi
+
+	shift
+done
+
+if [ ! -d "$basedir" ]; then
+	echo "Invalid basedir specification"
+	usage
+fi
+if [ -d "$basedir/runner-0/" ]; then
+	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
+fi
+
+_tl_prepare_test_list
+_tl_strip_test_list
 
 # grab all previously run tests and order them from highest runtime to lowest
 # We are going to try to run the longer tests first, hopefully so we can avoid
@@ -30,25 +126,23 @@ test_list+=$(awk '/^[0-9].*auto/ { print "xfs/" $1 }' tests/xfs/group.list)
 #
 # If we have tests in the test list that don't have runtimes recorded, then
 # append them to be run last.
-
-build_runner_list()
+time_order_test_list()
 {
 	local runtimes
 	local run_list=()
-	local prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 
 	runtimes=$(cat $basedir/*/$prev_results/check.time | sort -k 2 -nr | cut -d " " -f 1)
 
 	# Iterate the timed list first. For every timed list entry that
 	# is found in the test_list, add it to the local runner list.
 	local -a _list=( $runtimes )
-	local -a _tlist=( $test_list )
+	local -a _tlist=( $_tl_tests )
 	local rx=0
 	local ix
 	local jx
 	#set -x
 	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
-		echo $test_list | grep -q ${_list[$ix]}
+		echo $_tl_tests | grep -q ${_list[$ix]}
 		if [ $? == 0 ]; then
 			# add the test to the new run list and remove
 			# it from the remaining test list.
@@ -60,24 +154,27 @@ build_runner_list()
 
 	# The final test list is all the time ordered tests followed by
 	# all the tests we didn't find time records for.
-	test_list="${run_list[*]} ${_tlist[*]}"
+	_tl_tests="${run_list[*]} ${_tlist[*]}"
 }
 
-if [ -f $basedir/runner-0/results/check.time ]; then
-	build_runner_list
+if ! $_tl_randomise && ! $_tl_exact_order; then
+	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
+		time_order_test_list
+	fi
 fi
 
 # split the list amongst N runners
-
 split_runner_list()
 {
 	local ix
 	local rx
-	local -a _list=( $test_list )
+	local -a _list=( $_tl_tests )
 	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
 		seq="${_list[$ix]}"
 		rx=$((ix % $runners))
-		runner_list[$rx]+="${_list[$ix]} "
+		if ! _tl_expunge_test $seq; then
+			runner_list[$rx]+="${_list[$ix]} "
+		fi
 		#echo $seq
 	done
 }
@@ -137,7 +234,7 @@ runner_go()
 
 	# Run the tests in it's own mount namespace, as per the comment below
 	# that precedes making the basedir a private mount.
-	./src/nsexec -m ./check $check_args -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
+	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
 
 	wait
 	sleep 1
@@ -165,6 +262,13 @@ cleanup()
 
 trap "cleanup; exit" HUP INT QUIT TERM
 
+split_runner_list
+if [ -n "$show_test_list" ]; then
+	echo Time ordered test list:
+	echo $_tl_tests
+	echo
+fi
+
 
 # Each parallel test runner needs to only see it's own mount points. If we
 # leave the basedir as shared, then all tests see all mounts and then we get
@@ -178,15 +282,23 @@ trap "cleanup; exit" HUP INT QUIT TERM
 # in it's own mount namespace so that they cannot see mounts that other tests
 # are performing.
 mount --make-private $basedir
-split_runner_list
+
 now=`date +%Y-%m-%d-%H:%M:%S`
 for ((i = 0; i < $runners; i++)); do
 
-	runner_go $i $now &
+	if [ -n "$show_test_list" ]; then
+		echo "Runner $i: ${runner_list[$i]}"
+	else
+		runner_go $i $now &
+	fi
 
 done;
 wait
 
+if [ -n "$show_test_list" ]; then
+	exit 0
+fi
+
 echo -n "Tests run: "
 grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
 
@@ -198,7 +310,7 @@ grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\)
 echo
 
 echo Ten slowest tests - runtime in seconds:
-cat $basedir/*/results/check.time | sort -k 2 -nr | head -10
+cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
 
 echo
 echo Cleanup on Aisle 5?
diff --git a/common/test_list b/common/test_list
index 2432be6f7..2b3ae9fbf 100644
--- a/common/test_list
+++ b/common/test_list
@@ -24,6 +24,7 @@ _tl_file="$tmp.test_list"
 _tl_exclude_tests=()
 _tl_tests=
 
+# strip 'tests/' prefix from the provided test name
 _tl_strip_src_dir()
 {
 	local test="$1"
@@ -31,6 +32,12 @@ _tl_strip_src_dir()
 	echo ${test#$_tl_src_dir/}
 }
 
+# strip 'tests/' prefix from all the tests in the test list
+_tl_strip_test_list()
+{
+	_tl_tests=$(echo $_tl_tests | sed -e "s/$_tl_src_dir\///g")
+}
+
 get_sub_group_list()
 {
 	local d=$1
-- 
2.45.2



* Re: [PATCH 06/28] check-parallel: use common group list parsing code
  2025-04-17  3:00 ` [PATCH 06/28] check-parallel: use common group list parsing code Dave Chinner
@ 2025-05-06 15:56   ` Nirjhar Roy (IBM)
  2025-05-21  4:13     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-06 15:56 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Build the test list directly from command line prompts, rather
> than hard coding the tests and using the check infrastructure to
> filter that list.
> 
> We still pass exact test lists to check to execute the tests that
> each runner needs to execute, but all other test list commands
> are no longer passed to check.
> 
> As a result of this change, check-parallel no longer passes unknown
> CLI parameters through to the internal check invocations. At this
> point, the only non test-list related option is config file section
> selection; more of the check options will be brought across as
> needed in future patches.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check            |   6 +-
>  check-parallel   | 156 ++++++++++++++++++++++++++++++++++++++++-------
>  common/test_list |   7 +++
>  3 files changed, 143 insertions(+), 26 deletions(-)
> 
> diff --git a/check b/check
> index 900ea2ba4..0b489cb4b 100755
> --- a/check
> +++ b/check
> @@ -43,11 +43,9 @@ timestamp=${TIMESTAMP:=false}
>  
>  rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
>  
> -# We need to include the test list processing first as argument parsing
> -# requires test list parsing and setup.
> -. ./common/test_list
> -
>  . ./common/exit
> +. ./common/test_names
> +. ./common/test_list
>  
>  usage()
>  {
> diff --git a/check-parallel b/check-parallel
> index d68d76e55..cb5d6aedf 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -9,18 +9,114 @@
>  # for them and runs the test in the background. When it completes, it tears down
>  # the loop devices.
>  
> -export SRC_DIR="tests"
> -basedir=$1
> -shift
> -check_args="$*"
> +basedir=""
>  runners=64
>  runner_list=()
>  runtimes=()
> +show_test_list=
> +run_section=""
>  
> +tmp=/tmp/check-parallel.$$
>  
> -# tests in auto group
> -test_list=$(awk '/^[0-9].*auto/ { print "generic/" $1 }' tests/generic/group.list)
> -test_list+=$(awk '/^[0-9].*auto/ { print "xfs/" $1 }' tests/xfs/group.list)
> +export FSTYP=xfs
> +
> +. ./common/exit
> +. ./common/test_names
> +. ./common/test_list
> +
> +usage()
> +{
> +    echo "Usage: $0 [options] [testlist]"'
> +
> +check options
> +    -D <dir>		Directory to run in
> +    -n			Output test list, do not run tests
> +    -r			randomize test order
> +    --exact-order	run tests in the exact order specified
> +    -s section		run only specified section from config file
> +
> +testlist options
> +    -g group[,group...]	include tests from these groups
> +    -x group[,group...]	exclude tests from these groups
> +    -X exclude_file	exclude individual tests
> +    -e testlist         exclude a specific list of tests
> +    -E external_file	exclude individual tests
> +    [testlist]		include tests matching names in testlist
> +
> +testlist argument is a list of tests in the form of <test dir>/<test name>.
> +
> +<test dir> is a directory under tests that contains a group file,
> +with a list of the names of the tests in that directory.
> +
> +<test name> may be either a specific test file name (e.g. xfs/001) or
> +a test file name match pattern (e.g. xfs/*).
> +
> +group argument is either a name of a tests group to collect from all
> +the test dirs (e.g. quick) or a name of a tests group to collect from
> +a specific tests dir in the form of <test dir>/<group name> (e.g. xfs/quick).
> +If you want to run all the tests in the test suite, use "-g all" to specify all
> +groups.
> +
> +exclude_file argument refers to a name of a file inside each test directory.
> +for every test dir where this file is found, the listed test names are
> +excluded from the list of tests to run from that test dir.
> +
> +external_file argument is a path to a single file containing a list of tests
> +to exclude in the form of <test dir>/<test name>.
> +
> +examples:
> + check-parallel -D /mnt xfs/001
> + check-parallel -D /mnt -g quick
> + check-parallel -D /mnt -g xfs/quick
> + check-parallel -D /mnt -x stress xfs/*
> + check-parallel -D /mnt -X .exclude -g auto
> + check-parallel -D /mnt -E ~/.xfstests.exclude
> +'
> +	    exit 1
_fatal ?
> +}
> +
> +# Process command arguments first.
> +while [ $# -gt 0 ]; do
> +	case "$1" in
> +	-\? | -h | --help) usage ;;
> +
> +	-D)	basedir=$2; shift ;;
> +	-g)	_tl_setup_group $2 ; shift ;;
> +	-e)	_tl_setup_exclude_tests $2 ; shift ;;
> +	-E)	_tl_setup_exclude_file $2 ; shift ;;
> +	-x)	_tl_setup_exclude_group $2; shift ;;
> +	-X)	_tl_setup_exclude_subdir $2; shift ;;
> +	-r)	_tl_setup_randomise ;;
> +	--exact-order) _tl_setup_ordered ;;
> +	-n)	show_test_list="yes" ;;
> +
> +	-s)	run_section="$run_section -s $2"; shift ;;
> +
> +	-*)	usage ;;
> +	*)	# not an argument, we've got tests now.
> +		_tl_setup_cli $*
> +	esac
> +
> +	# if we've found a test specification, then break out of the processing
> +	# loop before we shift the arguments so that this is the first argument
> +	# that we process in the test arg loop below.
> +	if $_tl_have_test_args; then
> +		break;
> +	fi
> +
> +	shift
> +done
> +
> +if [ ! -d "$basedir" ]; then
> +	echo "Invalid basedir specification"
> +	usage
> +fi
> +if [ -d "$basedir/runner-0/" ]; then
> +	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
> +fi
> +
> +_tl_prepare_test_list
> +_tl_strip_test_list
>  
>  # grab all previously run tests and order them from highest runtime to lowest
>  # We are going to try to run the longer tests first, hopefully so we can avoid
> @@ -30,25 +126,23 @@ test_list+=$(awk '/^[0-9].*auto/ { print "xfs/" $1 }' tests/xfs/group.list)
>  #
>  # If we have tests in the test list that don't have runtimes recorded, then
>  # append them to be run last.
> -
> -build_runner_list()
> +time_order_test_list()
>  {
>  	local runtimes
>  	local run_list=()
> -	local prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  
>  	runtimes=$(cat $basedir/*/$prev_results/check.time | sort -k 2 -nr | cut -d " " -f 1)
>  
>  	# Iterate the timed list first. For every timed list entry that
>  	# is found in the test_list, add it to the local runner list.
>  	local -a _list=( $runtimes )
> -	local -a _tlist=( $test_list )
> +	local -a _tlist=( $_tl_tests )
>  	local rx=0
>  	local ix
>  	local jx
>  	#set -x
>  	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
> -		echo $test_list | grep -q ${_list[$ix]}
> +		echo $_tl_tests | grep -q ${_list[$ix]}
>  		if [ $? == 0 ]; then
>  			# add the test to the new run list and remove
>  			# it from the remaining test list.
> @@ -60,24 +154,27 @@ build_runner_list()
>  
>  	# The final test list is all the time ordered tests followed by
>  	# all the tests we didn't find time records for.
> -	test_list="${run_list[*]} ${_tlist[*]}"
> +	_tl_tests="${run_list[*]} ${_tlist[*]}"
>  }
>  
> -if [ -f $basedir/runner-0/results/check.time ]; then
> -	build_runner_list
> +if ! $_tl_randomise && ! $_tl_exact_order; then
> +	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
> +		time_order_test_list
> +	fi
>  fi
>  
>  # split the list amongst N runners
> -
>  split_runner_list()
>  {
>  	local ix
>  	local rx
> -	local -a _list=( $test_list )
> +	local -a _list=( $_tl_tests )
>  	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
>  		seq="${_list[$ix]}"
>  		rx=$((ix % $runners))
> -		runner_list[$rx]+="${_list[$ix]} "
> +		if ! _tl_expunge_test $seq; then
> +			runner_list[$rx]+="${_list[$ix]} "
> +		fi
>  		#echo $seq
>  	done
>  }
> @@ -137,7 +234,7 @@ runner_go()
>  
>  	# Run the tests in it's own mount namespace, as per the comment below
>  	# that precedes making the basedir a private mount.
> -	./src/nsexec -m ./check $check_args -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
> +	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
>  
>  	wait
>  	sleep 1
> @@ -165,6 +262,13 @@ cleanup()
>  
>  trap "cleanup; exit" HUP INT QUIT TERM
>  
> +split_runner_list
> +if [ -n "$show_test_list" ]; then
> +	echo Time ordered test list:
> +	echo $_tl_tests
> +	echo
> +fi
> +
>  
>  # Each parallel test runner needs to only see it's own mount points. If we
>  # leave the basedir as shared, then all tests see all mounts and then we get
> @@ -178,15 +282,23 @@ trap "cleanup; exit" HUP INT QUIT TERM
>  # in it's own mount namespace so that they cannot see mounts that other tests
>  # are performing.
>  mount --make-private $basedir
> -split_runner_list
> +
>  now=`date +%Y-%m-%d-%H:%M:%S`
>  for ((i = 0; i < $runners; i++)); do
>  
> -	runner_go $i $now &
> +	if [ -n "$show_test_list" ]; then
> +		echo "Runner $i: ${runner_list[$i]}"
> +	else
> +		runner_go $i $now &
> +	fi
>  
>  done;
>  wait
>  
> +if [ -n "$show_test_list" ]; then
> +	exit 0
> +fi
> +
>  echo -n "Tests run: "
>  grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
>  
> @@ -198,7 +310,7 @@ grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\)
>  echo
>  
>  echo Ten slowest tests - runtime in seconds:
> -cat $basedir/*/results/check.time | sort -k 2 -nr | head -10
> +cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
Maybe have this default to top 10 list but make it parameterized, just
like we do it for DIFF_LENGTH (# number of diff lines from a failed
test, 0 for whole output)? Do you think that would be useful?
>  
>  echo
>  echo Cleanup on Aisle 5?
> diff --git a/common/test_list b/common/test_list
> index 2432be6f7..2b3ae9fbf 100644
> --- a/common/test_list
> +++ b/common/test_list
> @@ -24,6 +24,7 @@ _tl_file="$tmp.test_list"
>  _tl_exclude_tests=()
>  _tl_tests=
>  
> +# strip 'tests/' prefix from the provided test name
>  _tl_strip_src_dir()
>  {
>  	local test="$1"
> @@ -31,6 +32,12 @@ _tl_strip_src_dir()
>  	echo ${test#$_tl_src_dir/}
>  }
>  
> +# strip 'tests/' prefix from all the tests in the test list
> +_tl_strip_test_list()
> +{
> +	_tl_tests=$(echo $_tl_tests | sed -e "s/$_tl_src_dir\///g")
> +}
> +
>  get_sub_group_list()
>  {
>  	local d=$1



* Re: [PATCH 06/28] check-parallel: use common group list parsing code
  2025-05-06 15:56   ` Nirjhar Roy (IBM)
@ 2025-05-21  4:13     ` Dave Chinner
  2025-05-26  6:58       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  4:13 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Tue, May 06, 2025 at 09:26:37PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Build the test list directly from command line prompts, rather
> > than hard coding the tests and using the check infrastructure to
> > filter that list.
> > 
> > We still pass exact test lists to check to execute the tests that
> > each runner needs to execute, but all other test list commands
> > are no longer passed to check.
> > 
> > As a result of this change, check-parallel no longer passes unknown
> > CLI parameters through to the internal check invocations. At this
> > point, the only non test-list related option is config file section
> > selection; more of the check options will be brought across as
> > needed in future patches.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>

.....
> > +usage()
> > +{
> > +    echo "Usage: $0 [options] [testlist]"'
> > +
> > +check options
> > +    -D <dir>		Directory to run in
> > +    -n			Output test list, do not run tests
> > +    -r			randomize test order
> > +    --exact-order	run tests in the exact order specified
> > +    -s section		run only specified section from config file
> > +
> > +testlist options
> > +    -g group[,group...]	include tests from these groups
> > +    -x group[,group...]	exclude tests from these groups
> > +    -X exclude_file	exclude individual tests
> > +    -e testlist         exclude a specific list of tests
> > +    -E external_file	exclude individual tests
> > +    [testlist]		include tests matching names in testlist
> > +
> > +testlist argument is a list of tests in the form of <test dir>/<test name>.
> > +
> > +<test dir> is a directory under tests that contains a group file,
> > +with a list of the names of the tests in that directory.
> > +
> > +<test name> may be either a specific test file name (e.g. xfs/001) or
> > +a test file name match pattern (e.g. xfs/*).
> > +
> > +group argument is either a name of a tests group to collect from all
> > +the test dirs (e.g. quick) or a name of a tests group to collect from
> > +a specific tests dir in the form of <test dir>/<group name> (e.g. xfs/quick).
> > +If you want to run all the tests in the test suite, use "-g all" to specify all
> > +groups.
> > +
> > +exclude_file argument refers to a name of a file inside each test directory.
> > +for every test dir where this file is found, the listed test names are
> > +excluded from the list of tests to run from that test dir.
> > +
> > +external_file argument is a path to a single file containing a list of tests
> > +to exclude in the form of <test dir>/<test name>.
> > +
> > +examples:
> > + check-parallel -D /mnt xfs/001
> > + check-parallel -D /mnt -g quick
> > + check-parallel -D /mnt -g xfs/quick
> > + check-parallel -D /mnt -x stress xfs/*
> > + check-parallel -D /mnt -X .exclude -g auto
> > + check-parallel -D /mnt -E ~/.xfstests.exclude
> > +'
> > +	    exit 1
> _fatal ?

This is a common usage() function pattern in shell scripts. It's
clear that it exits with a non-zero error status when someone gets a
command line parameter wrong.

Also, check-parallel does not have a status variable that needs to
be set to propagate exit values, nor does it have any traps set up
at this point to need exit value propagation.

Hence calling exit directly is really the right thing to do here.

> > @@ -198,7 +310,7 @@ grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\)
> >  echo
> >  
> >  echo Ten slowest tests - runtime in seconds:
> > -cat $basedir/*/results/check.time | sort -k 2 -nr | head -10
> > +cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
> Maybe have this default to top 10 list but make it parameterized, just
> like we do it for DIFF_LENGTH (# number of diff lines from a failed
> test, 0 for whole output)? Do you think that would be useful?

Not at this point in time. This is largely initial debug
information; it tells me which test(s) are consuming all the time,
and how close the overall runtime is to being bound by single test
runtime.  I only put this here because I was running that command on
the CLI after a test run frequently, and generally I was only
looking at the top 2 or 3 tests in the list....

e.g. if the whole auto group test run takes 9 minutes, and the
longest test takes 8m30s, then it is clear that total runtime is
bound by the slowest test. Make that test run faster, and
overall auto group test time will run faster, too.

This tells me what tests I need to focus on for runtime optimisation
(e.g. sync -> syncfs conversions). Once all the low hanging fruit is
gone, it won't tell us anything useful and I'll remove it...
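
The 9 minute example above can be sketched in shell. This is a minimal
illustration only: slowest_share is a hypothetical helper that is not part
of fstests, the input is assumed to be "<test> <seconds>" lines as implied
by the sort -k 2 -nr pipeline in the patch, and the test names and runtimes
are made-up sample data.

```shell
# Hypothetical helper, not part of fstests: read "<test> <seconds>" lines
# on stdin and report the slowest test as a share of the overall wall time.
slowest_share()
{
	local wall_secs=$1

	# highest runtime first, keep only the top entry
	sort -k 2 -nr | head -1 | awk -v wall="$wall_secs" \
		'{ printf "%s %ds %.0f%%\n", $1, $2, 100 * $2 / wall }'
}

# Made-up sample: a 9 minute (540s) run whose longest test took 8m30s (510s).
printf 'generic/001 12\nxfs/300 510\ngeneric/475 45\n' | slowest_share 540
```

A share close to 100% indicates the overall runtime is bound by that one
test, so adding runners is unlikely to shorten the run any further.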

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 06/28] check-parallel: use common group list parsing code
  2025-05-21  4:13     ` Dave Chinner
@ 2025-05-26  6:58       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  6:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 09:43, Dave Chinner wrote:
> On Tue, May 06, 2025 at 09:26:37PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Build the test list directly from command line prompts, rather
>>> than hard coding the tests and using the check infrastructure to
>>> filter that list.
>>>
>>> We still pass exact test lists to check to execute the tests that
>>> each runner needs to execute, but all other test list commands
>>> are no longer passed to check.
>>>
>>> As a result of this change, check-parallel no longer passes unknown
>>> CLI parameters through to the internal check invocations. At this
>>> point, the only non test-list related option is config file section
>>> selection; more of the check options will be brought across as
>>> needed in future patches.
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> .....
>>> +usage()
>>> +{
>>> +    echo "Usage: $0 [options] [testlist]"'
>>> +
>>> +check options
>>> +    -D <dir>		Directory to run in
>>> +    -n			Output test list, do not run tests
>>> +    -r			randomize test order
>>> +    --exact-order	run tests in the exact order specified
>>> +    -s section		run only specified section from config file
>>> +
>>> +testlist options
>>> +    -g group[,group...]	include tests from these groups
>>> +    -x group[,group...]	exclude tests from these groups
>>> +    -X exclude_file	exclude individual tests
>>> +    -e testlist         exclude a specific list of tests
>>> +    -E external_file	exclude individual tests
>>> +    [testlist]		include tests matching names in testlist
>>> +
>>> +testlist argument is a list of tests in the form of <test dir>/<test name>.
>>> +
>>> +<test dir> is a directory under tests that contains a group file,
>>> +with a list of the names of the tests in that directory.
>>> +
>>> +<test name> may be either a specific test file name (e.g. xfs/001) or
>>> +a test file name match pattern (e.g. xfs/*).
>>> +
>>> +group argument is either a name of a tests group to collect from all
>>> +the test dirs (e.g. quick) or a name of a tests group to collect from
>>> +a specific tests dir in the form of <test dir>/<group name> (e.g. xfs/quick).
>>> +If you want to run all the tests in the test suite, use "-g all" to specify all
>>> +groups.
>>> +
>>> +exclude_file argument refers to a name of a file inside each test directory.
>>> +for every test dir where this file is found, the listed test names are
>>> +excluded from the list of tests to run from that test dir.
>>> +
>>> +external_file argument is a path to a single file containing a list of tests
>>> +to exclude in the form of <test dir>/<test name>.
>>> +
>>> +examples:
>>> + check-parallel -D /mnt xfs/001
>>> + check-parallel -D /mnt -g quick
>>> + check-parallel -D /mnt -g xfs/quick
>>> + check-parallel -D /mnt -x stress xfs/*
>>> + check-parallel -D /mnt -X .exclude -g auto
>>> + check-parallel -D /mnt -E ~/.xfstests.exclude
>>> +'
>>> +	    exit 1
>> _fatal ?
> This is a common usage() function pattern in shell scripts. It's
> clear that it exits with a non-zero error status when someone gets a
> command line parameter wrong.
>
> Also, check-parallel does not have a status variable that needs to
> be set to propagate exit values, nor does it have any traps set up
> at this point to need exit value propagation.
Oh, right. Yeah, check-parallel directly will exit from here, so yes 
exit 1 should be fine. Thanks.
>
> Hence calling exit directly is really the right thing to do here.
>
>>> @@ -198,7 +310,7 @@ grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\)
>>>   echo
>>>   
>>>   echo Ten slowest tests - runtime in seconds:
>>> -cat $basedir/*/results/check.time | sort -k 2 -nr | head -10
>>> +cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
>> Maybe have this default to top 10 list but make it parameterized, just
>> like we do it for DIFF_LENGTH (# number of diff lines from a failed
>> test, 0 for whole output)? Do you think that would be useful?
> Not at this point in time. This is largely initial debug
> information; it tells me which test(s) are consuming all the time,
> and how close the overall runtime is to being bound by single test
> runtime.  I only put this here because I was running that command on
> the CLI after a test run frequently, and generally I was only
> looking at the top 2 or 3 tests in the list....
>
> e.g. if the whole auto group test run takes 9 minutes, and the
> longest test takes 8m30s, then it is clear that total runtime is
> bound by the slowest test. Make that test run faster, and
> overall auto group test time will run faster, too.
>
> This tells me what tests I need to focus on for runtime optimisation
> (e.g. sync -> syncfs conversions). Once all the low hanging fruit is
> gone, it won't tell us anything useful and I'll remove it...

Ah, okay. Now I get the reason for introducing the information of the 
top slow running tests. However, we can still keep this feature - in 
case someone writes a new test and uses some functionality or command 
(like syncfs instead of sync) and slows it down, although that can be 
caught in code reviews too. Anyway, no hard preferences here. But thanks 
for pointing out the reason for introducing the top slow running tests 
information.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 07/28] check-parallel: adjust concurrency according to CPU count
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (5 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 06/28] check-parallel: use common group list parsing code Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-07  6:45   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 08/28] check-parallel: add logwrite device support Dave Chinner
                   ` (20 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Concurrency is currently hard coded at 64 worker threads. This is
too many for small CPU count machines; the idea is to create a
sustained load of roughly one test per CPU as they are mostly single
threaded/single process tests. The number "64" was chosen because
I've been developing this functionality on a 64p VM.

Rather than hard coding the concurrency, probe the number of CPUs
available and create that many running contexts as the default
concurrency to use.
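
A minimal sketch of that probe, for illustration only. default_runners is a
hypothetical wrapper that is not in the patch; the patch assigns the getconf
output to runners directly. The fallback to 1 is an added assumption for
systems where getconf reports nothing usable.

```shell
# Illustrative only: choose a default runner count from the CPU count,
# falling back to 1 if getconf cannot report a usable number.
default_runners()
{
	local n

	n=$(getconf _NPROCESSORS_CONF 2>/dev/null)
	# guard against an empty or non-numeric answer
	case "$n" in
	''|*[!0-9]*) n=1 ;;
	esac
	echo "$n"
}

runners=$(default_runners)
echo "default concurrency: $runners"
```

Note that _NPROCESSORS_CONF counts configured CPUs, so it is not expected
to shrink when the process is confined with numactl -C or taskset; nproc
from coreutils reports the CPUs available to the calling process instead.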

Further, add a CLI option to specify the number of threads to run so
that we can over- or under-commit the CPU resources to enable direct
benchmarking of performance with different levels of concurrency.

Let's use that capability to show how much check-parallel can
benefit small systems. Using a single check execution thread for all
tests inside a 4p control group to limit maximum CPU usage to the
equivalent of a small 4p machine:

$ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 1 -g quick -s xfs -x dump -X generic/531
Runner 0 Failures:  generic/504
Tests run: 921
Tests _notrun: 272
Failure count: 2
.....

real    61m31.362s
user    0m0.029s
sys     0m0.059s

the quick group on XFS takes *over an hour* to run.

If we use the same 4p control group setup and run with 8 test
execution threads to ensure the 4 CPUs are fully utilised for most
of the test run:

$ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 8 -g quick -s xfs -x dump -X generic/531
Runner 7 Failures:  generic/504
Tests run: 921
Tests _notrun: 145
Failure count: 1
.....

real    17m33.124s
user    0m0.009s
sys     0m0.017s

The same test run takes only 17m33s. The same number of tests were
run and the same failures occurred. [ Ignore the differences in
notrun/failure count - the multi-file aggregation currently doesn't
work correctly for the single log file case. ]

That's a reduction in test runtime of ~72% for a 4 CPU system. Or,
if we want to measure it the other way, we get a ~3.5x improvement
in runtime scalability. i.e. going from 1 -> 4 CPUs being used for
test execution (4x increase) we get a 3.5x improvement in
scalability when we go from check to check-parallel.
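
As a cross-check of those figures, here is a minimal sketch that recomputes
them from the two "real" times reported above. The function names are
illustrative and not part of fstests; awk is used only because bash has no
floating point arithmetic.

```shell
# Illustrative helpers, not part of fstests.
# speedup <serial_secs> <parallel_secs>: how many times faster the parallel run is
speedup()
{
	awk -v s="$1" -v p="$2" 'BEGIN { printf "%.2f\n", s / p }'
}

# reduction <serial_secs> <parallel_secs>: percentage of runtime saved
reduction()
{
	awk -v s="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * (1 - p / s) }'
}

# 61m31.362s = 3691.362s, 17m33.124s = 1053.124s
speedup 3691.362 1053.124	# prints 3.51
reduction 3691.362 1053.124	# prints 71.5
```

The ~3.5x figure holds, and the runtime reduction comes out at about 71.5%,
in line with the ~72% quoted above.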

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/check-parallel b/check-parallel
index cb5d6aedf..0649a417f 100755
--- a/check-parallel
+++ b/check-parallel
@@ -10,7 +10,7 @@
 # the loop devices.
 
 basedir=""
-runners=64
+runners=$(getconf _NPROCESSORS_CONF)
 runner_list=()
 runtimes=()
 show_test_list=
@@ -30,6 +30,7 @@ usage()
 
 check options
     -D <dir>		Directory to run in
+    -t <n>		Number of concurrent tests to run
     -n			Output test list, do not run tests
     -r			randomize test order
     --exact-order	run tests in the exact order specified
@@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
 	-\? | -h | --help) usage ;;
 
 	-D)	basedir=$2; shift ;;
+	-t)	runners=$2; shift ;;
 	-g)	_tl_setup_group $2 ; shift ;;
 	-e)	_tl_setup_exclude_tests $2 ; shift ;;
 	-E)	_tl_setup_exclude_file $2 ; shift ;;
@@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
 	echo "Invalid basedir specification"
 	usage
 fi
+if [[ $runners -le 0 || $runners -gt 1024 ]]; then
+	echo "Invalid thread specification: $runners"
+	usage
+fi
+
 if [ -d "$basedir/runner-0/" ]; then
 	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 fi
-- 
2.45.2



* Re: [PATCH 07/28] check-parallel: adjust concurrency according to CPU count
  2025-04-17  3:00 ` [PATCH 07/28] check-parallel: adjust concurrency according to CPU count Dave Chinner
@ 2025-05-07  6:45   ` Nirjhar Roy (IBM)
  2025-05-21  4:32     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-07  6:45 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Concurrency is currently hard coded at 64 worker threads. This is
> too many for small CPU count machines; the idea is to create a
> sustained load of roughly one test per CPU as they are mostly single
> threaded/single process tests. The number "64" was chosen because
> I've been developing this functionality on a 64p VM.
> 
> Rather than hard coding the concurrency, probe the number of CPUs
> available and create that many running contexts as the default
> concurrency to use.
> 
> Further, add a CLI option to specify the number of threads to run so
> that we can over- or under-commit the CPU resources to enable direct
> benchmarking of performance with different levels of concurrency.
> 
> Let's use that capability to show how much check-parallel can
> benefit small systems. Using a single check execution thread for all
> tests inside a 4p control group to limit maximum CPU usage to the
> equivalent of a small 4p machine:
> 
> $ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 1 -g quick -s xfs -x dump -X generic/531
> Runner 0 Failures:  generic/504
> Tests run: 921
> Tests _notrun: 272
> Failure count: 2
> .....
> 
> real    61m31.362s
> user    0m0.029s
> sys     0m0.059s
> 
> the quick group on XFS takes *over an hour* to run.
> 
> If we use the same 4p control group setup and run with 8 test
> execution threads to ensure the 4 CPUs are fully utilised for most
> of the test run:
> 
> $ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 8 -g quick -s xfs -x dump -X generic/531
> Runner 7 Failures:  generic/504
> Tests run: 921
> Tests _notrun: 145
> Failure count: 1
> .....
> 
> real    17m33.124s
> user    0m0.009s
> sys     0m0.017s
> 
> The same test run takes only 17m33s. The same number of tests were
> run, the same failures occurred. [ Ignore the differences in
> notrun/failure count - the multi-file aggregation currently doesn't
> work correctly for the single log file case. ]
> 
> That's a reduction in test runtime of ~72% for a 4 CPU system. Or,
> if we want to measure it the other way, we get a ~3.5x improvement
> in runtime scalability. i.e. going from 1 -> 4 CPUs being used for
> test execution (4x increase) we get a 3.5x improvement in
> scalability when we go from check to check-parallel.
The functionality looks useful to me and the implementation is also
fine as far as I understand. I have some minor/nit comments below.
Other than that:

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/check-parallel b/check-parallel
> index cb5d6aedf..0649a417f 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -10,7 +10,7 @@
>  # the loop devices.
>  
>  basedir=""
> -runners=64
> +runners=$(getconf _NPROCESSORS_CONF)
Minor: Not related to this change. Maybe we should have a _get_nproc()
just like 
_get_page_size()
{
	echo $(getconf PAGE_SIZE)
}
_get_nproc()
{
	echo $(getconf _NPROCESSORS_CONF)
}
and replace all the $(getconf _NPROCESSORS_CONF) with $(_get_nproc)
>  runner_list=()
>  runtimes=()
>  show_test_list=
> @@ -30,6 +30,7 @@ usage()
>  
>  check options
>      -D <dir>		Directory to run in
> +    -t <n>		Number of concurrent tests to  run
Minor: Maybe we should mention the valid range of <n> i.e, 0 to 1024?
>      -n			Output test list, do not run tests

Nit: Maybe there is some spacing issue here? "Number of concurrent ..."
and "Output test..." don't begin together.
--NR
>      -r			randomize test order
>      --exact-order	run tests in the exact order specified
> @@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
>  	-\? | -h | --help) usage ;;
>  
>  	-D)	basedir=$2; shift ;;
> +	-t)	runners=$2; shift ;;
>  	-g)	_tl_setup_group $2 ; shift ;;
>  	-e)	_tl_setup_exclude_tests $2 ; shift ;;
>  	-E)	_tl_setup_exclude_file $2 ; shift ;;
> @@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
>  	echo "Invalid basedir specification"
>  	usage
>  fi
> +if [[ $runners -le 0 || $runners -gt 1024 ]]; then
> +	echo "Invalid thread specification: $runners"
Minor: Maybe we should mention the valid range of "runners" in this
error message (0 to 1024)?
> +	usage
> +fi
> +
>  if [ -d "$basedir/runner-0/" ]; then
>  	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  fi



* Re: [PATCH 07/28] check-parallel: adjust concurrency according to CPU count
  2025-05-07  6:45   ` Nirjhar Roy (IBM)
@ 2025-05-21  4:32     ` Dave Chinner
  2025-05-26  8:50       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  4:32 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, May 07, 2025 at 12:15:09PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Concurrency is currently hard coded at 64 worker threads. This is
> > too many for small CPU count machines; the idea is to create a
> > sustained load of roughly one test per CPU as they are mostly single
> > threaded/single process tests. The number "64" was chosen because
> > I've been developing this functionality on a 64p VM.
.....
> > diff --git a/check-parallel b/check-parallel
> > index cb5d6aedf..0649a417f 100755
> > --- a/check-parallel
> > +++ b/check-parallel
> > @@ -10,7 +10,7 @@
> >  # the loop devices.
> >  
> >  basedir=""
> > -runners=64
> > +runners=$(getconf _NPROCESSORS_CONF)
> Minor: Not related to this change. Maybe we should have a _get_nproc()
> just like 
> _get_page_size()
> {
> 	echo $(getconf PAGE_SIZE)
> }
> _get_nproc()
> {
> 	echo $(getconf _NPROCESSORS_CONF)
> }
> and replace all the $(getconf _NPROCESSORS_CONF) with $(_get_nproc)

I have thoughts on that.

I think determining test scaling based on the number of CPUs in the
machine is wrong. If I run in a cgroup that limits tests to 4p on a
64p machine, then fstests will think it is running on a 64p machine
rather than on 4p. That's ... not ideal.

I think there should be a global variable that defines the
concurrency that tests should use to scale load/processes rather
than use the CPU count. Then check/check-parallel can set the
concurrency as they desire and we no longer have to worry about
tests creating excessively huge loads on high CPU count machines
because they were sized to load a small, low concurrency test
system.

i.e. I intend to separate the concurrency with which check-parallel
runs from the concurrency that individual tests use to scale. For
check-parallel, I am thinking of fixing test concurrency scaling at
something like min(nr_cpus, 8), whilst the check-parallel harness
itself uses nr_cpus to determine how many concurrent runners to
instantiate.
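
As a rough sketch of that split (nothing here is implemented yet, and
_usable_cpus, _test_concurrency and TEST_CONCURRENCY are placeholder
names, not existing fstests interfaces), it could look something like
the following. It assumes coreutils nproc is available, which reports
the CPUs usable by the calling process and so honours an affinity
mask such as the one set by numactl -C, where getconf
_NPROCESSORS_CONF reports every configured CPU.

```shell
#!/bin/bash
# Hypothetical helper: CPUs usable by this process. nproc honours the
# CPU affinity mask; fall back to the configured CPU count without it.
_usable_cpus()
{
	nproc 2> /dev/null || getconf _NPROCESSORS_CONF
}

# Hypothetical helper: cap per-test load scaling at a fixed limit,
# i.e. min(nr_cpus, limit). The limit defaults to 8.
_test_concurrency()
{
	local nr_cpus=$1
	local limit=${2:-8}

	echo $(( nr_cpus < limit ? nr_cpus : limit ))
}

nr_cpus=$(_usable_cpus)

# Harness concurrency: one runner per usable CPU.
runners=$nr_cpus

# Per-test concurrency: what individual tests would scale against.
# TEST_CONCURRENCY is an assumed variable name.
export TEST_CONCURRENCY=$(_test_concurrency $nr_cpus)

echo "runners=$runners test concurrency=$TEST_CONCURRENCY"
```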

> >  runtimes=()
> >  show_test_list=
> > @@ -30,6 +30,7 @@ usage()
> >  
> >  check options
> >      -D <dir>		Directory to run in
> > +    -t <n>		Number of concurrent tests to  run
> Minor: Maybe we should mention the valid range of <n> i.e, 0 to 1024?
> >      -n			Output test list, do not run tests
> 
> Nit: Maybe there is some spacing issue here? "Number of concurrent ..."
> and "Output test..." don't begin together.

Not that I can see, I think this is just email quoting and tabs
doing funky stuff.

> >      -r			randomize test order
> >      --exact-order	run tests in the exact order specified
> > @@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
> >  	-\? | -h | --help) usage ;;
> >  
> >  	-D)	basedir=$2; shift ;;
> > +	-t)	runners=$2; shift ;;
> >  	-g)	_tl_setup_group $2 ; shift ;;
> >  	-e)	_tl_setup_exclude_tests $2 ; shift ;;
> >  	-E)	_tl_setup_exclude_file $2 ; shift ;;
> > @@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
> >  	echo "Invalid basedir specification"
> >  	usage
> >  fi
> > +if [[ $runners -le 0 || $runners -gt 1024 ]]; then
> > +	echo "Invalid thread specification: $runners"
> Minor: Maybe we should mention the valid range of "runners" in this
> error message (0 to 1024)?

/me shrugs.

It's an arbitrary "check-parallel won't ever scale past this" limit
because the diminishing returns from added concurrency that Amdahl's
Law defines are already kicking in at 64 threads. If you are running
check-parallel on a >1024p machine, you already know enough to be
able to look at the source code to determine why the error is being
emitted. :)
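
For reference, the check as posted accepts 1 through 1024 (0 is
rejected). If the range were to be reported, a sketch might look like
this. _valid_runner_count is a made-up helper name, not something in
the patch, and the stricter digits-only match is an addition for the
sake of the example.

```shell
#!/bin/bash
max_runners=1024	# same arbitrary ceiling as the patch

# Hypothetical helper: succeed only for a decimal integer in
# [1, max_runners]. The regex rejects empty and non-numeric input and
# limits the length so the arithmetic below cannot overflow.
_valid_runner_count()
{
	local n=$1

	[[ $n =~ ^[0-9]{1,9}$ ]] || return 1
	# 10# forces base 10 so a leading zero is not parsed as octal
	(( 10#$n >= 1 && 10#$n <= max_runners ))
}

for n in 0 1 8 1024 1025 abc; do
	if _valid_runner_count "$n"; then
		echo "$n: ok"
	else
		echo "Invalid thread specification: $n (valid range: 1-$max_runners)"
	fi
done
```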

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 07/28] check-parallel: adjust concurrency according to CPU count
  2025-05-21  4:32     ` Dave Chinner
@ 2025-05-26  8:50       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  8:50 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 10:02, Dave Chinner wrote:
> On Wed, May 07, 2025 at 12:15:09PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Concurrency is currently hard coded at 64 worker threads. This is
>>> too many for small CPU count machines; the idea is to create a
>>> sustained load of roughly one test per CPU as they are mostly single
>>> threaded/single process tests. The number "64" was chosen because
>>> I've been developing this functionality on a 64p VM.
> .....
>>> diff --git a/check-parallel b/check-parallel
>>> index cb5d6aedf..0649a417f 100755
>>> --- a/check-parallel
>>> +++ b/check-parallel
>>> @@ -10,7 +10,7 @@
>>>   # the loop devices.
>>>   
>>>   basedir=""
>>> -runners=64
>>> +runners=$(getconf _NPROCESSORS_CONF)
>> Minor: Not related to this change. Maybe we should have a _get_nproc()
>> just like
>> _get_page_size()
>> {
>> 	echo $(getconf PAGE_SIZE)
>> }
>> _get_nproc()
>> {
>> 	echo $(getconf _NPROCESSORS_CONF)
>> }
>> and replace all the $(getconf _NPROCESSORS_CONF) with $(_get_nproc)
> I have thoughts on that.
>
> I think determining test scaling based on the number of CPUs in the
> machine is wrong. If I run in a cgroup that limits tests to 4p on a
> 64p machine, then fstests will think it is running on a 64p machine
> rather than on 4p. That's ... not ideal.
>
> I think there should be a global variable that defines the
> concurrency that tests should use to scale load/processes rather
> than use the CPU count. Then check/check-parallel can set the
> concurrency as they desire and we no longer have to worry about
> tests creating excessively huge loads on high CPU count machines
> because they were sized to load a small, low concurrency test
> system.
>
> i.e. I intend to separate the concurrency with which check-parallel
> runs from the concurrency that individual tests use to scale. For
> check-parallel, I am thinking of fixing test concurrency scaling at
> something like min(nr_cpus, 8), whilst the check-parallel harness
> itself uses nr_cpus to determine how many concurrent runners to
> instantiate.
Okay yes. That makes sense.
>
>>>   runtimes=()
>>>   show_test_list=
>>> @@ -30,6 +30,7 @@ usage()
>>>   
>>>   check options
>>>       -D <dir>		Directory to run in
>>> +    -t <n>		Number of concurrent tests to  run
>> Minor: Maybe we should mention the valid range of <n> i.e, 0 to 1024?
>>>       -n			Output test list, do not run tests
>> Nit: Maybe there is some spacing issue here? "Number of concurrent ..."
>> and "Output test..." don't begin together.
> Not that I can see, I think this is just email quoting and tabs
> doing funky stuff.
Okay.
>>>       -r			randomize test order
>>>       --exact-order	run tests in the exact order specified
>>> @@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
>>>   	-\? | -h | --help) usage ;;
>>>   
>>>   	-D)	basedir=$2; shift ;;
>>> +	-t)	runners=$2; shift ;;
>>>   	-g)	_tl_setup_group $2 ; shift ;;
>>>   	-e)	_tl_setup_exclude_tests $2 ; shift ;;
>>>   	-E)	_tl_setup_exclude_file $2 ; shift ;;
>>> @@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
>>>   	echo "Invalid basedir specification"
>>>   	usage
>>>   fi
>>> +if [[ $runners -le 0 || $runners -gt 1024 ]]; then
>>> +	echo "Invalid thread specification: $runners"
>> Minor: Maybe we should mention the valid range of "runners" in this
>> error message (0 to 1024)?
> /me shrugs.
>
> It's an arbitrary "check-parallel won't ever scale past this" limit
> because the diminishing returns from added concurrency that Amdahl's
> Law defines are already kicking in at 64 threads. If you are running
> check-parallel on a >1024p machine, you already know enough to be
> able to look at the source code to determine why the error is being
> emitted. :)

Yeah. Got it.

This patch looks okay to me. Thanks for the clarifications.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 08/28] check-parallel: add logwrite device support
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (6 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 07/28] check-parallel: adjust concurrency according to CPU count Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-07  8:18   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI Dave Chinner
                   ` (19 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Every logwrite test will use the same /dev/mapper/<dev>
name for the logwrites device, so we also need to convert
common/dmlogwrites to use per-test device names as we have done for
other dm devices.

Then add a per-test-runner LOGWRITES_DEV so that all tests using
dm-logwrites now get run by check-parallel.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel     | 4 ++++
 common/dmlogwrites | 5 +++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/check-parallel b/check-parallel
index 0649a417f..5fee62f37 100755
--- a/check-parallel
+++ b/check-parallel
@@ -216,12 +216,14 @@ runner_go()
 	local me=$basedir/runner-$id
 	local _test=$me/test.img
 	local _scratch=$me/scratch.img
+	local _logwrites=$me/logwrites.img
 	local _results=$me/results-$2
 
 	mkdir -p $me
 
 	xfs_io -f -c 'truncate 2g' $_test
 	xfs_io -f -c 'truncate 8g' $_scratch
+	xfs_io -f -c 'truncate 1g' $_logwrites
 
 	mkfs.xfs -f $_test > /dev/null 2>&1
 
@@ -229,6 +231,7 @@ runner_go()
 	export TEST_DIR=$me/test
 	export SCRATCH_DEV=$(_create_loop_device $_scratch)
 	export SCRATCH_MNT=$me/scratch
+	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
 	export FSTYP=xfs
 	export RESULT_BASE=$_results
 
@@ -249,6 +252,7 @@ runner_go()
 	umount -R $SCRATCH_MNT 2> /dev/null
 	_destroy_loop_device $TEST_DEV
 	_destroy_loop_device $SCRATCH_DEV
+	_destroy_loop_device $LOGWRITES_DEV
 
 	grep -q Failures: $me/log
 	if [ $? -eq 0 ]; then
diff --git a/common/dmlogwrites b/common/dmlogwrites
index a27e1966a..7c3ad95c9 100644
--- a/common/dmlogwrites
+++ b/common/dmlogwrites
@@ -4,6 +4,9 @@
 #
 # common functions for setting up and tearing down a dm log-writes device
 
+LOGWRITES_NAME=logwrites-$seq
+LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
+
 _require_log_writes()
 {
 	[ -z "$LOGWRITES_DEV" -o ! -b "$LOGWRITES_DEV" ] && \
@@ -81,8 +84,6 @@ _log_writes_init()
 		BLK_DEV_SIZE=$((length / blksz))
 	fi
 
-	LOGWRITES_NAME=logwrites-test
-	LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
 	LOGWRITES_TABLE="0 $BLK_DEV_SIZE log-writes $blkdev $LOGWRITES_DEV"
 	_dmsetup_create $LOGWRITES_NAME --table "$LOGWRITES_TABLE" || \
 		_fail "failed to create log-writes device"
-- 
2.45.2



* Re: [PATCH 08/28] check-parallel: add logwrite device support
  2025-04-17  3:00 ` [PATCH 08/28] check-parallel: add logwrite device support Dave Chinner
@ 2025-05-07  8:18   ` Nirjhar Roy (IBM)
  2025-05-21 10:07     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-07  8:18 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Every logwrite test will use the same /dev/mapper/<dev>
> name for the logwrites device, so we also need to convert
> common/dmlogwrites to use per-test device names as we have done for
> other dm devices.
> 
> Then add a per-test-runner LOGWRITES_DEV so that all tests using
> dm-logwrites now get run by check-parallel.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel     | 4 ++++
>  common/dmlogwrites | 5 +++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 0649a417f..5fee62f37 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -216,12 +216,14 @@ runner_go()
>  	local me=$basedir/runner-$id
>  	local _test=$me/test.img
>  	local _scratch=$me/scratch.img
> +	local _logwrites=$me/logwrites.img
>  	local _results=$me/results-$2
>  
>  	mkdir -p $me
>  
>  	xfs_io -f -c 'truncate 2g' $_test
>  	xfs_io -f -c 'truncate 8g' $_scratch
> +	xfs_io -f -c 'truncate 1g' $_logwrites
>  
>  	mkfs.xfs -f $_test > /dev/null 2>&1
>  
> @@ -229,6 +231,7 @@ runner_go()
>  	export TEST_DIR=$me/test
>  	export SCRATCH_DEV=$(_create_loop_device $_scratch)
>  	export SCRATCH_MNT=$me/scratch
> +	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
>  	export FSTYP=xfs
>  	export RESULT_BASE=$_results
>  
> @@ -249,6 +252,7 @@ runner_go()
>  	umount -R $SCRATCH_MNT 2> /dev/null
>  	_destroy_loop_device $TEST_DEV
>  	_destroy_loop_device $SCRATCH_DEV
> +	_destroy_loop_device $LOGWRITES_DEV
>  
>  	grep -q Failures: $me/log
>  	if [ $? -eq 0 ]; then
> diff --git a/common/dmlogwrites b/common/dmlogwrites
> index a27e1966a..7c3ad95c9 100644
> --- a/common/dmlogwrites
> +++ b/common/dmlogwrites
> @@ -4,6 +4,9 @@
>  #
>  # common functions for setting up and tearing down a dm log-writes device
>  
> +LOGWRITES_NAME=logwrites-$seq
if 2 different runners are running tests with the same $seq, won't
there be a conflict? For example runner-0 is running xfs/xyz and
runner-1 is running generic/xyz ? 
The rest looks fine to me. 
--NR
> +LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
> +
>  _require_log_writes()
>  {
>  	[ -z "$LOGWRITES_DEV" -o ! -b "$LOGWRITES_DEV" ] && \
> @@ -81,8 +84,6 @@ _log_writes_init()
>  		BLK_DEV_SIZE=$((length / blksz))
>  	fi
>  
> -	LOGWRITES_NAME=logwrites-test
> -	LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
>  	LOGWRITES_TABLE="0 $BLK_DEV_SIZE log-writes $blkdev $LOGWRITES_DEV"
>  	_dmsetup_create $LOGWRITES_NAME --table "$LOGWRITES_TABLE" || \
>  		_fail "failed to create log-writes device"



* Re: [PATCH 08/28] check-parallel: add logwrite device support
  2025-05-07  8:18   ` Nirjhar Roy (IBM)
@ 2025-05-21 10:07     ` Dave Chinner
  2025-05-26  8:59       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 10:07 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, May 07, 2025 at 01:48:03PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Every logwrite test will use the same /dev/mapper/<dev>
> > name for the logwrites device, so we also need to convert
> > common/dmlogwrites to use per-test device names as we have done for
> > other dm devices.
> > 
> > Then add a per-test-runner LOGWRITES_DEV so that all tests using
> > dm-logwrites now get run by check-parallel.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
.....
> > diff --git a/common/dmlogwrites b/common/dmlogwrites
> > index a27e1966a..7c3ad95c9 100644
> > --- a/common/dmlogwrites
> > +++ b/common/dmlogwrites
> > @@ -4,6 +4,9 @@
> >  #
> >  # common functions for setting up and tearing down a dm log-writes device
> >  
> > +LOGWRITES_NAME=logwrites-$seq
> if 2 different runners are running tests with the same $seq, won't
> there be a conflict? For example runner-0 is running xfs/xyz and
> runner-1 is running generic/xyz ? 

Yes, it could happen, but it won't happen right now as there is no
overlap in test numbers using dmlogwrites between different test
directories.

I do need to solve this generically (i.e. for all the dm device
types) in the near future because I want to be able to run the same
test N times in parallel, e.g. to do flakey test profiling really
quickly over thousands of iterations by running 64 instances of the
test at the same time.

For this use case, the dm device name cannot rely on the test
name/sequence number at all, and so I'm going to have to come up
with a unique ID of some kind for this purpose. It may be as simple
as encoding the unique runner ID along with the test sequence number
into the device name - it's not a difficult issue to solve.

Hence I haven't spent much more thought on it than that, because
I've answered the two main questions that were relevant: "can
I solve it?" (yes) and "do I need to solve it now?" (no).
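
To make the "runner ID plus sequence number" idea concrete, here is a
rough sketch of what such a name could look like. None of it exists
yet: _dm_unique_name is a made-up helper, and RUNNER_ID is an assumed
variable that check-parallel would have to export to each runner.

```shell
#!/bin/bash
# Hypothetical helper: build a device-mapper name that is unique per
# runner as well as per test. RUNNER_ID is an assumed variable; when
# it is unset (e.g. under plain check) the runner part defaults to 0.
_dm_unique_name()
{
	local prefix=$1
	local seq=$2
	local runner=${RUNNER_ID:-0}

	echo "$prefix-r$runner-$seq"
}

# Example values only; a real test gets $seq from the test harness.
seq=455
RUNNER_ID=3
LOGWRITES_NAME=$(_dm_unique_name logwrites $seq)
LOGWRITES_DMDEV=/dev/mapper/$LOGWRITES_NAME
echo $LOGWRITES_DMDEV
```

With the runner ID in the name, two runners executing tests with the
same sequence number, or the same test, no longer share a device name.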

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 08/28] check-parallel: add logwrite device support
  2025-05-21 10:07     ` Dave Chinner
@ 2025-05-26  8:59       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  8:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 15:37, Dave Chinner wrote:
> On Wed, May 07, 2025 at 01:48:03PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Every logwrite test will use the same /dev/mapper/<dev>
>>> name for the logwrites device, so we also need to convert
>>> common/dmlogwrites to use per-test device names as we have done for
>>> other dm devices.
>>>
>>> Then add a per-test-runner LOGWRITES_DEV so that all tests using
>>> dm-logwrites now get run by check-parallel.
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> .....
>>> diff --git a/common/dmlogwrites b/common/dmlogwrites
>>> index a27e1966a..7c3ad95c9 100644
>>> --- a/common/dmlogwrites
>>> +++ b/common/dmlogwrites
>>> @@ -4,6 +4,9 @@
>>>   #
>>>   # common functions for setting up and tearing down a dm log-writes device
>>>   
>>> +LOGWRITES_NAME=logwrites-$seq
>> if 2 different runners are running tests with the same $seq, won't
>> there be a conflict? For example runner-0 is running xfs/xyz and
>> runner-1 is running generic/xyz ?
> Yes, it could happen, but it won't happen right now as there is no
> overlap in test numbers using dmlogwrites between different test
> directories.
>
> I do need to solve this generically (i.e. for all the dm device
> types) in the near future because I want to be able to run the same
> test N times in parallel, e.g. to do flakey test profiling really
> quickly over thousands of iterations by running 64 instances of the
> test at the same time.
>
> For this use case, the dm device name cannot rely on the test
> name/sequence number at all, and so I'm going to have to come up
> with a unique ID of some kind for this purpose. It may be as simple
> as encoding the unique runner ID along with the test sequence number
> into the device name - it's not a difficult issue to solve.
>
> Hence I haven't spent much more thought on it than that, because
> I've answered the two main questions that were relevant: "can
> I solve it?" (yes) and "do I need to solve it now?" (no).

Okay. Yes, I just wanted to point this out and confirm if my 
understanding is correct.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (7 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 08/28] check-parallel: add logwrite device support Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-07  8:49   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation Dave Chinner
                   ` (18 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Add a CLI option to specify the initial FSTYP to test. If this is
not specified then the default of "xfs" will be used. This option is
different to the way check has FSTYP specified as check-parallel
has no infrastructure to support non block device based filesystems
and hence we have to reject virtual or network based filesystems
at this point in time.

Note: This patch only implements default mkfs parameter support for
the test device. Config sections can be used to override this as
check will then format the test device when the section that defines
non-default test device mkfs options is selected.
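
The selection and default behaviour can be tried standalone with a
sketch like the one below. It is illustrative only:
_fstype_is_block_based is a made-up variant of the allow-list check
that returns a status rather than calling usage, so it can be run
outside of check-parallel.

```shell
#!/bin/bash
# Hypothetical variant of the allow-list check: return a status
# instead of printing usage and exiting.
_fstype_is_block_based()
{
	case $1 in
	xfs|ext2|ext3|ext4|udf|jfs|f2fs|btrfs|bcachefs|gfs2|ocfs2)
		return 0 ;;
	*)
		return 1 ;;
	esac
}

# An unset or empty FSTYP falls back to xfs; a value set via -f
# would be kept as-is.
FSTYP=
export FSTYP=${FSTYP:=xfs}
echo "FSTYP=$FSTYP"

for fs in ext4 nfs tmpfs; do
	if _fstype_is_block_based $fs; then
		echo "$fs: accepted"
	else
		echo "$fs: rejected"
	fi
done
```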

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/check-parallel b/check-parallel
index 5fee62f37..19f2d2b0c 100755
--- a/check-parallel
+++ b/check-parallel
@@ -18,7 +18,7 @@ run_section=""
 
 tmp=/tmp/check-parallel.$$
 
-export FSTYP=xfs
+FSTYP=
 
 . ./common/exit
 . ./common/test_names
@@ -35,6 +35,7 @@ check options
     -r			randomize test order
     --exact-order	run tests in the exact order specified
     -s section		run only specified section from config file
+    -f <FSTYPE>		specify the filesystem type to test
 
 testlist options
     -g group[,group...]	include tests from these groups
@@ -66,16 +67,39 @@ external_file argument is a path to a single file containing a list of tests
 to exclude in the form of <test dir>/<test name>.
 
 examples:
- check-parallel -D /mnt xfs/001
- check-parallel -D /mnt -g quick
+ check-parallel -f xfs -D /mnt xfs/001
+ check-parallel -f ext4 -D /mnt -g quick
  check-parallel -D /mnt -g xfs/quick
  check-parallel -D /mnt -x stress xfs/*
- check-parallel -D /mnt -X .exclude -g auto
- check-parallel -D /mnt -E ~/.xfstests.exclude
+ check-parallel -f btrfs -D /mnt -X .exclude -g auto
+ check-parallel -f udf -D /mnt -E ~/.xfstests.exclude
 '
 	    exit 1
 }
 
+# Only support block device based filesystems with generic mkfs support
+# at the moment.
+is_supported_fstype()
+{
+	local fstype=$1
+
+	case $fstype in
+	xfs)		;;
+	ext2|ext3|ext4)	;;
+	udf)		;;
+	jfs)		;;
+	f2fs)		;;
+	btrfs)		;;
+	bcachefs)	;;
+	gfs2)		;;
+	ocfs2)		;;
+	*)
+		echo "unsupported FSTYPE: $fstype"
+		usage
+		;;
+	esac
+}
+
 # Process command arguments first.
 while [ $# -gt 0 ]; do
 	case "$1" in
@@ -92,6 +116,8 @@ while [ $# -gt 0 ]; do
 	--exact-order) _tl_setup_ordered ;;
 	-n)	show_test_list="yes" ;;
 
+	-f)	is_supported_fstype $2 ; FSTYP=$2; shift ;;
+
 	-s)	run_section="$run_section -s $2"; shift ;;
 
 	-*)	usage ;;
@@ -109,6 +135,8 @@ while [ $# -gt 0 ]; do
 	shift
 done
 
+export FSTYP=${FSTYP:=xfs}
+
 if [ ! -d "$basedir" ]; then
 	echo "Invalid basedir specification"
 	usage
@@ -225,8 +253,6 @@ runner_go()
 	xfs_io -f -c 'truncate 8g' $_scratch
 	xfs_io -f -c 'truncate 1g' $_logwrites
 
-	mkfs.xfs -f $_test > /dev/null 2>&1
-
 	export TEST_DEV=$(_create_loop_device $_test)
 	export TEST_DIR=$me/test
 	export SCRATCH_DEV=$(_create_loop_device $_scratch)
@@ -240,6 +266,10 @@ runner_go()
 	mkdir -p $RESULT_BASE
 	rm -f $RESULT_BASE/check.*
 
+	# Only supports default mkfs parameters right now
+	wipefs -a $TEST_DEV > /dev/null 2>&1
+	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
+
 #	export DUMP_CORRUPT_FS=1
 
 	# Run the tests in it's own mount namespace, as per the comment below
-- 
2.45.2



* Re: [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI
  2025-04-17  3:00 ` [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI Dave Chinner
@ 2025-05-07  8:49   ` Nirjhar Roy (IBM)
  2025-05-21 10:17     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-07  8:49 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Add a CLI option to specify the initial FSTYP to test. If this is
> not specified then the default of "xfs" will be used. This option is
> different to the way check has FSTYP specified as check-parallel
> has no infrastructure to support non block device based filesystems
> and hence we have to reject virtual or network based filesystems
> at this point in time.
> 
> Note: This patch only implements default mkfs parameter support for
> the test device. Config sections can be used to override this as
> check will then format the test device when the section that defines
> non-default test device mkfs options is selected.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 44 +++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 37 insertions(+), 7 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 5fee62f37..19f2d2b0c 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -18,7 +18,7 @@ run_section=""
>  
>  tmp=/tmp/check-parallel.$$
>  
> -export FSTYP=xfs
> +FSTYP=
>  
>  . ./common/exit
>  . ./common/test_names
> @@ -35,6 +35,7 @@ check options
>      -r			randomize test order
>      --exact-order	run tests in the exact order specified
>      -s section		run only specified section from config file
> +    -f <FSTYPE>		specify the filesystem type to test
>  
>  testlist options
>      -g group[,group...]	include tests from these groups
> @@ -66,16 +67,39 @@ external_file argument is a path to a single file containing a list of tests
>  to exclude in the form of <test dir>/<test name>.
>  
>  examples:
> - check-parallel -D /mnt xfs/001
> - check-parallel -D /mnt -g quick
> + check-parallel -f xfs -D /mnt xfs/001
> + check-parallel -f ext4 -D /mnt -g quick
>   check-parallel -D /mnt -g xfs/quick
>   check-parallel -D /mnt -x stress xfs/*
> - check-parallel -D /mnt -X .exclude -g auto
> - check-parallel -D /mnt -E ~/.xfstests.exclude
> + check-parallel -f btrfs -D /mnt -X .exclude -g auto
> + check-parallel -f udf -D /mnt -E ~/.xfstests.exclude
>  '
>  	    exit 1
>  }
>  
> +# Only support block device based filesystems with generic mkfs support
> +# at the moment.
> +is_supported_fstype()
> +{
> +	local fstype=$1
> +
> +	case $fstype in
> +	xfs)		;;
> +	ext2|ext3|ext4)	;;
> +	udf)		;;
> +	jfs)		;;
> +	f2fs)		;;
> +	btrfs)		;;
> +	bcachefs)	;;
> +	gfs2)		;;
> +	ocfs2)		;;
Extremely minor: Maybe list the test in alphabetical order?
The rest looks fine to me. 

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> +	*)
> +		echo "unsupported FSTYPE: $fstype"
> +		usage
> +		;;
> +	esac
> +}
> +
>  # Process command arguments first.
>  while [ $# -gt 0 ]; do
>  	case "$1" in
> @@ -92,6 +116,8 @@ while [ $# -gt 0 ]; do
>  	--exact-order) _tl_setup_ordered ;;
>  	-n)	show_test_list="yes" ;;
>  
> +	-f)	is_supported_fstype $2 ; FSTYP=$2; shift ;;
> +
>  	-s)	run_section="$run_section -s $2"; shift ;;
>  
>  	-*)	usage ;;
> @@ -109,6 +135,8 @@ while [ $# -gt 0 ]; do
>  	shift
>  done
>  
> +export FSTYP=${FSTYP:=xfs}
> +
>  if [ ! -d "$basedir" ]; then
>  	echo "Invalid basedir specification"
>  	usage
> @@ -225,8 +253,6 @@ runner_go()
>  	xfs_io -f -c 'truncate 8g' $_scratch
>  	xfs_io -f -c 'truncate 1g' $_logwrites
>  
> -	mkfs.xfs -f $_test > /dev/null 2>&1
> -
>  	export TEST_DEV=$(_create_loop_device $_test)
>  	export TEST_DIR=$me/test
>  	export SCRATCH_DEV=$(_create_loop_device $_scratch)
> @@ -240,6 +266,10 @@ runner_go()
>  	mkdir -p $RESULT_BASE
>  	rm -f $RESULT_BASE/check.*
>  
> +	# Only supports default mkfs parameters right now
> +	wipefs -a $TEST_DEV > /dev/null 2>&1
> +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
> +
>  #	export DUMP_CORRUPT_FS=1
>  
>  	# Run the tests in it's own mount namespace, as per the comment below


* Re: [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI
  2025-05-07  8:49   ` Nirjhar Roy (IBM)
@ 2025-05-21 10:17     ` Dave Chinner
  2025-05-26  9:00       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 10:17 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, May 07, 2025 at 02:19:31PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Add a CLI option to specify the initial FSTYP to test. If this is
> > not specified then the default of "xfs" will be used. This option is
> > different to the way check has FSTYP specified as check-parallel
> > has no infrastructure to support non block device based filesystems
> > and hence we have to reject virtual or network based filesystems
> > at this point in time.
> > 
> > Note: This patch only implements default mkfs parameter support for
> > the test device. Config sections can be used to override this as
> > check will then format the test device when the section that defines
> > non-default test device mkfs options is selected.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  check-parallel | 44 +++++++++++++++++++++++++++++++++++++-------
> >  1 file changed, 37 insertions(+), 7 deletions(-)
> > 
> > diff --git a/check-parallel b/check-parallel
> > index 5fee62f37..19f2d2b0c 100755
> > --- a/check-parallel
> > +++ b/check-parallel
> > @@ -18,7 +18,7 @@ run_section=""
> >  
> >  tmp=/tmp/check-parallel.$$
> >  
> > -export FSTYP=xfs
> > +FSTYP=
> >  
> >  . ./common/exit
> >  . ./common/test_names
> > @@ -35,6 +35,7 @@ check options
> >      -r			randomize test order
> >      --exact-order	run tests in the exact order specified
> >      -s section		run only specified section from config file
> > +    -f <FSTYPE>		specify the filesystem type to test
> >  
> >  testlist options
> >      -g group[,group...]	include tests from these groups
> > @@ -66,16 +67,39 @@ external_file argument is a path to a single file containing a list of tests
> >  to exclude in the form of <test dir>/<test name>.
> >  
> >  examples:
> > - check-parallel -D /mnt xfs/001
> > - check-parallel -D /mnt -g quick
> > + check-parallel -f xfs -D /mnt xfs/001
> > + check-parallel -f ext4 -D /mnt -g quick
> >   check-parallel -D /mnt -g xfs/quick
> >   check-parallel -D /mnt -x stress xfs/*
> > - check-parallel -D /mnt -X .exclude -g auto
> > - check-parallel -D /mnt -E ~/.xfstests.exclude
> > + check-parallel -f btrfs -D /mnt -X .exclude -g auto
> > + check-parallel -f udf -D /mnt -E ~/.xfstests.exclude
> >  '
> >  	    exit 1
> >  }
> >  
> > +# Only support block device based filesystems with generic mkfs support
> > +# at the moment.
> > +is_supported_fstype()
> > +{
> > +	local fstype=$1
> > +
> > +	case $fstype in
> > +	xfs)		;;
> > +	ext2|ext3|ext4)	;;
> > +	udf)		;;
> > +	jfs)		;;
> > +	f2fs)		;;
> > +	btrfs)		;;
> > +	bcachefs)	;;
> > +	gfs2)		;;
> > +	ocfs2)		;;
> Extremely minor: Maybe list the test in alphabetical order?

I copied the list from a case statement somewhere in common/rc that
was already in that order. I don't think it's worth the time and brain
cells to order it, let alone require anyone to maintain that order
over the long term. It's just a simple string check....

-Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI
  2025-05-21 10:17     ` Dave Chinner
@ 2025-05-26  9:00       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  9:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 15:47, Dave Chinner wrote:
> On Wed, May 07, 2025 at 02:19:31PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> Add a CLI option to specify the initial FSTYP to test. If this is
>>> not specified then the default of "xfs" will be used. This option is
>>> different to the way check has FSTYP specified as check-parallel
>>> has no infrastructure to support non block device based filesystems
>>> and hence we have to reject virtual or network based filesystems
>>> at this point in time.
>>>
>>> Note: This patch only implements default mkfs parameter support for
>>> the test device. Config sections can be used to override this as
>>> check will then format the test device when the section that defines
>>> non-default test device mkfs options is selected.
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>   check-parallel | 44 +++++++++++++++++++++++++++++++++++++-------
>>>   1 file changed, 37 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/check-parallel b/check-parallel
>>> index 5fee62f37..19f2d2b0c 100755
>>> --- a/check-parallel
>>> +++ b/check-parallel
>>> @@ -18,7 +18,7 @@ run_section=""
>>>   
>>>   tmp=/tmp/check-parallel.$$
>>>   
>>> -export FSTYP=xfs
>>> +FSTYP=
>>>   
>>>   . ./common/exit
>>>   . ./common/test_names
>>> @@ -35,6 +35,7 @@ check options
>>>       -r			randomize test order
>>>       --exact-order	run tests in the exact order specified
>>>       -s section		run only specified section from config file
>>> +    -f <FSTYPE>		specify the filesystem type to test
>>>   
>>>   testlist options
>>>       -g group[,group...]	include tests from these groups
>>> @@ -66,16 +67,39 @@ external_file argument is a path to a single file containing a list of tests
>>>   to exclude in the form of <test dir>/<test name>.
>>>   
>>>   examples:
>>> - check-parallel -D /mnt xfs/001
>>> - check-parallel -D /mnt -g quick
>>> + check-parallel -f xfs -D /mnt xfs/001
>>> + check-parallel -f ext4 -D /mnt -g quick
>>>    check-parallel -D /mnt -g xfs/quick
>>>    check-parallel -D /mnt -x stress xfs/*
>>> - check-parallel -D /mnt -X .exclude -g auto
>>> - check-parallel -D /mnt -E ~/.xfstests.exclude
>>> + check-parallel -f btrfs -D /mnt -X .exclude -g auto
>>> + check-parallel -f udf -D /mnt -E ~/.xfstests.exclude
>>>   '
>>>   	    exit 1
>>>   }
>>>   
>>> +# Only support block device based filesystems with generic mkfs support
>>> +# at the moment.
>>> +is_supported_fstype()
>>> +{
>>> +	local fstype=$1
>>> +
>>> +	case $fstype in
>>> +	xfs)		;;
>>> +	ext2|ext3|ext4)	;;
>>> +	udf)		;;
>>> +	jfs)		;;
>>> +	f2fs)		;;
>>> +	btrfs)		;;
>>> +	bcachefs)	;;
>>> +	gfs2)		;;
>>> +	ocfs2)		;;
>> Extremely minor: Maybe list the test in alphabetical order?
> I copied the list from a case statement somewhere in common/rc that
> was already in that order. I don't think it's worth the time and brain
> cells to order it, let alone require anyone to maintain that order
> over the long term. It's just a simple string check....

Yes, that's fine.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore


* [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (8 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-07  9:02   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 11/28] check-parallel: initial support for specifying device sizes Dave Chinner
                   ` (17 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

This provides isolation between individual runners so that they
cannot see the processes that other test runners have created.
This means tools like pkill will only find processes run by the test
that is calling it, hence there is no danger that it might kill
processes owned by a different test in a different runner context.

This means we need to turn off private pid/mount namespaces inside
check itself - nesting private namespaces causes all sorts of weird
TEST_DIR mount issues with SELinux disallowing TEST_DIR mounts.

Note that this also regresses generic/504 for check-parallel,
because it triggers looking at the init namespace from
FSTESTS_ISOL=privatens and this variable is not exported to test
subprocesses anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/check-parallel b/check-parallel
index 19f2d2b0c..aa88c681e 100755
--- a/check-parallel
+++ b/check-parallel
@@ -267,14 +267,18 @@ runner_go()
 	rm -f $RESULT_BASE/check.*
 
 	# Only supports default mkfs parameters right now
-	wipefs -a $TEST_DEV > /dev/null 2>&1
-	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
+	wipefs -a $TEST_DEV > $me/log 2>&1
+	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
 
 #	export DUMP_CORRUPT_FS=1
 
 	# Run the tests in it's own mount namespace, as per the comment below
 	# that precedes making the basedir a private mount.
-	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
+	#
+	# Similarly, we need to run check in it's own PID namespace so that
+	# operations like pkill only affect the runner instance, not globally
+	# kill processes from other check instances.
+	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
 
 	wait
 	sleep 1
-- 
2.45.2


* Re: [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation
  2025-04-17  3:00 ` [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation Dave Chinner
@ 2025-05-07  9:02   ` Nirjhar Roy (IBM)
  2025-05-21 10:19     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-07  9:02 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This provides isolation between individual runners so that they
> cannot see the processes that other test runners have created.
> This means tools like pkill will only find processes run by the test
> that is calling it, hence there is no danger that it might kill
> processes owned by a different test in a different runner context.
> 
> This means we need to turn off private pid/mount namespaces inside
> check itself - nesting private namespaces causes all sorts of weird
> TEST_DIR mount issues with SELinux disallowing TEST_DIR mounts.
> 
> Note that this also regresses generic/504 for check-parallel,
> because it triggers looking at the init namespace from
> FSTESTS_ISOL=privatens and this variable is not exported to test
> subprocesses anymore.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 19f2d2b0c..aa88c681e 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -267,14 +267,18 @@ runner_go()
>  	rm -f $RESULT_BASE/check.*
>  
>  	# Only supports default mkfs parameters right now
> -	wipefs -a $TEST_DEV > /dev/null 2>&1
> -	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
> +	wipefs -a $TEST_DEV > $me/log 2>&1
> +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
>  
>  #	export DUMP_CORRUPT_FS=1
>  
>  	# Run the tests in it's own mount namespace, as per the comment below
>  	# that precedes making the basedir a private mount.
> -	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
> +	#
> +	# Similarly, we need to run check in it's own PID namespace so that
> +	# operations like pkill only affect the runner instance, not globally
> +	# kill processes from other check instances.
> +	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
Why aren't we using -s $section and only using $section with check?
--NR
>  
>  	wait
>  	sleep 1


* Re: [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation
  2025-05-07  9:02   ` Nirjhar Roy (IBM)
@ 2025-05-21 10:19     ` Dave Chinner
  2025-05-26  9:04       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 10:19 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, May 07, 2025 at 02:32:29PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > This provides isolation between individual runners so that they
> > cannot see the processes that other test runners have created.
> > This means tools like pkill will only find processes run by the test
> > that is calling it, hence there is no danger that it might kill
> > processes owned by a different test in a different runner context.
> > 
> > This means we need to turn off private pid/mount namespaces inside
> > check itself - nesting private namespaces causes all sorts of weird
> > TEST_DIR mount issues with SELinux disallowing TEST_DIR mounts.
> > 
> > Note that this also regresses generic/504 for check-parallel,
> > because it triggers looking at the init namespace from
> > FSTESTS_ISOL=privatens and this variable is not exported to test
> > subprocesses anymore.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  check-parallel | 10 +++++++---
> >  1 file changed, 7 insertions(+), 3 deletions(-)
> > 
> > diff --git a/check-parallel b/check-parallel
> > index 19f2d2b0c..aa88c681e 100755
> > --- a/check-parallel
> > +++ b/check-parallel
> > @@ -267,14 +267,18 @@ runner_go()
> >  	rm -f $RESULT_BASE/check.*
> >  
> >  	# Only supports default mkfs parameters right now
> > -	wipefs -a $TEST_DEV > /dev/null 2>&1
> > -	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
> > +	wipefs -a $TEST_DEV > $me/log 2>&1
> > +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
> >  
> >  #	export DUMP_CORRUPT_FS=1
> >  
> >  	# Run the tests in it's own mount namespace, as per the comment below
> >  	# that precedes making the basedir a private mount.
> > -	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
> > +	#
> > +	# Similarly, we need to run check in it's own PID namespace so that
> > +	# operations like pkill only affect the runner instance, not globally
> > +	# kill processes from other check instances.
> > +	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
> Why aren't we using -s $section and only using $section with check?

It is already encoded in $run_section - the CLI parsing adds the -s
for each section that is specified on the command line:

        -s)     run_section="$run_section -s $2"; shift ;;
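
A minimal sketch of that accumulation, using hypothetical section
names (xfs_4k, xfs_rt) and a stand-alone helper that is not part of
check-parallel itself:

```shell
#!/bin/bash
# Sketch only: mimics how the option loop builds $run_section.
# parse_sections and the section names are hypothetical.
parse_sections()
{
	local run_section=""

	while [ $# -gt 0 ]; do
		case "$1" in
		# Each -s prepends its own "-s" to the section name.
		-s)	run_section="$run_section -s $2"; shift ;;
		esac
		shift
	done
	echo "$run_section"
}

run_section=$(parse_sections -s xfs_4k -s xfs_rt)

# Left unquoted on purpose so it word-splits into separate arguments,
# which is why check sees "-s xfs_4k -s xfs_rt" rather than one string.
echo ./check $run_section --exact-order
```

So $run_section already carries one "-s" per section by the time it
is handed to check.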

-Dave.

-- 
Dave Chinner
david@fromorbit.com

* Re: [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation
  2025-05-21 10:19     ` Dave Chinner
@ 2025-05-26  9:04       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  9:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 15:49, Dave Chinner wrote:
> On Wed, May 07, 2025 at 02:32:29PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> This provides isolation between individual runners so that they
>>> cannot see the processes that other test runners have created.
>>> This means tools like pkill will only find processes run by the test
>>> that is calling it, hence there is no danger that it might kill
>>> processes owned by a different test in a different runner context.
>>>
>>> This means we need to turn off private pid/mount namespaces inside
>>> check itself - nesting private namespaces causes all sorts of weird
>>> TEST_DIR mount issues with SELinux disallowing TEST_DIR mounts.
>>>
>>> Note that this also regresses generic/504 for check-parallel,
>>> because it triggers looking at the init namespace from
>>> FSTESTS_ISOL=privatens and this variable is not exported to test
>>> subprocesses anymore.
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>   check-parallel | 10 +++++++---
>>>   1 file changed, 7 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/check-parallel b/check-parallel
>>> index 19f2d2b0c..aa88c681e 100755
>>> --- a/check-parallel
>>> +++ b/check-parallel
>>> @@ -267,14 +267,18 @@ runner_go()
>>>   	rm -f $RESULT_BASE/check.*
>>>   
>>>   	# Only supports default mkfs parameters right now
>>> -	wipefs -a $TEST_DEV > /dev/null 2>&1
>>> -	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV > /dev/null 2>&1
>>> +	wipefs -a $TEST_DEV > $me/log 2>&1
>>> +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
>>>   
>>>   #	export DUMP_CORRUPT_FS=1
>>>   
>>>   	# Run the tests in it's own mount namespace, as per the comment below
>>>   	# that precedes making the basedir a private mount.
>>> -	./src/nsexec -m ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} > $me/log 2>&1
>>> +	#
>>> +	# Similarly, we need to run check in it's own PID namespace so that
>>> +	# operations like pkill only affect the runner instance, not globally
>>> +	# kill processes from other check instances.
>>> +	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
>> Why aren't we using -s $section and only using $section with check?
> It is already encoded in $run_section - the CLI parsing adds the -s
> for each section that is specified on the command line:
>
>          -s)     run_section="$run_section -s $2"; shift ;;

Yeah, right. I realized that later. Thanks

-NR

>
> -Dave.
>
-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore


* [PATCH 11/28] check-parallel: initial support for specifying device sizes
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (9 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-07 10:05   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 12/28] config: move config section code to it's own file Dave Chinner
                   ` (16 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Rather than hard coding loop device sizes, abstract them out into
environment variables with defined default values. This allows
a check-parallel wrapper to specify sizes or, in future, for them
to be read from a config file. Sizes are specified in human readable
values using M/G/T suffixes to indicate the units being specified.

Whilst doing this also add support for creating all the external
devices that XFS uses during fstests execution. A typical setup
will be something like:

TEST_DEV_SIZE=10G
TEST_RTDEV_SIZE=10G
TEST_LOGDEV_SIZE=128M
SCRATCH_DEV_SIZE=20G
SCRATCH_RTDEV_SIZE=20G
SCRATCH_LOGDEV_SIZE=512M
LOGWRITES_DEV_SIZE=2G

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 35 ++++++++++++++++++++++++++++++-----
 1 file changed, 30 insertions(+), 5 deletions(-)

diff --git a/check-parallel b/check-parallel
index aa88c681e..5bb44b6a5 100755
--- a/check-parallel
+++ b/check-parallel
@@ -18,6 +18,14 @@ run_section=""
 
 tmp=/tmp/check-parallel.$$
 
+TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
+TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
+TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
+SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
+SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
+SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
+LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
+
 FSTYP=
 
 . ./common/exit
@@ -116,7 +124,7 @@ while [ $# -gt 0 ]; do
 	--exact-order) _tl_setup_ordered ;;
 	-n)	show_test_list="yes" ;;
 
-	-f)	is_supported_fstype $2 ; FSTYP=$2; shift ;;
+	-f)	is_supported_fstype $2 ; export FSTYP=$2; shift ;;
 
 	-s)	run_section="$run_section -s $2"; shift ;;
 
@@ -243,22 +251,35 @@ runner_go()
 	local id=$1
 	local me=$basedir/runner-$id
 	local _test=$me/test.img
+	local _test_rt=$me/test-rt.img
+	local _test_log=$me/test-log.img
 	local _scratch=$me/scratch.img
+	local _scratch_rt=$me/scratch-rt.img
+	local _scratch_log=$me/scratch-log.img
 	local _logwrites=$me/logwrites.img
 	local _results=$me/results-$2
 
 	mkdir -p $me
 
-	xfs_io -f -c 'truncate 2g' $_test
-	xfs_io -f -c 'truncate 8g' $_scratch
-	xfs_io -f -c 'truncate 1g' $_logwrites
+	xfs_io -f -c "truncate $TEST_DEV_SIZE" $_test
+	xfs_io -f -c "truncate $TEST_RTDEV_SIZE" $_test_rt
+	xfs_io -f -c "truncate $TEST_LOGDEV_SIZE" $_test_log
+	xfs_io -f -c "truncate $SCRATCH_DEV_SIZE" $_scratch
+	xfs_io -f -c "truncate $SCRATCH_RTDEV_SIZE" $_scratch_rt
+	xfs_io -f -c "truncate $SCRATCH_LOGDEV_SIZE" $_scratch_log
+	xfs_io -f -c "truncate $LOGWRITES_DEV_SIZE" $_logwrites
 
 	export TEST_DEV=$(_create_loop_device $_test)
+	export TEST_RTDEV=$(_create_loop_device $_test_rt)
+	export TEST_LOGDEV=$(_create_loop_device $_test_log)
 	export TEST_DIR=$me/test
+
 	export SCRATCH_DEV=$(_create_loop_device $_scratch)
+	export SCRATCH_RTDEV=$(_create_loop_device $_scratch_rt)
+	export SCRATCH_LOGDEV=$(_create_loop_device $_scratch_log)
 	export SCRATCH_MNT=$me/scratch
+
 	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
-	export FSTYP=xfs
 	export RESULT_BASE=$_results
 
 	mkdir -p $TEST_DIR
@@ -285,7 +306,11 @@ runner_go()
 	umount -R $TEST_DIR 2> /dev/null
 	umount -R $SCRATCH_MNT 2> /dev/null
 	_destroy_loop_device $TEST_DEV
+	_destroy_loop_device $TEST_RTDEV
+	_destroy_loop_device $TEST_LOGDEV
 	_destroy_loop_device $SCRATCH_DEV
+	_destroy_loop_device $SCRATCH_RTDEV
+	_destroy_loop_device $SCRATCH_LOGDEV
 	_destroy_loop_device $LOGWRITES_DEV
 
 	grep -q Failures: $me/log
-- 
2.45.2


* Re: [PATCH 11/28] check-parallel: initial support for specifying device sizes
  2025-04-17  3:00 ` [PATCH 11/28] check-parallel: initial support for specifying device sizes Dave Chinner
@ 2025-05-07 10:05   ` Nirjhar Roy (IBM)
  2025-05-21 11:11     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-07 10:05 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Rather than hard coding loop device sizes, abstract them out into
> environment variables with defined default values. This allows
> a check-parallel wrapper to specify sizes or, in future, for them
> to be read from a config file. Sizes are specified in human readable
> values using M/G/T suffixes to indicate the units being specified
> 
> Whilst doing this also add support for creating all the external
> devices that XFS uses during fstests execution. A typical setup
> will be something like:
> 
> TEST_DEV_SIZE=10G
> TEST_RTDEV_SIZE=10G
> TEST_LOGDEV_SIZE=128M
> SCRATCH_DEV_SIZE=20G
> SCRATCH_RTDEV_SIZE=20G
> SCRATCH_LOGDEV_SIZE=512M
> LOGWRITES_DEV_SIZE=2G
Should we document these new variables in README?

The functionality looks fine to me. I have some refactoring suggestions
below (nothing functional). 
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 35 ++++++++++++++++++++++++++++++-----
>  1 file changed, 30 insertions(+), 5 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index aa88c681e..5bb44b6a5 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -18,6 +18,14 @@ run_section=""
>  
>  tmp=/tmp/check-parallel.$$
>  
> +TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
> +TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
> +TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
> +SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
> +SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
> +SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
> +LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
> +
>  FSTYP=
>  
>  . ./common/exit
> @@ -116,7 +124,7 @@ while [ $# -gt 0 ]; do
>  	--exact-order) _tl_setup_ordered ;;
>  	-n)	show_test_list="yes" ;;
>  
> -	-f)	is_supported_fstype $2 ; FSTYP=$2; shift ;;
> +	-f)	is_supported_fstype $2 ; export FSTYP=$2; shift ;;
>  
>  	-s)	run_section="$run_section -s $2"; shift ;;
>  
> @@ -243,22 +251,35 @@ runner_go()
>  	local id=$1
>  	local me=$basedir/runner-$id
>  	local _test=$me/test.img
> +	local _test_rt=$me/test-rt.img
> +	local _test_log=$me/test-log.img
>  	local _scratch=$me/scratch.img
> +	local _scratch_rt=$me/scratch-rt.img
> +	local _scratch_log=$me/scratch-log.img
>  	local _logwrites=$me/logwrites.img
>  	local _results=$me/results-$2
>  
>  	mkdir -p $me
>  
> -	xfs_io -f -c 'truncate 2g' $_test
> -	xfs_io -f -c 'truncate 8g' $_scratch
> -	xfs_io -f -c 'truncate 1g' $_logwrites
> +	xfs_io -f -c "truncate $TEST_DEV_SIZE" $_test
> +	xfs_io -f -c "truncate $TEST_RTDEV_SIZE" $_test_rt
> +	xfs_io -f -c "truncate $TEST_LOGDEV_SIZE" $_test_log
> +	xfs_io -f -c "truncate $SCRATCH_DEV_SIZE" $_scratch
> +	xfs_io -f -c "truncate $SCRATCH_RTDEV_SIZE" $_scratch_rt
> +	xfs_io -f -c "truncate $SCRATCH_LOGDEV_SIZE" $_scratch_log
> +	xfs_io -f -c "truncate $LOGWRITES_DEV_SIZE" $_logwrites
Do you like the following refactoring:


declare -a devices_map=(
	"$TEST_DEV_SIZE:$_test:TEST_DEV"
	"$TEST_RTDEV_SIZE:$_test_rt:TEST_RTDEV"
	"$TEST_LOGDEV_SIZE:$_test_log:TEST_LOGDEV"
	"$SCRATCH_DEV_SIZE:$_scratch:SCRATCH_DEV"
	"$SCRATCH_RTDEV_SIZE:$_scratch_rt:SCRATCH_RTDEV"
	"$SCRATCH_LOGDEV_SIZE:$_scratch_log:SCRATCH_LOGDEV"
	"$LOGWRITES_DEV_SIZE:$_logwrites:LOGWRITES_DEV"
)
create_dev_images devices_map

create_dev_images() {
    # this executes commands like
    # xfs_io -f -c "truncate 10G" <test-file.img>
    # for all the devices
    local -n dev_list="$1"
    for entry in "${dev_list[@]}"; do
        IFS=':' read -r -a tuple <<< "$entry"
        # the image file is a separate argument, not part of the -c command
        xfs_io -f -c "truncate ${tuple[0]}" "${tuple[1]}"
    done
}

>  
>  	export TEST_DEV=$(_create_loop_device $_test)
> +	export TEST_RTDEV=$(_create_loop_device $_test_rt)
> +	export TEST_LOGDEV=$(_create_loop_device $_test_log)
>  	export TEST_DIR=$me/test
> +
>  	export SCRATCH_DEV=$(_create_loop_device $_scratch)
> +	export SCRATCH_RTDEV=$(_create_loop_device $_scratch_rt)
> +	export SCRATCH_LOGDEV=$(_create_loop_device $_scratch_log)
>  	export SCRATCH_MNT=$me/scratch
> +
>  	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
Something similar as above:

create_devices()
{
    # This exports all the device related environment variables
    # like TEST_DEV, SCRATCH_DEV etc.

    local -n dev_list="$1"
    for entry in "${dev_list[@]}"; do
        IFS=':' read -r -a tuple <<< "$entry"
        eval "export ${tuple[2]}=\$(_create_loop_device ${tuple[1]})"
    done
}
and then simply call
create_devices devices_map
export TEST_DIR=$me/test
export SCRATCH_MNT=$me/scratch


> -	export FSTYP=xfs
>  	export RESULT_BASE=$_results
>  
>  	mkdir -p $TEST_DIR
> @@ -285,7 +306,11 @@ runner_go()
>  	umount -R $TEST_DIR 2> /dev/null
>  	umount -R $SCRATCH_MNT 2> /dev/null
>  	_destroy_loop_device $TEST_DEV
> +	_destroy_loop_device $TEST_RTDEV
> +	_destroy_loop_device $TEST_LOGDEV
>  	_destroy_loop_device $SCRATCH_DEV
> +	_destroy_loop_device $SCRATCH_RTDEV
> +	_destroy_loop_device $SCRATCH_LOGDEV
>  	_destroy_loop_device $LOGWRITES_DEV
destroy_devices()
{
    # This destroys all the loop devices that were created.

    local -n dev_list="$1"
    for entry in "${dev_list[@]}"; do
        IFS=':' read -r -a tuple <<< "$entry"
        eval "_destroy_loop_device \$${tuple[2]}"
    done
}
and then call
destroy_devices devices_map. 
IMO, this reduces the number of lines of code and adding an entry to
devices_map automatically does the devices creation, removal and
setting of environment variables without having to manually add them.
This is just a suggestion - please let me know what you think about
this.
--NR
>  
>  	grep -q Failures: $me/log


* Re: [PATCH 11/28] check-parallel: initial support for specifying device sizes
  2025-05-07 10:05   ` Nirjhar Roy (IBM)
@ 2025-05-21 11:11     ` Dave Chinner
  0 siblings, 0 replies; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 11:11 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, May 07, 2025 at 03:35:29PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Rather than hard coding loop device sizes, abstract them out into
> > environment variables with defined default values. This allows
> > a check-parallel wrapper to specify sizes or, in future, for them
> > to be read from a config file. Sizes are specified in human readable
> > values using M/G/T suffixes to indicate the units being specified
> > 
> > Whilst doing this also add support for creating all the external
> > devices that XFS uses during fstests execution. A typical setup
> > will be something like:
> > 
> > TEST_DEV_SIZE=10G
> > TEST_RTDEV_SIZE=10G
> > TEST_LOGDEV_SIZE=128M
> > SCRATCH_DEV_SIZE=20G
> > SCRATCH_RTDEV_SIZE=20G
> > SCRATCH_LOGDEV_SIZE=512M
> > LOGWRITES_DEV_SIZE=2G
> Should we document these new variables in README?

Not at this point in time. This patch is just introducing
the code to create all the necessary devices for a runner, the sizes
of which need to be specified somewhere for it to work.
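
As a rough illustration of how the defaults interact with the
caller's environment, here is a minimal sketch of the ${VAR:=default}
idiom the patch uses. The 40G override is a made-up value standing in
for whatever a wrapper might set:

```shell
#!/bin/bash
# Sketch only: shows which value wins, nothing is created on disk.
unset TEST_DEV_SIZE		# nothing supplied by the caller
SCRATCH_DEV_SIZE=40G		# hypothetical value set by a wrapper

# Same form as the patch: keep the existing value if it is set and
# non-empty, otherwise assign the default.
TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}

echo "TEST_DEV_SIZE=$TEST_DEV_SIZE SCRATCH_DEV_SIZE=$SCRATCH_DEV_SIZE"
```

With that idiom, an invocation along the lines of
"TEST_DEV_SIZE=4G ./check-parallel -D /mnt -g quick" would be
expected to size the test image at 4G; that command line is an
illustration, not something taken from the patch.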

....
> > @@ -243,22 +251,35 @@ runner_go()
> >  	local id=$1
> >  	local me=$basedir/runner-$id
> >  	local _test=$me/test.img
> > +	local _test_rt=$me/test-rt.img
> > +	local _test_log=$me/test-log.img
> >  	local _scratch=$me/scratch.img
> > +	local _scratch_rt=$me/scratch-rt.img
> > +	local _scratch_log=$me/scratch-log.img
> >  	local _logwrites=$me/logwrites.img
> >  	local _results=$me/results-$2
> >  
> >  	mkdir -p $me
> >  
> > -	xfs_io -f -c 'truncate 2g' $_test
> > -	xfs_io -f -c 'truncate 8g' $_scratch
> > -	xfs_io -f -c 'truncate 1g' $_logwrites
> > +	xfs_io -f -c "truncate $TEST_DEV_SIZE" $_test
> > +	xfs_io -f -c "truncate $TEST_RTDEV_SIZE" $_test_rt
> > +	xfs_io -f -c "truncate $TEST_LOGDEV_SIZE" $_test_log
> > +	xfs_io -f -c "truncate $SCRATCH_DEV_SIZE" $_scratch
> > +	xfs_io -f -c "truncate $SCRATCH_RTDEV_SIZE" $_scratch_rt
> > +	xfs_io -f -c "truncate $SCRATCH_LOGDEV_SIZE" $_scratch_log
> > +	xfs_io -f -c "truncate $LOGWRITES_DEV_SIZE" $_logwrites
> Do you like the following refactoring:
> 
> 
> declare -a devices_map=(
> 	"$TEST_DEV_SIZE:$_test:TEST_DEV"
> 	"$TEST_RTDEV_SIZE:$_test_rt:TEST_RTDEV"
> 	"$TEST_LOGDEV_SIZE:$_test_log:TEST_LOGDEV"
> 	"$SCRATCH_DEV_SIZE:$_scratch:SCRATCH_DEV"
> 	"$SCRATCH_RTDEV_SIZE:$_scratch_rt:SCRATCH_RTDEV"
> 	"$SCRATCH_LOGDEV_SIZE:$_scratch_log:SCRATCH_LOGDEV"
> 	"$LOGWRITES_DEV_SIZE:$_logwrites:LOGWRITES_DEV"
> )
> create_dev_images devices_map
> 
> create_dev_images() {
>     # this executes commands like 
>     # xfs_io -f -c "truncate 10G  <test-file.img>"
>     # for all the devices
>     local  -n dev_list="$1"  
>     for entry in "${dev_list[@]}"; do
>         IFS=':' read -r -a tuple <<< "$entry"
>         xfs_io -f -c "truncate ${tuple[0]}  ${tuple[1]}"
>     done
> }

My brain hurts just looking at that. That's way too clever and
complex for a dumb engineer like myself - it took me several minutes
to sorta work out what it does and how it works.

However, I'm still not sure how it works, because the image file
variables have different values for every runner context. i.e. we'd
need a map per runner, not a single global map. I don't know if that
variable is evaluated at declaration time, when it is read into the
array, when the array variable is evaluated, how that interacts with
background execution, etc. I'm not sure I can work that out without
help of the bash man page and watching the output when 'set -x'
has been issued.

Compared to reading a bunch of truncate commands in a couple of
seconds and being certain the code is doing what it should be doing,
adding a device map like this is adding a lot of extra cognitive
load and analysis every time I look at this code.

IMO, there's a good reason for writing code that is dumb as a
hammer: everyone knows how to use a hammer. The KISS principle is
everyone's friend...
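
On the question above of when the per-runner image paths in such an
array are evaluated: as far as bash semantics go, double-quoted array
elements are expanded once, at the point of assignment. A minimal
sketch, assuming a made-up helper build_map and a made-up /tmp/runner-N
layout (neither is part of check-parallel), suggests the map would have
to be built inside each runner's context, after its locals are set:

```shell
#!/bin/bash
# Hypothetical demo, not check-parallel code: build_map and the
# /tmp/runner-N paths are invented for illustration.

build_map()
{
	local id=$1
	local me=/tmp/runner-$id	# differs per runner, like $me in runner_go()
	local size=10G

	# "$size:$me/test.img" is expanded here, when the array is assigned.
	local -a map=("$size:$me/test.img")

	# Later changes to the variables do not affect the stored string.
	me=/tmp/changed
	size=1G

	printf '%s\n' "${map[0]}"
}

out0=$(build_map 0)
out1=$(build_map 1)
echo "$out0"
echo "$out1"
```

Under these assumptions a single global declaration would capture
whatever $me held at that moment, so each runner would need its own
array. The open-coded truncate calls carry no such extra state.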

> > @@ -285,7 +306,11 @@ runner_go()
> >  	umount -R $TEST_DIR 2> /dev/null
> >  	umount -R $SCRATCH_MNT 2> /dev/null
> >  	_destroy_loop_device $TEST_DEV
> > +	_destroy_loop_device $TEST_RTDEV
> > +	_destroy_loop_device $TEST_LOGDEV
> >  	_destroy_loop_device $SCRATCH_DEV
> > +	_destroy_loop_device $SCRATCH_RTDEV
> > +	_destroy_loop_device $SCRATCH_LOGDEV
> >  	_destroy_loop_device $LOGWRITES_DEV
> destroy_devices()
> {
>     # This destroys all the loop devices that were created. 
> 
>     local -n dev_list="$1"
>     IFS=':' read -r -a tuple <<< "$entry"
>     eval "_destroy_loop_device \$${tuple[2]}"
> }
> and then call
> destroy_devices devices_map. 

Once everything has settled, I'll be replacing the loop destroy code
with a single call to 'losetup -D' from the check-parallel cleanup
trap.  This will detach all the image files from all the loop
devices in a single command, rather than doing them one at a time in
every runner context.

> IMO, this reduces the number of lines of code and adding an entry to
> devices_map automatically does the devices creation, removal and
> setting of environment variables without having to manually add them.
> This is just a suggestion - please let me know what do you think about
> this.

I can't tell if it works or not without spending a lot more time
trying to understand it. I've already spent way too much time trying
to understand it well enough to write a coherent response.

I think that about sums it up....

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH 12/28] config: move config section code to it's own file
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (10 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 11/28] check-parallel: initial support for specifying device sizes Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-09  6:09   ` Nirjhar Roy
  2025-04-17  3:00 ` [PATCH 13/28] check-parallel: introduce config file support Dave Chinner
                   ` (15 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Move the config section parsing, checking and setup code from
common/config to common/config-sections so that it can be included
directly in contexts where the rest of common/config is not needed.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 common/config          | 382 +---------------------------------------
 common/config-sections | 390 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 392 insertions(+), 380 deletions(-)
 create mode 100644 common/config-sections
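
For illustration only: a sketch of how a script that does not want the
rest of common/config might list the sections of a config file.
get_config_sections() is copied from the diff in this patch; the sample
file contents and the cfg variable are made up. In a real tree the
function would come from sourcing common/config-sections rather than
being pasted in.

```shell
#!/bin/bash
# get_config_sections() as it appears in this patch.
get_config_sections() {
	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
}

# Made-up sample config with two sections.
cfg=$(mktemp)
cat > "$cfg" << 'EOF'
[xfs_4k]
FSTYP=xfs
TEST_DEV=/dev/loop0

[ext4-quota]
FSTYP=ext4
EOF

sections=$(get_config_sections "$cfg")
rm -f "$cfg"

# Print one section name per line.
for s in $sections; do
	echo "section: $s"
done
```

Note that the pattern also accepts '-' in section names, although the
comment above the function only mentions alphanumeric characters and
'_'.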

diff --git a/common/config b/common/config
index 5081c300a..f90a66862 100644
--- a/common/config
+++ b/common/config
@@ -41,6 +41,7 @@
 
 . common/test_names
 . common/exit
+. common/config-sections
 
 # all tests should use a common language setting to prevent golden
 # output mismatches.
@@ -544,386 +545,7 @@ _source_specific_fs()
 	esac
 }
 
-known_hosts()
-{
-	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
-
-	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
-	[ -f $HOST_CONFIG_DIR/$HOST ]        && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
-	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
-}
-
-# Returns a list of sections in config file
-# Each section starts with the section name in the format
-# [section_name1]. Only alphanumeric characters and '_' is allowed
-# in the section name otherwise the section will not be resognised.
-# Section name must be contained between square brackets.
-get_config_sections() {
-	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
-}
-
-if [ ! -f "$HOST_OPTIONS" ]; then
-	known_hosts
-fi
-
-export HOST_OPTIONS_SECTIONS="-no-sections-"
-export OPTIONS_HAVE_SECTIONS=false
-if [ -f "$HOST_OPTIONS" ]; then
-	export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
-	if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
-		. $HOST_OPTIONS
-		export HOST_OPTIONS_SECTIONS="-no-sections-"
-	else
-		export OPTIONS_HAVE_SECTIONS=true
-	fi
-fi
-
-_check_device()
-{
-	local name=$1
-	local dev_needed=$2
-	local dev=$3
-
-	if [ -z "$dev" ]; then
-		if [ "$dev_needed" == "required" ]; then
-			_fatal "common/config: $name is required but not defined!"
-		fi
-		return 0
-	fi
-
-	if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
-		# block device or a network url
-		return 0
-	fi
-
-	case "$FSTYP" in
-	9p|fuse|tmpfs|virtiofs|afs)
-		# 9p, fuse, virtiofs and afs mount tags are just plain strings,
-		# so anything is allowed tmpfs doesn't use mount source, ignore
-		;;
-	ceph)
-		# ceph has two different possible syntaxes for mount devices. The
-		# network URL check above catches the legacy syntax. Check for the
-		# new-style syntax here.
-		if ( echo $dev | grep -qEv "=/" ); then
-			_fatal "common/config: $name ($dev) is not a valid ceph mount string"
-		fi
-		;;
-	overlay)
-		if [ ! -d "$dev" ]; then
-			_fatal "common/config: $name ($dev) is not a directory for overlay"
-		fi
-		;;
-	ubifs)
-		if [ ! -c "$dev" ]; then
-			_fatal "common/config: $name ($dev) is not a character device"
-		fi
-		;;
-	ceph-fuse)
-		;;
-	*)
-		_fatal "common/config: $name ($dev) is not a block device or a network filesystem"
-	esac
-}
-
-# check and return a canonical mount point path
-_canonicalize_mountpoint()
-{
-	local name=$1
-	local dir=$2
-
-	if [ -d "$dir" ]; then
-		# this follows symlinks and removes all trailing "/"s
-		readlink -e "$dir"
-		return 0
-	fi
-
-	if [ "$FSTYP" != "overlay" ] || [[ "$name" == OVL_BASE_* ]]; then
-		_fatal "common/config: $name ($dir) is not a directory"
-	fi
-
-	# base fs may not be mounted yet, so just check that parent dir
-	# exists (where base fs will be mounted) because we are going to
-	# mkdir the overlay mount point dir anyway
-	local base=`basename $dir`
-	local parent=`dirname $dir`
-	parent=`_canonicalize_mountpoint OVL_BASE_$name "$parent"`
-
-	# prepend the overlay mount point to canonical parent path
-	echo "$parent/$base"
-}
-
-# Enables usage of /dev/disk/by-id/ symlinks to persist target devices
-# over reboots
-_canonicalize_devices()
-{
-	if [ "$CANON_DEVS" != "yes" ]; then
-		return
-	fi
-	[ -L "$TEST_DEV" ]	&& TEST_DEV=$(readlink -e "$TEST_DEV")
-	[ -L "$SCRATCH_DEV" ]	&& SCRATCH_DEV=$(readlink -e "$SCRATCH_DEV")
-	[ -L "$TEST_LOGDEV" ]	&& TEST_LOGDEV=$(readlink -e "$TEST_LOGDEV")
-	[ -L "$TEST_RTDEV" ]	&& TEST_RTDEV=$(readlink -e "$TEST_RTDEV")
-	[ -L "$SCRATCH_RTDEV" ]	&& SCRATCH_RTDEV=$(readlink -e "$SCRATCH_RTDEV")
-	[ -L "$LOGWRITES_DEV" ]	&& LOGWRITES_DEV=$(readlink -e "$LOGWRITES_DEV")
-	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
-		local NEW_SCRATCH_POOL=""
-		for i in $SCRATCH_DEV_POOL; do
-			if [ -L $i ]; then
-				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $(readlink -e $i)"
-			else
-				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $i"
-			fi
-		done
-		SCRATCH_DEV_POOL="$NEW_SCRATCH_POOL"
-	fi
-}
-
-# On check -overlay, for the non multi section config case, this
-# function is called on every test, before init_rc().
-# When SCRATCH/TEST_* vars are defined in config file, config file
-# is sourced on every test and this function overrides the vars
-# every time.
-# When SCRATCH/TEST_* vars are defined in evironment and not
-# in config file, this function is called after vars have already
-# been overriden in the previous test.
-# In that case, TEST_DEV is a directory and not a blockdev/chardev and
-# the function will return without overriding the SCRATCH/TEST_* vars.
-_overlay_config_override()
-{
-	# There are 2 options for configuring overlayfs tests:
-	#
-	# 1. (legacy) SCRATCH/TEST_DEV point to existing directories
-	#    on an already mounted fs.  In this case, the new
-	#    OVL_BASE_SCRATCH/TEST_* vars are set to use the legacy
-	#    vars values (even though they may not be mount points).
-	#
-	[ ! -d "$TEST_DEV" ] || export OVL_BASE_TEST_DIR="$TEST_DEV"
-	[ ! -d "$SCRATCH_DEV" ] || export OVL_BASE_SCRATCH_MNT="$SCRATCH_DEV"
-
-	# Config file may specify base fs type, but we obay -overlay flag
-	[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
-	export FSTYP=overlay
-
-	# 2. SCRATCH/TEST_DEV point to the base fs partitions.  In this case,
-	#    the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
-	#    of the configured base fs and SCRATCH/TEST_DEV vars are set to the
-	#    overlayfs base and mount dirs inside base fs mount.
-	[ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
-
-	# Store original base fs vars
-	export OVL_BASE_TEST_DEV="$TEST_DEV"
-	export OVL_BASE_TEST_DIR="$TEST_DIR"
-	# If config does not set MOUNT_OPTIONS, its value may be
-	# leftover from previous _overlay_config_override, so
-	# don't use that value for base fs mount
-	[ "$MOUNT_OPTIONS" != "$OVERLAY_MOUNT_OPTIONS" ] || unset MOUNT_OPTIONS
-	export OVL_BASE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
-
-	# Set TEST vars to overlay base and mount dirs inside base fs
-	export TEST_DEV="$OVL_BASE_TEST_DIR"
-	export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
-	export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
-
-	[ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
-
-	# Store original base fs vars
-	export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
-	export OVL_BASE_SCRATCH_MNT="$SCRATCH_MNT"
-
-	# Set SCRATCH vars to overlay base and mount dirs inside base fs
-	export SCRATCH_DEV="$OVL_BASE_SCRATCH_MNT"
-	export SCRATCH_MNT="$OVL_BASE_SCRATCH_MNT/$OVL_MNT"
-
-	# Set fsck options, use default if user not set directly.
-	export FSCK_OPTIONS="$OVERLAY_FSCK_OPTIONS"
-	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
-	export IDMAPPED_MOUNTS="$IDMAPPED_MOUNTS"
-}
-
-_overlay_config_restore()
-{
-	export OVERLAY=true
-	[ -z "$OVL_BASE_FSTYP" ] || export FSTYP=$OVL_BASE_FSTYP
-	[ -z "$OVL_BASE_TEST_DEV" ] || export TEST_DEV=$OVL_BASE_TEST_DEV
-	[ -z "$OVL_BASE_TEST_DIR" ] || export TEST_DIR=$OVL_BASE_TEST_DIR
-	[ -z "$OVL_BASE_SCRATCH_DEV" ] || export SCRATCH_DEV=$OVL_BASE_SCRATCH_DEV
-	[ -z "$OVL_BASE_SCRATCH_MNT" ] || export SCRATCH_MNT=$OVL_BASE_SCRATCH_MNT
-	[ -z "$OVL_BASE_MOUNT_OPTIONS" ] || export MOUNT_OPTIONS=$OVL_BASE_MOUNT_OPTIONS
-}
-
-# Parse config section options. This function will parse all the configuration
-# within a single section which name is passed as an argument. For section
-# name format see comments in get_config_sections().
-# Empty lines and everything after '#' will be ignored.
-# Configuration options should be defined in the format
-#
-# CONFIG_OPTION=value
-#
-# This 'CONFIG_OPTION' variable and will be exported as an environment variable.
-parse_config_section() {
-	SECTION=$1
-	if ! $OPTIONS_HAVE_SECTIONS; then
-		return 0
-	fi
-	eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
-		-e 's/#.*$//' \
-		-e 's/[[:space:]]*$//' \
-		-e 's/^[[:space:]]*//' \
-		-e "s/^\([^=]*\)=\"\?'\?\([^\"']*\)\"\?'\?$/export \1=\"\2\"/" \
-		< $HOST_OPTIONS \
-		| sed -n -e "/^\[$SECTION\]/,/^\s*\[/{/^[^#].*\=.*/p;}"`
-}
-
-get_next_config() {
-	if [ ! -z "$CONFIG_INCLUDED" ] && ! $OPTIONS_HAVE_SECTIONS; then
-		return 0
-	fi
-
-	# We might have overriden FSTYP and TEST/SCRATCH vars with overlay values
-	# in the previous section, so restore them to original values stored in
-	# OVL_BASE_*.
-	# We need to do this *before* old FSTYP and MOUNT_OPTIONS are recorded
-	# and *before* SCRATCH_DEV and MOUNT_OPTIONS are unset
-	if [ "$FSTYP" == "overlay" ]; then
-		_overlay_config_restore
-	fi
-
-	local OLD_FSTYP=$FSTYP
-	local OLD_MOUNT_OPTIONS=$MOUNT_OPTIONS
-	local OLD_TEST_FS_MOUNT_OPTS=$TEST_FS_MOUNT_OPTS
-	local OLD_MKFS_OPTIONS=$MKFS_OPTIONS
-	local OLD_FSCK_OPTIONS=$FSCK_OPTIONS
-	local OLD_USE_EXTERNAL=$USE_EXTERNAL
-
-	unset MOUNT_OPTIONS
-	unset TEST_FS_MOUNT_OPTS
-	unset MKFS_OPTIONS
-	unset FSCK_OPTIONS
-	unset USE_EXTERNAL
-
-	# We might have deduced SCRATCH_DEV from the SCRATCH_DEV_POOL in the previous
-	# run, so we have to unset it now.
-	if [ "$SCRATCH_DEV_NOT_SET" == "true" ]; then
-		unset SCRATCH_DEV
-	fi
-
-	parse_config_section $1
-	if [ ! -z "$OLD_FSTYP" ] && [ $OLD_FSTYP != $FSTYP ]; then
-		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
-		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
-		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
-		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
-
-		# clear the external devices if we are not using them
-		if [ -z "$USE_EXTERNAL" ]; then
-			unset TEST_RTDEV
-			unset TEST_LOGDEV
-			unset SCRATCH_RTDEV
-			unset SCRATCH_LOGDEV
-		fi
-	else
-		[ -z "$MOUNT_OPTIONS" ] && export MOUNT_OPTIONS=$OLD_MOUNT_OPTIONS
-		[ -z "$TEST_FS_MOUNT_OPTS" ] && export TEST_FS_MOUNT_OPTS=$OLD_TEST_FS_MOUNT_OPTS
-		[ -z "$MKFS_OPTIONS" ] && export MKFS_OPTIONS=$OLD_MKFS_OPTIONS
-		[ -z "$FSCK_OPTIONS" ] && export FSCK_OPTIONS=$OLD_FSCK_OPTIONS
-		[ -z "$USE_EXTERNAL" ] && export USE_EXTERNAL=$OLD_USE_EXTERNAL
-	fi
-
-	# set default RESULT_BASE
-	if [ -z "$RESULT_BASE" ]; then
-		export RESULT_BASE="$here/results/"
-	fi
-
-	if [ "$FSTYP" == "tmpfs" ]; then
-		if [ -z "$TEST_DEV" ]; then
-			export TEST_DEV=tmpfs_test
-		fi
-		if [ -z "$SCRATCH_DEV" ]; then
-			export TEST_DEV=tmpfs_scratch
-		fi
-	fi
-
-	#  Mandatory Config values.
-	MC=""
-	[ -z "$EMAIL" ]          && MC="$MC EMAIL"
-	[ -z "$TEST_DIR" ]       && MC="$MC TEST_DIR"
-	[ -z "$TEST_DEV" ]       && MC="$MC TEST_DEV"
-
-	if [ -n "$MC" ]; then
-		echo "Warning: need to define parameters for host $HOST"
-		echo "       or set variables:"
-		echo "       $MC"
-		_exit 1
-	fi
-
-	_check_device TEST_DEV required $TEST_DEV
-	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
-
-	# a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
-	# to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
-	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
-		if [ ! -z "$SCRATCH_DEV" ]; then
-			echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
-			_exit 1
-		fi
-		SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
-		export SCRATCH_DEV
-		export SCRATCH_DEV_NOT_SET=true
-	fi
-
-	_check_device SCRATCH_DEV optional $SCRATCH_DEV
-	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
-
-	if [ -n "$USE_EXTERNAL" ]; then
-		_check_device TEST_RTDEV optional $TEST_RTDEV
-		_check_device TEST_LOGDEV optional $TEST_LOGDEV
-		_check_device SCRATCH_RTDEV optional $SCRATCH_RTDEV
-		_check_device SCRATCH_LOGDEV optional $SCRATCH_LOGDEV
-	fi
-
-	# Override FSTYP from config when running ./check -overlay
-	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs.
-	# We need to do this *after* default mount options are set by base FSTYP
-	# and *after* SCRATCH_DEV is deduced from SCRATCH_DEV_POOL
-	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
-		_overlay_config_override
-	fi
-}
-
-if [ -z "$CONFIG_INCLUDED" ]; then
-	get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
-	export CONFIG_INCLUDED=true
-
-	# Autodetect fs type based on what's on $TEST_DEV unless it's been set
-	# externally
-	if [ -z "$FSTYP" ] && [ ! -z "$TEST_DEV" ]; then
-		FSTYP=`blkid -c /dev/null -s TYPE -o value $TEST_DEV`
-	fi
-	FSTYP=${FSTYP:=xfs}
-	export FSTYP
-	[ -z "$MOUNT_OPTIONS" ] && _mount_opts
-	[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
-	[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
-	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
-else
-	# We get here for the non multi section case, on every test that sources
-	# common/rc after re-sourcing the HOST_OPTIONS config file.
-	# Because of this re-sourcing, we need to re-canonicalize the configured
-	# mount points and re-override TEST/SCRATCH_DEV overlay vars.
-
-	# canonicalize the mount points
-	# this follows symlinks and removes all trailing "/"s
-	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
-	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
-
-	# Override FSTYP from config when running ./check -overlay
-	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs
-	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
-		_overlay_config_override
-	fi
-fi
-
+_config_section_setup
 _canonicalize_devices
 # mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems. TEST_DIR
 # and QA_CHECK_FS are also checked by mkfs.xfs, but already exported elsewhere.
diff --git a/common/config-sections b/common/config-sections
new file mode 100644
index 000000000..69a03375a
--- /dev/null
+++ b/common/config-sections
@@ -0,0 +1,390 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2000-2003,2006 Silicon Graphics, Inc.  All Rights Reserved.
+# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
+#
+# Config section parsing and setup code
+
+_check_device()
+{
+	local name=$1
+	local dev_needed=$2
+	local dev=$3
+
+	if [ -z "$dev" ]; then
+		if [ "$dev_needed" == "required" ]; then
+			_fatal "common/config: $name is required but not defined!"
+		fi
+		return 0
+	fi
+
+	if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
+		# block device or a network url
+		return 0
+	fi
+
+	case "$FSTYP" in
+	9p|fuse|tmpfs|virtiofs|afs)
+		# 9p, fuse, virtiofs and afs mount tags are just plain strings,
+		# so anything is allowed tmpfs doesn't use mount source, ignore
+		;;
+	ceph)
+		# ceph has two different possible syntaxes for mount devices. The
+		# network URL check above catches the legacy syntax. Check for the
+		# new-style syntax here.
+		if ( echo $dev | grep -qEv "=/" ); then
+			_fatal "common/config: $name ($dev) is not a valid ceph mount string"
+		fi
+		;;
+	overlay)
+		if [ ! -d "$dev" ]; then
+			_fatal "common/config: $name ($dev) is not a directory for overlay"
+		fi
+		;;
+	ubifs)
+		if [ ! -c "$dev" ]; then
+			_fatal "common/config: $name ($dev) is not a character device"
+		fi
+		;;
+	ceph-fuse)
+		;;
+	*)
+		_fatal "common/config: $name ($dev) is not a block device or a network filesystem"
+	esac
+}
+
+# check and return a canonical mount point path
+_canonicalize_mountpoint()
+{
+	local name=$1
+	local dir=$2
+
+	if [ -d "$dir" ]; then
+		# this follows symlinks and removes all trailing "/"s
+		readlink -e "$dir"
+		return 0
+	fi
+
+	if [ "$FSTYP" != "overlay" ] || [[ "$name" == OVL_BASE_* ]]; then
+		_fatal "common/config: $name ($dir) is not a directory"
+	fi
+
+	# base fs may not be mounted yet, so just check that parent dir
+	# exists (where base fs will be mounted) because we are going to
+	# mkdir the overlay mount point dir anyway
+	local base=`basename $dir`
+	local parent=`dirname $dir`
+	parent=`_canonicalize_mountpoint OVL_BASE_$name "$parent"`
+
+	# prepend the overlay mount point to canonical parent path
+	echo "$parent/$base"
+}
+
+# Enables usage of /dev/disk/by-id/ symlinks to persist target devices
+# over reboots
+_canonicalize_devices()
+{
+	if [ "$CANON_DEVS" != "yes" ]; then
+		return
+	fi
+	[ -L "$TEST_DEV" ]	&& TEST_DEV=$(readlink -e "$TEST_DEV")
+	[ -L "$SCRATCH_DEV" ]	&& SCRATCH_DEV=$(readlink -e "$SCRATCH_DEV")
+	[ -L "$TEST_LOGDEV" ]	&& TEST_LOGDEV=$(readlink -e "$TEST_LOGDEV")
+	[ -L "$TEST_RTDEV" ]	&& TEST_RTDEV=$(readlink -e "$TEST_RTDEV")
+	[ -L "$SCRATCH_RTDEV" ]	&& SCRATCH_RTDEV=$(readlink -e "$SCRATCH_RTDEV")
+	[ -L "$LOGWRITES_DEV" ]	&& LOGWRITES_DEV=$(readlink -e "$LOGWRITES_DEV")
+	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
+		local NEW_SCRATCH_POOL=""
+		for i in $SCRATCH_DEV_POOL; do
+			if [ -L $i ]; then
+				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $(readlink -e $i)"
+			else
+				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $i"
+			fi
+		done
+		SCRATCH_DEV_POOL="$NEW_SCRATCH_POOL"
+	fi
+}
+
+# On check -overlay, for the non multi section config case, this
+# function is called on every test, before init_rc().
+# When SCRATCH/TEST_* vars are defined in config file, config file
+# is sourced on every test and this function overrides the vars
+# every time.
+# When SCRATCH/TEST_* vars are defined in evironment and not
+# in config file, this function is called after vars have already
+# been overriden in the previous test.
+# In that case, TEST_DEV is a directory and not a blockdev/chardev and
+# the function will return without overriding the SCRATCH/TEST_* vars.
+_overlay_config_override()
+{
+	# There are 2 options for configuring overlayfs tests:
+	#
+	# 1. (legacy) SCRATCH/TEST_DEV point to existing directories
+	#    on an already mounted fs.  In this case, the new
+	#    OVL_BASE_SCRATCH/TEST_* vars are set to use the legacy
+	#    vars values (even though they may not be mount points).
+	#
+	[ ! -d "$TEST_DEV" ] || export OVL_BASE_TEST_DIR="$TEST_DEV"
+	[ ! -d "$SCRATCH_DEV" ] || export OVL_BASE_SCRATCH_MNT="$SCRATCH_DEV"
+
+	# Config file may specify base fs type, but we obay -overlay flag
+	[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
+	export FSTYP=overlay
+
+	# 2. SCRATCH/TEST_DEV point to the base fs partitions.  In this case,
+	#    the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
+	#    of the configured base fs and SCRATCH/TEST_DEV vars are set to the
+	#    overlayfs base and mount dirs inside base fs mount.
+	[ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
+
+	# Store original base fs vars
+	export OVL_BASE_TEST_DEV="$TEST_DEV"
+	export OVL_BASE_TEST_DIR="$TEST_DIR"
+	# If config does not set MOUNT_OPTIONS, its value may be
+	# leftover from previous _overlay_config_override, so
+	# don't use that value for base fs mount
+	[ "$MOUNT_OPTIONS" != "$OVERLAY_MOUNT_OPTIONS" ] || unset MOUNT_OPTIONS
+	export OVL_BASE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
+
+	# Set TEST vars to overlay base and mount dirs inside base fs
+	export TEST_DEV="$OVL_BASE_TEST_DIR"
+	export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
+	export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
+
+	[ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
+
+	# Store original base fs vars
+	export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
+	export OVL_BASE_SCRATCH_MNT="$SCRATCH_MNT"
+
+	# Set SCRATCH vars to overlay base and mount dirs inside base fs
+	export SCRATCH_DEV="$OVL_BASE_SCRATCH_MNT"
+	export SCRATCH_MNT="$OVL_BASE_SCRATCH_MNT/$OVL_MNT"
+
+	# Set fsck options, use default if user not set directly.
+	export FSCK_OPTIONS="$OVERLAY_FSCK_OPTIONS"
+	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
+	export IDMAPPED_MOUNTS="$IDMAPPED_MOUNTS"
+}
+
+_overlay_config_restore()
+{
+	export OVERLAY=true
+	[ -z "$OVL_BASE_FSTYP" ] || export FSTYP=$OVL_BASE_FSTYP
+	[ -z "$OVL_BASE_TEST_DEV" ] || export TEST_DEV=$OVL_BASE_TEST_DEV
+	[ -z "$OVL_BASE_TEST_DIR" ] || export TEST_DIR=$OVL_BASE_TEST_DIR
+	[ -z "$OVL_BASE_SCRATCH_DEV" ] || export SCRATCH_DEV=$OVL_BASE_SCRATCH_DEV
+	[ -z "$OVL_BASE_SCRATCH_MNT" ] || export SCRATCH_MNT=$OVL_BASE_SCRATCH_MNT
+	[ -z "$OVL_BASE_MOUNT_OPTIONS" ] || export MOUNT_OPTIONS=$OVL_BASE_MOUNT_OPTIONS
+}
+
+# Returns a list of sections in config file
+# Each section starts with the section name in the format
+# [section_name1]. Only alphanumeric characters and '_' is allowed
+# in the section name otherwise the section will not be resognised.
+# Section name must be contained between square brackets.
+get_config_sections() {
+	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
+}
+
+# Parse config section options. This function will parse all the configuration
+# within a single section which name is passed as an argument. For section
+# name format see comments in get_config_sections().
+# Empty lines and everything after '#' will be ignored.
+# Configuration options should be defined in the format
+#
+# CONFIG_OPTION=value
+#
+# This 'CONFIG_OPTION' variable and will be exported as an environment variable.
+parse_config_section() {
+	SECTION=$1
+	if ! $OPTIONS_HAVE_SECTIONS; then
+		return 0
+	fi
+	eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
+		-e 's/#.*$//' \
+		-e 's/[[:space:]]*$//' \
+		-e 's/^[[:space:]]*//' \
+		-e "s/^\([^=]*\)=\"\?'\?\([^\"']*\)\"\?'\?$/export \1=\"\2\"/" \
+		< $HOST_OPTIONS \
+		| sed -n -e "/^\[$SECTION\]/,/^\s*\[/{/^[^#].*\=.*/p;}"`
+}
+
+get_next_config() {
+	if [ ! -z "$CONFIG_INCLUDED" ] && ! $OPTIONS_HAVE_SECTIONS; then
+		return 0
+	fi
+
+	# We might have overriden FSTYP and TEST/SCRATCH vars with overlay values
+	# in the previous section, so restore them to original values stored in
+	# OVL_BASE_*.
+	# We need to do this *before* old FSTYP and MOUNT_OPTIONS are recorded
+	# and *before* SCRATCH_DEV and MOUNT_OPTIONS are unset
+	if [ "$FSTYP" == "overlay" ]; then
+		_overlay_config_restore
+	fi
+
+	local OLD_FSTYP=$FSTYP
+	local OLD_MOUNT_OPTIONS=$MOUNT_OPTIONS
+	local OLD_TEST_FS_MOUNT_OPTS=$TEST_FS_MOUNT_OPTS
+	local OLD_MKFS_OPTIONS=$MKFS_OPTIONS
+	local OLD_FSCK_OPTIONS=$FSCK_OPTIONS
+	local OLD_USE_EXTERNAL=$USE_EXTERNAL
+
+	unset MOUNT_OPTIONS
+	unset TEST_FS_MOUNT_OPTS
+	unset MKFS_OPTIONS
+	unset FSCK_OPTIONS
+	unset USE_EXTERNAL
+
+	# We might have deduced SCRATCH_DEV from the SCRATCH_DEV_POOL in the previous
+	# run, so we have to unset it now.
+	if [ "$SCRATCH_DEV_NOT_SET" == "true" ]; then
+		unset SCRATCH_DEV
+	fi
+
+	parse_config_section $1
+	if [ ! -z "$OLD_FSTYP" ] && [ $OLD_FSTYP != $FSTYP ]; then
+		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
+		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
+		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
+		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
+
+		# clear the external devices if we are not using them
+		if [ -z "$USE_EXTERNAL" ]; then
+			unset TEST_RTDEV
+			unset TEST_LOGDEV
+			unset SCRATCH_RTDEV
+			unset SCRATCH_LOGDEV
+		fi
+	else
+		[ -z "$MOUNT_OPTIONS" ] && export MOUNT_OPTIONS=$OLD_MOUNT_OPTIONS
+		[ -z "$TEST_FS_MOUNT_OPTS" ] && export TEST_FS_MOUNT_OPTS=$OLD_TEST_FS_MOUNT_OPTS
+		[ -z "$MKFS_OPTIONS" ] && export MKFS_OPTIONS=$OLD_MKFS_OPTIONS
+		[ -z "$FSCK_OPTIONS" ] && export FSCK_OPTIONS=$OLD_FSCK_OPTIONS
+		[ -z "$USE_EXTERNAL" ] && export USE_EXTERNAL=$OLD_USE_EXTERNAL
+	fi
+
+	# set default RESULT_BASE
+	if [ -z "$RESULT_BASE" ]; then
+		export RESULT_BASE="$here/results/"
+	fi
+
+	if [ "$FSTYP" == "tmpfs" ]; then
+		if [ -z "$TEST_DEV" ]; then
+			export TEST_DEV=tmpfs_test
+		fi
+		if [ -z "$SCRATCH_DEV" ]; then
+			export TEST_DEV=tmpfs_scratch
+		fi
+	fi
+
+	#  Mandatory Config values.
+	MC=""
+	[ -z "$EMAIL" ]          && MC="$MC EMAIL"
+	[ -z "$TEST_DIR" ]       && MC="$MC TEST_DIR"
+	[ -z "$TEST_DEV" ]       && MC="$MC TEST_DEV"
+
+	if [ -n "$MC" ]; then
+		echo "Warning: need to define parameters for host $HOST"
+		echo "       or set variables:"
+		echo "       $MC"
+		_exit 1
+	fi
+
+	_check_device TEST_DEV required $TEST_DEV
+	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
+
+	# a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
+	# to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
+	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
+		if [ ! -z "$SCRATCH_DEV" ]; then
+			echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
+			_exit 1
+		fi
+		SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
+		export SCRATCH_DEV
+		export SCRATCH_DEV_NOT_SET=true
+	fi
+
+	_check_device SCRATCH_DEV optional $SCRATCH_DEV
+	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
+
+	if [ -n "$USE_EXTERNAL" ]; then
+		_check_device TEST_RTDEV optional $TEST_RTDEV
+		_check_device TEST_LOGDEV optional $TEST_LOGDEV
+		_check_device SCRATCH_RTDEV optional $SCRATCH_RTDEV
+		_check_device SCRATCH_LOGDEV optional $SCRATCH_LOGDEV
+	fi
+
+	# Override FSTYP from config when running ./check -overlay
+	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs.
+	# We need to do this *after* default mount options are set by base FSTYP
+	# and *after* SCRATCH_DEV is deduced from SCRATCH_DEV_POOL
+	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
+		_overlay_config_override
+	fi
+}
+
+known_hosts()
+{
+	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
+
+	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
+	[ -f $HOST_CONFIG_DIR/$HOST ]        && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
+	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
+}
+
+_config_section_setup()
+{
+	if [ ! -f "$HOST_OPTIONS" ]; then
+		known_hosts
+	fi
+
+	export HOST_OPTIONS_SECTIONS="-no-sections-"
+	export OPTIONS_HAVE_SECTIONS=false
+	if [ -f "$HOST_OPTIONS" ]; then
+		export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
+		if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
+			. $HOST_OPTIONS
+			export HOST_OPTIONS_SECTIONS="-no-sections-"
+		else
+			export OPTIONS_HAVE_SECTIONS=true
+		fi
+	fi
+
+	if [ -z "$CONFIG_INCLUDED" ]; then
+		get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
+		export CONFIG_INCLUDED=true
+
+		# Autodetect fs type based on what's on $TEST_DEV unless it's
+		# been set externally
+		if [ -z "$FSTYP" ] && [ ! -z "$TEST_DEV" ]; then
+			FSTYP=`blkid -c /dev/null -s TYPE -o value $TEST_DEV`
+		fi
+		FSTYP=${FSTYP:=xfs}
+		export FSTYP
+		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
+		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
+		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
+		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
+	else
+		# We get here for the non multi section case, on every test that
+		# sources common/rc after re-sourcing the HOST_OPTIONS config
+		# file.  Because of this re-sourcing, we need to re-canonicalize
+		# the configured mount points and re-override TEST/SCRATCH_DEV
+		# overlay vars.
+
+		# canonicalize the mount points
+		# this follows symlinks and removes all trailing "/"s
+		export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
+		export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
+
+		# Override FSTYP from config when running ./check -overlay and
+		# maybe override base fs TEST/SCRATCH_DEV with overlay base dirs
+		if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
+			_overlay_config_override
+		fi
+	fi
+}
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/28] config: move config section code to it's own file
  2025-04-17  3:00 ` [PATCH 12/28] config: move config section code to it's own file Dave Chinner
@ 2025-05-09  6:09   ` Nirjhar Roy
  2025-05-21 11:28     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy @ 2025-05-09  6:09 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang, ritesh.list

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Move the config section parsing, checking and setup code from
> common/config to common/config-sections so that it can be included
> directly in contexts where the rest of common/config is not needed.
This looks okay to me. Just a couple of nit comments below.

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

Ritesh - this patch addresses one of the feedback items[1] on one of my
cleanup patches.
[1] https://lore.kernel.org/all/87r028vamn.fsf@gmail.com/
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  common/config          | 382 +---------------------------------------
>  common/config-sections | 390 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 392 insertions(+), 380 deletions(-)
>  create mode 100644 common/config-sections
> 
> diff --git a/common/config b/common/config
> index 5081c300a..f90a66862 100644
> --- a/common/config
> +++ b/common/config
> @@ -41,6 +41,7 @@
>  
>  . common/test_names
>  . common/exit
> +. common/config-sections
>  
>  # all tests should use a common language setting to prevent golden
>  # output mismatches.
> @@ -544,386 +545,7 @@ _source_specific_fs()
>  	esac
>  }
>  
> -known_hosts()
> -{
> -	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
> -
> -	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
> -	[ -f $HOST_CONFIG_DIR/$HOST ]        && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
> -	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
> -}
> -
> -# Returns a list of sections in config file
> -# Each section starts with the section name in the format
> -# [section_name1]. Only alphanumeric characters and '_' is allowed
> -# in the section name otherwise the section will not be resognised.
> -# Section name must be contained between square brackets.
> -get_config_sections() {
> -	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
> -}
> -
> -if [ ! -f "$HOST_OPTIONS" ]; then
> -	known_hosts
> -fi
> -
> -export HOST_OPTIONS_SECTIONS="-no-sections-"
> -export OPTIONS_HAVE_SECTIONS=false
> -if [ -f "$HOST_OPTIONS" ]; then
> -	export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
> -	if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
> -		. $HOST_OPTIONS
> -		export HOST_OPTIONS_SECTIONS="-no-sections-"
> -	else
> -		export OPTIONS_HAVE_SECTIONS=true
> -	fi
> -fi
> -
> -_check_device()
> -{
> -	local name=$1
> -	local dev_needed=$2
> -	local dev=$3
> -
> -	if [ -z "$dev" ]; then
> -		if [ "$dev_needed" == "required" ]; then
> -			_fatal "common/config: $name is required but not defined!"
> -		fi
> -		return 0
> -	fi
> -
> -	if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
> -		# block device or a network url
> -		return 0
> -	fi
> -
> -	case "$FSTYP" in
> -	9p|fuse|tmpfs|virtiofs|afs)
> -		# 9p, fuse, virtiofs and afs mount tags are just plain strings,
> -		# so anything is allowed tmpfs doesn't use mount source, ignore
> -		;;
> -	ceph)
> -		# ceph has two different possible syntaxes for mount devices. The
> -		# network URL check above catches the legacy syntax. Check for the
> -		# new-style syntax here.
> -		if ( echo $dev | grep -qEv "=/" ); then
> -			_fatal "common/config: $name ($dev) is not a valid ceph mount string"
> -		fi
> -		;;
> -	overlay)
> -		if [ ! -d "$dev" ]; then
> -			_fatal "common/config: $name ($dev) is not a directory for overlay"
> -		fi
> -		;;
> -	ubifs)
> -		if [ ! -c "$dev" ]; then
> -			_fatal "common/config: $name ($dev) is not a character device"
> -		fi
> -		;;
> -	ceph-fuse)
> -		;;
> -	*)
> -		_fatal "common/config: $name ($dev) is not a block device or a network filesystem"
> -	esac
> -}
> -
> -# check and return a canonical mount point path
> -_canonicalize_mountpoint()
> -{
> -	local name=$1
> -	local dir=$2
> -
> -	if [ -d "$dir" ]; then
> -		# this follows symlinks and removes all trailing "/"s
> -		readlink -e "$dir"
> -		return 0
> -	fi
> -
> -	if [ "$FSTYP" != "overlay" ] || [[ "$name" == OVL_BASE_* ]]; then
> -		_fatal "common/config: $name ($dir) is not a directory"
> -	fi
> -
> -	# base fs may not be mounted yet, so just check that parent dir
> -	# exists (where base fs will be mounted) because we are going to
> -	# mkdir the overlay mount point dir anyway
> -	local base=`basename $dir`
> -	local parent=`dirname $dir`
> -	parent=`_canonicalize_mountpoint OVL_BASE_$name "$parent"`
> -
> -	# prepend the overlay mount point to canonical parent path
> -	echo "$parent/$base"
> -}
> -
> -# Enables usage of /dev/disk/by-id/ symlinks to persist target devices
> -# over reboots
> -_canonicalize_devices()
> -{
> -	if [ "$CANON_DEVS" != "yes" ]; then
> -		return
> -	fi
> -	[ -L "$TEST_DEV" ]	&& TEST_DEV=$(readlink -e "$TEST_DEV")
> -	[ -L "$SCRATCH_DEV" ]	&& SCRATCH_DEV=$(readlink -e "$SCRATCH_DEV")
> -	[ -L "$TEST_LOGDEV" ]	&& TEST_LOGDEV=$(readlink -e "$TEST_LOGDEV")
> -	[ -L "$TEST_RTDEV" ]	&& TEST_RTDEV=$(readlink -e "$TEST_RTDEV")
> -	[ -L "$SCRATCH_RTDEV" ]	&& SCRATCH_RTDEV=$(readlink -e "$SCRATCH_RTDEV")
> -	[ -L "$LOGWRITES_DEV" ]	&& LOGWRITES_DEV=$(readlink -e "$LOGWRITES_DEV")
> -	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
> -		local NEW_SCRATCH_POOL=""
> -		for i in $SCRATCH_DEV_POOL; do
> -			if [ -L $i ]; then
> -				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $(readlink -e $i)"
> -			else
> -				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $i"
> -			fi
> -		done
> -		SCRATCH_DEV_POOL="$NEW_SCRATCH_POOL"
> -	fi
> -}
> -
> -# On check -overlay, for the non multi section config case, this
> -# function is called on every test, before init_rc().
> -# When SCRATCH/TEST_* vars are defined in config file, config file
> -# is sourced on every test and this function overrides the vars
> -# every time.
> -# When SCRATCH/TEST_* vars are defined in evironment and not
> -# in config file, this function is called after vars have already
> -# been overriden in the previous test.
> -# In that case, TEST_DEV is a directory and not a blockdev/chardev and
> -# the function will return without overriding the SCRATCH/TEST_* vars.
> -_overlay_config_override()
> -{
> -	# There are 2 options for configuring overlayfs tests:
> -	#
> -	# 1. (legacy) SCRATCH/TEST_DEV point to existing directories
> -	#    on an already mounted fs.  In this case, the new
> -	#    OVL_BASE_SCRATCH/TEST_* vars are set to use the legacy
> -	#    vars values (even though they may not be mount points).
> -	#
> -	[ ! -d "$TEST_DEV" ] || export OVL_BASE_TEST_DIR="$TEST_DEV"
> -	[ ! -d "$SCRATCH_DEV" ] || export OVL_BASE_SCRATCH_MNT="$SCRATCH_DEV"
> -
> -	# Config file may specify base fs type, but we obay -overlay flag
> -	[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> -	export FSTYP=overlay
> -
> -	# 2. SCRATCH/TEST_DEV point to the base fs partitions.  In this case,
> -	#    the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
> -	#    of the configured base fs and SCRATCH/TEST_DEV vars are set to the
> -	#    overlayfs base and mount dirs inside base fs mount.
> -	[ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
> -
> -	# Store original base fs vars
> -	export OVL_BASE_TEST_DEV="$TEST_DEV"
> -	export OVL_BASE_TEST_DIR="$TEST_DIR"
> -	# If config does not set MOUNT_OPTIONS, its value may be
> -	# leftover from previous _overlay_config_override, so
> -	# don't use that value for base fs mount
> -	[ "$MOUNT_OPTIONS" != "$OVERLAY_MOUNT_OPTIONS" ] || unset MOUNT_OPTIONS
> -	export OVL_BASE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
> -
> -	# Set TEST vars to overlay base and mount dirs inside base fs
> -	export TEST_DEV="$OVL_BASE_TEST_DIR"
> -	export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> -	export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> -
> -	[ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
> -
> -	# Store original base fs vars
> -	export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> -	export OVL_BASE_SCRATCH_MNT="$SCRATCH_MNT"
> -
> -	# Set SCRATCH vars to overlay base and mount dirs inside base fs
> -	export SCRATCH_DEV="$OVL_BASE_SCRATCH_MNT"
> -	export SCRATCH_MNT="$OVL_BASE_SCRATCH_MNT/$OVL_MNT"
> -
> -	# Set fsck options, use default if user not set directly.
> -	export FSCK_OPTIONS="$OVERLAY_FSCK_OPTIONS"
> -	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> -	export IDMAPPED_MOUNTS="$IDMAPPED_MOUNTS"
> -}
> -
> -_overlay_config_restore()
> -{
> -	export OVERLAY=true
> -	[ -z "$OVL_BASE_FSTYP" ] || export FSTYP=$OVL_BASE_FSTYP
> -	[ -z "$OVL_BASE_TEST_DEV" ] || export TEST_DEV=$OVL_BASE_TEST_DEV
> -	[ -z "$OVL_BASE_TEST_DIR" ] || export TEST_DIR=$OVL_BASE_TEST_DIR
> -	[ -z "$OVL_BASE_SCRATCH_DEV" ] || export SCRATCH_DEV=$OVL_BASE_SCRATCH_DEV
> -	[ -z "$OVL_BASE_SCRATCH_MNT" ] || export SCRATCH_MNT=$OVL_BASE_SCRATCH_MNT
> -	[ -z "$OVL_BASE_MOUNT_OPTIONS" ] || export MOUNT_OPTIONS=$OVL_BASE_MOUNT_OPTIONS
> -}
> -
> -# Parse config section options. This function will parse all the configuration
> -# within a single section which name is passed as an argument. For section
> -# name format see comments in get_config_sections().
> -# Empty lines and everything after '#' will be ignored.
> -# Configuration options should be defined in the format
> -#
> -# CONFIG_OPTION=value
> -#
> -# This 'CONFIG_OPTION' variable and will be exported as an environment variable.
> -parse_config_section() {
> -	SECTION=$1
> -	if ! $OPTIONS_HAVE_SECTIONS; then
> -		return 0
> -	fi
> -	eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
> -		-e 's/#.*$//' \
> -		-e 's/[[:space:]]*$//' \
> -		-e 's/^[[:space:]]*//' \
> -		-e "s/^\([^=]*\)=\"\?'\?\([^\"']*\)\"\?'\?$/export \1=\"\2\"/" \
> -		< $HOST_OPTIONS \
> -		| sed -n -e "/^\[$SECTION\]/,/^\s*\[/{/^[^#].*\=.*/p;}"`
> -}
> -
> -get_next_config() {
> -	if [ ! -z "$CONFIG_INCLUDED" ] && ! $OPTIONS_HAVE_SECTIONS; then
> -		return 0
> -	fi
> -
> -	# We might have overriden FSTYP and TEST/SCRATCH vars with overlay values
> -	# in the previous section, so restore them to original values stored in
> -	# OVL_BASE_*.
> -	# We need to do this *before* old FSTYP and MOUNT_OPTIONS are recorded
> -	# and *before* SCRATCH_DEV and MOUNT_OPTIONS are unset
> -	if [ "$FSTYP" == "overlay" ]; then
> -		_overlay_config_restore
> -	fi
> -
> -	local OLD_FSTYP=$FSTYP
> -	local OLD_MOUNT_OPTIONS=$MOUNT_OPTIONS
> -	local OLD_TEST_FS_MOUNT_OPTS=$TEST_FS_MOUNT_OPTS
> -	local OLD_MKFS_OPTIONS=$MKFS_OPTIONS
> -	local OLD_FSCK_OPTIONS=$FSCK_OPTIONS
> -	local OLD_USE_EXTERNAL=$USE_EXTERNAL
> -
> -	unset MOUNT_OPTIONS
> -	unset TEST_FS_MOUNT_OPTS
> -	unset MKFS_OPTIONS
> -	unset FSCK_OPTIONS
> -	unset USE_EXTERNAL
> -
> -	# We might have deduced SCRATCH_DEV from the SCRATCH_DEV_POOL in the previous
> -	# run, so we have to unset it now.
> -	if [ "$SCRATCH_DEV_NOT_SET" == "true" ]; then
> -		unset SCRATCH_DEV
> -	fi
> -
> -	parse_config_section $1
> -	if [ ! -z "$OLD_FSTYP" ] && [ $OLD_FSTYP != $FSTYP ]; then
> -		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
> -		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
> -		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
> -		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> -
> -		# clear the external devices if we are not using them
> -		if [ -z "$USE_EXTERNAL" ]; then
> -			unset TEST_RTDEV
> -			unset TEST_LOGDEV
> -			unset SCRATCH_RTDEV
> -			unset SCRATCH_LOGDEV
> -		fi
> -	else
> -		[ -z "$MOUNT_OPTIONS" ] && export MOUNT_OPTIONS=$OLD_MOUNT_OPTIONS
> -		[ -z "$TEST_FS_MOUNT_OPTS" ] && export TEST_FS_MOUNT_OPTS=$OLD_TEST_FS_MOUNT_OPTS
> -		[ -z "$MKFS_OPTIONS" ] && export MKFS_OPTIONS=$OLD_MKFS_OPTIONS
> -		[ -z "$FSCK_OPTIONS" ] && export FSCK_OPTIONS=$OLD_FSCK_OPTIONS
> -		[ -z "$USE_EXTERNAL" ] && export USE_EXTERNAL=$OLD_USE_EXTERNAL
> -	fi
> -
> -	# set default RESULT_BASE
> -	if [ -z "$RESULT_BASE" ]; then
> -		export RESULT_BASE="$here/results/"
> -	fi
> -
> -	if [ "$FSTYP" == "tmpfs" ]; then
> -		if [ -z "$TEST_DEV" ]; then
> -			export TEST_DEV=tmpfs_test
> -		fi
> -		if [ -z "$SCRATCH_DEV" ]; then
> -			export TEST_DEV=tmpfs_scratch
> -		fi
> -	fi
> -
> -	#  Mandatory Config values.
> -	MC=""
> -	[ -z "$EMAIL" ]          && MC="$MC EMAIL"
> -	[ -z "$TEST_DIR" ]       && MC="$MC TEST_DIR"
> -	[ -z "$TEST_DEV" ]       && MC="$MC TEST_DEV"
> -
> -	if [ -n "$MC" ]; then
> -		echo "Warning: need to define parameters for host $HOST"
> -		echo "       or set variables:"
> -		echo "       $MC"
> -		_exit 1
> -	fi
> -
> -	_check_device TEST_DEV required $TEST_DEV
> -	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
> -
> -	# a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
> -	# to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
> -	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
> -		if [ ! -z "$SCRATCH_DEV" ]; then
> -			echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
> -			_exit 1
> -		fi
> -		SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
> -		export SCRATCH_DEV
> -		export SCRATCH_DEV_NOT_SET=true
> -	fi
> -
> -	_check_device SCRATCH_DEV optional $SCRATCH_DEV
> -	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
> -
> -	if [ -n "$USE_EXTERNAL" ]; then
> -		_check_device TEST_RTDEV optional $TEST_RTDEV
> -		_check_device TEST_LOGDEV optional $TEST_LOGDEV
> -		_check_device SCRATCH_RTDEV optional $SCRATCH_RTDEV
> -		_check_device SCRATCH_LOGDEV optional $SCRATCH_LOGDEV
> -	fi
> -
> -	# Override FSTYP from config when running ./check -overlay
> -	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs.
> -	# We need to do this *after* default mount options are set by base FSTYP
> -	# and *after* SCRATCH_DEV is deduced from SCRATCH_DEV_POOL
> -	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
> -		_overlay_config_override
> -	fi
> -}
> -
> -if [ -z "$CONFIG_INCLUDED" ]; then
> -	get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
> -	export CONFIG_INCLUDED=true
> -
> -	# Autodetect fs type based on what's on $TEST_DEV unless it's been set
> -	# externally
> -	if [ -z "$FSTYP" ] && [ ! -z "$TEST_DEV" ]; then
> -		FSTYP=`blkid -c /dev/null -s TYPE -o value $TEST_DEV`
> -	fi
> -	FSTYP=${FSTYP:=xfs}
> -	export FSTYP
> -	[ -z "$MOUNT_OPTIONS" ] && _mount_opts
> -	[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
> -	[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
> -	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> -else
> -	# We get here for the non multi section case, on every test that sources
> -	# common/rc after re-sourcing the HOST_OPTIONS config file.
> -	# Because of this re-sourcing, we need to re-canonicalize the configured
> -	# mount points and re-override TEST/SCRATCH_DEV overlay vars.
> -
> -	# canonicalize the mount points
> -	# this follows symlinks and removes all trailing "/"s
> -	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
> -	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
> -
> -	# Override FSTYP from config when running ./check -overlay
> -	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs
> -	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
> -		_overlay_config_override
> -	fi
> -fi
> -
> +_config_section_setup
>  _canonicalize_devices
>  # mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems. TEST_DIR
>  # and QA_CHECK_FS are also checked by mkfs.xfs, but already exported elsewhere.
> diff --git a/common/config-sections b/common/config-sections
> new file mode 100644
> index 000000000..69a03375a
> --- /dev/null
> +++ b/common/config-sections
> @@ -0,0 +1,390 @@
> +##/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2000-2003,2006 Silicon Graphics, Inc.  All Rights Reserved.
> +# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
> +#
> +# Config section parsing and setup code
Since we want to make it possible to source this file independently,
should we at least mention in the comments which other files must be
sourced before this one? For example, this file uses _exit(), so does
using this file require common/exit to be sourced first?
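
One way to make that dependency explicit, as a rough sketch only: the
function name _config_sections_check_deps is hypothetical and not part
of the patch. It assumes the helpers this file calls (_exit and _fatal
both appear in the code below) are provided by whatever the caller
sourced earlier, and reports any that are missing.

```shell
# Hypothetical guard, not in the patch: verify that helpers this file
# calls have already been defined by files the caller sourced earlier.
_config_sections_check_deps()
{
	local fn
	local missing=""

	for fn in _exit _fatal; do
		# command -v succeeds if $fn resolves to a function or command
		command -v "$fn" > /dev/null 2>&1 || missing="$missing $fn"
	done

	if [ -n "$missing" ]; then
		echo "common/config-sections: missing helpers:$missing" >&2
		return 1
	fi
	return 0
}
```

A plain comment at the top of the file listing the required files would
serve the same purpose with no runtime cost.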
> +
> +_check_device()
> +{
> +	local name=$1
> +	local dev_needed=$2
> +	local dev=$3
> +
> +	if [ -z "$dev" ]; then
> +		if [ "$dev_needed" == "required" ]; then
> +			_fatal "common/config: $name is required but not defined!"
> +		fi
> +		return 0
> +	fi
> +
> +	if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
> +		# block device or a network url
> +		return 0
> +	fi
> +
> +	case "$FSTYP" in
> +	9p|fuse|tmpfs|virtiofs|afs)
> +		# 9p, fuse, virtiofs and afs mount tags are just plain strings,
> +		# so anything is allowed tmpfs doesn't use mount source, ignore
> +		;;
> +	ceph)
> +		# ceph has two different possible syntaxes for mount devices. The
> +		# network URL check above catches the legacy syntax. Check for the
> +		# new-style syntax here.
> +		if ( echo $dev | grep -qEv "=/" ); then
> +			_fatal "common/config: $name ($dev) is not a valid ceph mount string"
> +		fi
> +		;;
> +	overlay)
> +		if [ ! -d "$dev" ]; then
> +			_fatal "common/config: $name ($dev) is not a directory for overlay"
> +		fi
> +		;;
> +	ubifs)
> +		if [ ! -c "$dev" ]; then
> +			_fatal "common/config: $name ($dev) is not a character device"
> +		fi
> +		;;
> +	ceph-fuse)
> +		;;
> +	*)
> +		_fatal "common/config: $name ($dev) is not a block device or a network filesystem"
Nit: exceeds the 80-character limit.
> +	esac
> +}
> +
> +# check and return a canonical mount point path
> +_canonicalize_mountpoint()
> +{
> +	local name=$1
> +	local dir=$2
> +
> +	if [ -d "$dir" ]; then
> +		# this follows symlinks and removes all trailing "/"s
> +		readlink -e "$dir"
> +		return 0
> +	fi
> +
> +	if [ "$FSTYP" != "overlay" ] || [[ "$name" == OVL_BASE_* ]]; then
> +		_fatal "common/config: $name ($dir) is not a directory"
> +	fi
> +
> +	# base fs may not be mounted yet, so just check that parent dir
> +	# exists (where base fs will be mounted) because we are going to
> +	# mkdir the overlay mount point dir anyway
> +	local base=`basename $dir`
> +	local parent=`dirname $dir`
> +	parent=`_canonicalize_mountpoint OVL_BASE_$name "$parent"`
> +
> +	# prepend the overlay mount point to canonical parent path
> +	echo "$parent/$base"
> +}
> +
> +# Enables usage of /dev/disk/by-id/ symlinks to persist target devices
> +# over reboots
> +_canonicalize_devices()
> +{
> +	if [ "$CANON_DEVS" != "yes" ]; then
> +		return
> +	fi
> +	[ -L "$TEST_DEV" ]	&& TEST_DEV=$(readlink -e "$TEST_DEV")
> +	[ -L "$SCRATCH_DEV" ]	&& SCRATCH_DEV=$(readlink -e "$SCRATCH_DEV")
> +	[ -L "$TEST_LOGDEV" ]	&& TEST_LOGDEV=$(readlink -e "$TEST_LOGDEV")
> +	[ -L "$TEST_RTDEV" ]	&& TEST_RTDEV=$(readlink -e "$TEST_RTDEV")
> +	[ -L "$SCRATCH_RTDEV" ]	&& SCRATCH_RTDEV=$(readlink -e "$SCRATCH_RTDEV")
> +	[ -L "$LOGWRITES_DEV" ]	&& LOGWRITES_DEV=$(readlink -e "$LOGWRITES_DEV")
> +	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
> +		local NEW_SCRATCH_POOL=""
> +		for i in $SCRATCH_DEV_POOL; do
> +			if [ -L $i ]; then
> +				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $(readlink -e $i)"
> +			else
> +				NEW_SCRATCH_POOL="$NEW_SCRATCH_POOL $i"
> +			fi
> +		done
> +		SCRATCH_DEV_POOL="$NEW_SCRATCH_POOL"
> +	fi
> +}
> +
> +# On check -overlay, for the non multi section config case, this
> +# function is called on every test, before init_rc().
> +# When SCRATCH/TEST_* vars are defined in config file, config file
> +# is sourced on every test and this function overrides the vars
> +# every time.
> +# When SCRATCH/TEST_* vars are defined in evironment and not
> +# in config file, this function is called after vars have already
> +# been overriden in the previous test.
> +# In that case, TEST_DEV is a directory and not a blockdev/chardev and
> +# the function will return without overriding the SCRATCH/TEST_* vars.
> +_overlay_config_override()
> +{
> +	# There are 2 options for configuring overlayfs tests:
> +	#
> +	# 1. (legacy) SCRATCH/TEST_DEV point to existing directories
> +	#    on an already mounted fs.  In this case, the new
> +	#    OVL_BASE_SCRATCH/TEST_* vars are set to use the legacy
> +	#    vars values (even though they may not be mount points).
> +	#
> +	[ ! -d "$TEST_DEV" ] || export OVL_BASE_TEST_DIR="$TEST_DEV"
> +	[ ! -d "$SCRATCH_DEV" ] || export OVL_BASE_SCRATCH_MNT="$SCRATCH_DEV"
> +
> +	# Config file may specify base fs type, but we obay -overlay flag
> +	[ "$FSTYP" == overlay ] || export OVL_BASE_FSTYP="$FSTYP"
> +	export FSTYP=overlay
> +
> +	# 2. SCRATCH/TEST_DEV point to the base fs partitions.  In this case,
> +	#    the new OVL_BASE_SCRATCH/TEST_DEV/MNT vars are set to the values
> +	#    of the configured base fs and SCRATCH/TEST_DEV vars are set to the
> +	#    overlayfs base and mount dirs inside base fs mount.
> +	[ -b "$TEST_DEV" ] || [ -c "$TEST_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
Nit: exceeds the 80-character limit.
> +
> +	# Store original base fs vars
> +	export OVL_BASE_TEST_DEV="$TEST_DEV"
> +	export OVL_BASE_TEST_DIR="$TEST_DIR"
> +	# If config does not set MOUNT_OPTIONS, its value may be
> +	# leftover from previous _overlay_config_override, so
> +	# don't use that value for base fs mount
> +	[ "$MOUNT_OPTIONS" != "$OVERLAY_MOUNT_OPTIONS" ] || unset MOUNT_OPTIONS
> +	export OVL_BASE_MOUNT_OPTIONS="$MOUNT_OPTIONS"
> +
> +	# Set TEST vars to overlay base and mount dirs inside base fs
> +	export TEST_DEV="$OVL_BASE_TEST_DIR"
> +	export TEST_DIR="$OVL_BASE_TEST_DIR/$OVL_MNT"
> +	export MOUNT_OPTIONS="$OVERLAY_MOUNT_OPTIONS"
> +
> +	[ -b "$SCRATCH_DEV" ] || [ -c "$SCRATCH_DEV" ] || [ "$OVL_BASE_FSTYP" == tmpfs ] || return 0
Nit: exceeds the 80-character limit.
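
Both long checks in this function have the same shape, so one possible
way to shorten them is a small helper. This is only a sketch: the name
_ovl_base_dev_usable is hypothetical and does not exist in fstests.

```shell
# Hypothetical helper, not part of the patch: succeeds if $1 is a block
# or character device, or if the overlay base fs type is tmpfs.
_ovl_base_dev_usable()
{
	[ -b "$1" ] || [ -c "$1" ] || [ "$OVL_BASE_FSTYP" = tmpfs ]
}
```

The two call sites would then read
'_ovl_base_dev_usable "$TEST_DEV" || return 0' and
'_ovl_base_dev_usable "$SCRATCH_DEV" || return 0'. A backslash
continuation after the second '||' would also work without adding a
helper.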
> +
> +	# Store original base fs vars
> +	export OVL_BASE_SCRATCH_DEV="$SCRATCH_DEV"
> +	export OVL_BASE_SCRATCH_MNT="$SCRATCH_MNT"
> +
> +	# Set SCRATCH vars to overlay base and mount dirs inside base fs
> +	export SCRATCH_DEV="$OVL_BASE_SCRATCH_MNT"
> +	export SCRATCH_MNT="$OVL_BASE_SCRATCH_MNT/$OVL_MNT"
> +
> +	# Set fsck options, use default if user not set directly.
> +	export FSCK_OPTIONS="$OVERLAY_FSCK_OPTIONS"
> +	[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> +	export IDMAPPED_MOUNTS="$IDMAPPED_MOUNTS"
> +}
> +
> +_overlay_config_restore()
> +{
> +	export OVERLAY=true
> +	[ -z "$OVL_BASE_FSTYP" ] || export FSTYP=$OVL_BASE_FSTYP
> +	[ -z "$OVL_BASE_TEST_DEV" ] || export TEST_DEV=$OVL_BASE_TEST_DEV
> +	[ -z "$OVL_BASE_TEST_DIR" ] || export TEST_DIR=$OVL_BASE_TEST_DIR
> +	[ -z "$OVL_BASE_SCRATCH_DEV" ] || export SCRATCH_DEV=$OVL_BASE_SCRATCH_DEV
> +	[ -z "$OVL_BASE_SCRATCH_MNT" ] || export SCRATCH_MNT=$OVL_BASE_SCRATCH_MNT
> +	[ -z "$OVL_BASE_MOUNT_OPTIONS" ] || export MOUNT_OPTIONS=$OVL_BASE_MOUNT_OPTIONS
> +}
> +
> +# Returns a list of sections in config file
> +# Each section starts with the section name in the format
> +# [section_name1]. Only alphanumeric characters and '_' is allowed
> +# in the section name otherwise the section will not be resognised.
> +# Section name must be contained between square brackets.
> +get_config_sections() {
> +	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
> +}
> +
> +# Parse config section options. This function will parse all the configuration
> +# within a single section which name is passed as an argument. For section
> +# name format see comments in get_config_sections().
> +# Empty lines and everything after '#' will be ignored.
> +# Configuration options should be defined in the format
> +#
> +# CONFIG_OPTION=value
> +#
> +# This 'CONFIG_OPTION' variable and will be exported as an environment variable.
> +parse_config_section() {
> +	SECTION=$1
> +	if ! $OPTIONS_HAVE_SECTIONS; then
> +		return 0
> +	fi
> +	eval `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
> +		-e 's/#.*$//' \
> +		-e 's/[[:space:]]*$//' \
> +		-e 's/^[[:space:]]*//' \
> +		-e "s/^\([^=]*\)=\"\?'\?\([^\"']*\)\"\?'\?$/export \1=\"\2\"/" \
> +		< $HOST_OPTIONS \
> +		| sed -n -e "/^\[$SECTION\]/,/^\s*\[/{/^[^#].*\=.*/p;}"`
> +}
> +
> +get_next_config() {
> +	if [ ! -z "$CONFIG_INCLUDED" ] && ! $OPTIONS_HAVE_SECTIONS; then
> +		return 0
> +	fi
> +
> +	# We might have overriden FSTYP and TEST/SCRATCH vars with overlay values
> +	# in the previous section, so restore them to original values stored in
> +	# OVL_BASE_*.
> +	# We need to do this *before* old FSTYP and MOUNT_OPTIONS are recorded
> +	# and *before* SCRATCH_DEV and MOUNT_OPTIONS are unset
> +	if [ "$FSTYP" == "overlay" ]; then
> +		_overlay_config_restore
> +	fi
> +
> +	local OLD_FSTYP=$FSTYP
> +	local OLD_MOUNT_OPTIONS=$MOUNT_OPTIONS
> +	local OLD_TEST_FS_MOUNT_OPTS=$TEST_FS_MOUNT_OPTS
> +	local OLD_MKFS_OPTIONS=$MKFS_OPTIONS
> +	local OLD_FSCK_OPTIONS=$FSCK_OPTIONS
> +	local OLD_USE_EXTERNAL=$USE_EXTERNAL
> +
> +	unset MOUNT_OPTIONS
> +	unset TEST_FS_MOUNT_OPTS
> +	unset MKFS_OPTIONS
> +	unset FSCK_OPTIONS
> +	unset USE_EXTERNAL
> +
> +	# We might have deduced SCRATCH_DEV from the SCRATCH_DEV_POOL in the previous
> +	# run, so we have to unset it now.
> +	if [ "$SCRATCH_DEV_NOT_SET" == "true" ]; then
> +		unset SCRATCH_DEV
> +	fi
> +
> +	parse_config_section $1
> +	if [ ! -z "$OLD_FSTYP" ] && [ $OLD_FSTYP != $FSTYP ]; then
> +		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
> +		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
> +		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
> +		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> +
> +		# clear the external devices if we are not using them
> +		if [ -z "$USE_EXTERNAL" ]; then
> +			unset TEST_RTDEV
> +			unset TEST_LOGDEV
> +			unset SCRATCH_RTDEV
> +			unset SCRATCH_LOGDEV
> +		fi
> +	else
> +		[ -z "$MOUNT_OPTIONS" ] && export MOUNT_OPTIONS=$OLD_MOUNT_OPTIONS
> +		[ -z "$TEST_FS_MOUNT_OPTS" ] && export TEST_FS_MOUNT_OPTS=$OLD_TEST_FS_MOUNT_OPTS
> +		[ -z "$MKFS_OPTIONS" ] && export MKFS_OPTIONS=$OLD_MKFS_OPTIONS
> +		[ -z "$FSCK_OPTIONS" ] && export FSCK_OPTIONS=$OLD_FSCK_OPTIONS
> +		[ -z "$USE_EXTERNAL" ] && export USE_EXTERNAL=$OLD_USE_EXTERNAL
> +	fi
> +
> +	# set default RESULT_BASE
> +	if [ -z "$RESULT_BASE" ]; then
> +		export RESULT_BASE="$here/results/"
> +	fi
> +
> +	if [ "$FSTYP" == "tmpfs" ]; then
> +		if [ -z "$TEST_DEV" ]; then
> +			export TEST_DEV=tmpfs_test
> +		fi
> +		if [ -z "$SCRATCH_DEV" ]; then
> +			export TEST_DEV=tmpfs_scratch
> +		fi
> +	fi
> +
> +	#  Mandatory Config values.
> +	MC=""
> +	[ -z "$EMAIL" ]          && MC="$MC EMAIL"
> +	[ -z "$TEST_DIR" ]       && MC="$MC TEST_DIR"
> +	[ -z "$TEST_DEV" ]       && MC="$MC TEST_DEV"
> +
> +	if [ -n "$MC" ]; then
> +		echo "Warning: need to define parameters for host $HOST"
> +		echo "       or set variables:"
> +		echo "       $MC"
> +		_exit 1
> +	fi
> +
> +	_check_device TEST_DEV required $TEST_DEV
> +	export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
> +
> +	# a btrfs tester will set only SCRATCH_DEV_POOL, we will put first of its dev
> +	# to SCRATCH_DEV and rest to SCRATCH_DEV_POOL to maintain the backward compatibility
> +	if [ ! -z "$SCRATCH_DEV_POOL" ]; then
> +		if [ ! -z "$SCRATCH_DEV" ]; then
> +			echo "common/config: Error: \$SCRATCH_DEV ($SCRATCH_DEV) should be unset when \$SCRATCH_DEV_POOL ($SCRATCH_DEV_POOL) is set"
> +			_exit 1
> +		fi
> +		SCRATCH_DEV=`echo $SCRATCH_DEV_POOL | awk '{print $1}'`
> +		export SCRATCH_DEV
> +		export SCRATCH_DEV_NOT_SET=true
> +	fi
> +
> +	_check_device SCRATCH_DEV optional $SCRATCH_DEV
> +	export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
> +
> +	if [ -n "$USE_EXTERNAL" ]; then
> +		_check_device TEST_RTDEV optional $TEST_RTDEV
> +		_check_device TEST_LOGDEV optional $TEST_LOGDEV
> +		_check_device SCRATCH_RTDEV optional $SCRATCH_RTDEV
> +		_check_device SCRATCH_LOGDEV optional $SCRATCH_LOGDEV
> +	fi
> +
> +	# Override FSTYP from config when running ./check -overlay
> +	# and maybe override base fs TEST/SCRATCH_DEV with overlay base dirs.
> +	# We need to do this *after* default mount options are set by base FSTYP
> +	# and *after* SCRATCH_DEV is deduced from SCRATCH_DEV_POOL
> +	if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
> +		_overlay_config_override
> +	fi
> +}
> +
> +known_hosts()
> +{
> +	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
> +
> +	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
> +	[ -f $HOST_CONFIG_DIR/$HOST ]        && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
> +	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
Nit: All 3 statements above exceed the 80-character limit.
--NR
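
As an illustration of how these could fit in 80 columns, here is a
sketch that loops over the candidate paths in the same order, so a
later match still overrides an earlier one. It is not taken from the
patch; the loop variable f and the added quoting are my own changes.

```shell
# Sketch only: same candidates and precedence as the patch version.
known_hosts()
{
	local f

	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs

	for f in /etc/xfsqa.config \
		 "$HOST_CONFIG_DIR/$HOST" \
		 "$HOST_CONFIG_DIR/$HOST.config"; do
		# the last existing file wins, as in the original
		[ -f "$f" ] && export HOST_OPTIONS="$f"
	done
}
```

As with the original, the function's return status is non-zero when the
last candidate does not exist, so callers should not rely on it.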
> +}
> +
> +_config_section_setup()
> +{
> +	if [ ! -f "$HOST_OPTIONS" ]; then
> +		known_hosts
> +	fi
> +
> +	export HOST_OPTIONS_SECTIONS="-no-sections-"
> +	export OPTIONS_HAVE_SECTIONS=false
> +	if [ -f "$HOST_OPTIONS" ]; then
> +		export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
> +		if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
> +			. $HOST_OPTIONS
> +			export HOST_OPTIONS_SECTIONS="-no-sections-"
> +		else
> +			export OPTIONS_HAVE_SECTIONS=true
> +		fi
> +	fi
> +
> +	if [ -z "$CONFIG_INCLUDED" ]; then
> +		get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
> +		export CONFIG_INCLUDED=true
> +
> +		# Autodetect fs type based on what's on $TEST_DEV unless it's
> +		# been set externally
> +		if [ -z "$FSTYP" ] && [ ! -z "$TEST_DEV" ]; then
> +			FSTYP=`blkid -c /dev/null -s TYPE -o value $TEST_DEV`
> +		fi
> +		FSTYP=${FSTYP:=xfs}
> +		export FSTYP
> +		[ -z "$MOUNT_OPTIONS" ] && _mount_opts
> +		[ -z "$TEST_FS_MOUNT_OPTS" ] && _test_mount_opts
> +		[ -z "$MKFS_OPTIONS" ] && _mkfs_opts
> +		[ -z "$FSCK_OPTIONS" ] && _fsck_opts
> +	else
> +		# We get here for the non multi section case, on every test that
> +		# sources common/rc after re-sourcing the HOST_OPTIONS config
> +		# file.  Because of this re-sourcing, we need to re-canonicalize
> +		# the configured mount points and re-override TEST/SCRATCH_DEV
> +		# overlay vars.
> +
> +		# canonicalize the mount points
> +		# this follows symlinks and removes all trailing "/"s
> +		export TEST_DIR=`_canonicalize_mountpoint TEST_DIR $TEST_DIR`
> +		export SCRATCH_MNT=`_canonicalize_mountpoint SCRATCH_MNT $SCRATCH_MNT`
> +
> +		# Override FSTYP from config when running ./check -overlay and
> +		# maybe override base fs TEST/SCRATCH_DEV with overlay base dirs
> +		if [ "$OVERLAY" == "true" -o "$FSTYP" == "overlay" ]; then
> +			_overlay_config_override
> +		fi
> +	fi
> +}
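
For anyone checking the moved code in isolation, the section detection
that _config_section_setup() depends on can be exercised on its own.
The sketch below copies get_config_sections() from the patch; the
sample config file, its section names and its values are made up for
illustration. Note that the comment above the function mentions only
alphanumerics and '_', while the expression itself also accepts '-'.

```shell
# Copied from the patch: print the name of every [section] header in $1.
get_config_sections() {
	sed -n -e "s/^\[\([[:alnum:]_-]*\)\]/\1/p" < $1
}

# Hypothetical sample config; section names and values are made up.
cfg=$(mktemp)
cat > "$cfg" << 'EOF'
[xfs_4k]
FSTYP=xfs
MKFS_OPTIONS="-b size=4k"

[ext4-1k]
FSTYP=ext4
EOF

# Prints one section name per line: xfs_4k, then ext4-1k.
sections=$(get_config_sections "$cfg")
echo "$sections"
rm -f "$cfg"
```

A file with no section headers produces empty output, which is the case
where _config_section_setup() sources $HOST_OPTIONS directly and falls
back to "-no-sections-".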


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/28] config: move config section code to it's own file
  2025-05-09  6:09   ` Nirjhar Roy
@ 2025-05-21 11:28     ` Dave Chinner
  0 siblings, 0 replies; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 11:28 UTC (permalink / raw)
  To: Nirjhar Roy; +Cc: fstests, zlang, ritesh.list

On Fri, May 09, 2025 at 11:39:30AM +0530, Nirjhar Roy wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Move the config section parsing, checking and setup code from
> > common/config to common/config-sections so that it can be included
> > directly in contexts where the rest of common/config is not needed.
> This looks okay to me. Just a couple of nit comments below.
> 
> Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
....
> > diff --git a/common/config-sections b/common/config-sections
> > new file mode 100644
> > index 000000000..69a03375a
> > --- /dev/null
> > +++ b/common/config-sections
> > @@ -0,0 +1,390 @@
> > +##/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2000-2003,2006 Silicon Graphics, Inc.  All Rights Reserved.
> > +# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
> > +#
> > +# Config section parsing and setup code
> Since we want to facilitate sourcing of this file independently, should
> we at least mention in the comments which other files need to be
> sourced before this one? For example, this file uses _exit(), so
> using this file requires common/exit to be sourced?

The high level code should have already sourced that file.

> > +
> > +_check_device()
> > +{
> > +	local name=$1
> > +	local dev_needed=$2
> > +	local dev=$3
> > +
> > +	if [ -z "$dev" ]; then
> > +		if [ "$dev_needed" == "required" ]; then
> > +			_fatal "common/config: $name is required but not defined!"
> > +		fi
> > +		return 0
> > +	fi
> > +
> > +	if [ -b "$dev" ] || ( echo $dev | grep -qE ":|//" ); then
> > +		# block device or a network url
> > +		return 0
> > +	fi
> > +
> > +	case "$FSTYP" in
> > +	9p|fuse|tmpfs|virtiofs|afs)
> > +		# 9p, fuse, virtiofs and afs mount tags are just plain strings,
> > +		# so anything is allowed tmpfs doesn't use mount source, ignore
> > +		;;
> > +	ceph)
> > +		# ceph has two different possible syntaxes for mount devices. The
> > +		# network URL check above catches the legacy syntax. Check for the
> > +		# new-style syntax here.
> > +		if ( echo $dev | grep -qEv "=/" ); then
> > +			_fatal "common/config: $name ($dev) is not a valid ceph mount string"
> > +		fi
> > +		;;
> > +	overlay)
> > +		if [ ! -d "$dev" ]; then
> > +			_fatal "common/config: $name ($dev) is not a directory for overlay"
> > +		fi
> > +		;;
> > +	ubifs)
> > +		if [ ! -c "$dev" ]; then
> > +			_fatal "common/config: $name ($dev) is not a character device"
> > +		fi
> > +		;;
> > +	ceph-fuse)
> > +		;;
> > +	*)
> > +		_fatal "common/config: $name ($dev) is not a block device or a network filesystem"
> Nit: this line exceeds the 80-character limit.

There are many of these in the code I moved, and there are many,
many lines that exceed 80 columns all through fstests. I don't think
this needs fixing.

Regardless, this patch is moving code from A to B, so I'm trying to
avoid mixing in formatting or bug fixes that would otherwise be
impossible to spot in the diff...

-Dave.

-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 13/28] check-parallel: introduce config file support
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (11 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 12/28] config: move config section code to it's own file Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-09 12:01   ` Nirjhar Roy
  2025-04-17  3:00 ` [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation Dave Chinner
                   ` (14 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

check-parallel will use the same config file format check does,
and use the same code to auto-discover the config file.

The biggest difference is that check-parallel -requires- the use
of config sections, and the first section *must* be named
"[check-parallel]". This first section is used for defining
setup parameters for check-parallel - loop device image file sizes,
etc.

The second biggest difference is that check-parallel does not allow
the config file to define devices. Any section found to contain a
device definition such as TEST_DEV or SCRATCH_DEV will result in
check-parallel terminating with an error.

This config file format works for check-parallel invoking check,
too, because once a section is specified on the check command line,
it effectively ignores unknown values set in sections that it
doesn't run.  Hence it effectively skips over the [check-parallel]
setup section.

For check-parallel, each config section now defines just the
filesystem configuration to be tested; all the usual mount and mkfs
options apply, and USE_EXTERNAL must be set for testing external
devices.
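
To make the format described above concrete, here is a minimal sketch of
a config file that should satisfy these rules, run through the same
"grep DEV | grep -v SIZE" test the patch uses. The section name xfs_4k,
the option values and the temporary file are illustrative assumptions,
not part of the patch. The sketch also tests the pipeline directly in an
"if" rather than checking "$? -ne 1" as the patch does.

```shell
#!/bin/bash
# Hypothetical example config: the first section is [check-parallel] and
# holds only size parameters; later sections hold only filesystem config.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[check-parallel]
TEST_DEV_SIZE=10G
SCRATCH_DEV_SIZE=20G

[xfs_4k]
FSTYP=xfs
MKFS_OPTIONS="-b size=4k"
EOF

# Same pipeline as the patch: it succeeds only if some line mentions DEV
# without also mentioning SIZE, which is treated as a device definition.
if grep DEV "$cfg" | grep -qv SIZE; then
	result="config file has devices defined"
else
	result="config file is acceptable"
fi
echo "$result"
rm -f "$cfg"
```

Adding a line such as TEST_DEV=/dev/loop0 to any section of the example
file would flip the result to the error case.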

This commit implements the initial [check-parallel] section support
and moves the built-in default values for these parameters to the
config file setup. This means that if the config file does not set
one of these parameters, a default value will be used for it.
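
The defaulting behaviour can be sketched in isolation. This is a
minimal illustration of the ${VAR:=default} expansion the patch relies
on; the value 2G stands in for something a [check-parallel] section
might have set and is an assumption made for the example.

```shell
#!/bin/bash
# Simulate the state after the [check-parallel] section has been parsed:
# only TEST_DEV_SIZE was set by the config, SCRATCH_DEV_SIZE was not.
unset TEST_DEV_SIZE SCRATCH_DEV_SIZE
TEST_DEV_SIZE=2G

# ${VAR:=default} assigns the default only if VAR is unset or empty, so
# a value from the config file survives and a missing one is filled in.
TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}

echo "TEST_DEV_SIZE=$TEST_DEV_SIZE SCRATCH_DEV_SIZE=$SCRATCH_DEV_SIZE"
```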

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel         | 14 +++-----
 common/config-sections | 78 +++++++++++++++++++++++++++++++++++++++---
 2 files changed, 77 insertions(+), 15 deletions(-)

diff --git a/check-parallel b/check-parallel
index 5bb44b6a5..6fc86fb92 100755
--- a/check-parallel
+++ b/check-parallel
@@ -15,22 +15,14 @@ runner_list=()
 runtimes=()
 show_test_list=
 run_section=""
+iam="check-parallel"
 
 tmp=/tmp/check-parallel.$$
 
-TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
-TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
-TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
-SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
-SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
-SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
-LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
-
-FSTYP=
-
 . ./common/exit
 . ./common/test_names
 . ./common/test_list
+. ./common/config-sections
 
 usage()
 {
@@ -332,6 +324,8 @@ cleanup()
 
 trap "cleanup; exit" HUP INT QUIT TERM
 
+_config_setup_parallel
+
 split_runner_list
 if [ -n "$show_test_list" ]; then
 	echo Time ordered test list:
diff --git a/common/config-sections b/common/config-sections
index 69a03375a..28bd11bab 100644
--- a/common/config-sections
+++ b/common/config-sections
@@ -329,6 +329,7 @@ get_next_config() {
 
 known_hosts()
 {
+	[ -z "$HOST" ] && export HOST=`hostname -s`
 	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
 
 	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
@@ -336,7 +337,7 @@ known_hosts()
 	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
 }
 
-_config_section_setup()
+_config_file_setup()
 {
 	if [ ! -f "$HOST_OPTIONS" ]; then
 		known_hosts
@@ -346,13 +347,32 @@ _config_section_setup()
 	export OPTIONS_HAVE_SECTIONS=false
 	if [ -f "$HOST_OPTIONS" ]; then
 		export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
-		if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
-			. $HOST_OPTIONS
-			export HOST_OPTIONS_SECTIONS="-no-sections-"
-		else
+		if [ -n "$HOST_OPTIONS_SECTIONS" ]; then
 			export OPTIONS_HAVE_SECTIONS=true
 		fi
 	fi
+}
+
+_config_section_setup()
+{
+	if [ "$iam" == "check-parallel" ]; then
+		echo "$iam: incorrect config file format chosen!"
+		exit 1;
+	fi
+
+	_config_file_setup
+
+	# If we don't have sections, source the options from the config file.
+	# Otherwise, strip sections that should not be run by check that may be
+	# present in the config file
+	if [ "$OPTIONS_HAVE_SECTIONS" != "true" ]; then
+		. $HOST_OPTIONS
+		export HOST_OPTIONS_SECTIONS="-no-sections-"
+	else
+		export HOST_OPTIONS_SECTIONS=$(echo $HOST_OPTIONS_SECTIONS | \
+				sed -e 's/check-parallel//')
+	fi
+
 
 	if [ -z "$CONFIG_INCLUDED" ]; then
 		get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
@@ -388,3 +408,51 @@ _config_section_setup()
 		fi
 	fi
 }
+
+# check-parallel config files must:
+# 1. have config sections defined
+# 2. use the first config section for check-parallel setup
+# 3. not define any physical device parameter in any section
+#
+# If all these are true, then we read the first section that defines
+# the check-parallel config parameters and continue onwards.
+_config_setup_parallel()
+{
+	if [ "$iam" != "check-parallel" ]; then
+		echo "$iam: incorrect config file format chosen!"
+		exit 1;
+	fi
+
+	_config_file_setup
+
+	if [ "$OPTIONS_HAVE_SECTIONS" != "true" ]; then
+		echo "$iam config file has no sections!"
+		exit 1;
+	fi
+
+	local first_section=`echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
+	if [ "$first_section" != "$iam" ]; then
+		echo "$iam config file has no [$iam] section"
+		exit 1
+	fi
+
+	grep DEV $HOST_OPTIONS |grep -qv SIZE
+	if [ $? -ne 1 ]; then
+		echo "$iam config file has devices defined"
+		exit 1
+	fi
+
+	# we only need to pull in the config parameters here and set defaults
+	# if they are not set after pulling in the config values.
+	parse_config_section $1
+
+	TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
+	TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
+	TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
+	SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
+	SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
+	SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
+	LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
+
+	FSTYP=${FSTYP:=xfs}
+}
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/28] check-parallel: introduce config file support
  2025-04-17  3:00 ` [PATCH 13/28] check-parallel: introduce config file support Dave Chinner
@ 2025-05-09 12:01   ` Nirjhar Roy
  2025-05-21 12:23     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy @ 2025-05-09 12:01 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> check-parallel will use the same config file format check does,
> and use the same code to auto-discover the config file.
> 
> The biggest difference is that check-parallel -requires- the use
> of config sections, and the first section *must* be named
> "[check-parallel]". This first section is used for defining
> setup parameters for check-parallel - loop device image file sizes,
> etc.
> 
> The second biggest difference is that check-parallel does not allow
> the config file to define devices. Any section found to contain a
> device definition such as TEST_DEV or SCRATCH_DEV will result in
> check-parallel terminating with an error.
> 
> This config file format works for check-parallel invoking check,
> too, because once a section is specified on the check command line,
> it effectively ignores unknown values set in sections that it
> doesn't run.  Hence it effectively skips over the [check-parallel]
> setup section.
> 
> For check-parallel, each config section now defines just the
> filesystem configuration to be tested; all the usual mount and mkfs
> options apply, and USE_EXTERNAL must be set for testing external
> devices.
> 
> This commit implements the initial [check-parallel] section support
> and moves the built-in default values for these parameters to the
> config file setup. This means if the config file does not contain
> all the necessary parameter values, a default value will be used for
> it.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel         | 14 +++-----
>  common/config-sections | 78 +++++++++++++++++++++++++++++++++++++++---
>  2 files changed, 77 insertions(+), 15 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 5bb44b6a5..6fc86fb92 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -15,22 +15,14 @@ runner_list=()
>  runtimes=()
>  show_test_list=
>  run_section=""
> +iam="check-parallel"
>  
>  tmp=/tmp/check-parallel.$$
>  
> -TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
> -TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
> -TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
> -SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
> -SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
> -SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
> -LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
> -
> -FSTYP=
> -
>  . ./common/exit
>  . ./common/test_names
>  . ./common/test_list
> +. ./common/config-sections
>  
>  usage()
>  {
> @@ -332,6 +324,8 @@ cleanup()
>  
>  trap "cleanup; exit" HUP INT QUIT TERM
>  
> +_config_setup_parallel
> +
>  split_runner_list
>  if [ -n "$show_test_list" ]; then
>  	echo Time ordered test list:
> diff --git a/common/config-sections b/common/config-sections
> index 69a03375a..28bd11bab 100644
> --- a/common/config-sections
> +++ b/common/config-sections
> @@ -329,6 +329,7 @@ get_next_config() {
>  
>  known_hosts()
>  {
> +	[ -z "$HOST" ] && export HOST=`hostname -s`
>  	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs
>  
>  	[ -f /etc/xfsqa.config ]             && export HOST_OPTIONS=/etc/xfsqa.config
> @@ -336,7 +337,7 @@ known_hosts()
>  	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
>  }
>  
> -_config_section_setup()
> +_config_file_setup()
>  {
>  	if [ ! -f "$HOST_OPTIONS" ]; then
>  		known_hosts
> @@ -346,13 +347,32 @@ _config_section_setup()
>  	export OPTIONS_HAVE_SECTIONS=false
>  	if [ -f "$HOST_OPTIONS" ]; then
>  		export HOST_OPTIONS_SECTIONS=`get_config_sections $HOST_OPTIONS`
> -		if [ -z "$HOST_OPTIONS_SECTIONS" ]; then
> -			. $HOST_OPTIONS
> -			export HOST_OPTIONS_SECTIONS="-no-sections-"
> -		else
> +		if [ -n "$HOST_OPTIONS_SECTIONS" ]; then
>  			export OPTIONS_HAVE_SECTIONS=true
>  		fi
>  	fi
> +}
> +
> +_config_section_setup()
> +{
> +	if [ "$iam" == "check-parallel" ]; then
> +		echo "$iam: incorrect config file format chosen!"
> +		exit 1;
> +	fi
> +
> +	_config_file_setup
> +
> +	# If we don't have sections, source the options from the config file.
> +	# Otherwise, strip sections that should not be run by check that may be
> +	# present in the config file
> +	if [ "$OPTIONS_HAVE_SECTIONS" != "true" ]; then
> +		. $HOST_OPTIONS
> +		export HOST_OPTIONS_SECTIONS="-no-sections-"
> +	else
> +		export HOST_OPTIONS_SECTIONS=$(echo $HOST_OPTIONS_SECTIONS | \
> +				sed -e 's/check-parallel//')
> +	fi
> +
>  
>  	if [ -z "$CONFIG_INCLUDED" ]; then
>  		get_next_config `echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
> @@ -388,3 +408,51 @@ _config_section_setup()
>  		fi
>  	fi
>  }
> +
> +# check-parallel config files must:
> +# 1. have config sections defined
> +# 2. use the first config section for check-parallel setup
> +# 3. not define any physical device parameter in any section
Nit: Maybe add some documentation of the above restrictions and the new
variables introduced (like SCRATCH_DEV_SIZE, ...) to the README?
> +# If all these are true, then we read the first section that defines
> +# the check-parallel config parameters and continue onwards.
> +_config_setup_parallel()
> +{
> +	if [ "$iam" != "check-parallel" ]; then
> +		echo "$iam: incorrect config file format chosen!"
> +		exit 1;
> +	fi
> +
> +	_config_file_setup
> +
> +	if [ "$OPTIONS_HAVE_SECTIONS" != "true" ]; then
In most places, OPTIONS_HAVE_SECTIONS is used like this:

if $OPTIONS_HAVE_SECTIONS; then
    ...
fi

With OPTIONS_HAVE_SECTIONS=true this executes /bin/true, which always
exits successfully, so the branch is always taken; no string comparison
with "true" or "false" is involved. In the same way, /bin/false always
exits with a failure code.
So maybe we should change this to
if $OPTIONS_HAVE_SECTIONS; then - just to have consistency with the
rest of the code?
> +		echo "$iam config file has no sections!"
> +		exit 1;
> +	fi
> +
> +	local first_section=`echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
> +	if [ "$first_section" != "$iam" ]; then
> +		echo "$iam config file has no [$iam] section"
> +		exit 1
> +	fi
> +
> +	grep DEV $HOST_OPTIONS |grep -qv SIZE
This will incorrectly fail the check-parallel invocation even when a
device is only mentioned in a comment. For example, the following
config causes a failure:

local.config >>
[check-parallel]
TEST_DEV_SIZE=2G
#TEST_DEV=/dev/loop0

Maybe we should ignore lines that begin with '#', with something like:
grep -v '^[[:space:]]*#' $HOST_OPTIONS | grep DEV | grep -qv SIZE ?
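
The difference between the two checks can be sketched as follows. The
helper name has_device_defs and the temporary file are hypothetical and
exist only for this illustration; the pipelines are the one from the
patch and the one suggested above.

```shell
#!/bin/bash
# has_device_defs is a hypothetical helper name, not part of fstests.
# It succeeds if a non-comment line mentions DEV without SIZE.
has_device_defs()
{
	grep -v '^[[:space:]]*#' "$1" | grep DEV | grep -qv SIZE
}

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
[check-parallel]
TEST_DEV_SIZE=2G
#TEST_DEV=/dev/loop0
EOF

# Check as written in the patch: the commented-out line still matches.
if grep DEV "$cfg" | grep -qv SIZE; then
	unfiltered=rejected
else
	unfiltered=accepted
fi

# Check with comment lines stripped first.
if has_device_defs "$cfg"; then
	filtered=rejected
else
	filtered=accepted
fi

echo "unfiltered=$unfiltered filtered=$filtered"
rm -f "$cfg"
```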

Another case:

Let us consider another local.config file:

[check-parallel]
TEST_DEV_SIZE=2G

[xfs_4k]
TEST_DEV=/dev/loop0
TEST_DIR=/mnt1/test
SCRATCH_DEV=/dev/loop1
SCRATCH_MNT=/mnt1/scratch

The above file runs fine ./check -s xfs_4k selftest/001

However, check-parallel fails with this file, which is expected
according to the design. But is there any specific reason to fail when
the configuration is otherwise suitable for running check-parallel (it
has a [check-parallel] section defined)? So

./check-parallel -s check-parralel -D /mnt1 -x stress -t 1 selftest/001
check-parallel config file has devices defined

Instead of grepping on the entire $HOST_OPTIONS, how about we only grep
on the check-parallel section? In this way we parse/export the
environment variables only in check-parallel section and ignore the
sections that have devices defined and/or check-parallel incompatible.
./check -s xfs_4k works anyway with [check-parallel] section defined.
./check -s check-parallel will fail with *DEVs not defined which is
self explanatory.

So something like:
get_env_section $first_section | grep -v '^[[:space:]]*#' | grep DEV | 
grep -qv SIZE

get_env_section()
{
   # code taken from parse_config_section
   SECTION=$1
   echo `sed -e 's/[[:space:]]*\=[[:space:]]*/=/g' \
   -e 's/#.*$//' \
   -e 's/[[:space:]]*$//' \
   -e 's/^[[:space:]]*//' \
   -e "s/^\([^=]*\)=\"\?'\?\([^\"']*\)\"\?'\?$/export \1=\"\2\"/" \
   < $HOST_OPTIONS \
   | sed -n -e "/^\[$SECTION\]/,/^\s*\[/{/^[^#].*\=.*/p;}"`
}

The value we get from the suggested modification is that the same
local.config file can be reused between check and check-parallel.
Please let me know what you think of the above.
> +	if [ $? -ne 1 ]; then
> +		echo "$iam config file has devices defined"
> +		exit 1
> +	fi
> +
> +	# we only need to pull in the config parameters here and set defaults
> +	# if they are not set after pulling in the config values.
> +	parse_config_section $1
What value is $1 holding? _config_setup_parallel is being called only
from check-parallel without any parameters passed, right? Did you mean
_config_setup_parallel $first_section ?
--NR
> +
> +	TEST_DEV_SIZE=${TEST_DEV_SIZE:=10G}
> +	TEST_RTDEV_SIZE=${TEST_RTDEV_SIZE:=10G}
> +	TEST_LOGDEV_SIZE=${TEST_LOGDEV_SIZE:=128M}
> +	SCRATCH_DEV_SIZE=${SCRATCH_DEV_SIZE:=20G}
> +	SCRATCH_RTDEV_SIZE=${SCRATCH_RTDEV_SIZE:=20G}
> +	SCRATCH_LOGDEV_SIZE=${SCRATCH_LOGDEV_SIZE:=512M}
> +	LOGWRITES_DEV_SIZE=${LOGWRITES_DEV_SIZE:=2G}
> +
> +	FSTYP=${FSTYP:=xfs}
> +}


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/28] check-parallel: introduce config file support
  2025-05-09 12:01   ` Nirjhar Roy
@ 2025-05-21 12:23     ` Dave Chinner
  0 siblings, 0 replies; 80+ messages in thread
From: Dave Chinner @ 2025-05-21 12:23 UTC (permalink / raw)
  To: Nirjhar Roy; +Cc: fstests, zlang

On Fri, May 09, 2025 at 05:31:49PM +0530, Nirjhar Roy wrote:
> On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > check-parallel will use the same config file format check does,
> > and use the same code to auto-discover the config file.
> > 
> > The biggest difference is that check-parallel -requires- the use
> > of config sections, and the first section *must* be named
> > "[check-parallel]". This first section is used for defining
> > setup parameters for check-parallel - loop device image file sizes,
> > etc.
> > 
> > The second biggest difference is that check-parallel does not allow
> > the config file to define devices. Any section found to contain a
> > device definition such as TEST_DEV or SCRATCH_DEV will result in
> > check-parallel terminating with an error.
> > 
> > This config file format works for check-parallel invoking check,
> > too, because once a section is specified on the check command line,
> > it effectively ignores unknown values set in sections that it
> > doesn't run.  Hence it effectively skips over the [check-parallel]
> > setup section.
> > 
> > For check-parallel, each config section now defines just the
> > filesystem configuration to be tested; all the usual mount and mkfs
> > options apply, and USE_EXTERNAL must be set for testing external
> > devices.
> > 
> > This commit implements the initial [check-parallel] section support
> > and moves the built-in default values for these parameters to the
> > config file setup. This means if the config file does not contain
> > all the necessary parameter values, a default value will be used for
> > it.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
....

> > @@ -388,3 +408,51 @@ _config_section_setup()
> >  		fi
> >  	fi
> >  }
> > +
> > +# check-parallel config files must:
> > +# 1. have config sections defined
> > +# 2. use the first config section for check-parallel setup
> > +# 3. not define any physical device parameter in any section
> Nit: Maybe some documentation of the above restrictions and some new
> variables introduced (like SCRATCH_DEV_SIZE, ..)in the README?

Eventually, yes. Right now stuff is still in a state of flux, so
there is no point in adding extra documentation that has to be kept
up to date.

> > +# If all these are true, then we read the first section that defines
> > +# the check-parallel config parameters and continue onwards.
> > +_config_setup_parallel()
> > +{
> > +	if [ "$iam" != "check-parallel" ]; then
> > +		echo "$iam: incorrect config file format chosen!"
> > +		exit 1;
> > +	fi
> > +
> > +	_config_file_setup
> > +
> > +	if [ "$OPTIONS_HAVE_SECTIONS" != "true" ]; then
> Most of the places the way OPTIONS_HAVE_SECTIONS is being used is as 
> if $OPTIONS_HAVE_SECTIONS; then 
>     ...
> fi
> 
> so, when OPTIONS_HAVE_SECTIONS=true, /bin/true is executed and the exit
> code of /bin/true is always a success and it is not a string comparison
> with "true" or "false".

Yes, I know.

Executing /bin/true requires creating a new process just to return
a successful exit code. This takes several milliseconds of CPU time
to create and tear down a new process context.

OTOH, doing a string comparison in the shell parser takes a couple
of microseconds of CPU time....

Which is faster and burns less energy?
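
For reference, the two idioms under discussion can be put side by side.
This is only a sketch of their behaviour, not a measurement of their
cost. The empty-value case at the end is an addition for illustration:
a command line that expands to nothing completes with status zero,
whereas the string comparison fails.

```shell
#!/bin/bash
OPTIONS_HAVE_SECTIONS=true

# Idiom 1: expand the variable and run the result as a command.
if $OPTIONS_HAVE_SECTIONS; then
	as_command=yes
else
	as_command=no
fi

# Idiom 2: compare the variable's value as a string; nothing is run.
if [ "$OPTIONS_HAVE_SECTIONS" = "true" ]; then
	as_string=yes
else
	as_string=no
fi

# With an empty value the two idioms diverge: the empty command
# succeeds, while the string comparison fails.
OPTIONS_HAVE_SECTIONS=""
if $OPTIONS_HAVE_SECTIONS; then
	empty_as_command=yes
else
	empty_as_command=no
fi
if [ "$OPTIONS_HAVE_SECTIONS" = "true" ]; then
	empty_as_string=yes
else
	empty_as_string=no
fi

echo "$as_command $as_string $empty_as_command $empty_as_string"
```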

>
> /bin/true always succeeds and hence if
> OPTIONS_HAVE_SECTIONS=true, then 
> if $OPTIONS_HAVE_SECTIONS is always executed. In the same way
> /bin/false always has a failure code upon exit.
> So maybe we should change this to 
> if $OPTIONS_HAVE_SECTIONS; then - just to have consistency with the
> rest of the code?
> > +		echo "$iam config file has no sections!"
> > +		exit 1;
> > +	fi
> > +
> > +	local first_section=`echo $HOST_OPTIONS_SECTIONS | cut -f1 -d" "`
> > +	if [ "$first_section" != "$iam" ]; then
> > +		echo "$iam config file has no [$iam] section"
> > +		exit 1
> > +	fi
> > +
> > +	grep DEV $HOST_OPTIONS |grep -qv SIZE
> This will incorrectly fail the check-parallel invocation even if we
> have the devices defined in the comment section - so something like the
> following is causing a failure
> 
> local.config >>
> [check-parallel]
> TEST_DEV_SIZE=2G
> #TEST_DEV=/dev/loop0

Don't put devices in the check-parallel config file.

> Maybe we should try to ignore the lines that begin with '#'
> something like - grep -v '^[[:space:]]*#' $HOST_OPTIONS | grep DEV |
> grep -qv SIZE ? 
> 
> Another case:
> 
> Let us consider another local.config file:
> 
> [check-parallel]
> TEST_DEV_SIZE=2G
> 
> [xfs_4k]
> TEST_DEV=/dev/loop0
> TEST_DIR=/mnt1/test
> SCRATCH_DEV=/dev/loop1
> SCRATCH_MNT=/mnt1/scratch
> 
> The above file runs fine ./check -s xfs_4k selftest/001

Yes, but it is an invalid check-parallel config file because it has
devices defined in it.

If check-parallel allows this, then when it runs a check instance
it will override the environment defines set by the check-parallel
runner and every runner will try to use the same devices and mount
points.

It just doesn't work.

> However, with check-parallel, it will fail and it is expected according
> to the design. But is there any specific reason to fail when the above
> configuration is perfectly suitable to run check-parallel (we have
> check-parallel, section defined)? So 
> 
> ./check-parallel -s check-parralel -D /mnt1 -x stress -t 1 selftest/001
> check-parallel config file has devices defined
>
> Instead of grepping on the entire $HOST_OPTIONS, how about we only grep
> on the check-parallel section? In this way we parse/export the
> environment variables only in check-parallel section and ignore the
> sections that have devices defined and/or check-parallel incompatible.
> ./check -s xfs_4k works anyway with [check-parallel] section defined.
> ./check -s check-parallel will fail with *DEVs not defined which is
> self explanatory.

I think you may have misunderstood the direction check-parallel is
going in.

I don't want check and check-parallel to share the same config file.
What I need is for them to use the same file format and hence be
able to share parser code.

Later in the patchset I remove the dependency that check-parallel
has on check, and at that point the check-parallel config file no
longer needs to work with check.

However, there is an intermediate period in the patchset where
check-parallel calls check, and so the config file for
check-parallel has to be compatible with check for at least that
short period in time.

To make that work, the check-parallel config file cannot have any
devices or mount points defined anywhere in it, otherwise check will
override the values that check-parallel has provided via the
environment. We don't need to over-engineer this check - if there
are any known devices defined in the config file, abort.

If it gets a false positive, I don't care because once
check-parallel no longer calls check, the "no devices in
check-parallel config file" rule becomes irrelevant because the
config files are no longer shared. At this point the config file
supports only the subset of parameters that check-parallel defines
as valid, and no more. This set of valid parameters for
check-parallel will be enumerated in future patches.

Hence all we have to do at this point is make sure that the section
parser continues to work correctly for both test runners, not try to
invent the One True Config File To Rule Them All....

> > +	parse_config_section $1
> What value is $1 holding? _config_setup_parallel is being called only
> from check-parallel without any parameters passed, right? Did you mean
> _config_setup_parallel $first_section ?

Yup. Fixed.
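
The problem can be sketched in isolation, assuming the fix is to pass
the section name the function has already computed; the actual fix is
not shown in this thread. show_section, setup_buggy and setup_fixed are
hypothetical names, with show_section standing in for
parse_config_section.

```shell
#!/bin/bash
# show_section is a hypothetical stand-in for parse_config_section; it
# only reports the section name it was given.
show_section()
{
	echo "section='$1'"
}

# Forwards its own $1, which is empty when the caller passes no
# arguments, so the parser is asked for no section at all.
setup_buggy()
{
	local first_section="check-parallel"
	show_section $1
}

# Forwards the section name it computed itself.
setup_fixed()
{
	local first_section="check-parallel"
	show_section "$first_section"
}

buggy=$(setup_buggy)
fixed=$(setup_fixed)
echo "$buggy / $fixed"
```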

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (12 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 13/28] check-parallel: introduce config file support Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-10 14:08   ` Nirjhar Roy (IBM)
  2025-04-17  3:00 ` [PATCH 15/28] check-parallel: de-batch test execution Dave Chinner
                   ` (13 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

The sourcing of common/rc still causes code to be run, partially
because it sources common/config and partially because there is some
in-line code amongst all the function definitions inside common/rc.

This is messy, and re-sourcing those files also does an awful
lot of setup work that isn't actually required.

common/config only needs to be included once - everything that
scripts then depend on should be exported by it, and hence it should
only be included once from check/check-parallel to set up all the
environmental parameters for the entire run.

common/rc also only needs to be included once per context, but it
does not need to directly include common/config nor does it need to
run init_rc in each individual test context.

Separate out this mess. Include common/config directly where needed
and only use it to set up the environment. Move all the code that is
in common/config to common/rc so that common/config is not needed
for any purpose other than setting up the initial environment.
Move the initialisation functions to the scripts that include
common/config.

Config file and config section parsing can be run directly from check
and/or check-parallel; this is not needed for every context that
needs to know what XFS_MKFS_PROG is set to...

Similarly, include common/rc only once, and only call init_rc or
_source_specific_fs() from the contexts that actually need that code
to be run.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check           |  23 ++---
 common/config   | 212 --------------------------------------------
 common/preamble |  18 +++-
 common/rc       | 227 ++++++++++++++++++++++++++++++++++++++++++++----
 4 files changed, 240 insertions(+), 240 deletions(-)

diff --git a/check b/check
index 0b489cb4b..fea86f7b9 100755
--- a/check
+++ b/check
@@ -46,6 +46,9 @@ rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 . ./common/exit
 . ./common/test_names
 . ./common/test_list
+. ./common/config
+. ./common/config-sections
+. ./common/rc
 
 usage()
 {
@@ -183,15 +186,17 @@ while [ $# -gt 0 ]; do
 	shift
 done
 
-# we need common/rc, that also sources common/config. We need to source it
-# after processing args, overlay needs FSTYP set before sourcing common/config
-if ! . ./common/rc; then
-	echo "check: failed to source common/rc"
-	exit 1
-fi
-
+# now we have done argument parsing, overlay has FSTYP set and we can now
+# start processing the config files and setting up devices.
+_config_section_setup
+_canonicalize_devices
 init_rc
 
+if [ ! -z "$REPORT_LIST" ]; then
+	. ./common/report
+	_assert_report_list
+fi
+
 # If the test config specified a soak test duration, see if there are any
 # unit suffixes that need converting to an integer seconds count.
 if [ -n "$SOAK_DURATION" ]; then
@@ -553,10 +558,6 @@ function run_section()
 			status=1
 			exit
 		fi
-		# Previous FSTYP derived from TEST_DEV could be changed, source
-		# common/rc again with correct FSTYP to get FSTYP specific configs,
-		# e.g. common/xfs
-		. common/rc
 		_tl_prepare_test_list
 	elif [ "$OLD_TEST_FS_MOUNT_OPTS" != "$TEST_FS_MOUNT_OPTS" ]; then
 		# Unmount TEST_DEV to apply the updated mount options.
diff --git a/common/config b/common/config
index f90a66862..b93a6c0d3 100644
--- a/common/config
+++ b/common/config
@@ -41,7 +41,6 @@
 
 . common/test_names
 . common/exit
-. common/config-sections
 
 # all tests should use a common language setting to prevent golden
 # output mismatches.
@@ -342,214 +341,3 @@ if [ -x /usr/sbin/selinuxenabled ] && /usr/sbin/selinuxenabled; then
 	: ${SELINUX_MOUNT_OPTIONS:="-o context=$(stat -c %C /)"}
 	export SELINUX_MOUNT_OPTIONS
 fi
-
-_common_mount_opts()
-{
-	case $FSTYP in
-	9p)
-		echo $PLAN9_MOUNT_OPTIONS
-		;;
-	fuse)
-		echo $FUSE_MOUNT_OPTIONS
-		;;
-	xfs)
-		echo $XFS_MOUNT_OPTIONS
-		;;
-	udf)
-		echo $UDF_MOUNT_OPTIONS
-		;;
-	nfs)
-		echo $NFS_MOUNT_OPTIONS
-		;;
-	afs)
-		echo $AFS_MOUNT_OPTIONS
-		;;
-	cifs)
-		echo $CIFS_MOUNT_OPTIONS
-		;;
-	ceph)
-		echo $CEPHFS_MOUNT_OPTIONS
-		;;
-	glusterfs)
-		echo $GLUSTERFS_MOUNT_OPTIONS
-		;;
-	overlay)
-		echo $OVERLAY_MOUNT_OPTIONS
-		;;
-	ext2|ext3|ext4)
-		# acls & xattrs aren't turned on by default on ext$FOO
-		echo "-o acl,user_xattr $EXT_MOUNT_OPTIONS"
-		;;
-	f2fs)
-		echo "-o acl,user_xattr $F2FS_MOUNT_OPTIONS"
-		;;
-       reiser4)
-		# acls & xattrs aren't supported by reiser4
-		echo $REISER4_MOUNT_OPTIONS
-		;;
-	gfs2)
-		# acls aren't turned on by default on gfs2
-		echo "-o acl $GFS2_MOUNT_OPTIONS"
-		;;
-	tmpfs)
-		# We need to specify the size at mount, use 1G by default
-		echo "-o size=1G $TMPFS_MOUNT_OPTIONS"
-		;;
-	ubifs)
-		echo $UBIFS_MOUNT_OPTIONS
-		;;
-	*)
-		;;
-	esac
-}
-
-_mount_opts()
-{
-	export MOUNT_OPTIONS=$(_common_mount_opts)
-}
-
-_test_mount_opts()
-{
-	export TEST_FS_MOUNT_OPTS=$(_common_mount_opts)
-}
-
-_mkfs_opts()
-{
-	case $FSTYP in
-	xfs)
-		export MKFS_OPTIONS=$XFS_MKFS_OPTIONS
-		;;
-	udf)
-		[ ! -z "$udf_fsize" ] && \
-			UDF_MKFS_OPTIONS="$UDF_MKFS_OPTIONS -s $udf_fsize"
-		export MKFS_OPTIONS=$UDF_MKFS_OPTIONS
-		;;
-	nfs)
-		export MKFS_OPTIONS=$NFS_MKFS_OPTIONS
-		;;
-	afs)
-		export MKFS_OPTIONS=$AFS_MKFS_OPTIONS
-		;;
-	cifs)
-		export MKFS_OPTIONS=$CIFS_MKFS_OPTIONS
-		;;
-	ceph)
-		export MKFS_OPTIONS=$CEPHFS_MKFS_OPTIONS
-		;;
-       reiser4)
-		export MKFS_OPTIONS=$REISER4_MKFS_OPTIONS
-		;;
-	gfs2)
-		export MKFS_OPTIONS="$GFS2_MKFS_OPTIONS -O -p lock_nolock"
-		;;
-	jfs)
-		export MKFS_OPTIONS="$JFS_MKFS_OPTIONS -q"
-		;;
-	f2fs)
-		export MKFS_OPTIONS="$F2FS_MKFS_OPTIONS"
-		;;
-	btrfs)
-		export MKFS_OPTIONS="$BTRFS_MKFS_OPTIONS"
-		;;
-	bcachefs)
-		export MKFS_OPTIONS=$BCACHEFS_MKFS_OPTIONS
-		;;
-	*)
-		;;
-	esac
-}
-
-_fsck_opts()
-{
-	case $FSTYP in
-	ext2|ext3|ext4)
-		export FSCK_OPTIONS="-nf"
-		;;
-	reiser*)
-		export FSCK_OPTIONS="--yes"
-		;;
-	f2fs)
-		export FSCK_OPTIONS=""
-		;;
-	*)
-		export FSCK_OPTIONS="-n"
-		;;
-	esac
-}
-
-# check necessary running dependences then source sepcific fs helpers
-_source_specific_fs()
-{
-	local fs=$1
-
-	if [ -z "$fs" ];then
-		fs=$FSTYP
-	fi
-
-	case "$fs" in
-	xfs)
-		[ "$XFS_LOGPRINT_PROG" = "" ] && _fatal "xfs_logprint not found"
-		[ "$XFS_REPAIR_PROG" = "" ] && _fatal "xfs_repair not found"
-		[ "$XFS_DB_PROG" = "" ] && _fatal "xfs_db not found"
-		[ "$MKFS_XFS_PROG" = "" ] && _fatal "mkfs_xfs not found"
-		[ "$XFS_INFO_PROG" = "" ] && _fatal "xfs_info not found"
-
-		. ./common/xfs
-		;;
-	udf)
-		[ "$MKFS_UDF_PROG" = "" ] && _fatal "mkfs_udf/mkudffs not found"
-		;;
-	btrfs)
-		[ "$MKFS_BTRFS_PROG" = "" ] && _fatal "mkfs.btrfs not found"
-
-		. ./common/btrfs
-		;;
-	ext4)
-		[ "$MKFS_EXT4_PROG" = "" ] && _fatal "mkfs.ext4 not found"
-		. ./common/ext4
-		;;
-	ext2|ext3)
-		. ./common/ext4
-		;;
-	f2fs)
-		[ "$MKFS_F2FS_PROG" = "" ] && _fatal "mkfs.f2fs not found"
-		;;
-	nfs)
-		. ./common/nfs
-		;;
-	afs)
-		;;
-	cifs)
-		;;
-	9p)
-		;;
-	fuse)
-		;;
-	ceph)
-		. ./common/ceph
-		;;
-	glusterfs)
-		;;
-	overlay)
-		. ./common/overlay
-		;;
-	reiser4)
-		[ "$MKFS_REISER4_PROG" = "" ] && _fatal "mkfs.reiser4 not found"
-		;;
-	pvfs2)
-		;;
-	ubifs)
-		[ "$UBIUPDATEVOL_PROG" = "" ] && _fatal "ubiupdatevol not found"
-		. ./common/ubifs
-		;;
-	esac
-}
-
-_config_section_setup
-_canonicalize_devices
-# mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems. TEST_DIR
-# and QA_CHECK_FS are also checked by mkfs.xfs, but already exported elsewhere.
-export TEST_DEV
-
-# make sure this script returns success
-/bin/true
diff --git a/common/preamble b/common/preamble
index 0b684cc33..265b5649f 100644
--- a/common/preamble
+++ b/common/preamble
@@ -51,7 +51,23 @@ _begin_fstest()
 
 	. ./common/exit
 	. ./common/rc
-	init_rc
+
+	# Explicitly source the filesystem specific functions the test may need.
+	# This opens the door for template-file based functionality using
+	# function redefinition (e.g. to provide _scratch_mkfs()), rather than
+	# having everything FSTYP specific being implemented in common/rc with
+	# massive case statements.
+	_source_specific_fs $FSTYP
+
+	# Always mount the test device because there are many feature checks (i.e.
+	# _requires_....() functions) that assume the TEST_DIR is mounted. Lots
+	# of tests do not call _require_test to actually mount the test device
+	# first, so if we don't mount the test device then the _requires...
+	# checks are not probing the correct filesystem for support.
+	_check_if_dev_already_mounted $TEST_DEV $TEST_DIR
+	if [ $? -eq 1 ]; then
+		_test_mount || _fail "Cannot mount $TEST_DEV on $TEST_DIR"
+	fi
 
 	# remove previous $seqres.full before test
 	rm -f $seqres.full $seqres.hints
diff --git a/common/rc b/common/rc
index 94c00d890..be6cd92c4 100644
--- a/common/rc
+++ b/common/rc
@@ -2,10 +2,11 @@
 # SPDX-License-Identifier: GPL-2.0+
 # Copyright (c) 2000-2006 Silicon Graphics, Inc.  All Rights Reserved.
 
-. common/config
-
 BC="$(type -P bc)" || BC=
 
+# make sure we have a standard umask
+umask 022
+
 # Don't use sync(1) directly if at all possible. In most cases we only need to
 # sync the fs under test, so we use syncfs if it is supported to prevent
 # disturbance of other tests that may be running concurrently.
@@ -246,17 +247,6 @@ _log_err()
     echo "(see $seqres.full for details)"
 }
 
-# make sure we have a standard umask
-umask 022
-
-# check for correct setup and source the $FSTYP specific functions now
-_source_specific_fs $FSTYP
-
-if [ ! -z "$REPORT_LIST" ]; then
-	. ./common/report
-	_assert_report_list
-fi
-
 _get_filesize()
 {
     stat -c %s "$1"
@@ -4934,6 +4924,8 @@ init_rc()
 		_exit 1
 	fi
 
+	_source_specific_fs $FSTYP
+
 	# if $TEST_DEV is not mounted, mount it now as XFS
 	if [ -z "`_fs_type $TEST_DEV`" ]
 	then
@@ -4973,6 +4965,11 @@ init_rc()
 	# it is supported.
 	$XFS_IO_PROG -i -c quit 2>/dev/null && \
 		export XFS_IO_PROG="$XFS_IO_PROG -i"
+
+	# mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems.
+	# TEST_DIR and QA_CHECK_FS are also checked by mkfs.xfs, but already
+	# exported elsewhere.
+	export TEST_DEV
 }
 
 # get real device path name by following link
@@ -5844,6 +5841,204 @@ _require_program() {
 	_have_program "$1" || _notrun "$tag required"
 }
 
-################################################################################
-# make sure this script returns success
-/bin/true
+_common_mount_opts()
+{
+	case $FSTYP in
+	9p)
+		echo $PLAN9_MOUNT_OPTIONS
+		;;
+	fuse)
+		echo $FUSE_MOUNT_OPTIONS
+		;;
+	xfs)
+		echo $XFS_MOUNT_OPTIONS
+		;;
+	udf)
+		echo $UDF_MOUNT_OPTIONS
+		;;
+	nfs)
+		echo $NFS_MOUNT_OPTIONS
+		;;
+	afs)
+		echo $AFS_MOUNT_OPTIONS
+		;;
+	cifs)
+		echo $CIFS_MOUNT_OPTIONS
+		;;
+	ceph)
+		echo $CEPHFS_MOUNT_OPTIONS
+		;;
+	glusterfs)
+		echo $GLUSTERFS_MOUNT_OPTIONS
+		;;
+	overlay)
+		echo $OVERLAY_MOUNT_OPTIONS
+		;;
+	ext2|ext3|ext4)
+		# acls & xattrs aren't turned on by default on ext$FOO
+		echo "-o acl,user_xattr $EXT_MOUNT_OPTIONS"
+		;;
+	f2fs)
+		echo "-o acl,user_xattr $F2FS_MOUNT_OPTIONS"
+		;;
+	reiser4)
+		# acls & xattrs aren't supported by reiser4
+		echo $REISER4_MOUNT_OPTIONS
+		;;
+	gfs2)
+		# acls aren't turned on by default on gfs2
+		echo "-o acl $GFS2_MOUNT_OPTIONS"
+		;;
+	tmpfs)
+		# We need to specify the size at mount, use 1G by default
+		echo "-o size=1G $TMPFS_MOUNT_OPTIONS"
+		;;
+	ubifs)
+		echo $UBIFS_MOUNT_OPTIONS
+		;;
+	*)
+		;;
+	esac
+}
+
+_mount_opts()
+{
+	export MOUNT_OPTIONS=$(_common_mount_opts)
+}
+
+_test_mount_opts()
+{
+	export TEST_FS_MOUNT_OPTS=$(_common_mount_opts)
+}
+
+_mkfs_opts()
+{
+	case $FSTYP in
+	xfs)
+		export MKFS_OPTIONS=$XFS_MKFS_OPTIONS
+		;;
+	udf)
+		[ ! -z "$udf_fsize" ] && \
+			UDF_MKFS_OPTIONS="$UDF_MKFS_OPTIONS -s $udf_fsize"
+		export MKFS_OPTIONS=$UDF_MKFS_OPTIONS
+		;;
+	nfs)
+		export MKFS_OPTIONS=$NFS_MKFS_OPTIONS
+		;;
+	afs)
+		export MKFS_OPTIONS=$AFS_MKFS_OPTIONS
+		;;
+	cifs)
+		export MKFS_OPTIONS=$CIFS_MKFS_OPTIONS
+		;;
+	ceph)
+		export MKFS_OPTIONS=$CEPHFS_MKFS_OPTIONS
+		;;
+	reiser4)
+		export MKFS_OPTIONS=$REISER4_MKFS_OPTIONS
+		;;
+	gfs2)
+		export MKFS_OPTIONS="$GFS2_MKFS_OPTIONS -O -p lock_nolock"
+		;;
+	jfs)
+		export MKFS_OPTIONS="$JFS_MKFS_OPTIONS -q"
+		;;
+	f2fs)
+		export MKFS_OPTIONS="$F2FS_MKFS_OPTIONS"
+		;;
+	btrfs)
+		export MKFS_OPTIONS="$BTRFS_MKFS_OPTIONS"
+		;;
+	bcachefs)
+		export MKFS_OPTIONS=$BCACHEFS_MKFS_OPTIONS
+		;;
+	*)
+		;;
+	esac
+}
+
+_fsck_opts()
+{
+	case $FSTYP in
+	ext2|ext3|ext4)
+		export FSCK_OPTIONS="-nf"
+		;;
+	reiser*)
+		export FSCK_OPTIONS="--yes"
+		;;
+	f2fs)
+		export FSCK_OPTIONS=""
+		;;
+	*)
+		export FSCK_OPTIONS="-n"
+		;;
+	esac
+}
+
+# check necessary running dependences then source sepcific fs helpers
+_source_specific_fs()
+{
+	local fs=$1
+
+	if [ -z "$fs" ];then
+		fs=$FSTYP
+	fi
+
+	case "$fs" in
+	xfs)
+		[ "$XFS_LOGPRINT_PROG" = "" ] && _fatal "xfs_logprint not found"
+		[ "$XFS_REPAIR_PROG" = "" ] && _fatal "xfs_repair not found"
+		[ "$XFS_DB_PROG" = "" ] && _fatal "xfs_db not found"
+		[ "$MKFS_XFS_PROG" = "" ] && _fatal "mkfs_xfs not found"
+		[ "$XFS_INFO_PROG" = "" ] && _fatal "xfs_info not found"
+
+		. ./common/xfs
+		;;
+	udf)
+		[ "$MKFS_UDF_PROG" = "" ] && _fatal "mkfs_udf/mkudffs not found"
+		;;
+	btrfs)
+		[ "$MKFS_BTRFS_PROG" = "" ] && _fatal "mkfs.btrfs not found"
+
+		. ./common/btrfs
+		;;
+	ext4)
+		[ "$MKFS_EXT4_PROG" = "" ] && _fatal "mkfs.ext4 not found"
+		. ./common/ext4
+		;;
+	ext2|ext3)
+		. ./common/ext4
+		;;
+	f2fs)
+		[ "$MKFS_F2FS_PROG" = "" ] && _fatal "mkfs.f2fs not found"
+		;;
+	nfs)
+		. ./common/nfs
+		;;
+	afs)
+		;;
+	cifs)
+		;;
+	9p)
+		;;
+	fuse)
+		;;
+	ceph)
+		. ./common/ceph
+		;;
+	glusterfs)
+		;;
+	overlay)
+		. ./common/overlay
+		;;
+	reiser4)
+		[ "$MKFS_REISER4_PROG" = "" ] && _fatal "mkfs.reiser4 not found"
+		;;
+	pvfs2)
+		;;
+	ubifs)
+		[ "$UBIUPDATEVOL_PROG" = "" ] && _fatal "ubiupdatevol not found"
+		. ./common/ubifs
+		;;
+	esac
+}
-- 
2.45.2



* Re: [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation
  2025-04-17  3:00 ` [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation Dave Chinner
@ 2025-05-10 14:08   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-10 14:08 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The sourcing of common/rc still causes code to be run, partially
> because it sources common/config and partially because there is some
> in-line code amongst all the function definitions inside common/rc.
> 
> This is messy, and re-sourcing those files also does an awful
> lot of setup work that isn't actually required.
> 
> common/config only needs to be included once - everything that
> scripts then depend on should be exported by it, and hence it should
> only be included once from check/check-parallel to set up all the
> environmental parameters for the entire run.
> 
> common/rc also only needs to be included once per context, but it
> does not need to directly include common/config nor does it need to
> run init_rc in each individual test context.
> 
> Separate out this mess. Include common/config directly where needed
> and only use it to set up the environment. Move all the code that is
> in common/config to common/rc so that common/config is not needed
> for any purpose other than setting up the initial environment.
> Move the initialisation functions to the scripts that include
> common/config.
> 
> Config file and config section parsing can be run directly from check
> and/or check-parallel; this is not needed for every context that
> needs to know what XFS_MKFS_PROG is set to...
> 
> Similarly, include common/rc only once, and only call init_rc or
> _source_specific_fs() from the contexts that actually need that code
> to be run.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check           |  23 ++---
>  common/config   | 212 --------------------------------------------
>  common/preamble |  18 +++-
>  common/rc       | 227 ++++++++++++++++++++++++++++++++++++++++++++----
>  4 files changed, 240 insertions(+), 240 deletions(-)
> 
> diff --git a/check b/check
> index 0b489cb4b..fea86f7b9 100755
> --- a/check
> +++ b/check
> @@ -46,6 +46,9 @@ rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
>  . ./common/exit
>  . ./common/test_names
>  . ./common/test_list
> +. ./common/config
> +. ./common/config-sections
> +. ./common/rc
>  
>  usage()
>  {
> @@ -183,15 +186,17 @@ while [ $# -gt 0 ]; do
>  	shift
>  done
>  
> -# we need common/rc, that also sources common/config. We need to source it
> -# after processing args, overlay needs FSTYP set before sourcing common/config
> -if ! . ./common/rc; then
> -	echo "check: failed to source common/rc"
> -	exit 1
> -fi
> -
> +# now we have done argument parsing, overlay has FSTYP set and we can now
> +# start processing the config files and setting up devices.
> +_config_section_setup
> +_canonicalize_devices
>  init_rc
>  
> +if [ ! -z "$REPORT_LIST" ]; then
> +	. ./common/report
> +	_assert_report_list
> +fi
> +
>  # If the test config specified a soak test duration, see if there are any
>  # unit suffixes that need converting to an integer seconds count.
>  if [ -n "$SOAK_DURATION" ]; then
> @@ -553,10 +558,6 @@ function run_section()
>  			status=1
>  			exit
>  		fi
> -		# Previous FSTYP derived from TEST_DEV could be changed, source
> -		# common/rc again with correct FSTYP to get FSTYP specific configs,
> -		# e.g. common/xfs
> -		. common/rc
>  		_tl_prepare_test_list
>  	elif [ "$OLD_TEST_FS_MOUNT_OPTS" != "$TEST_FS_MOUNT_OPTS" ]; then
>  		# Unmount TEST_DEV to apply the updated mount options.
> diff --git a/common/config b/common/config
> index f90a66862..b93a6c0d3 100644
> --- a/common/config
> +++ b/common/config
> @@ -41,7 +41,6 @@
>  
>  . common/test_names
>  . common/exit
> -. common/config-sections
>  
>  # all tests should use a common language setting to prevent golden
>  # output mismatches.
> @@ -342,214 +341,3 @@ if [ -x /usr/sbin/selinuxenabled ] && /usr/sbin/selinuxenabled; then
>  	: ${SELINUX_MOUNT_OPTIONS:="-o context=$(stat -c %C /)"}
>  	export SELINUX_MOUNT_OPTIONS
>  fi
> -
> -_common_mount_opts()
> -{
> -	case $FSTYP in
> -	9p)
> -		echo $PLAN9_MOUNT_OPTIONS
> -		;;
> -	fuse)
> -		echo $FUSE_MOUNT_OPTIONS
> -		;;
> -	xfs)
> -		echo $XFS_MOUNT_OPTIONS
> -		;;
> -	udf)
> -		echo $UDF_MOUNT_OPTIONS
> -		;;
> -	nfs)
> -		echo $NFS_MOUNT_OPTIONS
> -		;;
> -	afs)
> -		echo $AFS_MOUNT_OPTIONS
> -		;;
> -	cifs)
> -		echo $CIFS_MOUNT_OPTIONS
> -		;;
> -	ceph)
> -		echo $CEPHFS_MOUNT_OPTIONS
> -		;;
> -	glusterfs)
> -		echo $GLUSTERFS_MOUNT_OPTIONS
> -		;;
> -	overlay)
> -		echo $OVERLAY_MOUNT_OPTIONS
> -		;;
> -	ext2|ext3|ext4)
> -		# acls & xattrs aren't turned on by default on ext$FOO
> -		echo "-o acl,user_xattr $EXT_MOUNT_OPTIONS"
> -		;;
> -	f2fs)
> -		echo "-o acl,user_xattr $F2FS_MOUNT_OPTIONS"
> -		;;
> -       reiser4)
> -		# acls & xattrs aren't supported by reiser4
> -		echo $REISER4_MOUNT_OPTIONS
> -		;;
> -	gfs2)
> -		# acls aren't turned on by default on gfs2
> -		echo "-o acl $GFS2_MOUNT_OPTIONS"
> -		;;
> -	tmpfs)
> -		# We need to specify the size at mount, use 1G by default
> -		echo "-o size=1G $TMPFS_MOUNT_OPTIONS"
> -		;;
> -	ubifs)
> -		echo $UBIFS_MOUNT_OPTIONS
> -		;;
> -	*)
> -		;;
> -	esac
> -}
> -
> -_mount_opts()
> -{
> -	export MOUNT_OPTIONS=$(_common_mount_opts)
> -}
> -
> -_test_mount_opts()
> -{
> -	export TEST_FS_MOUNT_OPTS=$(_common_mount_opts)
> -}
> -
> -_mkfs_opts()
> -{
> -	case $FSTYP in
> -	xfs)
> -		export MKFS_OPTIONS=$XFS_MKFS_OPTIONS
> -		;;
> -	udf)
> -		[ ! -z "$udf_fsize" ] && \
> -			UDF_MKFS_OPTIONS="$UDF_MKFS_OPTIONS -s $udf_fsize"
> -		export MKFS_OPTIONS=$UDF_MKFS_OPTIONS
> -		;;
> -	nfs)
> -		export MKFS_OPTIONS=$NFS_MKFS_OPTIONS
> -		;;
> -	afs)
> -		export MKFS_OPTIONS=$AFS_MKFS_OPTIONS
> -		;;
> -	cifs)
> -		export MKFS_OPTIONS=$CIFS_MKFS_OPTIONS
> -		;;
> -	ceph)
> -		export MKFS_OPTIONS=$CEPHFS_MKFS_OPTIONS
> -		;;
> -       reiser4)
> -		export MKFS_OPTIONS=$REISER4_MKFS_OPTIONS
> -		;;
> -	gfs2)
> -		export MKFS_OPTIONS="$GFS2_MKFS_OPTIONS -O -p lock_nolock"
> -		;;
> -	jfs)
> -		export MKFS_OPTIONS="$JFS_MKFS_OPTIONS -q"
> -		;;
> -	f2fs)
> -		export MKFS_OPTIONS="$F2FS_MKFS_OPTIONS"
> -		;;
> -	btrfs)
> -		export MKFS_OPTIONS="$BTRFS_MKFS_OPTIONS"
> -		;;
> -	bcachefs)
> -		export MKFS_OPTIONS=$BCACHEFS_MKFS_OPTIONS
> -		;;
> -	*)
> -		;;
> -	esac
> -}
> -
> -_fsck_opts()
> -{
> -	case $FSTYP in
> -	ext2|ext3|ext4)
> -		export FSCK_OPTIONS="-nf"
> -		;;
> -	reiser*)
> -		export FSCK_OPTIONS="--yes"
> -		;;
> -	f2fs)
> -		export FSCK_OPTIONS=""
> -		;;
> -	*)
> -		export FSCK_OPTIONS="-n"
> -		;;
> -	esac
> -}
> -
> -# check necessary running dependences then source sepcific fs helpers
> -_source_specific_fs()
> -{
> -	local fs=$1
> -
> -	if [ -z "$fs" ];then
> -		fs=$FSTYP
> -	fi
> -
> -	case "$fs" in
> -	xfs)
> -		[ "$XFS_LOGPRINT_PROG" = "" ] && _fatal "xfs_logprint not found"
> -		[ "$XFS_REPAIR_PROG" = "" ] && _fatal "xfs_repair not found"
> -		[ "$XFS_DB_PROG" = "" ] && _fatal "xfs_db not found"
> -		[ "$MKFS_XFS_PROG" = "" ] && _fatal "mkfs_xfs not found"
> -		[ "$XFS_INFO_PROG" = "" ] && _fatal "xfs_info not found"
> -
> -		. ./common/xfs
> -		;;
> -	udf)
> -		[ "$MKFS_UDF_PROG" = "" ] && _fatal "mkfs_udf/mkudffs not found"
> -		;;
> -	btrfs)
> -		[ "$MKFS_BTRFS_PROG" = "" ] && _fatal "mkfs.btrfs not found"
> -
> -		. ./common/btrfs
> -		;;
> -	ext4)
> -		[ "$MKFS_EXT4_PROG" = "" ] && _fatal "mkfs.ext4 not found"
> -		. ./common/ext4
> -		;;
> -	ext2|ext3)
> -		. ./common/ext4
> -		;;
> -	f2fs)
> -		[ "$MKFS_F2FS_PROG" = "" ] && _fatal "mkfs.f2fs not found"
> -		;;
> -	nfs)
> -		. ./common/nfs
> -		;;
> -	afs)
> -		;;
> -	cifs)
> -		;;
> -	9p)
> -		;;
> -	fuse)
> -		;;
> -	ceph)
> -		. ./common/ceph
> -		;;
> -	glusterfs)
> -		;;
> -	overlay)
> -		. ./common/overlay
> -		;;
> -	reiser4)
> -		[ "$MKFS_REISER4_PROG" = "" ] && _fatal "mkfs.reiser4 not found"
> -		;;
> -	pvfs2)
> -		;;
> -	ubifs)
> -		[ "$UBIUPDATEVOL_PROG" = "" ] && _fatal "ubiupdatevol not found"
> -		. ./common/ubifs
> -		;;
> -	esac
> -}
> -
> -_config_section_setup
> -_canonicalize_devices
> -# mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems. TEST_DIR
> -# and QA_CHECK_FS are also checked by mkfs.xfs, but already exported elsewhere.
> -export TEST_DEV
> -
> -# make sure this script returns success
> -/bin/true
> diff --git a/common/preamble b/common/preamble
> index 0b684cc33..265b5649f 100644
> --- a/common/preamble
> +++ b/common/preamble
> @@ -51,7 +51,23 @@ _begin_fstest()
>  
>  	. ./common/exit
>  	. ./common/rc
> -	init_rc
> +
> +	# Explicitly source the filesystem specific functions the test may need.
> +	# This opens the door for template-file based functionality using
> +	# function redefinition (e.g. to provide _scratch_mkfs()), rather than
> +	# having everything FSTYP specific being implemented in common/rc with
> +	# massive case statements.
> +	_source_specific_fs $FSTYP
> +
> +	# Always mount the test device because there are many feature checks (i.e.
> +	# _requires_....() functions) that assume the TEST_DIR is mounted. Lots
> +	# of tests do not call _require_test to actually mount the test device
> +	# first, so if we don't mount the test device then the _requires...
> +	# checks are not probing the correct filesystem for support.
> +	_check_if_dev_already_mounted $TEST_DEV $TEST_DIR
> +	if [ $? -eq 1 ]; then
_check_if_dev_already_mounted can return 1 (not mounted) or 2 (wrong mount). In that case, shouldn't
we check for $? -ne 0?

Also, TEST_DEV is always mounted by "check", right? Is this just a paranoid check?
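
To make the question concrete, here is a rough sketch of handling the three return values
separately. It is illustrative only: fake_check_mounted is a hypothetical stub standing in for
_check_if_dev_already_mounted, and treating a return of 2 as fatal is an assumption, not
something the patch does.

```shell
#!/bin/bash
# Sketch only: fake_check_mounted and handle_mount_state are hypothetical
# names. The stub simply returns the code it is given; the real helper
# inspects the mount table.
fake_check_mounted()
{
	return "$1"
}

# Map each return value to a distinct action instead of testing for 1 only.
handle_mount_state()
{
	fake_check_mounted "$1"
	case $? in
	0)	echo "already mounted" ;;	# nothing to do
	1)	echo "mount it" ;;		# not mounted: mount the device
	*)	echo "fail" ;;			# mounted elsewhere: assumed fatal
	esac
}

for rc in 0 1 2; do
	handle_mount_state "$rc"
done
```

With a plain "$? -ne 0" test, the third case would fall into the mount path as well, which may
or may not be what is wanted.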
> +		_test_mount || _fail "Cannot mount $TEST_DEV on $TEST_DIR"
> +	fi
>  
>  	# remove previous $seqres.full before test
>  	rm -f $seqres.full $seqres.hints
> diff --git a/common/rc b/common/rc
> index 94c00d890..be6cd92c4 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2,10 +2,11 @@
>  # SPDX-License-Identifier: GPL-2.0+
>  # Copyright (c) 2000-2006 Silicon Graphics, Inc.  All Rights Reserved.
>  
> -. common/config
> -
>  BC="$(type -P bc)" || BC=
>  
> +# make sure we have a standard umask
> +umask 022
> +
>  # Don't use sync(1) directly if at all possible. In most cases we only need to
>  # sync the fs under test, so we use syncfs if it is supported to prevent
>  # disturbance of other tests that may be running concurrently.
> @@ -246,17 +247,6 @@ _log_err()
>      echo "(see $seqres.full for details)"
>  }
>  
> -# make sure we have a standard umask
> -umask 022
> -
> -# check for correct setup and source the $FSTYP specific functions now
> -_source_specific_fs $FSTYP
> -
> -if [ ! -z "$REPORT_LIST" ]; then
> -	. ./common/report
> -	_assert_report_list
> -fi
> -
>  _get_filesize()
>  {
>      stat -c %s "$1"
> @@ -4934,6 +4924,8 @@ init_rc()
>  		_exit 1
>  	fi
>  
> +	_source_specific_fs $FSTYP
> +
>  	# if $TEST_DEV is not mounted, mount it now as XFS
>  	if [ -z "`_fs_type $TEST_DEV`" ]
>  	then
> @@ -4973,6 +4965,11 @@ init_rc()
>  	# it is supported.
>  	$XFS_IO_PROG -i -c quit 2>/dev/null && \
>  		export XFS_IO_PROG="$XFS_IO_PROG -i"
> +
> +	# mkfs.xfs checks for TEST_DEV before permitting < 300M filesystems.
> +	# TEST_DIR and QA_CHECK_FS are also checked by mkfs.xfs, but already
> +	# exported elsewhere.
> +	export TEST_DEV
>  }
>  
>  # get real device path name by following link
> @@ -5844,6 +5841,204 @@ _require_program() {
>  	_have_program "$1" || _notrun "$tag required"
>  }
>  
> -################################################################################
> -# make sure this script returns success
> -/bin/true
> +_common_mount_opts()
> +{
> +	case $FSTYP in
> +	9p)
> +		echo $PLAN9_MOUNT_OPTIONS
> +		;;
> +	fuse)
> +		echo $FUSE_MOUNT_OPTIONS
> +		;;
> +	xfs)
> +		echo $XFS_MOUNT_OPTIONS
> +		;;
> +	udf)
> +		echo $UDF_MOUNT_OPTIONS
> +		;;
> +	nfs)
> +		echo $NFS_MOUNT_OPTIONS
> +		;;
> +	afs)
> +		echo $AFS_MOUNT_OPTIONS
> +		;;
> +	cifs)
> +		echo $CIFS_MOUNT_OPTIONS
> +		;;
> +	ceph)
> +		echo $CEPHFS_MOUNT_OPTIONS
> +		;;
> +	glusterfs)
> +		echo $GLUSTERFS_MOUNT_OPTIONS
> +		;;
> +	overlay)
> +		echo $OVERLAY_MOUNT_OPTIONS
> +		;;
> +	ext2|ext3|ext4)
> +		# acls & xattrs aren't turned on by default on ext$FOO
> +		echo "-o acl,user_xattr $EXT_MOUNT_OPTIONS"
> +		;;
> +	f2fs)
> +		echo "-o acl,user_xattr $F2FS_MOUNT_OPTIONS"
> +		;;
> +	reiser4)
> +		# acls & xattrs aren't supported by reiser4
> +		echo $REISER4_MOUNT_OPTIONS
> +		;;
> +	gfs2)
> +		# acls aren't turned on by default on gfs2
> +		echo "-o acl $GFS2_MOUNT_OPTIONS"
> +		;;
> +	tmpfs)
> +		# We need to specify the size at mount, use 1G by default
> +		echo "-o size=1G $TMPFS_MOUNT_OPTIONS"
> +		;;
> +	ubifs)
> +		echo $UBIFS_MOUNT_OPTIONS
> +		;;
> +	*)
> +		;;
> +	esac
> +}
> +
> +_mount_opts()
> +{
> +	export MOUNT_OPTIONS=$(_common_mount_opts)
> +}
> +
> +_test_mount_opts()
> +{
> +	export TEST_FS_MOUNT_OPTS=$(_common_mount_opts)
> +}
> +
> +_mkfs_opts()
> +{
> +	case $FSTYP in
> +	xfs)
> +		export MKFS_OPTIONS=$XFS_MKFS_OPTIONS
> +		;;
> +	udf)
> +		[ ! -z "$udf_fsize" ] && \
> +			UDF_MKFS_OPTIONS="$UDF_MKFS_OPTIONS -s $udf_fsize"
> +		export MKFS_OPTIONS=$UDF_MKFS_OPTIONS
> +		;;
> +	nfs)
> +		export MKFS_OPTIONS=$NFS_MKFS_OPTIONS
> +		;;
> +	afs)
> +		export MKFS_OPTIONS=$AFS_MKFS_OPTIONS
> +		;;
> +	cifs)
> +		export MKFS_OPTIONS=$CIFS_MKFS_OPTIONS
> +		;;
> +	ceph)
> +		export MKFS_OPTIONS=$CEPHFS_MKFS_OPTIONS
> +		;;
> +	reiser4)
> +		export MKFS_OPTIONS=$REISER4_MKFS_OPTIONS
> +		;;
> +	gfs2)
> +		export MKFS_OPTIONS="$GFS2_MKFS_OPTIONS -O -p lock_nolock"
> +		;;
> +	jfs)
> +		export MKFS_OPTIONS="$JFS_MKFS_OPTIONS -q"
> +		;;
> +	f2fs)
> +		export MKFS_OPTIONS="$F2FS_MKFS_OPTIONS"
> +		;;
> +	btrfs)
> +		export MKFS_OPTIONS="$BTRFS_MKFS_OPTIONS"
> +		;;
> +	bcachefs)
> +		export MKFS_OPTIONS=$BCACHEFS_MKFS_OPTIONS
> +		;;
> +	*)
> +		;;
> +	esac
> +}
> +
> +_fsck_opts()
> +{
> +	case $FSTYP in
> +	ext2|ext3|ext4)
> +		export FSCK_OPTIONS="-nf"
> +		;;
> +	reiser*)
> +		export FSCK_OPTIONS="--yes"
> +		;;
> +	f2fs)
> +		export FSCK_OPTIONS=""
> +		;;
> +	*)
> +		export FSCK_OPTIONS="-n"
> +		;;
> +	esac
> +}
> +
> +# check necessary running dependences then source sepcific fs helpers
> +_source_specific_fs()
> +{
> +	local fs=$1
> +
> +	if [ -z "$fs" ];then
> +		fs=$FSTYP
> +	fi
> +
> +	case "$fs" in
> +	xfs)
> +		[ "$XFS_LOGPRINT_PROG" = "" ] && _fatal "xfs_logprint not found"
> +		[ "$XFS_REPAIR_PROG" = "" ] && _fatal "xfs_repair not found"
> +		[ "$XFS_DB_PROG" = "" ] && _fatal "xfs_db not found"
> +		[ "$MKFS_XFS_PROG" = "" ] && _fatal "mkfs_xfs not found"
> +		[ "$XFS_INFO_PROG" = "" ] && _fatal "xfs_info not found"
> +
> +		. ./common/xfs
> +		;;
> +	udf)
> +		[ "$MKFS_UDF_PROG" = "" ] && _fatal "mkfs_udf/mkudffs not found"
> +		;;
> +	btrfs)
> +		[ "$MKFS_BTRFS_PROG" = "" ] && _fatal "mkfs.btrfs not found"
> +
> +		. ./common/btrfs
> +		;;
> +	ext4)
> +		[ "$MKFS_EXT4_PROG" = "" ] && _fatal "mkfs.ext4 not found"
> +		. ./common/ext4
> +		;;
> +	ext2|ext3)
> +		. ./common/ext4
> +		;;
> +	f2fs)
> +		[ "$MKFS_F2FS_PROG" = "" ] && _fatal "mkfs.f2fs not found"
> +		;;
> +	nfs)
> +		. ./common/nfs
> +		;;
> +	afs)
> +		;;
> +	cifs)
> +		;;
> +	9p)
> +		;;
> +	fuse)
> +		;;
> +	ceph)
> +		. ./common/ceph
> +		;;
> +	glusterfs)
> +		;;
> +	overlay)
> +		. ./common/overlay
> +		;;
> +	reiser4)
> +		[ "$MKFS_REISER4_PROG" = "" ] && _fatal "mkfs.reiser4 not found"
> +		;;
> +	pvfs2)
> +		;;
> +	ubifs)
> +		[ "$UBIUPDATEVOL_PROG" = "" ] && _fatal "ubiupdatevol not found"
> +		. ./common/ubifs
> +		;;
> +	esac
> +}
So, is the logic behind moving the above functions from common/config to common/rc as follows?
The configuration work in common/config only needs to be done once, so common/config is included
once per check/check-parallel context. The functions above (_source_specific_fs(), _fsck_opts(),
etc.) are needed by each test, so they are factored into common/rc, which is included per test,
instead of staying in common/config, which does not need to be sourced for each test. Am I correct?
--NR



* [PATCH 15/28] check-parallel: de-batch test execution
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (13 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-09 13:16   ` Nirjhar Roy
  2025-04-17  3:00 ` [PATCH 16/28] check-parallel: run sections directly Dave Chinner
                   ` (12 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

To improve how check-parallel runs tests, it needs to run tests
directly from the runner threads. We currently batch them based on
runtime before we execute any tests, but this results in runner 0
always having a test list with runtime longer than the test list for
runner N.

As a result, we can end up with higher numbered runners finishing
all their tests before runner 0 has even finished the first test it
was given to run. Hence we end up with check-parallel starting with
maximum concurrency, but the test concurrency reduces as the run
goes on.

To fix this, we need a dynamic test list such that each runner only
needs to be scheduled to run a single test at a time. When they have
finished the current test, they can pop the next test to run off the
time ordered stack and execute that. Test runners therefore won't
stop running until there are no more tests to run, maximising
concurrency across the entire test run.

To do this, we first need a test list mechanism that is safe for
concurrent destacking from multiple test runners. We place the
test list in a temporary file, then use file locks to serialise
access to the temporary file.

We order the list in the test file from lowest runtime to
highest. This means that running tests from longest to shortest
runtime destacks from the end of the file. This means that the next
test to run is always the last line of the file and we can simply
use truncation based mechanisms to consume the test during
destacking.
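
As a minimal sketch of that destacking scheme (not the code in this
patch): pop_last_test, demo_runner and the temporary file names below
are hypothetical, and the sketch assumes flock(1) from util-linux and
GNU sed are available. It removes the tail line by position rather
than by matching the test name.

```shell
#!/bin/bash
# Sketch only: pop_last_test, demo_runner and the demo_* files are
# hypothetical names, not part of check-parallel.

# Print and remove the last line of list file $1. An exclusive lock on the
# separate lock file $2 is held across the whole read-and-truncate step.
pop_last_test()
{
	local list=$1 lock=$2

	(
		flock 9
		next=$(tail -n 1 "$list")
		if [ -n "$next" ]; then
			# '$d' deletes only the final line of the file.
			sed -i '$d' "$list"
		fi
		echo "$next"
	) 9>>"$lock"
}

# Stand-in for a runner: keep popping until the list is empty.
demo_runner()
{
	local id=$1 t

	t=$(pop_last_test "$demo_list" "$demo_lock")
	while [ -n "$t" ]; do
		echo "runner $id: would run $t"
		t=$(pop_last_test "$demo_list" "$demo_lock")
	done
}

demo_list=$(mktemp)
demo_lock=$(mktemp)

# Lowest runtime first, so the longest running test sits at the tail.
printf '%s\n' generic/001 generic/002 generic/003 generic/004 > "$demo_list"

demo_runner 0 &
demo_runner 1 &
wait

rm -f "$demo_list" "$demo_lock"
```

Each test name is handed to exactly one runner. Which runner gets
which test depends on scheduling, so the output order varies between
runs.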

Running tests individually via check like this is inefficient as
there is a lot of check setup and initialisation overhead.  However,
by increasing the utilisation of the test runner threads, overall
runtime of check-parallel does not increase with this change.
Reduction of this repeated overhead will also be addressed in future
patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 75 +++++++++++++++++++++++++++++---------------------
 1 file changed, 43 insertions(+), 32 deletions(-)

diff --git a/check-parallel b/check-parallel
index 6fc86fb92..e2cf2c8d0 100755
--- a/check-parallel
+++ b/check-parallel
@@ -18,6 +18,7 @@ run_section=""
 iam="check-parallel"
 
 tmp=/tmp/check-parallel.$$
+test_list="$tmp.test_list"
 
 . ./common/exit
 . ./common/test_names
@@ -150,9 +151,6 @@ if [ -d "$basedir/runner-0/" ]; then
 	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 fi
 
-_tl_prepare_test_list
-_tl_strip_test_list
-
 # grab all previously run tests and order them from highest runtime to lowest
 # We are going to try to run the longer tests first, hopefully so we can avoid
 # massive thundering herds trying to run lots of really short tests in parallel
@@ -198,22 +196,22 @@ if ! $_tl_randomise -a ! $_tl_exact_order; then
 	fi
 fi
 
-# split the list amongst N runners
-split_runner_list()
+# Grab the next test to be run from the tail of the file.
+# Returns an empty string if there are no tests remaining to run.
+# File operations are run under flock so concurrent gets are serialised against
+# each other.
+get_next_test()
 {
-	local ix
-	local rx
-	local -a _list=( $_tl_tests )
-	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
-		seq="${_list[$ix]}"
-		rx=$((ix % $runners))
-		if ! _tl_expunge_test $seq; then
-			runner_list[$rx]+="${_list[$ix]} "
-		fi
-		#echo $seq
-	done
+	local test=
+
+	flock 99
+	test=$(tail -1 $test_list)
+	sed -i "\,$test,d" $test_list
+	flock -u 99
+	echo $test
 }
 
+
 _create_loop_device()
 {
         local file=$1 dev
@@ -240,6 +238,8 @@ _destroy_loop_device()
 
 runner_go()
 {
+	exec 99<>$tmp.test_list_lock
+
 	local id=$1
 	local me=$basedir/runner-$id
 	local _test=$me/test.img
@@ -250,6 +250,7 @@ runner_go()
 	local _scratch_log=$me/scratch-log.img
 	local _logwrites=$me/logwrites.img
 	local _results=$me/results-$2
+	local test_to_run=$(get_next_test)
 
 	mkdir -p $me
 
@@ -291,7 +292,15 @@ runner_go()
 	# Similarly, we need to run check in it's own PID namespace so that
 	# operations like pkill only affect the runner instance, not globally
 	# kill processes from other check instances.
-	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
+	while [ -n "$test_to_run" ]; do
+		echo "Runner $id: running test $test_to_run"
+		unset FSTESTS_ISOL
+		if ! _tl_expunge_test $test_to_run; then
+			tools/run_privatens ./check $run_section $test_to_run >> $me/log 2>&1
+		fi
+
+		test_to_run=$(get_next_test)
+	done
 
 	wait
 	sleep 1
@@ -320,20 +329,32 @@ cleanup()
 	umount -R $basedir/*/test 2> /dev/null
 	umount -R $basedir/*/scratch 2> /dev/null
 	losetup --detach-all
+	rm -rf $tmp.*
 }
 
 trap "cleanup; exit" HUP INT QUIT TERM
 
 _config_setup_parallel
 
-split_runner_list
+_tl_setup_exclude_group "unreliable_in_parallel"
+_tl_prepare_test_list
+_tl_strip_test_list
+
+if ! $_tl_randomise -a ! $_tl_exact_order; then
+	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
+		time_order_test_list
+	fi
+fi
+
+# reverse the order of tests so that the get_next_test() can pull from the file
+# tail rather than the head.
+echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
 if [ -n "$show_test_list" ]; then
 	echo Time ordered test list:
-	echo $_tl_tests
-	echo
+	cat $test_list
+	exit 0
 fi
 
-
 # Each parallel test runner needs to only see it's own mount points. If we
 # leave the basedir as shared, then all tests see all mounts and then we get
 # mount propagation issues cropping up. For example, cloning a new mount
@@ -349,20 +370,10 @@ mount --make-private $basedir
 
 now=`date +%Y-%m-%d-%H:%M:%S`
 for ((i = 0; i < $runners; i++)); do
-
-	if [ -n "$show_test_list" ]; then
-		echo "Runner $i: ${runner_list[$i]}"
-	else
-		runner_go $i $now &
-	fi
-
+	runner_go $i $now &
 done;
 wait
 
-if [ -n "$show_test_list" ]; then
-	exit 0
-fi
-
 echo -n "Tests run: "
 grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
 
-- 
2.45.2
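
The list ordering step in the final hunk above can be tried in isolation.
What follows is only a sketch: the test names and their ordering are
placeholders, and list_file is a hypothetical stand-in for $test_list. It
assumes the list has already been sorted longest-runtime-first, so that
reversing it leaves the longest test on the last line of the file.

```shell
#!/bin/bash
# Sketch only: the names and ordering below are placeholders.
# Assume the list has already been sorted longest-runtime-first.
tests_longest_first="generic/475 xfs/013 generic/001"

list_file=$(mktemp)
trap 'rm -f "$list_file"' EXIT

# One test per line, then reverse so the longest test ends up on the last line.
printf '%s\n' $tests_longest_first | tac > "$list_file"

# The next test to run is always the last line of the file.
tail -n 1 "$list_file"
```

Keeping the next test at the tail means a consumer only ever has to look at,
and remove, the final line of the file.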


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 15/28] check-parallel: de-batch test execution
  2025-04-17  3:00 ` [PATCH 15/28] check-parallel: de-batch test execution Dave Chinner
@ 2025-05-09 13:16   ` Nirjhar Roy
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy @ 2025-05-09 13:16 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> To improve how check-parallel runs tests, it needs to run tests
> directly from the runner threads. We currently batch them based on
> runtime before we execute any tests, but this results in runner 0
> always having a test list with runtime longer than the test list for
> runner N.
> 
> As a result, we can end up with higher numbered runners finishing
> all their tests before runner 0 has even finished the first test it
> was given to run. Hence we end up with check-parallel starting with
> maximum concurrency, but the test concurrency reduces as the run
> goes on.
> 
> To fix this, we need a dynamic test list such that each runner only
> needs to be scheduled to run a single test at a time. When they have
> finished the current test, they can pop the next test to run off the
> time ordered stack and execute that. Hence test runners won't stop
> running until there are no more tests to run, hence maximising
> concurrency across the entire test run.
> 
> To do this, we first need a test list mechanism that is safe for
> concurrent destacking from multiple test runners. We place the
> test list in a temporary file, then use file locks to serialise
> access to the temporary file.
> 
> We order the list in the test file from lowest runtime to
> highest. This means that running tests from longest to shortest
> runtime destacks from the end of the file. This means that the next
> test to run is always the last line of the file and we can simply
> use truncation based mechanisms to consume the test during
> destacking.
> 
> Running tests individually via check like this is inefficient as
> there is a lot of check setup and initialisation overhead.  However,
> by increasing the utilisation of the test runner threads, overall
> runtime of check-parallel does not increase with this change.
> Reduction of this repeated overhead will also be addressed in future
> patches.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 75 +++++++++++++++++++++++++++++---------------------
>  1 file changed, 43 insertions(+), 32 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 6fc86fb92..e2cf2c8d0 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -18,6 +18,7 @@ run_section=""
>  iam="check-parallel"
>  
>  tmp=/tmp/check-parallel.$$
> +test_list="$tmp.test_list"
>  
>  . ./common/exit
>  . ./common/test_names
> @@ -150,9 +151,6 @@ if [ -d "$basedir/runner-0/" ]; then
>  	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  fi
>  
> -_tl_prepare_test_list
> -_tl_strip_test_list
> -
>  # grab all previously run tests and order them from highest runtime to lowest
>  # We are going to try to run the longer tests first, hopefully so we can avoid
>  # massive thundering herds trying to run lots of really short tests in parallel
> @@ -198,22 +196,22 @@ if ! $_tl_randomise -a ! $_tl_exact_order; then
>  	fi
>  fi
>  
> -# split the list amongst N runners
> -split_runner_list()
> +# Grab the next test to be run from the tail of the file.
> +# Returns an empty string if there is no tests remaining to run.
> +# File operations are run under flock so concurrent gets are serialised against
> +# each other.
> +get_next_test()
>  {
> -	local ix
> -	local rx
> -	local -a _list=( $_tl_tests )
> -	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
> -		seq="${_list[$ix]}"
> -		rx=$((ix % $runners))
> -		if ! _tl_expunge_test $seq; then
> -			runner_list[$rx]+="${_list[$ix]} "
> -		fi
> -		#echo $seq
> -	done
> +	local test=
> +
> +	flock 99
> +	test=$(tail -1 $test_list)
> +	sed -i "\,$test,d" $test_list
> +	flock -u 99
> +	echo $test
>  }
>  
> +
>  _create_loop_device()
>  {
>          local file=$1 dev
> @@ -240,6 +238,8 @@ _destroy_loop_device()
>  
>  runner_go()
>  {
> +	exec 99<>$tmp.test_list_lock
> +
>  	local id=$1
>  	local me=$basedir/runner-$id
>  	local _test=$me/test.img
> @@ -250,6 +250,7 @@ runner_go()
>  	local _scratch_log=$me/scratch-log.img
>  	local _logwrites=$me/logwrites.img
>  	local _results=$me/results-$2
> +	local test_to_run=$(get_next_test)
>  
>  	mkdir -p $me
>  
> @@ -291,7 +292,15 @@ runner_go()
>  	# Similarly, we need to run check in it's own PID namespace so that
>  	# operations like pkill only affect the runner instance, not globally
>  	# kill processes from other check instances.
> -	tools/run_privatens ./check $run_section -x unreliable_in_parallel --exact-order ${runner_list[$id]} >> $me/log 2>&1
> +	while [ -n "$test_to_run" ]; do
> +		echo "Runner $id: running test $test_to_run"
> +		unset FSTESTS_ISOL
> +		if ! _tl_expunge_test $test_to_run; then
> +			tools/run_privatens ./check $run_section $test_to_run >> $me/log 2>&1
> +		fi
> +
> +		test_to_run=$(get_next_test)
> +	done
>  
>  	wait
>  	sleep 1
> @@ -320,20 +329,32 @@ cleanup()
>  	umount -R $basedir/*/test 2> /dev/null
>  	umount -R $basedir/*/scratch 2> /dev/null
>  	losetup --detach-all
> +	rm -rf $tmp.*
>  }
>  
>  trap "cleanup; exit" HUP INT QUIT TERM
>  
>  _config_setup_parallel
>  
> -split_runner_list
> +_tl_setup_exclude_group "unreliable_in_parallel"
> +_tl_prepare_test_list
> +_tl_strip_test_list
> +
> +if ! $_tl_randomise -a ! $_tl_exact_order; then
> +	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
> +		time_order_test_list
> +	fi
> +fi
> +
> +# reverse the order of tests so that the get_next_test() can pull from the file
> +# tail rather than the head.
> +echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
>  if [ -n "$show_test_list" ]; then
>  	echo Time ordered test list:
> -	echo $_tl_tests
> -	echo
> +	cat $test_list
> +	exit 0
_exit 0?

Looks good otherwise. I think this change will extract the maximum concurrency, since we are more or
less uniformly distributing the long-running tests instead of flooding runner-0 with the top N
slowest tests.
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
--NR
>  fi
>  
> -
>  # Each parallel test runner needs to only see it's own mount points. If we
>  # leave the basedir as shared, then all tests see all mounts and then we get
>  # mount propagation issues cropping up. For example, cloning a new mount
> @@ -349,20 +370,10 @@ mount --make-private $basedir
>  
>  now=`date +%Y-%m-%d-%H:%M:%S`
>  for ((i = 0; i < $runners; i++)); do
> -
> -	if [ -n "$show_test_list" ]; then
> -		echo "Runner $i: ${runner_list[$i]}"
> -	else
> -		runner_go $i $now &
> -	fi
> -
> +	runner_go $i $now &
>  done;
>  wait
>  
> -if [ -n "$show_test_list" ]; then
> -	exit 0
> -fi
> -
>  echo -n "Tests run: "
>  grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
>  
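
For reference, the flock-serialised dequeue described in the commit message can
be sketched on its own. This is not the code from the patch: list_file,
lock_file and pop_last_test are hypothetical names, and the sketch assumes GNU
sed and flock(1) from util-linux. The patch removes the entry with a sed
address built from the test name, which deletes every line matching that name
as a pattern; this variant deletes by position instead, so it does not depend
on how the name matches other lines.

```shell
#!/bin/bash
# Sketch only: list_file, lock_file and pop_last_test are hypothetical names.
list_file=$(mktemp)
lock_file=$(mktemp)
trap 'rm -f "$list_file" "$lock_file"' EXIT

printf '%s\n' generic/001 generic/002 generic/003 > "$list_file"

# Print the last line of the list and remove it, all under an exclusive lock.
# Prints an empty string once the list is exhausted.
pop_last_test()
{
	local entry=

	{
		flock 9
		entry=$(tail -n 1 "$list_file")
		if [ -n "$entry" ]; then
			# '$d' deletes only the final line of the file.
			sed -i '$d' "$list_file"
		fi
	} 9<>"$lock_file"	# closing fd 9 at the end of the group drops the lock

	echo "$entry"
}

first=$(pop_last_test)
echo "popped: $first"
```

A runner loop would keep calling pop_last_test until it returns an empty
string, which is the same termination condition the while loop in runner_go()
uses.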


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 16/28] check-parallel: run sections directly
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (14 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 15/28] check-parallel: de-batch test execution Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-09 14:03   ` Nirjhar Roy
  2025-04-17  3:00 ` [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes Dave Chinner
                   ` (11 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Currently we pass the section through to check for it to apply and
run. However, we do not reconfigure the test or external devices in
check-parallel after the initial setup, so this results in devices
that may be incorrectly configured for the given section config we
want to run.

To fix this, we need to iterate the sections directly in
check-parallel and reconfigure the runner device setup between each
section that is run. This allows the test device and external
devices to be set up correctly for each section config.

As the test list is consumed as we walk it, we need to reset the
test list for each section that we run.

We still pass the config section through to check so that it can
also source the correct config from the section we are running.

We also add section exclusion support so that we skip sections the
same way that check currently does.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel         | 118 ++++++++++++++++++++++++++++-------------
 common/config-sections |   3 ++
 2 files changed, 84 insertions(+), 37 deletions(-)
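
The include/exclude decision made at the top of run_section() can be sketched
separately. The patch uses grep -qw against the section lists; since grep -w
treats only letters, digits and underscores as word characters, a section
named "xfs" would also match inside a hypothetical "xfs-4k". The sketch below
is an alternative under that assumption, not the code in this patch: in_list
and section_selected are made-up helper names, and the section names are
placeholders.

```shell
#!/bin/bash
# Sketch only: in_list and section_selected are hypothetical helpers.

# Return 0 if the first argument is exactly equal to one of the others.
in_list()
{
	local needle="$1"; shift
	local item

	for item in "$@"; do
		[ "$item" = "$needle" ] && return 0
	done
	return 1
}

# Return 0 if the section is in the include list and not in the exclude list.
section_selected()
{
	local section="$1"
	local include="$2"
	local exclude="$3"

	# The lists are space separated, so they are deliberately left unquoted.
	in_list "$section" $include || return 1
	in_list "$section" $exclude && return 1
	return 0
}

for s in xfs xfs-4k ext4; do
	if section_selected "$s" "xfs-4k ext4" "ext4"; then
		echo "run $s"
	else
		echo "skip $s"
	fi
done
```

With exact string comparison, "xfs" is skipped here because only "xfs-4k" was
requested, and "ext4" is skipped because it is also on the exclude list.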

diff --git a/check-parallel b/check-parallel
index e2cf2c8d0..23d29c7a8 100755
--- a/check-parallel
+++ b/check-parallel
@@ -15,6 +15,7 @@ runner_list=()
 runtimes=()
 show_test_list=
 run_section=""
+exclude_section=""
 iam="check-parallel"
 
 tmp=/tmp/check-parallel.$$
@@ -119,7 +120,8 @@ while [ $# -gt 0 ]; do
 
 	-f)	is_supported_fstype $2 ; export FSTYP=$2; shift ;;
 
-	-s)	run_section="$run_section -s $2"; shift ;;
+	-s)	run_section="$run_section $2"; shift ;;
+	-S)	exclude_section="$exclude_section $2"; shift ;;
 
 	-*)	usage ;;
 	*)	# not an argument, we've got tests now.
@@ -202,11 +204,12 @@ fi
 # each other.
 get_next_test()
 {
+	local test_file="$test_list.$1"
 	local test=
 
 	flock 99
-	test=$(tail -1 $test_list)
-	sed -i "\,$test,d" $test_list
+	test=$(tail -1 $test_file)
+	sed -i "\,$test,d" $test_file
 	flock -u 99
 	echo $test
 }
@@ -236,10 +239,34 @@ _destroy_loop_device()
         losetup -d $dev || _fail "Cannot destroy loop device $dev"
 }
 
-runner_go()
+run_tests()
 {
+	local section="$1"
+
 	exec 99<>$tmp.test_list_lock
 
+	local test_to_run=$(get_next_test $section)
+
+	# Run the tests in it's own mount namespace, as per the comment below
+	# that precedes making the basedir a private mount.
+	#
+	# Similarly, we need to run check in it's own PID namespace so that
+	# operations like pkill only affect the runner instance, not globally
+	# kill processes from other check instances.
+	while [ -n "$test_to_run" ]; do
+		echo -n " $test_to_run "
+		unset FSTESTS_ISOL
+		if ! _tl_expunge_test $test_to_run; then
+			tools/run_privatens ./check -s $section $test_to_run >> $me/log 2>&1
+		fi
+
+		test_to_run=$(get_next_test $section)
+	done
+}
+
+runner_go()
+{
+
 	local id=$1
 	local me=$basedir/runner-$id
 	local _test=$me/test.img
@@ -250,7 +277,7 @@ runner_go()
 	local _scratch_log=$me/scratch-log.img
 	local _logwrites=$me/logwrites.img
 	local _results=$me/results-$2
-	local test_to_run=$(get_next_test)
+	local section=$3
 
 	mkdir -p $me
 
@@ -286,21 +313,7 @@ runner_go()
 
 #	export DUMP_CORRUPT_FS=1
 
-	# Run the tests in it's own mount namespace, as per the comment below
-	# that precedes making the basedir a private mount.
-	#
-	# Similarly, we need to run check in it's own PID namespace so that
-	# operations like pkill only affect the runner instance, not globally
-	# kill processes from other check instances.
-	while [ -n "$test_to_run" ]; do
-		echo "Runner $id: running test $test_to_run"
-		unset FSTESTS_ISOL
-		if ! _tl_expunge_test $test_to_run; then
-			tools/run_privatens ./check $run_section $test_to_run >> $me/log 2>&1
-		fi
-
-		test_to_run=$(get_next_test)
-	done
+	run_tests $section
 
 	wait
 	sleep 1
@@ -322,6 +335,44 @@ runner_go()
 
 }
 
+run_section()
+{
+	local section="$1"
+	local now="$2"
+	local i
+
+	echo $run_section |grep -qw $section || return
+	echo $exclude_section |grep -qw $section && return
+
+	echo
+	echo Running section: $section
+	echo
+
+	parse_config_section $section
+
+	# set up consumable test list first
+	cp $test_list $test_list.$section
+	for ((i = 0; i < $runners; i++)); do
+		runner_go $i $now $section &
+	done
+	wait
+
+	echo
+	echo Section: $section
+	echo -n "Tests run: "
+	grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
+
+	echo -n "Tests _notrun: "
+	grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
+
+	echo -n "Failure count: "
+	grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
+	echo
+
+	echo Ten slowest tests - runtime in seconds:
+	cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
+}
+
 cleanup()
 {
 	killall -INT -q check
@@ -336,6 +387,8 @@ trap "cleanup; exit" HUP INT QUIT TERM
 
 _config_setup_parallel
 
+run_section=${run_section:="$HOST_OPTIONS_SECTIONS"}
+
 _tl_setup_exclude_group "unreliable_in_parallel"
 _tl_prepare_test_list
 _tl_strip_test_list
@@ -369,23 +422,14 @@ fi
 mount --make-private $basedir
 
 now=`date +%Y-%m-%d-%H:%M:%S`
-for ((i = 0; i < $runners; i++)); do
-	runner_go $i $now &
-done;
-wait
-
-echo -n "Tests run: "
-grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
-
-echo -n "Tests _notrun: "
-grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
-
-echo -n "Failure count: "
-grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
-echo
-
-echo Ten slowest tests - runtime in seconds:
-cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
+for section in $HOST_OPTIONS_SECTIONS; do
+	run_section $section $now
+	if [ "$sum_bad" != 0 ] && [ "$istop" = true ]; then
+		interrupt=false
+		status=`expr $sum_bad != 0`
+		exit
+	fi
+done
 
 echo
 echo Cleanup on Aisle 5?
diff --git a/common/config-sections b/common/config-sections
index 28bd11bab..c0ea097e8 100644
--- a/common/config-sections
+++ b/common/config-sections
@@ -436,6 +436,9 @@ _config_setup_parallel()
 		exit 1
 	fi
 
+	# strip check-parallel from the sections to run
+	export HOST_OPTIONS_SECTIONS=`echo $HOST_OPTIONS_SECTIONS | sed -e "s/$iam//"`
+
 	grep DEV $HOST_OPTIONS |grep -qv SIZE
 	if [ $? -ne 1 ]; then
 		echo "$iam config file has devices defined"
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 16/28] check-parallel: run sections directly
  2025-04-17  3:00 ` [PATCH 16/28] check-parallel: run sections directly Dave Chinner
@ 2025-05-09 14:03   ` Nirjhar Roy
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy @ 2025-05-09 14:03 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently we pass the section through to check for it to apply and
> run. However, we do not reconfigure the test or external devices in
If we are passing ./check -s <s1> -s <s2>, the devices will be reconfigured, right?
I was able to understand the implementation in this patch, but I am not able to figure out the
reason for it. Could you please explain "results in devices that may be incorrectly configured for
the given section config we want to run"?
Basically, instead of issuing ./check -s <s1> -s <s2> ..., we are now issuing ./check -s <s1> (one
section per ./check invocation), and each such invocation is made in parallel. But what is wrong
with running multiple ./check -s <s1> -s <s2> in parallel?
> check-parallel after the initial setup, so this results in devices
> that may be incorrectly configured for the given section config we
> want to run.
> 
> To fix this, we need to iterate the sections directly in
> check-parallel and reconfigure the runner device setup between each
> section that is run. This allows the test device and external
> devices to be set up correctly for each section config.
> 
> As the test list is consumed as we walk it, we need to reset the
> test list for each section that we run.
> 
> We still pass the config section through to check so that it can
> also source the correct config from the section we are running.
> 
> We also add section exclusion support so that we skip sections the
> same way that check currently does.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel         | 118 ++++++++++++++++++++++++++++-------------
>  common/config-sections |   3 ++
>  2 files changed, 84 insertions(+), 37 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index e2cf2c8d0..23d29c7a8 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -15,6 +15,7 @@ runner_list=()
>  runtimes=()
>  show_test_list=
>  run_section=""
> +exclude_section=""
>  iam="check-parallel"
>  
>  tmp=/tmp/check-parallel.$$
> @@ -119,7 +120,8 @@ while [ $# -gt 0 ]; do
>  
>  	-f)	is_supported_fstype $2 ; export FSTYP=$2; shift ;;
>  
> -	-s)	run_section="$run_section -s $2"; shift ;;
> +	-s)	run_section="$run_section $2"; shift ;;
> +	-S)	exclude_section="$exclude_section $2"; shift ;;
>  
>  	-*)	usage ;;
>  	*)	# not an argument, we've got tests now.
> @@ -202,11 +204,12 @@ fi
>  # each other.
>  get_next_test()
>  {
> +	local test_file="$test_list.$1"
>  	local test=
>  
>  	flock 99
> -	test=$(tail -1 $test_list)
> -	sed -i "\,$test,d" $test_list
> +	test=$(tail -1 $test_file)
> +	sed -i "\,$test,d" $test_file
>  	flock -u 99
>  	echo $test
>  }
> @@ -236,10 +239,34 @@ _destroy_loop_device()
>          losetup -d $dev || _fail "Cannot destroy loop device $dev"
>  }
>  
> -runner_go()
> +run_tests()
>  {
> +	local section="$1"
> +
>  	exec 99<>$tmp.test_list_lock
>  
> +	local test_to_run=$(get_next_test $section)
> +
> +	# Run the tests in it's own mount namespace, as per the comment below
> +	# that precedes making the basedir a private mount.
> +	#
> +	# Similarly, we need to run check in it's own PID namespace so that
> +	# operations like pkill only affect the runner instance, not globally
> +	# kill processes from other check instances.
> +	while [ -n "$test_to_run" ]; do
> +		echo -n " $test_to_run "
> +		unset FSTESTS_ISOL
> +		if ! _tl_expunge_test $test_to_run; then
> +			tools/run_privatens ./check -s $section $test_to_run >> $me/log 2>&1
> +		fi
> +
> +		test_to_run=$(get_next_test $section)
> +	done
> +}
> +
> +runner_go()
> +{
> +
>  	local id=$1
>  	local me=$basedir/runner-$id
>  	local _test=$me/test.img
> @@ -250,7 +277,7 @@ runner_go()
>  	local _scratch_log=$me/scratch-log.img
>  	local _logwrites=$me/logwrites.img
>  	local _results=$me/results-$2
> -	local test_to_run=$(get_next_test)
> +	local section=$3
>  
>  	mkdir -p $me
>  
> @@ -286,21 +313,7 @@ runner_go()
>  
>  #	export DUMP_CORRUPT_FS=1
>  
> -	# Run the tests in it's own mount namespace, as per the comment below
> -	# that precedes making the basedir a private mount.
> -	#
> -	# Similarly, we need to run check in it's own PID namespace so that
> -	# operations like pkill only affect the runner instance, not globally
> -	# kill processes from other check instances.
> -	while [ -n "$test_to_run" ]; do
> -		echo "Runner $id: running test $test_to_run"
> -		unset FSTESTS_ISOL
> -		if ! _tl_expunge_test $test_to_run; then
> -			tools/run_privatens ./check $run_section $test_to_run >> $me/log 2>&1
> -		fi
> -
> -		test_to_run=$(get_next_test)
> -	done
> +	run_tests $section
>  
>  	wait
>  	sleep 1
> @@ -322,6 +335,44 @@ runner_go()
>  
>  }
>  
> +run_section()
> +{
> +	local section="$1"
> +	local now="$2"
> +	local i
> +
> +	echo $run_section |grep -qw $section || return
> +	echo $exclude_section |grep -qw $section && return
> +
> +	echo
> +	echo Running section: $section
> +	echo
> +
> +	parse_config_section $section
> +
> +	# set up consumable test list first
> +	cp $test_list $test_list.$section
> +	for ((i = 0; i < $runners; i++)); do
> +		runner_go $i $now $section &
> +	done
> +	wait
> +
> +	echo
> +	echo Section: $section
> +	echo -n "Tests run: "
> +	grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> +
> +	echo -n "Tests _notrun: "
> +	grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
> +
> +	echo -n "Failure count: "
> +	grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
> +	echo
> +
> +	echo Ten slowest tests - runtime in seconds:
> +	cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
> +}
> +
>  cleanup()
>  {
>  	killall -INT -q check
> @@ -336,6 +387,8 @@ trap "cleanup; exit" HUP INT QUIT TERM
>  
>  _config_setup_parallel
>  
> +run_section=${run_section:="$HOST_OPTIONS_SECTIONS"}
> +
>  _tl_setup_exclude_group "unreliable_in_parallel"
>  _tl_prepare_test_list
>  _tl_strip_test_list
> @@ -369,23 +422,14 @@ fi
>  mount --make-private $basedir
>  
>  now=`date +%Y-%m-%d-%H:%M:%S`
> -for ((i = 0; i < $runners; i++)); do
> -	runner_go $i $now &
> -done;
> -wait
> -
> -echo -n "Tests run: "
> -grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> -
> -echo -n "Tests _notrun: "
> -grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
> -
> -echo -n "Failure count: "
> -grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
> -echo
> -
> -echo Ten slowest tests - runtime in seconds:
> -cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
> +for section in $HOST_OPTIONS_SECTIONS; do
> +	run_section $section $now
> +	if [ "$sum_bad" != 0 ] && [ "$istop" = true ]; then
> +		interrupt=false
> +		status=`expr $sum_bad != 0`
> +		exit
> +	fi
> +done
>  
>  echo
>  echo Cleanup on Aisle 5?
> diff --git a/common/config-sections b/common/config-sections
> index 28bd11bab..c0ea097e8 100644
> --- a/common/config-sections
> +++ b/common/config-sections
> @@ -436,6 +436,9 @@ _config_setup_parallel()
>  		exit 1
>  	fi
>  
> +	# strip check-parallel from the sections to run
> +	export HOST_OPTIONS_SECTIONS=`echo $HOST_OPTIONS_SECTIONS | sed -e "s/$iam//"`
> +
>  	grep DEV $HOST_OPTIONS |grep -qv SIZE
>  	if [ $? -ne 1 ]; then
>  		echo "$iam config file has devices defined"


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (15 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 16/28] check-parallel: run sections directly Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-09 16:00   ` Nirjhar Roy
  2025-04-17  3:00 ` [PATCH 18/28] check-parallel: create a "results-latest" symlink Dave Chinner
                   ` (10 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

When a config section changes FSTYP, we do not rebuild the test list
that is to be run. This means that if we go from xfs to ext4, the
ext4 section will still try to run all the XFS tests, and won't try
to run the ext4 tests.

To fix this for check-parallel, check if the FSTYP has changed and
if it has strip the old tests from the test list and add the new
tests back to it.

This is effectively a zero-day bug in the check config section
support code.

While there, rename the internal _tl_file variable and the filename
used to hold command line tests. The check-parallel change uncovered
that it was using the same filename ($tmp.test_list) as the test
list code was using and they were stepping on each other.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel   | 81 +++++++++++++++++++++++++++++-------------------
 common/test_list |  8 ++---
 2 files changed, 53 insertions(+), 36 deletions(-)
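
The "rebuild only when FSTYP changes" logic can be sketched in isolation as
follows. Everything here is illustrative: the section names, the
section_fstyp map and rebuild_list are hypothetical stand-ins for the config
sections and for setup_test_list(), and the sketch needs bash 4 or later for
associative arrays.

```shell
#!/bin/bash
# Sketch only: the sections, their filesystem types and rebuild_list are
# hypothetical stand-ins for the real config sections and setup_test_list().
declare -A section_fstyp=(
	[xfs_default]=xfs
	[xfs_4k]=xfs
	[ext4_default]=ext4
)
sections="xfs_default xfs_4k ext4_default"

rebuilds=0
rebuild_list()
{
	rebuilds=$((rebuilds + 1))
	echo "rebuilding test list for $1"
}

last_fstyp=
for s in $sections; do
	fstyp=${section_fstyp[$s]}

	# Only regenerate the list when the filesystem type actually changes.
	if [ "$last_fstyp" != "$fstyp" ]; then
		rebuild_list "$fstyp"
		last_fstyp=$fstyp
	fi
	echo "section $s runs as $fstyp"
done
```

Because last_fstyp starts out empty, the first section always triggers a
build, and consecutive sections that share a filesystem type reuse the same
list.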

diff --git a/check-parallel b/check-parallel
index 23d29c7a8..374ac8e96 100755
--- a/check-parallel
+++ b/check-parallel
@@ -175,7 +175,6 @@ time_order_test_list()
 	local rx=0
 	local ix
 	local jx
-	#set -x
 	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
 		echo $_tl_tests | grep -q ${_list[$ix]}
 		if [ $? == 0 ]; then
@@ -198,6 +197,26 @@ if ! $_tl_randomise -a ! $_tl_exact_order; then
 	fi
 fi
 
+# Build the test list from the test sepcification and FSTYP settings. This can
+# be called at any time to regenerate the list as the include/exclude lists
+# generated from the command line are retained separately to the test list
+# itself.
+setup_test_list()
+{
+	_tl_prepare_test_list
+	_tl_strip_test_list
+
+	if ! $_tl_randomise -a ! $_tl_exact_order; then
+		if [ -f $basedir/runner-0/$prev_results/check.time ]; then
+			time_order_test_list
+		fi
+	fi
+
+	# reverse the order of tests so that the get_next_test() can pull from the file
+	# tail rather than the head.
+	echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
+}
+
 # Grab the next test to be run from the tail of the file.
 # Returns an empty string if there is no tests remaining to run.
 # File operations are run under flock so concurrent gets are serialised against
@@ -242,6 +261,7 @@ _destroy_loop_device()
 run_tests()
 {
 	local section="$1"
+	local logfile="$2"
 
 	exec 99<>$tmp.test_list_lock
 
@@ -257,7 +277,7 @@ run_tests()
 		echo -n " $test_to_run "
 		unset FSTESTS_ISOL
 		if ! _tl_expunge_test $test_to_run; then
-			tools/run_privatens ./check -s $section $test_to_run >> $me/log 2>&1
+			tools/run_privatens ./check -s $section $test_to_run >> $logfile 2>&1
 		fi
 
 		test_to_run=$(get_next_test $section)
@@ -308,12 +328,12 @@ runner_go()
 	rm -f $RESULT_BASE/check.*
 
 	# Only supports default mkfs parameters right now
-	wipefs -a $TEST_DEV > $me/log 2>&1
-	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
+	wipefs -a $TEST_DEV >> $_results/log 2>&1
+	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $_results/log 2>&1
 
 #	export DUMP_CORRUPT_FS=1
 
-	run_tests $section
+	run_tests $section $_results/log
 
 	wait
 	sleep 1
@@ -327,10 +347,10 @@ runner_go()
 	_destroy_loop_device $SCRATCH_LOGDEV
 	_destroy_loop_device $LOGWRITES_DEV
 
-	grep -q Failures: $me/log
+	grep -q Failures: $_results/log
 	if [ $? -eq 0 ]; then
 		echo -n "Runner $id Failures: "
-		grep Failures: $me/log | uniq | sed -e "s/^.*Failures://"
+		grep Failures: $_results/log | uniq | sed -e "s/^.*Failures://"
 	fi
 
 }
@@ -339,19 +359,33 @@ run_section()
 {
 	local section="$1"
 	local now="$2"
+	local results="$basedir/*/results-$now"
 	local i
 
 	echo $run_section |grep -qw $section || return
 	echo $exclude_section |grep -qw $section && return
 
 	echo
-	echo Running section: $section
+	echo Running section: $section $now
 	echo
 
 	parse_config_section $section
 
-	# set up consumable test list first
+	# update the test list if necessary, then set up
+	# the consumable test list for this section to use.
+	if [ "$last_fstyp" != "$FSTYP" ]; then
+		setup_test_list
+		last_fstyp=$FSTYP
+	fi
+	echo FS Type: $FSTYP
+
+	if [ -n "$show_test_list" ]; then
+		echo Test list to run:
+		cat $test_list
+		return
+	fi
 	cp $test_list $test_list.$section
+
 	for ((i = 0; i < $runners; i++)); do
 		runner_go $i $now $section &
 	done
@@ -360,17 +394,18 @@ run_section()
 	echo
 	echo Section: $section
 	echo -n "Tests run: "
-	grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
+	grep Ran $results/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
 
 	echo -n "Tests _notrun: "
-	grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
+	grep "^Not run" $results/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
 
 	echo -n "Failure count: "
-	grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
+	grep Failures: $results/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
 	echo
 
 	echo Ten slowest tests - runtime in seconds:
-	cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
+	cat $results/check.time | sort -k 2 -nr | head -10
+
 }
 
 cleanup()
@@ -386,27 +421,8 @@ cleanup()
 trap "cleanup; exit" HUP INT QUIT TERM
 
 _config_setup_parallel
-
 run_section=${run_section:="$HOST_OPTIONS_SECTIONS"}
-
 _tl_setup_exclude_group "unreliable_in_parallel"
-_tl_prepare_test_list
-_tl_strip_test_list
-
-if ! $_tl_randomise -a ! $_tl_exact_order; then
-	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
-		time_order_test_list
-	fi
-fi
-
-# reverse the order of tests so that the get_next_test() can pull from the file
-# tail rather than the head.
-echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
-if [ -n "$show_test_list" ]; then
-	echo Time ordered test list:
-	cat $test_list
-	exit 0
-fi
 
 # Each parallel test runner needs to only see it's own mount points. If we
 # leave the basedir as shared, then all tests see all mounts and then we get
@@ -422,6 +438,7 @@ fi
 mount --make-private $basedir
 
 now=`date +%Y-%m-%d-%H:%M:%S`
+last_fstyp=
 for section in $HOST_OPTIONS_SECTIONS; do
 	run_section $section $now
 	if [ "$sum_bad" != 0 ] && [ "$istop" = true ]; then
diff --git a/common/test_list b/common/test_list
index 2b3ae9fbf..092b3ed17 100644
--- a/common/test_list
+++ b/common/test_list
@@ -20,7 +20,7 @@ _XGROUP_LIST=
 _tl_exact_order=false
 _tl_randomise=false
 _tl_have_test_args=false
-_tl_file="$tmp.test_list"
+_tl_cli_tests="$tmp._tl_cli_tests"
 _tl_exclude_tests=()
 _tl_tests=
 
@@ -122,8 +122,8 @@ _tl_prepare_test_list()
 {
 	unset _tl_tests
 	# Tests specified on the command line
-	if [ -s $_tl_file ]; then
-		cat $_tl_file > $tmp.list
+	if [ -s $_tl_cli_tests ]; then
+		cat $_tl_cli_tests > $tmp.list
 	else
 		touch $tmp.list
 	fi
@@ -281,7 +281,7 @@ _tl_setup_cli()
 				if grep -Eq "^$test_name" $group_file; then
 					# in group file ... OK
 					echo $_tl_src_dir/$test_dir/$test_name \
-						>> $_tl_file
+						>> $_tl_cli_tests
 					_tl_have_test_args=true
 				else
 					# oops
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes
  2025-04-17  3:00 ` [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes Dave Chinner
@ 2025-05-09 16:00   ` Nirjhar Roy
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy @ 2025-05-09 16:00 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When a config section changes FSTYP, we do not rebuild the test list
> that is to be run. This means that if we go from xfs to ext4, the
> ext4 section will still try to run all the XFS tests, and won't try
> to run the ext4 tests.
> 
> To fix this for check-parallel, check if the FSTYP has changed and
> if it has strip the old tests from the test list and add the new
> tests back to it.
> 
> This is effectively a zero-day bug in the check config section
> support code.
> 
> While there, rename the internal _tl_file variable and the filename
> used to hold command line tests. The check-parallel change uncovered
> that it was using the same filename ($tmp.test_list) as the test
> list code was using and they were stepping on each other.
Right.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel   | 81 +++++++++++++++++++++++++++++-------------------
>  common/test_list |  8 ++---
>  2 files changed, 53 insertions(+), 36 deletions(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 23d29c7a8..374ac8e96 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -175,7 +175,6 @@ time_order_test_list()
>  	local rx=0
>  	local ix
>  	local jx
> -	#set -x
>  	for ((ix = 0; ix < ${#_list[*]}; ix++)); do
>  		echo $_tl_tests | grep -q ${_list[$ix]}
>  		if [ $? == 0 ]; then
> @@ -198,6 +197,26 @@ if ! $_tl_randomise -a ! $_tl_exact_order; then
>  	fi
>  fi
>  
> +# Build the test list from the test sepcification and FSTYP settings. This can
typo: specification
> +# be called at any time to regenerate the list as the include/exclude lists
> +# generated from the command line are retained separately to the test list
> +# itself.
> +setup_test_list()
> +{
> +	_tl_prepare_test_list
> +	_tl_strip_test_list
> +
> +	if ! $_tl_randomise -a ! $_tl_exact_order; then
> +		if [ -f $basedir/runner-0/$prev_results/check.time ]; then
> +			time_order_test_list
> +		fi
> +	fi
> +
> +	# reverse the order of tests so that the get_next_test() can pull from the file
> +	# tail rather than the head.
> +	echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
> +}
> +
>  # Grab the next test to be run from the tail of the file.
>  # Returns an empty string if there is no tests remaining to run.
>  # File operations are run under flock so concurrent gets are serialised against
> @@ -242,6 +261,7 @@ _destroy_loop_device()
>  run_tests()
>  {
>  	local section="$1"
> +	local logfile="$2"
>  
>  	exec 99<>$tmp.test_list_lock
>  
> @@ -257,7 +277,7 @@ run_tests()
>  		echo -n " $test_to_run "
>  		unset FSTESTS_ISOL
>  		if ! _tl_expunge_test $test_to_run; then
> -			tools/run_privatens ./check -s $section $test_to_run >> $me/log 2>&1
> +			tools/run_privatens ./check -s $section $test_to_run >> $logfile 2>&1
>  		fi
>  
>  		test_to_run=$(get_next_test $section)
> @@ -308,12 +328,12 @@ runner_go()
>  	rm -f $RESULT_BASE/check.*
>  
>  	# Only supports default mkfs parameters right now
> -	wipefs -a $TEST_DEV > $me/log 2>&1
> -	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $me/log 2>&1
> +	wipefs -a $TEST_DEV >> $_results/log 2>&1
> +	yes | mkfs -t $FSTYP $TEST_MKFS_OPTS $TEST_DEV >> $_results/log 2>&1
>  
>  #	export DUMP_CORRUPT_FS=1
>  
> -	run_tests $section
> +	run_tests $section $_results/log
>  
>  	wait
>  	sleep 1
> @@ -327,10 +347,10 @@ runner_go()
>  	_destroy_loop_device $SCRATCH_LOGDEV
>  	_destroy_loop_device $LOGWRITES_DEV
>  
> -	grep -q Failures: $me/log
> +	grep -q Failures: $_results/log
>  	if [ $? -eq 0 ]; then
>  		echo -n "Runner $id Failures: "
> -		grep Failures: $me/log | uniq | sed -e "s/^.*Failures://"
> +		grep Failures: $_results/log | uniq | sed -e "s/^.*Failures://"
>  	fi
>  
>  }
> @@ -339,19 +359,33 @@ run_section()
>  {
>  	local section="$1"
>  	local now="$2"
> +	local results="$basedir/*/results-$now"
This results glob matches every runner's _results directory for results-"$now"; _results itself is
specific to one runner id and one timestamp.

So if we have 2 invocations at 2 different timestamps (t1 and t2) with 2 test runners, we will have
$basedir/runner-0/results-t1
$basedir/runner-1/results-t1
$basedir/runner-0/results-t2
$basedir/runner-1/results-t2

I feel that grouping the results of each invocation, i.e. of each timestamp, together would be more
convenient. What I am suggesting is that, instead of the above, we have

$basedir/r-t1/runner-0/result
$basedir/r-t1/runner-1/result
$basedir/r-t2/runner-0/result
$basedir/r-t2/runner-1/result

so that all the files/results of one invocation are under one directory, i.e. $basedir/r-t<n>/*, where
t<n> is the timestamp we are looking for.

Please let me know what you think of the above and whether it has any pitfalls.
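
A rough sketch of the layout I have in mind. This is illustrative only: results_dir() and the
"r-" prefix are hypothetical and not part of this patch.

```shell
#!/bin/bash
# Hypothetical sketch: group results by invocation timestamp first, then
# by runner. Neither results_dir() nor the "r-" prefix exist in
# check-parallel; they only illustrate the proposed layout.
results_dir()
{
	local basedir="$1"
	local now="$2"
	local id="$3"

	echo "$basedir/r-$now/runner-$id/result"
}

basedir=$(mktemp -d)
for now in t1 t2; do
	for id in 0 1; do
		mkdir -p "$(results_dir "$basedir" "$now" "$id")"
	done
done

# Everything from one invocation now lives under a single directory.
t1_dirs=$(find "$basedir/r-t1" -mindepth 2 -maxdepth 2 -type d -name result | sort)
echo "$t1_dirs"
```

With this layout a single path such as $basedir/r-t1 would cover one whole run, rather than a
glob across all the runner directories.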
>  	local i
>  
>  	echo $run_section |grep -qw $section || return
>  	echo $exclude_section |grep -qw $section && return
>  
>  	echo
> -	echo Running section: $section
> +	echo Running section: $section $now
>  	echo
>  
>  	parse_config_section $section
>  
> -	# set up consumable test list first
> +	# update the test list if necessary, then set up
> +	# the consumable test list for this section to use.
> +	if [ "$last_fstyp" != "$FSTYP" ]; then
> +		setup_test_list
> +		last_fstyp=$FSTYP
> +	fi
> +	echo FS Type: $FSTYP
> +
> +	if [ -n "$show_test_list" ]; then
> +		echo Test list to run:
> +		cat $test_list
> +		return
> +	fi
>  	cp $test_list $test_list.$section
> +
>  	for ((i = 0; i < $runners; i++)); do
>  		runner_go $i $now $section &
>  	done
> @@ -360,17 +394,18 @@ run_section()
>  	echo
>  	echo Section: $section
>  	echo -n "Tests run: "
> -	grep Ran $basedir/*/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
> +	grep Ran $results/log | sed -e 's,^.*:,,' -e 's, ,\n,g' | sort | uniq | wc -l
>  
>  	echo -n "Tests _notrun: "
> -	grep "^Not run" $basedir/*/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
> +	grep "^Not run" $results/log | uniq | sed -e 's,^.*:,,' -e 's, ,\n,g' -e 's,^\n,,' | wc -l
>  
>  	echo -n "Failure count: "
> -	grep Failures: $basedir/*/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
> +	grep Failures: $results/log | uniq | sed -e "s/^.*Failures://" -e "s,\([0-9]\) \([gx]\),\1\n \2,g" |wc -l
>  	echo
>  
>  	echo Ten slowest tests - runtime in seconds:
> -	cat $basedir/*/results-$now/check.time | sort -k 2 -nr | head -10
> +	cat $results/check.time | sort -k 2 -nr | head -10
> +
>  }
>  
>  cleanup()
> @@ -386,27 +421,8 @@ cleanup()
>  trap "cleanup; exit" HUP INT QUIT TERM
>  
>  _config_setup_parallel
> -
>  run_section=${run_section:="$HOST_OPTIONS_SECTIONS"}
> -
>  _tl_setup_exclude_group "unreliable_in_parallel"
> -_tl_prepare_test_list
> -_tl_strip_test_list
> -
> -if ! $_tl_randomise -a ! $_tl_exact_order; then
> -	if [ -f $basedir/runner-0/$prev_results/check.time ]; then
> -		time_order_test_list
> -	fi
> -fi
> -
> -# reverse the order of tests so that the get_next_test() can pull from the file
> -# tail rather than the head.
> -echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
> -if [ -n "$show_test_list" ]; then
> -	echo Time ordered test list:
> -	cat $test_list
> -	exit 0
> -fi
>  
>  # Each parallel test runner needs to only see it's own mount points. If we
>  # leave the basedir as shared, then all tests see all mounts and then we get
> @@ -422,6 +438,7 @@ fi
>  mount --make-private $basedir
>  
>  now=`date +%Y-%m-%d-%H:%M:%S`
> +last_fstyp=
Minor: Can we make last_fstyp somehow local to this file? The reason I am asking is that we
have similar logic in run_section() of check ("$OLD_FSTYP" != "$FSTYP" -> recreate TEST_DEV),
and in the unlikely event that somebody reuses this variable name unknowingly, the two could
clash. I haven't thought about how this script would behave in that case, but just in case.
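
One possible way to do that, as a sketch only: the _cp_ prefix, _cp_update_test_list() and the
stubbed setup_test_list() below are hypothetical and only show the idea of a file-specific name.

```shell
#!/bin/bash
# Hypothetical sketch: _cp_last_fstyp and the stub setup_test_list() are
# illustrative only. The prefix keeps the name from colliding with
# variables that check or the common/ code might define.
_cp_last_fstyp=
_cp_rebuilds=0

setup_test_list()
{
	# stand-in for the real list rebuild; just count the calls
	_cp_rebuilds=$((_cp_rebuilds + 1))
}

_cp_update_test_list()
{
	if [ "$_cp_last_fstyp" != "$FSTYP" ]; then
		setup_test_list
		_cp_last_fstyp=$FSTYP
	fi
}

# Three sections, two of which share a filesystem type.
for FSTYP in xfs xfs ext4; do
	_cp_update_test_list
done
echo "rebuilds: $_cp_rebuilds"
```

Here the list is rebuilt twice: once for the first section and once when FSTYP changes to ext4.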
--NR
>  for section in $HOST_OPTIONS_SECTIONS; do
>  	run_section $section $now
>  	if [ "$sum_bad" != 0 ] && [ "$istop" = true ]; then
> diff --git a/common/test_list b/common/test_list
> index 2b3ae9fbf..092b3ed17 100644
> --- a/common/test_list
> +++ b/common/test_list
> @@ -20,7 +20,7 @@ _XGROUP_LIST=
>  _tl_exact_order=false
>  _tl_randomise=false
>  _tl_have_test_args=false
> -_tl_file="$tmp.test_list"
> +_tl_cli_tests="$tmp._tl_cli_tests"
>  _tl_exclude_tests=()
>  _tl_tests=
>  
> @@ -122,8 +122,8 @@ _tl_prepare_test_list()
>  {
>  	unset _tl_tests
>  	# Tests specified on the command line
> -	if [ -s $_tl_file ]; then
> -		cat $_tl_file > $tmp.list
> +	if [ -s $_tl_cli_tests ]; then
> +		cat $_tl_cli_tests > $tmp.list
>  	else
>  		touch $tmp.list
>  	fi
> @@ -281,7 +281,7 @@ _tl_setup_cli()
>  				if grep -Eq "^$test_name" $group_file; then
>  					# in group file ... OK
>  					echo $_tl_src_dir/$test_dir/$test_name \
> -						>> $_tl_file
> +						>> $_tl_cli_tests
>  					_tl_have_test_args=true
>  				else
>  					# oops



* [PATCH 18/28] check-parallel: create a "results-latest" symlink
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (16 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes Dave Chinner
@ 2025-04-17  3:00 ` Dave Chinner
  2025-05-10 13:12   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 19/28] check: factor test running Dave Chinner
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:00 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

check-parallel ensures that it doesn't overwrite results by creating
date-stamped results directories. This, however, makes it harder to
find the results for the test that is currently running or has
just completed.

To solve this problem, maintain a symlink to the latest results
directory so that it can always be found by the same name.

Also, if this symlink exists and is valid, use it as the source for
runtime ordering data for the next run instead of trying to find it
via ls and sort based ordering.
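
As a rough illustration of the behaviour described above (not the patch itself): the directory
names mirror the patch, but update_latest() and find_prev_results() are hypothetical helpers.
Note that a -L test only confirms that the symlink exists, not that its target is still there.

```shell
#!/bin/bash
# Sketch only: the directory names mirror the patch, but update_latest()
# and find_prev_results() are hypothetical helpers for illustration.
update_latest()
{
	local me="$1"
	local results="$2"

	rm -f "$me/latest-result"
	ln -s "$results" "$me/latest-result"
}

find_prev_results()
{
	local me="$1"

	if [ -L "$me/latest-result" ]; then
		echo "latest-result"
	elif [ -d "$me" ]; then
		# fall back to the newest results directory by mtime
		ls -tr "$me" | grep results | tail -1
	fi
}

me=$(mktemp -d)/runner-0
mkdir -p "$me/results-2025-04-17-13:00:00"
before=$(find_prev_results "$me")
update_latest "$me" "$me/results-2025-04-17-13:00:00"
after=$(find_prev_results "$me")
echo "before: $before, after: $after"
```

Before the symlink exists the lookup falls back to the ls-based ordering; afterwards it always
resolves to the same name.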

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check-parallel | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/check-parallel b/check-parallel
index 374ac8e96..1b67709a2 100755
--- a/check-parallel
+++ b/check-parallel
@@ -149,7 +149,9 @@ if [[ $runners -le 0 || $runners -gt 1024 ]]; then
 	usage
 fi
 
-if [ -d "$basedir/runner-0/" ]; then
+if [ -L $basedir/runner-0/latest-result ]; then
+	prev_results="latest-result"
+elif [ -d "$basedir/runner-0/" ]; then
 	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 fi
 
@@ -325,6 +327,9 @@ runner_go()
 	mkdir -p $TEST_DIR
 	mkdir -p $SCRATCH_MNT
 	mkdir -p $RESULT_BASE
+	rm -f $me/latest-result
+	ln -s $_results $me/latest-result
+
 	rm -f $RESULT_BASE/check.*
 
 	# Only supports default mkfs parameters right now
-- 
2.45.2



* Re: [PATCH 18/28] check-parallel: create a "results-latest" symlink
  2025-04-17  3:00 ` [PATCH 18/28] check-parallel: create a "results-latest" symlink Dave Chinner
@ 2025-05-10 13:12   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-10 13:12 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> check-parallel ensures that it doesn't overwrite results by creating
> date-stamped results directories. This, however, makes it harder to
> find the results for the test that is currently running or has
> just completed.
> 
> To solve this problem, maintain a symlink to the latest results
> directory so that it can always be found by the same name.
> 
> Also, if this symlink exists and is valid, use it as the source for
> runtime ordering data for the next run instead of trying to find it
> via ls and sort based ordering.
This looks good and useful. However, it would be really helpful for someone who is new if the
above information were also added to the README and to the comments in the script.
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
--NR
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/check-parallel b/check-parallel
> index 374ac8e96..1b67709a2 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -149,7 +149,9 @@ if [[ $runners -le 0 || $runners -gt 1024 ]]; then
>  	usage
>  fi
>  
> -if [ -d "$basedir/runner-0/" ]; then
> +if [ -L $basedir/runner-0/latest-result ]; then
> +	prev_results="latest-result"
> +elif [ -d "$basedir/runner-0/" ]; then
>  	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  fi
>  
> @@ -325,6 +327,9 @@ runner_go()
>  	mkdir -p $TEST_DIR
>  	mkdir -p $SCRATCH_MNT
>  	mkdir -p $RESULT_BASE
> +	rm -f $me/latest-result
> +	ln -s $_results $me/latest-result
> +
>  	rm -f $RESULT_BASE/check.*
>  
>  	# Only supports default mkfs parameters right now



* [PATCH 19/28] check: factor test running
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (17 preceding siblings ...)
  2025-04-17  3:00 ` [PATCH 18/28] check-parallel: create a "results-latest" symlink Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-05-12 13:57   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 20/28] [RFC] check-parallel: run tests directly without using check Dave Chinner
                   ` (8 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Rework the code that check uses to run an individual test,
separating the execution of the test from the various pre- and post-
test processing operations that are specific to check results
processing.

This essentially encapsulates the test running and result tracking
in its own file, leaving just the section iteration and reporting
to the caller that is running the tests. The caller needs to define
the _run_seq() function that actually executes the test and some
environment variables (e.g. REPORT_DIR) so that the test execution
code will run correctly and stash results and reports in the correct
location.

This greatly simplifies the check script as a big chunk of code
dedicated to simply running a test and gathering the results is
completely abstracted away. This makes it clearer what check is now
doing at a high level in terms of iterating sections and generating
reports.
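
A minimal sketch of the caller side of this contract. The real _te_run_test lives in
common/test_exec and does far more; the stub below is only a stand-in so that the example
runs on its own, and tests/demo/001 is a made-up test name.

```shell
#!/bin/bash
# Sketch of the caller side of the contract. The real _te_run_test lives
# in common/test_exec; the stub below is a stand-in so that this example
# runs on its own, and tests/demo/001 is a made-up test name.
REPORT_DIR=$(mktemp -d)
tmp=$REPORT_DIR/tmp

# The caller decides how a test is actually executed.
_run_seq()
{
	bash -c "exit 0"
}

# Stand-in for the shared code: run the test through the caller's
# _run_seq and record the outcome in the shared tracking arrays.
_te_try=()
_te_bad=()
_te_run_test()
{
	local seq="$1"

	_te_try+=("$seq")
	_run_seq > "$tmp.out" 2>&1 || _te_bad+=("$seq")
}

_te_run_test tests/demo/001
echo "Ran: ${_te_try[*]}"
echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
```

The caller only supplies the environment and _run_seq, then reads the _te_* arrays for its
own reporting.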

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check            | 396 ++++++-----------------------------------------
 common/test_exec | 352 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 398 insertions(+), 350 deletions(-)
 create mode 100644 common/test_exec

diff --git a/check b/check
index fea86f7b9..106de0ee6 100755
--- a/check
+++ b/check
@@ -8,13 +8,9 @@ tmp=/tmp/$$
 status=0
 needwrap=true
 needsum=true
-try=()
 sum_bad=0
-bad=()
-notrun=()
 interrupt=true
 diff="diff -u"
-showme=false
 export here=`pwd`
 brief_test_summary=false
 do_report=false
@@ -38,14 +34,12 @@ export QA_CHECK_FS=${QA_CHECK_FS:=true}
 # number of diff lines from a failed test, 0 for whole output
 export DIFF_LENGTH=${DIFF_LENGTH:=10}
 
-# by default don't output timestamps
-timestamp=${TIMESTAMP:=false}
-
 rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 
 . ./common/exit
 . ./common/test_names
 . ./common/test_list
+. ./common/test_exec
 . ./common/config
 . ./common/config-sections
 . ./common/rc
@@ -122,12 +116,6 @@ examples:
 	    exit 1
 }
 
-_timestamp()
-{
-    local now=`date "+%T"`
-    echo -n " [$now]"
-}
-
 # Process command arguments first.
 while [ $# -gt 0 ]; do
 	case "$1" in
@@ -155,7 +143,7 @@ while [ $# -gt 0 ]; do
 	-l)	diff="diff" ;;
 	-udiff)	diff="$diff -u" ;;
 
-	-n)	showme=true ;;
+	-n)	_te_dry_run="true" ;;
 	-i)	iterations=$2; shift ;;
 	-I) 	iterations=$2; istop=true; shift ;;
 	-T)	timestamp=true ;;
@@ -197,6 +185,9 @@ if [ ! -z "$REPORT_LIST" ]; then
 	_assert_report_list
 fi
 
+# by default don't output timestamps
+_te_emit_timestamps=${TIMESTAMP:=}
+
 # If the test config specified a soak test duration, see if there are any
 # unit suffixes that need converting to an integer seconds count.
 if [ -n "$SOAK_DURATION" ]; then
@@ -227,13 +218,6 @@ then
     exit 1
 fi
 
-_wipe_counters()
-{
-	try=()
-	notrun=()
-	bad=()
-}
-
 _global_log() {
 	echo "$1" >> $check.log
 	if $OPTIONS_HAVE_SECTIONS; then
@@ -252,11 +236,11 @@ _wrapup()
 	check="$RESULT_BASE/check"
 	$interrupt && sect_stop=`_wallclock`
 
-	if $showme && $needwrap; then
+	if [ "$_te_dry_run" == "true" ] && $needwrap; then
 		if $do_report; then
-			# $showme = all selected tests are notrun (no tries)
-			_make_section_report "$section" "${#notrun[*]}" "0" \
-					     "${#notrun[*]}" \
+			# $_te_dry_run = all selected tests are notrun (no tries)
+			_make_section_report "$section" "${#_te_notrun[*]}" "0" \
+					     "${#_te_notrun[*]}" \
 					     "$((sect_stop - sect_start))"
 		fi
 		needwrap=false
@@ -276,6 +260,7 @@ _wrapup()
 				cp $check.time ${REPORT_DIR}/check.time
 			fi
 		fi
+		set +x
 
 		_global_log ""
 		_global_log "Kernel version: $(uname -r)"
@@ -283,12 +268,12 @@ _wrapup()
 
 		echo "SECTION       -- $section" >>$tmp.summary
 		echo "=========================" >>$tmp.summary
-		if ((${#try[*]} > 0)); then
+		if ((${#_te_try[*]} > 0)); then
 			if [ $brief_test_summary == "false" ]; then
-				echo "Ran: ${try[*]}"
-				echo "Ran: ${try[*]}" >>$tmp.summary
+				echo "Ran: ${_te_try[*]}"
+				echo "Ran: ${_te_try[*]}" >>$tmp.summary
 			fi
-			_global_log "Ran: ${try[*]}"
+			_global_log "Ran: ${_te_try[*]}"
 		fi
 
 		$interrupt && echo "Interrupted!" | tee -a $check.log
@@ -297,30 +282,30 @@ _wrapup()
 				${REPORT_DIR}/check.log
 		fi
 
-		if ((${#notrun[*]} > 0)); then
+		if ((${#_te_notrun[*]} > 0)); then
 			if [ $brief_test_summary == "false" ]; then
-				echo "Not run: ${notrun[*]}"
-				echo "Not run: ${notrun[*]}" >>$tmp.summary
+				echo "Not run: ${_te_notrun[*]}"
+				echo "Not run: ${_te_notrun[*]}" >>$tmp.summary
 			fi
-			_global_log "Not run: ${notrun[*]}"
+			_global_log "Not run: ${_te_notrun[*]}"
 		fi
 
-		if ((${#bad[*]} > 0)); then
-			echo "Failures: ${bad[*]}"
-			echo "Failed ${#bad[*]} of ${#try[*]} tests"
-			_global_log "Failures: ${bad[*]}"
-			_global_log "Failed ${#bad[*]} of ${#try[*]} tests"
-			echo "Failures: ${bad[*]}" >>$tmp.summary
-			echo "Failed ${#bad[*]} of ${#try[*]} tests" >>$tmp.summary
+		if ((${#_te_bad[*]} > 0)); then
+			echo "Failures: ${_te_bad[*]}"
+			echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
+			_global_log "Failures: ${_te_bad[*]}"
+			_global_log "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
+			echo "Failures: ${_te_bad[*]}" >>$tmp.summary
+			echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests" >>$tmp.summary
 		else
-			echo "Passed all ${#try[*]} tests"
-			_global_log "Passed all ${#try[*]} tests"
-			echo "Passed all ${#try[*]} tests" >>$tmp.summary
+			echo "Passed all ${#_te_try[*]} tests"
+			_global_log "Passed all ${#_te_try[*]} tests"
+			echo "Passed all ${#_te_try[*]} tests" >>$tmp.summary
 		fi
 		echo "" >>$tmp.summary
 		if $do_report; then
-			_make_section_report "$section" "${#try[*]}" \
-					     "${#bad[*]}" "${#notrun[*]}" \
+			_make_section_report "$section" "${#_te_try[*]}" \
+					     "${#_te_bad[*]}" "${#_te_notrun[*]}" \
 					     "$((sect_stop - sect_start))"
 		fi
 
@@ -338,8 +323,8 @@ _wrapup()
 		needwrap=false
 	fi
 
-	sum_bad=`expr $sum_bad + ${#bad[*]}`
-	_wipe_counters
+	sum_bad=`expr $sum_bad + ${#_te_bad[*]}`
+	_te_wipe_counters
 	if ! $OPTIONS_HAVE_SECTIONS; then
 		rm -f $tmp.*
 	fi
@@ -348,7 +333,7 @@ _wrapup()
 _summary()
 {
 	_wrapup
-	if $showme; then
+	if [ "$_te_dry_run" == "true" ]; then
 		:
 	elif $needsum; then
 		count=`wc -L $tmp.summary | cut -f1 -d" "`
@@ -358,95 +343,6 @@ _summary()
 	rm -f $tmp.*
 }
 
-_check_filesystems()
-{
-	local ret=0
-
-	if [ -f ${RESULT_DIR}/require_test ]; then
-		if ! _check_test_fs ; then
-			ret=1
-			echo "Trying to repair broken TEST_DEV file system"
-			_repair_test_fs
-			_test_mount
-		fi
-		rm -f ${RESULT_DIR}/require_test*
-	else
-		_test_unmount 2> /dev/null
-	fi
-	if [ -f ${RESULT_DIR}/require_scratch ]; then
-		_check_scratch_fs || ret=1
-		rm -f ${RESULT_DIR}/require_scratch*
-	fi
-	_scratch_unmount 2> /dev/null
-	return $ret
-}
-
-# retain files which would be overwritten in subsequent reruns of the same test
-_stash_fail_loop_files() {
-	local seq_prefix="${REPORT_DIR}/${1}"
-	local cp_suffix="$2"
-
-	for i in ".full" ".dmesg" ".out.bad" ".notrun" ".core" ".hints"; do
-		rm -f "${seq_prefix}${i}${cp_suffix}"
-		if [ -f "${seq_prefix}${i}" ]; then
-			cp "${seq_prefix}${i}" "${seq_prefix}${i}${cp_suffix}"
-		fi
-	done
-}
-
-# Retain in @bad / @notrun the result of the just-run @test_seq. @try array
-# entries are added prior to execution.
-_stash_test_status() {
-	local test_seq="$1"
-	local test_status="$2"
-
-	if $do_report && [[ $test_status != "expunge" ]]; then
-		_make_testcase_report "$section" "$test_seq" \
-				      "$test_status" "$((stop - start))"
-	fi
-
-	if ((${#loop_status[*]} > 0)); then
-		# continuing or completing rerun-on-failure loop
-		_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
-		loop_status+=("$test_status")
-		if ((${#loop_status[*]} > loop_on_fail)); then
-			printf "%s aggregate results across %d runs: " \
-				"$test_seq" "${#loop_status[*]}"
-			awk "BEGIN {
-				n=split(\"${loop_status[*]}\", arr);"'
-				for (i = 1; i <= n; i++)
-					stats[arr[i]]++;
-				for (x in stats)
-					printf("%s=%d (%.1f%%)",
-					       (i-- > n ? x : ", " x),
-					       stats[x], 100 * stats[x] / n);
-				}'
-			echo
-			loop_status=()
-		fi
-		return	# only stash @bad result for initial failure in loop
-	fi
-
-	case "$test_status" in
-	fail)
-		if ((loop_on_fail > 0)); then
-			# initial failure, start rerun-on-failure loop
-			_stash_fail_loop_files "$test_seq" ".rerun0"
-			loop_status+=("$test_status")
-		fi
-		bad+=("$test_seq")
-		;;
-	list|notrun)
-		notrun+=("$test_seq")
-		;;
-	pass|expunge)
-		;;
-	*)
-		echo "Unexpected test $test_seq status: $test_status"
-		;;
-	esac
-}
-
 # Can we run systemd scopes?
 HAVE_SYSTEMD_SCOPES=
 systemctl reset-failed "fstests-check" &>/dev/null
@@ -454,11 +350,7 @@ systemd-run --quiet --unit "fstests-check" --scope bash -c "exit 77" &> /dev/nul
 test $? -eq 77 && HAVE_SYSTEMD_SCOPES=yes
 
 # Make the check script unattractive to the OOM killer...
-OOM_SCORE_ADJ="/proc/self/oom_score_adj"
-function _adjust_oom_score() {
-	test -w "${OOM_SCORE_ADJ}" && echo "$1" > "${OOM_SCORE_ADJ}"
-}
-_adjust_oom_score -500
+_te_adjust_oom_score -500
 
 # ...and make the tests themselves somewhat more attractive to it, so that if
 # the system runs out of memory it'll be the test that gets killed and not the
@@ -472,7 +364,7 @@ _adjust_oom_score -500
 # when systemd tells them to terminate (e.g. programs stuck in D state when
 # systemd sends SIGKILL), so we use reset-failed to tear down the scope.
 _run_seq() {
-	local cmd=(bash -c "test -w ${OOM_SCORE_ADJ} && echo 250 > ${OOM_SCORE_ADJ}; exec ./$seq")
+	local cmd=(bash -c "test -w ${_te_oom_score_adj} && echo 250 > ${_te_oom_score_adj}; exec ./$seq")
 	local res
 
 	if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
@@ -617,222 +509,26 @@ function run_section()
 	seqres="$check"
 	_check_test_fs
 
-	loop_status=()	# track rerun-on-failure state
-	local tc_status ix
+	if $OPTIONS_HAVE_SECTIONS; then
+		REPORT_DIR="$RESULT_BASE/$section"
+	else
+		REPORT_DIR="$RESULT_BASE"
+	fi
+
+	local ix
 	local -a _list=( $_tl_tests )
-	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do
+	for ((ix = 0; ix < ${#_list[*]}; !${#_te_loop_status[*]} && ix++)); do
 		seq="${_list[$ix]}"
 
-		# the filename for the test and the name output are different.
-		# we don't include the tests/ directory in the name output.
-		export seqnum=$(_tl_strip_src_dir $seq)
-		group=${seqnum%%/*}
-		if $OPTIONS_HAVE_SECTIONS; then
-			REPORT_DIR="$RESULT_BASE/$section"
-		else
-			REPORT_DIR="$RESULT_BASE"
-		fi
-		export RESULT_DIR="$REPORT_DIR/$group"
-		seqres="$REPORT_DIR/$seqnum"
-
 		# Generate the entire section report with whatever test results
 		# we have so far.  Leave the $sect_time parameter empty so that
 		# it's a little more obvious that this test run is incomplete.
 		if $do_report; then
-			_make_section_report "$section" "${#try[*]}" \
-					     "${#bad[*]}" "${#notrun[*]}" \
+			_make_section_report "$section" "${#_te_try[*]}" \
+					     "${#_te_bad[*]}" "${#_te_notrun[*]}" \
 					     "" &> /dev/null
 		fi
-
-		echo -n "$seqnum"
-
-		if $showme; then
-			if _tl_expunge_test $seqnum; then
-				tc_status="expunge"
-			else
-				echo
-				start=0
-				stop=0
-				tc_status="list"
-			fi
-			_stash_test_status "$seqnum" "$tc_status"
-			continue
-		fi
-
-		tc_status="pass"
-		if [ ! -f $seq ]; then
-			echo " - no such test?"
-			_stash_test_status "$seqnum" "$tc_status"
-			continue
-		fi
-
-		# really going to try and run this one
-		mkdir -p $RESULT_DIR
-		rm -f ${RESULT_DIR}/require_scratch*
-		rm -f ${RESULT_DIR}/require_test*
-		rm -f $seqres.out.bad $seqres.hints
-
-		# check if we really should run it
-		if _tl_expunge_test $seqnum; then
-			tc_status="expunge"
-			_stash_test_status "$seqnum" "$tc_status"
-			continue
-		fi
-
-		# record that we really tried to run this test.
-		if ((!${#loop_status[*]})); then
-			try+=("$seqnum")
-		fi
-
-		awk 'BEGIN {lasttime="       "} \
-		     $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
-		     END {printf "%s", lasttime}' "$check.time"
-		rm -f core $seqres.notrun
-
-		start=`_wallclock`
-		$timestamp && _timestamp
-		[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
-		$LOGGER_PROG "run xfstest $seqnum"
-		if [ -w /dev/kmsg ]; then
-			export date_time=`date +"%F %T"`
-			echo "run fstests $seqnum at $date_time" > /dev/kmsg
-			# _check_dmesg depends on this log in dmesg
-			touch ${RESULT_DIR}/check_dmesg
-			rm -f ${RESULT_DIR}/dmesg_filter
-		fi
-		_try_wipe_scratch_devs > /dev/null 2>&1
-
-		# clear the WARN_ONCE state to allow a potential problem
-		# to be reported for each test
-		(echo 1 > $DEBUGFS_MNT/clear_warn_once) > /dev/null 2>&1
-
-		test_start_time="$(date +"%F %T")"
-		if [ "$DUMP_OUTPUT" = true ]; then
-			_run_seq 2>&1 | tee $tmp.out
-			# Because $? would get tee's return code
-			sts=${PIPESTATUS[0]}
-		else
-			_run_seq >$tmp.out 2>&1
-			sts=$?
-		fi
-
-		# If someone sets kernel.core_pattern or kernel.core_uses_pid,
-		# coredumps generated by fstests might have a longer name than
-		# just "core".  Use globbing to find the most common patterns,
-		# assuming there are no other coredump capture packages set up.
-		local cores=0
-		for i in core core.*; do
-			test -f "$i" || continue
-			if ((cores++ == 0)); then
-				_dump_err_cont "[dumped core]"
-			fi
-			(_adjust_oom_score 250; _save_coredump "$i")
-			tc_status="fail"
-		done
-
-		if [ -f $seqres.notrun ]; then
-			$timestamp && _timestamp
-			stop=`_wallclock`
-			$timestamp || echo -n "[not run] "
-			$timestamp && echo " [not run]" && \
-				      echo -n "	$seqnum -- "
-			cat $seqres.notrun
-			tc_status="notrun"
-			_stash_test_status "$seqnum" "$tc_status"
-
-			# Unmount the scratch fs so that we can wipe the scratch
-			# dev state prior to the next test run.
-			_scratch_unmount 2> /dev/null
-			continue;
-		fi
-
-		if [ $sts -ne 0 ]; then
-			_dump_err_cont "[failed, exit status $sts]"
-			_test_unmount 2> /dev/null
-			_scratch_unmount 2> /dev/null
-			rm -f ${RESULT_DIR}/require_test*
-			rm -f ${RESULT_DIR}/require_scratch*
-			# Even though we failed, there may be something interesting in
-			# dmesg which can help debugging.
-			_check_dmesg
-			tc_status="fail"
-		else
-			# The test apparently passed, so check for corruption
-			# and log messages that shouldn't be there.  Run the
-			# checking tools from a subshell with adjusted OOM
-			# score so that the OOM killer will target them instead
-			# of the check script itself.
-			(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
-			_check_dmesg || tc_status="fail"
-
-			# Save any coredumps from the post-test fs checks
-			for i in core core.*; do
-				test -f "$i" || continue
-				if ((cores++ == 0)); then
-					_dump_err_cont "[dumped core]"
-				fi
-				(_adjust_oom_score 250; _save_coredump "$i")
-				tc_status="fail"
-			done
-		fi
-
-		# Reload the module after each test to check for leaks or
-		# other problems.
-		if [ -n "${TEST_FS_MODULE_RELOAD}" ]; then
-			_test_unmount 2> /dev/null
-			_scratch_unmount 2> /dev/null
-			modprobe -r fs-$FSTYP
-			modprobe fs-$FSTYP
-		fi
-
-		# Scan for memory leaks after every test so that associating
-		# a leak to a particular test will be as accurate as possible.
-		_check_kmemleak || tc_status="fail"
-
-		# test ends after all checks are done.
-		$timestamp && _timestamp
-		stop=`_wallclock`
-
-		if [ ! -f $seq.out ]; then
-			_dump_err "no qualified output"
-			tc_status="fail"
-			_stash_test_status "$seqnum" "$tc_status"
-			continue;
-		fi
-
-		# coreutils 8.16+ changed quote formats in error messages
-		# from `foo' to 'foo'. Filter old versions to match the new
-		# version.
-		sed -i "s/\`/\'/g" $tmp.out
-		if diff $seq.out $tmp.out >/dev/null 2>&1 ; then
-			if [ "$tc_status" != "fail" ]; then
-				echo "$seqnum `expr $stop - $start`" >>$tmp.time
-				echo -n " `expr $stop - $start`s"
-			fi
-			echo ""
-		else
-			_dump_err "- output mismatch (see $seqres.out.bad)"
-			mv $tmp.out $seqres.out.bad
-			$diff $seq.out $seqres.out.bad | {
-			if test "$DIFF_LENGTH" -le 0; then
-				cat
-			else
-				head -n "$DIFF_LENGTH"
-				echo "..."
-				echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
-					" to see the entire diff)"
-			fi; } | sed -e 's/^\(.\)/    \1/'
-			tc_status="fail"
-		fi
-		if [ -f $seqres.hints ]; then
-			if [ "$tc_status" == "fail" ]; then
-				echo
-				cat $seqres.hints
-			else
-				rm -f $seqres.hints
-			fi
-		fi
-		_stash_test_status "$seqnum" "$tc_status"
+		_te_run_test $seq
 	done
 
 	# Reset these three variables so that unmount output doesn't get
diff --git a/common/test_exec b/common/test_exec
new file mode 100644
index 000000000..63efa3d19
--- /dev/null
+++ b/common/test_exec
@@ -0,0 +1,352 @@
+##/bin/bash
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
+# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
+#
+# Test execution functions
+#
+# This file contains the functions to run a test and capture the results. The
+# caller context must source all the dependencies this code requires, as well
+# as provide certain global variables and define certain
+# functions to run and track the test status of the test being run.
+#
+# Any function or variable that is public should have a "_te_" prefix.
+
+# test status tracking variables. These are externally visible so that the
+# caller can do its own test reporting based on the tracking provided by these
+# variables.
+_te_try=()
+_te_bad=()
+_te_notrun=()
+_te_loop_status=()
+_te_emit_timestamps=""
+_te_dry_run=""
+
+_te_wipe_counters()
+{
+	_te_try=()
+	_te_notrun=()
+	_te_bad=()
+	_te_loop_status=()
+}
+
+_te_oom_score_adj="/proc/self/oom_score_adj"
+_te_adjust_oom_score() {
+	test -w "${_te_oom_score_adj}" && echo "$1" > "${_te_oom_score_adj}"
+}
+
+_te_timestamp()
+{
+	if [ "$_te_emit_timestamps" == "true" ]; then
+		local now=`date "+%T"`
+		echo -n " [$now]"
+	fi
+}
+
+_te_check_filesystems()
+{
+	local ret=0
+
+	if [ -f ${RESULT_DIR}/require_test ]; then
+		if ! _check_test_fs ; then
+			ret=1
+			echo "Trying to repair broken TEST_DEV file system"
+			_repair_test_fs
+			_test_mount
+		fi
+		rm -f ${RESULT_DIR}/require_test*
+	else
+		_test_unmount 2> /dev/null
+	fi
+	if [ -f ${RESULT_DIR}/require_scratch ]; then
+		_check_scratch_fs || ret=1
+		rm -f ${RESULT_DIR}/require_scratch*
+	fi
+	_scratch_unmount 2> /dev/null
+	return $ret
+}
+
+# retain files which would be overwritten in subsequent reruns of the same test
+_te_stash_fail_loop_files() {
+	local seq_prefix="${REPORT_DIR}/${1}"
+	local cp_suffix="$2"
+
+	for i in ".full" ".dmesg" ".out.bad" ".notrun" ".core" ".hints"; do
+		rm -f "${seq_prefix}${i}${cp_suffix}"
+		if [ -f "${seq_prefix}${i}" ]; then
+			cp "${seq_prefix}${i}" "${seq_prefix}${i}${cp_suffix}"
+		fi
+	done
+}
+
+# Retain in @bad / @notrun the result of the just-run @test_seq. @try array
+# entries are added prior to execution.
+_te_stash_test_status() {
+	local test_seq="$1"
+	local test_status="$2"
+
+	if $do_report && [[ $test_status != "expunge" ]]; then
+		_make_testcase_report "$section" "$test_seq" \
+				      "$test_status" "$((stop - start))"
+	fi
+
+	if ((${#_te_loop_status[*]} > 0)); then
+		# continuing or completing rerun-on-failure loop
+		_te_stash_fail_loop_files "$test_seq" ".rerun${#_te_loop_status[*]}"
+		_te_loop_status+=("$test_status")
+		if ((${#_te_loop_status[*]} > loop_on_fail)); then
+			printf "%s aggregate results across %d runs: " \
+				"$test_seq" "${#_te_loop_status[*]}"
+			awk "BEGIN {
+				n=split(\"${_te_loop_status[*]}\", arr);"'
+				for (i = 1; i <= n; i++)
+					stats[arr[i]]++;
+				for (x in stats)
+					printf("%s=%d (%.1f%%)",
+					       (i-- > n ? x : ", " x),
+					       stats[x], 100 * stats[x] / n);
+				}'
+			echo
+			_te_loop_status=()
+		fi
+		return	# only stash @bad result for initial failure in loop
+	fi
+
+	case "$test_status" in
+	fail)
+		if ((loop_on_fail > 0)); then
+			# initial failure, start rerun-on-failure loop
+			_te_stash_fail_loop_files "$test_seq" ".rerun0"
+			_te_loop_status+=("$test_status")
+		fi
+		_te_bad+=("$test_seq")
+		;;
+	list|notrun)
+		_te_notrun+=("$test_seq")
+		;;
+	pass|expunge)
+		;;
+	*)
+		echo "Unexpected test $test_seq status: $test_status"
+		;;
+	esac
+}
+
+# Run a test.
+#
+# This currently relies on the caller defining global variables for test
+# reporting and status tracking:
+#
+# REPORT_DIR
+# tmp
+#
+# The caller also needs to define the functions _run_seq and _kill_seq
+# for executing and killing specific test binaries.
+#
+_te_run_test()
+{
+	local seq="$1"
+	local tc_status="pass"
+	local start
+	local stop
+	local sts
+
+	# the filename for the test and the name output are different.
+	# we don't include the tests/ directory in the name output.
+	export seqnum=$(_tl_strip_src_dir $seq)
+	group=${seqnum%%/*}
+	export RESULT_DIR="$REPORT_DIR/$group"
+	seqres="$REPORT_DIR/$seqnum"
+
+	echo -n "$seqnum"
+
+	if [ "$_te_dry_run" == "true" ]; then
+		if _tl_expunge_test $seqnum; then
+			tc_status="expunge"
+		else
+			echo
+			start=0
+			stop=0
+			tc_status="list"
+		fi
+		_te_stash_test_status "$seqnum" "$tc_status"
+		return
+	fi
+
+	if [ ! -f $seq ]; then
+		echo " - no such test?"
+		_te_stash_test_status "$seqnum" "$tc_status"
+		return
+	fi
+
+	# really going to try and run this one
+	mkdir -p $RESULT_DIR
+	rm -f ${RESULT_DIR}/require_scratch*
+	rm -f ${RESULT_DIR}/require_test*
+	rm -f $seqres.out.bad $seqres.hints
+
+	# check if we really should run it
+	if _tl_expunge_test $seqnum; then
+		tc_status="expunge"
+		_te_stash_test_status "$seqnum" "$tc_status"
+		return
+	fi
+
+	# record that we really tried to run this test.
+	if ((!${#_te_loop_status[*]})); then
+		_te_try+=("$seqnum")
+	fi
+
+	awk 'BEGIN {lasttime="       "} \
+	     $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
+	     END {printf "%s", lasttime}' "$check.time"
+	rm -f core $seqres.notrun
+
+	start=`_wallclock`
+	_te_timestamp
+	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
+	$LOGGER_PROG "run xfstest $seqnum"
+	if [ -w /dev/kmsg ]; then
+		export date_time=`date +"%F %T"`
+		echo "run fstests $seqnum at $date_time" > /dev/kmsg
+		# _check_dmesg depends on this log in dmesg
+		touch ${RESULT_DIR}/check_dmesg
+		rm -f ${RESULT_DIR}/dmesg_filter
+	fi
+	_try_wipe_scratch_devs > /dev/null 2>&1
+
+	# clear the WARN_ONCE state to allow a potential problem
+	# to be reported for each test
+	(echo 1 > $DEBUGFS_MNT/clear_warn_once) > /dev/null 2>&1
+
+	test_start_time="$(date +"%F %T")"
+	if [ "$DUMP_OUTPUT" = true ]; then
+		_run_seq 2>&1 | tee $tmp.out
+		# Because $? would get tee's return code
+		sts=${PIPESTATUS[0]}
+	else
+		_run_seq >$tmp.out 2>&1
+		sts=$?
+	fi
+
+	# If someone sets kernel.core_pattern or kernel.core_uses_pid,
+	# coredumps generated by fstests might have a longer name than
+	# just "core".  Use globbing to find the most common patterns,
+	# assuming there are no other coredump capture packages set up.
+	local cores=0
+	for i in core core.*; do
+		test -f "$i" || continue
+		if ((cores++ == 0)); then
+			_dump_err_cont "[dumped core]"
+		fi
+		(_te_adjust_oom_score 250; _save_coredump "$i")
+		tc_status="fail"
+	done
+
+	if [ -f $seqres.notrun ]; then
+		stop=`_wallclock`
+		if [ "$_te_emit_timestamps" == "true" ]; then
+			_te_timestamp
+			echo " [not run]"
+			echo -n " $seqnum -- "
+		else
+			echo -n "[not run] "
+		fi
+		cat $seqres.notrun
+		tc_status="notrun"
+		_te_stash_test_status "$seqnum" "$tc_status"
+
+		# Unmount the scratch fs so that we can wipe the scratch
+		# dev state prior to the next test run.
+		_scratch_unmount 2> /dev/null
+		return
+	fi
+
+	if [ $sts -ne 0 ]; then
+		_dump_err_cont "[failed, exit status $sts]"
+		_test_unmount 2> /dev/null
+		_scratch_unmount 2> /dev/null
+		rm -f ${RESULT_DIR}/require_test*
+		rm -f ${RESULT_DIR}/require_scratch*
+		# Even though we failed, there may be something interesting in
+		# dmesg which can help debugging.
+		_check_dmesg
+		tc_status="fail"
+	else
+		# The test apparently passed, so check for corruption
+		# and log messages that shouldn't be there.  Run the
+		# checking tools from a subshell with adjusted OOM
+		# score so that the OOM killer will target them instead
+		# of the check script itself.
+		(_te_adjust_oom_score 250; _te_check_filesystems) || tc_status="fail"
+		_check_dmesg || tc_status="fail"
+
+		# Save any coredumps from the post-test fs checks
+		for i in core core.*; do
+			test -f "$i" || continue
+			if ((cores++ == 0)); then
+				_dump_err_cont "[dumped core]"
+			fi
+			(_te_adjust_oom_score 250; _save_coredump "$i")
+			tc_status="fail"
+		done
+	fi
+
+	# Reload the module after each test to check for leaks or
+	# other problems.
+	if [ -n "${TEST_FS_MODULE_RELOAD}" ]; then
+		_test_unmount 2> /dev/null
+		_scratch_unmount 2> /dev/null
+		modprobe -r fs-$FSTYP
+		modprobe fs-$FSTYP
+	fi
+
+	# Scan for memory leaks after every test so that associating
+	# a leak to a particular test will be as accurate as possible.
+	_check_kmemleak || tc_status="fail"
+
+	# test ends after all checks are done.
+	_te_timestamp
+	stop=`_wallclock`
+
+	if [ ! -f $seq.out ]; then
+		_dump_err "no qualified output"
+		tc_status="fail"
+		_te_stash_test_status "$seqnum" "$tc_status"
+		return;
+	fi
+
+	# coreutils 8.16+ changed quote formats in error messages
+	# from `foo' to 'foo'. Filter old versions to match the new
+	# version.
+	sed -i "s/\`/\'/g" $tmp.out
+	if diff $seq.out $tmp.out >/dev/null 2>&1 ; then
+		if [ "$tc_status" != "fail" ]; then
+			echo "$seqnum `expr $stop - $start`" >>$tmp.time
+			echo -n " `expr $stop - $start`s"
+		fi
+		echo ""
+	else
+		_dump_err "- output mismatch (see $seqres.out.bad)"
+		mv $tmp.out $seqres.out.bad
+		$diff $seq.out $seqres.out.bad | {
+		if test "$DIFF_LENGTH" -le 0; then
+			cat
+		else
+			head -n "$DIFF_LENGTH"
+			echo "..."
+			echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
+				" to see the entire diff)"
+		fi; } | sed -e 's/^\(.\)/    \1/'
+		tc_status="fail"
+	fi
+	if [ -f $seqres.hints ]; then
+		if [ "$tc_status" == "fail" ]; then
+			echo
+			cat $seqres.hints
+		else
+			rm -f $seqres.hints
+		fi
+	fi
+	_te_stash_test_status "$seqnum" "$tc_status"
+}
-- 
2.45.2
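
The aggregate line that _te_stash_test_status prints at the end of a
rerun-on-failure loop can be tried outside fstests. This is a minimal
sketch, not the patch's code: aggregate_statuses is a hypothetical
helper, and it uses sort | uniq -c instead of the awk array walk so
that the output order is deterministic.

```shell
#!/bin/bash
# Hypothetical helper: summarise per-run statuses as "status=count (pct%)".
aggregate_statuses() {
	local n=$#

	# Nothing to report for an empty list (also avoids dividing by zero).
	((n > 0)) || return 0

	# One status per line, grouped and counted, then formatted by awk.
	printf '%s\n' "$@" | sort | uniq -c | awk -v n="$n" '
		{
			printf("%s%s=%d (%.1f%%)", (NR > 1 ? ", " : ""),
			       $2, $1, 100 * $1 / n)
		}
		END { print "" }'
}

aggregate_statuses fail pass pass fail fail
```

With three failures and two passes out of five runs, this prints
"fail=3 (60.0%), pass=2 (40.0%)".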



* Re: [PATCH 19/28] check: factor test running
  2025-04-17  3:01 ` [PATCH 19/28] check: factor test running Dave Chinner
@ 2025-05-12 13:57   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-12 13:57 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Rework the code that check uses to run an individual test,
> separating the executing of the test from the various pre- and post-
> test processing operations that are specific to check results
> processing.
> 
> This essentially encapsulates the test running and result tracking
> in its own file, leaving just the section iteration and reporting
> to the caller that is running the tests. The caller needs to define
> the runseq() function that actually executes the test and some
I couldn't understand how the runseq() function is used. Could you please
elaborate on this a little more?
> environment variables (e.g. REPORT_DIR) so that the test execution
> code will run correctly and stash results and reports in the correct
> location.
> 
> This greatly simplifies the check script as a big chunk of code
> dedicated to simply running a test and gathering the results is
> completely abstracted away. This makes it clearer what check is now
> doing at a high level in terms of iterating sections and generating
> reports.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check            | 396 ++++++-----------------------------------------
>  common/test_exec | 352 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 398 insertions(+), 350 deletions(-)
>  create mode 100644 common/test_exec
> 
> diff --git a/check b/check
> index fea86f7b9..106de0ee6 100755
> --- a/check
> +++ b/check
> @@ -8,13 +8,9 @@ tmp=/tmp/$$
>  status=0
>  needwrap=true
>  needsum=true
> -try=()
>  sum_bad=0
> -bad=()
> -notrun=()
>  interrupt=true
>  diff="diff -u"
> -showme=false
>  export here=`pwd`
>  brief_test_summary=false
>  do_report=false
> @@ -38,14 +34,12 @@ export QA_CHECK_FS=${QA_CHECK_FS:=true}
>  # number of diff lines from a failed test, 0 for whole output
>  export DIFF_LENGTH=${DIFF_LENGTH:=10}
>  
> -# by default don't output timestamps
> -timestamp=${TIMESTAMP:=false}
> -
>  rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
>  
>  . ./common/exit
>  . ./common/test_names
>  . ./common/test_list
> +. ./common/test_exec
>  . ./common/config
>  . ./common/config-sections
>  . ./common/rc
> @@ -122,12 +116,6 @@ examples:
>  	    exit 1
>  }
>  
> -_timestamp()
> -{
> -    local now=`date "+%T"`
> -    echo -n " [$now]"
> -}
> -
>  # Process command arguments first.
>  while [ $# -gt 0 ]; do
>  	case "$1" in
> @@ -155,7 +143,7 @@ while [ $# -gt 0 ]; do
>  	-l)	diff="diff" ;;
>  	-udiff)	diff="$diff -u" ;;
>  
> -	-n)	showme=true ;;
> +	-n)	_te_dry_run="true" ;;
>  	-i)	iterations=$2; shift ;;
>  	-I) 	iterations=$2; istop=true; shift ;;
>  	-T)	timestamp=true ;;
> @@ -197,6 +185,9 @@ if [ ! -z "$REPORT_LIST" ]; then
>  	_assert_report_list
>  fi
>  
> +# by default don't output timestamps
> +_te_emit_timestamps=${TIMESTAMP:=}
> +
>  # If the test config specified a soak test duration, see if there are any
>  # unit suffixes that need converting to an integer seconds count.
>  if [ -n "$SOAK_DURATION" ]; then
> @@ -227,13 +218,6 @@ then
>      exit 1
>  fi
>  
> -_wipe_counters()
> -{
> -	try=()
> -	notrun=()
> -	bad=()
> -}
> -
>  _global_log() {
>  	echo "$1" >> $check.log
>  	if $OPTIONS_HAVE_SECTIONS; then
> @@ -252,11 +236,11 @@ _wrapup()
>  	check="$RESULT_BASE/check"
>  	$interrupt && sect_stop=`_wallclock`
>  
> -	if $showme && $needwrap; then
> +	if [ "$_te_dry_run" == "true" ] && $needwrap; then
>  		if $do_report; then
> -			# $showme = all selected tests are notrun (no tries)
> -			_make_section_report "$section" "${#notrun[*]}" "0" \
> -					     "${#notrun[*]}" \
> +			# $_te_dry_run = all selected tests are notrun (no tries)
> +			_make_section_report "$section" "${#_te_notrun[*]}" "0" \
> +					     "${#_te_notrun[*]}" \
>  					     "$((sect_stop - sect_start))"
>  		fi
>  		needwrap=false
> @@ -276,6 +260,7 @@ _wrapup()
>  				cp $check.time ${REPORT_DIR}/check.time
>  			fi
>  		fi
> +		set +x
>  
>  		_global_log ""
>  		_global_log "Kernel version: $(uname -r)"
> @@ -283,12 +268,12 @@ _wrapup()
>  
>  		echo "SECTION       -- $section" >>$tmp.summary
>  		echo "=========================" >>$tmp.summary
> -		if ((${#try[*]} > 0)); then
> +		if ((${#_te_try[*]} > 0)); then
>  			if [ $brief_test_summary == "false" ]; then
> -				echo "Ran: ${try[*]}"
> -				echo "Ran: ${try[*]}" >>$tmp.summary
> +				echo "Ran: ${_te_try[*]}"
> +				echo "Ran: ${_te_try[*]}" >>$tmp.summary
>  			fi
> -			_global_log "Ran: ${try[*]}"
> +			_global_log "Ran: ${_te_try[*]}"
>  		fi
>  
>  		$interrupt && echo "Interrupted!" | tee -a $check.log
> @@ -297,30 +282,30 @@ _wrapup()
>  				${REPORT_DIR}/check.log
>  		fi
>  
> -		if ((${#notrun[*]} > 0)); then
> +		if ((${#_te_notrun[*]} > 0)); then
>  			if [ $brief_test_summary == "false" ]; then
> -				echo "Not run: ${notrun[*]}"
> -				echo "Not run: ${notrun[*]}" >>$tmp.summary
> +				echo "Not run: ${_te_notrun[*]}"
> +				echo "Not run: ${_te_notrun[*]}" >>$tmp.summary
>  			fi
> -			_global_log "Not run: ${notrun[*]}"
> +			_global_log "Not run: ${_te_notrun[*]}"
>  		fi
>  
> -		if ((${#bad[*]} > 0)); then
> -			echo "Failures: ${bad[*]}"
> -			echo "Failed ${#bad[*]} of ${#try[*]} tests"
> -			_global_log "Failures: ${bad[*]}"
> -			_global_log "Failed ${#bad[*]} of ${#try[*]} tests"
> -			echo "Failures: ${bad[*]}" >>$tmp.summary
> -			echo "Failed ${#bad[*]} of ${#try[*]} tests" >>$tmp.summary
> +		if ((${#_te_bad[*]} > 0)); then
> +			echo "Failures: ${_te_bad[*]}"
> +			echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
> +			_global_log "Failures: ${_te_bad[*]}"
> +			_global_log "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
> +			echo "Failures: ${_te_bad[*]}" >>$tmp.summary
> +			echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests" >>$tmp.summary
>  		else
> -			echo "Passed all ${#try[*]} tests"
> -			_global_log "Passed all ${#try[*]} tests"
> -			echo "Passed all ${#try[*]} tests" >>$tmp.summary
> +			echo "Passed all ${#_te_try[*]} tests"
> +			_global_log "Passed all ${#_te_try[*]} tests"
> +			echo "Passed all ${#_te_try[*]} tests" >>$tmp.summary
>  		fi
>  		echo "" >>$tmp.summary
>  		if $do_report; then
> -			_make_section_report "$section" "${#try[*]}" \
> -					     "${#bad[*]}" "${#notrun[*]}" \
> +			_make_section_report "$section" "${#_te_try[*]}" \
> +					     "${#_te_bad[*]}" "${#_te_notrun[*]}" \
>  					     "$((sect_stop - sect_start))"
>  		fi
>  
> @@ -338,8 +323,8 @@ _wrapup()
>  		needwrap=false
>  	fi
>  
> -	sum_bad=`expr $sum_bad + ${#bad[*]}`
> -	_wipe_counters
> +	sum_bad=`expr $sum_bad + ${#_te_bad[*]}`
> +	_te_wipe_counters
>  	if ! $OPTIONS_HAVE_SECTIONS; then
>  		rm -f $tmp.*
>  	fi
> @@ -348,7 +333,7 @@ _wrapup()
>  _summary()
>  {
>  	_wrapup
> -	if $showme; then
> +	if [ "$_te_dry_run" == "true" ]; then
>  		:
>  	elif $needsum; then
>  		count=`wc -L $tmp.summary | cut -f1 -d" "`
> @@ -358,95 +343,6 @@ _summary()
>  	rm -f $tmp.*
>  }
>  
> -_check_filesystems()
> -{
> -	local ret=0
> -
> -	if [ -f ${RESULT_DIR}/require_test ]; then
> -		if ! _check_test_fs ; then
> -			ret=1
> -			echo "Trying to repair broken TEST_DEV file system"
> -			_repair_test_fs
> -			_test_mount
> -		fi
> -		rm -f ${RESULT_DIR}/require_test*
> -	else
> -		_test_unmount 2> /dev/null
> -	fi
> -	if [ -f ${RESULT_DIR}/require_scratch ]; then
> -		_check_scratch_fs || ret=1
> -		rm -f ${RESULT_DIR}/require_scratch*
> -	fi
> -	_scratch_unmount 2> /dev/null
> -	return $ret
> -}
> -
> -# retain files which would be overwritten in subsequent reruns of the same test
> -_stash_fail_loop_files() {
> -	local seq_prefix="${REPORT_DIR}/${1}"
> -	local cp_suffix="$2"
> -
> -	for i in ".full" ".dmesg" ".out.bad" ".notrun" ".core" ".hints"; do
> -		rm -f "${seq_prefix}${i}${cp_suffix}"
> -		if [ -f "${seq_prefix}${i}" ]; then
> -			cp "${seq_prefix}${i}" "${seq_prefix}${i}${cp_suffix}"
> -		fi
> -	done
> -}
> -
> -# Retain in @bad / @notrun the result of the just-run @test_seq. @try array
> -# entries are added prior to execution.
> -_stash_test_status() {
> -	local test_seq="$1"
> -	local test_status="$2"
> -
> -	if $do_report && [[ $test_status != "expunge" ]]; then
> -		_make_testcase_report "$section" "$test_seq" \
> -				      "$test_status" "$((stop - start))"
> -	fi
> -
> -	if ((${#loop_status[*]} > 0)); then
> -		# continuing or completing rerun-on-failure loop
> -		_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
> -		loop_status+=("$test_status")
> -		if ((${#loop_status[*]} > loop_on_fail)); then
> -			printf "%s aggregate results across %d runs: " \
> -				"$test_seq" "${#loop_status[*]}"
> -			awk "BEGIN {
> -				n=split(\"${loop_status[*]}\", arr);"'
> -				for (i = 1; i <= n; i++)
> -					stats[arr[i]]++;
> -				for (x in stats)
> -					printf("%s=%d (%.1f%%)",
> -					       (i-- > n ? x : ", " x),
> -					       stats[x], 100 * stats[x] / n);
> -				}'
> -			echo
> -			loop_status=()
> -		fi
> -		return	# only stash @bad result for initial failure in loop
> -	fi
> -
> -	case "$test_status" in
> -	fail)
> -		if ((loop_on_fail > 0)); then
> -			# initial failure, start rerun-on-failure loop
> -			_stash_fail_loop_files "$test_seq" ".rerun0"
> -			loop_status+=("$test_status")
> -		fi
> -		bad+=("$test_seq")
> -		;;
> -	list|notrun)
> -		notrun+=("$test_seq")
> -		;;
> -	pass|expunge)
> -		;;
> -	*)
> -		echo "Unexpected test $test_seq status: $test_status"
> -		;;
> -	esac
> -}
> -
>  # Can we run systemd scopes?
>  HAVE_SYSTEMD_SCOPES=
>  systemctl reset-failed "fstests-check" &>/dev/null
> @@ -454,11 +350,7 @@ systemd-run --quiet --unit "fstests-check" --scope bash -c "exit 77" &> /dev/nul
>  test $? -eq 77 && HAVE_SYSTEMD_SCOPES=yes
>  
>  # Make the check script unattractive to the OOM killer...
> -OOM_SCORE_ADJ="/proc/self/oom_score_adj"
> -function _adjust_oom_score() {
> -	test -w "${OOM_SCORE_ADJ}" && echo "$1" > "${OOM_SCORE_ADJ}"
> -}
> -_adjust_oom_score -500
> +_te_adjust_oom_score -500
>  
>  # ...and make the tests themselves somewhat more attractive to it, so that if
>  # the system runs out of memory it'll be the test that gets killed and not the
> @@ -472,7 +364,7 @@ _adjust_oom_score -500
>  # when systemd tells them to terminate (e.g. programs stuck in D state when
>  # systemd sends SIGKILL), so we use reset-failed to tear down the scope.
>  _run_seq() {
> -	local cmd=(bash -c "test -w ${OOM_SCORE_ADJ} && echo 250 > ${OOM_SCORE_ADJ}; exec ./$seq")
> +	local cmd=(bash -c "test -w ${_te_oom_score_adj} && echo 250 > ${_te_oom_score_adj}; exec ./$seq")
nit: this line exceeds 80 characters.
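One way to stay under 80 columns, as an untested sketch: copy the path
into a short local first. The names adj and _show_run_cmd are made up
for illustration, and the two assignments at the top are stand-ins so
the snippet runs outside fstests.

```shell
#!/bin/bash
# Stand-ins: in check these come from common/test_exec and the test loop.
_te_oom_score_adj="/proc/self/oom_score_adj"
seq="tests/generic/001"

# Hypothetical helper that only builds and prints the command string.
_show_run_cmd() {
	# A short alias keeps the array assignment within 80 columns.
	local adj="${_te_oom_score_adj}"
	local cmd=(bash -c "test -w $adj && echo 250 > $adj; exec ./$seq")

	printf '%s\n' "${cmd[2]}"
}

_show_run_cmd
```

The string handed to bash -c is the same as in the patch; only the
source line is shorter.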
>  	local res
>  
>  	if [ -n "${HAVE_SYSTEMD_SCOPES}" ]; then
> @@ -617,222 +509,26 @@ function run_section()
>  	seqres="$check"
>  	_check_test_fs
>  
> -	loop_status=()	# track rerun-on-failure state
> -	local tc_status ix
> +	if $OPTIONS_HAVE_SECTIONS; then
> +		REPORT_DIR="$RESULT_BASE/$section"
> +	else
> +		REPORT_DIR="$RESULT_BASE"
> +	fi
> +
> +	local ix
>  	local -a _list=( $_tl_tests )
> -	for ((ix = 0; ix < ${#_list[*]}; !${#loop_status[*]} && ix++)); do
> +	for ((ix = 0; ix < ${#_list[*]}; !${#_te_loop_status[*]} && ix++)); do
>  		seq="${_list[$ix]}"
>  
> -		# the filename for the test and the name output are different.
> -		# we don't include the tests/ directory in the name output.
> -		export seqnum=$(_tl_strip_src_dir $seq)
> -		group=${seqnum%%/*}
> -		if $OPTIONS_HAVE_SECTIONS; then
> -			REPORT_DIR="$RESULT_BASE/$section"
> -		else
> -			REPORT_DIR="$RESULT_BASE"
> -		fi
> -		export RESULT_DIR="$REPORT_DIR/$group"
> -		seqres="$REPORT_DIR/$seqnum"
> -
>  		# Generate the entire section report with whatever test results
>  		# we have so far.  Leave the $sect_time parameter empty so that
>  		# it's a little more obvious that this test run is incomplete.
>  		if $do_report; then
> -			_make_section_report "$section" "${#try[*]}" \
> -					     "${#bad[*]}" "${#notrun[*]}" \
> +			_make_section_report "$section" "${#_te_try[*]}" \
> +					     "${#_te_bad[*]}" "${#_te_notrun[*]}" \
>  					     "" &> /dev/null
>  		fi
> -
> -		echo -n "$seqnum"
> -
> -		if $showme; then
> -			if _tl_expunge_test $seqnum; then
> -				tc_status="expunge"
> -			else
> -				echo
> -				start=0
> -				stop=0
> -				tc_status="list"
> -			fi
> -			_stash_test_status "$seqnum" "$tc_status"
> -			continue
> -		fi
> -
> -		tc_status="pass"
> -		if [ ! -f $seq ]; then
> -			echo " - no such test?"
> -			_stash_test_status "$seqnum" "$tc_status"
> -			continue
> -		fi
> -
> -		# really going to try and run this one
> -		mkdir -p $RESULT_DIR
> -		rm -f ${RESULT_DIR}/require_scratch*
> -		rm -f ${RESULT_DIR}/require_test*
> -		rm -f $seqres.out.bad $seqres.hints
> -
> -		# check if we really should run it
> -		if _tl_expunge_test $seqnum; then
> -			tc_status="expunge"
> -			_stash_test_status "$seqnum" "$tc_status"
> -			continue
> -		fi
> -
> -		# record that we really tried to run this test.
> -		if ((!${#loop_status[*]})); then
> -			try+=("$seqnum")
> -		fi
> -
> -		awk 'BEGIN {lasttime="       "} \
> -		     $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
> -		     END {printf "%s", lasttime}' "$check.time"
> -		rm -f core $seqres.notrun
> -
> -		start=`_wallclock`
> -		$timestamp && _timestamp
> -		[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
> -		$LOGGER_PROG "run xfstest $seqnum"
> -		if [ -w /dev/kmsg ]; then
> -			export date_time=`date +"%F %T"`
> -			echo "run fstests $seqnum at $date_time" > /dev/kmsg
> -			# _check_dmesg depends on this log in dmesg
> -			touch ${RESULT_DIR}/check_dmesg
> -			rm -f ${RESULT_DIR}/dmesg_filter
> -		fi
> -		_try_wipe_scratch_devs > /dev/null 2>&1
> -
> -		# clear the WARN_ONCE state to allow a potential problem
> -		# to be reported for each test
> -		(echo 1 > $DEBUGFS_MNT/clear_warn_once) > /dev/null 2>&1
> -
> -		test_start_time="$(date +"%F %T")"
> -		if [ "$DUMP_OUTPUT" = true ]; then
> -			_run_seq 2>&1 | tee $tmp.out
> -			# Because $? would get tee's return code
> -			sts=${PIPESTATUS[0]}
> -		else
> -			_run_seq >$tmp.out 2>&1
> -			sts=$?
> -		fi
> -
> -		# If someone sets kernel.core_pattern or kernel.core_uses_pid,
> -		# coredumps generated by fstests might have a longer name than
> -		# just "core".  Use globbing to find the most common patterns,
> -		# assuming there are no other coredump capture packages set up.
> -		local cores=0
> -		for i in core core.*; do
> -			test -f "$i" || continue
> -			if ((cores++ == 0)); then
> -				_dump_err_cont "[dumped core]"
> -			fi
> -			(_adjust_oom_score 250; _save_coredump "$i")
> -			tc_status="fail"
> -		done
> -
> -		if [ -f $seqres.notrun ]; then
> -			$timestamp && _timestamp
> -			stop=`_wallclock`
> -			$timestamp || echo -n "[not run] "
> -			$timestamp && echo " [not run]" && \
> -				      echo -n "	$seqnum -- "
> -			cat $seqres.notrun
> -			tc_status="notrun"
> -			_stash_test_status "$seqnum" "$tc_status"
> -
> -			# Unmount the scratch fs so that we can wipe the scratch
> -			# dev state prior to the next test run.
> -			_scratch_unmount 2> /dev/null
> -			continue;
> -		fi
> -
> -		if [ $sts -ne 0 ]; then
> -			_dump_err_cont "[failed, exit status $sts]"
> -			_test_unmount 2> /dev/null
> -			_scratch_unmount 2> /dev/null
> -			rm -f ${RESULT_DIR}/require_test*
> -			rm -f ${RESULT_DIR}/require_scratch*
> -			# Even though we failed, there may be something interesting in
> -			# dmesg which can help debugging.
> -			_check_dmesg
> -			tc_status="fail"
> -		else
> -			# The test apparently passed, so check for corruption
> -			# and log messages that shouldn't be there.  Run the
> -			# checking tools from a subshell with adjusted OOM
> -			# score so that the OOM killer will target them instead
> -			# of the check script itself.
> -			(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
> -			_check_dmesg || tc_status="fail"
> -
> -			# Save any coredumps from the post-test fs checks
> -			for i in core core.*; do
> -				test -f "$i" || continue
> -				if ((cores++ == 0)); then
> -					_dump_err_cont "[dumped core]"
> -				fi
> -				(_adjust_oom_score 250; _save_coredump "$i")
> -				tc_status="fail"
> -			done
> -		fi
> -
> -		# Reload the module after each test to check for leaks or
> -		# other problems.
> -		if [ -n "${TEST_FS_MODULE_RELOAD}" ]; then
> -			_test_unmount 2> /dev/null
> -			_scratch_unmount 2> /dev/null
> -			modprobe -r fs-$FSTYP
> -			modprobe fs-$FSTYP
> -		fi
> -
> -		# Scan for memory leaks after every test so that associating
> -		# a leak to a particular test will be as accurate as possible.
> -		_check_kmemleak || tc_status="fail"
> -
> -		# test ends after all checks are done.
> -		$timestamp && _timestamp
> -		stop=`_wallclock`
> -
> -		if [ ! -f $seq.out ]; then
> -			_dump_err "no qualified output"
> -			tc_status="fail"
> -			_stash_test_status "$seqnum" "$tc_status"
> -			continue;
> -		fi
> -
> -		# coreutils 8.16+ changed quote formats in error messages
> -		# from `foo' to 'foo'. Filter old versions to match the new
> -		# version.
> -		sed -i "s/\`/\'/g" $tmp.out
> -		if diff $seq.out $tmp.out >/dev/null 2>&1 ; then
> -			if [ "$tc_status" != "fail" ]; then
> -				echo "$seqnum `expr $stop - $start`" >>$tmp.time
> -				echo -n " `expr $stop - $start`s"
> -			fi
> -			echo ""
> -		else
> -			_dump_err "- output mismatch (see $seqres.out.bad)"
> -			mv $tmp.out $seqres.out.bad
> -			$diff $seq.out $seqres.out.bad | {
> -			if test "$DIFF_LENGTH" -le 0; then
> -				cat
> -			else
> -				head -n "$DIFF_LENGTH"
> -				echo "..."
> -				echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
> -					" to see the entire diff)"
> -			fi; } | sed -e 's/^\(.\)/    \1/'
> -			tc_status="fail"
> -		fi
> -		if [ -f $seqres.hints ]; then
> -			if [ "$tc_status" == "fail" ]; then
> -				echo
> -				cat $seqres.hints
> -			else
> -				rm -f $seqres.hints
> -			fi
> -		fi
> -		_stash_test_status "$seqnum" "$tc_status"
> +		_te_run_test $seq
>  	done
>  
>  	# Reset these three variables so that unmount output doesn't get
> diff --git a/common/test_exec b/common/test_exec
> new file mode 100644
> index 000000000..63efa3d19
> --- /dev/null
> +++ b/common/test_exec
> @@ -0,0 +1,352 @@
> +##/bin/bash
> +# SPDX-License-Identifier: GPL-2.0+
> +# Copyright (c) 2000-2002,2006 Silicon Graphics, Inc.  All Rights Reserved.
> +# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
> +#
> +# Test execution functions
> +#
> +# This file contains the functions to run a test and capture the results. The
> +# caller context must source all the dependencies this code requires, as well
> +# as provide certain global variables and define certain
> +# functions to run and track the test status of the test being run.
> +#
> +# Any function or variable that is public should have a "_te_" prefix.
> +
> +# test status tracking variables. These are externally visible so that the
> +# caller can do its own test reporting based on the tracking provided by these
> +# variables.
> +_te_try=()
> +_te_bad=()
> +_te_notrun=()
> +_te_loop_status=()
> +_te_emit_timestamps=""
> +_te_dry_run=""
> +
> +_te_wipe_counters()
> +{
> +	_te_try=()
> +	_te_notrun=()
> +	_te_bad=()
> +	_te_loop_status=()
> +}
> +
> +_te_oom_score_adj="/proc/self/oom_score_adj"
> +_te_adjust_oom_score() {
> +	test -w "${_te_oom_score_adj}" && echo "$1" > "${_te_oom_score_adj}"
> +}
> +
> +_te_timestamp()
> +{
> +	if [ "$_te_emit_timestamps" == "true" ]; then
> +		local now=`date "+%T"`
> +		echo -n " [$now]"
> +	fi
> +}
> +
> +_te_check_filesystems()
> +{
> +	local ret=0
> +
> +	if [ -f ${RESULT_DIR}/require_test ]; then
> +		if ! _check_test_fs ; then
> +			ret=1
> +			echo "Trying to repair broken TEST_DEV file system"
> +			_repair_test_fs
> +			_test_mount
> +		fi
> +		rm -f ${RESULT_DIR}/require_test*
> +	else
> +		_test_unmount 2> /dev/null
> +	fi
> +	if [ -f ${RESULT_DIR}/require_scratch ]; then
> +		_check_scratch_fs || ret=1
> +		rm -f ${RESULT_DIR}/require_scratch*
> +	fi
> +	_scratch_unmount 2> /dev/null
> +	return $ret
> +}
> +
> +# retain files which would be overwritten in subsequent reruns of the same test
> +_te_stash_fail_loop_files() {
> +	local seq_prefix="${REPORT_DIR}/${1}"
> +	local cp_suffix="$2"
> +
> +	for i in ".full" ".dmesg" ".out.bad" ".notrun" ".core" ".hints"; do
> +		rm -f "${seq_prefix}${i}${cp_suffix}"
> +		if [ -f "${seq_prefix}${i}" ]; then
> +			cp "${seq_prefix}${i}" "${seq_prefix}${i}${cp_suffix}"
> +		fi
> +	done
> +}
> +
> +# Retain in @bad / @notrun the result of the just-run @test_seq. @try array
> +# entries are added prior to execution.
> +_te_stash_test_status() {
> +	local test_seq="$1"
> +	local test_status="$2"
> +
> +	if $do_report && [[ $test_status != "expunge" ]]; then
> +		_make_testcase_report "$section" "$test_seq" \
> +				      "$test_status" "$((stop - start))"
> +	fi
> +
> +	if ((${#loop_status[*]} > 0)); then
Shouldn't the above be _te_loop_status?
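A minimal sketch of why the mixed names look like a problem, assuming I
am reading the run_section() loop correctly (it only holds ix in place
while _te_loop_status is non-empty). The verdict variable is just for
illustration.

```shell
#!/bin/bash
# Two distinct arrays, as in the patch as posted.
loop_status=()
_te_loop_status=()

# Initial failure: the fail) case records it in the unprefixed array.
loop_status+=("fail")

# The caller's loop condition looks at the prefixed array instead.
if ((${#_te_loop_status[*]} > 0)); then
	verdict="rerun same test"
else
	verdict="advance to next test"
fi
echo "$verdict"
```

Here the failure is recorded in one array and checked in the other, so
the sketch reports "advance to next test" and the rerun never starts.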
> +		# continuing or completing rerun-on-failure loop
> +		_te_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
> +		_te_loop_status+=("$test_status")
> +		if ((${#_te_loop_status[*]} > loop_on_fail)); then
> +			printf "%s aggregate results across %d runs: " \
> +				"$test_seq" "${#loop_status[*]}"
> +			awk "BEGIN {
> +				n=split(\"${loop_status[*]}\", arr);"'
> +				for (i = 1; i <= n; i++)
> +					stats[arr[i]]++;
> +				for (x in stats)
> +					printf("%s=%d (%.1f%%)",
> +					       (i-- > n ? x : ", " x),
> +					       stats[x], 100 * stats[x] / n);
> +				}'
> +			echo
> +			_te_loop_status=()
> +		fi
> +		return	# only stash @bad result for initial failure in loop
> +	fi
> +
> +	case "$test_status" in
> +	fail)
> +		if ((loop_on_fail > 0)); then
> +			# initial failure, start rerun-on-failure loop
> +			_te_stash_fail_loop_files "$test_seq" ".rerun0"
> +			loop_status+=("$test_status")
Same here: shouldn't the above be _te_loop_status? We should only use _te_loop_status, shouldn't we?
> +		fi
> +		_te_bad+=("$test_seq")
> +		;;
> +	list|notrun)
> +		_te_notrun+=("$test_seq")
> +		;;
> +	pass|expunge)
> +		;;
> +	*)
> +		echo "Unexpected test $test_seq status: $test_status"
> +		;;
> +	esac
> +}
> +
> +# Run a test.
> +#
> +# This currently relies on the caller defining global variables for test
> +# reporting and status tracking:
> +#
> +# REPORT_DIR
> +# tmp
> +#
> +# The caller also needs to define the functions _run_seq and _kill_seq
> +# for executing and killing specific test binaries.
I couldn't follow this part. _run_seq() is already defined in check, isn't it?
--NR
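One possible reading of that comment is a callback contract: the sourced
file calls _run_seq but does not define it, so whichever script sources
it has to provide one. A minimal sketch of that pattern, with
hypothetical names (lib_run_one, and a dummy _run_seq body) standing in
for the real code:

```shell
#!/bin/bash
# Hypothetical stand-in for the sourced library: it calls _run_seq but
# never defines it.
lib_run_one() {
	local out="$1"
	local sts

	# Capture the hook's own exit status, not tee's.
	_run_seq 2>&1 | tee "$out" > /dev/null
	sts=${PIPESTATUS[0]}
	return "$sts"
}

# The caller supplies the hook before any test is run.
_run_seq() {
	echo "pretend test output"
	return 3
}

out=$(mktemp)
lib_run_one "$out"
sts=$?
echo "status=$sts output=$(cat "$out")"
rm -f "$out"
```

If that reading is right, the comment is a requirement on any script
that sources common/test_exec, and check already satisfies it.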
> +# 
> +_te_run_test()
> +{
> +	local seq="$1"
> +	local tc_status="pass"
> +	local start
> +	local stop
> +	local sts
> +
> +	# the filename for the test and the name output are different.
> +	# we don't include the tests/ directory in the name output.
> +	export seqnum=$(_tl_strip_src_dir $seq)
> +	group=${seqnum%%/*}
> +	export RESULT_DIR="$REPORT_DIR/$group"
> +	seqres="$REPORT_DIR/$seqnum"
> +
> +	echo -n "$seqnum"
> +
> +	if [ "$_te_dry_run" == "true" ]; then
> +		if _tl_expunge_test $seqnum; then
> +			tc_status="expunge"
> +		else
> +			echo
> +			start=0
> +			stop=0
> +			tc_status="list"
> +		fi
> +		_te_stash_test_status "$seqnum" "$tc_status"
> +		return
> +	fi
> +
> +	if [ ! -f $seq ]; then
> +		echo " - no such test?"
> +		_te_stash_test_status "$seqnum" "$tc_status"
> +		return
> +	fi
> +
> +	# really going to try and run this one
> +	mkdir -p $RESULT_DIR
> +	rm -f ${RESULT_DIR}/require_scratch*
> +	rm -f ${RESULT_DIR}/require_test*
> +	rm -f $seqres.out.bad $seqres.hints
> +
> +	# check if we really should run it
> +	if _tl_expunge_test $seqnum; then
> +		tc_status="expunge"
> +		_te_stash_test_status "$seqnum" "$tc_status"
> +		return
> +	fi
> +
> +	# record that we really tried to run this test.
> +	if ((!${#loop_status[*]})); then
> +		_te_try+=("$seqnum")
> +	fi
> +
> +	awk 'BEGIN {lasttime="       "} \
> +	     $1 == "'$seqnum'" {lasttime=" " $2 "s ... "; exit} \
> +	     END {printf "%s", lasttime}' "$check.time"
> +	rm -f core $seqres.notrun
> +
> +	start=`_wallclock`
> +	_te_timestamp
> +	[ ! -x $seq ] && chmod u+x $seq # ensure we can run it
> +	$LOGGER_PROG "run xfstest $seqnum"
> +	if [ -w /dev/kmsg ]; then
> +		export date_time=`date +"%F %T"`
> +		echo "run fstests $seqnum at $date_time" > /dev/kmsg
> +		# _check_dmesg depends on this log in dmesg
> +		touch ${RESULT_DIR}/check_dmesg
> +		rm -f ${RESULT_DIR}/dmesg_filter
> +	fi
> +	_try_wipe_scratch_devs > /dev/null 2>&1
> +
> +	# clear the WARN_ONCE state to allow a potential problem
> +	# to be reported for each test
> +	(echo 1 > $DEBUGFS_MNT/clear_warn_once) > /dev/null 2>&1
> +
> +	test_start_time="$(date +"%F %T")"
> +	if [ "$DUMP_OUTPUT" = true ]; then
> +		_run_seq 2>&1 | tee $tmp.out
> +		# Because $? would get tee's return code
> +		sts=${PIPESTATUS[0]}
> +	else
> +		_run_seq >$tmp.out 2>&1
> +		sts=$?
> +	fi
> +
> +	# If someone sets kernel.core_pattern or kernel.core_uses_pid,
> +	# coredumps generated by fstests might have a longer name than
> +	# just "core".  Use globbing to find the most common patterns,
> +	# assuming there are no other coredump capture packages set up.
> +	local cores=0
> +	for i in core core.*; do
> +		test -f "$i" || continue
> +		if ((cores++ == 0)); then
> +			_dump_err_cont "[dumped core]"
> +		fi
> +		(_te_adjust_oom_score 250; _save_coredump "$i")
> +		tc_status="fail"
> +	done
> +
> +	if [ -f $seqres.notrun ]; then
> +		stop=`_wallclock`
> +		if [ "$_te_emit_timestamps" == "true" ]; then
> +			_te_timestamp
> +			echo " [not run]"
> +			echo -n " $seqnum -- "
> +		else
> +			echo -n "[not run] "
> +		fi
> +		cat $seqres.notrun
> +		tc_status="notrun"
> +		_te_stash_test_status "$seqnum" "$tc_status"
> +
> +		# Unmount the scratch fs so that we can wipe the scratch
> +		# dev state prior to the next test run.
> +		_scratch_unmount 2> /dev/null
> +		return
> +	fi
> +
> +	if [ $sts -ne 0 ]; then
> +		_dump_err_cont "[failed, exit status $sts]"
> +		_test_unmount 2> /dev/null
> +		_scratch_unmount 2> /dev/null
> +		rm -f ${RESULT_DIR}/require_test*
> +		rm -f ${RESULT_DIR}/require_scratch*
> +		# Even though we failed, there may be something interesting in
> +		# dmesg which can help debugging.
> +		_check_dmesg
> +		tc_status="fail"
> +	else
> +		# The test apparently passed, so check for corruption
> +		# and log messages that shouldn't be there.  Run the
> +		# checking tools from a subshell with adjusted OOM
> +		# score so that the OOM killer will target them instead
> +		# of the check script itself.
> +		(_te_adjust_oom_score 250; _te_check_filesystems) || tc_status="fail"
> +		_check_dmesg || tc_status="fail"
> +
> +		# Save any coredumps from the post-test fs checks
> +		for i in core core.*; do
> +			test -f "$i" || continue
> +			if ((cores++ == 0)); then
> +				_dump_err_cont "[dumped core]"
> +			fi
> +			(_te_adjust_oom_score 250; _save_coredump "$i")
> +			tc_status="fail"
> +		done
> +	fi
> +
> +	# Reload the module after each test to check for leaks or
> +	# other problems.
> +	if [ -n "${TEST_FS_MODULE_RELOAD}" ]; then
> +		_test_unmount 2> /dev/null
> +		_scratch_unmount 2> /dev/null
> +		modprobe -r fs-$FSTYP
> +		modprobe fs-$FSTYP
> +	fi
> +
> +	# Scan for memory leaks after every test so that associating
> +	# a leak to a particular test will be as accurate as possible.
> +	_check_kmemleak || tc_status="fail"
> +
> +	# test ends after all checks are done.
> +	_te_timestamp
> +	stop=`_wallclock`
> +
> +	if [ ! -f $seq.out ]; then
> +		_dump_err "no qualified output"
> +		tc_status="fail"
> +		_te_stash_test_status "$seqnum" "$tc_status"
> +		return;
> +	fi
> +
> +	# coreutils 8.16+ changed quote formats in error messages
> +	# from `foo' to 'foo'. Filter old versions to match the new
> +	# version.
> +	sed -i "s/\`/\'/g" $tmp.out
> +	if diff $seq.out $tmp.out >/dev/null 2>&1 ; then
> +		if [ "$tc_status" != "fail" ]; then
> +			echo "$seqnum `expr $stop - $start`" >>$tmp.time
> +			echo -n " `expr $stop - $start`s"
> +		fi
> +		echo ""
> +	else
> +		_dump_err "- output mismatch (see $seqres.out.bad)"
> +		mv $tmp.out $seqres.out.bad
> +		$diff $seq.out $seqres.out.bad | {
> +		if test "$DIFF_LENGTH" -le 0; then
> +			cat
> +		else
> +			head -n "$DIFF_LENGTH"
> +			echo "..."
> +			echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
> +				" to see the entire diff)"
> +		fi; } | sed -e 's/^\(.\)/    \1/'
> +		tc_status="fail"
> +	fi
> +	if [ -f $seqres.hints ]; then
> +		if [ "$tc_status" == "fail" ]; then
> +			echo
> +			cat $seqres.hints
> +		else
> +			rm -f $seqres.hints
> +		fi
> +	fi
> +	_te_stash_test_status "$seqnum" "$tc_status"
> +}



* [PATCH 20/28] [RFC] check-parallel: run tests directly without using check
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (18 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 19/28] check: factor test running Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-05-13 14:48   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 21/28] generic/531: limit max files per CPU Dave Chinner
                   ` (7 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Now that we've factored the code that runs individual tests out of
check, we can start using it in check-parallel instead of calling
check to do so.

I originally thought that:

	We can do this by implementing a _run_seq() callback that runs the
	individual test in its own private namespace, using the generic
	test execution and status tracking to record what happens with each
	test that is run by a given runner.

	This allows the runner to destage a test at a time from the global
	list and execute it in a batched setup, all without requiring us
	to run check for each test.

But, no, no we can't do that. We have to mount the test filesystem
inside the new private mount namespace context, otherwise all the
other runner contexts see it and we reintroduce shared mount
namespace problems...

Hence we need a helper script that sets up the test device and does
all that admin stuff before it starts grabbing tests from the test
list. This means that multiple tests will run inside the runner's
private namespace; the helper is really now a very minimal version
of check....
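
A minimal sketch of the dequeue step such a helper relies on, assuming a
plain-text list with one test per line. The function name dequeue_test, the
temporary list and its .lock file are illustrative only. Unlike
get_next_test() in the patch, this variant drops the last line by position
rather than by pattern match:

```shell
#!/bin/bash
# Sketch only: pop the last line of a shared list file under an flock.

dequeue_test()
{
	local list="$1"
	local next=

	# fd 99 is opened on the lock file by the caller
	flock 99
	next=$(tail -n 1 "$list")
	if [ -n "$next" ]; then
		# drop only the final line of the file (GNU sed)
		sed -i '$d' "$list"
	fi
	flock -u 99
	echo "$next"
}

list=$(mktemp)
printf '%s\n' generic/001 generic/002 xfs/271 > "$list"
exec 99<>"$list.lock"

# Keep dequeuing until the list is empty.
while t=$(dequeue_test "$list") && [ -n "$t" ]; do
	echo "would run $t"
done

rm -f "$list" "$list.lock"
```

Each runner opens its own descriptor on the lock file, so concurrent
dequeues from different runners serialise against each other.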

This requires that check-parallel also sets up the required
environment for all tests to run in. We can do this now simply by
sourcing common/config as this now sets up and exports the entire
environment that tests rely on.
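
As a rough illustration of why sourcing a file of exported defaults is
enough for child processes to see the settings, consider the sketch below.
The file written here is a hypothetical stand-in, not common/config itself;
only the ${VAR:=default} idiom and the two variable names are taken from
fstests:

```shell
#!/bin/bash
# Sketch only: a stand-in config file, not the real common/config.

cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# keep a value from the caller's environment, else use the default
export LOAD_FACTOR=${LOAD_FACTOR:=1}
export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
EOF

# Start from a known state for the demo, then override one value.
unset DEBUGFS_MNT
LOAD_FACTOR=4
. "$cfg"

# A child process (standing in for a test) inherits both exports.
child_env=$(bash -c 'echo "$LOAD_FACTOR $DEBUGFS_MNT"')
echo "$child_env"

rm -f "$cfg"
```

The child prints the overridden LOAD_FACTOR followed by the default
DEBUGFS_MNT, without having sourced anything itself.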

Note: this change is mainly for RFC purposes right now - it still
needs more cleanup and probably some splitting up before it is ready
to go. I'm including it more as an informational "this is why various
factoring and rearrangement has been done" patch than a full review
request...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 check             |  29 ++----------
 check-parallel    |  75 ++++++++++++------------------
 common/config     |   6 +++
 common/rc         |   2 +
 common/test_exec  |  29 +++++++++++-
 common/test_list  |   6 +++
 tests/xfs/271     |   2 -
 tools/run_test.sh | 116 ++++++++++++++++++++++++++++++++++++++++++++++
 8 files changed, 191 insertions(+), 74 deletions(-)
 create mode 100755 tools/run_test.sh

diff --git a/check b/check
index 106de0ee6..f032f9c48 100755
--- a/check
+++ b/check
@@ -25,15 +25,6 @@ _err_msg=""
 # start the initialisation work now
 iam=check.$$
 
-# mkfs.xfs uses the presence of both of these variables to enable formerly
-# supported tiny filesystem configurations that fstests use for fuzz testing
-# in a controlled environment
-export MSGVERB="text:action"
-export QA_CHECK_FS=${QA_CHECK_FS:=true}
-
-# number of diff lines from a failed test, 0 for whole output
-export DIFF_LENGTH=${DIFF_LENGTH:=10}
-
 rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
 
 . ./common/exit
@@ -188,6 +179,10 @@ fi
 # by default don't output timestamps
 _te_emit_timestamps=${TIMESTAMP:=}
 
+# number of diff lines from a failed test, 0 for whole output
+_te_diff_length=${DIFF_LENGTH:=10}
+
+
 # If the test config specified a soak test duration, see if there are any
 # unit suffixes that need converting to an integer seconds count.
 if [ -n "$SOAK_DURATION" ]; then
@@ -245,21 +240,7 @@ _wrapup()
 		fi
 		needwrap=false
 	elif $needwrap; then
-		if [ -f $check.time -a -f $tmp.time ]; then
-			cat $check.time $tmp.time  \
-				| $AWK_PROG '
-				{ t[$1] = $2 }
-				END {
-					if (NR > 0) {
-						for (i in t) print i " " t[i]
-					}
-				}' \
-				| sort -n >$tmp.out
-			mv $tmp.out $check.time
-			if $OPTIONS_HAVE_SECTIONS; then
-				cp $check.time ${REPORT_DIR}/check.time
-			fi
-		fi
+		_te_time_report
 		set +x
 
 		_global_log ""
diff --git a/check-parallel b/check-parallel
index 1b67709a2..be3dfd346 100755
--- a/check-parallel
+++ b/check-parallel
@@ -18,13 +18,13 @@ run_section=""
 exclude_section=""
 iam="check-parallel"
 
-tmp=/tmp/check-parallel.$$
-test_list="$tmp.test_list"
+export here=`pwd`
 
 . ./common/exit
 . ./common/test_names
 . ./common/test_list
 . ./common/config-sections
+. ./common/config
 
 usage()
 {
@@ -155,6 +155,18 @@ elif [ -d "$basedir/runner-0/" ]; then
 	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
 fi
 
+# We keep the test list in the base directory because it needs to be common
+# across all test runners and their private namespaces. Because runners execute
+# in private /tmp/ instances, we can't keep it there, so use the shared
+# $basedir that hosts all the runners as the location of the test list.
+test_list="$basedir/test_list"
+
+# Similarly, set our tmp dir to be under the shared basedir so that all the
+# temp files that stuff like test execution use for expunge lists work
+# appropriately.
+export tmp="$basedir/tmp/check-parallel.$$"
+mkdir -p $basedir/tmp
+
 # grab all previously run tests and order them from highest runtime to lowest
 # We are going to try to run the longer tests first, hopefully so we can avoid
 # massive thundering herds trying to run lots of really short tests in parallel
@@ -219,22 +231,6 @@ setup_test_list()
 	echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
 }
 
-# Grab the next test to be run from the tail of the file.
-# Returns an empty string if there is no tests remaining to run.
-# File operations are run under flock so concurrent gets are serialised against
-# each other.
-get_next_test()
-{
-	local test_file="$test_list.$1"
-	local test=
-
-	flock 99
-	test=$(tail -1 $test_file)
-	sed -i "\,$test,d" $test_file
-	flock -u 99
-	echo $test
-}
-
 
 _create_loop_device()
 {
@@ -260,32 +256,6 @@ _destroy_loop_device()
         losetup -d $dev || _fail "Cannot destroy loop device $dev"
 }
 
-run_tests()
-{
-	local section="$1"
-	local logfile="$2"
-
-	exec 99<>$tmp.test_list_lock
-
-	local test_to_run=$(get_next_test $section)
-
-	# Run the tests in it's own mount namespace, as per the comment below
-	# that precedes making the basedir a private mount.
-	#
-	# Similarly, we need to run check in it's own PID namespace so that
-	# operations like pkill only affect the runner instance, not globally
-	# kill processes from other check instances.
-	while [ -n "$test_to_run" ]; do
-		echo -n " $test_to_run "
-		unset FSTESTS_ISOL
-		if ! _tl_expunge_test $test_to_run; then
-			tools/run_privatens ./check -s $section $test_to_run >> $logfile 2>&1
-		fi
-
-		test_to_run=$(get_next_test $section)
-	done
-}
-
 runner_go()
 {
 
@@ -301,6 +271,8 @@ runner_go()
 	local _results=$me/results-$2
 	local section=$3
 
+	export tmp=/tmp/check-parallel-runner-$id
+
 	mkdir -p $me
 
 	xfs_io -f -c "truncate $TEST_DEV_SIZE" $_test
@@ -323,6 +295,7 @@ runner_go()
 
 	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
 	export RESULT_BASE=$_results
+	export REPORT_DIR="$RESULT_BASE/$section"
 
 	mkdir -p $TEST_DIR
 	mkdir -p $SCRATCH_MNT
@@ -338,7 +311,7 @@ runner_go()
 
 #	export DUMP_CORRUPT_FS=1
 
-	run_tests $section $_results/log
+	tools/run_test.sh $basedir $test_list.$section >> $_results/log 2>&1
 
 	wait
 	sleep 1
@@ -351,6 +324,7 @@ runner_go()
 	_destroy_loop_device $SCRATCH_RTDEV
 	_destroy_loop_device $SCRATCH_LOGDEV
 	_destroy_loop_device $LOGWRITES_DEV
+	rm -f $tmp.*
 
 	grep -q Failures: $_results/log
 	if [ $? -eq 0 ]; then
@@ -366,6 +340,7 @@ run_section()
 	local now="$2"
 	local results="$basedir/*/results-$now"
 	local i
+	local threads=$runners
 
 	echo $run_section |grep -qw $section || return
 	echo $exclude_section |grep -qw $section && return
@@ -391,7 +366,15 @@ run_section()
 	fi
 	cp $test_list $test_list.$section
 
-	for ((i = 0; i < $runners; i++)); do
+	# only run as many runners as there are tests to run
+	i=$(cat $test_list.$section | wc -l)
+	if [ "$i" -lt "$runners" ]; then
+		threads=$i
+	fi
+
+	echo Test list $test_list.$section contains $i tests
+	echo Running $threads tests concurrently
+	for ((i = 0; i < $threads; i++)); do
 		runner_go $i $now $section &
 	done
 	wait
diff --git a/common/config b/common/config
index b93a6c0d3..149ef99c7 100644
--- a/common/config
+++ b/common/config
@@ -70,6 +70,12 @@ export LOAD_FACTOR=${LOAD_FACTOR:=1}
 export SOAK_DURATION=${SOAK_DURATION:=}
 export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
 
+# mkfs.xfs uses the presence of both of these variables to enable formerly
+# supported tiny filesystem configurations that fstests use for fuzz testing
+# in a controlled environment
+export MSGVERB="text:action"
+export QA_CHECK_FS=${QA_CHECK_FS:=true}
+
 # some constants for overlayfs setup
 export OVL_UPPER="ovl-upper"
 export OVL_LOWER="ovl-lower"
diff --git a/common/rc b/common/rc
index be6cd92c4..5b7d2faf8 100644
--- a/common/rc
+++ b/common/rc
@@ -2936,6 +2936,8 @@ _require_xfs_io_command()
 			_notrun "O_TMPFILE is not supported"
 		;;
 	"fsmap")
+		df -h >> $seqres.full 2>&1
+		df -i >> $seqres.full 2>&1
 		testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
 		echo $testio | grep -q "Inappropriate ioctl" && \
 			_notrun "xfs_io $command support is missing"
diff --git a/common/test_exec b/common/test_exec
index 63efa3d19..e9c80bd59 100644
--- a/common/test_exec
+++ b/common/test_exec
@@ -21,6 +21,12 @@ _te_notrun=()
 _te_loop_status=()
 _te_emit_timestamps=""
 _te_dry_run=""
+_te_diff_length=10
+
+# This is needed because callers don't always source common/report and
+# if $do_report is undefined or true we'll try to call _make_testcase_report()
+# without it being defined.
+do_report=false
 
 _te_wipe_counters()
 {
@@ -43,6 +49,25 @@ _te_timestamp()
 	fi
 }
 
+_te_time_report()
+{
+	if [ -f $check.time -a -f $tmp.time ]; then
+		cat $check.time $tmp.time  \
+			| $AWK_PROG '
+			{ t[$1] = $2 }
+			END {
+				if (NR > 0) {
+					for (i in t) print i " " t[i]
+				}
+			}' \
+			| sort -n >$tmp.out
+		mv $tmp.out $check.time
+		if $OPTIONS_HAVE_SECTIONS; then
+			cp $check.time ${REPORT_DIR}/check.time
+		fi
+	fi
+}
+
 _te_check_filesystems()
 {
 	local ret=0
@@ -330,10 +355,10 @@ _te_run_test()
 		_dump_err "- output mismatch (see $seqres.out.bad)"
 		mv $tmp.out $seqres.out.bad
 		$diff $seq.out $seqres.out.bad | {
-		if test "$DIFF_LENGTH" -le 0; then
+		if test "$_te_diff_length" -le 0; then
 			cat
 		else
-			head -n "$DIFF_LENGTH"
+			head -n "$_te_diff_length"
 			echo "..."
 			echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
 				" to see the entire diff)"
diff --git a/common/test_list b/common/test_list
index 092b3ed17..4beb08b22 100644
--- a/common/test_list
+++ b/common/test_list
@@ -160,6 +160,11 @@ _tl_prepare_test_list()
 		trim_test_list $list
 	done
 
+	# Remove expunged tests
+	for f in "${_tl_exclude_tests[@]}"; do
+		trim_test_list tests/$f
+	done
+
 	# sort the list of tests into numeric order unless we're running tests
 	# in the exact order specified
 	if ! $_tl_exact_order; then
@@ -194,6 +199,7 @@ _tl_expunge_test()
 	return 1
 }
 
+
 _tl_setup_exclude_tests()
 {
 	local list="$1"
diff --git a/tests/xfs/271 b/tests/xfs/271
index 8a71746d6..ae4599282 100755
--- a/tests/xfs/271
+++ b/tests/xfs/271
@@ -22,8 +22,6 @@ _cleanup()
 _require_xfs_scratch_rmapbt
 _require_xfs_io_command "fsmap"
 
-rm -f "$seqres.full"
-
 echo "Format and mount"
 _scratch_mkfs > "$seqres.full" 2>&1
 _scratch_mount
diff --git a/tools/run_test.sh b/tools/run_test.sh
new file mode 100755
index 000000000..75f7a7dcd
--- /dev/null
+++ b/tools/run_test.sh
@@ -0,0 +1,116 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
+#
+# Test run helper for check-parallel
+#
+# check-parallel sets up all the devices and environment needed to run tests,
+# but we can't set up for a test until we are executing in a private pid/mount
+# namespace.
+#
+# This means we cannot simply run the test itself in a private namespace; tests
+# require things like the test device to already be mounted, and we require a
+# private mount namespace before we start mounting devices.
+#
+# Hence we need a helper that first creates a private namespace, then
+# does all the setup work for tests to run, then iterates tests until there
+# are no more tests to run.
+#
+# The tests to run are held in a shared file in $basedir so that we can set up
+# a private /tmp for the namespace and not lose access to the test list.
+
+# re-execute in a private namespace as a first step
+if [ -z "${FSTESTS_ISOL}" ]; then
+	if [ -z "$1" ] || [ "$1" = "--help" ]; then
+		echo "Usage: $0 basedir test_list "
+		exit 1
+	fi
+
+	if [ ! -d "$1" ]; then
+		echo "invalid basedir ($1) specified"
+		exit 1
+	fi
+
+	FSTESTS_ISOL=privatens exec "$(dirname "$0")/../src/nsexec" -z -m -p "$0" "$@"
+	exit $?
+fi
+
+# Everything past this point runs in a private namespace.
+#
+# We set up private mounts for /proc and /tmp so they aren't visible outside
+# this mount namespace and its children.
+for path in /proc /tmp; do
+	mountpoint "$path" >/dev/null && \
+		mount --make-private "$path"
+done
+mount -t proc proc /proc
+mount -t tmpfs tmpfs /tmp
+
+echo $PWD $FSTYP
+
+. ./common/exit
+. ./common/test_names
+. ./common/test_list
+. ./common/test_exec
+. ./common/rc
+
+basedir="$1"
+test_file="$2"
+_te_diff_length=${DIFF_LENGTH:=10}
+export tmp=/tmp/run-helper.$$
+
+# XXX: should be a _te_diff variable
+diff='diff -u'
+
+# Grab the next test to be run from the tail of the file.
+# Returns an empty string if there are no tests remaining to run.
+# File operations are run under flock so concurrent gets are serialised against
+# each other.
+get_next_test()
+{
+	local test=
+
+	flock 99
+	test=$(tail -1 $test_file)
+	sed -i "\,$test,d" $test_file
+	flock -u 99
+	echo $test
+}
+
+_run_seq()
+{
+	./$seq
+	return $?
+}
+
+exec 99<>$test_file.lock
+
+# XXX - refactor this back to a test_exec variable
+check="$RESULT_BASE/check"
+touch $check.time
+
+init_rc
+
+test_to_run=$(get_next_test)
+while [ -n "$test_to_run" ]; do
+	_te_run_test "tests/$test_to_run"
+
+	test_to_run=$(get_next_test)
+done
+
+_te_time_report
+
+if ((${#_te_try[*]} > 0)); then
+	echo "Ran: ${_te_try[*]}"
+fi
+
+if ((${#_te_notrun[*]} > 0)); then
+	echo "Not run: ${_te_notrun[*]}"
+fi
+
+if ((${#_te_bad[*]} > 0)); then
+	echo "Failures: ${_te_bad[*]}"
+	echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
+else
+	echo "Passed all ${#_te_try[*]} tests"
+fi
-- 
2.45.2



* Re: [PATCH 20/28] [RFC] check-parallel: run tests directly without using check
  2025-04-17  3:01 ` [PATCH 20/28] [RFC] check-parallel: run tests directly without using check Dave Chinner
@ 2025-05-13 14:48   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-13 14:48 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Now that we've factored the code that runs individual tests out of
> check, we can start using it in check-parallel instead of calling
> check to do so.
> 
> I originally thought that:
> 
> 	We can do this by implementing a _run_seq() callback that runs the
> 	individual test in its own private namespace, using the generic
> 	test execution and status tracking to record what happens with each
> 	test that is run by a given runner.
> 
> 	This allows the runner to destage a test at a time from the global
> 	list and execute it in a batched setup, all without requiring us
> 	to run check for each test.
> 
> But, no, no we can't do that. We have to mount the test filesystem
> inside the new private mount namespace context, otherwise all the
> other runner contexts see it and we reintroduce shared mount
> namespace problems...
So before this patch, we were using tools/run_privatens to directly invoke check in a
private namespace and then letting check do the rest of the work. Now that check-parallel is moving
away from check, it is check-parallel's responsibility to set up a private namespace, then
set up the devices in that namespace, then run the tests and finally collect the results. Am I
correct?
> 
> Hence we need a helper script that sets up the test device and does
> all that admin stuff before it starts grabbing tests from the test
> list. This means that multiple tests will run inside the runner's
> private namespace; the helper is really now a very minimal version
> of check....
> 
> This requires that check-parallel also sets up the required
> environment for all tests to run in. We can do this now simply by
> sourcing common/config as this now sets up and exports the entire
> environment that tests rely on.
> 
> Note: this change is mainly for RFC purposes right now - it still
> needs more cleanup and probably some splitting up before it is ready
> to go. I'm including it more as an informational "this is why various
> factoring and rearrangement has been done" patch than a full review
> request...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check             |  29 ++----------
>  check-parallel    |  75 ++++++++++++------------------
>  common/config     |   6 +++
>  common/rc         |   2 +
>  common/test_exec  |  29 +++++++++++-
>  common/test_list  |   6 +++
>  tests/xfs/271     |   2 -
>  tools/run_test.sh | 116 ++++++++++++++++++++++++++++++++++++++++++++++
>  8 files changed, 191 insertions(+), 74 deletions(-)
>  create mode 100755 tools/run_test.sh
> 
> diff --git a/check b/check
> index 106de0ee6..f032f9c48 100755
> --- a/check
> +++ b/check
> @@ -25,15 +25,6 @@ _err_msg=""
>  # start the initialisation work now
>  iam=check.$$
>  
> -# mkfs.xfs uses the presence of both of these variables to enable formerly
> -# supported tiny filesystem configurations that fstests use for fuzz testing
> -# in a controlled environment
> -export MSGVERB="text:action"
> -export QA_CHECK_FS=${QA_CHECK_FS:=true}
> -
> -# number of diff lines from a failed test, 0 for whole output
> -export DIFF_LENGTH=${DIFF_LENGTH:=10}
> -
>  rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
>  
>  . ./common/exit
> @@ -188,6 +179,10 @@ fi
>  # by default don't output timestamps
>  _te_emit_timestamps=${TIMESTAMP:=}
>  
> +# number of diff lines from a failed test, 0 for whole output
> +_te_diff_length=${DIFF_LENGTH:=10}
> +
> +
>  # If the test config specified a soak test duration, see if there are any
>  # unit suffixes that need converting to an integer seconds count.
>  if [ -n "$SOAK_DURATION" ]; then
> @@ -245,21 +240,7 @@ _wrapup()
>  		fi
>  		needwrap=false
>  	elif $needwrap; then
> -		if [ -f $check.time -a -f $tmp.time ]; then
> -			cat $check.time $tmp.time  \
> -				| $AWK_PROG '
> -				{ t[$1] = $2 }
> -				END {
> -					if (NR > 0) {
> -						for (i in t) print i " " t[i]
> -					}
> -				}' \
> -				| sort -n >$tmp.out
> -			mv $tmp.out $check.time
> -			if $OPTIONS_HAVE_SECTIONS; then
> -				cp $check.time ${REPORT_DIR}/check.time
> -			fi
> -		fi
> +		_te_time_report
>  		set +x
>  
>  		_global_log ""
> diff --git a/check-parallel b/check-parallel
> index 1b67709a2..be3dfd346 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -18,13 +18,13 @@ run_section=""
>  exclude_section=""
>  iam="check-parallel"
>  
> -tmp=/tmp/check-parallel.$$
> -test_list="$tmp.test_list"
> +export here=`pwd`
>  
>  . ./common/exit
>  . ./common/test_names
>  . ./common/test_list
>  . ./common/config-sections
> +. ./common/config
>  
>  usage()
>  {
> @@ -155,6 +155,18 @@ elif [ -d "$basedir/runner-0/" ]; then
>  	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  fi
>  
> +# We keep the test list in the base directory because it needs to be common
> +# across all test runners and their private namespaces. Because runners execute
> +# in private /tmp/ instances, we can't keep it there, so use the shared
> +# $basedir that hosts all the runners as the location of the test list.
> +test_list="$basedir/test_list"
> +
> +# Similarly, set our tmp dir to be under the shared basedir so that all the
> +# temp files that stuff like test execution use for expunge lists work
> +# appropriately.
> +export tmp="$basedir/tmp/check-parallel.$$"
> +mkdir -p $basedir/tmp
> +
>  # grab all previously run tests and order them from highest runtime to lowest
>  # We are going to try to run the longer tests first, hopefully so we can avoid
>  # massive thundering herds trying to run lots of really short tests in parallel
> @@ -219,22 +231,6 @@ setup_test_list()
>  	echo $_tl_tests |sed -e 's/ /\n/g' | tac > $test_list
>  }
>  
> -# Grab the next test to be run from the tail of the file.
> -# Returns an empty string if there is no tests remaining to run.
> -# File operations are run under flock so concurrent gets are serialised against
> -# each other.
> -get_next_test()
> -{
> -	local test_file="$test_list.$1"
> -	local test=
> -
> -	flock 99
> -	test=$(tail -1 $test_file)
> -	sed -i "\,$test,d" $test_file
> -	flock -u 99
> -	echo $test
> -}
> -
>  
>  _create_loop_device()
>  {
> @@ -260,32 +256,6 @@ _destroy_loop_device()
>          losetup -d $dev || _fail "Cannot destroy loop device $dev"
>  }
>  
> -run_tests()
> -{
> -	local section="$1"
> -	local logfile="$2"
> -
> -	exec 99<>$tmp.test_list_lock
> -
> -	local test_to_run=$(get_next_test $section)
> -
> -	# Run the tests in it's own mount namespace, as per the comment below
> -	# that precedes making the basedir a private mount.
> -	#
> -	# Similarly, we need to run check in it's own PID namespace so that
> -	# operations like pkill only affect the runner instance, not globally
> -	# kill processes from other check instances.
> -	while [ -n "$test_to_run" ]; do
> -		echo -n " $test_to_run "
> -		unset FSTESTS_ISOL
> -		if ! _tl_expunge_test $test_to_run; then
> -			tools/run_privatens ./check -s $section $test_to_run >> $logfile 2>&1
> -		fi
> -
> -		test_to_run=$(get_next_test $section)
> -	done
> -}
> -
>  runner_go()
>  {
>  
> @@ -301,6 +271,8 @@ runner_go()
>  	local _results=$me/results-$2
>  	local section=$3
>  
> +	export tmp=/tmp/check-parallel-runner-$id
> +
>  	mkdir -p $me
>  
>  	xfs_io -f -c "truncate $TEST_DEV_SIZE" $_test
> @@ -323,6 +295,7 @@ runner_go()
>  
>  	export LOGWRITES_DEV=$(_create_loop_device $_logwrites)
>  	export RESULT_BASE=$_results
> +	export REPORT_DIR="$RESULT_BASE/$section"
>  
>  	mkdir -p $TEST_DIR
>  	mkdir -p $SCRATCH_MNT
> @@ -338,7 +311,7 @@ runner_go()
>  
>  #	export DUMP_CORRUPT_FS=1
>  
> -	run_tests $section $_results/log
> +	tools/run_test.sh $basedir $test_list.$section >> $_results/log 2>&1
>  
>  	wait
>  	sleep 1
> @@ -351,6 +324,7 @@ runner_go()
>  	_destroy_loop_device $SCRATCH_RTDEV
>  	_destroy_loop_device $SCRATCH_LOGDEV
>  	_destroy_loop_device $LOGWRITES_DEV
> +	rm -f $tmp.*
>  
>  	grep -q Failures: $_results/log
>  	if [ $? -eq 0 ]; then
> @@ -366,6 +340,7 @@ run_section()
>  	local now="$2"
>  	local results="$basedir/*/results-$now"
>  	local i
> +	local threads=$runners
>  
>  	echo $run_section |grep -qw $section || return
>  	echo $exclude_section |grep -qw $section && return
> @@ -391,7 +366,15 @@ run_section()
>  	fi
>  	cp $test_list $test_list.$section
>  
> -	for ((i = 0; i < $runners; i++)); do
> +	# only run as many runners as there are tests to run
> +	i=$(cat $test_list.$section | wc -l)
> +	if [ "$i" -lt "$runners" ]; then
> +		threads=$i
> +	fi
> +
> +	echo Test list $test_list.$section contains $i tests
> +	echo Running $threads tests concurrently
> +	for ((i = 0; i < $threads; i++)); do
>  		runner_go $i $now $section &
>  	done
>  	wait
> diff --git a/common/config b/common/config
> index b93a6c0d3..149ef99c7 100644
> --- a/common/config
> +++ b/common/config
> @@ -70,6 +70,12 @@ export LOAD_FACTOR=${LOAD_FACTOR:=1}
>  export SOAK_DURATION=${SOAK_DURATION:=}
>  export DEBUGFS_MNT=${DEBUGFS_MNT:="/sys/kernel/debug"}
>  
> +# mkfs.xfs uses the presence of both of these variables to enable formerly
> +# supported tiny filesystem configurations that fstests use for fuzz testing
> +# in a controlled environment
> +export MSGVERB="text:action"
> +export QA_CHECK_FS=${QA_CHECK_FS:=true}
> +
>  # some constants for overlayfs setup
>  export OVL_UPPER="ovl-upper"
>  export OVL_LOWER="ovl-lower"
> diff --git a/common/rc b/common/rc
> index be6cd92c4..5b7d2faf8 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -2936,6 +2936,8 @@ _require_xfs_io_command()
>  			_notrun "O_TMPFILE is not supported"
>  		;;
>  	"fsmap")
> +		df -h >> $seqres.full 2>&1
> +		df -i >> $seqres.full 2>&1
>  		testio=`$XFS_IO_PROG -f -c "fsmap" $testfile 2>&1`
>  		echo $testio | grep -q "Inappropriate ioctl" && \
>  			_notrun "xfs_io $command support is missing"
> diff --git a/common/test_exec b/common/test_exec
> index 63efa3d19..e9c80bd59 100644
> --- a/common/test_exec
> +++ b/common/test_exec
> @@ -21,6 +21,12 @@ _te_notrun=()
>  _te_loop_status=()
>  _te_emit_timestamps=""
>  _te_dry_run=""
> +_te_diff_length=10
> +
> +# This is needed because callers don't always source common/report and
> +# if $do_report is undefined or true we'll try to call _make_testcase_report()
> +# without it being defined.
> +do_report=false
>  
>  _te_wipe_counters()
>  {
> @@ -43,6 +49,25 @@ _te_timestamp()
>  	fi
>  }
>  
> +_te_time_report()
> +{
> +	if [ -f $check.time -a -f $tmp.time ]; then
> +		cat $check.time $tmp.time  \
> +			| $AWK_PROG '
> +			{ t[$1] = $2 }
> +			END {
> +				if (NR > 0) {
> +					for (i in t) print i " " t[i]
> +				}
> +			}' \
> +			| sort -n >$tmp.out
> +		mv $tmp.out $check.time
> +		if $OPTIONS_HAVE_SECTIONS; then
> +			cp $check.time ${REPORT_DIR}/check.time
> +		fi
> +	fi
> +}
> +
>  _te_check_filesystems()
>  {
>  	local ret=0
> @@ -330,10 +355,10 @@ _te_run_test()
>  		_dump_err "- output mismatch (see $seqres.out.bad)"
>  		mv $tmp.out $seqres.out.bad
>  		$diff $seq.out $seqres.out.bad | {
> -		if test "$DIFF_LENGTH" -le 0; then
> +		if test "$_te_diff_length" -le 0; then
>  			cat
>  		else
> -			head -n "$DIFF_LENGTH"
> +			head -n "$_te_diff_length"
>  			echo "..."
>  			echo "(Run '$diff $here/$seq.out $seqres.out.bad'" \
>  				" to see the entire diff)"
> diff --git a/common/test_list b/common/test_list
> index 092b3ed17..4beb08b22 100644
> --- a/common/test_list
> +++ b/common/test_list
> @@ -160,6 +160,11 @@ _tl_prepare_test_list()
>  		trim_test_list $list
>  	done
>  
> +	# Remove expunged tests
> +	for f in "${_tl_exclude_tests[@]}"; do
> +		trim_test_list tests/$f
> +	done
> +
>  	# sort the list of tests into numeric order unless we're running tests
>  	# in the exact order specified
>  	if ! $_tl_exact_order; then
> @@ -194,6 +199,7 @@ _tl_expunge_test()
>  	return 1
>  }
>  
> +
>  _tl_setup_exclude_tests()
>  {
>  	local list="$1"
> diff --git a/tests/xfs/271 b/tests/xfs/271
> index 8a71746d6..ae4599282 100755
> --- a/tests/xfs/271
> +++ b/tests/xfs/271
> @@ -22,8 +22,6 @@ _cleanup()
>  _require_xfs_scratch_rmapbt
>  _require_xfs_io_command "fsmap"
>  
> -rm -f "$seqres.full"
> -
>  echo "Format and mount"
>  _scratch_mkfs > "$seqres.full" 2>&1
>  _scratch_mount
> diff --git a/tools/run_test.sh b/tools/run_test.sh
> new file mode 100755
> index 000000000..75f7a7dcd
> --- /dev/null
> +++ b/tools/run_test.sh
> @@ -0,0 +1,116 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2025 Red Hat, Inc.  All Rights Reserved.
> +#
> +# Test run helper for check-parallel
> +#
> +# check-parallel sets up all the devices and environment needed to run tests,
> +# but we can't set up for a test until we are executing in a private pid/mount
> +# namespace.
> +#
> +# This means we cannot simply run the test itself in a private namespace; tests
> +# require things like the test device to already be mounted, and we require a
> +# private mount namespace before we start mounting devices.
> +#
> +# Hence we need a helper that first creates a private namespace, then
> +# does all the setup work for tests to run, then iterates tests until there
> +# are no more tests to run.
> +#
> +# The tests to run are held in a shared file in $basedir so that we can set up
> +# a private /tmp for the namespace and not lose access to the test list.
> +
> +# re-execute in a private namespace as a first step
> +if [ -z "${FSTESTS_ISOL}" ]; then
> +	if [ -z "$1" ] || [ "$1" = "--help" ]; then
> +		echo "Usage: $0 basedir test_list "
> +		exit 1
> +	fi
> +
> +	if [ ! -d "$1" ]; then
> +		echo "invalid basedir ($1) specified"
> +		exit 1
> +	fi
> +
> +	FSTESTS_ISOL=privatens exec "$(dirname "$0")/../src/nsexec" -z -m -p "$0" "$@"
> +	exit $?
So the first time tools/run_test.sh is invoked for a particular runner, FSTESTS_ISOL is unset, so
the script sets it and then execs $0 (i.e. the current script) via nsexec. tools/run_test.sh is
therefore invoked again as part of the exec, but this time FSTESTS_ISOL is set, so the body of this
if condition is skipped and we move on to the rest of the script. Similarly, on any later
invocation FSTESTS_ISOL is still set, so we don't enter the if condition and go straight to the
rest of the script. This is repeated for all the runners. Is that correct?
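A minimal stand-alone sketch of the guard, to check that understanding. It is
only an illustration under stated assumptions: DEMO_ISOL stands in for
FSTESTS_ISOL, plain bash stands in for src/nsexec (so no namespace is actually
created), and the demo script is written to a temporary file.

```shell
# Hypothetical demo of the re-exec guard; not the real tools/run_test.sh.
demo=$(mktemp)
cat > "$demo" <<'EOF'
if [ -z "${DEMO_ISOL}" ]; then
	echo "pass 1: DEMO_ISOL unset, re-executing"
	# The assignment is exported into the environment of the exec'd command.
	DEMO_ISOL=privatens exec bash "$0" "$@"
fi
echo "pass 2: DEMO_ISOL=$DEMO_ISOL args=$*"
EOF

out=$(bash "$demo" basedir test_list)
echo "$out"
rm -f "$demo"
```

The guard body runs exactly once per invocation chain: the exec replaces the
first process, and the second pass falls through to the rest of the script.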
> +fi
> +
> +# Everything past this point runs in a private namespace.
> +#
> +# We set up private mounts for /proc and /tmp so they aren't visible outside
> +# this mount namespace and its children.
> +for path in /proc /tmp; do
> +	mountpoint "$path" >/dev/null && \
> +		mount --make-private "$path"
> +done
> +mount -t proc proc /proc
> +mount -t tmpfs tmpfs /tmp
> +
> +echo $PWD $FSTYP
> +
> +. ./common/exit
> +. ./common/test_names
> +. ./common/test_list
> +. ./common/test_exec
> +. ./common/rc
> +
> +basedir="$1"
> +test_file="$2"
> +_te_diff_length=${DIFF_LENGTH:=10}
> +export tmp=/tmp/run-helper.$$
> +
> +# XXX: should be a _te_diff variable
> +diff='diff -u'
> +
> +# Grab the next test to be run from the tail of the file.
> +# Returns an empty string if there are no tests remaining to run.
> +# File operations are run under flock so concurrent gets are serialised against
> +# each other.
> +get_next_test()
> +{
> +	local test=
> +
> +	flock 99
> +	test=$(tail -1 $test_file)
> +	sed -i "\,$test,d" $test_file
> +	flock -u 99
> +	echo $test
> +}
> +
> +_run_seq()
> +{
> +	./$seq
> +	return $?
> +}
Why can't we use the _run_seq() defined in "check"?
--NR
> +
> +exec 99<>$test_file.lock
> +
> +# XXX - refactor this back to a test_exec variable
> +check="$RESULT_BASE/check"
> +touch $check.time
> +
> +init_rc
> +
> +test_to_run=$(get_next_test)
> +while [ -n "$test_to_run" ]; do
> +	_te_run_test "tests/$test_to_run"
> +
> +	test_to_run=$(get_next_test)
> +done
> +
> +_te_time_report
> +
> +if ((${#_te_try[*]} > 0)); then
> +	echo "Ran: ${_te_try[*]}"
> +fi
> +
> +if ((${#_te_notrun[*]} > 0)); then
> +	echo "Not run: ${_te_notrun[*]}"
> +fi
> +
> +if ((${#_te_bad[*]} > 0)); then
> +	echo "Failures: ${_te_bad[*]}"
> +	echo "Failed ${#_te_bad[*]} of ${#_te_try[*]} tests"
> +else
> +	echo "Passed all ${#_te_try[*]} tests"
> +fi
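The dequeue step in get_next_test() can be sketched on its own as follows.
This is a hedged illustration rather than the patch's code: the test list is a
temporary file with made-up contents, and the removal uses sed '$d' (delete the
last line) instead of the patch's pattern delete, since a pattern delete removes
every line the test name matches as a regex, which can include other entries if
one name is a substring of another.

```shell
# Hypothetical stand-alone demo; file names and contents are made up.
test_file=$(mktemp)
printf '%s\n' generic/001 generic/002 xfs/271 > "$test_file"

# Open the lock file on fd 99, as the helper script does.
exec 99<>"$test_file.lock"

get_next_test()
{
	local next=

	flock 99				# serialise against other runners
	next=$(tail -n 1 "$test_file")
	# Remove only the line we just read, i.e. the last one.
	[ -n "$next" ] && sed -i '$d' "$test_file"
	flock -u 99
	echo "$next"
}

first=$(get_next_test)
second=$(get_next_test)
third=$(get_next_test)
fourth=$(get_next_test)		# list is empty by now
echo "dequeued: $first $second $third [$fourth]"

exec 99>&-
rm -f "$test_file" "$test_file.lock"
```

Each call hands out a different test, and an empty string signals that the
list has been drained, which is what terminates the runner's while loop.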



* [PATCH 21/28] generic/531: limit max files per CPU
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (19 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 20/28] [RFC] check-parallel: run tests directly without using check Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-05-10 13:15   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync() Dave Chinner
                   ` (6 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

Currently g/531 runs t_open_files on every CPU, and with default
kernel settings that means 50,000 files per CPU are tested. On 64p
machines this means the test tries to create and unlink over 3
million files. This takes a long time:

Ten slowest tests - runtime in seconds:
generic/531 534
.....

Yet generic/531 is included in the 'quick' test group. It is
anything but "quick" on large CPU count systems.

Further, small filesystems like those typically used for fstests do
not have the inherent concurrency to scale out this workload
effectively. Even using the mkfs.xfs concurrency options requires
using >250GB scratch devices on 64p machines because it won't make
AGs smaller than 4GB. Hence to get 64-way concurrency in the
filesystem, we need huge devices to be set up, and that's not really
practical for check-parallel.

Hence limit the total number of files this test will create
to a sane number, and distribute them over all the CPUs so that
the test runtime does not blow out on big systems. LOAD_FACTOR can
still be used to increase runtime of the test by increasing the
total number of files created.
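The arithmetic behind the new limit can be sketched as a small shell function.
This is an illustrative sketch only: files_per_process is a hypothetical helper
that is not part of the test, and the online CPU count and file-max are passed
in as arguments rather than read from getconf and /proc/sys/fs/file-max.

```shell
# Hypothetical helper mirroring the calculation in the patched test.
# usage: files_per_process <online_cpus> <file_max> [load_factor]
files_per_process()
{
	local online_cpus="$1" file_max="$2" load_factor="${3:-1}"
	local nr_procs=$((online_cpus * 2))		# two threads per CPU
	local max_files=$((100000 * load_factor))	# total files, all CPUs
	local max_allowable=$((file_max / 2))

	# Clamp the total to half of the system-wide open file limit.
	if [ "$max_allowable" -gt 0 ] && [ "$max_files" -gt "$max_allowable" ]; then
		max_files=$max_allowable
	fi

	# Each process only gets its share of the total.
	echo $((max_files / nr_procs))
}

files_per_process 64 1000000	# 64p, no clamping: prints 781
files_per_process 64 100000	# clamped to 50000 total: prints 390
files_per_process 16 1000000 2	# 16p, LOAD_FACTOR=2: prints 6250
```

The total stays fixed as the CPU count grows, so the per-process limit shrinks
instead of the overall file count growing.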

Limiting the total number of files created brings g/531
back into the "quick" test range on a 64p system:

generic/531        5s

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 tests/generic/531 | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/generic/531 b/tests/generic/531
index 07dffd9fd..3f691c0f8 100755
--- a/tests/generic/531
+++ b/tests/generic/531
@@ -29,13 +29,13 @@ _scratch_mount
 # Try to load up all the CPUs, two threads per CPU.
 nr_cpus=$(( $(getconf _NPROCESSORS_ONLN) * 2 ))
 
-# Set ULIMIT_NOFILE to min(file-max / $nr_cpus / 2, 50000 files per LOAD_FACTOR)
+# Set ULIMIT_NOFILE to min(file-max / 2, 100000 files per LOAD_FACTOR) / $nr_cpus
 # so that this test doesn't take forever or OOM the box
-max_files=$((50000 * LOAD_FACTOR))
-max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
+max_files=$((100000 * LOAD_FACTOR))
+max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / 2 ))
 test $max_allowable_files -gt 0 && test $max_files -gt $max_allowable_files && \
 	max_files=$max_allowable_files
-ulimit -n $max_files
+ulimit -n $((max_files / nr_cpus))
 
 # Open a lot of unlinked files
 echo create >> $seqres.full
-- 
2.45.2



* Re: [PATCH 21/28] generic/531: limit max files per CPU
  2025-04-17  3:01 ` [PATCH 21/28] generic/531: limit max files per CPU Dave Chinner
@ 2025-05-10 13:15   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-10 13:15 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Currently g/531 runs t_open_files on every CPU, and with default
> kernel settings that means 50,000 files per CPU are tested. On 64p
> machines this means the test tries to create and unlink over 3
> million files. This takes a long time:
> 
> Ten slowest tests - runtime in seconds:
> generic/531 534
> .....
> 
> Yet generic/531 is included in the 'quick' test group. It is
> anything but "quick" on large CPU count systems.
> 
> Further, small filesystems like those typically used for fstests do
> not have the inherent concurrency to scale out this workload
> effectively. Even using the mkfs.xfs concurrency options requires
> using >250GB scratch devices on 64p machines because it won't make
> AGs smaller than 4GB. Hence to get 64-way concurrency in the
> filesystem, we need huge devices to be set up, and that's not really
> practical for check-parallel.
> 
> Hence limit the total number of files this test will create
> to a sane number, and distribute them over all the CPUs so that
> the test runtime does not blow out on big systems. LOAD_FACTOR can
> still be used to increase runtime of the test by increasing the
> total number of files created.
> 
> Limiting the total number of files created brings g/531
> back into the "quick" test range on a 64p system:
> 
> generic/531        5s
Right, on my system with 16p, I too noticed the reduced test time. 
generic/531 24s ...  199s (24s with this change, 199s on for-next)
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
--NR
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  tests/generic/531 | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/tests/generic/531 b/tests/generic/531
> index 07dffd9fd..3f691c0f8 100755
> --- a/tests/generic/531
> +++ b/tests/generic/531
> @@ -29,13 +29,13 @@ _scratch_mount
>  # Try to load up all the CPUs, two threads per CPU.
>  nr_cpus=$(( $(getconf _NPROCESSORS_ONLN) * 2 ))
>  
> -# Set ULIMIT_NOFILE to min(file-max / $nr_cpus / 2, 50000 files per LOAD_FACTOR)
> +# Set ULIMIT_NOFILE to min(file-max / 2, 100000) / $nr_cpus files per LOAD_FACTOR)
>  # so that this test doesn't take forever or OOM the box
> -max_files=$((50000 * LOAD_FACTOR))
> -max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / $nr_cpus / 2 ))
> +max_files=$((100000 * LOAD_FACTOR))
> +max_allowable_files=$(( $(cat /proc/sys/fs/file-max) / 2 ))
>  test $max_allowable_files -gt 0 && test $max_files -gt $max_allowable_files && \
>  	max_files=$max_allowable_files
> -ulimit -n $max_files
> +ulimit -n $((max_files / nr_cpus))
>  
>  # Open a lot of unlinked files
>  echo create >> $seqres.full



* [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync()
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (20 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 21/28] generic/531: limit max files per CPU Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  9:08   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 23/28] open-by-handle.c: " Dave Chinner
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

generic/311 runs each fsync-tester unit test 4 times, and there are
20 separate tests. At least 9 of those unit tests run sync() to
flush dirty data periodically.

When running check-parallel, sync() can take a -long- time to
run as there can be dozens of filesystems that need to be synced,
not to mention sync getting hung up behind all the mount and
unmounts that are also being run. This results in:

Ten slowest tests - runtime in seconds:
generic/311 419

This test running for a really long time.

Convert the sync() calls to syncfs() so that they only try to sync
the filesystem under test and not the entire system. This avoids
interactions and delays with other tests and mount/unmount
operations, hence allowing both the test and the overall
check-parallel operation to run faster:

generic/311        166s
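
For shell-level callers, an analogous scoping is available through the
coreutils sync utility's -f (--file-system) option, which syncs only the
filesystems containing the named paths. A minimal sketch, assuming a temporary
directory stands in for the mount point of the filesystem under test:

```shell
# Hypothetical demo: $mnt stands in for a test filesystem mount point.
mnt=$(mktemp -d)
echo data > "$mnt/file"

# Flush only the filesystem containing $mnt, not every mounted filesystem.
sync -f "$mnt"
rc=$?
echo "sync -f exited with $rc"

rm -rf "$mnt"
```

A plain "sync" with no arguments has the same system-wide cost as the sync()
calls this patch removes.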

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 src/fsync-tester.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/fsync-tester.c b/src/fsync-tester.c
index 417187491..048cb4853 100644
--- a/src/fsync-tester.c
+++ b/src/fsync-tester.c
@@ -131,7 +131,7 @@ static int test_three(int *max_blocks, int prealloc, int rand_fsync,
 		/* Force a transaction commit in between just for fun */
 		if (blocks == sync_block && (do_sync || drop_caches)) {
 			if (do_sync)
-				sync();
+				syncfs(test_fd);
 			else
 				sync_file_range(test_fd, 0, 0,
 						SYNC_FILE_RANGE_WRITE|
-- 
2.45.2



* Re: [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync()
  2025-04-17  3:01 ` [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync() Dave Chinner
@ 2025-04-30  9:08   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  9:08 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/311 runs each fsync-tester unit test 4 times, and there are
> 20 separate tests. At least 9 of those unit tests run sync() to
> flush dirty data periodically.
> 
> When running check-parallel, sync() can take a -long- time to
> run as there can be dozens of filesystems that need to be synced,
> not to mention sync getting hung up behind all the mount and
> unmounts that are also being run. This results in:
> 
> Ten slowest tests - runtime in seconds:
> generic/311 419
> 
> This test running for a really long time.
> 
> Convert the sync() calls to syncfs() so that they only try to sync
> the filesystem under test and not the entire system. This avoids
> interactions and delays with other tests and mount/unmount
> operations, hence allowing both the test and the overall
> check-parallel operation to run faster:
> 
> generic/311        166s
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  src/fsync-tester.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/fsync-tester.c b/src/fsync-tester.c
> index 417187491..048cb4853 100644
> --- a/src/fsync-tester.c
> +++ b/src/fsync-tester.c
> @@ -131,7 +131,7 @@ static int test_three(int *max_blocks, int prealloc, int rand_fsync,
>  		/* Force a transaction commit in between just for fun */
>  		if (blocks == sync_block && (do_sync || drop_caches)) {
>  			if (do_sync)
> -				sync();
> +				syncfs(test_fd);
Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
>  			else
>  				sync_file_range(test_fd, 0, 0,
>  						SYNC_FILE_RANGE_WRITE|



* [PATCH 23/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (21 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync() Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  9:02   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 24/28] " Dave Chinner
                   ` (4 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

generic/467 runs open_by_handle at least 15 times. Each
execution runs sync() at least once, sometimes three times.

When running check-parallel, sync() can take a -long- time to
run as there can be dozens of filesystems that need to be synced,
not to mention sync getting hung up behind all the mount and
unmounts that are also being run. This results in:

Ten slowest tests - runtime in seconds:
generic/467 442
.....

This test running for a really long time.

Convert the sync() calls to syncfs() so that they only try to sync
the filesystem under test and not the entire system. This avoids
interactions and delays with other tests and mount/unmount
operations, hence allowing both the test and the overall
check-parallel operation to run faster:

generic/467        6s

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 src/open_by_handle.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/open_by_handle.c b/src/open_by_handle.c
index a99cce4b3..816bb5d12 100644
--- a/src/open_by_handle.c
+++ b/src/open_by_handle.c
@@ -430,7 +430,7 @@ int main(int argc, char **argv)
 	}
 
 	/* sync to get the new inodes to hit the disk */
-	sync();
+	syncfs(mount_fd);
 
 	/*
 	 * encode the file handles or read them from file (-i) and maybe store
@@ -563,10 +563,10 @@ int main(int argc, char **argv)
 	}
 
 	/* sync to get log forced for unlink transactions to hit the disk */
-	sync();
+	syncfs(mount_fd);
 
 	/* sync once more FTW */
-	sync();
+	syncfs(mount_fd);
 
 	/*
 	 * now drop the caches so that unlinked inodes are reclaimed and
-- 
2.45.2



* Re: [PATCH 23/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-17  3:01 ` [PATCH 23/28] open-by-handle.c: " Dave Chinner
@ 2025-04-30  9:02   ` Nirjhar Roy (IBM)
  2025-05-21  2:32     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  9:02 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> generic/467 runs open_by_handle at least 15 times. Each
> execution runs sync() at least once, sometimes three times.
> 
> When running check-parallel, sync() can take a -long- time to
> run as there can be dozens of filesystems that need to be synced,
> not to mention sync getting hung up behind all the mount and
> unmounts that are also being run. This results in:
> 
> Ten slowest tests - runtime in seconds:
> generic/467 442
> .....
> 
> This test running for a really long time.
> 
> Convert the sync() calls to syncfs() so that they only try to sync
> the filesystem under test and not the entire system. This avoids
> interactions and delays with other tests and mount/unmount
> operations, hence allowing both the test and the overall
> check-parallel operation to run faster:
> 
> generic/467        6s
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  src/open_by_handle.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/open_by_handle.c b/src/open_by_handle.c
> index a99cce4b3..816bb5d12 100644
> --- a/src/open_by_handle.c
> +++ b/src/open_by_handle.c
> @@ -430,7 +430,7 @@ int main(int argc, char **argv)
>  	}
>  
>  	/* sync to get the new inodes to hit the disk */
> -	sync();
> +	syncfs(mount_fd);
>  
>  	/*
>  	 * encode the file handles or read them from file (-i) and maybe store
> @@ -563,10 +563,10 @@ int main(int argc, char **argv)
>  	}
>  
>  	/* sync to get log forced for unlink transactions to hit the disk */
> -	sync();
> +	syncfs(mount_fd);
>  
>  	/* sync once more FTW */
> -	sync();
> +	syncfs(mount_fd);
I don't see mount_fd being closed at the end. Maybe all the file
descriptors are closed after the program finishes execution?
Other than this:
Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

>  
>  	/*
>  	 * now drop the caches so that unlinked inodes are reclaimed and



* Re: [PATCH 23/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-30  9:02   ` Nirjhar Roy (IBM)
@ 2025-05-21  2:32     ` Dave Chinner
  2025-05-26  5:11       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  2:32 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, Apr 30, 2025 at 02:32:33PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > generic/467 runs open_by_handle at least 15 times. Each
> > execution runs sync() at least once, sometimes three times.
> > 
> > When running check-parallel, sync() can take a -long- time to
> > run as there can be dozens of filesystems that need to be synced,
> > not to mention sync getting hung up behind all the mount and
> > unmounts that are also being run. This results in:
> > 
> > Ten slowest tests - runtime in seconds:
> > generic/467 442
> > .....
> > 
> > This test running for a really long time.
> > 
> > Convert the sync() calls to syncfs() so that they only try to sync
> > the filesystem under test and not the entire system. This avoids
> > interactions and delays with other tests and mount/unmount
> > operations, hence allowing both the test and the overall
> > check-parallel operation to run faster:
> > 
> > generic/467        6s
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  src/open_by_handle.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/src/open_by_handle.c b/src/open_by_handle.c
> > index a99cce4b3..816bb5d12 100644
> > --- a/src/open_by_handle.c
> > +++ b/src/open_by_handle.c
> > @@ -430,7 +430,7 @@ int main(int argc, char **argv)
> >  	}
> >  
> >  	/* sync to get the new inodes to hit the disk */
> > -	sync();
> > +	syncfs(mount_fd);
> >  
> >  	/*
> >  	 * encode the file handles or read them from file (-i) and maybe store
> > @@ -563,10 +563,10 @@ int main(int argc, char **argv)
> >  	}
> >  
> >  	/* sync to get log forced for unlink transactions to hit the disk */
> > -	sync();
> > +	syncfs(mount_fd);
> >  
> >  	/* sync once more FTW */
> > -	sync();
> > +	syncfs(mount_fd);
> I don't see mount_fd being closed at the end. Maybe all the file
> descriptors are closed after the program finishes execution?

Yes, they are closed on exit. Again, simple test code that isn't
running out of open files, so there's not a lot of reason for this
patch series to care about such existing issues in the code right
now.

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 23/28] open-by-handle.c: use syncfs() rather than sync()
  2025-05-21  2:32     ` Dave Chinner
@ 2025-05-26  5:11       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  5:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 08:02, Dave Chinner wrote:
> On Wed, Apr 30, 2025 at 02:32:33PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> generic/467 runs open_by_handle at least 15 times. Each
>>> execution runs sync() at least once, sometimes three times.
>>>
>>> When running check-parallel, sync() can take a -long- time to
>>> run as there can be dozens of filesystems that need to be synced,
>>> not to mention sync getting hung up behind all the mount and
>>> unmounts that are also being run. This results in:
>>>
>>> Ten slowest tests - runtime in seconds:
>>> generic/467 442
>>> .....
>>>
>>> This test running for a really long time.
>>>
>>> Convert the sync() calls to syncfs() so that they only try to sync
>>> the filesystem under test and not the entire system. This avoids
>>> interactions and delays with other tests and mount/unmount
>>> operations, hence allowing both the test and the overall
>>> check-parallel operation to run faster:
>>>
>>> generic/467        6s
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>   src/open_by_handle.c | 6 +++---
>>>   1 file changed, 3 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/open_by_handle.c b/src/open_by_handle.c
>>> index a99cce4b3..816bb5d12 100644
>>> --- a/src/open_by_handle.c
>>> +++ b/src/open_by_handle.c
>>> @@ -430,7 +430,7 @@ int main(int argc, char **argv)
>>>   	}
>>>   
>>>   	/* sync to get the new inodes to hit the disk */
>>> -	sync();
>>> +	syncfs(mount_fd);
>>>   
>>>   	/*
>>>   	 * encode the file handles or read them from file (-i) and maybe store
>>> @@ -563,10 +563,10 @@ int main(int argc, char **argv)
>>>   	}
>>>   
>>>   	/* sync to get log forced for unlink transactions to hit the disk */
>>> -	sync();
>>> +	syncfs(mount_fd);
>>>   
>>>   	/* sync once more FTW */
>>> -	sync();
>>> +	syncfs(mount_fd);
>> I don't see mount_fd being closed at the end. Maybe all the file
>> descriptors are closed after the program finishes execution?
> Yes, they are closed on exit. Again, simple test code that isn't
> running out of open files, so there's not a lot of reason for this
> patch series to care about such existing issues in the code right
> now.

Yeah, makes sense.

--NR

>
> -Dave.

-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 24/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (22 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 23/28] open-by-handle.c: " Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  8:56   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code Dave Chinner
                   ` (3 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

xfs/183 runs bulkstat_unlink_test to create 100 inodes, unlink them
and bulkstat them. It takes a ridiculously long time to run under
check-parallel because it runs sync() multiple times per iteration:

Ten slowest tests - runtime in seconds:
.....
xfs/183 328

When running check-parallel, sync() can take a -long- time to
run as there can be dozens of filesystems that need to be synced,
not to mention sync getting hung up behind all the mount and
unmounts that are also being run.

Convert the sync() calls to syncfs() so that they only try to sync
the filesystem under test and not the entire system. This avoids
interactions and delays with other tests and mount/unmount
operations, hence allowing both the test and the overall
check-parallel operation to run faster:

xfs/183        4s

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 src/bulkstat_unlink_test.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/src/bulkstat_unlink_test.c b/src/bulkstat_unlink_test.c
index d78cc2ac2..62e5bb978 100644
--- a/src/bulkstat_unlink_test.c
+++ b/src/bulkstat_unlink_test.c
@@ -88,7 +88,7 @@ main(int argc, char *argv[])
 		}
 
 		if (chknb) { /* Get the original number of inodes (lazy) */
-			sync();
+			syncfs(fd[nfiles]);
 			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
 				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
 			}
@@ -118,7 +118,7 @@ main(int argc, char *argv[])
 			 *The files are still opened (but unlink()ed) ,
 			 * we should have more inodes than before
 			 */
-			sync();
+			syncfs(fd[nfiles]);
 			last_inode = 0;
 			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
 				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
@@ -139,7 +139,7 @@ main(int argc, char *argv[])
 			 * The files are now closed, we should be back to our,
 			 * previous inode count
 			 */
-			sync();
+			syncfs(fd[nfiles]);
 			last_inode = 0;
 			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
 				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
@@ -150,7 +150,7 @@ main(int argc, char *argv[])
 			}
 		}
 
-		sync();
+		syncfs(fd[nfiles]);
 		last_inode = 0;
 		for (;;) {
 			if ((e = xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a)) < 0) {
@@ -173,11 +173,11 @@ main(int argc, char *argv[])
 			}
 		}
 
-		close(fd[nfiles]);
 		sprintf(fname, "rm -rf %s\n", dirname);
 		system(fname);
 
-		sync();
+		syncfs(fd[nfiles]);
+		close(fd[nfiles]);
 		sleep(2);
 		printf("passed\n");
 	}
-- 
2.45.2



* Re: [PATCH 24/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-17  3:01 ` [PATCH 24/28] " Dave Chinner
@ 2025-04-30  8:56   ` Nirjhar Roy (IBM)
  2025-05-21  2:30     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  8:56 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs/183 runs bulkstat_unlink_test to create 100 inodes, unlink on
> and bulkstat them. It takes a ridiculously long time to run under
> check-parallel because it runs sync() multiple times per iteration:
> 
> Ten slowest tests - runtime in seconds:
> .....
> xfs/183 328
> 
> When running check-parallel, sync() can take a -long- time to
> run as there can be dozens of filesystems that need to be synced,
> not to mention sync getting hung up behind all the mount and
> unmounts that are also being run.
> 
> Convert the sync() calls to syncfs() so that they only try to sync
> the filesystem under test and not the entire system. This avoids
> interactions and delays with other tests and mount/unmount
> operations, hence allowing both the test and the overall
> check-parallel operation to run faster:
> 
> xfs/183        4s
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  src/bulkstat_unlink_test.c | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/src/bulkstat_unlink_test.c b/src/bulkstat_unlink_test.c
> index d78cc2ac2..62e5bb978 100644
> --- a/src/bulkstat_unlink_test.c
> +++ b/src/bulkstat_unlink_test.c
> @@ -88,7 +88,7 @@ main(int argc, char *argv[])
>  		}
>  
>  		if (chknb) { /* Get the original number of inodes (lazy) */
> -			sync();
> +			syncfs(fd[nfiles]);
This looks good to me. 
Minor: Is it safe to use int fd[nfiles + 1]; Isn't it better to use
malloc()?
Other than this:
Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
>  			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
>  				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
>  			}
> @@ -118,7 +118,7 @@ main(int argc, char *argv[])
>  			 *The files are still opened (but unlink()ed) ,
>  			 * we should have more inodes than before
>  			 */
> -			sync();
> +			syncfs(fd[nfiles]);
>  			last_inode = 0;
>  			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
>  				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
> @@ -139,7 +139,7 @@ main(int argc, char *argv[])
>  			 * The files are now closed, we should be back to our,
>  			 * previous inode count
>  			 */
> -			sync();
> +			syncfs(fd[nfiles]);
>  			last_inode = 0;
>  			if (xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a) != 0) {
>  				printf("Warning (%s:%d), xfsctl(XFS_IOC_FSBULKSTAT) FAILED.\n", __FILE__, __LINE__);
> @@ -150,7 +150,7 @@ main(int argc, char *argv[])
>  			}
>  		}
>  
> -		sync();
> +		syncfs(fd[nfiles]);
>  		last_inode = 0;
>  		for (;;) {
>  			if ((e = xfsctl(dirname, fd[nfiles], XFS_IOC_FSBULKSTAT, &a)) < 0) {
> @@ -173,11 +173,11 @@ main(int argc, char *argv[])
>  			}
>  		}
>  
> -		close(fd[nfiles]);
>  		sprintf(fname, "rm -rf %s\n", dirname);
>  		system(fname);
>  
> -		sync();
> +		syncfs(fd[nfiles]);
> +		close(fd[nfiles]);
>  		sleep(2);
>  		printf("passed\n");
>  	}



* Re: [PATCH 24/28] open-by-handle.c: use syncfs() rather than sync()
  2025-04-30  8:56   ` Nirjhar Roy (IBM)
@ 2025-05-21  2:30     ` Dave Chinner
  2025-05-26  4:56       ` Nirjhar Roy (IBM)
  0 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  2:30 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, Apr 30, 2025 at 02:26:28PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > xfs/183 runs bulkstat_unlink_test to create 100 inodes, unlink on
> > and bulkstat them. It takes a ridiculously long time to run under
> > check-parallel because it runs sync() multiple times per iteration:
> > 
> > Ten slowest tests - runtime in seconds:
> > .....
> > xfs/183 328
> > 
> > When running check-parallel, sync() can take a -long- time to
> > run as there can be dozens of filesystems that need to be synced,
> > not to mention sync getting hung up behind all the mount and
> > unmounts that are also being run.
> > 
> > Convert the sync() calls to syncfs() so that they only try to sync
> > the filesystem under test and not the entire system. This avoids
> > interactions and delays with other tests and mount/unmount
> > operations, hence allowing both the test and the overall
> > check-parallel operation to run faster:
> > 
> > xfs/183        4s
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  src/bulkstat_unlink_test.c | 12 ++++++------
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/src/bulkstat_unlink_test.c b/src/bulkstat_unlink_test.c
> > index d78cc2ac2..62e5bb978 100644
> > --- a/src/bulkstat_unlink_test.c
> > +++ b/src/bulkstat_unlink_test.c
> > @@ -88,7 +88,7 @@ main(int argc, char *argv[])
> >  		}
> >  
> >  		if (chknb) { /* Get the original number of inodes
> > (lazy) */
> > -			sync();
> > +			syncfs(fd[nfiles]);
> This looks good to me. 
> Minor: Is it safe to use int fd[nfiles + 1];

Yes, it is. Variable size array declarations like this have
been supported by C and C compilers for a long time.
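
For reference, a minimal sketch of the kind of declaration being discussed.
Variable-length arrays were standardised in C99. The helper name
fill_fd_vla() is hypothetical and not part of the test program; it only
shows that the array is sized from a run-time value and that sizeof is
evaluated at run time for such an array.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from bulkstat_unlink_test.c.
 * Declares an array of nfiles + 1 ints on the stack, marks every slot as
 * "no descriptor" and returns the number of bytes the array occupies.
 */
static size_t fill_fd_vla(int nfiles)
{
	assert(nfiles >= 0);

	int fd[nfiles + 1];	/* variable-length array, sized at run time */

	for (int i = 0; i <= nfiles; i++)
		fd[i] = -1;

	assert(fd[nfiles] == -1);
	return sizeof(fd);	/* computed at run time for a VLA */
}
```

With nfiles = 100 on a system with 4-byte ints, this is about 404 bytes of
stack, which is small for a userspace test program.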

> Isn't it better to use
> malloc()?

See my previous comments about stack usage in userspace test code. :)

-Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 24/28] open-by-handle.c: use syncfs() rather than sync()
  2025-05-21  2:30     ` Dave Chinner
@ 2025-05-26  4:56       ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-05-26  4:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: fstests, zlang


On 5/21/25 08:00, Dave Chinner wrote:
> On Wed, Apr 30, 2025 at 02:26:28PM +0530, Nirjhar Roy (IBM) wrote:
>> On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
>>> From: Dave Chinner <dchinner@redhat.com>
>>>
>>> xfs/183 runs bulkstat_unlink_test to create 100 inodes, unlink on
>>> and bulkstat them. It takes a ridiculously long time to run under
>>> check-parallel because it runs sync() multiple times per iteration:
>>>
>>> Ten slowest tests - runtime in seconds:
>>> .....
>>> xfs/183 328
>>>
>>> When running check-parallel, sync() can take a -long- time to
>>> run as there can be dozens of filesystems that need to be synced,
>>> not to mention sync getting hung up behind all the mount and
>>> unmounts that are also being run.
>>>
>>> Convert the sync() calls to syncfs() so that they only try to sync
>>> the filesystem under test and not the entire system. This avoids
>>> interactions and delays with other tests and mount/unmount
>>> operations, hence allowing both the test and the overall
>>> check-parallel operation to run faster:
>>>
>>> xfs/183        4s
>>>
>>> Signed-off-by: Dave Chinner <dchinner@redhat.com>
>>> ---
>>>   src/bulkstat_unlink_test.c | 12 ++++++------
>>>   1 file changed, 6 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/src/bulkstat_unlink_test.c b/src/bulkstat_unlink_test.c
>>> index d78cc2ac2..62e5bb978 100644
>>> --- a/src/bulkstat_unlink_test.c
>>> +++ b/src/bulkstat_unlink_test.c
>>> @@ -88,7 +88,7 @@ main(int argc, char *argv[])
>>>   		}
>>>   
>>>   		if (chknb) { /* Get the original number of inodes
>>> (lazy) */
>>> -			sync();
>>> +			syncfs(fd[nfiles]);
>> This looks good to me.
>> Minor: Is it safe to use int fd[nfiles + 1];
> Yes, it is. Variable size array declarations like this have
> been supported by C and C compilers for a long time.
>
>> Isn't it better to use
>> malloc()?
> See my previous comments about stack usage in userspace test code. :)

Yes. I mostly meant in terms of stack overflow issues if "nfiles" is
large, but I checked and the userspace stack size is around 8M, so that
should be fine. Thanks.
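
One way to check the limit rather than assume it is to ask the kernel for
RLIMIT_STACK. This is only a sketch: fits_in_stack_limit() is a hypothetical
helper, and the comparison is approximate because it ignores the stack that
is already in use by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: return 1 if an object of 'bytes' bytes is smaller
 * than the soft stack limit of the calling process, 0 otherwise (including
 * when the limit cannot be read).
 */
static int fits_in_stack_limit(size_t bytes)
{
	struct rlimit rl;

	assert(bytes > 0);

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return 0;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;	/* "ulimit -s unlimited" */
	return bytes < rl.rlim_cur;
}
```

For example, fits_in_stack_limit((nfiles + 1) * sizeof(int)) would tell you
whether the fd array for a given nfiles is anywhere near the limit.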

--NR

>
> -Dave.
>
-- 
Nirjhar Roy
Linux Kernel Developer
IBM, Bangalore



* [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (23 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 24/28] " Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  8:47   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 26/28] stale-handle.c: use syncfs() rather than sync() Dave Chinner
                   ` (2 subsequent siblings)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

The built binary is not used by any test. Remove the dead code.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 .gitignore                          |   1 -
 src/Makefile                        |   2 +-
 src/bulkstat_unlink_test_modified.c | 193 ----------------------------
 3 files changed, 1 insertion(+), 195 deletions(-)
 delete mode 100644 src/bulkstat_unlink_test_modified.c

diff --git a/.gitignore b/.gitignore
index 4fd817243..feb011c46 100644
--- a/.gitignore
+++ b/.gitignore
@@ -65,7 +65,6 @@ tags
 /src/btrfs_encoded_write
 /src/bulkstat_null_ocount
 /src/bulkstat_unlink_test
-/src/bulkstat_unlink_test_modified
 /src/checkpoint_journal
 /src/chprojid_fail
 /src/cloner
diff --git a/src/Makefile b/src/Makefile
index 6ac72b366..6a31ceb01 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -26,7 +26,7 @@ TARGETS = dirstress fill fill2 getpagesize holes lstat64 \
 LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest \
 	locktest unwritten_mmap bulkstat_unlink_test deduperace \
-	bulkstat_unlink_test_modified t_dir_offset t_futimens t_immutable \
+	t_dir_offset t_futimens t_immutable \
 	stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test \
 	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec cloner \
 	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
diff --git a/src/bulkstat_unlink_test_modified.c b/src/bulkstat_unlink_test_modified.c
deleted file mode 100644
index a106749dc..000000000
--- a/src/bulkstat_unlink_test_modified.c
+++ /dev/null
@@ -1,193 +0,0 @@
-/*
- * $Id: bulkstat_unlink_test_modified.c,v 1.1 2007/10/03 16:23:57 mohamedb.longdrop.melbourne.sgi.com Exp $
- * Test bulkstat doesn't returned unlinked inodes.
- * Mark Goodwin <markgw@sgi.com> Fri Jul 20 09:13:57 EST 2007
- *
- * This is a modified version of bulkstat_unlink_test.c to reproduce a specific
- * problem see pv 969192
- */
-#include <stdlib.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <fcntl.h>
-#include <xfs/xfs.h>
-#include <unistd.h>
-#include <string.h>
-
-int
-main(int argc, char *argv[])
-{
-    int e;
-    int fd = 0;
-    int i;
-    int j;
-    int k;
-    int nfiles;
-    int stride;
-    struct stat sbuf;
-    ino_t *inodelist;
-    __u32 *genlist;
-    struct xfs_fsop_bulkreq a;
-    struct xfs_bstat *ret;
-    int iterations;
-    char fname[MAXPATHLEN];
-    char *dirname;
-
-    if (argc != 5) {
-    	fprintf(stderr, "Usage: %s iterations nfiles stride dir\n", argv[0]);
-    	fprintf(stderr, "Create dir with nfiles, unlink each stride'th file, sync, bulkstat\n");
-	exit(1);
-    }
-
-    iterations = atoi(argv[1]);
-    nfiles = atoi(argv[2]);
-    stride = atoi(argv[3]);
-    dirname = argv[4];
-    if (!nfiles || !iterations) {
-	fprintf(stderr, "Iterations and nfiles showld be non zero.\n");
-    	exit(1);
-    }
-
-    inodelist = (ino_t *)malloc(nfiles * sizeof(ino_t));
-    genlist = (__u32 *)malloc(nfiles * sizeof(__u32));
-    ret = (struct xfs_bstat *)malloc(nfiles * sizeof(struct xfs_bstat));
-
-    for (k=0; k < iterations; k++) {
-	xfs_ino_t last_inode = 0;
-	int count = 0;
-	int testFiles = 0;
-
-	printf("Iteration %d ... \n", k);
-
-	memset(inodelist, 0, nfiles * sizeof(ino_t));
-	memset(genlist, 0, nfiles * sizeof(__u32));
-	memset(ret, 0, nfiles * sizeof(struct xfs_bstat));
-	memset(&a, 0, sizeof(struct xfs_fsop_bulkreq));
-	a.lastip = (__u64 *)&last_inode;
-	a.icount = nfiles;
-	a.ubuffer = ret;
-	a.ocount = &count;
-
-	if (mkdir(dirname, 0755) < 0) {
-	    perror(dirname);
-	    exit(1);
-	}
-
-	/* create nfiles and store their inode numbers in inodelist */
-	for (i=0; i < nfiles; i++) {
-	    sprintf(fname, "%s/file%06d", dirname, i);
-	    if ((fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0) {
-		perror(fname);
-		exit(1);
-	    }
-	    write(fd, fname, sizeof(fname));
-	    if (fstat(fd, &sbuf) < 0) {
-		perror(fname);
-		exit(1);
-	    }
-	    inodelist[i] = sbuf.st_ino;
-	    close(fd);
-	}
-	
-	sync();
-	
-	/* collect bs_gen for the nfiles files */
-	if ((fd = open(dirname, O_RDONLY)) < 0) {
-	    perror(dirname);
-	    exit(1);
-	}
-
-	testFiles = 0;
-	for (;;) {
-	    if ((e = xfsctl(dirname, fd, XFS_IOC_FSBULKSTAT, &a)) < 0) {
-		perror("XFS_IOC_FSBULKSTAT1:");
-		exit(1);
-	    }
-
-	    if (count == 0)
-		break;
-
-	    for (i=0; i < count; i++) {
-		for (j=0; j < nfiles; j += stride) {
-		    if (ret[i].bs_ino == inodelist[j]) {
-			genlist[j] = ret[i].bs_gen;
-			testFiles++;
-		    }
-		}
-	    }
-	}
-	close(fd);
-	
-	printf("testFiles %d ... \n", testFiles);
-
-	/* remove some of the first set of files */
-	for (i=0; i < nfiles; i += stride) {
-	    sprintf(fname, "%s/file%06d", dirname, i);
-	    if (unlink(fname) < 0) {
-	    	perror(fname);
-		exit(1);
-	    }
-	}
-
-	/* create a new set of files (replacing the unlinked ones) */
-	for (i=0; i < nfiles; i += stride) {
-	    sprintf(fname, "%s/file%06d", dirname, i);
-	    if ((fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0644)) < 0) {
-		perror(fname);
-		exit(1);
-	    }
-	    write(fd, fname, sizeof(fname));
-	    close(fd);
-	}
-
-	sync();
-	last_inode = 0; count = 0;
-
-	if ((fd = open(dirname, O_RDONLY)) < 0) {
-	    perror(dirname);
-	    exit(1);
-	}
-
-	for (;;) {
-	    if ((e = xfsctl(dirname, fd, XFS_IOC_FSBULKSTAT, &a)) < 0) {
-		perror("XFS_IOC_FSBULKSTAT:");
-		exit(1);
-	    }
-
-	    if (count == 0)
-		    break;
-
-	    for (i=0; i < count; i++) {
-		for (j=0; j < nfiles; j += stride) {
-		    if ((ret[i].bs_ino == inodelist[j]) &&
-			(ret[i].bs_gen == genlist[j])) {
-			/* oops, the same inode with old gen number */
-			printf("Unlinked inode %llu with generation %d "
-			       "returned by bulkstat\n",
-				(unsigned long long)inodelist[j],
-				 genlist[j]);
-			exit(1);
-		    }
-		    if (ret[i].bs_ino == inodelist[j] &&
-			ret[i].bs_gen != genlist[j] + 1) {
-			/* oops, the new gen number is not 1 bigger than the old */
-			printf("Inode with old generation %d, new generation %d\n",
-			genlist[j], ret[i].bs_gen);
-			exit(1);
-		    }
-		}
-	    }
-	}
-
-	close(fd);
-
-	sprintf(fname, "rm -rf %s\n", dirname);
-	system(fname);
-
-	sync();
-	sleep(2);
-	printf("passed\n");
-    }
-
-    exit(0);
-}
-- 
2.45.2



* Re: [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code
  2025-04-17  3:01 ` [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code Dave Chinner
@ 2025-04-30  8:47   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  8:47 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> The built binary is not used by any test. Remove the dead code.
Yes.
Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  .gitignore                          |   1 -
>  src/Makefile                        |   2 +-
>  src/bulkstat_unlink_test_modified.c | 193 --------------------------
> --
>  3 files changed, 1 insertion(+), 195 deletions(-)
>  delete mode 100644 src/bulkstat_unlink_test_modified.c
> 
> diff --git a/.gitignore b/.gitignore
> index 4fd817243..feb011c46 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -65,7 +65,6 @@ tags
>  /src/btrfs_encoded_write
>  /src/bulkstat_null_ocount
>  /src/bulkstat_unlink_test
> -/src/bulkstat_unlink_test_modified
>  /src/checkpoint_journal
>  /src/chprojid_fail
>  /src/cloner
> diff --git a/src/Makefile b/src/Makefile
> index 6ac72b366..6a31ceb01 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -26,7 +26,7 @@ TARGETS = dirstress fill fill2 getpagesize holes
> lstat64 \
>  LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize
> preallo_rw_pattern_reader \
>  	preallo_rw_pattern_writer ftrunc trunc fs_perms testx looptest
> \
>  	locktest unwritten_mmap bulkstat_unlink_test deduperace \
> -	bulkstat_unlink_test_modified t_dir_offset t_futimens
> t_immutable \
> +	t_dir_offset t_futimens t_immutable \
>  	stale_handle pwrite_mmap_blocked t_dir_offset2 seek_sanity_test
> \
>  	seek_copy_test t_readdir_1 t_readdir_2 fsync-tester nsexec
> cloner \
>  	renameat2 t_getcwd e4compact test-nextquota punch-alternating \
> diff --git a/src/bulkstat_unlink_test_modified.c
> b/src/bulkstat_unlink_test_modified.c
> deleted file mode 100644
> index a106749dc..000000000
> --- a/src/bulkstat_unlink_test_modified.c
> +++ /dev/null
> @@ -1,193 +0,0 @@
> -/*
> - * $Id: bulkstat_unlink_test_modified.c,v 1.1 2007/10/03 16:23:57
> mohamedb.longdrop.melbourne.sgi.com Exp $
> - * Test bulkstat doesn't returned unlinked inodes.
> - * Mark Goodwin <markgw@sgi.com> Fri Jul 20 09:13:57 EST 2007
> - *
> - * This is a modified version of bulkstat_unlink_test.c to reproduce
> a specific
> - * problem see pv 969192
> - */
> -#include <stdlib.h>
> -#include <sys/types.h>
> -#include <sys/stat.h>
> -#include <fcntl.h>
> -#include <xfs/xfs.h>
> -#include <unistd.h>
> -#include <string.h>
> -
> -int
> -main(int argc, char *argv[])
> -{
> -    int e;
> -    int fd = 0;
> -    int i;
> -    int j;
> -    int k;
> -    int nfiles;
> -    int stride;
> -    struct stat sbuf;
> -    ino_t *inodelist;
> -    __u32 *genlist;
> -    struct xfs_fsop_bulkreq a;
> -    struct xfs_bstat *ret;
> -    int iterations;
> -    char fname[MAXPATHLEN];
> -    char *dirname;
> -
> -    if (argc != 5) {
> -    	fprintf(stderr, "Usage: %s iterations nfiles stride dir\n",
> argv[0]);
> -    	fprintf(stderr, "Create dir with nfiles, unlink each stride'th
> file, sync, bulkstat\n");
> -	exit(1);
> -    }
> -
> -    iterations = atoi(argv[1]);
> -    nfiles = atoi(argv[2]);
> -    stride = atoi(argv[3]);
> -    dirname = argv[4];
> -    if (!nfiles || !iterations) {
> -	fprintf(stderr, "Iterations and nfiles showld be non zero.\n");
> -    	exit(1);
> -    }
> -
> -    inodelist = (ino_t *)malloc(nfiles * sizeof(ino_t));
> -    genlist = (__u32 *)malloc(nfiles * sizeof(__u32));
> -    ret = (struct xfs_bstat *)malloc(nfiles * sizeof(struct
> xfs_bstat));
> -
> -    for (k=0; k < iterations; k++) {
> -	xfs_ino_t last_inode = 0;
> -	int count = 0;
> -	int testFiles = 0;
> -
> -	printf("Iteration %d ... \n", k);
> -
> -	memset(inodelist, 0, nfiles * sizeof(ino_t));
> -	memset(genlist, 0, nfiles * sizeof(__u32));
> -	memset(ret, 0, nfiles * sizeof(struct xfs_bstat));
> -	memset(&a, 0, sizeof(struct xfs_fsop_bulkreq));
> -	a.lastip = (__u64 *)&last_inode;
> -	a.icount = nfiles;
> -	a.ubuffer = ret;
> -	a.ocount = &count;
> -
> -	if (mkdir(dirname, 0755) < 0) {
> -	    perror(dirname);
> -	    exit(1);
> -	}
> -
> -	/* create nfiles and store their inode numbers in inodelist */
> -	for (i=0; i < nfiles; i++) {
> -	    sprintf(fname, "%s/file%06d", dirname, i);
> -	    if ((fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0644)) <
> 0) {
> -		perror(fname);
> -		exit(1);
> -	    }
> -	    write(fd, fname, sizeof(fname));
> -	    if (fstat(fd, &sbuf) < 0) {
> -		perror(fname);
> -		exit(1);
> -	    }
> -	    inodelist[i] = sbuf.st_ino;
> -	    close(fd);
> -	}
> -	
> -	sync();
> -	
> -	/* collect bs_gen for the nfiles files */
> -	if ((fd = open(dirname, O_RDONLY)) < 0) {
> -	    perror(dirname);
> -	    exit(1);
> -	}
> -
> -	testFiles = 0;
> -	for (;;) {
> -	    if ((e = xfsctl(dirname, fd, XFS_IOC_FSBULKSTAT, &a)) < 0)
> {
> -		perror("XFS_IOC_FSBULKSTAT1:");
> -		exit(1);
> -	    }
> -
> -	    if (count == 0)
> -		break;
> -
> -	    for (i=0; i < count; i++) {
> -		for (j=0; j < nfiles; j += stride) {
> -		    if (ret[i].bs_ino == inodelist[j]) {
> -			genlist[j] = ret[i].bs_gen;
> -			testFiles++;
> -		    }
> -		}
> -	    }
> -	}
> -	close(fd);
> -	
> -	printf("testFiles %d ... \n", testFiles);
> -
> -	/* remove some of the first set of files */
> -	for (i=0; i < nfiles; i += stride) {
> -	    sprintf(fname, "%s/file%06d", dirname, i);
> -	    if (unlink(fname) < 0) {
> -	    	perror(fname);
> -		exit(1);
> -	    }
> -	}
> -
> -	/* create a new set of files (replacing the unlinked ones) */
> -	for (i=0; i < nfiles; i += stride) {
> -	    sprintf(fname, "%s/file%06d", dirname, i);
> -	    if ((fd = open(fname, O_RDWR | O_CREAT | O_TRUNC, 0644)) <
> 0) {
> -		perror(fname);
> -		exit(1);
> -	    }
> -	    write(fd, fname, sizeof(fname));
> -	    close(fd);
> -	}
> -
> -	sync();
> -	last_inode = 0; count = 0;
> -
> -	if ((fd = open(dirname, O_RDONLY)) < 0) {
> -	    perror(dirname);
> -	    exit(1);
> -	}
> -
> -	for (;;) {
> -	    if ((e = xfsctl(dirname, fd, XFS_IOC_FSBULKSTAT, &a)) < 0)
> {
> -		perror("XFS_IOC_FSBULKSTAT:");
> -		exit(1);
> -	    }
> -
> -	    if (count == 0)
> -		    break;
> -
> -	    for (i=0; i < count; i++) {
> -		for (j=0; j < nfiles; j += stride) {
> -		    if ((ret[i].bs_ino == inodelist[j]) &&
> -			(ret[i].bs_gen == genlist[j])) {
> -			/* oops, the same inode with old gen number */
> -			printf("Unlinked inode %llu with generation %d
> "
> -			       "returned by bulkstat\n",
> -				(unsigned long long)inodelist[j],
> -				 genlist[j]);
> -			exit(1);
> -		    }
> -		    if (ret[i].bs_ino == inodelist[j] &&
> -			ret[i].bs_gen != genlist[j] + 1) {
> -			/* oops, the new gen number is not 1 bigger
> than the old */
> -			printf("Inode with old generation %d, new
> generation %d\n",
> -			genlist[j], ret[i].bs_gen);
> -			exit(1);
> -		    }
> -		}
> -	    }
> -	}
> -
> -	close(fd);
> -
> -	sprintf(fname, "rm -rf %s\n", dirname);
> -	system(fname);
> -
> -	sync();
> -	sleep(2);
> -	printf("passed\n");
> -    }
> -
> -    exit(0);
> -}



* [PATCH 26/28] stale-handle.c: use syncfs() rather than sync()
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (24 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  8:34   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 27/28] scaleread: remove dead test code Dave Chinner
  2025-04-17  3:01 ` [PATCH 28/28] xfs/259: no need to call sync Dave Chinner
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

xfs/238 runs stale_handle to create 1000 inodes, grab their file
handles, unlink them and then try to open them from the stored file
handles.  It takes a ridiculously long time to run under
check-parallel because it runs sync() multiple times:

xfs/238        144s

When running check-parallel, sync() can take a -long- time to
run as there can be dozens of filesystems that need to be synced,
not to mention sync getting hung up behind all the mount and
unmounts that are also being run.

Convert the sync() calls to syncfs() so that they only try to sync
the filesystem under test and not the entire system. This avoids
interactions and delays with other tests and mount/unmount
operations, hence allowing both the test and the overall
check-parallel operation to run faster:

xfs/238        5s

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 src/stale_handle.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/src/stale_handle.c b/src/stale_handle.c
index 2acd4968c..fd8b3ec61 100644
--- a/src/stale_handle.c
+++ b/src/stale_handle.c
@@ -21,7 +21,7 @@
 int main(int argc, char **argv)
 {
 	int	i;
-	int	fd;
+	int	fd, test_dir_fd;
 	int	ret;
 	int	failed = 0;
 	char	fname[MAXPATHLEN];
@@ -44,6 +44,12 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
+	test_dir_fd = open(test_dir, O_RDONLY|O_DIRECTORY);
+	if (test_dir_fd < 0) {
+		perror(test_dir);
+		return EXIT_FAILURE;
+	}
+
 	ret = path_to_fshandle(test_dir, (void **)fshandle, &fshlen);
 	if (ret < 0) {
 		perror("path_to_fshandle");
@@ -66,7 +72,7 @@ int main(int argc, char **argv)
 	}
 
 	/* sync to get the new inodes to hit the disk */
-	sync();
+	syncfs(test_dir_fd);
 
 	/* create the handles */
 	for (i=0; i < NUMFILES; i++) {
@@ -89,10 +95,10 @@ int main(int argc, char **argv)
 	}
 
 	/* sync to get log forced for unlink transactions to hit the disk */
-	sync();
+	syncfs(test_dir_fd);
 
 	/* sync once more FTW */
-	sync();
+	syncfs(test_dir_fd);
 
 	/*
 	 * now drop the caches so that unlinked inodes are reclaimed and
@@ -120,6 +126,7 @@ int main(int argc, char **argv)
 		free_handle(handle[i], hlen[i]);
 		failed++;
 	}
+	close(test_dir_fd);
 	if (failed)
 		return EXIT_FAILURE;
 	return EXIT_SUCCESS;
-- 
2.45.2
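
The conversion in the patch above follows a simple pattern: hold a file
descriptor on the filesystem under test and pass it to syncfs() instead of
calling sync(). A minimal standalone sketch of that pattern is shown here.
The helper name sync_one_fs() is hypothetical and does not exist in fstests;
syncfs() itself is a Linux-specific call that glibc exposes when _GNU_SOURCE
is defined.

```c
#define _GNU_SOURCE		/* syncfs() is a Linux-specific extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush only the filesystem that contains 'dir'.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int sync_one_fs(const char *dir)
{
	int fd, ret, saved_errno;

	assert(dir != NULL);

	fd = open(dir, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return -1;

	ret = syncfs(fd);	/* any open fd on the target filesystem works */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;	/* do not let close() hide the syncfs() error */
	return ret;
}
```

Unlike sync(), which takes no arguments and returns nothing, syncfs() returns
a status, so a caller can report a failed writeback if it cares to.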



* Re: [PATCH 26/28] stale-handle.c: use syncfs() rather than sync()
  2025-04-17  3:01 ` [PATCH 26/28] stale-handle.c: use syncfs() rather than sync() Dave Chinner
@ 2025-04-30  8:34   ` Nirjhar Roy (IBM)
  2025-05-21  2:24     ` Dave Chinner
  0 siblings, 1 reply; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  8:34 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> xfs/238 runs stale_handle to create 1000 inodes, grab their file
> handles, unlink them and then try to open them from the stored file
> handles.  It takes a ridiculously long time to run under
> check-parallel because it runs sync() multiple times:
> 
> xfs/238        144s
> 
> When running check-parallel, sync() can take a -long- time to
> run as there can be dozens of filesystems that need to be synced,
> not to mention sync getting hung up behind all the mount and
> unmounts that are also being run.
> 
> Convert the sync() calls to syncfs() so that they only try to sync
> the filesystem under test and not the entire system. This avoids
> interactions and delays with other tests and mount/unmount
> operations, hence allowing both the test and the overall
> check-parallel operation to run faster:
Yes, a filesystem-wise sync makes more sense. I remember similar
changes in your previous check-parallel patch series as well.
> 
> xfs/238        5s
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  src/stale_handle.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/src/stale_handle.c b/src/stale_handle.c
> index 2acd4968c..fd8b3ec61 100644
> --- a/src/stale_handle.c
> +++ b/src/stale_handle.c
> @@ -21,7 +21,7 @@
>  int main(int argc, char **argv)
>  {
>  	int	i;
> -	int	fd;
> +	int	fd, test_dir_fd;
>  	int	ret;
>  	int	failed = 0;
>  	char	fname[MAXPATHLEN];
Not related to this change: 
A lot of local variables are being used in this test:
char	fname[MAXPATHLEN];
void	*handle[NUMFILES];
size_t	hlen[NUMFILES];
char fshandle[256];
Is this fine? Shouldn't we limit the usage of so much local stack?

> @@ -44,6 +44,12 @@ int main(int argc, char **argv)
>  		return EXIT_FAILURE;
>  	}
>  
> +	test_dir_fd = open(test_dir, O_RDONLY|O_DIRECTORY);
> +	if (test_dir_fd < 0) {
> +		perror(test_dir);
> +		return EXIT_FAILURE;
> +	}
> +
Now that we are already opening the test_dir to get an fd of the
filesystem being tested, should we remove the stat call (line 42) just
before this open call and instead use this open call to check the
presence/absence of test_dir?

Other than this, the rest of the changes look good to me (some minor
comments below but not related to this change).

Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

 	ret = path_to_fshandle(test_dir, (void **)fshandle, &fshlen);
>  	if (ret < 0) {
>  		perror("path_to_fshandle");
> @@ -66,7 +72,7 @@ int main(int argc, char **argv)
>  	}
>  
>  	/* sync to get the new inodes to hit the disk */
> -	sync();
> +	syncfs(test_dir_fd);
>  
>  	/* create the handles */
>  	for (i=0; i < NUMFILES; i++) {
Not related to this change and minor:
Shouldn't we replace the sprintf() calls with snprintf()?
--NR
> @@ -89,10 +95,10 @@ int main(int argc, char **argv)
>  	}
>  
>  	/* sync to get log forced for unlink transactions to hit the
> disk */
> -	sync();
> +	syncfs(test_dir_fd);
>  
>  	/* sync once more FTW */
> -	sync();
> +	syncfs(test_dir_fd);
>  
>  	/*
>  	 * now drop the caches so that unlinked inodes are reclaimed
> and
> @@ -120,6 +126,7 @@ int main(int argc, char **argv)
>  		free_handle(handle[i], hlen[i]);
>  		failed++;
>  	}
> +	close(test_dir_fd);
>  	if (failed)
>  		return EXIT_FAILURE;
>  	return EXIT_SUCCESS;
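
On the sprintf() versus snprintf() question raised above, a bounded path
builder might look like the sketch below. build_path() is a hypothetical
helper and the "%s/file%06d" name format is only illustrative; neither is
taken from stale_handle.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format "<dir>/file<NNNNNN>" into buf.
 * Returns 0 on success, -1 if formatting failed or the result was
 * truncated because buf is too small.
 */
static int build_path(char *buf, size_t len, const char *dir, int i)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "%s/file%06d", dir, i);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return 0;
}
```

The return value check matters: snprintf() never overruns the buffer, but a
silently truncated path would make the test operate on the wrong file.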



* Re: [PATCH 26/28] stale-handle.c: use syncfs() rather than sync()
  2025-04-30  8:34   ` Nirjhar Roy (IBM)
@ 2025-05-21  2:24     ` Dave Chinner
  0 siblings, 0 replies; 80+ messages in thread
From: Dave Chinner @ 2025-05-21  2:24 UTC (permalink / raw)
  To: Nirjhar Roy (IBM); +Cc: fstests, zlang

On Wed, Apr 30, 2025 at 02:04:08PM +0530, Nirjhar Roy (IBM) wrote:
> On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> > xfs/238 runs stale_handle to create 1000 inodes, grab their file
> > handles, unlink them and then try to open them from the stored file
> > handles.  It takes a ridiculously long time to run under
> > check-parallel because it runs sync() multiple times:
> > 
> > xfs/238        144s
> > 
> > When running check-parallel, sync() can take a -long- time to
> > run as there can be dozens of filesystems that need to be synced,
> > not to mention sync getting hung up behind all the mount and
> > unmounts that are also being run.
> > 
> > Convert the sync() calls to syncfs() so that they only try to sync
> > the filesystem under test and not the entire system. This avoids
> > interactions and delays with other tests and mount/unmount
> > operations, hence allowing both the test and the overall
> > check-parallel operation to run faster:
> Yes, a filesystem-wise sync makes more sense. I remember similar
> changes in your previous check-parallel patch series as well.
> > 
> > xfs/238        5s
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > ---
> >  src/stale_handle.c | 15 +++++++++++----
> >  1 file changed, 11 insertions(+), 4 deletions(-)
> > 
> > diff --git a/src/stale_handle.c b/src/stale_handle.c
> > index 2acd4968c..fd8b3ec61 100644
> > --- a/src/stale_handle.c
> > +++ b/src/stale_handle.c
> > @@ -21,7 +21,7 @@
> >  int main(int argc, char **argv)
> >  {
> >  	int	i;
> > -	int	fd;
> > +	int	fd, test_dir_fd;
> >  	int	ret;
> >  	int	failed = 0;
> >  	char	fname[MAXPATHLEN];
> Not related to this change: 
> A lot of local variables are being used in this test:
> char	fname[MAXPATHLEN];
> void	*handle[NUMFILES];
> size_t	hlen[NUMFILES];
> char fshandle[256];
> Is this fine? Shouldn't we limit the usage of so much local stack?

It's userspace and it's not a big/complex program. Using stack like
this instead of allocating memory is simple, fast, and easy to
maintain. There's no reason to complicate the code unnecessarily.
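
To put a rough number on it, the four arrays listed above can be added up
with sizeof. This is a sketch under stated assumptions: NUMFILES is taken to
be 1000 (from the "1000 inodes" in the commit message) and MAXPATHLEN comes
from <sys/param.h>; stale_handle_stack_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/param.h>		/* MAXPATHLEN */

#define NUMFILES 1000		/* assumption, from the commit message */

static_assert(NUMFILES > 0, "NUMFILES must be positive");

/*
 * Hypothetical helper: bytes of stack used by the four local arrays
 * quoted in the review (fname, handle, hlen, fshandle).
 */
static size_t stale_handle_stack_bytes(void)
{
	return sizeof(char[MAXPATHLEN]) +	/* char   fname[MAXPATHLEN] */
	       sizeof(void *[NUMFILES]) +	/* void  *handle[NUMFILES]  */
	       sizeof(size_t[NUMFILES]) +	/* size_t hlen[NUMFILES]    */
	       sizeof(char[256]);		/* char   fshandle[256]     */
}
```

On a 64-bit Linux system where MAXPATHLEN is 4096 this comes to roughly
20KB, a small fraction of a typical multi-megabyte stack limit.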

> > @@ -44,6 +44,12 @@ int main(int argc, char **argv)
> >  		return EXIT_FAILURE;
> >  	}
> >  
> > +	test_dir_fd = open(test_dir, O_RDONLY|O_DIRECTORY);
> > +	if (test_dir_fd < 0) {
> > +		perror(test_dir);
> > +		return EXIT_FAILURE;
> > +	}
> > +
> Now that we are already opening the test_dir to get an fd of the
> filesystem being tested, should we remove the stat call (line 42) just
> before this open call and instead use this open call to check the
> presence/absence of test_dir?

We can, it doesn't matter either way...

> Other than this, the rest of the changes look good to me (some minor
> comments below but not related to this change).
> 
> Feel free to add 
> Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

Thanks!

-Dave.
-- 
Dave Chinner
david@fromorbit.com


* [PATCH 27/28] scaleread: remove dead test code
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (25 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 26/28] stale-handle.c: use syncfs() rather than sync() Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  8:10   ` Nirjhar Roy (IBM)
  2025-04-17  3:01 ` [PATCH 28/28] xfs/259: no need to call sync Dave Chinner
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

From: Dave Chinner <dchinner@redhat.com>

scaleread.{c,sh} is a one-off test case for a "will-it-scale" page
cache read workload from NASA back in 2003. This has not been
directly exercised by fstests since it was added 20 years ago.
The scaleread.c source code isn't even built by src/Makefile, so
it is definitely stale, dead code. Remove it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 src/Makefile     |   2 +-
 src/scaleread.c  | 224 -----------------------------------------------
 src/scaleread.sh |  64 --------------
 3 files changed, 1 insertion(+), 289 deletions(-)
 delete mode 100644 src/scaleread.c
 delete mode 100644 src/scaleread.sh

diff --git a/src/Makefile b/src/Makefile
index 6a31ceb01..fe7441068 100644
--- a/src/Makefile
+++ b/src/Makefile
@@ -37,7 +37,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
 	detached_mounts_propagation ext4_resize t_readdir_3 splice2pipe \
 	uuid_ioctl t_snapshot_deleted_subvolume fiemap-fault min_dio_alignment
 
-EXTRA_EXECS = dmerror fill2attr fill2fs fill2fs_check scaleread.sh \
+EXTRA_EXECS = dmerror fill2attr fill2fs fill2fs_check \
 	      btrfs_crc32c_forged_name.py popdir.pl popattr.py \
 	      soak_duration.awk parse-dev-tree.awk parse-extent-tree.awk
 
diff --git a/src/scaleread.c b/src/scaleread.c
deleted file mode 100644
index 4a1def005..000000000
--- a/src/scaleread.c
+++ /dev/null
@@ -1,224 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (c) 2003-2004 Silicon Graphics, Inc.
- * All Rights Reserved.
- */
-/*
- * Test scaling of multiple processes opening/reading
- * a number of small files simultaneously.
- *	- create <f> files
- *	- fork <n> processes
- *	- wait for all processes ready
- *	- start all proceses at the same time
- *	- each processes opens , read, closes each file
- *	- option to resync each process at each file
- *
- *	test [-c cpus] [-b bytes] [-f files] [-v] [-s] [-S]
- *			OR
- *	test -i [-b bytes] [-f files] 
- */
-#include <unistd.h>
-#include <string.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <sys/types.h>
-#include <sys/stat.h>
-#include <sys/wait.h>
-#include <fcntl.h>
-#include <stdlib.h>
-#include <sys/ipc.h>
-#include <sys/shm.h>
-
-void do_initfiles(void);
-void slave(int);
-
-#define VPRINT(x...)	do { if(verbose) fprintf(x);} while(0)
-#define perrorx(s) do {perror(s); exit(1);} while (0)
-
-long bytes=8192;
-int cpus=1;
-int init=0;
-int strided=0;
-int files=1;
-int blksize=512;
-int syncstep=0;
-int verbose=0;
-
-typedef struct {
-        volatile long   go;
-        long            fill[15];
-        volatile long   rdy[512];
-} share_t;
-
-share_t	*sharep;
-
-
-int
-runon(int cpu)
-{
-#ifdef sys_sched_setaffinity
-	unsigned long mask[8];
-	
-	if (cpu < 0 || cpu >= 512)
-		return -1;
-	memset(mask, 0, sizeof(mask));
-	mask[cpu/64] |= 1UL<<(cpu&63);
-
-	if (syscall(sys_sched_setaffinity, 0, sizeof(mask), mask))
-		return -1;
-#endif
-	return 0;
-}
-
-long
-scaled_atol(char *p)
-{
-	long val;
-	char  *pe;
-
-	val = strtol(p, &pe, 0);
-	if (*pe == 'K' || *pe == 'k')
-		val *= 1024L;
-	else if (*pe == 'M' || *pe == 'm')
-		val *= 1024L*1024L;
-	else if (*pe == 'G' || *pe == 'g')
-		val *= 1024L*1024L*1024L;
-	else if (*pe == 'p' || *pe == 'P')
-		val *= getpagesize();
-	return val;
-}
-
-
-int
-main(int argc, char** argv) {
-        int shmid;
-        static  char            optstr[] = "c:b:f:sSivH";
-        int                     notdone, stat, i, j, c, er=0;
-
-        opterr=1;
-        while ((c = getopt(argc, argv, optstr)) != EOF)
-                switch (c) {
-                case 'c':
-                        cpus = atoi(optarg);
-                        break;
-                case 'b':
-                        bytes = scaled_atol(optarg);
-                        break;
-                case 'f':
-                        files = atoi(optarg);
-                        break;
-                case 'i':
-                        init++;
-                        break;
-                case 's':
-                        syncstep++;
-                        break;
-                case 'S':
-                        strided++;
-                        break;
-                case 'v':
-                        verbose++;
-                        break;
-                case '?':
-                        er = 1;
-                        break;
-                }
-        if (er) {
-                printf("usage: %s %s\n", argv[0], optstr);
-                exit(1);
-        }
-
-
-	if ((shmid = shmget(IPC_PRIVATE, sizeof (share_t), IPC_CREAT|SHM_R|SHM_W))  == -1)
-		perrorx("shmget failed");
-	sharep = (share_t*)shmat(shmid, (void*)0, SHM_R|SHM_W);
-	memset(sharep, -1, sizeof (share_t));
-
-	if (init) {
-		do_initfiles();
-		exit(0);
-	}
-        for (i=0; i<cpus; i++) {
-                if (fork() == 0)
-                        slave(i);
-        }
-
-	for (i=0; i<files; i++) {
-		VPRINT(stderr, "%d:", i);
-		notdone = cpus;
-		do {
-			for (j=0; j<cpus; j++) {
-				if (sharep->rdy[j] == i) {
-					sharep->rdy[j] = -1;
-					VPRINT(stderr, " %d", j);
-					notdone--;
-				}
-			}
-		} while (notdone);
-		VPRINT(stderr, "\n");
-		sharep->go = i;
-		if (!syncstep)
-			break;
-	}
-	VPRINT(stderr, "\n");
-
-        while (wait(&stat)> 0)
-		VPRINT(stderr, ".");
-	VPRINT(stderr, "\n");
-
-	exit(0);
-}
-
-void 
-slave(int id)
-{
-	int	i, fd, byte;
-	char	*buf, filename[32];
-
-	runon (id+1);
-	buf = malloc(blksize);
-	bzero(buf, blksize);
-	for (i=0; i<files; i++) {
-		if (!i || syncstep) {
-			sharep->rdy[id] = i;
-			while(sharep->go != i);
-		}
-		sprintf(filename, "/tmp/tst.%d", (strided ? ((i + id) % files) : i));
-		if ((fd = open (filename, O_RDONLY)) < 0) {
-			perrorx(filename);
-		}
-	
-		for (byte=0; byte<bytes; byte+=blksize) {
-			if (read (fd, buf, blksize) != blksize)
-				perrorx("read of file failed");
-		}
-		close(fd);
-	}
-	exit(0);
-}
-
-void
-do_initfiles(void)
-{
-	int	i, fd, byte;
-	char	*buf, filename[32];
-
-	buf = malloc(blksize);
-	bzero(buf, blksize);
-
-	for (i=0; i<files; i++) {
-		sprintf(filename, "/tmp/tst.%d", i);
-		unlink(filename);
-		if ((fd = open (filename, O_RDWR|O_CREAT, 0644)) < 0)
-			perrorx(filename);
-	
-		for (byte=0; byte<bytes; byte+=blksize) {
-			if (write (fd, buf, blksize) != blksize)
-				perrorx("write of file failed");
-		}
-		close(fd);
-	}
-	sync();
-}
-
-
diff --git a/src/scaleread.sh b/src/scaleread.sh
deleted file mode 100644
index 691b8eb12..000000000
--- a/src/scaleread.sh
+++ /dev/null
@@ -1,64 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2003-2004 Silicon Graphics, Inc.  All Rights Reserved.
-#
-
-help() {
-cat <<END
-Measure scaling of multiple cpus readin the same set of files.
-(NASA testcase).
-	Usage:  $0 [-b <bytes>] [-f <files>] [-s] [-B] [-v] cpus ...
-			or
-		$0 -i [-b <bytes>] [-f <files>] 
-
-	  -b file size in bytes
-	  -f number of files
-	  -s keep processes synchronized when reading files
-	  -B use bcfree to free buffer cache pages before each run
-END
-exit 1
-}
-
-err () {
-	echo "ERROR - $*"
-	exit 1
-}
-
-BYTES=8192
-FILES=10
-SYNC=""
-VERBOSE=""
-STRIDED=""
-BCFREE=0
-INIT=0
-OPTS="f:b:vsiSBH"
-while getopts "$OPTS" c ; do
-	case $c in
-		H)  help;;
-		f)  FILES=${OPTARG};;
-		b)  BYTES=${OPTARG};;
-		i)  INIT=1;;
-		B)  BCFREE=1;;
-		S)  STRIDED="-S";;
-		s)  SYNC="-s";;
-		v)  VERBOSE="-v";;
-		\?) help;;
-	esac
-
-done
-shift `expr $OPTIND - 1`
-
-if [ $INIT -gt 0 ] ; then
-	echo "Initializing $BYTES bytes, $FILES files"
-	./scaleread $VERBOSE -i -b $BYTES -f $FILES 
-	sync
-else
-	[ $# -gt 0 ] || help
-	echo "Testing $BYTES bytes, $FILES files"
-	for CPUS in $* ; do
-		[ $BCFREE -eq 0 ] || bcfree -a
-		/usr/bin/time -f "$CPUS:  %e wall,    %S sys,   %U user" ./scaleread \
-			$SYNC $STRIDED $VERBOSE -b $BYTES -f $FILES -c $CPUS
-	done
-fi
-
-- 
2.45.2



* Re: [PATCH 27/28] scaleread: remove dead test code
  2025-04-17  3:01 ` [PATCH 27/28] scaleread: remove dead test code Dave Chinner
@ 2025-04-30  8:10   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  8:10 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> scaleread.{c,sh} is a one-off test case for a "will-it-scale" page
> cache read workload from NASA back in 2003. This has not been
> directly exercised by fstests since it was added 20 years ago.
> The scaleread.c source code isn't even built by src/Makefile, so
> it is definitely stale, dead code. Remove it.
I grepped for "scaleread" and yes, scaleread.sh only gets
installed/copied into $(PKG_LIB_DIR)/src; nothing ever invokes it.
Likewise, the "scaleread" binary is only referenced from scaleread.sh,
but the Makefile never compiles scaleread.c and nothing runs
scaleread.sh. So this change makes sense to me.
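
For anyone wanting to repeat the check, a rough sketch of the grep is below. The helper name count_refs is made up for illustration and is not part of fstests; it assumes a grep that supports -r (GNU and BSD grep both do), and it simply counts files that mention the name other than the .c and .sh sources themselves.

```shell
#!/bin/sh
# Hypothetical helper, not part of fstests.
# count_refs <dir> <name>
# Prints how many files under <dir> mention <name>, ignoring the
# <name>.c and <name>.sh sources that define it.
count_refs() {
	dir=$1
	name=$2
	# -r: recurse, -l: print matching file names only, -F: fixed string
	grep -rlF -- "$name" "$dir" 2>/dev/null |
		grep -v -e "/$name\.c\$" -e "/$name\.sh\$" |
		wc -l | tr -d ' '
}

# Example invocation from the top of a source tree:
# count_refs . scaleread
```

A non-zero count only says that something still names the file (for example an install list); whether that reference ever executes it still needs a look by eye.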

Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  src/Makefile     |   2 +-
>  src/scaleread.c  | 224 -----------------------------------------------
>  src/scaleread.sh |  64 --------------
>  3 files changed, 1 insertion(+), 289 deletions(-)
>  delete mode 100644 src/scaleread.c
>  delete mode 100644 src/scaleread.sh
> 
> diff --git a/src/Makefile b/src/Makefile
> index 6a31ceb01..fe7441068 100644
> --- a/src/Makefile
> +++ b/src/Makefile
> @@ -37,7 +37,7 @@ LINUX_TARGETS = xfsctl bstat t_mtab getdevicesize preallo_rw_pattern_reader \
>  	detached_mounts_propagation ext4_resize t_readdir_3 splice2pipe \
>  	uuid_ioctl t_snapshot_deleted_subvolume fiemap-fault min_dio_alignment
>  
> -EXTRA_EXECS = dmerror fill2attr fill2fs fill2fs_check scaleread.sh \
> +EXTRA_EXECS = dmerror fill2attr fill2fs fill2fs_check \
>  	      btrfs_crc32c_forged_name.py popdir.pl popattr.py \
>  	      soak_duration.awk parse-dev-tree.awk parse-extent-tree.awk
>  
> diff --git a/src/scaleread.c b/src/scaleread.c
> deleted file mode 100644
> index 4a1def005..000000000
> --- a/src/scaleread.c
> +++ /dev/null
> @@ -1,224 +0,0 @@
> -// SPDX-License-Identifier: GPL-2.0
> -/*
> - * Copyright (c) 2003-2004 Silicon Graphics, Inc.
> - * All Rights Reserved.
> - */
> -/*
> - * Test scaling of multiple processes opening/reading
> - * a number of small files simultaneously.
> - *	- create <f> files
> - *	- fork <n> processes
> - *	- wait for all processes ready
> - *	- start all proceses at the same time
> - *	- each processes opens , read, closes each file
> - *	- option to resync each process at each file
> - *
> - *	test [-c cpus] [-b bytes] [-f files] [-v] [-s] [-S]
> - *			OR
> - *	test -i [-b bytes] [-f files] 
> - */
> -#include <unistd.h>
> -#include <string.h>
> -#include <stdio.h>
> -#include <stdlib.h>
> -#include <sys/types.h>
> -#include <sys/stat.h>
> -#include <sys/wait.h>
> -#include <fcntl.h>
> -#include <stdlib.h>
> -#include <sys/ipc.h>
> -#include <sys/shm.h>
> -
> -void do_initfiles(void);
> -void slave(int);
> -
> -#define VPRINT(x...)	do { if(verbose) fprintf(x);} while(0)
> -#define perrorx(s) do {perror(s); exit(1);} while (0)
> -
> -long bytes=8192;
> -int cpus=1;
> -int init=0;
> -int strided=0;
> -int files=1;
> -int blksize=512;
> -int syncstep=0;
> -int verbose=0;
> -
> -typedef struct {
> -        volatile long   go;
> -        long            fill[15];
> -        volatile long   rdy[512];
> -} share_t;
> -
> -share_t	*sharep;
> -
> -
> -int
> -runon(int cpu)
> -{
> -#ifdef sys_sched_setaffinity
> -	unsigned long mask[8];
> -	
> -	if (cpu < 0 || cpu >= 512)
> -		return -1;
> -	memset(mask, 0, sizeof(mask));
> -	mask[cpu/64] |= 1UL<<(cpu&63);
> -
> -	if (syscall(sys_sched_setaffinity, 0, sizeof(mask), mask))
> -		return -1;
> -#endif
> -	return 0;
> -}
> -
> -long
> -scaled_atol(char *p)
> -{
> -	long val;
> -	char  *pe;
> -
> -	val = strtol(p, &pe, 0);
> -	if (*pe == 'K' || *pe == 'k')
> -		val *= 1024L;
> -	else if (*pe == 'M' || *pe == 'm')
> -		val *= 1024L*1024L;
> -	else if (*pe == 'G' || *pe == 'g')
> -		val *= 1024L*1024L*1024L;
> -	else if (*pe == 'p' || *pe == 'P')
> -		val *= getpagesize();
> -	return val;
> -}
> -
> -
> -int
> -main(int argc, char** argv) {
> -        int shmid;
> -        static  char            optstr[] = "c:b:f:sSivH";
> -        int                     notdone, stat, i, j, c, er=0;
> -
> -        opterr=1;
> -        while ((c = getopt(argc, argv, optstr)) != EOF)
> -                switch (c) {
> -                case 'c':
> -                        cpus = atoi(optarg);
> -                        break;
> -                case 'b':
> -                        bytes = scaled_atol(optarg);
> -                        break;
> -                case 'f':
> -                        files = atoi(optarg);
> -                        break;
> -                case 'i':
> -                        init++;
> -                        break;
> -                case 's':
> -                        syncstep++;
> -                        break;
> -                case 'S':
> -                        strided++;
> -                        break;
> -                case 'v':
> -                        verbose++;
> -                        break;
> -                case '?':
> -                        er = 1;
> -                        break;
> -                }
> -        if (er) {
> -                printf("usage: %s %s\n", argv[0], optstr);
> -                exit(1);
> -        }
> -
> -
> -	if ((shmid = shmget(IPC_PRIVATE, sizeof (share_t), IPC_CREAT|SHM_R|SHM_W))  == -1)
> -		perrorx("shmget failed");
> -	sharep = (share_t*)shmat(shmid, (void*)0, SHM_R|SHM_W);
> -	memset(sharep, -1, sizeof (share_t));
> -
> -	if (init) {
> -		do_initfiles();
> -		exit(0);
> -	}
> -        for (i=0; i<cpus; i++) {
> -                if (fork() == 0)
> -                        slave(i);
> -        }
> -
> -	for (i=0; i<files; i++) {
> -		VPRINT(stderr, "%d:", i);
> -		notdone = cpus;
> -		do {
> -			for (j=0; j<cpus; j++) {
> -				if (sharep->rdy[j] == i) {
> -					sharep->rdy[j] = -1;
> -					VPRINT(stderr, " %d", j);
> -					notdone--;
> -				}
> -			}
> -		} while (notdone);
> -		VPRINT(stderr, "\n");
> -		sharep->go = i;
> -		if (!syncstep)
> -			break;
> -	}
> -	VPRINT(stderr, "\n");
> -
> -        while (wait(&stat)> 0)
> -		VPRINT(stderr, ".");
> -	VPRINT(stderr, "\n");
> -
> -	exit(0);
> -}
> -
> -void 
> -slave(int id)
> -{
> -	int	i, fd, byte;
> -	char	*buf, filename[32];
> -
> -	runon (id+1);
> -	buf = malloc(blksize);
> -	bzero(buf, blksize);
> -	for (i=0; i<files; i++) {
> -		if (!i || syncstep) {
> -			sharep->rdy[id] = i;
> -			while(sharep->go != i);
> -		}
> -		sprintf(filename, "/tmp/tst.%d", (strided ? ((i + id) % files) : i));
> -		if ((fd = open (filename, O_RDONLY)) < 0) {
> -			perrorx(filename);
> -		}
> -	
> -		for (byte=0; byte<bytes; byte+=blksize) {
> -			if (read (fd, buf, blksize) != blksize)
> -				perrorx("read of file failed");
> -		}
> -		close(fd);
> -	}
> -	exit(0);
> -}
> -
> -void
> -do_initfiles(void)
> -{
> -	int	i, fd, byte;
> -	char	*buf, filename[32];
> -
> -	buf = malloc(blksize);
> -	bzero(buf, blksize);
> -
> -	for (i=0; i<files; i++) {
> -		sprintf(filename, "/tmp/tst.%d", i);
> -		unlink(filename);
> -		if ((fd = open (filename, O_RDWR|O_CREAT, 0644)) < 0)
> -			perrorx(filename);
> -	
> -		for (byte=0; byte<bytes; byte+=blksize) {
> -			if (write (fd, buf, blksize) != blksize)
> -				perrorx("write of file failed");
> -		}
> -		close(fd);
> -	}
> -	sync();
> -}
> -
> -
> diff --git a/src/scaleread.sh b/src/scaleread.sh
> deleted file mode 100644
> index 691b8eb12..000000000
> --- a/src/scaleread.sh
> +++ /dev/null
> @@ -1,64 +0,0 @@
> -#!/bin/sh
> -#
> -# Copyright (c) 2003-2004 Silicon Graphics, Inc.  All Rights Reserved.
> -#
> -
> -help() {
> -cat <<END
> -Measure scaling of multiple cpus readin the same set of files.
> -(NASA testcase).
> -	Usage:  $0 [-b <bytes>] [-f <files>] [-s] [-B] [-v] cpus ...
> -			or
> -		$0 -i [-b <bytes>] [-f <files>] 
> -
> -	  -b file size in bytes
> -	  -f number of files
> -	  -s keep processes synchronized when reading files
> -	  -B use bcfree to free buffer cache pages before each run
> -END
> -exit 1
> -}
> -
> -err () {
> -	echo "ERROR - $*"
> -	exit 1
> -}
> -
> -BYTES=8192
> -FILES=10
> -SYNC=""
> -VERBOSE=""
> -STRIDED=""
> -BCFREE=0
> -INIT=0
> -OPTS="f:b:vsiSBH"
> -while getopts "$OPTS" c ; do
> -	case $c in
> -		H)  help;;
> -		f)  FILES=${OPTARG};;
> -		b)  BYTES=${OPTARG};;
> -		i)  INIT=1;;
> -		B)  BCFREE=1;;
> -		S)  STRIDED="-S";;
> -		s)  SYNC="-s";;
> -		v)  VERBOSE="-v";;
> -		\?) help;;
> -	esac
> -
> -done
> -shift `expr $OPTIND - 1`
> -
> -if [ $INIT -gt 0 ] ; then
> -	echo "Initializing $BYTES bytes, $FILES files"
> -	./scaleread $VERBOSE -i -b $BYTES -f $FILES 
> -	sync
> -else
> -	[ $# -gt 0 ] || help
> -	echo "Testing $BYTES bytes, $FILES files"
> -	for CPUS in $* ; do
> -		[ $BCFREE -eq 0 ] || bcfree -a
> -		/usr/bin/time -f "$CPUS:  %e wall,    %S sys,   %U user" ./scaleread \
> -			$SYNC $STRIDED $VERBOSE -b $BYTES -f $FILES -c $CPUS
> -	done
> -fi
> -



* [PATCH 28/28] xfs/259: no need to call sync
  2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
                   ` (26 preceding siblings ...)
  2025-04-17  3:01 ` [PATCH 27/28] scaleread: remove dead test code Dave Chinner
@ 2025-04-17  3:01 ` Dave Chinner
  2025-04-30  7:56   ` Nirjhar Roy (IBM)
  27 siblings, 1 reply; 80+ messages in thread
From: Dave Chinner @ 2025-04-17  3:01 UTC (permalink / raw)
  To: fstests; +Cc: zlang

xfs/259 runs sync every time through its loop.
It takes a ridiculously long time to run under
check-parallel:

xfs/259        461s

When running check-parallel, sync can take a -long- time to
run as there can be dozens of filesystems that need to be synced,
not to mention sync getting hung up behind all the mounts and
unmounts that are also being run.

sync is used at the end of the loop before destroying the loop
device, but the contents of the loop device are completely discarded
at the start of the next iteration, i.e. the image file is unlinked
and recreated. Hence the sync call does nothing useful and only slows
down the test. Removing it makes the test run much faster:

xfs/259        23s

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
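
The reasoning above can be illustrated with a rough, unprivileged sketch. This is not the xfs/259 code: a plain file stands in for the loop device image, a printf stands in for mkfs, and every name here (run_sketch, pass-marker, the sizes) is made up for illustration. It only shows that whatever one pass writes is gone before the next pass looks at the image, so flushing it first buys nothing.

```shell
#!/bin/sh
# Illustration only; hypothetical names, no loop devices or root needed.
run_sketch() {
	dir=$(mktemp -d) || return 1
	img="$dir/image"
	stale=0

	for size_kb in 4 8 16; do
		# Top of the loop: unlink and recreate the image, which
		# discards everything the previous pass wrote.
		rm -f "$img"
		dd if=/dev/zero of="$img" bs=1024 count="$size_kb" 2>/dev/null

		# Data surviving from an earlier pass would be found here.
		if grep -q "pass-marker" "$img"; then
			stale=$((stale + 1))
		fi

		# Stand-in for mkfs writing to the device.
		printf 'pass-marker %s\n' "$size_kb" >>"$img"

		# No sync before teardown: the next pass throws the
		# contents away regardless.
	done

	rm -rf "$dir"
	echo "stale images seen: $stale"
}

run_sketch
```

Because the image is recreated before anything reads it, the count stays at zero whether or not a sync is issued at the end of each pass.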
 tests/xfs/259 | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tests/xfs/259 b/tests/xfs/259
index c2d26381a..c40ba3a0e 100755
--- a/tests/xfs/259
+++ b/tests/xfs/259
@@ -52,7 +52,6 @@ for del in $sizes_to_check; do
 		$MKFS_XFS_PROG -l size=32m -b size=$bs $loop_dev |  _filter_mkfs \
 			>/dev/null 2> $tmp.mkfs || echo "mkfs failed!"
 		. $tmp.mkfs
-		sync
 		_destroy_loop_device $loop_dev
 		unset loop_dev
 	done
-- 
2.45.2


* Re: [PATCH 28/28] xfs/259: no need to call sync
  2025-04-17  3:01 ` [PATCH 28/28] xfs/259: no need to call sync Dave Chinner
@ 2025-04-30  7:56   ` Nirjhar Roy (IBM)
  0 siblings, 0 replies; 80+ messages in thread
From: Nirjhar Roy (IBM) @ 2025-04-30  7:56 UTC (permalink / raw)
  To: Dave Chinner, fstests; +Cc: zlang

On Thu, 2025-04-17 at 13:01 +1000, Dave Chinner wrote:
> xfs/259 runs sync every time through its loop.
> It takes a ridiculously long time to run under
> check-parallel:
> 
> xfs/259        461s
> 
> When running check-parallel, sync can take a -long- time to
> run as there can be dozens of filesystems that need to be synced,
> not to mention sync getting hung up behind all the mounts and
> unmounts that are also being run.
> 
> sync is used at the end of the loop before destroying the loop
> device, but the contents of the loop device are completely discarded
> at the start of the next iteration, i.e. the image file is unlinked
> and recreated. Hence the sync call does nothing useful and only slows
> down the test. Removing it makes the test run much faster:
> 
> xfs/259        23s
Yes. The sync doesn't do anything useful since the loop device is
destroyed and the underlying loop device image file is also re-created
at the beginning of the loop using dd. 
This looks good to me.
Feel free to add 
Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  tests/xfs/259 | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/tests/xfs/259 b/tests/xfs/259
> index c2d26381a..c40ba3a0e 100755
> --- a/tests/xfs/259
> +++ b/tests/xfs/259
> @@ -52,7 +52,6 @@ for del in $sizes_to_check; do
>  		$MKFS_XFS_PROG -l size=32m -b size=$bs $loop_dev |  _filter_mkfs \
>  			>/dev/null 2> $tmp.mkfs || echo "mkfs failed!"
>  		. $tmp.mkfs
> -		sync
>  		_destroy_loop_device $loop_dev
>  		unset loop_dev
>  	done



Thread overview: 80+ messages
2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
2025-04-30  9:17   ` Nirjhar Roy (IBM)
2025-05-21  2:39     ` Dave Chinner
2025-05-26  5:14       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems Dave Chinner
2025-05-05  6:14   ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 03/28] fstests: move test exit functions to common/exit Dave Chinner
2025-04-17  3:00 ` [PATCH 04/28] check-parallel: report how many tests were _notrun Dave Chinner
2025-05-05  9:58   ` Nirjhar Roy (IBM)
2025-05-21  2:53     ` Dave Chinner
2025-05-26  6:09       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 05/28] check: factor out test list building code Dave Chinner
2025-05-06 11:32   ` Nirjhar Roy (IBM)
2025-05-21  3:55     ` Dave Chinner
2025-05-26  6:48       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 06/28] check-parallel: use common group list parsing code Dave Chinner
2025-05-06 15:56   ` Nirjhar Roy (IBM)
2025-05-21  4:13     ` Dave Chinner
2025-05-26  6:58       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 07/28] check-parallel: adjust concurrency according to CPU count Dave Chinner
2025-05-07  6:45   ` Nirjhar Roy (IBM)
2025-05-21  4:32     ` Dave Chinner
2025-05-26  8:50       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 08/28] check-parallel: add logwrite device support Dave Chinner
2025-05-07  8:18   ` Nirjhar Roy (IBM)
2025-05-21 10:07     ` Dave Chinner
2025-05-26  8:59       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI Dave Chinner
2025-05-07  8:49   ` Nirjhar Roy (IBM)
2025-05-21 10:17     ` Dave Chinner
2025-05-26  9:00       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation Dave Chinner
2025-05-07  9:02   ` Nirjhar Roy (IBM)
2025-05-21 10:19     ` Dave Chinner
2025-05-26  9:04       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 11/28] check-parallel: initial support for specifying device sizes Dave Chinner
2025-05-07 10:05   ` Nirjhar Roy (IBM)
2025-05-21 11:11     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 12/28] config: move config section code to it's own file Dave Chinner
2025-05-09  6:09   ` Nirjhar Roy
2025-05-21 11:28     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 13/28] check-parallel: introduce config file support Dave Chinner
2025-05-09 12:01   ` Nirjhar Roy
2025-05-21 12:23     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation Dave Chinner
2025-05-10 14:08   ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 15/28] check-parallel: de-batch test execution Dave Chinner
2025-05-09 13:16   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 16/28] check-parallel: run sections directly Dave Chinner
2025-05-09 14:03   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes Dave Chinner
2025-05-09 16:00   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 18/28] check-parallel: create a "results-latest" symlink Dave Chinner
2025-05-10 13:12   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 19/28] check: factor test running Dave Chinner
2025-05-12 13:57   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 20/28] [RFC] check-parallel: run tests directly without using check Dave Chinner
2025-05-13 14:48   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 21/28] generic/531: limit max files per CPU Dave Chinner
2025-05-10 13:15   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync() Dave Chinner
2025-04-30  9:08   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 23/28] open-by-handle.c: " Dave Chinner
2025-04-30  9:02   ` Nirjhar Roy (IBM)
2025-05-21  2:32     ` Dave Chinner
2025-05-26  5:11       ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 24/28] " Dave Chinner
2025-04-30  8:56   ` Nirjhar Roy (IBM)
2025-05-21  2:30     ` Dave Chinner
2025-05-26  4:56       ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code Dave Chinner
2025-04-30  8:47   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 26/28] stale-handle.c: use syncfs() rather than sync() Dave Chinner
2025-04-30  8:34   ` Nirjhar Roy (IBM)
2025-05-21  2:24     ` Dave Chinner
2025-04-17  3:01 ` [PATCH 27/28] scaleread: remove dead test code Dave Chinner
2025-04-30  8:10   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 28/28] xfs/259: no need to call sync Dave Chinner
2025-04-30  7:56   ` Nirjhar Roy (IBM)
