Re: [PATCH 07/28] check-parallel: adjust concurrency according to CPU count

FS/XFS testing framework
 help / color / mirror / Atom feed

From: "Nirjhar Roy (IBM)" <nirjhar.roy.lists@gmail.com>
To: Dave Chinner <david@fromorbit.com>, fstests@vger.kernel.org
Cc: zlang@kernel.org
Subject: Re: [PATCH 07/28] check-parallel: adjust concurrency according to CPU count
Date: Wed, 07 May 2025 12:15:09 +0530	[thread overview]
Message-ID: <e434a271402602fa7ec5efbcd9d92329ecdd4614.camel@gmail.com> (raw)
In-Reply-To: <20250417031208.1852171-8-david@fromorbit.com>

On Thu, 2025-04-17 at 13:00 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Concurrency is currently hard coded at 64 worker threads. This is
> too many for small CPU count machines; the idea is to create a
> sustained load of roughly one test per CPU as they are mostly single
> threaded/single process tests. The number "64" was chosen because
> I've been developing this functionality on a 64p VM.
> 
> Rather than hard coding the concurrency, probe the number of CPUs
> available and create that many running contexts as the default
> concurrency to use.
> 
> Further, add a CLI option to specify the number of threads to run so
> that we can over- or under-commit the CPU resources to enable direct
> benchmarking of performance with different levels of concurrency.
> 
> Let's use that capability to show how much check-parallel can
> benefit small systems. Using a single check execution thread for all
> tests inside a 4p control group to limit maximum CPU usage to the
> equivalent of a small 4p machine:
> 
> $ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 1 -g quick -s xfs -x dump -X generic/531
> Runner 0 Failures:  generic/504
> Tests run: 921
> Tests _notrun: 272
> Failure count: 2
> .....
> 
> real    61m31.362s
> user    0m0.029s
> sys     0m0.059s
> 
> the quick group on XFS takes *over an hour* to run.
> 
> If we use the same 4p control group setup and run with 8 test
> execution threads to ensure the 4 CPUs are fully utilised for most
> of the test run:
> 
> $ time sudo numactl -C 4-7 ./check-parallel -D /mnt/xfs -t 8 -g quick -s xfs -x dump -X generic/531
> Runner 7 Failures:  generic/504
> Tests run: 921
> Tests _notrun: 145
> Failure count: 1
> .....
> 
> real    17m33.124s
> user    0m0.009s
> sys     0m0.017s
> 
> The same test run takes only 17m33s. The same number of tests were
> run, the same failures occurred. [ Ignore the differences in
> notrun/failure count - the multi-file aggregation currently doesn't
> work correctly for the single log file case. ]
> 
> That's a reduction in test runtime of ~72% for a 4 CPU system. Or,
> if we want to measure it the other way, we get a ~3.5x improvement
> in runtime scalability. i.e. going from 1 -> 4 CPUs being used for
> test execution (4x increase) we get a 3.5x improvement in
> scalability when we go from check to check-parallel.
The functionality looks useful to me and the implementation is also
fine as far as I understand. I have some minor/nit comments below.
Other than that:

Reviewed-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com>

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  check-parallel | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/check-parallel b/check-parallel
> index cb5d6aedf..0649a417f 100755
> --- a/check-parallel
> +++ b/check-parallel
> @@ -10,7 +10,7 @@
>  # the loop devices.
>  
>  basedir=""
> -runners=64
> +runners=$(getconf _NPROCESSORS_CONF)
Minor: Not related to this change. Maybe we should have a _get_nproc()
just like 
_get_page_size()
{
	echo $(getconf PAGE_SIZE)
}
_get_nproc()
{
	echo $(getconf _NPROCESSORS_CONF)
}
and replace all the $(getconf _NPROCESSORS_CONF) with $(_get_nproc)
>  runner_list=()
>  runtimes=()
>  show_test_list=
> @@ -30,6 +30,7 @@ usage()
>  
>  check options
>      -D <dir>		Directory to run in
> +    -t <n>		Number of concurrent tests to  run
Minor: Maybe we should mention the valid range of <n> i.e, 0 to 1024?
>      -n			Output test list, do not run tests

Nit: Maybe there is some spacing issue here? "Number of concurrent ..."
and "Output test..." don't begin together.
--NR
>      -r			randomize test order
>      --exact-order	run tests in the exact order specified
> @@ -81,6 +82,7 @@ while [ $# -gt 0 ]; do
>  	-\? | -h | --help) usage ;;
>  
>  	-D)	basedir=$2; shift ;;
> +	-t)	runners=$2; shift ;;
>  	-g)	_tl_setup_group $2 ; shift ;;
>  	-e)	_tl_setup_exclude_tests $2 ; shift ;;
>  	-E)	_tl_setup_exclude_file $2 ; shift ;;
> @@ -111,6 +113,11 @@ if [ ! -d "$basedir" ]; then
>  	echo "Invalid basedir specification"
>  	usage
>  fi
> +if [[ $runners -le 0 || $runners -gt 1024 ]]; then
> +	echo "Invalid thread specificaton: $runners"
Minor: Maybe we should mention the valid range of "runners" in this
error message (0 to 1024)?
> +	usage
> +fi
> +
>  if [ -d "$basedir/runner-0/" ]; then
>  	prev_results=`ls -tr $basedir/runner-0/ | grep results | tail -1`
>  fi

next prev parent reply	other threads:[~2025-05-07  6:45 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-17  3:00 [PATCH 00/28] check-parallel: Running tests without check Dave Chinner
2025-04-17  3:00 ` [PATCH 01/28] fstests: remove support for non-numeric test names Dave Chinner
2025-04-30  9:17   ` Nirjhar Roy (IBM)
2025-05-21  2:39     ` Dave Chinner
2025-05-26  5:14       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 02/28] _scratch_mkfs_sized: obey USE_EXTERNAL for XFS filesystems Dave Chinner
2025-05-05  6:14   ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 03/28] fstests: move test exit functions to common/exit Dave Chinner
2025-04-17  3:00 ` [PATCH 04/28] check-parallel: report how many tests were _notrun Dave Chinner
2025-05-05  9:58   ` Nirjhar Roy (IBM)
2025-05-21  2:53     ` Dave Chinner
2025-05-26  6:09       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 05/28] check: factor out test list building code Dave Chinner
2025-05-06 11:32   ` Nirjhar Roy (IBM)
2025-05-21  3:55     ` Dave Chinner
2025-05-26  6:48       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 06/28] check-parallel: use common group list parsing code Dave Chinner
2025-05-06 15:56   ` Nirjhar Roy (IBM)
2025-05-21  4:13     ` Dave Chinner
2025-05-26  6:58       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 07/28] check-parallel: adjust concurrency according to CPU count Dave Chinner
2025-05-07  6:45   ` Nirjhar Roy (IBM) [this message]
2025-05-21  4:32     ` Dave Chinner
2025-05-26  8:50       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 08/28] check-parallel: add logwrite device support Dave Chinner
2025-05-07  8:18   ` Nirjhar Roy (IBM)
2025-05-21 10:07     ` Dave Chinner
2025-05-26  8:59       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 09/28] check-parallel: allow FSTYP selection from the CLI Dave Chinner
2025-05-07  8:49   ` Nirjhar Roy (IBM)
2025-05-21 10:17     ` Dave Chinner
2025-05-26  9:00       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 10/28] check-parallel: use PID namespaces for runner process isolation Dave Chinner
2025-05-07  9:02   ` Nirjhar Roy (IBM)
2025-05-21 10:19     ` Dave Chinner
2025-05-26  9:04       ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 11/28] check-parallel: initial support for specifying device sizes Dave Chinner
2025-05-07 10:05   ` Nirjhar Roy (IBM)
2025-05-21 11:11     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 12/28] config: move config section code to it's own file Dave Chinner
2025-05-09  6:09   ` Nirjhar Roy
2025-05-21 11:28     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 13/28] check-parallel: introduce config file support Dave Chinner
2025-05-09 12:01   ` Nirjhar Roy
2025-05-21 12:23     ` Dave Chinner
2025-04-17  3:00 ` [PATCH 14/28] fstests: further separate sourcing common/rc and common/config from initialisation Dave Chinner
2025-05-10 14:08   ` Nirjhar Roy (IBM)
2025-04-17  3:00 ` [PATCH 15/28] check-parallel: de-batch test execution Dave Chinner
2025-05-09 13:16   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 16/28] check-parallel: run sections directly Dave Chinner
2025-05-09 14:03   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 17/28] check-parallel: rebuild test list when FSTYP changes Dave Chinner
2025-05-09 16:00   ` Nirjhar Roy
2025-04-17  3:00 ` [PATCH 18/28] check-parallel: create a "results-latest" symlink Dave Chinner
2025-05-10 13:12   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 19/28] check: factor test running Dave Chinner
2025-05-12 13:57   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 20/28] [RFC] check-parallel: run tests directly without using check Dave Chinner
2025-05-13 14:48   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 21/28] generic/531: limit max files per CPU Dave Chinner
2025-05-10 13:15   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 22/28] fsync-tester.c: use syncfs() rather than sync() Dave Chinner
2025-04-30  9:08   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 23/28] open-by-handle.c: " Dave Chinner
2025-04-30  9:02   ` Nirjhar Roy (IBM)
2025-05-21  2:32     ` Dave Chinner
2025-05-26  5:11       ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 24/28] " Dave Chinner
2025-04-30  8:56   ` Nirjhar Roy (IBM)
2025-05-21  2:30     ` Dave Chinner
2025-05-26  4:56       ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 25/28] bulkstat_unlink_test_modified.c: remove unused test code Dave Chinner
2025-04-30  8:47   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 26/28] stale-handle.c: use syncfs() rather than sync() Dave Chinner
2025-04-30  8:34   ` Nirjhar Roy (IBM)
2025-05-21  2:24     ` Dave Chinner
2025-04-17  3:01 ` [PATCH 27/28] scaleread: remove dead test code Dave Chinner
2025-04-30  8:10   ` Nirjhar Roy (IBM)
2025-04-17  3:01 ` [PATCH 28/28] xfs/259: no need to call sync Dave Chinner
2025-04-30  7:56   ` Nirjhar Roy (IBM)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e434a271402602fa7ec5efbcd9d92329ecdd4614.camel@gmail.com \
    --to=nirjhar.roy.lists@gmail.com \
    --cc=david@fromorbit.com \
    --cc=fstests@vger.kernel.org \
    --cc=zlang@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox