* [PATCH 1/6] selftests/resctrl: Fix sparse warnings
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
2024-08-30 10:29 ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark Reinette Chatre
` (4 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
Fix the following sparse warnings:
tools/testing/selftests/resctrl/resctrl_val.c:47:6: warning: symbol 'membw_initialize_perf_event_attr' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:64:6: warning: symbol 'membw_ioctl_perf_event_ioc_reset_enable' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:70:6: warning: symbol 'membw_ioctl_perf_event_ioc_disable' was not declared. Should it be static?
tools/testing/selftests/resctrl/resctrl_val.c:81:6: warning: symbol 'get_event_and_umask' was not declared. Should it be static?
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/resctrl_val.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 8c275f6b4dd7..70e8e31f5d1a 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -44,7 +44,7 @@ static int imcs;
static struct imc_counter_config imc_counters_config[MAX_IMCS][2];
static const struct resctrl_test *current_test;
-void membw_initialize_perf_event_attr(int i, int j)
+static void membw_initialize_perf_event_attr(int i, int j)
{
memset(&imc_counters_config[i][j].pe, 0,
sizeof(struct perf_event_attr));
@@ -61,13 +61,13 @@ void membw_initialize_perf_event_attr(int i, int j)
PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING;
}
-void membw_ioctl_perf_event_ioc_reset_enable(int i, int j)
+static void membw_ioctl_perf_event_ioc_reset_enable(int i, int j)
{
ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_RESET, 0);
ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_ENABLE, 0);
}
-void membw_ioctl_perf_event_ioc_disable(int i, int j)
+static void membw_ioctl_perf_event_ioc_disable(int i, int j)
{
ioctl(imc_counters_config[i][j].fd, PERF_EVENT_IOC_DISABLE, 0);
}
@@ -78,7 +78,7 @@ void membw_ioctl_perf_event_ioc_disable(int i, int j)
* @count: iMC number
* @op: Operation (read/write)
*/
-void get_event_and_umask(char *cas_count_cfg, int count, bool op)
+static void get_event_and_umask(char *cas_count_cfg, int count, bool op)
{
char *token[MAX_TOKENS];
int i = 0;
--
2.46.0
* Re: [PATCH 1/6] selftests/resctrl: Fix sparse warnings
2024-08-29 22:52 ` [PATCH 1/6] selftests/resctrl: Fix sparse warnings Reinette Chatre
@ 2024-08-30 10:29 ` Ilpo Järvinen
0 siblings, 0 replies; 30+ messages in thread
From: Ilpo Järvinen @ 2024-08-30 10:29 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 29 Aug 2024, Reinette Chatre wrote:
> Fix the following sparse warnings:
> tools/testing/selftests/resctrl/resctrl_val.c:47:6: warning: symbol 'membw_initialize_perf_event_attr' was not declared. Should it be static?
> tools/testing/selftests/resctrl/resctrl_val.c:64:6: warning: symbol 'membw_ioctl_perf_event_ioc_reset_enable' was not declared. Should it be static?
> tools/testing/selftests/resctrl/resctrl_val.c:70:6: warning: symbol 'membw_ioctl_perf_event_ioc_disable' was not declared. Should it be static?
> tools/testing/selftests/resctrl/resctrl_val.c:81:6: warning: symbol 'get_event_and_umask' was not declared. Should it be static?
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
--
i.
* [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
2024-08-29 22:52 ` [PATCH 1/6] selftests/resctrl: Fix sparse warnings Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
2024-08-30 10:56 ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing Reinette Chatre
` (3 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
The CMT, MBA, and MBM tests rely on the resctrl_val() wrapper to
start and run a benchmark while providing test specific flows
via callbacks to do test specific configuration and measurements.
At a high level, the resctrl_val() flow is:
a) Start by fork()ing a child process that installs a signal
handler for SIGUSR1 that, on receipt of SIGUSR1, will
start running a benchmark.
b) Assign the child process created in (a) to the resctrl
control and monitoring group that dictates the memory and
cache allocations with which the process can run and will
contain all resctrl monitoring data of that process.
c) Once parent and child are considered "ready" (determined via
a message over a pipe) the parent signals the child (via
SIGUSR1) to start the benchmark, waits one second for the
benchmark to run, and then starts collecting monitoring data
for the tests, potentially also changing allocation
configuration depending on the various test callbacks.
A problem with the above flow is the "black box" view of the
benchmark that is combined with an arbitrarily chosen
"wait one second" before measurements start. No matter what
the benchmark does, it is given one second to initialize before
measurements start.
The default benchmark "fill_buf" consists of two parts,
first it prepares a buffer (allocate, initialize, then flush), then it
reads from the buffer (in unpredictable ways) until terminated.
Depending on the system and the size of the buffer, the first "prepare"
part may not be complete by the time the one second delay expires. Test
measurements may thus start before the work needing to be measured runs.
Split the default benchmark into its "prepare" and "runtime" parts and
simplify the resctrl_val() wrapper while doing so. This same split
cannot be done for the user provided benchmark (without a user
interface change), so the current behavior is maintained for user
provided benchmark.
Assign the test itself to the control and monitoring group and run the
"prepare" part of the benchmark in this context, ensuring it runs with
required cache and memory bandwidth allocations. With the benchmark
preparation complete, all that remains is to fork() the "runtime" part
of the benchmark (or the entire user provided benchmark).
Keep the "wait one second" delay before measurements start. For the
default "fill_buf" benchmark this time now covers only the "runtime"
portion that needs to be measured. For the user provided benchmark this
delay maintains current behavior.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/fill_buf.c | 19 +-
tools/testing/selftests/resctrl/resctrl.h | 2 +-
tools/testing/selftests/resctrl/resctrl_val.c | 225 ++++++------------
3 files changed, 70 insertions(+), 176 deletions(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index ae120f1735c0..12c71bb44cb6 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -114,7 +114,7 @@ void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
*value_sink = ret;
}
-static void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
+void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
{
while (1) {
fill_one_span_write(buf, buf_size);
@@ -150,20 +150,3 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush)
return buf;
}
-
-int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
-{
- unsigned char *buf;
-
- buf = alloc_buffer(buf_size, memflush);
- if (!buf)
- return -1;
-
- if (op == 0)
- fill_cache_read(buf, buf_size, once);
- else
- fill_cache_write(buf, buf_size, once);
- free(buf);
-
- return 0;
-}
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 2dda56084588..0afbc4dd18e4 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -142,7 +142,7 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
unsigned char *alloc_buffer(size_t buf_size, int memflush);
void mem_flush(unsigned char *buf, size_t buf_size);
void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
-int run_fill_buf(size_t buf_size, int memflush, int op, bool once);
+void fill_cache_write(unsigned char *buf, size_t buf_size, bool once);
int initialize_mem_bw_imc(void);
int measure_mem_bw(const struct user_params *uparams,
struct resctrl_val_param *param, pid_t bm_pid,
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 70e8e31f5d1a..574b72604f95 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -448,7 +448,7 @@ static int get_mem_bw_resctrl(FILE *fp, unsigned long *mbm_total)
return 0;
}
-static pid_t bm_pid, ppid;
+static pid_t bm_pid;
void ctrlc_handler(int signum, siginfo_t *info, void *ptr)
{
@@ -506,13 +506,6 @@ void signal_handler_unregister(void)
}
}
-static void parent_exit(pid_t ppid)
-{
- kill(ppid, SIGKILL);
- umount_resctrlfs();
- exit(EXIT_FAILURE);
-}
-
/*
* print_results_bw: the memory bandwidth results are stored in a file
* @filename: file that stores the results
@@ -614,61 +607,6 @@ int measure_mem_bw(const struct user_params *uparams,
return ret;
}
-/*
- * run_benchmark - Run a specified benchmark or fill_buf (default benchmark)
- * in specified signal. Direct benchmark stdio to /dev/null.
- * @signum: signal number
- * @info: signal info
- * @ucontext: user context in signal handling
- */
-static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
-{
- int operation, ret, memflush;
- char **benchmark_cmd;
- size_t span;
- bool once;
- FILE *fp;
-
- benchmark_cmd = info->si_ptr;
-
- /*
- * Direct stdio of child to /dev/null, so that only parent writes to
- * stdio (console)
- */
- fp = freopen("/dev/null", "w", stdout);
- if (!fp) {
- ksft_perror("Unable to direct benchmark status to /dev/null");
- parent_exit(ppid);
- }
-
- if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
- /* Execute default fill_buf benchmark */
- span = strtoul(benchmark_cmd[1], NULL, 10);
- memflush = atoi(benchmark_cmd[2]);
- operation = atoi(benchmark_cmd[3]);
- if (!strcmp(benchmark_cmd[4], "true")) {
- once = true;
- } else if (!strcmp(benchmark_cmd[4], "false")) {
- once = false;
- } else {
- ksft_print_msg("Invalid once parameter\n");
- parent_exit(ppid);
- }
-
- if (run_fill_buf(span, memflush, operation, once))
- fprintf(stderr, "Error in running fill buffer\n");
- } else {
- /* Execute specified benchmark */
- ret = execvp(benchmark_cmd[0], benchmark_cmd);
- if (ret)
- ksft_perror("execvp");
- }
-
- fclose(stdout);
- ksft_print_msg("Unable to run specified benchmark\n");
- parent_exit(ppid);
-}
-
/*
* resctrl_val: execute benchmark and measure memory bandwidth on
* the benchmark
@@ -684,11 +622,13 @@ int resctrl_val(const struct resctrl_test *test,
const char * const *benchmark_cmd,
struct resctrl_val_param *param)
{
- struct sigaction sigact;
- int ret = 0, pipefd[2];
- char pipe_message = 0;
- union sigval value;
- int domain_id;
+ int domain_id, operation = 0, memflush = 1;
+ size_t span = DEFAULT_SPAN;
+ unsigned char *buf = NULL;
+ cpu_set_t old_affinity;
+ bool once = false;
+ int ret = 0;
+ pid_t ppid;
if (strcmp(param->filename, "") == 0)
sprintf(param->filename, "stdio");
@@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
return ret;
}
- /*
- * If benchmark wasn't successfully started by child, then child should
- * kill parent, so save parent's pid
- */
ppid = getpid();
- if (pipe(pipefd)) {
- ksft_perror("Unable to create pipe");
+ /* Taskset test to specified CPU. */
+ ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
+ if (ret)
+ return ret;
- return -1;
+ /* Write test to specified control & monitoring group in resctrl FS. */
+ ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp);
+ if (ret)
+ goto reset_affinity;
+
+ if (param->init) {
+ ret = param->init(param, domain_id);
+ if (ret)
+ goto reset_affinity;
}
/*
- * Fork to start benchmark, save child's pid so that it can be killed
- * when needed
+ * If not running user provided benchmark, run the default
+ * "fill_buf". First phase of "fill_buf" is to prepare the
+ * buffer that the benchmark will operate on. No measurements
+ * are needed during this phase and prepared memory will be
+ * passed to next part of benchmark via copy-on-write. TBD
+ * how this impacts "write" benchmark, but no test currently
+ * uses this.
*/
- fflush(stdout);
+ if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
+ span = strtoul(benchmark_cmd[1], NULL, 10);
+ memflush = atoi(benchmark_cmd[2]);
+ operation = atoi(benchmark_cmd[3]);
+ if (!strcmp(benchmark_cmd[4], "true")) {
+ once = true;
+ } else if (!strcmp(benchmark_cmd[4], "false")) {
+ once = false;
+ } else {
+ ksft_print_msg("Invalid once parameter\n");
+ ret = -EINVAL;
+ goto reset_affinity;
+ }
+
+ buf = alloc_buffer(span, memflush);
+ if (!buf) {
+ ret = -ENOMEM;
+ goto reset_affinity;
+ }
+ }
+
bm_pid = fork();
if (bm_pid == -1) {
+ ret = -errno;
ksft_perror("Unable to fork");
-
- return -1;
+ goto free_buf;
}
+ /*
+ * What needs to be measured runs in separate process until
+ * terminated.
+ */
if (bm_pid == 0) {
- /*
- * Mask all signals except SIGUSR1, parent uses SIGUSR1 to
- * start benchmark
- */
- sigfillset(&sigact.sa_mask);
- sigdelset(&sigact.sa_mask, SIGUSR1);
-
- sigact.sa_sigaction = run_benchmark;
- sigact.sa_flags = SA_SIGINFO;
-
- /* Register for "SIGUSR1" signal from parent */
- if (sigaction(SIGUSR1, &sigact, NULL)) {
- ksft_perror("Can't register child for signal");
- parent_exit(ppid);
+ if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
+ if (operation == 0)
+ fill_cache_read(buf, span, once);
+ else
+ fill_cache_write(buf, span, once);
+ } else {
+ execvp(benchmark_cmd[0], (char **)benchmark_cmd);
}
-
- /* Tell parent that child is ready */
- close(pipefd[0]);
- pipe_message = 1;
- if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
- sizeof(pipe_message)) {
- ksft_perror("Failed signaling parent process");
- close(pipefd[1]);
- return -1;
- }
- close(pipefd[1]);
-
- /* Suspend child until delivery of "SIGUSR1" from parent */
- sigsuspend(&sigact.sa_mask);
-
- ksft_perror("Child is done");
- parent_exit(ppid);
+ exit(EXIT_SUCCESS);
}
ksft_print_msg("Benchmark PID: %d\n", (int)bm_pid);
- /*
- * The cast removes constness but nothing mutates benchmark_cmd within
- * the context of this process. At the receiving process, it becomes
- * argv, which is mutable, on exec() but that's after fork() so it
- * doesn't matter for the process running the tests.
- */
- value.sival_ptr = (void *)benchmark_cmd;
-
- /* Taskset benchmark to specified cpu */
- ret = taskset_benchmark(bm_pid, uparams->cpu, NULL);
- if (ret)
- goto out;
-
- /* Write benchmark to specified control&monitoring grp in resctrl FS */
- ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp);
- if (ret)
- goto out;
-
- if (param->init) {
- ret = param->init(param, domain_id);
- if (ret)
- goto out;
- }
-
- /* Parent waits for child to be ready. */
- close(pipefd[1]);
- while (pipe_message != 1) {
- if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) <
- sizeof(pipe_message)) {
- ksft_perror("Failed reading message from child process");
- close(pipefd[0]);
- goto out;
- }
- }
- close(pipefd[0]);
-
- /* Signal child to start benchmark */
- if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
- ksft_perror("sigqueue SIGUSR1 to child");
- ret = -1;
- goto out;
- }
-
- /* Give benchmark enough time to fully run */
+ /* Give benchmark enough time to fully run. */
sleep(1);
/* Test runs until the callback setup() tells the test to stop. */
@@ -821,8 +730,10 @@ int resctrl_val(const struct resctrl_test *test,
break;
}
-out:
kill(bm_pid, SIGKILL);
-
+free_buf:
+ free(buf);
+reset_affinity:
+ taskset_restore(ppid, &old_affinity);
return ret;
}
--
2.46.0
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-08-29 22:52 ` [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark Reinette Chatre
@ 2024-08-30 10:56 ` Ilpo Järvinen
2024-08-30 16:00 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-08-30 10:56 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 29 Aug 2024, Reinette Chatre wrote:
> The CMT, MBA, and MBM tests rely on the resctrl_val() wrapper to
> start and run a benchmark while providing test specific flows
> via callbacks to do test specific configuration and measurements.
>
> At a high level, the resctrl_val() flow is:
> a) Start by fork()ing a child process that installs a signal
> handler for SIGUSR1 that, on receipt of SIGUSR1, will
> start running a benchmark.
> b) Assign the child process created in (a) to the resctrl
> control and monitoring group that dictates the memory and
> cache allocations with which the process can run and will
> contain all resctrl monitoring data of that process.
> c) Once parent and child are considered "ready" (determined via
> a message over a pipe) the parent signals the child (via
> SIGUSR1) to start the benchmark, waits one second for the
> benchmark to run, and then starts collecting monitoring data
> for the tests, potentially also changing allocation
> configuration depending on the various test callbacks.
>
> A problem with the above flow is the "black box" view of the
> benchmark that is combined with an arbitrarily chosen
> "wait one second" before measurements start. No matter what
> the benchmark does, it is given one second to initialize before
> measurements start.
>
> The default benchmark "fill_buf" consists of two parts,
> first it prepares a buffer (allocate, initialize, then flush), then it
> reads from the buffer (in unpredictable ways) until terminated.
> Depending on the system and the size of the buffer, the first "prepare"
> part may not be complete by the time the one second delay expires. Test
> measurements may thus start before the work needing to be measured runs.
>
> Split the default benchmark into its "prepare" and "runtime" parts and
> simplify the resctrl_val() wrapper while doing so. This same split
> cannot be done for the user provided benchmark (without a user
> interface change), so the current behavior is maintained for user
> provided benchmark.
>
> Assign the test itself to the control and monitoring group and run the
> "prepare" part of the benchmark in this context, ensuring it runs with
> required cache and memory bandwidth allocations. With the benchmark
> preparation complete, all that remains is to fork() the "runtime" part
> of the benchmark (or the entire user provided benchmark).
>
> Keep the "wait one second" delay before measurements start. For the
> default "fill_buf" benchmark this time now covers only the "runtime"
> portion that needs to be measured. For the user provided benchmark this
> delay maintains current behavior.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> tools/testing/selftests/resctrl/fill_buf.c | 19 +-
> tools/testing/selftests/resctrl/resctrl.h | 2 +-
> tools/testing/selftests/resctrl/resctrl_val.c | 225 ++++++------------
> 3 files changed, 70 insertions(+), 176 deletions(-)
>
> diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
> index ae120f1735c0..12c71bb44cb6 100644
> --- a/tools/testing/selftests/resctrl/fill_buf.c
> +++ b/tools/testing/selftests/resctrl/fill_buf.c
> @@ -114,7 +114,7 @@ void fill_cache_read(unsigned char *buf, size_t buf_size, bool once)
> *value_sink = ret;
> }
>
> -static void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
> +void fill_cache_write(unsigned char *buf, size_t buf_size, bool once)
> {
> while (1) {
> fill_one_span_write(buf, buf_size);
> @@ -150,20 +150,3 @@ unsigned char *alloc_buffer(size_t buf_size, int memflush)
>
> return buf;
> }
> -
> -int run_fill_buf(size_t buf_size, int memflush, int op, bool once)
> -{
> - unsigned char *buf;
> -
> - buf = alloc_buffer(buf_size, memflush);
> - if (!buf)
> - return -1;
> -
> - if (op == 0)
> - fill_cache_read(buf, buf_size, once);
> - else
> - fill_cache_write(buf, buf_size, once);
> - free(buf);
> -
> - return 0;
> -}
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 2dda56084588..0afbc4dd18e4 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -142,7 +142,7 @@ int perf_event_open(struct perf_event_attr *hw_event, pid_t pid, int cpu,
> unsigned char *alloc_buffer(size_t buf_size, int memflush);
> void mem_flush(unsigned char *buf, size_t buf_size);
> void fill_cache_read(unsigned char *buf, size_t buf_size, bool once);
> -int run_fill_buf(size_t buf_size, int memflush, int op, bool once);
> +void fill_cache_write(unsigned char *buf, size_t buf_size, bool once);
> int initialize_mem_bw_imc(void);
> int measure_mem_bw(const struct user_params *uparams,
> struct resctrl_val_param *param, pid_t bm_pid,
> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
> index 70e8e31f5d1a..574b72604f95 100644
> --- a/tools/testing/selftests/resctrl/resctrl_val.c
> +++ b/tools/testing/selftests/resctrl/resctrl_val.c
> @@ -448,7 +448,7 @@ static int get_mem_bw_resctrl(FILE *fp, unsigned long *mbm_total)
> return 0;
> }
>
> -static pid_t bm_pid, ppid;
> +static pid_t bm_pid;
>
> void ctrlc_handler(int signum, siginfo_t *info, void *ptr)
> {
> @@ -506,13 +506,6 @@ void signal_handler_unregister(void)
> }
> }
>
> -static void parent_exit(pid_t ppid)
> -{
> - kill(ppid, SIGKILL);
> - umount_resctrlfs();
> - exit(EXIT_FAILURE);
> -}
> -
> /*
> * print_results_bw: the memory bandwidth results are stored in a file
> * @filename: file that stores the results
> @@ -614,61 +607,6 @@ int measure_mem_bw(const struct user_params *uparams,
> return ret;
> }
>
> -/*
> - * run_benchmark - Run a specified benchmark or fill_buf (default benchmark)
> - * in specified signal. Direct benchmark stdio to /dev/null.
> - * @signum: signal number
> - * @info: signal info
> - * @ucontext: user context in signal handling
> - */
> -static void run_benchmark(int signum, siginfo_t *info, void *ucontext)
> -{
> - int operation, ret, memflush;
> - char **benchmark_cmd;
> - size_t span;
> - bool once;
> - FILE *fp;
> -
> - benchmark_cmd = info->si_ptr;
> -
> - /*
> - * Direct stdio of child to /dev/null, so that only parent writes to
> - * stdio (console)
> - */
> - fp = freopen("/dev/null", "w", stdout);
> - if (!fp) {
> - ksft_perror("Unable to direct benchmark status to /dev/null");
> - parent_exit(ppid);
> - }
> -
> - if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
> - /* Execute default fill_buf benchmark */
> - span = strtoul(benchmark_cmd[1], NULL, 10);
> - memflush = atoi(benchmark_cmd[2]);
> - operation = atoi(benchmark_cmd[3]);
> - if (!strcmp(benchmark_cmd[4], "true")) {
> - once = true;
> - } else if (!strcmp(benchmark_cmd[4], "false")) {
> - once = false;
> - } else {
> - ksft_print_msg("Invalid once parameter\n");
> - parent_exit(ppid);
> - }
> -
> - if (run_fill_buf(span, memflush, operation, once))
> - fprintf(stderr, "Error in running fill buffer\n");
> - } else {
> - /* Execute specified benchmark */
> - ret = execvp(benchmark_cmd[0], benchmark_cmd);
> - if (ret)
> - ksft_perror("execvp");
> - }
> -
> - fclose(stdout);
> - ksft_print_msg("Unable to run specified benchmark\n");
> - parent_exit(ppid);
> -}
> -
> /*
> * resctrl_val: execute benchmark and measure memory bandwidth on
> * the benchmark
> @@ -684,11 +622,13 @@ int resctrl_val(const struct resctrl_test *test,
> const char * const *benchmark_cmd,
> struct resctrl_val_param *param)
> {
> - struct sigaction sigact;
> - int ret = 0, pipefd[2];
> - char pipe_message = 0;
> - union sigval value;
> - int domain_id;
> + int domain_id, operation = 0, memflush = 1;
> + size_t span = DEFAULT_SPAN;
> + unsigned char *buf = NULL;
> + cpu_set_t old_affinity;
> + bool once = false;
> + int ret = 0;
> + pid_t ppid;
>
> if (strcmp(param->filename, "") == 0)
> sprintf(param->filename, "stdio");
> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
> return ret;
> }
>
> - /*
> - * If benchmark wasn't successfully started by child, then child should
> - * kill parent, so save parent's pid
> - */
> ppid = getpid();
>
> - if (pipe(pipefd)) {
> - ksft_perror("Unable to create pipe");
> + /* Taskset test to specified CPU. */
> + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
Previously only CPU affinity for bm_pid was set but now it's set before
fork(). Quickly checking the Internet, it seems that CPU affinity gets
inherited on fork() so now both processes will have the same affinity
which might make the other process interfere with the measurement.
> + if (ret)
> + return ret;
>
> - return -1;
> + /* Write test to specified control & monitoring group in resctrl FS. */
> + ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp);
Previously, this was done for bm_pid but now it's done for the parent. I'm
not sure how inheritance goes with resctrl on fork(): will the forked PID
get added to the list of PIDs or not? You probably know the answer :-).
Neither behavior, however, seems to result in the intended behavior as we
either get interfering processes (if inherited) or no desired resctrl
setup for the benchmark process.
> + if (ret)
> + goto reset_affinity;
> +
> + if (param->init) {
> + ret = param->init(param, domain_id);
> + if (ret)
> + goto reset_affinity;
> }
>
> /*
> - * Fork to start benchmark, save child's pid so that it can be killed
> - * when needed
> + * If not running user provided benchmark, run the default
> + * "fill_buf". First phase of "fill_buf" is to prepare the
> + * buffer that the benchmark will operate on. No measurements
> + * are needed during this phase and prepared memory will be
> + * passed to next part of benchmark via copy-on-write. TBD
> + * how this impacts "write" benchmark, but no test currently
> + * uses this.
> */
> - fflush(stdout);
Please don't remove fflush() in front of fork() as it leads to duplicating
messages.
--
i.
> + if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
> + span = strtoul(benchmark_cmd[1], NULL, 10);
> + memflush = atoi(benchmark_cmd[2]);
> + operation = atoi(benchmark_cmd[3]);
> + if (!strcmp(benchmark_cmd[4], "true")) {
> + once = true;
> + } else if (!strcmp(benchmark_cmd[4], "false")) {
> + once = false;
> + } else {
> + ksft_print_msg("Invalid once parameter\n");
> + ret = -EINVAL;
> + goto reset_affinity;
> + }
> +
> + buf = alloc_buffer(span, memflush);
> + if (!buf) {
> + ret = -ENOMEM;
> + goto reset_affinity;
> + }
> + }
> +
> bm_pid = fork();
> if (bm_pid == -1) {
> + ret = -errno;
> ksft_perror("Unable to fork");
> -
> - return -1;
> + goto free_buf;
> }
>
> + /*
> + * What needs to be measured runs in separate process until
> + * terminated.
> + */
> if (bm_pid == 0) {
> - /*
> - * Mask all signals except SIGUSR1, parent uses SIGUSR1 to
> - * start benchmark
> - */
> - sigfillset(&sigact.sa_mask);
> - sigdelset(&sigact.sa_mask, SIGUSR1);
> -
> - sigact.sa_sigaction = run_benchmark;
> - sigact.sa_flags = SA_SIGINFO;
> -
> - /* Register for "SIGUSR1" signal from parent */
> - if (sigaction(SIGUSR1, &sigact, NULL)) {
> - ksft_perror("Can't register child for signal");
> - parent_exit(ppid);
> + if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
> + if (operation == 0)
> + fill_cache_read(buf, span, once);
> + else
> + fill_cache_write(buf, span, once);
> + } else {
> + execvp(benchmark_cmd[0], (char **)benchmark_cmd);
> }
> -
> - /* Tell parent that child is ready */
> - close(pipefd[0]);
> - pipe_message = 1;
> - if (write(pipefd[1], &pipe_message, sizeof(pipe_message)) <
> - sizeof(pipe_message)) {
> - ksft_perror("Failed signaling parent process");
> - close(pipefd[1]);
> - return -1;
> - }
> - close(pipefd[1]);
> -
> - /* Suspend child until delivery of "SIGUSR1" from parent */
> - sigsuspend(&sigact.sa_mask);
> -
> - ksft_perror("Child is done");
> - parent_exit(ppid);
> + exit(EXIT_SUCCESS);
> }
>
> ksft_print_msg("Benchmark PID: %d\n", (int)bm_pid);
>
> - /*
> - * The cast removes constness but nothing mutates benchmark_cmd within
> - * the context of this process. At the receiving process, it becomes
> - * argv, which is mutable, on exec() but that's after fork() so it
> - * doesn't matter for the process running the tests.
> - */
> - value.sival_ptr = (void *)benchmark_cmd;
> -
> - /* Taskset benchmark to specified cpu */
> - ret = taskset_benchmark(bm_pid, uparams->cpu, NULL);
> - if (ret)
> - goto out;
> -
> - /* Write benchmark to specified control&monitoring grp in resctrl FS */
> - ret = write_bm_pid_to_resctrl(bm_pid, param->ctrlgrp, param->mongrp);
> - if (ret)
> - goto out;
> -
> - if (param->init) {
> - ret = param->init(param, domain_id);
> - if (ret)
> - goto out;
> - }
> -
> - /* Parent waits for child to be ready. */
> - close(pipefd[1]);
> - while (pipe_message != 1) {
> - if (read(pipefd[0], &pipe_message, sizeof(pipe_message)) <
> - sizeof(pipe_message)) {
> - ksft_perror("Failed reading message from child process");
> - close(pipefd[0]);
> - goto out;
> - }
> - }
> - close(pipefd[0]);
> -
> - /* Signal child to start benchmark */
> - if (sigqueue(bm_pid, SIGUSR1, value) == -1) {
> - ksft_perror("sigqueue SIGUSR1 to child");
> - ret = -1;
> - goto out;
> - }
> -
> - /* Give benchmark enough time to fully run */
> + /* Give benchmark enough time to fully run. */
> sleep(1);
>
> /* Test runs until the callback setup() tells the test to stop. */
> @@ -821,8 +730,10 @@ int resctrl_val(const struct resctrl_test *test,
> break;
> }
>
> -out:
> kill(bm_pid, SIGKILL);
> -
> +free_buf:
> + free(buf);
> +reset_affinity:
> + taskset_restore(ppid, &old_affinity);
> return ret;
> }
>
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-08-30 10:56 ` Ilpo Järvinen
@ 2024-08-30 16:00 ` Reinette Chatre
2024-09-04 11:57 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-30 16:00 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
Thank you for taking a look.
On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
...
>> @@ -684,11 +622,13 @@ int resctrl_val(const struct resctrl_test *test,
>> const char * const *benchmark_cmd,
>> struct resctrl_val_param *param)
>> {
>> - struct sigaction sigact;
>> - int ret = 0, pipefd[2];
>> - char pipe_message = 0;
>> - union sigval value;
>> - int domain_id;
>> + int domain_id, operation = 0, memflush = 1;
>> + size_t span = DEFAULT_SPAN;
>> + unsigned char *buf = NULL;
>> + cpu_set_t old_affinity;
>> + bool once = false;
>> + int ret = 0;
>> + pid_t ppid;
>>
>> if (strcmp(param->filename, "") == 0)
>> sprintf(param->filename, "stdio");
>> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
>> return ret;
>> }
>>
>> - /*
>> - * If benchmark wasn't successfully started by child, then child should
>> - * kill parent, so save parent's pid
>> - */
>> ppid = getpid();
>>
>> - if (pipe(pipefd)) {
>> - ksft_perror("Unable to create pipe");
>> + /* Taskset test to specified CPU. */
>> + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
>
> Previously only CPU affinity for bm_pid was set but now it's set before
> fork(). Quickly checking the Internet, it seems that CPU affinity gets
> inherited on fork() so now both processes will have the same affinity
> which might make the other process to interfere with the measurement.
Setting the affinity is intended to ensure that the buffer preparation
occurs in the same topology as where the runtime portion will run.
This preparation is done before the work to be measured starts.
This does tie in with the association with the resctrl group and I
will elaborate more below ...
>
>> + if (ret)
>> + return ret;
>>
>> - return -1;
>> + /* Write test to specified control & monitoring group in resctrl FS. */
>> + ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp);
>
> Previously, this was done for bm_pid but now it's done for the parent. I'm
> not sure how inheritance goes with resctrl on fork(), will the forked PID
> get added to the list of PIDs or not? You probably know the answer :-).
Yes. A process fork()ed will land in the same resctrl group as its parent.
> Neither behavior, however, seems to result in the intended behavior as we
> either get interfering processes (if inherited) or no desired resctrl
> setup for the benchmark process.
There are two processes to consider in the resource group, the parent (that
sets up the buffer and does the measurements) and the child (that runs the
workload to be measured). Thanks to your commit da50de0a92f3 ("selftests/resctrl:
Calculate resctrl FS derived mem bw over sleep(1) only") the parent
will be sleeping while the child runs its workload and there is no
other interference I am aware of. The only additional measurements
that I can see would be the work needed to actually start and stop the
measurements and from what I can tell this falls into the noise.
Please do keep in mind that the performance counters used, iMC, cannot actually
be bound to a single CPU since it is a per-socket PMU. The measurements have
thus never been as fine grained as the code pretends it to be.
>> + if (ret)
>> + goto reset_affinity;
>> +
>> + if (param->init) {
>> + ret = param->init(param, domain_id);
>> + if (ret)
>> + goto reset_affinity;
>> }
>>
>> /*
>> - * Fork to start benchmark, save child's pid so that it can be killed
>> - * when needed
>> + * If not running user provided benchmark, run the default
>> + * "fill_buf". First phase of "fill_buf" is to prepare the
>> + * buffer that the benchmark will operate on. No measurements
>> + * are needed during this phase and prepared memory will be
>> + * passed to next part of benchmark via copy-on-write. TBD
>> + * how this impacts "write" benchmark, but no test currently
>> + * uses this.
>> */
>> - fflush(stdout);
>
> Please don't remove fflush() in front of fork() as it leads to duplicating
> messages.
>
Indeed. Shaopeng just fixed this for us. Thank you!
Reinette
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-08-30 16:00 ` Reinette Chatre
@ 2024-09-04 11:57 ` Ilpo Järvinen
2024-09-04 21:15 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-04 11:57 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Fri, 30 Aug 2024, Reinette Chatre wrote:
>
> Thank you for taking a look.
>
> On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> >
>
> ...
>
> > > @@ -684,11 +622,13 @@ int resctrl_val(const struct resctrl_test *test,
> > > const char * const *benchmark_cmd,
> > > struct resctrl_val_param *param)
> > > {
> > > - struct sigaction sigact;
> > > - int ret = 0, pipefd[2];
> > > - char pipe_message = 0;
> > > - union sigval value;
> > > - int domain_id;
> > > + int domain_id, operation = 0, memflush = 1;
> > > + size_t span = DEFAULT_SPAN;
> > > + unsigned char *buf = NULL;
> > > + cpu_set_t old_affinity;
> > > + bool once = false;
> > > + int ret = 0;
> > > + pid_t ppid;
> > > if (strcmp(param->filename, "") == 0)
> > > sprintf(param->filename, "stdio");
> > > @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
> > > return ret;
> > > }
> > > - /*
> > > - * If benchmark wasn't successfully started by child, then child
> > > should
> > > - * kill parent, so save parent's pid
> > > - */
> > > ppid = getpid();
> > > - if (pipe(pipefd)) {
> > > - ksft_perror("Unable to create pipe");
> > > + /* Taskset test to specified CPU. */
> > > + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
> >
> > Previously only CPU affinity for bm_pid was set but now it's set before
> > fork(). Quickly checking the Internet, it seems that CPU affinity gets
> > inherited on fork() so now both processes will have the same affinity
> > which might make the other process to interfere with the measurement.
>
> Setting the affinity is intended to ensure that the buffer preparation
> occurs in the same topology as where the runtime portion will run.
> This preparation is done before the work to be measured starts.
>
> This does tie in with the association with the resctrl group and I
> will elaborate more below ...
Okay, that's useful to retain, but thinking this further, now we're also
going to do a non-trivial amount of work in between the setup and the test
by forking. I guess that doesn't matter for the memflush = true case but
might be meaningful for the memflush = false case that seems to be there
to allow keeping caches hot (I personally haven't thought about how to use
a "caches hot" test but we do have that capability by the fact that the
memflush parameter exists).
> > > + if (ret)
> > > + return ret;
> > > - return -1;
> > > + /* Write test to specified control & monitoring group in resctrl FS.
> > > */
> > > + ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp);
> >
> > Previously, this was done for bm_pid but now it's done for the parent. I'm
> > not sure how inheritance goes with resctrl on fork(), will the forked PID
> > get added to the list of PIDs or not? You probably know the answer :-).
>
> Yes. A process fork()ed will land in the same resctrl group as its parent.
>
> > Neither behavior, however, seems to result in the intended behavior as we
> > either get interfering processes (if inherited) or no desired resctrl
> > setup for the benchmark process.
>
> There are two processes to consider in the resource group, the parent (that
> sets up the buffer and does the measurements) and the child (that runs the
> workload to be measured). Thanks to your commit da50de0a92f3
> ("selftests/resctrl:
> Calculate resctrl FS derived mem bw over sleep(1) only") the parent
> will be sleeping while the child runs its workload and there is no
> other interference I am aware of. The only additional measurements
> that I can see would be the work needed to actually start and stop the
> measurements and from what I can tell this falls into the noise.
>
> Please do keep in mind that the performance counters used, iMC, cannot
> actually
> be bound to a single CPU since it is a per-socket PMU. The measurements have
> thus never been as fine grained as the code pretends it to be.
I was wondering if I should note that the amount of work is small. Maybe
it's fine to leave that noise there and I'm just being overly cautious
:-). When I did networking research in a past life, I wanted to eliminate
as many noise sources as possible, so I guess it comes from that
background.
--
i.
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-04 11:57 ` Ilpo Järvinen
@ 2024-09-04 21:15 ` Reinette Chatre
2024-09-05 12:10 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-04 21:15 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>
>> Thank you for taking a look.
>>
>> On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>
>>
>> ...
>>
>>>> @@ -684,11 +622,13 @@ int resctrl_val(const struct resctrl_test *test,
>>>> const char * const *benchmark_cmd,
>>>> struct resctrl_val_param *param)
>>>> {
>>>> - struct sigaction sigact;
>>>> - int ret = 0, pipefd[2];
>>>> - char pipe_message = 0;
>>>> - union sigval value;
>>>> - int domain_id;
>>>> + int domain_id, operation = 0, memflush = 1;
>>>> + size_t span = DEFAULT_SPAN;
>>>> + unsigned char *buf = NULL;
>>>> + cpu_set_t old_affinity;
>>>> + bool once = false;
>>>> + int ret = 0;
>>>> + pid_t ppid;
>>>> if (strcmp(param->filename, "") == 0)
>>>> sprintf(param->filename, "stdio");
>>>> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
>>>> return ret;
>>>> }
>>>> - /*
>>>> - * If benchmark wasn't successfully started by child, then child
>>>> should
>>>> - * kill parent, so save parent's pid
>>>> - */
>>>> ppid = getpid();
>>>> - if (pipe(pipefd)) {
>>>> - ksft_perror("Unable to create pipe");
>>>> + /* Taskset test to specified CPU. */
>>>> + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
>>>
>>> Previously only CPU affinity for bm_pid was set but now it's set before
>>> fork(). Quickly checking the Internet, it seems that CPU affinity gets
>>> inherited on fork() so now both processes will have the same affinity
>>> which might make the other process to interfere with the measurement.
>>
>> Setting the affinity is intended to ensure that the buffer preparation
>> occurs in the same topology as where the runtime portion will run.
>> This preparation is done before the work to be measured starts.
>>
>> This does tie in with the association with the resctrl group and I
>> will elaborate more below ...
>
> Okay, that's useful to retain but thinking this further, now we're also
> going do non-trivial amount of work in between the setup and the test by
Could you please elaborate how the amount of work during setup can be an
issue? I have been focused on the measurements that are done afterwards
that do have clear boundaries from what I can tell.
> forking. I guess that doesn't matter for memflush = true case but might be
> meaningful for the memflush = false case that seems to be there to allow
> keeping caches hot (I personally haven't thought how to use "caches hot"
> test but we do have that capability by the fact that memflush paremeter
> exists).
I believe that memflush = true will always be needed/used by the tests
relying on memory bandwidth measurement since that reduces cache hits during
measurement phase and avoids the additional guessing on how long the workload
should be run before reliable/consistent measurements can start.
Thinking about the memflush = false case I now think that we should use that
for the CMT test. The buffer is allocated and initialized while the task
is configured with appropriate allocation limits so there should not be a
reason to flush the buffer from the cache. In fact, flushing the cache introduces
the requirement to guess the workload's "settle" time (time to allocate the buffer
into the cache again) before its occupancy can be measured. As a quick test I
set memflush = false on one system and it brought down the average diff between
the cache portion size and the occupancy counts. I'll try it out on a few more
systems to confirm.
>>>> + if (ret)
>>>> + return ret;
>>>> - return -1;
>>>> + /* Write test to specified control & monitoring group in resctrl FS.
>>>> */
>>>> + ret = write_bm_pid_to_resctrl(ppid, param->ctrlgrp, param->mongrp);
>>>
>>> Previously, this was done for bm_pid but now it's done for the parent. I'm
>>> not sure how inheritance goes with resctrl on fork(), will the forked PID
>>> get added to the list of PIDs or not? You probably know the answer :-).
>>
>> Yes. A process fork()ed will land in the same resctrl group as its parent.
>>
>>> Neither behavior, however, seems to result in the intended behavior as we
>>> either get interfering processes (if inherited) or no desired resctrl
>>> setup for the benchmark process.
>>
>> There are two processes to consider in the resource group, the parent (that
>> sets up the buffer and does the measurements) and the child (that runs the
>> workload to be measured). Thanks to your commit da50de0a92f3
>> ("selftests/resctrl:
>> Calculate resctrl FS derived mem bw over sleep(1) only") the parent
>> will be sleeping while the child runs its workload and there is no
>> other interference I am aware of. The only additional measurements
>> that I can see would be the work needed to actually start and stop the
>> measurements and from what I can tell this falls into the noise.
>>
>> Please do keep in mind that the performance counters used, iMC, cannot
>> actually
>> be bound to a single CPU since it is a per-socket PMU. The measurements have
>> thus never been as fine grained as the code pretends it to be.
>
> I was thinking if I should note the amount of work is small. Maybe it's
> fine to leave that noise there and I'm just overly cautious :-), when I
> used to do networking research in the past life, I wanted to eliminate as
> much noise sources so I guess it comes from that background.
The goal of these tests is to verify *resctrl*; these are not intended to be
hardware validation tests. I think it would be better for resctrl if more time
were spent on functional tests of resctrl than on these performance tests.
Reinette
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-04 21:15 ` Reinette Chatre
@ 2024-09-05 12:10 ` Ilpo Järvinen
2024-09-05 18:08 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-05 12:10 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Wed, 4 Sep 2024, Reinette Chatre wrote:
> On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > > @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test
> > > > > *test,
> > > > > return ret;
> > > > > }
> > > > > - /*
> > > > > - * If benchmark wasn't successfully started by child, then
> > > > > child
> > > > > should
> > > > > - * kill parent, so save parent's pid
> > > > > - */
> > > > > ppid = getpid();
> > > > > - if (pipe(pipefd)) {
> > > > > - ksft_perror("Unable to create pipe");
> > > > > + /* Taskset test to specified CPU. */
> > > > > + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
> > > >
> > > > Previously only CPU affinity for bm_pid was set but now it's set before
> > > > fork(). Quickly checking the Internet, it seems that CPU affinity gets
> > > > inherited on fork() so now both processes will have the same affinity
> > > > which might make the other process to interfere with the measurement.
> > >
> > > Setting the affinity is intended to ensure that the buffer preparation
> > > occurs in the same topology as where the runtime portion will run.
> > > This preparation is done before the work to be measured starts.
> > >
> > > This does tie in with the association with the resctrl group and I
> > > will elaborate more below ...
> >
> > Okay, that's useful to retain but thinking this further, now we're also
> > going do non-trivial amount of work in between the setup and the test by
>
> Could you please elaborate how the amount of work during setup can be an
> issue? I have been focused on the measurements that are done afterwards
> that do have clear boundaries from what I can tell.
Well, you used it as a justification: "Setting the affinity is intended
to ensure that the buffer preparation occurs in the same topology as where
the runtime portion will run." So I assumed you had some expectations about
"preparations" done outside of those "clear boundaries" but now you seem
to take entirely opposite stance?
fork() is quite a heavy operation as it has to copy various things,
including the address space, which you just made contain a huge mem
blob. :-)
BTW, perhaps we could use some lighter-weight fork variant in the
resctrl selftests; the processes don't really need to be that separated
to justify using a full fork() (and custom benchmarks will do execvp()).
> > forking. I guess that doesn't matter for memflush = true case but might be
> > meaningful for the memflush = false case that seems to be there to allow
> > keeping caches hot (I personally haven't thought how to use "caches hot"
> > test but we do have that capability by the fact that memflush paremeter
> > exists).
>
> I believe that memflush = true will always be needed/used by the tests
> relying on memory bandwidth measurement since that reduces cache hits during
> measurement phase and avoids the additional guessing on how long the workload
> should be run before reliable/consistent measurements can start.
>
> Thinking about the memflush = false case I now think that we should use that
> for the CMT test. The buffer is allocated and initialized while the task
> is configured with appropriate allocation limits so there should not be a
> reason to flush the buffer from the cache. In fact, flushing the cache
> introduces
> the requirement to guess the workload's "settle" time (time to allocate the
> buffer
> into the cache again) before its occupancy can be measured. As a quick test I
> set memflush = false on one system and it brought down the average diff
> between
> the cache portion size and the occupancy counts. I'll try it out on a few more
> systems to confirm.
Oh great!
I've not really figured out the logic used in the old CMT test because
there was a rewrite for it in the series I started to upstream some of
these improvements from. But I was unable to successfully rebase that
rewrite either, because somebody had used a non-publicly available tree
as a basis for it, so I never even had time to understand what the
rewritten test did, thanks to the very complex diff.
> > > > Neither behavior, however, seems to result in the intended behavior as
> > > > we
> > > > either get interfering processes (if inherited) or no desired resctrl
> > > > setup for the benchmark process.
> > >
> > > There are two processes to consider in the resource group, the parent
> > > (that
> > > sets up the buffer and does the measurements) and the child (that runs the
> > > workload to be measured). Thanks to your commit da50de0a92f3
> > > ("selftests/resctrl:
> > > Calculate resctrl FS derived mem bw over sleep(1) only") the parent
> > > will be sleeping while the child runs its workload and there is no
> > > other interference I am aware of. The only additional measurements
> > > that I can see would be the work needed to actually start and stop the
> > > measurements and from what I can tell this falls into the noise.
> > >
> > > Please do keep in mind that the performance counters used, iMC, cannot
> > > actually
> > > be bound to a single CPU since it is a per-socket PMU. The measurements
> > > have
> > > thus never been as fine grained as the code pretends it to be.
> >
> > I was thinking if I should note the amount of work is small. Maybe it's
> > fine to leave that noise there and I'm just overly cautious :-), when I
> > used to do networking research in the past life, I wanted to eliminate as
> > much noise sources so I guess it comes from that background.
>
> The goal of these tests are to verify *resctrl*, these are not intended to be
> hardware validation tests. I think it would be better for resctrl if more time
> is spent on functional tests of resctrl than these performance tests.
This sounds so easy... (no offence) :-) If only there weren't the
black boxes and we had good, fine-grained ways to instrument it, it
would be so much easier to realize non-statistical means of doing
functional tests.
--
i.
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-05 12:10 ` Ilpo Järvinen
@ 2024-09-05 18:08 ` Reinette Chatre
2024-09-06 10:00 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-05 18:08 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>> On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>> On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
>>>>>> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test
>>>>>> *test,
>>>>>> return ret;
>>>>>> }
>>>>>> - /*
>>>>>> - * If benchmark wasn't successfully started by child, then
>>>>>> child
>>>>>> should
>>>>>> - * kill parent, so save parent's pid
>>>>>> - */
>>>>>> ppid = getpid();
>>>>>> - if (pipe(pipefd)) {
>>>>>> - ksft_perror("Unable to create pipe");
>>>>>> + /* Taskset test to specified CPU. */
>>>>>> + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
>>>>>
>>>>> Previously only CPU affinity for bm_pid was set but now it's set before
>>>>> fork(). Quickly checking the Internet, it seems that CPU affinity gets
>>>>> inherited on fork() so now both processes will have the same affinity
>>>>> which might make the other process to interfere with the measurement.
>>>>
>>>> Setting the affinity is intended to ensure that the buffer preparation
>>>> occurs in the same topology as where the runtime portion will run.
>>>> This preparation is done before the work to be measured starts.
>>>>
>>>> This does tie in with the association with the resctrl group and I
>>>> will elaborate more below ...
>>>
>>> Okay, that's useful to retain but thinking this further, now we're also
>>> going do non-trivial amount of work in between the setup and the test by
>>
>> Could you please elaborate how the amount of work during setup can be an
>> issue? I have been focused on the measurements that are done afterwards
>> that do have clear boundaries from what I can tell.
>
> Well, you used it as a justification: "Setting the affinity is intended
> to ensure that the buffer preparation occurs in the same topology as where
> the runtime portion will run." So I assumed you had some expectations about
> "preparations" done outside of those "clear boundaries" but now you seem
> to take entirely opposite stance?
I do not follow you here. With the "clear boundaries" I meant the
measurements and associated variables that have a clear start/init and
stop/read while the other task sleeps. Yes, preparations are done outside
of this since that should not be measured. You stated "now we're also going
do non-trivial amount of work in between the setup and the test" ...
could you clarify what the problem is with this? Before this work
the "non-trivial amount of work" (for "fill_buf") was done as part of the
test with (wrong) guesses about how long it takes. This work aims to improve
this.
>
> fork() quite heavy operation as it has to copy various things including
> the address space which you just made to contain a huge mem blob. :-)
As highlighted in a comment found in the patch, the kernel uses copy-on-write
and all tests only read from the buffer after fork().
>
> BTW, perhaps we could use some lighter weighted fork variant in the
> resctrl selftests, the processes don't really need to be that separated
> to justify using full fork() (and custom benchmarks will do execvp()).
Are you thinking about pthreads? It is not clear to me that this is
necessary. It is also not clear to me what problem you are describing that
needs this solution. When I understand that better it will be easier to
discuss solutions.
>
>>> forking. I guess that doesn't matter for memflush = true case but might be
>>> meaningful for the memflush = false case that seems to be there to allow
>>> keeping caches hot (I personally haven't thought how to use "caches hot"
>>> test but we do have that capability by the fact that memflush paremeter
>>> exists).
>>
>> I believe that memflush = true will always be needed/used by the tests
>> relying on memory bandwidth measurement since that reduces cache hits during
>> measurement phase and avoids the additional guessing on how long the workload
>> should be run before reliable/consistent measurements can start.
>>
>> Thinking about the memflush = false case I now think that we should use that
>> for the CMT test. The buffer is allocated and initialized while the task
>> is configured with appropriate allocation limits so there should not be a
>> reason to flush the buffer from the cache. In fact, flushing the cache
>> introduces
>> the requirement to guess the workload's "settle" time (time to allocate the
>> buffer
>> into the cache again) before its occupancy can be measured. As a quick test I
>> set memflush = false on one system and it brought down the average diff
>> between
>> the cache portion size and the occupancy counts. I'll try it out on a few more
>> systems to confirm.
>
> Oh great!
>
> I've not really figured out the logic used in the old CMT test because
> there was the rewrite for it in the series I started to upstream some of
> these improvements from. But I was unable to rebase successfully that
> rewrite either because somebody had used a non-publically available tree
> as a basis for it so I never did even have time to understand what even
> the rewritten test did thanks to the very complex diff.
>
>>>>> Neither behavior, however, seems to result in the intended behavior as
>>>>> we
>>>>> either get interfering processes (if inherited) or no desired resctrl
>>>>> setup for the benchmark process.
>>>>
>>>> There are two processes to consider in the resource group, the parent
>>>> (that
>>>> sets up the buffer and does the measurements) and the child (that runs the
>>>> workload to be measured). Thanks to your commit da50de0a92f3
>>>> ("selftests/resctrl:
>>>> Calculate resctrl FS derived mem bw over sleep(1) only") the parent
>>>> will be sleeping while the child runs its workload and there is no
>>>> other interference I am aware of. The only additional measurements
>>>> that I can see would be the work needed to actually start and stop the
>>>> measurements and from what I can tell this falls into the noise.
>>>>
>>>> Please do keep in mind that the performance counters used, iMC, cannot
>>>> actually
>>>> be bound to a single CPU since it is a per-socket PMU. The measurements
>>>> have
>>>> thus never been as fine grained as the code pretends it to be.
>>>
>>> I was thinking if I should note the amount of work is small. Maybe it's
>>> fine to leave that noise there and I'm just overly cautious :-), when I
>>> used to do networking research in the past life, I wanted to eliminate as
>>> much noise sources so I guess it comes from that background.
>>
>> The goal of these tests are to verify *resctrl*, these are not intended to be
>> hardware validation tests. I think it would be better for resctrl if more time
>> is spent on functional tests of resctrl than these performance tests.
>
> This sounds so easy... (no offence) :-) If only there wouldn't be the
> black boxes and we'd have good and fine-grained ways to instrument it,
> it would be so much easier to realize non-statistical means to do
> functional tests.
>
To me, functional tests of resctrl indeed sound easier. Tests can interact with the
resctrl interface to ensure it works as expected ... test the boundaries
of user input to the various files, test task movement between groups, test moving of
monitor groups, test assigning CPUs to resource groups, ensure the "mode" of a
resource group can be changed and behaves as expected (for example, shared vs exclusive),
ensure changes to one file are reflected in others, like changing schemata is reflected
in "size" and "bit_usage", etc. These are all tests of *resctrl* that support
development and can be used to verify all is well as major changes (that we are seeing
more and more of) are made to the kernel subsystem. None of this is "black box" and
it is all deterministic with obvious pass/fail. This can be made even more reliable
by not just checking resctrl files but also seeing whether changes via resctrl are
reflected in MSRs as expected.
Reinette
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-05 18:08 ` Reinette Chatre
@ 2024-09-06 10:00 ` Ilpo Järvinen
2024-09-07 0:05 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-06 10:00 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 5 Sep 2024, Reinette Chatre wrote:
> On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> >
> > > > > > > @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test
> > > > > > > *test,
> > > > > > > return ret;
> > > > > > > }
> > > > > > > - /*
> > > > > > > - * If benchmark wasn't successfully started by child, then
> > > > > > > child
> > > > > > > should
> > > > > > > - * kill parent, so save parent's pid
> > > > > > > - */
> > > > > > > ppid = getpid();
> > > > > > > - if (pipe(pipefd)) {
> > > > > > > - ksft_perror("Unable to create pipe");
> > > > > > > + /* Taskset test to specified CPU. */
> > > > > > > + ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
> > > > > >
> > > > > > Previously only CPU affinity for bm_pid was set but now it's set
> > > > > > before
> > > > > > fork(). Quickly checking the Internet, it seems that CPU affinity
> > > > > > gets
> > > > > > inherited on fork() so now both processes will have the same
> > > > > > affinity
> > > > > > which might make the other process to interfere with the
> > > > > > measurement.
> > > > >
> > > > > Setting the affinity is intended to ensure that the buffer preparation
> > > > > occurs in the same topology as where the runtime portion will run.
> > > > > This preparation is done before the work to be measured starts.
> > > > >
> > > > > This does tie in with the association with the resctrl group and I
> > > > > will elaborate more below ...
> > > >
> > > > Okay, that's useful to retain but thinking this further, now we're also
> > > > going do non-trivial amount of work in between the setup and the test by
> > >
> > > Could you please elaborate how the amount of work during setup can be an
> > > issue? I have been focused on the measurements that are done afterwards
> > > that do have clear boundaries from what I can tell.
> >
> > Well, you used it as a justification: "Setting the affinity is intended
> > to ensure that the buffer preparation occurs in the same topology as where
> > the runtime portion will run." So I assumed you had some expectations about
> > "preparations" done outside of those "clear boundaries" but now you seem
> > to take entirely opposite stance?
>
> I do not follow you here. With the "clear boundaries" I meant the
> measurements and associated variables that have a clear start/init and
> stop/read while the other task sleeps. Yes, preparations are done outside
> of this since that should not be measured.
Of course the preparations are not measured (at least not after this
patch :-)).
But that's not what I meant. You said the preparations are to be done
using the same topology as the test but if it's outside of the measurement
period, the topology during preparations only matters if you make some
hidden assumption that some state carries from preparations to the actual
test. If no state carry-over is assumed, the topology during preparations
is not important. So which way is it? Is the topology during preparations
important or not?
You used the topology argument to justify why both parent and child are
now in the same resctrl group unlike before when only the actual test was.
> You stated "now we're also going
> do non-trivial amount of work in between the setup and the test" ...
> could you clarify what the problem is with this? Before this work
> the "non-trivial amount of work" (for "fill_buf") was done as part of the
> test with (wrong) guesses about how long it takes. This work aims to improve
> this.
I understand what you're trying to do with this change.
However, I was trying to say that before, preparations occurred right
before the test, which is no longer the case because there's a fork() in
between.
Does that extra work impact the state carry-over from preparations to the
test?
> > fork() quite heavy operation as it has to copy various things including
> > the address space which you just made to contain a huge mem blob. :-)
>
> As highlighted in a comment found in the patch, the kernel uses copy-on-write
> and all tests only read from the buffer after fork().
Write test is possible using -b fill_buf ... as mentioned in comments to
the other patch.
> > BTW, perhaps we could use some lighter weighted fork variant in the
> > resctrl selftests, the processes don't really need to be that separated
> > to justify using full fork() (and custom benchmarks will do execvp()).
>
> Are you thinking about pthreads? It is not clear to me that this is
> necessary. It is also not clear to me what problem you are describing that
> needs this solution. When I understand that better it will be easier to
> discuss solutions.
I was trying to say I see an advantage in retaining the address space,
which is not possible with fork(), so perhaps using clone() with CLONE_VM
would be useful, and maybe some other flags too. I think this would solve
the write test case.
> > > > I was thinking if I should note the amount of work is small. Maybe it's
> > > > fine to leave that noise there and I'm just overly cautious :-), when I
> > > > used to do networking research in the past life, I wanted to eliminate
> > > > as
> > > > much noise sources so I guess it comes from that background.
> > >
> > > The goal of these tests are to verify *resctrl*, these are not intended to be
> > > hardware validation tests. I think it would be better for resctrl if more time
> > > is spent on functional tests of resctrl than these performance tests.
> >
> > This sounds so easy... (no offence) :-) If only there wouldn't be the
> > black boxes and we'd have good and fine-grained ways to instrument it,
> > it would be so much easier to realize non-statistical means to do
> > functional tests.
> >
>
> To me functional tests of resctrl indeed sounds easier. Tests can interact with
> the resctrl interface to ensure it works as expected ... test the boundaries
> of user input to the various files, test task movement between groups, test
> moving of monitor groups, test assigning CPUs to resource groups, ensure the
> "mode" of a resource group can be changed and behaves as expected (for example,
> shared vs exclusive), ensure changes to one file are reflected in others, like
> changing schemata is reflected in "size" and "bit_usage", etc. etc. These are
> all tests of *resctrl* that supports development and can be used to verify all
> is well as major changes (that we are seeing more and more of) are made to the
> kernel subsystem. None of this is "black box" and is all deterministic with
> obvious pass/fail. This can be made even more reliable with not just checking
> of resctrl files but see if changes via resctrl is reflected in MSRs as
> expected.
Okay, I get it now: you meant testing the kernel-userspace interfaces,
using MSRs as a further validation step to test the kernel-HW interface
too.
I'll probably take a look at this when I have some spare time and see what
I can come up with.
--
i.
^ permalink raw reply	[flat|nested] 30+ messages in thread
* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-06 10:00 ` Ilpo Järvinen
@ 2024-09-07 0:05 ` Reinette Chatre
2024-09-09 12:52 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-07 0:05 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/6/24 3:00 AM, Ilpo Järvinen wrote:
> On Thu, 5 Sep 2024, Reinette Chatre wrote:
>> On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
>>> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>>>> On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
>>>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>>>> On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
>>>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>
>>>>>>>> @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
>>>>>>>>  		return ret;
>>>>>>>>  	}
>>>>>>>> -	/*
>>>>>>>> -	 * If benchmark wasn't successfully started by child, then child should
>>>>>>>> -	 * kill parent, so save parent's pid
>>>>>>>> -	 */
>>>>>>>>  	ppid = getpid();
>>>>>>>> -	if (pipe(pipefd)) {
>>>>>>>> -		ksft_perror("Unable to create pipe");
>>>>>>>> +	/* Taskset test to specified CPU. */
>>>>>>>> +	ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
>>>>>>>
>>>>>>> Previously only CPU affinity for bm_pid was set but now it's set before
>>>>>>> fork(). Quickly checking the Internet, it seems that CPU affinity gets
>>>>>>> inherited on fork() so now both processes will have the same affinity
>>>>>>> which might make the other process to interfere with the measurement.
>>>>>>
>>>>>> Setting the affinity is intended to ensure that the buffer preparation
>>>>>> occurs in the same topology as where the runtime portion will run.
>>>>>> This preparation is done before the work to be measured starts.
>>>>>>
>>>>>> This does tie in with the association with the resctrl group and I
>>>>>> will elaborate more below ...
>>>>>
>>>>> Okay, that's useful to retain but thinking this further, now we're also
>>>>> going do non-trivial amount of work in between the setup and the test by
>>>>
>>>> Could you please elaborate how the amount of work during setup can be an
>>>> issue? I have been focused on the measurements that are done afterwards
>>>> that do have clear boundaries from what I can tell.
>>>
>>> Well, you used it as a justification: "Setting the affinity is intended
>>> to ensure that the buffer preparation occurs in the same topology as where
>>> the runtime portion will run." So I assumed you had some expectations about
>>> "preparations" done outside of those "clear boundaries" but now you seem
>>> to take entirely opposite stance?
>>
>> I do not follow you here. With the "clear boundaries" I meant the
>> measurements and associated variables that have a clear start/init and
>> stop/read while the other task sleeps. Yes, preparations are done outside
>> of this since that should not be measured.
>
> Of course the preparations are not measured (at least not after this
> patch :-)).
>
> But that's not what I meant. You said the preparations are to be done
> using the same topology as the test but if it's outside of the measurement
> period, the topology during preparations only matters if you make some
> hidden assumption that some state carries from preparations to the actual
> test. If no state carry-over is assumed, the topology during preparations
> is not important. So which way it is? Is the topology during preparations
> important or not?
Topology during preparations is important.
In the CMT test this is more relevant with the transition to using
memflush = false. The preparation phase and measure phase use the same
cache alloc configuration and, just as importantly, the same monitoring
configuration. During preparation the cache portion that will be
used during measurement is filled with the contents of the buffer.
During measurement it is the same cache portion into which any new reads
will be allocated and measured. In effect, the preparation phase thus forms
part of the measurement. If preparation was done with a different
configuration, then I see a problem whether memflush = true or
memflush = false. With memflush = true there is the familiar
issue of the test needing to "guess" the workload settle time. With
memflush = false the buffer remains allocated into the cache portion
used during the preparation phase; when the workload runs it will read the
data from a cache portion that does not belong to it, and since it does
not need to allocate into its own cache portion its LLC occupancy counts will
not be accurate (the LLC occupancy associated with the buffer will be attributed
to the prepare portion).
I am not familiar with the details of memory allocation but as I understand
it, Linux attempts to satisfy memory requests from the node to which the
requesting CPU is assigned. For the MBM and MBA tests I thus believe it is
important to allocate the memory from where it will be used. I have encountered
systems where CPU0 and CPU1 are on different sockets and by default the workload
is set to run on CPU1. If the preparation phase runs on CPU0 then memory
could be allocated from a different NUMA node than where the workload will
be running. By doing preparation within the same topology as where the
workload will run I believe that memory will be allocated appropriately
for the workload and thus result in more reliable measurements.
>
> You used the topology argument to justify why both parent and child are
> now in the same resctrl group unlike before when only the actual test was.
>
>> You stated "now we're also going
>> do non-trivial amount of work in between the setup and the test" ...
>> could you clarify what the problem is with this? Before this work
>> the "non-trivial amount of work" (for "fill_buf") was done as part of the
>> test with (wrong) guesses about how long it takes. This work aims to improve
>> this.
>
> I understand why you're trying to do with this change.
>
> However, I was trying to say that before preparations occurred right
> before the test which is no longer the case because there's fork() in
> between.
If by "test" you mean the measurement phase then in the case of "fill_buf"
preparations only now reliably occur before the measurement. Original behavior
is maintained with user provided benchmark.
>
> Does that extra work impact the state carry-over from preparations to the
> test?
It is not clear to me what extra work or state you are referring to.
>
>>> fork() quite heavy operation as it has to copy various things including
>>> the address space which you just made to contain a huge mem blob. :-)
>>
>> As highlighted in a comment found in the patch, the kernel uses copy-on-write
>> and all tests only read from the buffer after fork().
>
> Write test is possible using -b fill_buf ... as mentioned in comments to
> the other patch.
Yes, it is theoretically possible, but looking closer it is not supported. Note
how measure_mem_bw() is always called with hardcoded "reads". Reading through
the history of commits I do not believe modifying fill_buf parameters was
ever intended to be a use case for the "-b" parameter. It seems intended
to provide an alternate benchmark to fill_buf.
I am not interested in adding support for the write test because I do not see
how it helps us to improve resctrl subsystem health. Considering that
nobody has ever complained about the write test being broken I think all
that dead code should just be removed instead.
I prefer the focus be on the health of the resctrl subsystem instead of additional
hardware performance testing. I do not think it is energy well spent to further
tune these performance tests unless it benefits resctrl subsystem health. These
performance tests are difficult to maintain and I have not yet seen how they
benefit the resctrl subsystem.
>>> BTW, perhaps we could use some lighter weighted fork variant in the
>>> resctrl selftests, the processes don't really need to be that separated
>>> to justify using full fork() (and custom benchmarks will do execvp()).
>>
>> Are you thinking about pthreads? It is not clear to me that this is
>> necessary. It is also not clear to me what problem you are describing that
>> needs this solution. When I understand that better it will be easier to
>> discuss solutions.
>
> I was trying to say I see advantage of retaining the address space which
> is not possible with fork(), so perhaps using clone() with CLONE_VM would
> be useful and maybe some other flags too. I think this would solve the
> write test case.
clone() brings with it the complexity of needing to manage the child's
stack. This again aims for a performance improvement. What does this fix?
What is the benefit to resctrl health? I would prefer that our energy
instead be focused on improving resctrl subsystem health.
Reinette
^ permalink raw reply [flat|nested] 30+ messages in thread* Re: [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark
2024-09-07 0:05 ` Reinette Chatre
@ 2024-09-09 12:52 ` Ilpo Järvinen
0 siblings, 0 replies; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-09 12:52 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Fri, 6 Sep 2024, Reinette Chatre wrote:
> On 9/6/24 3:00 AM, Ilpo Järvinen wrote:
> > On Thu, 5 Sep 2024, Reinette Chatre wrote:
> > > On 9/5/24 5:10 AM, Ilpo Järvinen wrote:
> > > > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > > > On 9/4/24 4:57 AM, Ilpo Järvinen wrote:
> > > > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > > > On 8/30/24 3:56 AM, Ilpo Järvinen wrote:
> > > > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > >
> > > > > > > > > @@ -699,111 +639,80 @@ int resctrl_val(const struct resctrl_test *test,
> > > > > > > > >  		return ret;
> > > > > > > > >  	}
> > > > > > > > > -	/*
> > > > > > > > > -	 * If benchmark wasn't successfully started by child, then child should
> > > > > > > > > -	 * kill parent, so save parent's pid
> > > > > > > > > -	 */
> > > > > > > > >  	ppid = getpid();
> > > > > > > > > -	if (pipe(pipefd)) {
> > > > > > > > > -		ksft_perror("Unable to create pipe");
> > > > > > > > > +	/* Taskset test to specified CPU. */
> > > > > > > > > +	ret = taskset_benchmark(ppid, uparams->cpu, &old_affinity);
> > > > > > > >
> > > > > > > > Previously only CPU affinity for bm_pid was set but now it's set before
> > > > > > > > fork(). Quickly checking the Internet, it seems that CPU affinity gets
> > > > > > > > inherited on fork() so now both processes will have the same affinity
> > > > > > > > which might make the other process to interfere with the measurement.
> > > > > > >
> > > > > > > Setting the affinity is intended to ensure that the buffer preparation
> > > > > > > occurs in the same topology as where the runtime portion will run.
> > > > > > > This preparation is done before the work to be measured starts.
> > > > > > >
> > > > > > > This does tie in with the association with the resctrl group and I
> > > > > > > will elaborate more below ...
> > > > > >
> > > > > > Okay, that's useful to retain but thinking this further, now we're also
> > > > > > going do non-trivial amount of work in between the setup and the test by
> > > > >
> > > > > Could you please elaborate how the amount of work during setup can be an
> > > > > issue? I have been focused on the measurements that are done afterwards
> > > > > that do have clear boundaries from what I can tell.
> > > >
> > > > Well, you used it as a justification: "Setting the affinity is intended
> > > > to ensure that the buffer preparation occurs in the same topology as where
> > > > the runtime portion will run." So I assumed you had some expectations about
> > > > "preparations" done outside of those "clear boundaries" but now you seem
> > > > to take entirely opposite stance?
> > >
> > > I do not follow you here. With the "clear boundaries" I meant the
> > > measurements and associated variables that have a clear start/init and
> > > stop/read while the other task sleeps. Yes, preparations are done outside
> > > of this since that should not be measured.
> >
> > Of course the preparations are not measured (at least not after this
> > patch :-)).
> >
> > But that's not what I meant. You said the preparations are to be done
> > using the same topology as the test but if it's outside of the measurement
> > period, the topology during preparations only matters if you make some
> > hidden assumption that some state carries from preparations to the actual
> > test. If no state carry-over is assumed, the topology during preparations
> > is not important. So which way it is? Is the topology during preparations
> > important or not?
>
> Topology during preparations is important.
>
> In the CMT test this is more relevant with the transition to using
> memflush = false. The preparation phase and measure phase uses the same
> cache alloc configuration and just as importantly, the same monitoring
> configuration. During preparation the cache portion that will be
> used during measurement will be filled with the contents of the buffer.
> During measurement it will be the same cache portion into which any new reads
> will be allocated and measured. In fact, the preparation phase will thus form
> part of the measurement. If preparation was done with different
> configuration, then I see a problem whether memflush = true as well as when
> memflush = false. In the case of memflush = true it will have the familiar
> issue of the test needing to "guess" the workload settle time. In the case
> of memflush = false the buffer will remain allocated into the cache portion
> used during preparation phase, when the workload runs it will read the
> data from a cache portion that does not belong to it and since it does
> not need to allocate into its own cache portion its LLC occupancy counts will
> not be accurate (the LLC occupancy associated with the buffer will be
> attributed to prepare portion).
>
> I am not familiar with the details of memory allocation but as I understand
> Linux does attempt to satisfy memory requests from the node to which the
> requesting CPU is assigned. For the MBM and MBA tests I thus believe it is
> important to allocate the memory from where it will be used. I have
> encountered systems where CPU0 and CPU1 are on different sockets and by
> default the workload is set to run on CPU1. If the preparation phase runs
> on CPU0 then it may be that memory could be allocated from a different
> NUMA node than where the workload will be running. By doing preparation
> within the same topology as what the workload will be running I believe
> that memory will be allocated appropriate to workload and thus result in
> more reliable measurements.
>
> >
> > You used the topology argument to justify why both parent and child are
> > now in the same resctrl group unlike before when only the actual test was.
> >
> > > You stated "now we're also going
> > > do non-trivial amount of work in between the setup and the test" ...
> > > could you clarify what the problem is with this? Before this work
> > > the "non-trivial amount of work" (for "fill_buf") was done as part of the
> > > test with (wrong) guesses about how long it takes. This work aims to improve
> > > this.
> >
> > I understand why you're trying to do with this change.
> >
> > However, I was trying to say that before preparations occurred right
> > before the test which is no longer the case because there's fork() in
> > between.
>
> If by "test" you mean the measurement phase then in the case of "fill_buf"
> preparations only now reliably occur before the measurement. Original behavior
> is maintained with user provided benchmark.
>
> >
> > Does that extra work impact the state carry-over from preparations to the
> > test?
>
> It is not clear to me what extra work or state you are referring to.
>
> >
> > > > fork() quite heavy operation as it has to copy various things including
> > > > the address space which you just made to contain a huge mem blob. :-)
> > >
> > > As highlighted in a comment found in the patch, the kernel uses copy-on-write
> > > and all tests only read from the buffer after fork().
> >
> > Write test is possible using -b fill_buf ... as mentioned in comments to
> > the other patch.
>
> Yes, it is theoretically possible, but looking closer it is not supported. Note
> how measure_mem_bw() is always called with hardcoded "reads". Reading through
> the history of commits I do not believe modifying fill_buf parameters was
> ever intended to be a use case for the "-b" parameter. It seems intended
> to provide an alternate benchmark to fill_buf.
>
> I am not interested in adding support for the write test because I do not see
> how it helps us to improve resctrl subsystem health. Considering that
> nobody has ever complained about the write test being broken I think all
> that dead code should just be removed instead.
>
> I prefer the focus be on the health of the resctrl subsystem instead of
> additional hardware performance testing. I do not think it is energy well spent
> to further tune these performance tests unless it benefits resctrl subsystem
> health. These performance tests are difficult to maintain and I have not yet
> seen how they benefit the resctrl subsystem.
>
> > > > BTW, perhaps we could use some lighter weighted fork variant in the
> > > > resctrl selftests, the processes don't really need to be that separated
> > > > to justify using full fork() (and custom benchmarks will do execvp()).
> > >
> > > Are you thinking about pthreads? It is not clear to me that this is
> > > necessary. It is also not clear to me what problem you are describing that
> > > needs this solution. When I understand that better it will be easier to
> > > discuss solutions.
> >
> > I was trying to say I see advantage of retaining the address space which
> > is not possible with fork(), so perhaps using clone() with CLONE_VM would
> > be useful and maybe some other flags too. I think this would solve the
> > write test case.
>
> clone() brings with it the complexity of needing to manage the child's
> stack. This again aims for a performance improvement. What does this fix?
> What is the benefit to resctrl health? I would prefer that our energy
> instead be focused on improving resctrl subsystem health.
Fair enough.
--
i.
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
2024-08-29 22:52 ` [PATCH 1/6] selftests/resctrl: Fix sparse warnings Reinette Chatre
2024-08-29 22:52 ` [PATCH 2/6] selftests/resctrl: Ensure measurements skip initialization of default benchmark Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
2024-08-30 11:13 ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size Reinette Chatre
` (2 subsequent siblings)
5 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
The benchmark used during the CMT, MBM, and MBA tests can be provided by
the user via the (-b) parameter to the tests; if it is not provided, the
default "fill_buf" benchmark is used.
The "fill_buf" benchmark requires parameters and these are managed as
an array of strings.
Using an array of strings to manage the "fill_buf" parameters is
complex because it requires transformations to/from strings at every
producer and consumer. This is made worse for the individual tests
where the default benchmark parameter values may not be appropriate and
additional data wrangling is required. For example, the CMT test
duplicates the entire array of strings in order to replace one of
the parameters.
Replace the "array of strings" parameters used for "fill_buf" with a
struct that contains the "fill_buf" parameters that can be used directly
without transformations to/from strings. Make these parameters
part of the parameters associated with each test so that each test can
set benchmark parameters that are appropriate for it.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/cmt_test.c | 28 +++--------
tools/testing/selftests/resctrl/mba_test.c | 7 ++-
tools/testing/selftests/resctrl/mbm_test.c | 9 +++-
tools/testing/selftests/resctrl/resctrl.h | 49 +++++++++++++------
.../testing/selftests/resctrl/resctrl_tests.c | 15 +-----
tools/testing/selftests/resctrl/resctrl_val.c | 38 +++++---------
6 files changed, 69 insertions(+), 77 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index 0c045080d808..f09d5dfab38c 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -116,15 +116,12 @@ static void cmt_test_cleanup(void)
static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
{
- const char * const *cmd = uparams->benchmark_cmd;
- const char *new_cmd[BENCHMARK_ARGS];
unsigned long cache_total_size = 0;
int n = uparams->bits ? : 5;
unsigned long long_mask;
- char *span_str = NULL;
int count_of_bits;
size_t span;
- int ret, i;
+ int ret;
ret = get_full_cbm("L3", &long_mask);
if (ret)
@@ -155,32 +152,21 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
span = cache_portion_size(cache_total_size, param.mask, long_mask);
- if (strcmp(cmd[0], "fill_buf") == 0) {
- /* Duplicate the command to be able to replace span in it */
- for (i = 0; uparams->benchmark_cmd[i]; i++)
- new_cmd[i] = uparams->benchmark_cmd[i];
- new_cmd[i] = NULL;
-
- ret = asprintf(&span_str, "%zu", span);
- if (ret < 0)
- return -1;
- new_cmd[1] = span_str;
- cmd = new_cmd;
- }
+ param.fill_buf.buf_size = span;
+ param.fill_buf.memflush = 1;
+ param.fill_buf.operation = 0;
+ param.fill_buf.once = false;
remove(RESULT_FILE_NAME);
- ret = resctrl_val(test, uparams, cmd, ¶m);
+ ret = resctrl_val(test, uparams, ¶m);
if (ret)
- goto out;
+ return ret;
ret = check_results(¶m, span, n);
if (ret && (get_vendor() == ARCH_INTEL))
ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
-out:
- free(span_str);
-
return ret;
}
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..8ad433495f61 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -174,7 +174,12 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
remove(RESULT_FILE_NAME);
- ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
+ param.fill_buf.buf_size = DEFAULT_SPAN;
+ param.fill_buf.memflush = 1;
+ param.fill_buf.operation = 0;
+ param.fill_buf.once = false;
+
+ ret = resctrl_val(test, uparams, ¶m);
if (ret)
return ret;
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 6b5a3b52d861..b6883f274c74 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -142,11 +142,16 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
remove(RESULT_FILE_NAME);
- ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
+ param.fill_buf.buf_size = DEFAULT_SPAN;
+ param.fill_buf.memflush = 1;
+ param.fill_buf.operation = 0;
+ param.fill_buf.once = false;
+
+ ret = resctrl_val(test, uparams, ¶m);
if (ret)
return ret;
- ret = check_results(DEFAULT_SPAN);
+ ret = check_results(param.fill_buf.buf_size);
if (ret && (get_vendor() == ARCH_INTEL))
ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 0afbc4dd18e4..0e5456165a6a 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -79,6 +79,26 @@ struct resctrl_test {
void (*cleanup)(void);
};
+/*
+ * fill_buf_param: "fill_buf" benchmark parameters
+ * @buf_size: Size (in bytes) of buffer used in benchmark.
+ * "fill_buf" allocates and initializes buffer of
+ * @buf_size.
+ * @operation: If 0, then only read operations are performed on
+ * the buffer, if 1 then only write operations are
+ * performed on the buffer.
+ * @memflush: 1 if buffer should be flushed after
+ * allocation and initialization.
+ * @once: Benchmark will perform @operation once if true,
+ * infinitely (until terminated) if false.
+ */
+struct fill_buf_param {
+ size_t buf_size;
+ int operation;
+ int memflush;
+ int once;
+};
+
/*
* resctrl_val_param: resctrl test parameters
* @ctrlgrp: Name of the control monitor group (con_mon grp)
@@ -87,21 +107,23 @@ struct resctrl_test {
* @init: Callback function to initialize test environment
* @setup: Callback function to setup per test run environment
* @measure: Callback that performs the measurement (a single test)
+ * @fill_buf: Parameters for default "fill_buf" benchmark
*/
struct resctrl_val_param {
- const char *ctrlgrp;
- const char *mongrp;
- char filename[64];
- unsigned long mask;
- int num_of_runs;
- int (*init)(const struct resctrl_val_param *param,
- int domain_id);
- int (*setup)(const struct resctrl_test *test,
- const struct user_params *uparams,
- struct resctrl_val_param *param);
- int (*measure)(const struct user_params *uparams,
- struct resctrl_val_param *param,
- pid_t bm_pid);
+ const char *ctrlgrp;
+ const char *mongrp;
+ char filename[64];
+ unsigned long mask;
+ int num_of_runs;
+ int (*init)(const struct resctrl_val_param *param,
+ int domain_id);
+ int (*setup)(const struct resctrl_test *test,
+ const struct user_params *uparams,
+ struct resctrl_val_param *param);
+ int (*measure)(const struct user_params *uparams,
+ struct resctrl_val_param *param,
+ pid_t bm_pid);
+ struct fill_buf_param fill_buf;
};
struct perf_event_read {
@@ -151,7 +173,6 @@ void initialize_mem_bw_resctrl(const struct resctrl_val_param *param,
int domain_id);
int resctrl_val(const struct resctrl_test *test,
const struct user_params *uparams,
- const char * const *benchmark_cmd,
struct resctrl_val_param *param);
unsigned long create_bit_mask(unsigned int start, unsigned int len);
unsigned int count_contiguous_bits(unsigned long val, unsigned int *start);
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index ecbb7605a981..ce8fcc769d57 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -162,7 +162,7 @@ int main(int argc, char **argv)
bool test_param_seen = false;
struct user_params uparams;
char *span_str = NULL;
- int ret, c, i;
+ int c, i;
init_user_params(&uparams);
@@ -257,19 +257,6 @@ int main(int argc, char **argv)
filter_dmesg();
- if (!uparams.benchmark_cmd[0]) {
- /* If no benchmark is given by "-b" argument, use fill_buf. */
- uparams.benchmark_cmd[0] = "fill_buf";
- ret = asprintf(&span_str, "%u", DEFAULT_SPAN);
- if (ret < 0)
- ksft_exit_fail_msg("Out of memory!\n");
- uparams.benchmark_cmd[1] = span_str;
- uparams.benchmark_cmd[2] = "1";
- uparams.benchmark_cmd[3] = "0";
- uparams.benchmark_cmd[4] = "false";
- uparams.benchmark_cmd[5] = NULL;
- }
-
ksft_set_plan(tests);
for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
index 574b72604f95..9a5a9a67e059 100644
--- a/tools/testing/selftests/resctrl/resctrl_val.c
+++ b/tools/testing/selftests/resctrl/resctrl_val.c
@@ -612,21 +612,17 @@ int measure_mem_bw(const struct user_params *uparams,
* the benchmark
* @test: test information structure
* @uparams: user supplied parameters
- * @benchmark_cmd: benchmark command and its arguments
* @param: parameters passed to resctrl_val()
*
* Return: 0 when the test was run, < 0 on error.
*/
int resctrl_val(const struct resctrl_test *test,
const struct user_params *uparams,
- const char * const *benchmark_cmd,
struct resctrl_val_param *param)
{
- int domain_id, operation = 0, memflush = 1;
- size_t span = DEFAULT_SPAN;
unsigned char *buf = NULL;
cpu_set_t old_affinity;
- bool once = false;
+ int domain_id;
int ret = 0;
pid_t ppid;
@@ -666,21 +662,9 @@ int resctrl_val(const struct resctrl_test *test,
* how this impacts "write" benchmark, but no test currently
* uses this.
*/
- if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
- span = strtoul(benchmark_cmd[1], NULL, 10);
- memflush = atoi(benchmark_cmd[2]);
- operation = atoi(benchmark_cmd[3]);
- if (!strcmp(benchmark_cmd[4], "true")) {
- once = true;
- } else if (!strcmp(benchmark_cmd[4], "false")) {
- once = false;
- } else {
- ksft_print_msg("Invalid once parameter\n");
- ret = -EINVAL;
- goto reset_affinity;
- }
-
- buf = alloc_buffer(span, memflush);
+ if (!uparams->benchmark_cmd[0]) {
+ buf = alloc_buffer(param->fill_buf.buf_size,
+ param->fill_buf.memflush);
if (!buf) {
ret = -ENOMEM;
goto reset_affinity;
@@ -699,13 +683,17 @@ int resctrl_val(const struct resctrl_test *test,
* terminated.
*/
if (bm_pid == 0) {
- if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
- if (operation == 0)
- fill_cache_read(buf, span, once);
+ if (!uparams->benchmark_cmd[0]) {
+ if (param->fill_buf.operation == 0)
+ fill_cache_read(buf,
+ param->fill_buf.buf_size,
+ param->fill_buf.once);
else
- fill_cache_write(buf, span, once);
+ fill_cache_write(buf,
+ param->fill_buf.buf_size,
+ param->fill_buf.once);
} else {
- execvp(benchmark_cmd[0], (char **)benchmark_cmd);
+ execvp(uparams->benchmark_cmd[0], (char **)uparams->benchmark_cmd);
}
exit(EXIT_SUCCESS);
}
--
2.46.0
* Re: [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing
2024-08-29 22:52 ` [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing Reinette Chatre
@ 2024-08-30 11:13 ` Ilpo Järvinen
2024-08-30 16:01 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-08-30 11:13 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 29 Aug 2024, Reinette Chatre wrote:
> The benchmark used during the CMT, MBM, and MBA tests can be provided by
> the user via (-b) parameter to the tests, if not provided the default
> "fill_buf" benchmark is used.
>
> The "fill_buf" benchmark requires parameters and these are managed as
> an array of strings.
>
> Using an array of strings to manage the "fill_buf" parameters is
> complex because it requires transformations to/from strings at every
> producer and consumer. This is made worse for the individual tests
> where the default benchmark parameters values may not be appropriate and
> additional data wrangling is required. For example, the CMT test
> duplicates the entire array of strings in order to replace one of
> the parameters.
>
> Replace the "array of strings" parameters used for "fill_buf" with a
> struct that contains the "fill_buf" parameters that can be used directly
> without transformations to/from strings. Make these parameters
> part of the parameters associated with each test so that each test can
> set benchmark parameters that are appropriate for it.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> tools/testing/selftests/resctrl/cmt_test.c | 28 +++--------
> tools/testing/selftests/resctrl/mba_test.c | 7 ++-
> tools/testing/selftests/resctrl/mbm_test.c | 9 +++-
> tools/testing/selftests/resctrl/resctrl.h | 49 +++++++++++++------
> .../testing/selftests/resctrl/resctrl_tests.c | 15 +-----
> tools/testing/selftests/resctrl/resctrl_val.c | 38 +++++---------
> 6 files changed, 69 insertions(+), 77 deletions(-)
>
> diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
> index 0c045080d808..f09d5dfab38c 100644
> --- a/tools/testing/selftests/resctrl/cmt_test.c
> +++ b/tools/testing/selftests/resctrl/cmt_test.c
> @@ -116,15 +116,12 @@ static void cmt_test_cleanup(void)
>
> static int cmt_run_test(const struct resctrl_test *test, const struct user_params *uparams)
> {
> - const char * const *cmd = uparams->benchmark_cmd;
> - const char *new_cmd[BENCHMARK_ARGS];
> unsigned long cache_total_size = 0;
> int n = uparams->bits ? : 5;
> unsigned long long_mask;
> - char *span_str = NULL;
> int count_of_bits;
> size_t span;
> - int ret, i;
> + int ret;
>
> ret = get_full_cbm("L3", &long_mask);
> if (ret)
> @@ -155,32 +152,21 @@ static int cmt_run_test(const struct resctrl_test *test, const struct user_param
>
> span = cache_portion_size(cache_total_size, param.mask, long_mask);
>
> - if (strcmp(cmd[0], "fill_buf") == 0) {
> - /* Duplicate the command to be able to replace span in it */
> - for (i = 0; uparams->benchmark_cmd[i]; i++)
> - new_cmd[i] = uparams->benchmark_cmd[i];
> - new_cmd[i] = NULL;
> -
> - ret = asprintf(&span_str, "%zu", span);
> - if (ret < 0)
> - return -1;
> - new_cmd[1] = span_str;
> - cmd = new_cmd;
> - }
> + param.fill_buf.buf_size = span;
> + param.fill_buf.memflush = 1;
> + param.fill_buf.operation = 0;
> + param.fill_buf.once = false;
>
> remove(RESULT_FILE_NAME);
>
> - ret = resctrl_val(test, uparams, cmd, ¶m);
> + ret = resctrl_val(test, uparams, ¶m);
> if (ret)
> - goto out;
> + return ret;
>
> ret = check_results(¶m, span, n);
> if (ret && (get_vendor() == ARCH_INTEL))
> ksft_print_msg("Intel CMT may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
>
> -out:
> - free(span_str);
> -
> return ret;
> }
>
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index ab8496a4925b..8ad433495f61 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -174,7 +174,12 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
>
> remove(RESULT_FILE_NAME);
>
> - ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
> + param.fill_buf.buf_size = DEFAULT_SPAN;
> + param.fill_buf.memflush = 1;
> + param.fill_buf.operation = 0;
> + param.fill_buf.once = false;
> +
> + ret = resctrl_val(test, uparams, ¶m);
> if (ret)
> return ret;
>
> diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
> index 6b5a3b52d861..b6883f274c74 100644
> --- a/tools/testing/selftests/resctrl/mbm_test.c
> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> @@ -142,11 +142,16 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
>
> remove(RESULT_FILE_NAME);
>
> - ret = resctrl_val(test, uparams, uparams->benchmark_cmd, ¶m);
> + param.fill_buf.buf_size = DEFAULT_SPAN;
> + param.fill_buf.memflush = 1;
> + param.fill_buf.operation = 0;
> + param.fill_buf.once = false;
> +
> + ret = resctrl_val(test, uparams, ¶m);
> if (ret)
> return ret;
>
> - ret = check_results(DEFAULT_SPAN);
> + ret = check_results(param.fill_buf.buf_size);
> if (ret && (get_vendor() == ARCH_INTEL))
> ksft_print_msg("Intel MBM may be inaccurate when Sub-NUMA Clustering is enabled. Check BIOS configuration.\n");
>
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 0afbc4dd18e4..0e5456165a6a 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -79,6 +79,26 @@ struct resctrl_test {
> void (*cleanup)(void);
> };
>
> +/*
> + * fill_buf_param: "fill_buf" benchmark parameters
> + * @buf_size: Size (in bytes) of buffer used in benchmark.
> + * "fill_buf" allocates and initializes buffer of
> + * @buf_size.
> + * @operation: If 0, then only read operations are performed on
> + * the buffer, if 1 then only write operations are
> + * performed on the buffer.
> + * @memflush: 1 if buffer should be flushed after
> + * allocation and initialization.
> + * @once: Benchmark will perform @operation once if true,
> + * infinitely (until terminated) if false.
> + */
> +struct fill_buf_param {
> + size_t buf_size;
> + int operation;
> + int memflush;
> + int once;
> +};
> +
> /*
> * resctrl_val_param: resctrl test parameters
> * @ctrlgrp: Name of the control monitor group (con_mon grp)
> @@ -87,21 +107,23 @@ struct resctrl_test {
> * @init: Callback function to initialize test environment
> * @setup: Callback function to setup per test run environment
> * @measure: Callback that performs the measurement (a single test)
> + * @fill_buf: Parameters for default "fill_buf" benchmark
> */
> struct resctrl_val_param {
> - const char *ctrlgrp;
> - const char *mongrp;
> - char filename[64];
> - unsigned long mask;
> - int num_of_runs;
> - int (*init)(const struct resctrl_val_param *param,
> - int domain_id);
> - int (*setup)(const struct resctrl_test *test,
> - const struct user_params *uparams,
> - struct resctrl_val_param *param);
> - int (*measure)(const struct user_params *uparams,
> - struct resctrl_val_param *param,
> - pid_t bm_pid);
> + const char *ctrlgrp;
> + const char *mongrp;
> + char filename[64];
> + unsigned long mask;
> + int num_of_runs;
> + int (*init)(const struct resctrl_val_param *param,
> + int domain_id);
> + int (*setup)(const struct resctrl_test *test,
> + const struct user_params *uparams,
> + struct resctrl_val_param *param);
> + int (*measure)(const struct user_params *uparams,
> + struct resctrl_val_param *param,
> + pid_t bm_pid);
> + struct fill_buf_param fill_buf;
> };
>
> struct perf_event_read {
> @@ -151,7 +173,6 @@ void initialize_mem_bw_resctrl(const struct resctrl_val_param *param,
> int domain_id);
> int resctrl_val(const struct resctrl_test *test,
> const struct user_params *uparams,
> - const char * const *benchmark_cmd,
> struct resctrl_val_param *param);
> unsigned long create_bit_mask(unsigned int start, unsigned int len);
> unsigned int count_contiguous_bits(unsigned long val, unsigned int *start);
> diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
> index ecbb7605a981..ce8fcc769d57 100644
> --- a/tools/testing/selftests/resctrl/resctrl_tests.c
> +++ b/tools/testing/selftests/resctrl/resctrl_tests.c
> @@ -162,7 +162,7 @@ int main(int argc, char **argv)
> bool test_param_seen = false;
> struct user_params uparams;
> char *span_str = NULL;
> - int ret, c, i;
> + int c, i;
>
> init_user_params(&uparams);
>
> @@ -257,19 +257,6 @@ int main(int argc, char **argv)
>
> filter_dmesg();
>
> - if (!uparams.benchmark_cmd[0]) {
> - /* If no benchmark is given by "-b" argument, use fill_buf. */
> - uparams.benchmark_cmd[0] = "fill_buf";
> - ret = asprintf(&span_str, "%u", DEFAULT_SPAN);
> - if (ret < 0)
> - ksft_exit_fail_msg("Out of memory!\n");
> - uparams.benchmark_cmd[1] = span_str;
> - uparams.benchmark_cmd[2] = "1";
> - uparams.benchmark_cmd[3] = "0";
> - uparams.benchmark_cmd[4] = "false";
> - uparams.benchmark_cmd[5] = NULL;
> - }
> -
> ksft_set_plan(tests);
>
> for (i = 0; i < ARRAY_SIZE(resctrl_tests); i++)
> diff --git a/tools/testing/selftests/resctrl/resctrl_val.c b/tools/testing/selftests/resctrl/resctrl_val.c
> index 574b72604f95..9a5a9a67e059 100644
> --- a/tools/testing/selftests/resctrl/resctrl_val.c
> +++ b/tools/testing/selftests/resctrl/resctrl_val.c
> @@ -612,21 +612,17 @@ int measure_mem_bw(const struct user_params *uparams,
> * the benchmark
> * @test: test information structure
> * @uparams: user supplied parameters
> - * @benchmark_cmd: benchmark command and its arguments
> * @param: parameters passed to resctrl_val()
> *
> * Return: 0 when the test was run, < 0 on error.
> */
> int resctrl_val(const struct resctrl_test *test,
> const struct user_params *uparams,
> - const char * const *benchmark_cmd,
> struct resctrl_val_param *param)
> {
> - int domain_id, operation = 0, memflush = 1;
> - size_t span = DEFAULT_SPAN;
> unsigned char *buf = NULL;
> cpu_set_t old_affinity;
> - bool once = false;
> + int domain_id;
> int ret = 0;
> pid_t ppid;
>
> @@ -666,21 +662,9 @@ int resctrl_val(const struct resctrl_test *test,
> * how this impacts "write" benchmark, but no test currently
> * uses this.
> */
> - if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
> - span = strtoul(benchmark_cmd[1], NULL, 10);
> - memflush = atoi(benchmark_cmd[2]);
> - operation = atoi(benchmark_cmd[3]);
> - if (!strcmp(benchmark_cmd[4], "true")) {
> - once = true;
> - } else if (!strcmp(benchmark_cmd[4], "false")) {
> - once = false;
> - } else {
> - ksft_print_msg("Invalid once parameter\n");
> - ret = -EINVAL;
> - goto reset_affinity;
> - }
> -
> - buf = alloc_buffer(span, memflush);
> + if (!uparams->benchmark_cmd[0]) {
> + buf = alloc_buffer(param->fill_buf.buf_size,
> + param->fill_buf.memflush);
> if (!buf) {
> ret = -ENOMEM;
> goto reset_affinity;
> @@ -699,13 +683,17 @@ int resctrl_val(const struct resctrl_test *test,
> * terminated.
> */
> if (bm_pid == 0) {
> - if (strcmp(benchmark_cmd[0], "fill_buf") == 0) {
> - if (operation == 0)
> - fill_cache_read(buf, span, once);
> + if (!uparams->benchmark_cmd[0]) {
> + if (param->fill_buf.operation == 0)
> + fill_cache_read(buf,
> + param->fill_buf.buf_size,
> + param->fill_buf.once);
> else
> - fill_cache_write(buf, span, once);
> + fill_cache_write(buf,
> + param->fill_buf.buf_size,
> + param->fill_buf.once);
> } else {
> - execvp(benchmark_cmd[0], (char **)benchmark_cmd);
> + execvp(uparams->benchmark_cmd[0], (char **)uparams->benchmark_cmd);
> }
> exit(EXIT_SUCCESS);
> }
>
If I didn't miss anything important, this change takes away the ability to
alter fill_buf's parameters using the -b option, which to me felt like the
most useful way to use that parameter. The current code of course lacks
many safeguards for that case, but it still felt like a useful feature.
I suggest that while parsing the -b parameter, check if it starts with
"fill_buf", and if it does, parse the arguments into a fill_buf_param in
user_params which will override the default fill_buf parameters.
While parsing, adding new sanity checks wouldn't be a bad idea.
It might be that some parameters are better always overridden by the
tests, e.g. "once", but specifying "operation" (W instead of R) or
"buf_size" seems like an okay use case to me.
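The suggested override could look roughly like the sketch below. The struct mirrors the patch's fill_buf_param; the parser name, argument order, and sanity checks are hypothetical, chosen to follow fill_buf's existing "span memflush operation once" argument convention:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the struct added by this patch. */
struct fill_buf_param {
	size_t buf_size;
	int operation;
	int memflush;
	int once;
};

/*
 * Sketch: parse "fill_buf <size> <memflush> <operation> <once>" argument
 * strings into the struct, with basic sanity checks.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_fill_buf_args(char **argv, int argc, struct fill_buf_param *p)
{
	char *end;

	if (argc < 5 || strcmp(argv[0], "fill_buf") != 0)
		return -1;

	p->buf_size = strtoul(argv[1], &end, 10);
	if (*end != '\0' || p->buf_size == 0)
		return -1;	/* size must be a positive number */

	p->memflush = atoi(argv[2]);
	p->operation = atoi(argv[3]);
	if (p->operation != 0 && p->operation != 1)
		return -1;	/* only read (0) or write (1) exist */

	if (!strcmp(argv[4], "true"))
		p->once = 1;
	else if (!strcmp(argv[4], "false"))
		p->once = 0;
	else
		return -1;

	return 0;
}
```

The -b handling in resctrl_tests.c could then call such a helper when the benchmark name is "fill_buf" and store the result in user_params for the tests to pick up.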
--
i.
* Re: [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing
2024-08-30 11:13 ` Ilpo Järvinen
@ 2024-08-30 16:01 ` Reinette Chatre
0 siblings, 0 replies; 30+ messages in thread
From: Reinette Chatre @ 2024-08-30 16:01 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 8/30/24 4:13 AM, Ilpo Järvinen wrote:
> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
>> The benchmark used during the CMT, MBM, and MBA tests can be provided by
>> the user via (-b) parameter to the tests, if not provided the default
>> "fill_buf" benchmark is used.
>>
>> The "fill_buf" benchmark requires parameters and these are managed as
>> an array of strings.
>>
>> Using an array of strings to manage the "fill_buf" parameters is
>> complex because it requires transformations to/from strings at every
>> producer and consumer. This is made worse for the individual tests
>> where the default benchmark parameters values may not be appropriate and
>> additional data wrangling is required. For example, the CMT test
>> duplicates the entire array of strings in order to replace one of
>> the parameters.
>>
>> Replace the "array of strings" parameters used for "fill_buf" with a
>> struct that contains the "fill_buf" parameters that can be used directly
>> without transformations to/from strings. Make these parameters
>> part of the parameters associated with each test so that each test can
>> set benchmark parameters that are appropriate for it.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
...
>
> If I didn't miss anything important, this change takes away the ability to
> alter fill_buf's parameters using -b option which to me felt the most
> useful way to use that parameter. The current code of course was lacks
> many safeguards for that case but still felt an useful feature.
hmmm ... thank you for pointing this out. I did not consider this use case.
I have never received feedback on how these tests are used and
whether folks even use "-b", be it for "fill_buf" parameter changes or something else.
>
> I suggest that while parsing -b parameter, check if it starts with
> "fill_buf", and if it does, parse the argument into fill_buf_param in
> user_params which will override the default fill_buf parameters.
>
> While parsing, adding new sanity checks wouldn't be a bad idea.
>
> It might be some parameters might be better to be overridden always by the
> tests, e.g. "once" but specifying "operation" (W instead or R) or
> "buf_size" seems okay use cases to me.
>
Will do. Thank you for the suggestion.
Reinette
* [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
` (2 preceding siblings ...)
2024-08-29 22:52 ` [PATCH 3/6] selftests/resctrl: Simplify benchmark parameter passing Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
2024-08-30 11:25 ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth Reinette Chatre
2024-08-29 22:52 ` [PATCH 6/6] selftests/resctrl: Keep results from first test run Reinette Chatre
5 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
By default the MBM and MBA tests use the "fill_buf" benchmark to
read from a buffer with the goal of measuring the memory bandwidth
generated by these buffer accesses.
Care should be taken when sizing the buffer used by the "fill_buf"
benchmark. If the buffer is small enough to fit in the cache then
it cannot be expected that the benchmark will generate much memory
bandwidth. For example, on a system with 320MB L3 cache the existing
hardcoded default of 250MB is insufficient.
Use the measured cache size to determine a buffer size that can be
expected to trigger memory accesses, while keeping the existing default
as the minimum, which has been appropriate for testing so far.
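The sizing rule described above can be sketched as a small standalone helper. This is an illustrative sketch, not the patch itself; the constants mirror DEFAULT_SPAN from the selftest's resctrl.h:

```c
#include <assert.h>
#include <stddef.h>

#define MB (1024UL * 1024UL)
#define DEFAULT_SPAN (250 * MB)	/* historical hardcoded buffer size */

/*
 * Sizing rule from this patch: use twice the measured L3 size when the
 * cache exceeds the historical default, otherwise keep DEFAULT_SPAN.
 */
static size_t fill_buf_size(unsigned long cache_total_size)
{
	return cache_total_size > DEFAULT_SPAN ?
	       cache_total_size * 2 : DEFAULT_SPAN;
}
```

For the 320MB L3 example from the message, this yields a 640MB buffer, which cannot fit in the cache and so forces memory traffic.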
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/mba_test.c | 8 +++++++-
tools/testing/selftests/resctrl/mbm_test.c | 8 +++++++-
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 8ad433495f61..cad473b81a64 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -170,11 +170,17 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
.setup = mba_setup,
.measure = mba_measure,
};
+ unsigned long cache_total_size = 0;
int ret;
remove(RESULT_FILE_NAME);
- param.fill_buf.buf_size = DEFAULT_SPAN;
+ ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
+ if (ret)
+ return ret;
+
+ param.fill_buf.buf_size = cache_total_size > DEFAULT_SPAN ?
+ cache_total_size * 2 : DEFAULT_SPAN;
param.fill_buf.memflush = 1;
param.fill_buf.operation = 0;
param.fill_buf.once = false;
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index b6883f274c74..734bfa4f42b3 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -138,11 +138,17 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
.setup = mbm_setup,
.measure = mbm_measure,
};
+ unsigned long cache_total_size = 0;
int ret;
remove(RESULT_FILE_NAME);
- param.fill_buf.buf_size = DEFAULT_SPAN;
+ ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
+ if (ret)
+ return ret;
+
+ param.fill_buf.buf_size = cache_total_size > DEFAULT_SPAN ?
+ cache_total_size * 2 : DEFAULT_SPAN;
param.fill_buf.memflush = 1;
param.fill_buf.operation = 0;
param.fill_buf.once = false;
--
2.46.0
* Re: [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size
2024-08-29 22:52 ` [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size Reinette Chatre
@ 2024-08-30 11:25 ` Ilpo Järvinen
2024-08-30 16:00 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-08-30 11:25 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 29 Aug 2024, Reinette Chatre wrote:
> By default the MBM and MBA tests use the "fill_buf" benchmark to
> read from a buffer with the goal to measure the memory bandwidth
> generated by this buffer access.
>
> Care should be taken when sizing the buffer used by the "fill_buf"
> benchmark. If the buffer is small enough to fit in the cache then
> it cannot be expected that the benchmark will generate much memory
> bandwidth. For example, on a system with 320MB L3 cache the existing
> hardcoded default of 250MB is insufficient.
>
> Use the measured cache size to determine a buffer size that can be
> expected to trigger memory access while keeping the existing default
> as minimum that has been appropriate for testing so far.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> tools/testing/selftests/resctrl/mba_test.c | 8 +++++++-
> tools/testing/selftests/resctrl/mbm_test.c | 8 +++++++-
> 2 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index 8ad433495f61..cad473b81a64 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -170,11 +170,17 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
> .setup = mba_setup,
> .measure = mba_measure,
> };
> + unsigned long cache_total_size = 0;
> int ret;
>
> remove(RESULT_FILE_NAME);
>
> - param.fill_buf.buf_size = DEFAULT_SPAN;
> + ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
> + if (ret)
> + return ret;
> +
> + param.fill_buf.buf_size = cache_total_size > DEFAULT_SPAN ?
> + cache_total_size * 2 : DEFAULT_SPAN;
Should the check leave a bit of a safeguard so that buf_size is at
least 2x (or 1.25x or some other factor) the cache size?
Here buf_size immediately jumps from 1x -> 2x when cache_total_size goes
from DEFAULT_SPAN to DEFAULT_SPAN+1 (obviously the L3 size won't be an odd
number like that, but I think you get my point).
Also, user might want to override this as mentioned in my reply to the
previous patch.
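One way to provide such a margin is to always size the buffer to twice the cache, with DEFAULT_SPAN as a floor; then the result grows smoothly past the floor instead of jumping. A sketch under the same DEFAULT_SPAN assumption (the helper name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define MB (1024UL * 1024UL)
#define DEFAULT_SPAN (250 * MB)

/*
 * Hypothetical variant with a guaranteed 2x margin over the cache:
 * max(2 * cache_total_size, DEFAULT_SPAN), so buf_size never sits
 * only marginally above the cache size.
 */
static size_t fill_buf_size_with_margin(unsigned long cache_total_size)
{
	size_t span = 2 * cache_total_size;

	return span > DEFAULT_SPAN ? span : DEFAULT_SPAN;
}
```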
--
i.
> param.fill_buf.memflush = 1;
> param.fill_buf.operation = 0;
> param.fill_buf.once = false;
> diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
> index b6883f274c74..734bfa4f42b3 100644
> --- a/tools/testing/selftests/resctrl/mbm_test.c
> +++ b/tools/testing/selftests/resctrl/mbm_test.c
> @@ -138,11 +138,17 @@ static int mbm_run_test(const struct resctrl_test *test, const struct user_param
> .setup = mbm_setup,
> .measure = mbm_measure,
> };
> + unsigned long cache_total_size = 0;
> int ret;
>
> remove(RESULT_FILE_NAME);
>
> - param.fill_buf.buf_size = DEFAULT_SPAN;
> + ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
> + if (ret)
> + return ret;
> +
> + param.fill_buf.buf_size = cache_total_size > DEFAULT_SPAN ?
> + cache_total_size * 2 : DEFAULT_SPAN;
> param.fill_buf.memflush = 1;
> param.fill_buf.operation = 0;
> param.fill_buf.once = false;
>
* Re: [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size
2024-08-30 11:25 ` Ilpo Järvinen
@ 2024-08-30 16:00 ` Reinette Chatre
0 siblings, 0 replies; 30+ messages in thread
From: Reinette Chatre @ 2024-08-30 16:00 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 8/30/24 4:25 AM, Ilpo Järvinen wrote:
> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
>> By default the MBM and MBA tests use the "fill_buf" benchmark to
>> read from a buffer with the goal to measure the memory bandwidth
>> generated by this buffer access.
>>
>> Care should be taken when sizing the buffer used by the "fill_buf"
>> benchmark. If the buffer is small enough to fit in the cache then
>> it cannot be expected that the benchmark will generate much memory
>> bandwidth. For example, on a system with 320MB L3 cache the existing
>> hardcoded default of 250MB is insufficient.
>>
>> Use the measured cache size to determine a buffer size that can be
>> expected to trigger memory access while keeping the existing default
>> as minimum that has been appropriate for testing so far.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> tools/testing/selftests/resctrl/mba_test.c | 8 +++++++-
>> tools/testing/selftests/resctrl/mbm_test.c | 8 +++++++-
>> 2 files changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
>> index 8ad433495f61..cad473b81a64 100644
>> --- a/tools/testing/selftests/resctrl/mba_test.c
>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>> @@ -170,11 +170,17 @@ static int mba_run_test(const struct resctrl_test *test, const struct user_param
>> .setup = mba_setup,
>> .measure = mba_measure,
>> };
>> + unsigned long cache_total_size = 0;
>> int ret;
>>
>> remove(RESULT_FILE_NAME);
>>
>> - param.fill_buf.buf_size = DEFAULT_SPAN;
>> + ret = get_cache_size(uparams->cpu, "L3", &cache_total_size);
>> + if (ret)
>> + return ret;
>> +
>> + param.fill_buf.buf_size = cache_total_size > DEFAULT_SPAN ?
>> + cache_total_size * 2 : DEFAULT_SPAN;
>
> Should the check leave a bit of safeguard so that the buf_size is at
> least 2x (or x1.25 or some other factor)?
>
> In here buf_size immediate jumps from 1x -> 2x when cache_total_size goes
> from DEFAULT_SPAN to DEFAULT_SPAN+1 (obviously L3 size won't be odd like
> that but I think you get my point).
Good catch. Will fix.
>
> Also, user might want to override this as mentioned in my reply to the
> previous patch.
>
ack.
Reinette
* [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
` (3 preceding siblings ...)
2024-08-29 22:52 ` [PATCH 4/6] selftests/resctrl: Use cache size to determine "fill_buf" buffer size Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
2024-08-30 11:42 ` Ilpo Järvinen
2024-08-29 22:52 ` [PATCH 6/6] selftests/resctrl: Keep results from first test run Reinette Chatre
5 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
The MBA test incrementally throttles memory bandwidth, each time
followed by a comparison between the memory bandwidth observed
by the performance counters and resctrl respectively.
While a comparison between performance counters and resctrl is
generally appropriate, they do not have an identical view of
memory bandwidth. For example RAS features or memory performance
features that generate memory traffic may drive accesses that are
counted differently by performance counters and MBM respectively,
for instance generating "overhead" traffic which is not counted
against any specific RMID. As a ratio, this different view of memory
bandwidth becomes more apparent at low memory bandwidths.
It is not practical to enable/disable the various features that
may generate memory bandwidth to give performance counters and
resctrl an identical view. Instead, do not compare performance
counters and resctrl view of memory bandwidth when the memory
bandwidth is low.
Bandwidth throttling behaves differently across platforms
so it is not appropriate to drop measurement data simply based
on the throttling level. Instead, use a threshold of 750MiB
that has been observed to support adequate comparison between
performance counters and resctrl.
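The resulting acceptance rule can be stated as a tiny predicate. The patch itself open-codes this check in show_mba_info(); the helper name here is mine:

```c
#include <assert.h>
#include <stdbool.h>

#define THROTTLE_THRESHOLD 750	/* MiB, as defined by this patch */

/*
 * The iMC vs. resctrl bandwidth comparison is only considered reliable
 * when both averages are at or above the threshold; below it the MBA
 * test drops the remaining (more throttled) measurements.
 */
static bool bw_comparison_reliable(unsigned long avg_bw_imc,
				   unsigned long avg_bw_resc)
{
	return avg_bw_imc >= THROTTLE_THRESHOLD &&
	       avg_bw_resc >= THROTTLE_THRESHOLD;
}
```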
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
2 files changed, 13 insertions(+)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index cad473b81a64..204b9ac4b108 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+ if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
+ ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
+ THROTTLE_THRESHOLD,
+ ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+ break;
+ }
+
avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
avg_diff_per = (int)(avg_diff * 100);
diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
index 0e5456165a6a..e65c5fb76b17 100644
--- a/tools/testing/selftests/resctrl/resctrl.h
+++ b/tools/testing/selftests/resctrl/resctrl.h
@@ -43,6 +43,12 @@
#define DEFAULT_SPAN (250 * MB)
+/*
+ * Memory bandwidth (in MiB) below which the bandwidth comparisons
+ * between iMC and resctrl are considered unreliable.
+ */
+#define THROTTLE_THRESHOLD 750
+
/*
* user_params: User supplied parameters
* @cpu: CPU number to which the benchmark will be bound to
--
2.46.0
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-08-29 22:52 ` [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth Reinette Chatre
@ 2024-08-30 11:42 ` Ilpo Järvinen
2024-08-30 16:00 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-08-30 11:42 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 29 Aug 2024, Reinette Chatre wrote:
> The MBA test incrementally throttles memory bandwidth, each time
> followed by a comparison between the memory bandwidth observed
> by the performance counters and resctrl respectively.
>
> While a comparison between performance counters and resctrl is
> generally appropriate, they do not have an identical view of
> memory bandwidth. For example RAS features or memory performance
> features that generate memory traffic may drive accesses that are
> counted differently by performance counters and MBM respectively,
> for instance generating "overhead" traffic which is not counted
> against any specific RMID. As a ratio, this different view of memory
> bandwidth becomes more apparent at low memory bandwidths.
Interesting.
Some time back I prototyped a change to the MBM test such that, instead
of using once=false, I changed fill_buf to be able to run N passes through
the buffer, which allowed me to know how many reads were performed by the
benchmark. This yielded numerical differences between all three values
(# of reads, MBM, perf) which also varied from one arch to another, so it
didn't end up making a usable test.
I guess I now have an explanation for at least a part of the differences.
> It is not practical to enable/disable the various features that
> may generate memory bandwidth to give performance counters and
> resctrl an identical view. Instead, do not compare performance
> counters and resctrl view of memory bandwidth when the memory
> bandwidth is low.
>
> Bandwidth throttling behaves differently across platforms
> so it is not appropriate to drop measurement data simply based
> on the throttling level. Instead, use a threshold of 750MiB
> that has been observed to support adequate comparison between
> performance counters and resctrl.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index cad473b81a64..204b9ac4b108 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
>
> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
> + ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
> + THROTTLE_THRESHOLD,
> + ALLOCATION_MAX - ALLOCATION_STEP * allocation);
The second one too should be %d.
--
i.
> + break;
> + }
> +
> avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
> avg_diff_per = (int)(avg_diff * 100);
>
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 0e5456165a6a..e65c5fb76b17 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -43,6 +43,12 @@
>
> #define DEFAULT_SPAN (250 * MB)
>
> +/*
> + * Memory bandwidth (in MiB) below which the bandwidth comparisons
> + * between iMC and resctrl are considered unreliable.
> + */
> +#define THROTTLE_THRESHOLD 750
> +
> /*
> * user_params: User supplied parameters
> * @cpu: CPU number to which the benchmark will be bound to
>
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-08-30 11:42 ` Ilpo Järvinen
@ 2024-08-30 16:00 ` Reinette Chatre
2024-09-04 11:43 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-08-30 16:00 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>
>> The MBA test incrementally throttles memory bandwidth, each time
>> followed by a comparison between the memory bandwidth observed
>> by the performance counters and resctrl respectively.
>>
>> While a comparison between performance counters and resctrl is
>> generally appropriate, they do not have an identical view of
>> memory bandwidth. For example RAS features or memory performance
>> features that generate memory traffic may drive accesses that are
>> counted differently by performance counters and MBM respectively,
>> for instance generating "overhead" traffic which is not counted
>> against any specific RMID. As a ratio, this different view of memory
>> bandwidth becomes more apparent at low memory bandwidths.
>
> Interesting.
>
> I did some time back prototype with a change to MBM test such that instead
> of using once=false I changed fill_buf to be able to run N passes through
> the buffer which allowed me to know how many reads were performed by the
> benchmark. This yielded numerical difference between all those 3 values
> (# of reads, MBM, perf) which also varied from arch to another so it
> didn't end up making an usable test.
>
> I guess I now have an explanation for at least a part of the differences.
>
>> It is not practical to enable/disable the various features that
>> may generate memory bandwidth to give performance counters and
>> resctrl an identical view. Instead, do not compare performance
>> counters and resctrl view of memory bandwidth when the memory
>> bandwidth is low.
>>
>> Bandwidth throttling behaves differently across platforms
>> so it is not appropriate to drop measurement data simply based
>> on the throttling level. Instead, use a threshold of 750MiB
>> that has been observed to support adequate comparison between
>> performance counters and resctrl.
>>
>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>> ---
>> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
>> 2 files changed, 13 insertions(+)
>>
>> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
>> index cad473b81a64..204b9ac4b108 100644
>> --- a/tools/testing/selftests/resctrl/mba_test.c
>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
>>
>> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
>> + ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
>> + THROTTLE_THRESHOLD,
>> + ALLOCATION_MAX - ALLOCATION_STEP * allocation);
>
> The second one too should be %d.
>
hmmm ... I intended to have it be consistent with the ksft_print_msg() that
follows. Perhaps allocation can be made unsigned instead?
Reinette
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-08-30 16:00 ` Reinette Chatre
@ 2024-09-04 11:43 ` Ilpo Järvinen
2024-09-04 21:15 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-04 11:43 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Fri, 30 Aug 2024, Reinette Chatre wrote:
> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> >
> > > The MBA test incrementally throttles memory bandwidth, each time
> > > followed by a comparison between the memory bandwidth observed
> > > by the performance counters and resctrl respectively.
> > >
> > > While a comparison between performance counters and resctrl is
> > > generally appropriate, they do not have an identical view of
> > > memory bandwidth. For example RAS features or memory performance
> > > features that generate memory traffic may drive accesses that are
> > > counted differently by performance counters and MBM respectively,
> > > for instance generating "overhead" traffic which is not counted
> > > against any specific RMID. As a ratio, this different view of memory
> > > bandwidth becomes more apparent at low memory bandwidths.
> >
> > Interesting.
> >
> > I did some time back prototype with a change to MBM test such that instead
> > of using once=false I changed fill_buf to be able to run N passes through
> > the buffer which allowed me to know how many reads were performed by the
> > benchmark. This yielded numerical difference between all those 3 values
> > (# of reads, MBM, perf) which also varied from arch to another so it
> > didn't end up making an usable test.
> >
> > I guess I now have an explanation for at least a part of the differences.
> >
> > > It is not practical to enable/disable the various features that
> > > may generate memory bandwidth to give performance counters and
> > > resctrl an identical view. Instead, do not compare performance
> > > counters and resctrl view of memory bandwidth when the memory
> > > bandwidth is low.
> > >
> > > Bandwidth throttling behaves differently across platforms
> > > so it is not appropriate to drop measurement data simply based
> > > on the throttling level. Instead, use a threshold of 750MiB
> > > that has been observed to support adequate comparison between
> > > performance counters and resctrl.
> > >
> > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > ---
> > > tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> > > tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
> > > 2 files changed, 13 insertions(+)
> > >
> > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > b/tools/testing/selftests/resctrl/mba_test.c
> > > index cad473b81a64..204b9ac4b108 100644
> > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
> > > unsigned long *bw_resc)
> > > avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> > > avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
> > > THROTTLE_THRESHOLD) {
> > > + ksft_print_msg("Bandwidth below threshold (%d MiB).
> > > Dropping results from MBA schemata %u.\n",
> > > + THROTTLE_THRESHOLD,
> > > + ALLOCATION_MAX - ALLOCATION_STEP *
> > > allocation);
> >
> > The second one too should be %d.
> >
>
> hmmm ... I intended to have it be consistent with the ksft_print_msg() that
> follows. Perhaps allocation can be made unsigned instead?
If you go that way, then it would be good to make the related defines and
allocation in mba_setup() unsigned too, although the latter is a bit scary
because it does allocation -= ALLOCATION_STEP which could underflow if the
defines are ever changed.
--
i.
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-04 11:43 ` Ilpo Järvinen
@ 2024-09-04 21:15 ` Reinette Chatre
2024-09-05 11:45 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-04 21:15 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>
>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>> followed by a comparison between the memory bandwidth observed
>>>> by the performance counters and resctrl respectively.
>>>>
>>>> While a comparison between performance counters and resctrl is
>>>> generally appropriate, they do not have an identical view of
>>>> memory bandwidth. For example RAS features or memory performance
>>>> features that generate memory traffic may drive accesses that are
>>>> counted differently by performance counters and MBM respectively,
>>>> for instance generating "overhead" traffic which is not counted
>>>> against any specific RMID. As a ratio, this different view of memory
>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>
>>> Interesting.
>>>
>>> I did some time back prototype with a change to MBM test such that instead
>>> of using once=false I changed fill_buf to be able to run N passes through
>>> the buffer which allowed me to know how many reads were performed by the
>>> benchmark. This yielded numerical difference between all those 3 values
>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>> didn't end up making an usable test.
>>>
>>> I guess I now have an explanation for at least a part of the differences.
>>>
>>>> It is not practical to enable/disable the various features that
>>>> may generate memory bandwidth to give performance counters and
>>>> resctrl an identical view. Instead, do not compare performance
>>>> counters and resctrl view of memory bandwidth when the memory
>>>> bandwidth is low.
>>>>
>>>> Bandwidth throttling behaves differently across platforms
>>>> so it is not appropriate to drop measurement data simply based
>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>> that has been observed to support adequate comparison between
>>>> performance counters and resctrl.
>>>>
>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>> ---
>>>> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
>>>> 2 files changed, 13 insertions(+)
>>>>
>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>> index cad473b81a64..204b9ac4b108 100644
>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
>>>> unsigned long *bw_resc)
>>>> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>> THROTTLE_THRESHOLD) {
>>>> + ksft_print_msg("Bandwidth below threshold (%d MiB).
>>>> Dropping results from MBA schemata %u.\n",
>>>> + THROTTLE_THRESHOLD,
>>>> + ALLOCATION_MAX - ALLOCATION_STEP *
>>>> allocation);
>>>
>>> The second one too should be %d.
>>>
>>
>> hmmm ... I intended to have it be consistent with the ksft_print_msg() that
>> follows. Perhaps allocation can be made unsigned instead?
>
> If you go that way, then it would be good to make the related defines and
> allocation in mba_setup() unsigned too, although the latter is a bit scary
Sure, will look into that.
> because it does allocation -= ALLOCATION_STEP which could underflow if the
> defines are ever changed.
>
Is this not already covered in the following check:
if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
return END_OF_TESTS;
We are talking about hardcoded constants though.
Reinette
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-04 21:15 ` Reinette Chatre
@ 2024-09-05 11:45 ` Ilpo Järvinen
2024-09-05 18:08 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-05 11:45 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Wed, 4 Sep 2024, Reinette Chatre wrote:
> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > >
> > > > > The MBA test incrementally throttles memory bandwidth, each time
> > > > > followed by a comparison between the memory bandwidth observed
> > > > > by the performance counters and resctrl respectively.
> > > > >
> > > > > While a comparison between performance counters and resctrl is
> > > > > generally appropriate, they do not have an identical view of
> > > > > memory bandwidth. For example RAS features or memory performance
> > > > > features that generate memory traffic may drive accesses that are
> > > > > counted differently by performance counters and MBM respectively,
> > > > > for instance generating "overhead" traffic which is not counted
> > > > > against any specific RMID. As a ratio, this different view of memory
> > > > > bandwidth becomes more apparent at low memory bandwidths.
> > > >
> > > > Interesting.
> > > >
> > > > I did some time back prototype with a change to MBM test such that
> > > > instead
> > > > of using once=false I changed fill_buf to be able to run N passes
> > > > through
> > > > the buffer which allowed me to know how many reads were performed by the
> > > > benchmark. This yielded numerical difference between all those 3 values
> > > > (# of reads, MBM, perf) which also varied from arch to another so it
> > > > didn't end up making an usable test.
> > > >
> > > > I guess I now have an explanation for at least a part of the
> > > > differences.
> > > >
> > > > > It is not practical to enable/disable the various features that
> > > > > may generate memory bandwidth to give performance counters and
> > > > > resctrl an identical view. Instead, do not compare performance
> > > > > counters and resctrl view of memory bandwidth when the memory
> > > > > bandwidth is low.
> > > > >
> > > > > Bandwidth throttling behaves differently across platforms
> > > > > so it is not appropriate to drop measurement data simply based
> > > > > on the throttling level. Instead, use a threshold of 750MiB
> > > > > that has been observed to support adequate comparison between
> > > > > performance counters and resctrl.
> > > > >
> > > > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > > > ---
> > > > > tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> > > > > tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
> > > > > 2 files changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > > > b/tools/testing/selftests/resctrl/mba_test.c
> > > > > index cad473b81a64..204b9ac4b108 100644
> > > > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
> > > > > unsigned long *bw_resc)
> > > > > avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> > > > > avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > > > + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
> > > > > THROTTLE_THRESHOLD) {
> > > > > + ksft_print_msg("Bandwidth below threshold (%d
> > > > > MiB).
> > > > > Dropping results from MBA schemata %u.\n",
> > > > > + THROTTLE_THRESHOLD,
> > > > > + ALLOCATION_MAX -
> > > > > ALLOCATION_STEP *
> > > > > allocation);
> > > >
> > > > The second one too should be %d.
> > > >
> > >
> > > hmmm ... I intended to have it be consistent with the ksft_print_msg()
> > > that
> > > follows. Perhaps allocation can be made unsigned instead?
> >
> > If you go that way, then it would be good to make the related defines and
> > allocation in mba_setup() unsigned too, although the latter is a bit scary
>
> Sure, will look into that.
>
> > because it does allocation -= ALLOCATION_STEP which could underflow if the
> > defines are ever changed.
> >
>
> Is this not already covered in the following check:
> if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> return END_OF_TESTS;
>
> We are talking about hardcoded constants though.
Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
it's also very non-intuitive to let the value underflow and then check for
that with the > operator.
You're correct that they're constants so one would need to tweak the
source to end up into this condition in the first place.
Perhaps I'm being overly pedantic here, but in general I don't like
leaving trappy and non-obvious logic like that lying around because one
day one of them will come back to bite us.
So, if a variable is unsigned and we ever do subtraction (or adding
negative numbers to it), I'd prefer additional check to prevent ever
underflowing it unexpectedly. Or leave them signed.
--
i.
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-05 11:45 ` Ilpo Järvinen
@ 2024-09-05 18:08 ` Reinette Chatre
2024-09-06 8:44 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-05 18:08 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>
>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>> by the performance counters and resctrl respectively.
>>>>>>
>>>>>> While a comparison between performance counters and resctrl is
>>>>>> generally appropriate, they do not have an identical view of
>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>> features that generate memory traffic may drive accesses that are
>>>>>> counted differently by performance counters and MBM respectively,
>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>> against any specific RMID. As a ratio, this different view of memory
>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>
>>>>> Interesting.
>>>>>
>>>>> I did some time back prototype with a change to MBM test such that
>>>>> instead
>>>>> of using once=false I changed fill_buf to be able to run N passes
>>>>> through
>>>>> the buffer which allowed me to know how many reads were performed by the
>>>>> benchmark. This yielded numerical difference between all those 3 values
>>>>> (# of reads, MBM, perf) which also varied from arch to another so it
>>>>> didn't end up making an usable test.
>>>>>
>>>>> I guess I now have an explanation for at least a part of the
>>>>> differences.
>>>>>
>>>>>> It is not practical to enable/disable the various features that
>>>>>> may generate memory bandwidth to give performance counters and
>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>> bandwidth is low.
>>>>>>
>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>> that has been observed to support adequate comparison between
>>>>>> performance counters and resctrl.
>>>>>>
>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>> ---
>>>>>> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
>>>>>> 2 files changed, 13 insertions(+)
>>>>>>
>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc,
>>>>>> unsigned long *bw_resc)
>>>>>> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>> THROTTLE_THRESHOLD) {
>>>>>> + ksft_print_msg("Bandwidth below threshold (%d
>>>>>> MiB).
>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>> + THROTTLE_THRESHOLD,
>>>>>> + ALLOCATION_MAX -
>>>>>> ALLOCATION_STEP *
>>>>>> allocation);
>>>>>
>>>>> The second one too should be %d.
>>>>>
>>>>
>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>> that
>>>> follows. Perhaps allocation can be made unsigned instead?
>>>
>>> If you go that way, then it would be good to make the related defines and
>>> allocation in mba_setup() unsigned too, although the latter is a bit scary
>>
>> Sure, will look into that.
>>
>>> because it does allocation -= ALLOCATION_STEP which could underflow if the
>>> defines are ever changed.
>>>
>>
>> Is this not already covered in the following check:
>> if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>> return END_OF_TESTS;
>>
>> We are talking about hardcoded constants though.
>
> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
> it's also very non-intuitive to let the value underflow and then check for
> that with the > operator.
My understanding is that this is the traditional way of checking overflow
(or more accurately wraparound). There are several examples of this pattern
in the kernel and it is also the pattern that I understand Linus referred
to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
do checking in this way (perform the subtraction and then check if it
overflowed) [2].
>
> You're correct that they're constants so one would need to tweak the
> source to end up into this condition in the first place.
>
> Perhaps I'm being overly pendantic here but I in general don't like
> leaving trappy and non-obvious logic like that lying around because one
> day one of such will come back biting.
It is not clear to me what is "trappy" about this. The current checks will
catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>
> So, if a variable is unsigned and we ever do subtraction (or adding
> negative numbers to it), I'd prefer additional check to prevent ever
> underflowing it unexpectedly. Or leave them signed.
To support checking at the time of subtraction we either need to change the
flow of that function, since it cannot currently exit with failure when that
subtraction overflows/wraps around, or we need to introduce more state to
support an additional check that the existing
"if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
would have caught anyway.
In either case, to do this checking at the time of subtraction the ideal way
would be to use check_sub_overflow() ... but it again does exactly what
you find to be non-intuitive, since the implementation in
tools/include/linux/overflow.h uses the gcc intrinsic, which does the
subtraction first and then checks whether overflow occurred.
It is not clear to me what problem you are aiming to solve here. If the
major concern is that the current logic is not obvious, perhaps it can
be clarified with a comment as below:
if (runs_per_allocation++ != 0)
return 0;
+ /*
+ * Do not attempt allocation outside valid range. Safeguard
+ * against any potential wraparound caused by subtraction.
+ */
if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
return END_OF_TESTS;
Reinette
[1] https://lwn.net/ml/linux-kernel/CAHk-=whS7FSbBoo1gxe+83twO2JeGNsUKMhAcfWymw9auqBvjg@mail.gmail.com/
[2] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-05 18:08 ` Reinette Chatre
@ 2024-09-06 8:44 ` Ilpo Järvinen
2024-09-07 0:05 ` Reinette Chatre
0 siblings, 1 reply; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-06 8:44 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Thu, 5 Sep 2024, Reinette Chatre wrote:
> On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
> > On Wed, 4 Sep 2024, Reinette Chatre wrote:
> > > On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
> > > > On Fri, 30 Aug 2024, Reinette Chatre wrote:
> > > > > On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
> > > > > > On Thu, 29 Aug 2024, Reinette Chatre wrote:
> > > > > >
> > > > > > > The MBA test incrementally throttles memory bandwidth, each time
> > > > > > > followed by a comparison between the memory bandwidth observed
> > > > > > > by the performance counters and resctrl respectively.
> > > > > > >
> > > > > > > While a comparison between performance counters and resctrl is
> > > > > > > generally appropriate, they do not have an identical view of
> > > > > > > memory bandwidth. For example RAS features or memory performance
> > > > > > > features that generate memory traffic may drive accesses that are
> > > > > > > counted differently by performance counters and MBM respectively,
> > > > > > > for instance generating "overhead" traffic which is not counted
> > > > > > > against any specific RMID. As a ratio, this different view of
> > > > > > > memory
> > > > > > > bandwidth becomes more apparent at low memory bandwidths.
> > > > > >
> > > > > > Interesting.
> > > > > >
> > > > > > I did some time back prototype with a change to MBM test such that
> > > > > > instead
> > > > > > of using once=false I changed fill_buf to be able to run N passes
> > > > > > through
> > > > > > the buffer which allowed me to know how many reads were performed by
> > > > > > the
> > > > > > benchmark. This yielded numerical difference between all those 3
> > > > > > values
> > > > > > (# of reads, MBM, perf) which also varied from arch to another so it
> > > > > > didn't end up making an usable test.
> > > > > >
> > > > > > I guess I now have an explanation for at least a part of the
> > > > > > differences.
> > > > > >
> > > > > > > It is not practical to enable/disable the various features that
> > > > > > > may generate memory bandwidth to give performance counters and
> > > > > > > resctrl an identical view. Instead, do not compare performance
> > > > > > > counters and resctrl view of memory bandwidth when the memory
> > > > > > > bandwidth is low.
> > > > > > >
> > > > > > > Bandwidth throttling behaves differently across platforms
> > > > > > > so it is not appropriate to drop measurement data simply based
> > > > > > > on the throttling level. Instead, use a threshold of 750MiB
> > > > > > > that has been observed to support adequate comparison between
> > > > > > > performance counters and resctrl.
> > > > > > >
> > > > > > > Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> > > > > > > ---
> > > > > > > tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> > > > > > > tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
> > > > > > > 2 files changed, 13 insertions(+)
> > > > > > >
> > > > > > > diff --git a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > index cad473b81a64..204b9ac4b108 100644
> > > > > > > --- a/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > +++ b/tools/testing/selftests/resctrl/mba_test.c
> > > > > > > @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
> > > > > > > *bw_imc,
> > > > > > > unsigned long *bw_resc)
> > > > > > > avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> > > > > > > avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> > > > > > > + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
> > > > > > > THROTTLE_THRESHOLD) {
> > > > > > > + ksft_print_msg("Bandwidth below threshold (%d
> > > > > > > MiB).
> > > > > > > Dropping results from MBA schemata %u.\n",
> > > > > > > + THROTTLE_THRESHOLD,
> > > > > > > + ALLOCATION_MAX -
> > > > > > > ALLOCATION_STEP *
> > > > > > > allocation);
> > > > > >
> > > > > > The second one too should be %d.
> > > > > >
> > > > >
> > > > > hmmm ... I intended to have it be consistent with the ksft_print_msg()
> > > > > that
> > > > > follows. Perhaps allocation can be made unsigned instead?
> > > >
> > > > If you go that way, then it would be good to make the related defines
> > > > and
> > > > allocation in mba_setup() unsigned too, although the latter is a bit
> > > > scary
> > >
> > > Sure, will look into that.
> > >
> > > > because it does allocation -= ALLOCATION_STEP which could underflow if
> > > > the
> > > > defines are ever changed.
> > > >
> > >
> > > Is this not already covered in the following check:
> > > if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> > > return END_OF_TESTS;
> > >
> > > We are talking about hardcoded constants though.
> >
> > Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
> > it's also very non-intuitive to let the value underflow and then check for
> > that with the > operator.
>
> My understanding is that this is the traditional way of checking overflow
> (or more accurately wraparound). There are several examples of this pattern
> in the kernel and it is also the pattern that I understand Linus referred
> to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
> do checking in this way (perform the subtraction and then check if it
> overflowed) [2].
Fair enough. I've never come across that kind of claim before.
> > You're correct that they're constants so one would need to tweak the
> > source to end up into this condition in the first place.
> >
> > Perhaps I'm being overly pedantic here, but in general I don't like
> > leaving trappy and non-obvious logic like that lying around because
> > one day one of them will come back to bite us.
>
> It is not clear to me what is "trappy" about this. The current checks will
> catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>
> > So, if a variable is unsigned and we ever do subtraction (or adding
> > negative numbers to it), I'd prefer additional check to prevent ever
> > underflowing it unexpectedly. Or leave them signed.
>
> To support checking at the time of subtraction we either need to change the
> flow of that function since it cannot exit with failure if that subtraction
> fails because of overflow/wraparound, or we need to introduce more state that
> will be an additional check that the existing
> "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
> would have caught anyway.
>
> In either case, to do this checking at the time of subtraction the ideal way
> would be to use check_sub_overflow() ... but it again does exactly what
> you find to be non-intuitive since the implementation in
> tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
> first and then checks if overflow occurred.
It's trappy because at a glance that check looks unnecessary, since
allocation starts from the max and goes downwards. Also worth noting:
the check is not immediately after the decrement but is done on the
next iteration.
The risk here is that somebody removes allocation > ALLOCATION_MAX check.
Something called check_sub_overflow() is not subject to a similar risk
regardless of what operations occur inside it.
> It is not clear to me what problem you are aiming to solve here. If the
> major concern is that the current logic is not obvious, perhaps it can
> be clarified with a comment as below:
>
> if (runs_per_allocation++ != 0)
> return 0;
> + /*
> + * Do not attempt allocation outside valid range. Safeguard
> + * against any potential wraparound caused by subtraction.
> + */
> if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> return END_OF_TESTS;
That would probably help, but then it seems Linus is against such
attempts and considers this hole in the cheese (i.e., representing
something that is clearly a negative number with a positive number)
"traditional".
--
i.
^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-06 8:44 ` Ilpo Järvinen
@ 2024-09-07 0:05 ` Reinette Chatre
2024-09-09 8:13 ` Ilpo Järvinen
0 siblings, 1 reply; 30+ messages in thread
From: Reinette Chatre @ 2024-09-07 0:05 UTC (permalink / raw)
To: Ilpo Järvinen
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
Hi Ilpo,
On 9/6/24 1:44 AM, Ilpo Järvinen wrote:
> On Thu, 5 Sep 2024, Reinette Chatre wrote:
>> On 9/5/24 4:45 AM, Ilpo Järvinen wrote:
>>> On Wed, 4 Sep 2024, Reinette Chatre wrote:
>>>> On 9/4/24 4:43 AM, Ilpo Järvinen wrote:
>>>>> On Fri, 30 Aug 2024, Reinette Chatre wrote:
>>>>>> On 8/30/24 4:42 AM, Ilpo Järvinen wrote:
>>>>>>> On Thu, 29 Aug 2024, Reinette Chatre wrote:
>>>>>>>
>>>>>>>> The MBA test incrementally throttles memory bandwidth, each time
>>>>>>>> followed by a comparison between the memory bandwidth observed
>>>>>>>> by the performance counters and resctrl respectively.
>>>>>>>>
>>>>>>>> While a comparison between performance counters and resctrl is
>>>>>>>> generally appropriate, they do not have an identical view of
>>>>>>>> memory bandwidth. For example RAS features or memory performance
>>>>>>>> features that generate memory traffic may drive accesses that are
>>>>>>>> counted differently by performance counters and MBM respectively,
>>>>>>>> for instance generating "overhead" traffic which is not counted
>>>>>>>> against any specific RMID. As a ratio, this different view of
>>>>>>>> memory
>>>>>>>> bandwidth becomes more apparent at low memory bandwidths.
>>>>>>>
>>>>>>> Interesting.
>>>>>>>
>>>>>>> Some time back I prototyped a change to the MBM test such that
>>>>>>> instead of using once=false I changed fill_buf to be able to run
>>>>>>> N passes through the buffer, which allowed me to know how many
>>>>>>> reads were performed by the benchmark. This yielded numerical
>>>>>>> differences between all those 3 values (# of reads, MBM, perf)
>>>>>>> which also varied from one arch to another, so it didn't end up
>>>>>>> making a usable test.
>>>>>>>
>>>>>>> I guess I now have an explanation for at least a part of the
>>>>>>> differences.
>>>>>>>
>>>>>>>> It is not practical to enable/disable the various features that
>>>>>>>> may generate memory bandwidth to give performance counters and
>>>>>>>> resctrl an identical view. Instead, do not compare performance
>>>>>>>> counters and resctrl view of memory bandwidth when the memory
>>>>>>>> bandwidth is low.
>>>>>>>>
>>>>>>>> Bandwidth throttling behaves differently across platforms
>>>>>>>> so it is not appropriate to drop measurement data simply based
>>>>>>>> on the throttling level. Instead, use a threshold of 750MiB
>>>>>>>> that has been observed to support adequate comparison between
>>>>>>>> performance counters and resctrl.
>>>>>>>>
>>>>>>>> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
>>>>>>>> ---
>>>>>>>> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
>>>>>>>> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
>>>>>>>> 2 files changed, 13 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> index cad473b81a64..204b9ac4b108 100644
>>>>>>>> --- a/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> +++ b/tools/testing/selftests/resctrl/mba_test.c
>>>>>>>> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long
>>>>>>>> *bw_imc,
>>>>>>>> unsigned long *bw_resc)
>>>>>>>> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
>>>>>>>> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
>>>>>>>> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc <
>>>>>>>> THROTTLE_THRESHOLD) {
>>>>>>>> + ksft_print_msg("Bandwidth below threshold (%d
>>>>>>>> MiB).
>>>>>>>> Dropping results from MBA schemata %u.\n",
>>>>>>>> + THROTTLE_THRESHOLD,
>>>>>>>> + ALLOCATION_MAX -
>>>>>>>> ALLOCATION_STEP *
>>>>>>>> allocation);
>>>>>>>
>>>>>>> The second one too should be %d.
>>>>>>>
>>>>>>
>>>>>> hmmm ... I intended to have it be consistent with the ksft_print_msg()
>>>>>> that
>>>>>> follows. Perhaps allocation can be made unsigned instead?
>>>>>
>>>>> If you go that way, then it would be good to make the related defines
>>>>> and
>>>>> allocation in mba_setup() unsigned too, although the latter is a bit
>>>>> scary
>>>>
>>>> Sure, will look into that.
>>>>
>>>>> because it does allocation -= ALLOCATION_STEP which could underflow if
>>>>> the
>>>>> defines are ever changed.
>>>>>
>>>>
>>>> Is this not already covered in the following check:
>>>> if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
>>>> return END_OF_TESTS;
>>>>
>>>> We are talking about hardcoded constants though.
>>>
>>> Borderline yes. It is "covered" by the allocation > ALLOCATION_MAX but
>>> it's also very non-intuitive to let the value underflow and then check for
>>> that with the > operator.
>>
>> My understanding is that this is the traditional way of checking overflow
>> (or more accurately wraparound). There are several examples of this pattern
>> in the kernel and it is also the pattern that I understand Linus referred
>> to as "traditional" in [1]. Even the compiler's intrinsic overflow checkers
>> do checking in this way (perform the subtraction and then check if it
>> overflowed) [2].
>
> Fair enough. I've never come across that kind of claim before.
>
>>> You're correct that they're constants so one would need to tweak the
>>> source to end up into this condition in the first place.
>>>
>>>>> Perhaps I'm being overly pedantic here, but in general I don't like
>>>>> leaving trappy and non-obvious logic like that lying around because
>>>>> one day one of them will come back to bite us.
>>
>> It is not clear to me what is "trappy" about this. The current checks will
>> catch the wraparound if somebody changes ALLOCATION_STEP in your scenario, no?
>>
>>> So, if a variable is unsigned and we ever do subtraction (or adding
>>> negative numbers to it), I'd prefer additional check to prevent ever
>>> underflowing it unexpectedly. Or leave them signed.
>>
>> To support checking at the time of subtraction we either need to change the
>> flow of that function since it cannot exit with failure if that subtraction
>> fails because of overflow/wraparound, or we need to introduce more state that
>> will be an additional check that the existing
>> "if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)"
>> would have caught anyway.
>>
>> In either case, to do this checking at the time of subtraction the ideal way
>> would be to use check_sub_overflow() ... but it again does exactly what
>> you find to be non-intuitive since the implementation in
>> tools/include/linux/overflow.h uses the gcc intrinsics that does subtraction
>> first and then checks if overflow occurred.
>
> > It's trappy because at a glance that check looks unnecessary, since
> > allocation starts from the max and goes downwards. Also worth noting:
> > the check is not immediately after the decrement but is done on the
> > next iteration.
Right. This is probably what causes most confusion.
Considering that, what do you think of below that avoids wraparound entirely:
---8<---
From: Reinette Chatre <reinette.chatre@intel.com>
Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious
Within mba_setup() the programmed bandwidth delay value starts
at the maximum (100, or rather ALLOCATION_MAX) and progresses
towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.
The programmed bandwidth delay should never be negative, so
representing it with an unsigned int is most appropriate. This
may introduce confusion because of the "allocation > ALLOCATION_MAX"
check used to detect wraparound of the subtraction.
Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
and incrementally, with ALLOCATION_STEP steps, adjust the
bandwidth delay value. This avoids wraparound while making the purpose
of "allocation > ALLOCATION_MAX" clear and eliminates the
need for the "allocation < ALLOCATION_MIN" check.
Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
Changes since V1:
- New patch
---
tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index ab8496a4925b..947d5699f0c8 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
const struct user_params *uparams,
struct resctrl_val_param *p)
{
- static int runs_per_allocation, allocation = 100;
+ static unsigned int allocation = ALLOCATION_MIN;
+ static int runs_per_allocation = 0;
char allocation_str[64];
int ret;
@@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
if (runs_per_allocation++ != 0)
return 0;
- if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
+ if (allocation > ALLOCATION_MAX)
return END_OF_TESTS;
sprintf(allocation_str, "%d", allocation);
@@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
if (ret < 0)
return ret;
- allocation -= ALLOCATION_STEP;
+ allocation += ALLOCATION_STEP;
return 0;
}
@@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
{
- int allocation, runs;
+ unsigned int allocation;
bool ret = false;
+ int runs;
ksft_print_msg("Results are displayed in (MB)\n");
/* Memory bandwidth from 100% down to 10% */
@@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
avg_diff_per > MAX_DIFF_PERCENT ?
"Fail:" : "Pass:",
MAX_DIFF_PERCENT,
- ALLOCATION_MAX - ALLOCATION_STEP * allocation);
+ ALLOCATION_MIN + ALLOCATION_STEP * allocation);
ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
>
> The risk here is that somebody removes allocation > ALLOCATION_MAX check.
>
> Something called check_sub_overflow() is not subject to a similar risk
> regardless of what operations occur inside it.
Reinette
^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth
2024-09-07 0:05 ` Reinette Chatre
@ 2024-09-09 8:13 ` Ilpo Järvinen
0 siblings, 0 replies; 30+ messages in thread
From: Ilpo Järvinen @ 2024-09-09 8:13 UTC (permalink / raw)
To: Reinette Chatre
Cc: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
Maciej Wieczór-Retman, linux-kselftest, LKML
On Fri, 6 Sep 2024, Reinette Chatre wrote:
>
> Right. This is probably what causes most confusion.
>
> Considering that, what do you think of below that avoids wraparound entirely:
>
> ---8<---
> From: Reinette Chatre <reinette.chatre@intel.com>
> Subject: [PATCH] selftests/resctrl: Make wraparound handling obvious
>
> Within mba_setup() the programmed bandwidth delay value starts
> at the maximum (100, or rather ALLOCATION_MAX) and progresses
> towards ALLOCATION_MIN by decrementing with ALLOCATION_STEP.
>
> The programmed bandwidth delay should never be negative, so
> representing it with an unsigned int is most appropriate. This
> may introduce confusion because of the "allocation > ALLOCATION_MAX"
> check used to check wraparound of the subtraction.
>
> Modify the mba_setup() flow to start at the minimum, ALLOCATION_MIN,
> and incrementally, with ALLOCATION_STEP steps, adjust the
> bandwidth delay value. This avoids wraparound while making the purpose
> of "allocation > ALLOCATION_MAX" clear and eliminates the
> need for the "allocation < ALLOCATION_MIN" check.
>
> Reported-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
> ---
> Changes since V1:
> - New patch
> ---
> tools/testing/selftests/resctrl/mba_test.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/tools/testing/selftests/resctrl/mba_test.c
> b/tools/testing/selftests/resctrl/mba_test.c
> index ab8496a4925b..947d5699f0c8 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -39,7 +39,8 @@ static int mba_setup(const struct resctrl_test *test,
> const struct user_params *uparams,
> struct resctrl_val_param *p)
> {
> - static int runs_per_allocation, allocation = 100;
> + static unsigned int allocation = ALLOCATION_MIN;
> + static int runs_per_allocation = 0;
> char allocation_str[64];
> int ret;
> @@ -50,7 +51,7 @@ static int mba_setup(const struct resctrl_test *test,
> if (runs_per_allocation++ != 0)
> return 0;
> - if (allocation < ALLOCATION_MIN || allocation > ALLOCATION_MAX)
> + if (allocation > ALLOCATION_MAX)
> return END_OF_TESTS;
> sprintf(allocation_str, "%d", allocation);
> @@ -59,7 +60,7 @@ static int mba_setup(const struct resctrl_test *test,
> if (ret < 0)
> return ret;
> - allocation -= ALLOCATION_STEP;
> + allocation += ALLOCATION_STEP;
> return 0;
> }
> @@ -72,8 +73,9 @@ static int mba_measure(const struct user_params *uparams,
> static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
> {
> - int allocation, runs;
> + unsigned int allocation;
> bool ret = false;
> + int runs;
> ksft_print_msg("Results are displayed in (MB)\n");
> /* Memory bandwidth from 100% down to 10% */
> @@ -103,7 +105,7 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned
> long *bw_resc)
> avg_diff_per > MAX_DIFF_PERCENT ?
> "Fail:" : "Pass:",
> MAX_DIFF_PERCENT,
> - ALLOCATION_MAX - ALLOCATION_STEP * allocation);
> + ALLOCATION_MIN + ALLOCATION_STEP * allocation);
> ksft_print_msg("avg_diff_per: %d%%\n", avg_diff_per);
> ksft_print_msg("avg_bw_imc: %lu\n", avg_bw_imc);
Looks fine.
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
--
i.
>
>
> >
> > The risk here is that somebody removes allocation > ALLOCATION_MAX check.
> >
> > Something called check_sub_overflow() is not subject to a similar risk
> > regardless of what operations occur inside it.
>
> Reinette
>
^ permalink raw reply [flat|nested] 30+ messages in thread
* [PATCH 6/6] selftests/resctrl: Keep results from first test run
2024-08-29 22:52 [PATCH 0/6] selftests/resctrl: Support diverse platforms with MBM and MBA tests Reinette Chatre
` (4 preceding siblings ...)
2024-08-29 22:52 ` [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth Reinette Chatre
@ 2024-08-29 22:52 ` Reinette Chatre
5 siblings, 0 replies; 30+ messages in thread
From: Reinette Chatre @ 2024-08-29 22:52 UTC (permalink / raw)
To: fenghua.yu, shuah, tony.luck, peternewman, babu.moger,
ilpo.jarvinen
Cc: maciej.wieczor-retman, reinette.chatre, linux-kselftest,
linux-kernel
The resctrl selftests drop the results from every first test run
to avoid (per comment) "inaccurate due to monitoring setup transition
phase" data. Previously, inaccurate data resulted from workloads
needing some time to "settle" and from the measurements themselves
not being confined to the timeframe of interest.
Commit da50de0a92f3 ("selftests/resctrl: Calculate resctrl FS derived
mem bw over sleep(1) only") ensured that measurements accurately cover
just the timeframe of interest. The default "fill_buf" benchmark has
since separated the buffer prepare phase from the benchmark run phase,
reducing the need for the tests themselves to accommodate the
benchmark's "settle" time.
With these enhancements there are no remaining portions needing
to "settle" and the first test run can contribute to measurements.
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
---
tools/testing/selftests/resctrl/cmt_test.c | 5 ++---
tools/testing/selftests/resctrl/mba_test.c | 6 +++---
tools/testing/selftests/resctrl/mbm_test.c | 10 +++-------
3 files changed, 8 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/resctrl/cmt_test.c b/tools/testing/selftests/resctrl/cmt_test.c
index f09d5dfab38c..85cb93d525b8 100644
--- a/tools/testing/selftests/resctrl/cmt_test.c
+++ b/tools/testing/selftests/resctrl/cmt_test.c
@@ -99,14 +99,13 @@ static int check_results(struct resctrl_val_param *param, size_t span, int no_of
}
/* Field 3 is llc occ resc value */
- if (runs > 0)
- sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
+ sum_llc_occu_resc += strtoul(token_array[3], NULL, 0);
runs++;
}
fclose(fp);
return show_results_info(sum_llc_occu_resc, no_of_bits, span,
- MAX_DIFF, MAX_DIFF_PERCENT, runs - 1, true);
+ MAX_DIFF, MAX_DIFF_PERCENT, runs, true);
}
static void cmt_test_cleanup(void)
diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
index 204b9ac4b108..dddf9bc04cfa 100644
--- a/tools/testing/selftests/resctrl/mba_test.c
+++ b/tools/testing/selftests/resctrl/mba_test.c
@@ -88,14 +88,14 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
* The first run is discarded due to inaccurate value from
* phase transition.
*/
- for (runs = NUM_OF_RUNS * allocation + 1;
+ for (runs = NUM_OF_RUNS * allocation;
runs < NUM_OF_RUNS * allocation + NUM_OF_RUNS ; runs++) {
sum_bw_imc += bw_imc[runs];
sum_bw_resc += bw_resc[runs];
}
- avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
- avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
+ avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+ avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
THROTTLE_THRESHOLD,
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 734bfa4f42b3..bbacba4ec195 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -22,17 +22,13 @@ show_bw_info(unsigned long *bw_imc, unsigned long *bw_resc, size_t span)
int runs, ret, avg_diff_per;
float avg_diff = 0;
- /*
- * Discard the first value which is inaccurate due to monitoring setup
- * transition phase.
- */
- for (runs = 1; runs < NUM_OF_RUNS ; runs++) {
+ for (runs = 0; runs < NUM_OF_RUNS ; runs++) {
sum_bw_imc += bw_imc[runs];
sum_bw_resc += bw_resc[runs];
}
- avg_bw_imc = sum_bw_imc / 4;
- avg_bw_resc = sum_bw_resc / 4;
+ avg_bw_imc = sum_bw_imc / NUM_OF_RUNS;
+ avg_bw_resc = sum_bw_resc / NUM_OF_RUNS;
avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
avg_diff_per = (int)(avg_diff * 100);
--
2.46.0
^ permalink raw reply related [flat|nested] 30+ messages in thread