rcu.vger.kernel.org archive mirror
* [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset
@ 2025-07-30  2:23 Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30  2:23 UTC (permalink / raw)
  To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
	Frank van der Linden, linux-doc, linux-kernel, rcu
  Cc: Yuzhuo Jing

In an effort to add RCU benchmarks to the perf tool and to improve
the bare-metal rcuscale tests, this patch series adds several auxiliary
features useful for testing tools.

This series introduces a few rcuscale options:
  * writer_no_print: skip printing writer durations during shutdown and
    let users read them from the new "writer_durations" debugfs file
    instead.  This drastically improves cleanup speed.
  * block_start: an option to hold all worker threads until the new
    debugfs "should_start" file is written.
  * {reader,writer,kfree}_cpu_offset: the starting value of CPU affinity
    for each type of thread.  This can be used to avoid scheduling
    different types of threads on the same CPU.  The 4th patch in this
    series shows drastic performance differences with and without overlap.

This patch series creates an "rcuscale" folder in debugfs, containing
the following files:
  * writer_durations: a CSV formatted file containing writer id and
    writer durations.
  * {reader,writer,kfree}_tasks: the list of kernel task PIDs for
    external tools to attach to.
  * should_start: a writable file to signal the start of the experiment,
    used in conjunction with the new "block_start" option.
  * test_complete: a readable file to indicate whether the experiment has
    finished or not.

RFCs:
  * Should those new files reside in debugfs or in procfs?
  * What format should be used for the writer_durations file, and which
    documentation should be updated to define the file format?
  * In the 4th patch, we see different characteristics between the
    overlapping and non-overlapping cases.  Current rcuscale creates
    nr_cpu readers and nr_cpu writers, thus scheduling 2 * nr_cpu tasks
    on nr_cpu CPUs.  Should we consider changes to this behavior?  Or add
    automatic conflict resolution when total threads <= nr_cpu?

Thank you!

Yuzhuo Jing (4):
  rcuscale: Create debugfs file for writer durations
  rcuscale: Create debugfs files for worker thread PIDs
  rcuscale: Add file based start/finish control
  rcuscale: Add CPU affinity offset options

 .../admin-guide/kernel-parameters.txt         |  29 ++
 kernel/rcu/rcuscale.c                         | 361 +++++++++++++++++-
 2 files changed, 377 insertions(+), 13 deletions(-)

-- 
2.50.1.552.g942d659e1b-goog



* [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
  2025-07-30  2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
@ 2025-07-30  2:23 ` Yuzhuo Jing
  2025-07-31  7:47   ` kernel test robot
  2025-07-30  2:23 ` [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs Yuzhuo Jing
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30  2:23 UTC (permalink / raw)
  To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
	Frank van der Linden, linux-doc, linux-kernel, rcu
  Cc: Yuzhuo Jing

Create an "rcuscale" folder in debugfs containing a "writer_durations"
file.  This file is in CSV format.  Each line represents one duration
record, with columns defined as:

  writer_id,duration

Add an option "writer_no_print" to skip printing writer durations on
cleanup.

This allows external tools to read structured data and also drastically
improves cleanup performance on large core count machines.
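
As an illustration of what such an external tool might do with the file,
here is a minimal user-space sketch that folds "writer_id,duration"
lines into per-writer totals.  The names struct writer_stats and
read_writer_durations() are hypothetical and not part of this series,
and the durations are assumed to be nanoseconds because the diff below
computes them from ktime_get_mono_fast_ns() differences.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Running totals for a single writer (hypothetical helper type). */
struct writer_stats {
	uint64_t count;	/* number of duration records seen */
	uint64_t total;	/* sum of all durations (assumed nanoseconds) */
	uint64_t max;	/* largest single duration */
};

/*
 * Read "writer_id,duration" lines from @in and accumulate them into
 * @stats, which has room for @nwriters entries.  Lines that do not
 * parse, or that name an out-of-range writer, are skipped.
 *
 * Returns the number of records accepted.
 */
size_t read_writer_durations(FILE *in, struct writer_stats *stats,
			     size_t nwriters)
{
	char line[128];
	size_t accepted = 0;

	assert(in && stats);
	while (fgets(line, sizeof(line), in)) {
		long long id;
		unsigned long long dur;

		if (sscanf(line, "%lld,%llu", &id, &dur) != 2)
			continue;
		if (id < 0 || (unsigned long long)id >= nwriters)
			continue;
		stats[id].count++;
		stats[id].total += dur;
		if (dur > stats[id].max)
			stats[id].max = dur;
		accepted++;
	}
	return accepted;
}
```

A caller would open the writer_durations file (or a saved copy such as
the durations.csv shown below) with fopen(), pass in a zero-initialized
array sized to nwriters, and derive averages from total / count.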

On a 256C 512T machine running nreaders=1 nwriters=511:

Before:
$ time modprobe -r rcuscale; modprobe -r torture
real    3m17.349s
user    0m0.000s
sys     3m15.288s

After:
$ time cat /sys/kernel/debug/rcuscale/writer_durations > durations.csv
real    0m0.005s
user    0m0.000s
sys     0m0.005s
$ time modprobe -r rcuscale; modprobe -r torture
real    0m0.388s
user    0m0.000s
sys     0m0.335s

Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
 .../admin-guide/kernel-parameters.txt         |   5 +
 kernel/rcu/rcuscale.c                         | 142 +++++++++++++++++-
 2 files changed, 139 insertions(+), 8 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..7b62a84a19d4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5583,6 +5583,11 @@
 			periods, but in jiffies.  The default of zero
 			says no holdoff.
 
+	rcuscale.writer_no_print= [KNL]
+			Do not print writer durations to kernel ring buffer.
+			Instead, users can read them from the
+			rcuscale/writer_durations file in debugfs.
+
 	rcutorture.fqs_duration= [KNL]
 			Set duration of force_quiescent_state bursts
 			in microseconds.
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index b521d0455992..ad10b42be6fc 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -40,6 +40,8 @@
 #include <linux/vmalloc.h>
 #include <linux/rcupdate_trace.h>
 #include <linux/sched/debug.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
 
 #include "rcu.h"
 
@@ -97,6 +99,7 @@ torture_param(bool, shutdown, RCUSCALE_SHUTDOWN,
 torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
 torture_param(int, writer_holdoff_jiffies, 0, "Holdoff (jiffies) between GPs, zero to disable");
+torture_param(bool, writer_no_print, false, "Do not print writer durations to ring buffer");
 torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
 torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
@@ -138,6 +141,9 @@ static u64 t_rcu_scale_writer_finished;
 static unsigned long b_rcu_gp_test_started;
 static unsigned long b_rcu_gp_test_finished;
 
+static struct dentry *debugfs_dir;
+static struct dentry *debugfs_writer_durations;
+
 #define MAX_MEAS 10000
 #define MIN_MEAS 100
 
@@ -607,6 +613,7 @@ rcu_scale_writer(void *arg)
 		t = ktime_get_mono_fast_ns();
 		*wdp = t - *wdp;
 		i_max = i;
+		writer_n_durations[me] = i_max + 1;
 		if (!started &&
 		    atomic_read(&n_rcu_scale_writer_started) >= nrealwriters)
 			started = true;
@@ -620,6 +627,7 @@ rcu_scale_writer(void *arg)
 			    nrealwriters) {
 				schedule_timeout_interruptible(10);
 				rcu_ftrace_dump(DUMP_ALL);
+				WRITE_ONCE(test_complete, true);
 				SCALEOUT_STRING("Test complete");
 				t_rcu_scale_writer_finished = t;
 				if (gp_exp) {
@@ -666,7 +674,6 @@ rcu_scale_writer(void *arg)
 		rcu_scale_free(wmbp);
 		cur_ops->gp_barrier();
 	}
-	writer_n_durations[me] = i_max + 1;
 	torture_kthread_stopping("rcu_scale_writer");
 	return 0;
 }
@@ -941,6 +948,117 @@ kfree_scale_init(void)
 	return firsterr;
 }
 
+/*
+ * A seq_file for writer_durations.  Content is only visible when all writers
+ * finish.  Element i of the sequence is writer_durations + i.
+ */
+static void *writer_durations_start(struct seq_file *m, loff_t *pos)
+{
+	loff_t writer_id = *pos;
+
+	if (!test_complete || writer_id < 0 || writer_id >= nrealwriters)
+		return NULL;
+
+	return writer_durations + writer_id;
+}
+
+static void *writer_durations_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return writer_durations_start(m, pos);
+}
+
+static void writer_durations_stop(struct seq_file *m, void *v)
+{
+}
+
+/*
+ * Each element in the seq_file is an array of one writer's durations.
+ * Each element prints writer_n_durations[writer_id] lines, and each line
+ * contains one duration record, in CSV format:
+ * writer_id,duration
+ */
+static int writer_durations_show(struct seq_file *m, void *v)
+{
+	u64 **durations = v;
+	loff_t writer_id = durations - writer_durations;
+
+	for (int i = 0; i < writer_n_durations[writer_id]; ++i)
+		seq_printf(m, "%lld,%llu\n", writer_id, durations[0][i]);
+
+	return 0;
+}
+
+static const struct seq_operations writer_durations_op = {
+	.start	= writer_durations_start,
+	.next	= writer_durations_next,
+	.stop	= writer_durations_stop,
+	.show	= writer_durations_show
+};
+
+static int writer_durations_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &writer_durations_op);
+}
+
+static const struct file_operations writer_durations_fops = {
+	.owner = THIS_MODULE,
+	.open = writer_durations_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+/*
+ * Create an rcuscale directory exposing run states and results.
+ */
+static int register_debugfs(void)
+{
+#define try_create_file(variable, name, mode, parent, data, fops)		\
+({										\
+	variable = debugfs_create_file((name), (mode), (parent), (data), (fops)); \
+	err = PTR_ERR_OR_ZERO(variable);					\
+	err;									\
+})
+
+	int err;
+
+	debugfs_dir = debugfs_create_dir("rcuscale", NULL);
+	err = PTR_ERR_OR_ZERO(debugfs_dir);
+	if (err)
+		goto fail;
+
+	if (try_create_file(debugfs_writer_durations, "writer_durations", 0444,
+			debugfs_dir, NULL, &writer_durations_fops))
+		goto fail;
+
+	return 0;
+fail:
+	pr_err("rcu-scale: Failed to create debugfs file.");
+	/* unregister_debugfs is called by rcu_scale_cleanup, avoid
+	 * calling it twice.
+	 */
+	return err;
+#undef try_create_file
+}
+
+static void unregister_debugfs(void)
+{
+#define try_remove(variable)			\
+do {						\
+	if (!IS_ERR_OR_NULL(variable))		\
+		debugfs_remove(variable);	\
+	variable = NULL;			\
+} while (0)
+
+	try_remove(debugfs_writer_durations);
+
+	/* Remove directory after files. */
+	try_remove(debugfs_dir);
+
+#undef try_remove
+}
+
 static void
 rcu_scale_cleanup(void)
 {
@@ -961,6 +1079,8 @@ rcu_scale_cleanup(void)
 	if (gp_exp && gp_async)
 		SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
 
+	unregister_debugfs();
+
 	// If built-in, just report all of the GP kthread's CPU time.
 	if (IS_BUILTIN(CONFIG_RCU_SCALE_TEST) && !kthread_tp && cur_ops->rso_gp_kthread)
 		kthread_tp = cur_ops->rso_gp_kthread();
@@ -1020,13 +1140,15 @@ rcu_scale_cleanup(void)
 			wdpp = writer_durations[i];
 			if (!wdpp)
 				continue;
-			for (j = 0; j < writer_n_durations[i]; j++) {
-				wdp = &wdpp[j];
-				pr_alert("%s%s %4d writer-duration: %5d %llu\n",
-					scale_type, SCALE_FLAG,
-					i, j, *wdp);
-				if (j % 100 == 0)
-					schedule_timeout_uninterruptible(1);
+			if (!writer_no_print) {
+				for (j = 0; j < writer_n_durations[i]; j++) {
+					wdp = &wdpp[j];
+					pr_alert("%s%s %4d writer-duration: %5d %llu\n",
+						scale_type, SCALE_FLAG,
+						i, j, *wdp);
+					if (j % 100 == 0)
+						schedule_timeout_uninterruptible(1);
+				}
 			}
 			kfree(writer_durations[i]);
 			if (writer_freelists) {
@@ -1202,6 +1324,10 @@ rcu_scale_init(void)
 		if (torture_init_error(firsterr))
 			goto unwind;
 	}
+
+	if (register_debugfs())
+		goto unwind;
+
 	torture_init_end();
 	return 0;
 
-- 
2.50.1.552.g942d659e1b-goog



* [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs
  2025-07-30  2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
@ 2025-07-30  2:23 ` Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 3/4] rcuscale: Add file based start/finish control Yuzhuo Jing
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30  2:23 UTC (permalink / raw)
  To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
	Frank van der Linden, linux-doc, linux-kernel, rcu
  Cc: Yuzhuo Jing

Creates {reader,writer,kfree}_tasks files in the "rcuscale" debugfs
folder.  Each line contains one kernel thread PID.

This provides a more robust way for external performance analysis tools
to attach to kernel threads than using pgrep.
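
For illustration only, a user-space consumer of these files could be as
small as the sketch below, which reads one decimal PID per line.  The
helper name read_task_pids() is hypothetical; only the one-PID-per-line
format is taken from this patch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Read one decimal PID per line from @in into @pids, which has room
 * for @max entries.  Reading stops at end of file, at the first token
 * that is not a number, or when @pids is full.
 *
 * Returns the number of PIDs stored.
 */
size_t read_task_pids(FILE *in, pid_t *pids, size_t max)
{
	size_t n = 0;
	int pid;

	assert(in && pids);
	while (n < max && fscanf(in, "%d", &pid) == 1)
		pids[n++] = (pid_t)pid;
	return n;
}
```

The stream would normally come from fopen() on one of the
{reader,writer,kfree}_tasks files, and the resulting array can be handed
to whatever tool attaches by PID.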

Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
 kernel/rcu/rcuscale.c | 124 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index ad10b42be6fc..7c88d461ed2c 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -143,6 +143,9 @@ static unsigned long b_rcu_gp_test_finished;
 
 static struct dentry *debugfs_dir;
 static struct dentry *debugfs_writer_durations;
+static struct dentry *debugfs_reader_tasks;
+static struct dentry *debugfs_writer_tasks;
+static struct dentry *debugfs_kfree_tasks;
 
 #define MAX_MEAS 10000
 #define MIN_MEAS 100
@@ -1009,6 +1012,112 @@ static const struct file_operations writer_durations_fops = {
 	.release = seq_release,
 };
 
+/*
+ * Generic seq_file private data for tasks walkthrough.
+ */
+struct debugfs_pid_info {
+	int ntasks;
+	struct task_struct **tasks;
+};
+
+/*
+ * Generic seq_file pos to pointer conversion function, using private data
+ * of type debugfs_pid_info, and ensure it is within bound.
+ */
+static void *debugfs_pid_start(struct seq_file *m, loff_t *pos)
+{
+	loff_t worker = *pos;
+	struct debugfs_pid_info *info = m->private;
+
+	if (worker < 0 || worker >= info->ntasks)
+		return NULL;
+
+	return info->tasks[worker];
+}
+
+static void *debugfs_pid_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	(*pos)++;
+	return debugfs_pid_start(m, pos);
+}
+
+/*
+ * Each line of the file contains one PID from the selected kernel threads.
+ */
+static int debugfs_pid_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%d\n", ((struct task_struct *)v)->pid);
+	return 0;
+}
+
+static void debugfs_pid_stop(struct seq_file *m, void *v)
+{
+}
+
+static const struct seq_operations debugfs_pid_fops = {
+	.start	= debugfs_pid_start,
+	.next	= debugfs_pid_next,
+	.stop	= debugfs_pid_stop,
+	.show	= debugfs_pid_show
+};
+
+/*
+ * Generic seq_file creation function that sets private data of type
+ * debugfs_pid_info.
+ */
+static int debugfs_pid_open_info(struct inode *inode, struct file *file,
+		int ntasks, struct task_struct **tasks)
+{
+	struct debugfs_pid_info *info =
+		__seq_open_private(file, &debugfs_pid_fops, sizeof(*info));
+	if (!info)
+		return -ENOMEM;
+
+	info->ntasks = ntasks;
+	info->tasks = tasks;
+
+	return 0;
+}
+
+static int debugfs_pid_open_reader(struct inode *inode, struct file *file)
+{
+	return debugfs_pid_open_info(inode, file, nrealreaders, reader_tasks);
+}
+
+static int debugfs_pid_open_writer(struct inode *inode, struct file *file)
+{
+	return debugfs_pid_open_info(inode, file, nrealwriters, writer_tasks);
+}
+
+static int debugfs_pid_open_kfree(struct inode *inode, struct file *file)
+{
+	return debugfs_pid_open_info(inode, file, kfree_nrealthreads, kfree_reader_tasks);
+}
+
+static const struct file_operations readers_fops = {
+	.owner = THIS_MODULE,
+	.open = debugfs_pid_open_reader,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+static const struct file_operations writers_fops = {
+	.owner = THIS_MODULE,
+	.open = debugfs_pid_open_writer,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+static const struct file_operations kfrees_fops = {
+	.owner = THIS_MODULE,
+	.open = debugfs_pid_open_kfree,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
 /*
  * Create an rcuscale directory exposing run states and results.
  */
@@ -1032,6 +1141,18 @@ static int register_debugfs(void)
 			debugfs_dir, NULL, &writer_durations_fops))
 		goto fail;
 
+	if (try_create_file(debugfs_reader_tasks, "reader_tasks", 0444,
+			debugfs_dir, NULL, &readers_fops))
+		goto fail;
+
+	if (try_create_file(debugfs_writer_tasks, "writer_tasks", 0444,
+			debugfs_dir, NULL, &writers_fops))
+		goto fail;
+
+	if (try_create_file(debugfs_kfree_tasks, "kfree_tasks", 0444,
+			debugfs_dir, NULL, &kfrees_fops))
+		goto fail;
+
 	return 0;
 fail:
 	pr_err("rcu-scale: Failed to create debugfs file.");
@@ -1052,6 +1173,9 @@ do {						\
 } while (0)
 
 	try_remove(debugfs_writer_durations);
+	try_remove(debugfs_reader_tasks);
+	try_remove(debugfs_writer_tasks);
+	try_remove(debugfs_kfree_tasks);
 
 	/* Remove directory after files. */
 	try_remove(debugfs_dir);
-- 
2.50.1.552.g942d659e1b-goog



* [PATCH v1 3/4] rcuscale: Add file based start/finish control
  2025-07-30  2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs Yuzhuo Jing
@ 2025-07-30  2:23 ` Yuzhuo Jing
  2025-07-30  2:23 ` [PATCH v1 4/4] rcuscale: Add CPU affinity offset options Yuzhuo Jing
  2025-07-31 23:38 ` [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Paul E. McKenney
  4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30  2:23 UTC (permalink / raw)
  To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
	Frank van der Linden, linux-doc, linux-kernel, rcu
  Cc: Yuzhuo Jing

In addition to the existing timing-based (holdoff, writer_holdoff)
start control, add file-based controls to debugfs.

This patch adds an option "block_start", which holds all worker threads
until the "rcuscale/should_start" debugfs file is written with a
non-zero integer.  A new "test_complete" file is added to the debugfs
folder; it reads "0" while the experiment is still running and "1" once
it has finished.  This is useful for start/finish control by
external test tools.
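
A rough user-space sketch of that control flow is given below, assuming
the tool simply writes "1" to should_start and then re-reads
test_complete until it returns non-zero.  The helpers signal_start() and
read_flag() are hypothetical names, and the paths are supplied by the
caller rather than hard-coded.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write "1" to the control file at @path (for example the
 * should_start file).  Returns 0 on success, -1 on error.
 */
int signal_start(const char *path)
{
	FILE *f;
	int ok;

	assert(path);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs("1\n", f) >= 0;
	/* The buffered data is flushed by fclose(), so check it too. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Read an integer flag from @path (for example the test_complete
 * file).  Returns 1 if it is non-zero, 0 if it is zero, and -1 if the
 * file cannot be opened or parsed.
 */
int read_flag(const char *path)
{
	FILE *f;
	unsigned long long val;
	int ret = -1;

	assert(path);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%llu", &val) == 1)
		ret = val != 0;
	fclose(f);
	return ret;
}
```

A tool would call signal_start() once its measurement setup is ready and
then call read_flag() in a loop with a sleep between attempts, since
this patch provides no notification mechanism for test_complete.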

Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
 .../admin-guide/kernel-parameters.txt         |  5 ++
 kernel/rcu/rcuscale.c                         | 79 +++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7b62a84a19d4..5e233e511f81 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5487,6 +5487,11 @@
 
 			Default is 0.
 
+	rcuscale.block_start= [KNL]
+			Block the experiment start until "1" is written to the
+			rcuscale/should_start file in debugfs.  This is useful
+			for start/finish control by external tools.
+
 	rcuscale.gp_async= [KNL]
 			Measure performance of asynchronous
 			grace-period primitives such as call_rcu().
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 7c88d461ed2c..43bcaeac457f 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -87,6 +87,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
 # define RCUSCALE_SHUTDOWN 1
 #endif
 
+torture_param(bool, block_start, false, "Block all threads after creation and wait for should_start");
 torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives");
 torture_param(int, gp_async_max, 1000, "Max # outstanding waits per writer");
 torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
@@ -146,6 +147,12 @@ static struct dentry *debugfs_writer_durations;
 static struct dentry *debugfs_reader_tasks;
 static struct dentry *debugfs_writer_tasks;
 static struct dentry *debugfs_kfree_tasks;
+static struct dentry *debugfs_should_start;
+static struct dentry *debugfs_test_complete;
+
+static DECLARE_COMPLETION(start_barrier);
+static bool should_start;
+static bool test_complete;
 
 #define MAX_MEAS 10000
 #define MIN_MEAS 100
@@ -457,6 +464,23 @@ static void rcu_scale_wait_shutdown(void)
 		schedule_timeout_uninterruptible(1);
 }
 
+/*
+ * Wait start_barrier if block_start is enabled.  Exit early if shutdown
+ * is requested.
+ *
+ * Return: true if caller should exit; false if caller should continue.
+ */
+static bool wait_start_barrier(void)
+{
+	if (!block_start)
+		return false;
+	while (wait_for_completion_interruptible(&start_barrier)) {
+		if (torture_must_stop())
+			return true;
+	}
+	return false;
+}
+
 /*
  * RCU scalability reader kthread.  Repeatedly does empty RCU read-side
  * critical section, minimizing update-side interference.  However, the
@@ -475,6 +499,11 @@ rcu_scale_reader(void *arg)
 	set_user_nice(current, MAX_NICE);
 	atomic_inc(&n_rcu_scale_reader_started);
 
+	if (wait_start_barrier()) {
+		torture_kthread_stopping("rcu_scale_reader");
+		return 0;
+	}
+
 	do {
 		local_irq_save(flags);
 		idx = cur_ops->readlock();
@@ -560,6 +589,11 @@ rcu_scale_writer(void *arg)
 	current->flags |= PF_NO_SETAFFINITY;
 	sched_set_fifo_low(current);
 
+	if (wait_start_barrier()) {
+		torture_kthread_stopping("rcu_scale_writer");
+		return 0;
+	}
+
 	if (holdoff)
 		schedule_timeout_idle(holdoff * HZ);
 
@@ -755,6 +789,11 @@ kfree_scale_thread(void *arg)
 	set_user_nice(current, MAX_NICE);
 	kfree_rcu_test_both = (kfree_rcu_test_single == kfree_rcu_test_double);
 
+	if (wait_start_barrier()) {
+		torture_kthread_stopping("kfree_scale_thread");
+		return 0;
+	}
+
 	start_time = ktime_get_mono_fast_ns();
 
 	if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) {
@@ -1118,6 +1157,32 @@ static const struct file_operations kfrees_fops = {
 	.release = seq_release,
 };
 
+/*
+ * For the "should_start" writable file, reuse debugfs integer parsing, but
+ * override write function to also send complete_all if should_start is
+ * changed to 1.
+ *
+ * Any non-zero value written to this file is converted to 1.
+ */
+static int should_start_set(void *data, u64 val)
+{
+	*(bool *)data = !!val;
+
+	if (block_start && !!val)
+		complete_all(&start_barrier);
+
+	return 0;
+}
+
+static int bool_get(void *data, u64 *val)
+{
+	*val = *(bool *)data;
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(should_start_fops, bool_get, should_start_set, "%llu");
+DEFINE_DEBUGFS_ATTRIBUTE(test_complete_fops, bool_get, NULL, "%llu");
+
 /*
  * Create an rcuscale directory exposing run states and results.
  */
@@ -1153,6 +1218,15 @@ static int register_debugfs(void)
 			debugfs_dir, NULL, &kfrees_fops))
 		goto fail;
 
+	if (try_create_file(debugfs_should_start, "should_start", 0644,
+			debugfs_dir, &should_start, &should_start_fops))
+		goto fail;
+
+	/* Future: add notification method for readers waiting on file change. */
+	if (try_create_file(debugfs_test_complete, "test_complete", 0444,
+			debugfs_dir, &test_complete, &test_complete_fops))
+		goto fail;
+
 	return 0;
 fail:
 	pr_err("rcu-scale: Failed to create debugfs file.");
@@ -1176,6 +1250,8 @@ do {						\
 	try_remove(debugfs_reader_tasks);
 	try_remove(debugfs_writer_tasks);
 	try_remove(debugfs_kfree_tasks);
+	try_remove(debugfs_should_start);
+	try_remove(debugfs_test_complete);
 
 	/* Remove directory after files. */
 	try_remove(debugfs_dir);
@@ -1372,6 +1448,9 @@ rcu_scale_init(void)
 	atomic_set(&n_rcu_scale_writer_finished, 0);
 	rcu_scale_print_module_parms(cur_ops, "Start of test");
 
+	if (!block_start)
+		should_start = true;
+
 	/* Start up the kthreads. */
 
 	if (shutdown) {
-- 
2.50.1.552.g942d659e1b-goog



* [PATCH v1 4/4] rcuscale: Add CPU affinity offset options
  2025-07-30  2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
                   ` (2 preceding siblings ...)
  2025-07-30  2:23 ` [PATCH v1 3/4] rcuscale: Add file based start/finish control Yuzhuo Jing
@ 2025-07-30  2:23 ` Yuzhuo Jing
  2025-07-31 23:38 ` [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Paul E. McKenney
  4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30  2:23 UTC (permalink / raw)
  To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
	Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
	Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
	Frank van der Linden, linux-doc, linux-kernel, rcu
  Cc: Yuzhuo Jing

Currently, reader, writer, and kfree threads set their affinity to their
id % nr_cpu_ids.  The IDs of all three types start from 0, and
therefore readers, writers, and kfrees may be scheduled on the same CPU.

This patch adds options to offset CPU affinity.
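
The placement rule can be modeled in user space as follows.  This is
only a sketch of the (offset + id) % nr_cpu_ids arithmetic used in the
diff below, with negative offsets clamped to 0 as the patch does at
init time; thread_cpu() and groups_overlap() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * CPU chosen for thread @id of a group started with @offset, mirroring
 * (offset + id) % nr_cpu_ids.  Negative offsets are treated as 0.
 */
int thread_cpu(int offset, long id, int nr_cpus)
{
	assert(nr_cpus > 0 && id >= 0);
	if (offset < 0)
		offset = 0;
	return (int)((offset + id) % nr_cpus);
}

/*
 * Return true if any of the @n_a threads of group A lands on the same
 * CPU as any of the @n_b threads of group B.
 */
bool groups_overlap(int off_a, long n_a, int off_b, long n_b, int nr_cpus)
{
	for (long a = 0; a < n_a; a++)
		for (long b = 0; b < n_b; b++)
			if (thread_cpu(off_a, a, nr_cpus) ==
			    thread_cpu(off_b, b, nr_cpus))
				return true;
	return false;
}
```

With one reader and one writer, offsets 0 and 0 put both threads on
CPU 0, while offsets 0 and 1 separate them, which corresponds to the two
experiments reported below.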

In the experiments below, writer duration characteristics are very
different between offset 0 and 1.  The experiments were carried out on a
256C 512T machine running a PREEMPT=n kernel.

Experiment: nreaders=1 nwriters=1 reader_cpu_offset=0 writer_cpu_offset=0
Average grace-period duration: 108376 microseconds
Minimum grace-period duration: 13000.4
50th percentile grace-period duration: 115000
90th percentile grace-period duration: 121000
99th percentile grace-period duration: 121004
Maximum grace-period duration: 219000
Grace periods: 101 Batches: 1 Ratio: 101

Experiment: nreaders=1 nwriters=1 reader_cpu_offset=0 writer_cpu_offset=1
Average grace-period duration: 185950 microseconds
Minimum grace-period duration: 8999.84
50th percentile grace-period duration: 217946
90th percentile grace-period duration: 218003
99th percentile grace-period duration: 218018
Maximum grace-period duration: 272195
Grace periods: 101 Batches: 1 Ratio: 101

Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
 .../admin-guide/kernel-parameters.txt         | 19 +++++++++++++++++++
 kernel/rcu/rcuscale.c                         | 16 +++++++++++-----
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5e233e511f81..f68651c103a4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1,3 +1,4 @@
+# vim: noet:sw=8:sts=8:
 	accept_memory=  [MM]
 			Format: { eager | lazy }
 			default: lazy
@@ -5513,6 +5514,12 @@
 			test until boot completes in order to avoid
 			interference.
 
+	rcuscale.kfree_cpu_offset= [KNL]
+			Set the starting CPU affinity index of kfree threads.
+			CPU affinity is assigned sequentially from
+			kfree_cpu_offset to kfree_cpu_offset+kfree_nthreads,
+			modded by number of CPUs.  Negative value is reset to 0.
+
 	rcuscale.kfree_by_call_rcu= [KNL]
 			In kernels built with CONFIG_RCU_LAZY=y, test
 			call_rcu() instead of kfree_rcu().
@@ -5567,6 +5574,12 @@
 			the same as for rcuscale.nreaders.
 			N, where N is the number of CPUs
 
+	rcuscale.reader_cpu_offset= [KNL]
+			Set the starting CPU affinity index of reader threads.
+			CPU affinity is assigned sequentially from
+			reader_cpu_offset to reader_cpu_offset+nreaders, modded
+			by number of CPUs.  Negative value is reset to 0.
+
 	rcuscale.scale_type= [KNL]
 			Specify the RCU implementation to test.
 
@@ -5578,6 +5591,12 @@
 	rcuscale.verbose= [KNL]
 			Enable additional printk() statements.
 
+	rcuscale.writer_cpu_offset= [KNL]
+			Set the starting CPU affinity index of writer threads.
+			CPU affinity is assigned sequentially from
+			writer_cpu_offset to writer_cpu_offset+nwriters, modded
+			by number of CPUs.  Negative value is reset to 0.
+
 	rcuscale.writer_holdoff= [KNL]
 			Write-side holdoff between grace periods,
 			in microseconds.  The default of zero says
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 43bcaeac457f..1208169be15e 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -95,12 +95,15 @@ torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
 torture_param(int, minruntime, 0, "Minimum run time (s)");
 torture_param(int, nreaders, -1, "Number of RCU reader threads");
 torture_param(int, nwriters, -1, "Number of RCU updater threads");
+torture_param(int, reader_cpu_offset, 0, "Offset of reader CPU affinity");
 torture_param(bool, shutdown, RCUSCALE_SHUTDOWN,
 	      "Shutdown at end of scalability tests.");
 torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
+torture_param(int, writer_cpu_offset, 0, "Offset of writer CPU affinity");
 torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
 torture_param(int, writer_holdoff_jiffies, 0, "Holdoff (jiffies) between GPs, zero to disable");
 torture_param(bool, writer_no_print, false, "Do not print writer durations to ring buffer");
+torture_param(int, kfree_cpu_offset, 0, "Offset of kfree CPU affinity");
 torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
 torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
 torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
@@ -495,7 +498,7 @@ rcu_scale_reader(void *arg)
 	long me = (long)arg;
 
 	VERBOSE_SCALEOUT_STRING("rcu_scale_reader task started");
-	set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+	set_cpus_allowed_ptr(current, cpumask_of((reader_cpu_offset + me) % nr_cpu_ids));
 	set_user_nice(current, MAX_NICE);
 	atomic_inc(&n_rcu_scale_reader_started);
 
@@ -585,7 +588,7 @@ rcu_scale_writer(void *arg)
 
 	VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
 	WARN_ON(!wdpp);
-	set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+	set_cpus_allowed_ptr(current, cpumask_of((writer_cpu_offset + me) % nr_cpu_ids));
 	current->flags |= PF_NO_SETAFFINITY;
 	sched_set_fifo_low(current);
 
@@ -719,8 +722,8 @@ static void
 rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
 {
 	pr_alert("%s" SCALE_FLAG
-		 "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d writer_holdoff=%d writer_holdoff_jiffies=%d verbose=%d shutdown=%d\n",
-		 scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff, minruntime, nrealreaders, nrealwriters, writer_holdoff, writer_holdoff_jiffies, verbose, shutdown);
+		 "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d reader_cpu_offset=%d writer_cpu_offset=%d writer_holdoff=%d writer_holdoff_jiffies=%d kfree_cpu_offset=%d verbose=%d shutdown=%d\n",
+		 scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff, minruntime, nrealreaders, nrealwriters, reader_cpu_offset, writer_cpu_offset, writer_holdoff, writer_holdoff_jiffies, kfree_cpu_offset, verbose, shutdown);
 }
 
 /*
@@ -785,7 +788,7 @@ kfree_scale_thread(void *arg)
 	DEFINE_TORTURE_RANDOM(tr);
 
 	VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started");
-	set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+	set_cpus_allowed_ptr(current, cpumask_of((kfree_cpu_offset + me) % nr_cpu_ids));
 	set_user_nice(current, MAX_NICE);
 	kfree_rcu_test_both = (kfree_rcu_test_single == kfree_rcu_test_double);
 
@@ -1446,6 +1449,9 @@ rcu_scale_init(void)
 	atomic_set(&n_rcu_scale_reader_started, 0);
 	atomic_set(&n_rcu_scale_writer_started, 0);
 	atomic_set(&n_rcu_scale_writer_finished, 0);
+	reader_cpu_offset = max(reader_cpu_offset, 0);
+	writer_cpu_offset = max(writer_cpu_offset, 0);
+	kfree_cpu_offset = max(kfree_cpu_offset, 0);
 	rcu_scale_print_module_parms(cur_ops, "Start of test");
 
 	if (!block_start)
-- 
2.50.1.552.g942d659e1b-goog
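
The affinity arithmetic in the diff above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code: the helper names (clamp_offset, thread_cpu, reader_writer_overlap) are hypothetical, and only the expression (offset + me) % nr_cpu_ids and the clamping of negative offsets to zero come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirror the patch's max(offset, 0): negative offsets are treated as 0. */
int clamp_offset(int offset)
{
	return offset < 0 ? 0 : offset;
}

/* CPU picked for thread number `me` of a thread type with this offset. */
int thread_cpu(int offset, long me, int nr_cpu_ids)
{
	assert(nr_cpu_ids > 0 && me >= 0);
	return (int)((clamp_offset(offset) + me) % nr_cpu_ids);
}

/* True if any reader lands on the same CPU as any writer. */
bool reader_writer_overlap(int reader_offset, int nreaders,
			   int writer_offset, int nwriters, int nr_cpu_ids)
{
	for (long r = 0; r < nreaders; r++)
		for (long w = 0; w < nwriters; w++)
			if (thread_cpu(reader_offset, r, nr_cpu_ids) ==
			    thread_cpu(writer_offset, w, nr_cpu_ids))
				return true;
	return false;
}
```

Under these assumptions, four readers at offset 0 and four writers at offset 4 on an 8-CPU machine do not share a CPU, while leaving both offsets at 0 places reader N and writer N on the same CPU.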



* Re: [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
  2025-07-30  2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
@ 2025-07-31  7:47   ` kernel test robot
  0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-07-31  7:47 UTC (permalink / raw)
  To: Yuzhuo Jing, Ian Rogers, Yuzhuo Jing, Jonathan Corbet,
	Davidlohr Bueso, Paul E . McKenney, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
	Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang, Andrew Morton, Ingo Molnar,
	Borislav Petkov, Arnd Bergmann, Frank van der Linden, linux-doc,
	linux-kernel, rcu
  Cc: oe-kbuild-all, Linux Memory Management List

Hi Yuzhuo,

kernel test robot noticed the following build errors:

[auto build test ERROR on rcu/rcu/dev]
[also build test ERROR on linus/master v6.16 next-20250731]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yuzhuo-Jing/rcuscale-Create-debugfs-file-for-writer-durations/20250730-102613
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git rcu/dev
patch link:    https://lore.kernel.org/r/20250730022347.71722-2-yuzhuo%40google.com
patch subject: [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
config: i386-randconfig-002-20250731 (https://download.01.org/0day-ci/archive/20250731/202507311504.ttQGhW04-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250731/202507311504.ttQGhW04-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507311504.ttQGhW04-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from <command-line>:
   kernel/rcu/rcuscale.c: In function 'rcu_scale_writer':
>> kernel/rcu/rcuscale.c:630:44: error: 'test_complete' undeclared (first use in this function); did you mean 'complete'?
     630 |                                 WRITE_ONCE(test_complete, true);
         |                                            ^~~~~~~~~~~~~
   include/linux/compiler_types.h:548:23: note: in definition of macro '__compiletime_assert'
     548 |                 if (!(condition))                                       \
         |                       ^~~~~~~~~
   include/linux/compiler_types.h:568:9: note: in expansion of macro '_compiletime_assert'
     568 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |                            ^~~~~~~~~~~~~
   include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      60 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/rcu/rcuscale.c:630:33: note: in expansion of macro 'WRITE_ONCE'
     630 |                                 WRITE_ONCE(test_complete, true);
         |                                 ^~~~~~~~~~
   kernel/rcu/rcuscale.c:630:44: note: each undeclared identifier is reported only once for each function it appears in
     630 |                                 WRITE_ONCE(test_complete, true);
         |                                            ^~~~~~~~~~~~~
   include/linux/compiler_types.h:548:23: note: in definition of macro '__compiletime_assert'
     548 |                 if (!(condition))                                       \
         |                       ^~~~~~~~~
   include/linux/compiler_types.h:568:9: note: in expansion of macro '_compiletime_assert'
     568 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
         |         ^~~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |         ^~~~~~~~~~~~~~~~~~
   include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
      36 |         compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long),  \
         |                            ^~~~~~~~~~~~~
   include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
      60 |         compiletime_assert_rwonce_type(x);                              \
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/rcu/rcuscale.c:630:33: note: in expansion of macro 'WRITE_ONCE'
     630 |                                 WRITE_ONCE(test_complete, true);
         |                                 ^~~~~~~~~~
   kernel/rcu/rcuscale.c: In function 'writer_durations_start':
   kernel/rcu/rcuscale.c:959:14: error: 'test_complete' undeclared (first use in this function); did you mean 'complete'?
     959 |         if (!test_complete || writer_id < 0 || writer_id >= nrealwriters)
         |              ^~~~~~~~~~~~~
         |              complete


vim +630 kernel/rcu/rcuscale.c

   534	
   535	/*
   536	 * RCU scale writer kthread.  Repeatedly does a grace period.
   537	 */
   538	static int
   539	rcu_scale_writer(void *arg)
   540	{
   541		int i = 0;
   542		int i_max;
   543		unsigned long jdone;
   544		long me = (long)arg;
   545		bool selfreport = false;
   546		bool started = false, done = false, alldone = false;
   547		u64 t;
   548		DEFINE_TORTURE_RANDOM(tr);
   549		u64 *wdp;
   550		u64 *wdpp = writer_durations[me];
   551		struct writer_freelist *wflp = &writer_freelists[me];
   552		struct writer_mblock *wmbp = NULL;
   553	
   554		VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
   555		WARN_ON(!wdpp);
   556		set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
   557		current->flags |= PF_NO_SETAFFINITY;
   558		sched_set_fifo_low(current);
   559	
   560		if (holdoff)
   561			schedule_timeout_idle(holdoff * HZ);
   562	
   563		/*
   564		 * Wait until rcu_end_inkernel_boot() is called for normal GP tests
   565		 * so that RCU is not always expedited for normal GP tests.
   566		 * The system_state test is approximate, but works well in practice.
   567		 */
   568		while (!gp_exp && system_state != SYSTEM_RUNNING)
   569			schedule_timeout_uninterruptible(1);
   570	
   571		t = ktime_get_mono_fast_ns();
   572		if (atomic_inc_return(&n_rcu_scale_writer_started) >= nrealwriters) {
   573			t_rcu_scale_writer_started = t;
   574			if (gp_exp) {
   575				b_rcu_gp_test_started =
   576					cur_ops->exp_completed() / 2;
   577			} else {
   578				b_rcu_gp_test_started = cur_ops->get_gp_seq();
   579			}
   580		}
   581	
   582		jdone = jiffies + minruntime * HZ;
   583		do {
   584			bool gp_succeeded = false;
   585	
   586			if (writer_holdoff)
   587				udelay(writer_holdoff);
   588			if (writer_holdoff_jiffies)
   589				schedule_timeout_idle(torture_random(&tr) % writer_holdoff_jiffies + 1);
   590			wdp = &wdpp[i];
   591			*wdp = ktime_get_mono_fast_ns();
   592			if (gp_async && !WARN_ON_ONCE(!cur_ops->async)) {
   593				if (!wmbp)
   594					wmbp = rcu_scale_alloc(me);
   595				if (wmbp && atomic_read(&wflp->ws_inflight) < gp_async_max) {
   596					atomic_inc(&wflp->ws_inflight);
   597					cur_ops->async(&wmbp->wmb_rh, rcu_scale_async_cb);
   598					wmbp = NULL;
   599					gp_succeeded = true;
   600				} else if (!kthread_should_stop()) {
   601					cur_ops->gp_barrier();
   602				} else {
   603					rcu_scale_free(wmbp); /* Because we are stopping. */
   604					wmbp = NULL;
   605				}
   606			} else if (gp_exp) {
   607				cur_ops->exp_sync();
   608				gp_succeeded = true;
   609			} else {
   610				cur_ops->sync();
   611				gp_succeeded = true;
   612			}
   613			t = ktime_get_mono_fast_ns();
   614			*wdp = t - *wdp;
   615			i_max = i;
   616			writer_n_durations[me] = i_max + 1;
   617			if (!started &&
   618			    atomic_read(&n_rcu_scale_writer_started) >= nrealwriters)
   619				started = true;
   620			if (!done && i >= MIN_MEAS && time_after(jiffies, jdone)) {
   621				done = true;
   622				WRITE_ONCE(writer_done[me], true);
   623				sched_set_normal(current, 0);
   624				pr_alert("%s%s rcu_scale_writer %ld has %d measurements\n",
   625					 scale_type, SCALE_FLAG, me, MIN_MEAS);
   626				if (atomic_inc_return(&n_rcu_scale_writer_finished) >=
   627				    nrealwriters) {
   628					schedule_timeout_interruptible(10);
   629					rcu_ftrace_dump(DUMP_ALL);
 > 630					WRITE_ONCE(test_complete, true);
   631					SCALEOUT_STRING("Test complete");
   632					t_rcu_scale_writer_finished = t;
   633					if (gp_exp) {
   634						b_rcu_gp_test_finished =
   635							cur_ops->exp_completed() / 2;
   636					} else {
   637						b_rcu_gp_test_finished =
   638							cur_ops->get_gp_seq();
   639					}
   640					if (shutdown) {
   641						smp_mb(); /* Assign before wake. */
   642						wake_up(&shutdown_wq);
   643					}
   644				}
   645			}
   646			if (done && !alldone &&
   647			    atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters)
   648				alldone = true;
   649			if (done && !alldone && time_after(jiffies, jdone + HZ * 60)) {
   650				static atomic_t dumped;
   651				int i;
   652	
   653				if (!atomic_xchg(&dumped, 1)) {
   654					for (i = 0; i < nrealwriters; i++) {
   655						if (writer_done[i])
   656							continue;
   657						pr_info("%s: Task %ld flags writer %d:\n", __func__, me, i);
   658						sched_show_task(writer_tasks[i]);
   659					}
   660					if (cur_ops->stats)
   661						cur_ops->stats();
   662				}
   663			}
   664			if (!selfreport && time_after(jiffies, jdone + HZ * (70 + me))) {
   665				pr_info("%s: Writer %ld self-report: started %d done %d/%d->%d i %d jdone %lu.\n",
   666					__func__, me, started, done, writer_done[me], atomic_read(&n_rcu_scale_writer_finished), i, jiffies - jdone);
   667				selfreport = true;
   668			}
   669			if (gp_succeeded && started && !alldone && i < MAX_MEAS - 1)
   670				i++;
   671			rcu_scale_wait_shutdown();
   672		} while (!torture_must_stop());
   673		if (gp_async && cur_ops->async) {
   674			rcu_scale_free(wmbp);
   675			cur_ops->gp_barrier();
   676		}
   677		torture_kthread_stopping("rcu_scale_writer");
   678		return 0;
   679	}
   680	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset
  2025-07-30  2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
                   ` (3 preceding siblings ...)
  2025-07-30  2:23 ` [PATCH v1 4/4] rcuscale: Add CPU affinity offset options Yuzhuo Jing
@ 2025-07-31 23:38 ` Paul E. McKenney
  4 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-07-31 23:38 UTC (permalink / raw)
  To: Yuzhuo Jing
  Cc: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
	Josh Triplett, Frederic Weisbecker, Neeraj Upadhyay,
	Joel Fernandes, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
	Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton,
	Ingo Molnar, Borislav Petkov, Arnd Bergmann, Frank van der Linden,
	linux-doc, linux-kernel, rcu

On Tue, Jul 29, 2025 at 07:23:43PM -0700, Yuzhuo Jing wrote:
> In an effort to add RCU benchmarks to the perf tool and to improve
> the bare-metal rcuscale tests, this patch series adds several auxiliary
> features useful for testing tools.
> 
> This series introduces a few rcuscale options:
>   * writer_no_print: skip writer duration printing during shutdown, but
>     instead let users read from the new "writer_durations" debugfs file.
>     This drastically improves cleanup speed.

But existing scripts running something like this will continue to
work, correct?  (It looks like they do, just checking.)

tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5

Don't get me wrong, your debugfs read-out performance increase looks
quite good, but these tests run in a guest OS with minimal userspace.
And by "minimal", I mean that they run out of an initrd having a root
filesystem consisting of a single statically linked "init" program.  ;-)

>   * block_start: an option to hold all worker threads until the new
>     debugfs "should_start" file is written.
>   * {reader,writer,kfree}_cpu_offset: the starting value of CPU affinity
>     for each type of thread.  This can be used to avoid scheduling
>     different types of threads on the same CPU.  The 4th patch in this
>     series shows drastic performance differences with and without overlaps.

The usual use cases run only writers except for stress tests, but this
seems like a good capability.

> This patch series creates an "rcuscale" folder in debugfs, containing
> the following files:
>   * writer_durations: a CSV formatted file containing writer id and
>     writer durations.
>   * {reader,writer,kfree}_tasks: the list of kernel task PIDs for
>     external tools to attach to.
>   * should_start: a writable file to signal the start of the experiment,
>     used in conjunction with the new "block_start" option.
>   * test_complete: a readable file to indicate whether the experiment has
>     finished or not.
> 
> RFCs:
>   * Should those new files reside in debugfs or in procfs?

New files in procfs face serious scrutiny, so your choice of debugfs
is a good one.

>   * What format should be used for the writer_durations file, and what
>     documentation should be updated for the file format definition?

Back in the old days, I would have insisted on space/tab separated fields.
But gawk now supports a --csv flag, so I don't feel strongly about this.
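
For a consumer that is not an awk script, reading such a file is also straightforward. Here is a minimal sketch in C, assuming each record is a single "writer_id,duration" line with both fields in decimal; that two-column layout and the function name parse_writer_duration are assumptions for illustration, since the thread only says the file is CSV holding writer ids and writer durations.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one "writer_id,duration" record (assumed layout, see above).
 * Returns 0 on success and -1 on a malformed line, including lines with
 * extra fields or non-numeric content such as a header row.
 */
int parse_writer_duration(const char *line, int *writer_id,
			  unsigned long long *duration)
{
	char trailing;
	int n;

	assert(line && writer_id && duration);
	/* The final " %c" only converts if something follows the number. */
	n = sscanf(line, "%d ,%llu %c", writer_id, duration, &trailing);
	return n == 2 ? 0 : -1;
}
```

A caller would typically feed this one line at a time from fgets() and skip any line for which it returns -1.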

>   * In the 4th patch, we see different characteristics between the
>     overlapping and non-overlapping cases.  Current rcuscale creates
>     nr_cpu readers and nr_cpu writers, thus scheduling 2 * nr_cpu tasks
>     on nr_cpu CPUs.  Should we consider changing this behavior, or add
>     automatic conflict resolution when total threads <= nr_cpu?

The theory back in the day was that the updater would spend enough time
blocked that this would not matter.  However, you have shown that it
clearly does matter.

Except that running the reader and writer on the same CPU seems to
*improve* grace-period latency, with a P99 duration of 121,004
microseconds for overlapping (your first patch 4/4 experiment) and of
218,018 microseconds for non-overlapping.  Since shorter grace periods
are usually considered better, this suggests better performance with
the reader and writer running on the same CPU.

Or am I misreading your commit log?

It would not be too surprising for the overlapping case to provide
faster grace periods because you are running PREEMPT=n and the writer
kthread would force context switches more frequently.  But I figured
that I should check.
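
For anyone wanting to recompute a P99 figure from raw writer durations, here is a minimal sketch in C using the nearest-rank definition. The choice of nearest-rank and the function name percentile are assumptions; the thread does not say how the P99 values in the commit log were derived, so results from this sketch may not match them exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two u64 values for qsort(). */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile: sort in place, then return the value at
 * 1-based rank ceil(n * pct / 100).
 */
uint64_t percentile(uint64_t *durations, size_t n, unsigned int pct)
{
	size_t rank;

	assert(n > 0 && pct >= 1 && pct <= 100);
	qsort(durations, n, sizeof(*durations), cmp_u64);
	rank = (n * pct + 99) / 100;	/* integer ceiling of n * pct / 100 */
	return durations[rank - 1];
}
```

Note that the function sorts the caller's array in place, so a caller that needs the original order should pass a copy.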

> Thank you!
> 
> Yuzhuo Jing (4):
>   rcuscale: Create debugfs file for writer durations
>   rcuscale: Create debugfs files for worker thread PIDs
>   rcuscale: Add file based start/finish control

This does not apply on the dev branch of my -rcu tree.  Which is not too
surprising because kernel-parameters.txt is subject to change.  But when
you repost to fix the bug that kernel test robot detected, could you
please let me know what mainline version you are developing against?
That would allow me to apply it there and then to rebase and resolve
conflicts as needed.

							Thanx, Paul

>   rcuscale: Add CPU affinity offset options
> 
>  .../admin-guide/kernel-parameters.txt         |  29 ++
>  kernel/rcu/rcuscale.c                         | 361 +++++++++++++++++-
>  2 files changed, 377 insertions(+), 13 deletions(-)
> 
> -- 
> 2.50.1.552.g942d659e1b-goog
> 

