* [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset
@ 2025-07-30 2:23 Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
` (4 more replies)
0 siblings, 5 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30 2:23 UTC (permalink / raw)
To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
Frank van der Linden, linux-doc, linux-kernel, rcu
Cc: Yuzhuo Jing
In an effort to add RCU benchmarks to the perf tool and to improve
the bare-metal rcuscale tests, this patch series adds several auxiliary
features useful for testing tools.
This series introduces a few rcuscale options:
* writer_no_print: skip writer duration printing during shutdown, but
instead let users read from the new "writer_durations" debugfs file.
This drastically improves cleanup speed.
* block_start: an option to hold all worker threads until the new
debugfs "should_start" file is written.
* {reader,writer,kfree}_cpu_offset: the starting CPU affinity value
for each type of thread. This can be used to avoid scheduling
different types of threads on the same CPU. The 4th patch in this
series shows drastic performance differences with and without overlap.
This patch series creates an "rcuscale" folder in debugfs, containing
the following files:
* writer_durations: a CSV formatted file containing writer id and
writer durations.
* {reader,writer,kfree}_tasks: the list of kernel task PIDs for
external tools to attach to.
* should_start: a writable file to signal the start of the experiment,
used in conjunction with the new "block_start" option.
* test_complete: a readable file to indicate whether the experiment has
finished or not.
RFCs:
* Should those new files reside in debugfs or in procfs?
* What format should be used for the writer_durations file, and what
documentation should be updated for the file format definition?
* In the 4th patch, we see different characteristics between overlap
and non-overlap. Current rcuscale creates nr_cpu readers and nr_cpu
writers, thus scheduling 2 * nr_cpu tasks on nr_cpu CPUs. Should we
consider changes to this behavior, or add automatic conflict
resolution when total threads <= nr_cpu?
Thank you!
Yuzhuo Jing (4):
rcuscale: Create debugfs file for writer durations
rcuscale: Create debugfs files for worker thread PIDs
rcuscale: Add file based start/finish control
rcuscale: Add CPU affinity offset options
.../admin-guide/kernel-parameters.txt | 29 ++
kernel/rcu/rcuscale.c | 361 +++++++++++++++++-
2 files changed, 377 insertions(+), 13 deletions(-)
--
2.50.1.552.g942d659e1b-goog
* [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
2025-07-30 2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
@ 2025-07-30 2:23 ` Yuzhuo Jing
2025-07-31 7:47 ` kernel test robot
2025-07-30 2:23 ` [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs Yuzhuo Jing
` (3 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30 2:23 UTC (permalink / raw)
To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
Frank van der Linden, linux-doc, linux-kernel, rcu
Cc: Yuzhuo Jing
Create an "rcuscale" folder in debugfs with a "writer_durations" file
in it. The file is in CSV format: each line represents one duration
record, with columns defined as:
writer_id,duration
Also add a "writer_no_print" option to skip printing writer durations
on cleanup.
This allows external tools to read structured data and also drastically
improves cleanup performance on large core count machines.
On a 256C 512T machine running nreaders=1 nwriters=511:
Before:
$ time modprobe -r rcuscale; modprobe -r torture
real 3m17.349s
user 0m0.000s
sys 3m15.288s
After:
$ time cat /sys/kernel/debug/rcuscale/writer_durations > durations.csv
real 0m0.005s
user 0m0.000s
sys 0m0.005s
$ time modprobe -r rcuscale; modprobe -r torture
real 0m0.388s
user 0m0.000s
sys 0m0.335s
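For illustration only (not part of this patch), a userspace consumer of
the CSV described above could look like the sketch below. The struct and
function names are hypothetical. Durations are assumed to be in
nanoseconds, since the writer computes them from ktime_get_mono_fast_ns().

```c
#include <assert.h>
#include <stdio.h>

/* Summary of one writer_durations dump; all names here are illustrative. */
struct duration_summary {
	unsigned long long records;	/* number of well-formed lines */
	unsigned long long total;	/* sum of all durations */
	unsigned long long max;		/* largest single duration */
};

/*
 * Parse "writer_id,duration" lines from 'in' and accumulate a summary.
 * Returns 0 on success, -1 if a non-empty line does not match the format.
 */
int summarize_writer_durations(FILE *in, struct duration_summary *out)
{
	char line[128];
	long long writer_id;
	unsigned long long duration;

	assert(in && out);
	out->records = 0;
	out->total = 0;
	out->max = 0;

	while (fgets(line, sizeof(line), in)) {
		if (line[0] == '\n' || line[0] == '\0')
			continue;	/* tolerate blank lines */
		if (sscanf(line, "%lld,%llu", &writer_id, &duration) != 2)
			return -1;
		out->records++;
		out->total += duration;
		if (duration > out->max)
			out->max = duration;
	}
	return 0;
}
```

A caller would fopen() /sys/kernel/debug/rcuscale/writer_durations (or a
saved copy such as durations.csv) and pass the stream in. The mean is
total / records when records is non-zero.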
Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
.../admin-guide/kernel-parameters.txt | 5 +
kernel/rcu/rcuscale.c | 142 +++++++++++++++++-
2 files changed, 139 insertions(+), 8 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f1f2c0874da9..7b62a84a19d4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5583,6 +5583,11 @@
periods, but in jiffies. The default of zero
says no holdoff.
+ rcuscale.writer_no_print= [KNL]
+ Do not print writer durations to kernel ring buffer.
+ Instead, users can read them from the
+ rcuscale/writer_durations file in debugfs.
+
rcutorture.fqs_duration= [KNL]
Set duration of force_quiescent_state bursts
in microseconds.
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index b521d0455992..ad10b42be6fc 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -40,6 +40,8 @@
#include <linux/vmalloc.h>
#include <linux/rcupdate_trace.h>
#include <linux/sched/debug.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
#include "rcu.h"
@@ -97,6 +99,7 @@ torture_param(bool, shutdown, RCUSCALE_SHUTDOWN,
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, writer_holdoff_jiffies, 0, "Holdoff (jiffies) between GPs, zero to disable");
+torture_param(bool, writer_no_print, false, "Do not print writer durations to ring buffer");
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
@@ -138,6 +141,9 @@ static u64 t_rcu_scale_writer_finished;
static unsigned long b_rcu_gp_test_started;
static unsigned long b_rcu_gp_test_finished;
+static struct dentry *debugfs_dir;
+static struct dentry *debugfs_writer_durations;
+
#define MAX_MEAS 10000
#define MIN_MEAS 100
@@ -607,6 +613,7 @@ rcu_scale_writer(void *arg)
t = ktime_get_mono_fast_ns();
*wdp = t - *wdp;
i_max = i;
+ writer_n_durations[me] = i_max + 1;
if (!started &&
atomic_read(&n_rcu_scale_writer_started) >= nrealwriters)
started = true;
@@ -620,6 +627,7 @@ rcu_scale_writer(void *arg)
nrealwriters) {
schedule_timeout_interruptible(10);
rcu_ftrace_dump(DUMP_ALL);
+ WRITE_ONCE(test_complete, true);
SCALEOUT_STRING("Test complete");
t_rcu_scale_writer_finished = t;
if (gp_exp) {
@@ -666,7 +674,6 @@ rcu_scale_writer(void *arg)
rcu_scale_free(wmbp);
cur_ops->gp_barrier();
}
- writer_n_durations[me] = i_max + 1;
torture_kthread_stopping("rcu_scale_writer");
return 0;
}
@@ -941,6 +948,117 @@ kfree_scale_init(void)
return firsterr;
}
+/*
+ * A seq_file for writer_durations. Content is only visible when all writers
+ * finish. Element i of the sequence is writer_durations + i.
+ */
+static void *writer_durations_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t writer_id = *pos;
+
+ if (!test_complete || writer_id < 0 || writer_id >= nrealwriters)
+ return NULL;
+
+ return writer_durations + writer_id;
+}
+
+static void *writer_durations_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ (*pos)++;
+ return writer_durations_start(m, pos);
+}
+
+static void writer_durations_stop(struct seq_file *m, void *v)
+{
+}
+
+/*
+ * Each element in the seq_file is an array of one writer's durations.
+ * Each element prints writer_n_durations[writer_id] lines, and each line
+ * contains one duration record, in CSV format:
+ * writer_id,duration
+ */
+static int writer_durations_show(struct seq_file *m, void *v)
+{
+ u64 **durations = v;
+ loff_t writer_id = durations - writer_durations;
+
+ for (int i = 0; i < writer_n_durations[writer_id]; ++i)
+ seq_printf(m, "%lld,%lld\n", writer_id, durations[0][i]);
+
+ return 0;
+}
+
+static const struct seq_operations writer_durations_op = {
+ .start = writer_durations_start,
+ .next = writer_durations_next,
+ .stop = writer_durations_stop,
+ .show = writer_durations_show
+};
+
+static int writer_durations_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &writer_durations_op);
+}
+
+static const struct file_operations writer_durations_fops = {
+ .owner = THIS_MODULE,
+ .open = writer_durations_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+/*
+ * Create an rcuscale directory exposing run states and results.
+ */
+static int register_debugfs(void)
+{
+#define try_create_file(variable, name, mode, parent, data, fops) \
+({ \
+ variable = debugfs_create_file((name), (mode), (parent), (data), (fops)); \
+ err = PTR_ERR_OR_ZERO(variable); \
+ err; \
+})
+
+ int err;
+
+ debugfs_dir = debugfs_create_dir("rcuscale", NULL);
+ err = PTR_ERR_OR_ZERO(debugfs_dir);
+ if (err)
+ goto fail;
+
+ if (try_create_file(debugfs_writer_durations, "writer_durations", 0444,
+ debugfs_dir, NULL, &writer_durations_fops))
+ goto fail;
+
+ return 0;
+fail:
+ pr_err("rcu-scale: Failed to create debugfs file.");
+ /* unregister_debugfs is called by rcu_scale_cleanup, avoid
+ * calling it twice.
+ */
+ return err;
+#undef try_create_file
+}
+
+static void unregister_debugfs(void)
+{
+#define try_remove(variable) \
+do { \
+ if (!IS_ERR_OR_NULL(variable)) \
+ debugfs_remove(variable); \
+ variable = NULL; \
+} while (0)
+
+ try_remove(debugfs_writer_durations);
+
+ /* Remove directory after files. */
+ try_remove(debugfs_dir);
+
+#undef try_remove
+}
+
static void
rcu_scale_cleanup(void)
{
@@ -961,6 +1079,8 @@ rcu_scale_cleanup(void)
if (gp_exp && gp_async)
SCALEOUT_ERRSTRING("No expedited async GPs, so went with async!");
+ unregister_debugfs();
+
// If built-in, just report all of the GP kthread's CPU time.
if (IS_BUILTIN(CONFIG_RCU_SCALE_TEST) && !kthread_tp && cur_ops->rso_gp_kthread)
kthread_tp = cur_ops->rso_gp_kthread();
@@ -1020,13 +1140,15 @@ rcu_scale_cleanup(void)
wdpp = writer_durations[i];
if (!wdpp)
continue;
- for (j = 0; j < writer_n_durations[i]; j++) {
- wdp = &wdpp[j];
- pr_alert("%s%s %4d writer-duration: %5d %llu\n",
- scale_type, SCALE_FLAG,
- i, j, *wdp);
- if (j % 100 == 0)
- schedule_timeout_uninterruptible(1);
+ if (!writer_no_print) {
+ for (j = 0; j < writer_n_durations[i]; j++) {
+ wdp = &wdpp[j];
+ pr_alert("%s%s %4d writer-duration: %5d %llu\n",
+ scale_type, SCALE_FLAG,
+ i, j, *wdp);
+ if (j % 100 == 0)
+ schedule_timeout_uninterruptible(1);
+ }
}
kfree(writer_durations[i]);
if (writer_freelists) {
@@ -1202,6 +1324,10 @@ rcu_scale_init(void)
if (torture_init_error(firsterr))
goto unwind;
}
+
+ if (register_debugfs())
+ goto unwind;
+
torture_init_end();
return 0;
--
2.50.1.552.g942d659e1b-goog
* [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs
2025-07-30 2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
@ 2025-07-30 2:23 ` Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 3/4] rcuscale: Add file based start/finish control Yuzhuo Jing
` (2 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30 2:23 UTC (permalink / raw)
To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
Frank van der Linden, linux-doc, linux-kernel, rcu
Cc: Yuzhuo Jing
Creates {reader,writer,kfree}_tasks files in the "rcuscale" debugfs
folder. Each line contains one kernel thread PID.
This provides a more robust way for external performance analysis tools
to attach to kernel threads than using pgrep.
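As a rough sketch of how an external tool might consume one of these
files (illustrative only, not part of this patch; the function name is
hypothetical), the one-PID-per-line format can be read like this:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one decimal PID per line from 'in' into 'pids' (capacity 'max').
 * Returns the number of PIDs stored. Reading stops at EOF, at the first
 * token that is not an integer, or when 'pids' is full.
 */
size_t read_task_pids(FILE *in, int *pids, size_t max)
{
	size_t n = 0;

	assert(in && pids);
	/* "%d" skips leading whitespace, including the newline separators. */
	while (n < max && fscanf(in, "%d", &pids[n]) == 1)
		n++;
	return n;
}
```

A tool would open, for example, the writer_tasks file in the rcuscale
debugfs folder, call read_task_pids(), and then attach to each returned
PID with whatever mechanism it uses.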
Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
kernel/rcu/rcuscale.c | 124 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index ad10b42be6fc..7c88d461ed2c 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -143,6 +143,9 @@ static unsigned long b_rcu_gp_test_finished;
static struct dentry *debugfs_dir;
static struct dentry *debugfs_writer_durations;
+static struct dentry *debugfs_reader_tasks;
+static struct dentry *debugfs_writer_tasks;
+static struct dentry *debugfs_kfree_tasks;
#define MAX_MEAS 10000
#define MIN_MEAS 100
@@ -1009,6 +1012,112 @@ static const struct file_operations writer_durations_fops = {
.release = seq_release,
};
+/*
+ * Generic seq_file private data for tasks walkthrough.
+ */
+struct debugfs_pid_info {
+ int ntasks;
+ struct task_struct **tasks;
+};
+
+/*
+ * Generic seq_file pos to pointer conversion function, using private data
+ * of type debugfs_pid_info, and ensure it is within bound.
+ */
+static void *debugfs_pid_start(struct seq_file *m, loff_t *pos)
+{
+ loff_t worker = *pos;
+ struct debugfs_pid_info *info = m->private;
+
+ if (worker < 0 || worker >= info->ntasks)
+ return NULL;
+
+ return info->tasks[worker];
+}
+
+static void *debugfs_pid_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ (*pos)++;
+ return debugfs_pid_start(m, pos);
+}
+
+/*
+ * Each line of the file contains one PID from the selected kernel threads.
+ */
+static int debugfs_pid_show(struct seq_file *m, void *v)
+{
+ seq_printf(m, "%d\n", ((struct task_struct *)v)->pid);
+ return 0;
+}
+
+static void debugfs_pid_stop(struct seq_file *m, void *v)
+{
+}
+
+static const struct seq_operations debugfs_pid_fops = {
+ .start = debugfs_pid_start,
+ .next = debugfs_pid_next,
+ .stop = debugfs_pid_stop,
+ .show = debugfs_pid_show
+};
+
+/*
+ * Generic seq_file creation function that sets private data of type
+ * debugfs_pid_info.
+ */
+static int debugfs_pid_open_info(struct inode *inode, struct file *file,
+ int ntasks, struct task_struct **tasks)
+{
+ struct debugfs_pid_info *info =
+ __seq_open_private(file, &debugfs_pid_fops, sizeof(*info));
+ if (!info)
+ return -ENOMEM;
+
+ info->ntasks = ntasks;
+ info->tasks = tasks;
+
+ return 0;
+}
+
+static int debugfs_pid_open_reader(struct inode *inode, struct file *file)
+{
+ return debugfs_pid_open_info(inode, file, nrealreaders, reader_tasks);
+}
+
+static int debugfs_pid_open_writer(struct inode *inode, struct file *file)
+{
+ return debugfs_pid_open_info(inode, file, nrealwriters, writer_tasks);
+}
+
+static int debugfs_pid_open_kfree(struct inode *inode, struct file *file)
+{
+ return debugfs_pid_open_info(inode, file, kfree_nrealthreads, kfree_reader_tasks);
+}
+
+static const struct file_operations readers_fops = {
+ .owner = THIS_MODULE,
+ .open = debugfs_pid_open_reader,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static const struct file_operations writers_fops = {
+ .owner = THIS_MODULE,
+ .open = debugfs_pid_open_writer,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static const struct file_operations kfrees_fops = {
+ .owner = THIS_MODULE,
+ .open = debugfs_pid_open_kfree,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
/*
* Create an rcuscale directory exposing run states and results.
*/
@@ -1032,6 +1141,18 @@ static int register_debugfs(void)
debugfs_dir, NULL, &writer_durations_fops))
goto fail;
+ if (try_create_file(debugfs_reader_tasks, "reader_tasks", 0444,
+ debugfs_dir, NULL, &readers_fops))
+ goto fail;
+
+ if (try_create_file(debugfs_writer_tasks, "writer_tasks", 0444,
+ debugfs_dir, NULL, &writers_fops))
+ goto fail;
+
+ if (try_create_file(debugfs_kfree_tasks, "kfree_tasks", 0444,
+ debugfs_dir, NULL, &kfrees_fops))
+ goto fail;
+
return 0;
fail:
pr_err("rcu-scale: Failed to create debugfs file.");
@@ -1052,6 +1173,9 @@ do { \
} while (0)
try_remove(debugfs_writer_durations);
+ try_remove(debugfs_reader_tasks);
+ try_remove(debugfs_writer_tasks);
+ try_remove(debugfs_kfree_tasks);
/* Remove directory after files. */
try_remove(debugfs_dir);
--
2.50.1.552.g942d659e1b-goog
* [PATCH v1 3/4] rcuscale: Add file based start/finish control
2025-07-30 2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 2/4] rcuscale: Create debugfs files for worker thread PIDs Yuzhuo Jing
@ 2025-07-30 2:23 ` Yuzhuo Jing
2025-07-30 2:23 ` [PATCH v1 4/4] rcuscale: Add CPU affinity offset options Yuzhuo Jing
2025-07-31 23:38 ` [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Paul E. McKenney
4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30 2:23 UTC (permalink / raw)
To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
Frank van der Linden, linux-doc, linux-kernel, rcu
Cc: Yuzhuo Jing
In addition to the existing timing-based (holdoff, writer_holdoff)
start control, add file-based controls to debugfs.
This patch adds an option "block_start", which holds all worker threads
until the "rcuscale/should_start" debugfs file is written with a
non-zero integer. A new "test_complete" file is added to the debugfs
folder, with file content "0" indicating that the experiment has not
finished and "1" indicating that it has. This is useful for start/finish
control by external test tools.
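A minimal sketch of the external-tool side of this handshake follows. It
is illustrative only and not part of this patch: the helper names are
hypothetical, and 'dir' is assumed to be the rcuscale debugfs folder
(for example /sys/kernel/debug/rcuscale).

```c
#include <assert.h>
#include <stdio.h>

/* Build "<dir>/<name>" and open it with the given stdio mode. */
static FILE *open_in_dir(const char *dir, const char *name, const char *mode)
{
	char path[512];
	int len;

	assert(dir && name && mode);
	len = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (len < 0 || (size_t)len >= sizeof(path))
		return NULL;
	return fopen(path, mode);
}

/* Release the blocked worker threads by writing a non-zero integer. */
int rcuscale_signal_start(const char *dir)
{
	FILE *f = open_in_dir(dir, "should_start", "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs("1\n", f) >= 0;
	/* fclose() flushes; a failed flush means the write did not land. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Returns 1 if the test has finished, 0 if not, -1 on read/parse error. */
int rcuscale_test_complete(const char *dir)
{
	FILE *f = open_in_dir(dir, "test_complete", "r");
	int value = 0;
	int parsed;

	if (!f)
		return -1;
	parsed = fscanf(f, "%d", &value);
	fclose(f);
	if (parsed != 1)
		return -1;
	return value != 0;
}
```

A tool would load the module with block_start=1, attach its probes, call
rcuscale_signal_start(), and then call rcuscale_test_complete() in a
loop with a sleep between calls until it returns 1.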
Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
.../admin-guide/kernel-parameters.txt | 5 ++
kernel/rcu/rcuscale.c | 79 +++++++++++++++++++
2 files changed, 84 insertions(+)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7b62a84a19d4..5e233e511f81 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5487,6 +5487,11 @@
Default is 0.
+ rcuscale.block_start= [KNL]
+ Block the experiment start until "1" is written to the
+ rcuscale/should_start file in debugfs. This is useful
+ for start/finish control by external tools.
+
rcuscale.gp_async= [KNL]
Measure performance of asynchronous
grace-period primitives such as call_rcu().
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 7c88d461ed2c..43bcaeac457f 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -87,6 +87,7 @@ MODULE_AUTHOR("Paul E. McKenney <paulmck@linux.ibm.com>");
# define RCUSCALE_SHUTDOWN 1
#endif
+torture_param(bool, block_start, false, "Block all threads after creation and wait for should_start");
torture_param(bool, gp_async, false, "Use asynchronous GP wait primitives");
torture_param(int, gp_async_max, 1000, "Max # outstanding waits per writer");
torture_param(bool, gp_exp, false, "Use expedited GP wait primitives");
@@ -146,6 +147,12 @@ static struct dentry *debugfs_writer_durations;
static struct dentry *debugfs_reader_tasks;
static struct dentry *debugfs_writer_tasks;
static struct dentry *debugfs_kfree_tasks;
+static struct dentry *debugfs_should_start;
+static struct dentry *debugfs_test_complete;
+
+static DECLARE_COMPLETION(start_barrier);
+static bool should_start;
+static bool test_complete;
#define MAX_MEAS 10000
#define MIN_MEAS 100
@@ -457,6 +464,23 @@ static void rcu_scale_wait_shutdown(void)
schedule_timeout_uninterruptible(1);
}
+/*
+ * Wait start_barrier if block_start is enabled. Exit early if shutdown
+ * is requested.
+ *
+ * Return: true if caller should exit; false if caller should continue.
+ */
+static bool wait_start_barrier(void)
+{
+ if (!block_start)
+ return false;
+ while (wait_for_completion_interruptible(&start_barrier)) {
+ if (torture_must_stop())
+ return true;
+ }
+ return false;
+}
+
/*
* RCU scalability reader kthread. Repeatedly does empty RCU read-side
* critical section, minimizing update-side interference. However, the
@@ -475,6 +499,11 @@ rcu_scale_reader(void *arg)
set_user_nice(current, MAX_NICE);
atomic_inc(&n_rcu_scale_reader_started);
+ if (wait_start_barrier()) {
+ torture_kthread_stopping("rcu_scale_reader");
+ return 0;
+ }
+
do {
local_irq_save(flags);
idx = cur_ops->readlock();
@@ -560,6 +589,11 @@ rcu_scale_writer(void *arg)
current->flags |= PF_NO_SETAFFINITY;
sched_set_fifo_low(current);
+ if (wait_start_barrier()) {
+ torture_kthread_stopping("rcu_scale_writer");
+ return 0;
+ }
+
if (holdoff)
schedule_timeout_idle(holdoff * HZ);
@@ -755,6 +789,11 @@ kfree_scale_thread(void *arg)
set_user_nice(current, MAX_NICE);
kfree_rcu_test_both = (kfree_rcu_test_single == kfree_rcu_test_double);
+ if (wait_start_barrier()) {
+ torture_kthread_stopping("kfree_scale_thread");
+ return 0;
+ }
+
start_time = ktime_get_mono_fast_ns();
if (atomic_inc_return(&n_kfree_scale_thread_started) >= kfree_nrealthreads) {
@@ -1118,6 +1157,32 @@ static const struct file_operations kfrees_fops = {
.release = seq_release,
};
+/*
+ * For the "should_start" writable file, reuse debugfs integer parsing, but
+ * override write function to also send complete_all if should_start is
+ * changed to 1.
+ *
+ * Any non-zero value written to this file is converted to 1.
+ */
+static int should_start_set(void *data, u64 val)
+{
+ *(bool *)data = !!val;
+
+ if (block_start && !!val)
+ complete_all(&start_barrier);
+
+ return 0;
+}
+
+static int bool_get(void *data, u64 *val)
+{
+ *val = *(bool *)data;
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(should_start_fops, bool_get, should_start_set, "%llu");
+DEFINE_DEBUGFS_ATTRIBUTE(test_complete_fops, bool_get, NULL, "%llu");
+
/*
* Create an rcuscale directory exposing run states and results.
*/
@@ -1153,6 +1218,15 @@ static int register_debugfs(void)
debugfs_dir, NULL, &kfrees_fops))
goto fail;
+ if (try_create_file(debugfs_should_start, "should_start", 0644,
+ debugfs_dir, &should_start, &should_start_fops))
+ goto fail;
+
+ /* Future: add notification method for readers waiting on file change. */
+ if (try_create_file(debugfs_test_complete, "test_complete", 0444,
+ debugfs_dir, &test_complete, &test_complete_fops))
+ goto fail;
+
return 0;
fail:
pr_err("rcu-scale: Failed to create debugfs file.");
@@ -1176,6 +1250,8 @@ do { \
try_remove(debugfs_reader_tasks);
try_remove(debugfs_writer_tasks);
try_remove(debugfs_kfree_tasks);
+ try_remove(debugfs_should_start);
+ try_remove(debugfs_test_complete);
/* Remove directory after files. */
try_remove(debugfs_dir);
@@ -1372,6 +1448,9 @@ rcu_scale_init(void)
atomic_set(&n_rcu_scale_writer_finished, 0);
rcu_scale_print_module_parms(cur_ops, "Start of test");
+ if (!block_start)
+ should_start = true;
+
/* Start up the kthreads. */
if (shutdown) {
--
2.50.1.552.g942d659e1b-goog
* [PATCH v1 4/4] rcuscale: Add CPU affinity offset options
2025-07-30 2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
` (2 preceding siblings ...)
2025-07-30 2:23 ` [PATCH v1 3/4] rcuscale: Add file based start/finish control Yuzhuo Jing
@ 2025-07-30 2:23 ` Yuzhuo Jing
2025-07-31 23:38 ` [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Paul E. McKenney
4 siblings, 0 replies; 7+ messages in thread
From: Yuzhuo Jing @ 2025-07-30 2:23 UTC (permalink / raw)
To: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Paul E . McKenney, Josh Triplett, Frederic Weisbecker,
Neeraj Upadhyay, Joel Fernandes, Boqun Feng, Uladzislau Rezki,
Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang,
Andrew Morton, Ingo Molnar, Borislav Petkov, Arnd Bergmann,
Frank van der Linden, linux-doc, linux-kernel, rcu
Cc: Yuzhuo Jing
Currently, reader, writer, and kfree threads set their affinity to their
id % nr_cpu_ids. IDs of all three types start from 0, and therefore
readers, writers, and kfrees may be scheduled on the same CPU.
This patch adds options to offset CPU affinity.
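To make the arithmetic concrete, here is a small standalone sketch
(illustrative only, not kernel code; the function names are
hypothetical) of the (offset + id) % nr_cpu_ids mapping, together with a
check for how many thread pairs of two types land on the same CPU:

```c
#include <assert.h>

/* CPU chosen for thread 'id' of one type: (offset + id) % nr_cpus. */
static int thread_cpu(int offset, long id, int nr_cpus)
{
	assert(offset >= 0 && id >= 0 && nr_cpus > 0);
	return (int)((offset + id) % nr_cpus);
}

/*
 * Count the (a, b) pairs, one thread from each of two thread types,
 * that are pinned to the same CPU under the given offsets.
 */
int count_cpu_overlaps(int offset_a, int n_a, int offset_b, int n_b,
		       int nr_cpus)
{
	int overlaps = 0;

	for (long a = 0; a < n_a; a++)
		for (long b = 0; b < n_b; b++)
			if (thread_cpu(offset_a, a, nr_cpus) ==
			    thread_cpu(offset_b, b, nr_cpus))
				overlaps++;
	return overlaps;
}
```

With one reader and one writer, offsets 0 and 0 put both threads on CPU
0 (one overlapping pair), while offsets 0 and 1 put them on CPUs 0 and 1
(no overlap). Because of the modulo, overlap returns once the combined
thread count exceeds the number of CPUs.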
In the experiments below, writer duration characteristics are very
different between writer_cpu_offset 0 and 1. The experiments were carried
out on a 256C 512T machine running a PREEMPT=n kernel.
Experiment: nreaders=1 nwriters=1 reader_cpu_offset=0 writer_cpu_offset=0
Average grace-period duration: 108376 microseconds
Minimum grace-period duration: 13000.4
50th percentile grace-period duration: 115000
90th percentile grace-period duration: 121000
99th percentile grace-period duration: 121004
Maximum grace-period duration: 219000
Grace periods: 101 Batches: 1 Ratio: 101
Experiment: nreaders=1 nwriters=1 reader_cpu_offset=0 writer_cpu_offset=1
Average grace-period duration: 185950 microseconds
Minimum grace-period duration: 8999.84
50th percentile grace-period duration: 217946
90th percentile grace-period duration: 218003
99th percentile grace-period duration: 218018
Maximum grace-period duration: 272195
Grace periods: 101 Batches: 1 Ratio: 101
Signed-off-by: Yuzhuo Jing <yuzhuo@google.com>
---
.../admin-guide/kernel-parameters.txt | 19 +++++++++++++++++++
kernel/rcu/rcuscale.c | 16 +++++++++++-----
2 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5e233e511f81..f68651c103a4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1,3 +1,4 @@
+# vim: noet:sw=8:sts=8:
accept_memory= [MM]
Format: { eager | lazy }
default: lazy
@@ -5513,6 +5514,12 @@
test until boot completes in order to avoid
interference.
+ rcuscale.kfree_cpu_offset= [KNL]
+ Set the starting CPU affinity index of kfree threads.
+ CPU affinity is assigned sequentially from
+ kfree_cpu_offset to kfree_cpu_offset+kfree_nthreads,
+ modded by number of CPUs. Negative value is reset to 0.
+
rcuscale.kfree_by_call_rcu= [KNL]
In kernels built with CONFIG_RCU_LAZY=y, test
call_rcu() instead of kfree_rcu().
@@ -5567,6 +5574,12 @@
the same as for rcuscale.nreaders.
N, where N is the number of CPUs
+ rcuscale.reader_cpu_offset= [KNL]
+ Set the starting CPU affinity index of reader threads.
+ CPU affinity is assigned sequentially from
+ reader_cpu_offset to reader_cpu_offset+nreaders, modded
+ by number of CPUs. Negative value is reset to 0.
+
rcuscale.scale_type= [KNL]
Specify the RCU implementation to test.
@@ -5578,6 +5591,12 @@
rcuscale.verbose= [KNL]
Enable additional printk() statements.
+ rcuscale.writer_cpu_offset= [KNL]
+ Set the starting CPU affinity index of writer threads.
+ CPU affinity is assigned sequentially from
+ writer_cpu_offset to writer_cpu_offset+nwriters, modded
+ by number of CPUs. Negative value is reset to 0.
+
rcuscale.writer_holdoff= [KNL]
Write-side holdoff between grace periods,
in microseconds. The default of zero says
diff --git a/kernel/rcu/rcuscale.c b/kernel/rcu/rcuscale.c
index 43bcaeac457f..1208169be15e 100644
--- a/kernel/rcu/rcuscale.c
+++ b/kernel/rcu/rcuscale.c
@@ -95,12 +95,15 @@ torture_param(int, holdoff, 10, "Holdoff time before test start (s)");
torture_param(int, minruntime, 0, "Minimum run time (s)");
torture_param(int, nreaders, -1, "Number of RCU reader threads");
torture_param(int, nwriters, -1, "Number of RCU updater threads");
+torture_param(int, reader_cpu_offset, 0, "Offset of reader CPU affinity")
torture_param(bool, shutdown, RCUSCALE_SHUTDOWN,
"Shutdown at end of scalability tests.");
torture_param(int, verbose, 1, "Enable verbose debugging printk()s");
+torture_param(int, writer_cpu_offset, 0, "Offset of writer CPU affinity")
torture_param(int, writer_holdoff, 0, "Holdoff (us) between GPs, zero to disable");
torture_param(int, writer_holdoff_jiffies, 0, "Holdoff (jiffies) between GPs, zero to disable");
torture_param(bool, writer_no_print, false, "Do not print writer durations to ring buffer");
+torture_param(int, kfree_cpu_offset, 0, "Offset of kfree CPU affinity")
torture_param(int, kfree_rcu_test, 0, "Do we run a kfree_rcu() scale test?");
torture_param(int, kfree_mult, 1, "Multiple of kfree_obj size to allocate.");
torture_param(int, kfree_by_call_rcu, 0, "Use call_rcu() to emulate kfree_rcu()?");
@@ -495,7 +498,7 @@ rcu_scale_reader(void *arg)
long me = (long)arg;
VERBOSE_SCALEOUT_STRING("rcu_scale_reader task started");
- set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+ set_cpus_allowed_ptr(current, cpumask_of((reader_cpu_offset + me) % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
atomic_inc(&n_rcu_scale_reader_started);
@@ -585,7 +588,7 @@ rcu_scale_writer(void *arg)
VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
WARN_ON(!wdpp);
- set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+ set_cpus_allowed_ptr(current, cpumask_of((writer_cpu_offset + me) % nr_cpu_ids));
current->flags |= PF_NO_SETAFFINITY;
sched_set_fifo_low(current);
@@ -719,8 +722,8 @@ static void
rcu_scale_print_module_parms(struct rcu_scale_ops *cur_ops, const char *tag)
{
pr_alert("%s" SCALE_FLAG
- "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d writer_holdoff=%d writer_holdoff_jiffies=%d verbose=%d shutdown=%d\n",
- scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff, minruntime, nrealreaders, nrealwriters, writer_holdoff, writer_holdoff_jiffies, verbose, shutdown);
+ "--- %s: gp_async=%d gp_async_max=%d gp_exp=%d holdoff=%d minruntime=%d nreaders=%d nwriters=%d reader_cpu_offset=%d writer_cpu_offset=%d writer_holdoff=%d writer_holdoff_jiffies=%d kfree_cpu_offset=%d verbose=%d shutdown=%d\n",
+ scale_type, tag, gp_async, gp_async_max, gp_exp, holdoff, minruntime, nrealreaders, nrealwriters, reader_cpu_offset, writer_cpu_offset, writer_holdoff, writer_holdoff_jiffies, kfree_cpu_offset, verbose, shutdown);
}
/*
@@ -785,7 +788,7 @@ kfree_scale_thread(void *arg)
DEFINE_TORTURE_RANDOM(tr);
VERBOSE_SCALEOUT_STRING("kfree_scale_thread task started");
- set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
+ set_cpus_allowed_ptr(current, cpumask_of((kfree_cpu_offset + me) % nr_cpu_ids));
set_user_nice(current, MAX_NICE);
kfree_rcu_test_both = (kfree_rcu_test_single == kfree_rcu_test_double);
@@ -1446,6 +1449,9 @@ rcu_scale_init(void)
atomic_set(&n_rcu_scale_reader_started, 0);
atomic_set(&n_rcu_scale_writer_started, 0);
atomic_set(&n_rcu_scale_writer_finished, 0);
+ reader_cpu_offset = max(reader_cpu_offset, 0);
+ writer_cpu_offset = max(writer_cpu_offset, 0);
+ kfree_cpu_offset = max(kfree_cpu_offset, 0);
rcu_scale_print_module_parms(cur_ops, "Start of test");
if (!block_start)
--
2.50.1.552.g942d659e1b-goog
* Re: [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
2025-07-30 2:23 ` [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations Yuzhuo Jing
@ 2025-07-31 7:47 ` kernel test robot
0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-07-31 7:47 UTC (permalink / raw)
To: Yuzhuo Jing, Ian Rogers, Yuzhuo Jing, Jonathan Corbet,
Davidlohr Bueso, Paul E . McKenney, Josh Triplett,
Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Boqun Feng,
Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers,
Lai Jiangshan, Zqiang, Andrew Morton, Ingo Molnar,
Borislav Petkov, Arnd Bergmann, Frank van der Linden, linux-doc,
linux-kernel, rcu
Cc: oe-kbuild-all, Linux Memory Management List
Hi Yuzhuo,
kernel test robot noticed the following build errors:
[auto build test ERROR on rcu/rcu/dev]
[also build test ERROR on linus/master v6.16 next-20250731]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Yuzhuo-Jing/rcuscale-Create-debugfs-file-for-writer-durations/20250730-102613
base: https://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git rcu/dev
patch link: https://lore.kernel.org/r/20250730022347.71722-2-yuzhuo%40google.com
patch subject: [PATCH v1 1/4] rcuscale: Create debugfs file for writer durations
config: i386-randconfig-002-20250731 (https://download.01.org/0day-ci/archive/20250731/202507311504.ttQGhW04-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14+deb12u1) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250731/202507311504.ttQGhW04-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507311504.ttQGhW04-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from <command-line>:
kernel/rcu/rcuscale.c: In function 'rcu_scale_writer':
>> kernel/rcu/rcuscale.c:630:44: error: 'test_complete' undeclared (first use in this function); did you mean 'complete'?
630 | WRITE_ONCE(test_complete, true);
| ^~~~~~~~~~~~~
include/linux/compiler_types.h:548:23: note: in definition of macro '__compiletime_assert'
548 | if (!(condition)) \
| ^~~~~~~~~
include/linux/compiler_types.h:568:9: note: in expansion of macro '_compiletime_assert'
568 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| ^~~~~~~~~~~~~~~~~~
include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| ^~~~~~~~~~~~~
include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
60 | compiletime_assert_rwonce_type(x); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/rcu/rcuscale.c:630:33: note: in expansion of macro 'WRITE_ONCE'
630 | WRITE_ONCE(test_complete, true);
| ^~~~~~~~~~
kernel/rcu/rcuscale.c:630:44: note: each undeclared identifier is reported only once for each function it appears in
630 | WRITE_ONCE(test_complete, true);
| ^~~~~~~~~~~~~
include/linux/compiler_types.h:548:23: note: in definition of macro '__compiletime_assert'
548 | if (!(condition)) \
| ^~~~~~~~~
include/linux/compiler_types.h:568:9: note: in expansion of macro '_compiletime_assert'
568 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^~~~~~~~~~~~~~~~~~~
include/asm-generic/rwonce.h:36:9: note: in expansion of macro 'compiletime_assert'
36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| ^~~~~~~~~~~~~~~~~~
include/asm-generic/rwonce.h:36:28: note: in expansion of macro '__native_word'
36 | compiletime_assert(__native_word(t) || sizeof(t) == sizeof(long long), \
| ^~~~~~~~~~~~~
include/asm-generic/rwonce.h:60:9: note: in expansion of macro 'compiletime_assert_rwonce_type'
60 | compiletime_assert_rwonce_type(x); \
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/rcu/rcuscale.c:630:33: note: in expansion of macro 'WRITE_ONCE'
630 | WRITE_ONCE(test_complete, true);
| ^~~~~~~~~~
kernel/rcu/rcuscale.c: In function 'writer_durations_start':
kernel/rcu/rcuscale.c:959:14: error: 'test_complete' undeclared (first use in this function); did you mean 'complete'?
959 | if (!test_complete || writer_id < 0 || writer_id >= nrealwriters)
| ^~~~~~~~~~~~~
| complete
vim +630 kernel/rcu/rcuscale.c
534
535 /*
536 * RCU scale writer kthread. Repeatedly does a grace period.
537 */
538 static int
539 rcu_scale_writer(void *arg)
540 {
541 int i = 0;
542 int i_max;
543 unsigned long jdone;
544 long me = (long)arg;
545 bool selfreport = false;
546 bool started = false, done = false, alldone = false;
547 u64 t;
548 DEFINE_TORTURE_RANDOM(tr);
549 u64 *wdp;
550 u64 *wdpp = writer_durations[me];
551 struct writer_freelist *wflp = &writer_freelists[me];
552 struct writer_mblock *wmbp = NULL;
553
554 VERBOSE_SCALEOUT_STRING("rcu_scale_writer task started");
555 WARN_ON(!wdpp);
556 set_cpus_allowed_ptr(current, cpumask_of(me % nr_cpu_ids));
557 current->flags |= PF_NO_SETAFFINITY;
558 sched_set_fifo_low(current);
559
560 if (holdoff)
561 schedule_timeout_idle(holdoff * HZ);
562
563 /*
564 * Wait until rcu_end_inkernel_boot() is called for normal GP tests
565 * so that RCU is not always expedited for normal GP tests.
566 * The system_state test is approximate, but works well in practice.
567 */
568 while (!gp_exp && system_state != SYSTEM_RUNNING)
569 schedule_timeout_uninterruptible(1);
570
571 t = ktime_get_mono_fast_ns();
572 if (atomic_inc_return(&n_rcu_scale_writer_started) >= nrealwriters) {
573 t_rcu_scale_writer_started = t;
574 if (gp_exp) {
575 b_rcu_gp_test_started =
576 cur_ops->exp_completed() / 2;
577 } else {
578 b_rcu_gp_test_started = cur_ops->get_gp_seq();
579 }
580 }
581
582 jdone = jiffies + minruntime * HZ;
583 do {
584 bool gp_succeeded = false;
585
586 if (writer_holdoff)
587 udelay(writer_holdoff);
588 if (writer_holdoff_jiffies)
589 schedule_timeout_idle(torture_random(&tr) % writer_holdoff_jiffies + 1);
590 wdp = &wdpp[i];
591 *wdp = ktime_get_mono_fast_ns();
592 if (gp_async && !WARN_ON_ONCE(!cur_ops->async)) {
593 if (!wmbp)
594 wmbp = rcu_scale_alloc(me);
595 if (wmbp && atomic_read(&wflp->ws_inflight) < gp_async_max) {
596 atomic_inc(&wflp->ws_inflight);
597 cur_ops->async(&wmbp->wmb_rh, rcu_scale_async_cb);
598 wmbp = NULL;
599 gp_succeeded = true;
600 } else if (!kthread_should_stop()) {
601 cur_ops->gp_barrier();
602 } else {
603 rcu_scale_free(wmbp); /* Because we are stopping. */
604 wmbp = NULL;
605 }
606 } else if (gp_exp) {
607 cur_ops->exp_sync();
608 gp_succeeded = true;
609 } else {
610 cur_ops->sync();
611 gp_succeeded = true;
612 }
613 t = ktime_get_mono_fast_ns();
614 *wdp = t - *wdp;
615 i_max = i;
616 writer_n_durations[me] = i_max + 1;
617 if (!started &&
618 atomic_read(&n_rcu_scale_writer_started) >= nrealwriters)
619 started = true;
620 if (!done && i >= MIN_MEAS && time_after(jiffies, jdone)) {
621 done = true;
622 WRITE_ONCE(writer_done[me], true);
623 sched_set_normal(current, 0);
624 pr_alert("%s%s rcu_scale_writer %ld has %d measurements\n",
625 scale_type, SCALE_FLAG, me, MIN_MEAS);
626 if (atomic_inc_return(&n_rcu_scale_writer_finished) >=
627 nrealwriters) {
628 schedule_timeout_interruptible(10);
629 rcu_ftrace_dump(DUMP_ALL);
> 630 WRITE_ONCE(test_complete, true);
631 SCALEOUT_STRING("Test complete");
632 t_rcu_scale_writer_finished = t;
633 if (gp_exp) {
634 b_rcu_gp_test_finished =
635 cur_ops->exp_completed() / 2;
636 } else {
637 b_rcu_gp_test_finished =
638 cur_ops->get_gp_seq();
639 }
640 if (shutdown) {
641 smp_mb(); /* Assign before wake. */
642 wake_up(&shutdown_wq);
643 }
644 }
645 }
646 if (done && !alldone &&
647 atomic_read(&n_rcu_scale_writer_finished) >= nrealwriters)
648 alldone = true;
649 if (done && !alldone && time_after(jiffies, jdone + HZ * 60)) {
650 static atomic_t dumped;
651 int i;
652
653 if (!atomic_xchg(&dumped, 1)) {
654 for (i = 0; i < nrealwriters; i++) {
655 if (writer_done[i])
656 continue;
657 pr_info("%s: Task %ld flags writer %d:\n", __func__, me, i);
658 sched_show_task(writer_tasks[i]);
659 }
660 if (cur_ops->stats)
661 cur_ops->stats();
662 }
663 }
664 if (!selfreport && time_after(jiffies, jdone + HZ * (70 + me))) {
665 pr_info("%s: Writer %ld self-report: started %d done %d/%d->%d i %d jdone %lu.\n",
666 __func__, me, started, done, writer_done[me], atomic_read(&n_rcu_scale_writer_finished), i, jiffies - jdone);
667 selfreport = true;
668 }
669 if (gp_succeeded && started && !alldone && i < MAX_MEAS - 1)
670 i++;
671 rcu_scale_wait_shutdown();
672 } while (!torture_must_stop());
673 if (gp_async && cur_ops->async) {
674 rcu_scale_free(wmbp);
675 cur_ops->gp_barrier();
676 }
677 torture_kthread_stopping("rcu_scale_writer");
678 return 0;
679 }
680
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset
2025-07-30 2:23 [PATCH v1 0/4] rcuscale: Add debugfs file based controls and CPU affinity offset Yuzhuo Jing
` (3 preceding siblings ...)
2025-07-30 2:23 ` [PATCH v1 4/4] rcuscale: Add CPU affinity offset options Yuzhuo Jing
@ 2025-07-31 23:38 ` Paul E. McKenney
4 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-07-31 23:38 UTC (permalink / raw)
To: Yuzhuo Jing
Cc: Ian Rogers, Yuzhuo Jing, Jonathan Corbet, Davidlohr Bueso,
Josh Triplett, Frederic Weisbecker, Neeraj Upadhyay,
Joel Fernandes, Boqun Feng, Uladzislau Rezki, Steven Rostedt,
Mathieu Desnoyers, Lai Jiangshan, Zqiang, Andrew Morton,
Ingo Molnar, Borislav Petkov, Arnd Bergmann, Frank van der Linden,
linux-doc, linux-kernel, rcu
On Tue, Jul 29, 2025 at 07:23:43PM -0700, Yuzhuo Jing wrote:
> In an effort to add RCU benchmarks to the perf tool and to improve
> the base-metal rcuscale tests, this patch series adds several auxiliary
> features useful for testing tools.
>
> This series introduces a few rcuscale options:
> * writer_no_print: skip writer duration printing during shutdown, but
> instead let users read from the new "writer_durations" debugfs file.
> This drastically improves cleanup speed.
But existing scripts running something like this will continue to
work, correct? (It looks like they do, just checking.)
tools/testing/selftests/rcutorture/bin/kvm.sh --torture rcuscale --allcpus --duration 5
Don't get me wrong, your debugfs read-out performance increase looks
quite good, but these tests run in a guest OS with minimal userspace.
And by "minimal", I mean that they run out of an initrd having a root
filesystem consisting of a single statically linked "init" program. ;-)
> * block_start: an option to hold all worker thread until the new
> debugfs "should_start" file is written.
> * {reader,writer,kfree}_cpu_offset: the starting value of CPU affinity
> for each type of threads. This can be used to avoid scheduling
> different types of threads on the same CPU. The 4th patch in this
> series shows drastic performance differences w/ and w/o overlaps.
The usual use cases run only writers except for stress tests, but this
seems like a good capability.
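
For illustration only, here is a user-space sketch of the affinity arithmetic.
It mirrors the "(kfree_cpu_offset + me) % nr_cpu_ids" expression and the
clamp of negative offsets to zero from the patch 4/4 hunk, and assumes that
readers and writers use the same formula. The helper names clamp_offset(),
pick_cpu() and show_layout() are hypothetical and are not part of rcuscale.

```c
#include <stdio.h>

/* Hypothetical helper: negative offsets become 0, like max(offset, 0). */
int clamp_offset(int offset)
{
	return offset < 0 ? 0 : offset;
}

/* Hypothetical helper: CPU picked for thread number "me" of one type. */
int pick_cpu(int offset, long me, int nr_cpus)
{
	return (int)((clamp_offset(offset) + me) % nr_cpus);
}

/* Print where reader and writer number "me" land for the given offsets. */
void show_layout(int reader_off, int writer_off, int nthreads, int nr_cpus)
{
	for (long me = 0; me < nthreads; me++)
		printf("thread %ld: reader on CPU %d, writer on CPU %d\n",
		       me, pick_cpu(reader_off, me, nr_cpus),
		       pick_cpu(writer_off, me, nr_cpus));
}
```

With four threads of each type on eight CPUs, show_layout(0, 0, 4, 8) puts
reader N and writer N on the same CPU, while show_layout(0, 4, 4, 8) moves
the writers to CPUs 4-7 so that the two types no longer overlap.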
> This patch series creates an "rcuscale" folder in debugfs, containing
> the following files:
> * writer_durations: a CSV formatted file containing writer id and
> writer durations.
> * {reader,writer,kfree}_tasks: the list of kernel task PIDs for
> external tools to attach to.
> * should_start: a writable file to signal the start of the experiment,
> used in conjunction with the new "block_start" option.
> * test_complete: a readable file to indicate whether the experiment has
> finished or not.
>
> RFCs:
> * Should those new files reside in debugfs or in procfs?
New files in procfs face serious scrutiny, so your choice of debugfs
is a good one.
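
A minimal user-space sketch of how an external tool might drive the
"should_start" and "test_complete" files when block_start is set. Several
things here are assumptions rather than facts from the series: that debugfs
is mounted at /sys/kernel/debug, that writing "1" releases the threads, and
that test_complete reads back as "1" or "Y" once the run is over. The
function names are made up for this example.

```c
#include <stdio.h>
#include <unistd.h>

/* Write "1" to <dir>/should_start. Returns 0 on success, -1 on error. */
int signal_start(const char *dir)
{
	char path[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/should_start", dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 0 : -1;
}

/* Read <dir>/test_complete: 1 if set, 0 if clear, -1 on error. */
int read_complete(const char *dir)
{
	char path[512];
	FILE *f;
	int c;

	snprintf(path, sizeof(path), "%s/test_complete", dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	c = fgetc(f);
	fclose(f);
	if (c == EOF)
		return -1;
	/* Assumed encodings: "1" or the "Y" that debugfs bool files print. */
	return c == '1' || c == 'Y' || c == 'y';
}

/* Release the threads, then poll once per second until done or timeout. */
int run_and_wait(const char *dir, int timeout_secs)
{
	if (signal_start(dir) < 0)
		return -1;
	for (int i = 0; i < timeout_secs; i++) {
		if (read_complete(dir) == 1)
			return 0;
		sleep(1);
	}
	return -1;
}
```

A caller would use something like run_and_wait("/sys/kernel/debug/rcuscale",
600) and read the results only after it returns 0. Taking the directory as a
parameter keeps the mount point out of the code.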
> * What format should be used for the writer_durations file, and what
> documentation should be updated for the file format definition?
Back in the old days, I would have insisted on space/tab separated fields.
But gawk now supports a --csv flag, so I don't feel strongly about this.
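
For tools that do not go through gawk, a comma-separated line is also easy
to consume from C. The sketch below assumes each data line has the form
"writer_id,duration", which follows the cover-letter description; the exact
columns, any header line, and the duration units are assumptions. The
function names are hypothetical.

```c
#include <stdio.h>

/* Parse one "writer_id,duration" line. Returns 1 on success, 0 otherwise. */
int parse_duration_line(const char *line, int *writer_id,
			unsigned long long *duration)
{
	return sscanf(line, "%d,%llu", writer_id, duration) == 2;
}

/* Scan every line of "in" and return the largest duration that parsed. */
unsigned long long max_duration(FILE *in)
{
	char line[128];
	int id;
	unsigned long long d, max = 0;

	while (fgets(line, sizeof(line), in))
		if (parse_duration_line(line, &id, &d) && d > max)
			max = d;
	return max;
}
```

Lines that do not start with a number, such as a possible header, fail the
sscanf() match and are skipped.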
> * In the 4th patch, we see different characteristics between overlap
> and non-overlap. Current rcuscale creates nr_cpu readers and nr_cpu
> writers, thus scheduling 2nr_cpu tasks on nr_cpu CPUs. Should we
> consider changes to this behavior? Or add automatic conflict
> resolutions when total threads <= nr_cpu.
The theory back in the day was that the updater would spend enough time
blocked that this would not matter. However, you have shown that it
clearly does matter.
Except that running the reader and writer on the same CPU seems to
*improve* grace-period latency, with a P99 duration of 121,004
microseconds for overlapping (your first patch 4/4 experiment) and of
218,018 microseconds for non-overlapping. Since shorter grace periods
are usually considered better, this suggests better performance with
the reader and writer running on the same CPU.
Or am I misreading your commit log?
It would not be too surprising for the overlapping case to provide
faster grace periods because you are running PREEMPT=n and the writer
kthread would force context switches more frequently. But I figured
that I should check.
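
For anyone reproducing the comparison, here is a sketch of one way to get a
P99 value from a set of writer durations. It uses the nearest-rank
definition; the thread does not say which percentile method produced the
numbers above, so treat that choice as an assumption. The function names
are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for ascending uint64_t values. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile. Sorts vals in place.
 * Requires n > 0 and pct in the range 1..100.
 */
uint64_t percentile(uint64_t *vals, size_t n, unsigned int pct)
{
	size_t rank = (n * pct + 99) / 100;	/* ceil(n * pct / 100) */

	qsort(vals, n, sizeof(*vals), cmp_u64);
	return vals[rank - 1];
}
```

Calling percentile(durations, n, 99) on the overlapping and non-overlapping
runs separately gives two numbers that can be compared directly, where the
smaller one means shorter grace periods.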
> Thank you!
>
> Yuzhuo Jing (4):
> rcuscale: Create debugfs file for writer durations
> rcuscale: Create debugfs files for worker thread PIDs
> rcuscale: Add file based start/finish control
This does not apply on the dev branch of my -rcu tree. Which is not too
surprising because kernel-parameters.txt is subject to change. But when
you repost to fix the bug that kernel test robot detected, could you
please let me know what mainline version you are developing against?
That would allow me to apply it there and then to rebase and resolve
conflicts as needed.
Thanx, Paul
> rcuscale: Add CPU affinity offset options
>
> .../admin-guide/kernel-parameters.txt | 29 ++
> kernel/rcu/rcuscale.c | 361 +++++++++++++++++-
> 2 files changed, 377 insertions(+), 13 deletions(-)
>
> --
> 2.50.1.552.g942d659e1b-goog
>