BPF List
* [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test
@ 2025-11-25  2:07 Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 1/3] selftests/bpf: Relax CPU requirements for " Kumar Kartikeya Dwivedi
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-11-25  2:07 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

Three enhancements; details are in the commit messages.

First, the CPU requirements are 2 for AA, 3 for ABBA, and 4 for ABBCCA,
hence relax the check during module initialization accordingly. Second,
add a per-CPU histogram of lock acquisition times, recording which
buckets acquisitions fall into for the normal task context and the NMI
context. CPUs whose acquisitions all fall below 10ms are not printed in
detail; anything above that gets the full per-context breakdown.
Finally, make the critical section delay of the NMI and task contexts
configurable, defaulting to 10 and 20 ms respectively.

Kumar Kartikeya Dwivedi (3):
  selftests/bpf: Relax CPU requirements for rqspinlock stress test
  selftests/bpf: Add lock wait time stats to rqspinlock stress test
  selftests/bpf: Make CS length configurable for rqspinlock stress test

 .../bpf/test_kmods/bpf_test_rqspinlock.c      | 120 +++++++++++++++++-
 1 file changed, 117 insertions(+), 3 deletions(-)


base-commit: 590699d85823f38b74d52a0811ef22ebb61afddc
-- 
2.51.0



* [PATCH bpf-next v1 1/3] selftests/bpf: Relax CPU requirements for rqspinlock stress test
  2025-11-25  2:07 [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test Kumar Kartikeya Dwivedi
@ 2025-11-25  2:07 ` Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 2/3] selftests/bpf: Add lock wait time stats to " Kumar Kartikeya Dwivedi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-11-25  2:07 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

Only require 2 CPUs for AA, 3 for ABBA, and 4 for ABBCCA; the
requirement falls out directly from the mode enum as test_mode + 2.
This enables running AA tests with a single test CPU.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
index 4cced4bb8af1..8096624cf9c1 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
@@ -182,7 +182,7 @@ static int bpf_test_rqspinlock_init(void)
 
 	pr_err("Mode = %s\n", rqsl_mode_names[test_mode]);
 
-	if (ncpus < 3)
+	if (ncpus < test_mode + 2)
 		return -ENOTSUPP;
 
 	raw_res_spin_lock_init(&lock_a);
-- 
2.51.0



* [PATCH bpf-next v1 2/3] selftests/bpf: Add lock wait time stats to rqspinlock stress test
  2025-11-25  2:07 [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 1/3] selftests/bpf: Relax CPU requirements for " Kumar Kartikeya Dwivedi
@ 2025-11-25  2:07 ` Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make CS length configurable for " Kumar Kartikeya Dwivedi
  2025-11-25 23:40 ` [PATCH bpf-next v1 0/3] General enhancements to " patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-11-25  2:07 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

Add per-CPU statistics for the time taken to acquire an rqspinlock,
broken down by context and timing windows. CPUs whose acquisitions all
fit within the 10ms window are skipped when printing; otherwise the full
breakdown is displayed in the summary. This allows precisely capturing
how many outlier acquisition attempts occurred for a given lock in a
given context.

A critical detail is that the time is recorded regardless of success or
failure, which matters for capturing attempts that failed only after
waiting out the timeout.

Output:

[   64.279459] rqspinlock acquisition latency histogram (ms):
[   64.279472]  cpu1: total 528426 (normal 526559, nmi 1867)
[   64.279477]    0-1ms: total 524697 (normal 524697, nmi 0)
[   64.279480]    2-2ms: total 3652 (normal 1811, nmi 1841)
[   64.279482]    3-3ms: total 66 (normal 47, nmi 19)
[   64.279485]    4-4ms: total 2 (normal 1, nmi 1)
[   64.279487]    5-5ms: total 1 (normal 1, nmi 0)
[   64.279489]    6-6ms: total 1 (normal 0, nmi 1)
[   64.279490]    101-150ms: total 1 (normal 0, nmi 1)
[   64.279492]    >= 251ms: total 6 (normal 2, nmi 4)
...

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../bpf/test_kmods/bpf_test_rqspinlock.c      | 104 ++++++++++++++++++
 1 file changed, 104 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
index 8096624cf9c1..4ea7ec420e4e 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
@@ -5,6 +5,7 @@
 #include <linux/delay.h>
 #include <linux/module.h>
 #include <linux/prandom.h>
+#include <linux/ktime.h>
 #include <asm/rqspinlock.h>
 #include <linux/perf_event.h>
 #include <linux/kthread.h>
@@ -24,6 +25,21 @@ static rqspinlock_t lock_a;
 static rqspinlock_t lock_b;
 static rqspinlock_t lock_c;
 
+#define RQSL_SLOW_THRESHOLD_MS 10
+static const unsigned int rqsl_hist_ms[] = {
+	1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
+	12, 14, 16, 18, 20, 25, 30, 40, 50, 75,
+	100, 150, 200, 250, 1000,
+};
+#define RQSL_NR_HIST_BUCKETS ARRAY_SIZE(rqsl_hist_ms)
+
+struct rqsl_cpu_hist {
+	atomic64_t normal[RQSL_NR_HIST_BUCKETS];
+	atomic64_t nmi[RQSL_NR_HIST_BUCKETS];
+};
+
+static DEFINE_PER_CPU(struct rqsl_cpu_hist, rqsl_cpu_hists);
+
 enum rqsl_mode {
 	RQSL_MODE_AA = 0,
 	RQSL_MODE_ABBA,
@@ -79,10 +95,33 @@ static struct rqsl_lock_pair rqsl_get_lock_pair(int cpu)
 	}
 }
 
+static u32 rqsl_hist_bucket_idx(u32 delta_ms)
+{
+	int i;
+
+	for (i = 0; i < RQSL_NR_HIST_BUCKETS; i++) {
+		if (delta_ms <= rqsl_hist_ms[i])
+			return i;
+	}
+
+	return RQSL_NR_HIST_BUCKETS - 1;
+}
+
+static void rqsl_record_lock_time(u64 delta_ns, bool is_nmi)
+{
+	struct rqsl_cpu_hist *hist = this_cpu_ptr(&rqsl_cpu_hists);
+	u32 delta_ms = DIV_ROUND_UP_ULL(delta_ns, NSEC_PER_MSEC);
+	u32 bucket = rqsl_hist_bucket_idx(delta_ms);
+	atomic64_t *buckets = is_nmi ? hist->nmi : hist->normal;
+
+	atomic64_inc(&buckets[bucket]);
+}
+
 static int rqspinlock_worker_fn(void *arg)
 {
 	int cpu = smp_processor_id();
 	unsigned long flags;
+	u64 start_ns;
 	int ret;
 
 	if (cpu) {
@@ -96,7 +135,9 @@ static int rqspinlock_worker_fn(void *arg)
 				msleep(1000);
 				continue;
 			}
+			start_ns = ktime_get_mono_fast_ns();
 			ret = raw_res_spin_lock_irqsave(worker_lock, flags);
+			rqsl_record_lock_time(ktime_get_mono_fast_ns() - start_ns, false);
 			mdelay(20);
 			if (!ret)
 				raw_res_spin_unlock_irqrestore(worker_lock, flags);
@@ -130,13 +171,16 @@ static void nmi_cb(struct perf_event *event, struct perf_sample_data *data,
 	struct rqsl_lock_pair locks;
 	int cpu = smp_processor_id();
 	unsigned long flags;
+	u64 start_ns;
 	int ret;
 
 	if (!cpu || READ_ONCE(pause))
 		return;
 
 	locks = rqsl_get_lock_pair(cpu);
+	start_ns = ktime_get_mono_fast_ns();
 	ret = raw_res_spin_lock_irqsave(locks.nmi_lock, flags);
+	rqsl_record_lock_time(ktime_get_mono_fast_ns() - start_ns, true);
 
 	mdelay(10);
 
@@ -235,10 +279,70 @@ static int bpf_test_rqspinlock_init(void)
 
 module_init(bpf_test_rqspinlock_init);
 
+static void rqsl_print_histograms(void)
+{
+	int cpu, i;
+
+	pr_err("rqspinlock acquisition latency histogram (ms):\n");
+
+	for_each_online_cpu(cpu) {
+		struct rqsl_cpu_hist *hist = per_cpu_ptr(&rqsl_cpu_hists, cpu);
+		u64 norm_counts[RQSL_NR_HIST_BUCKETS];
+		u64 nmi_counts[RQSL_NR_HIST_BUCKETS];
+		u64 total_counts[RQSL_NR_HIST_BUCKETS];
+		u64 norm_total = 0, nmi_total = 0, total = 0;
+		bool has_slow = false;
+
+		for (i = 0; i < RQSL_NR_HIST_BUCKETS; i++) {
+			norm_counts[i] = atomic64_read(&hist->normal[i]);
+			nmi_counts[i] = atomic64_read(&hist->nmi[i]);
+			total_counts[i] = norm_counts[i] + nmi_counts[i];
+			norm_total += norm_counts[i];
+			nmi_total += nmi_counts[i];
+			total += total_counts[i];
+			if (rqsl_hist_ms[i] > RQSL_SLOW_THRESHOLD_MS &&
+			    total_counts[i])
+				has_slow = true;
+		}
+
+		if (!total)
+			continue;
+
+		if (!has_slow) {
+			pr_err(" cpu%d: total %llu (normal %llu, nmi %llu), all within 0-%ums\n",
+			       cpu, total, norm_total, nmi_total, RQSL_SLOW_THRESHOLD_MS);
+			continue;
+		}
+
+		pr_err(" cpu%d: total %llu (normal %llu, nmi %llu)\n",
+		       cpu, total, norm_total, nmi_total);
+		for (i = 0; i < RQSL_NR_HIST_BUCKETS; i++) {
+			unsigned int start_ms;
+
+			if (!total_counts[i])
+				continue;
+
+			start_ms = i == 0 ? 0 : rqsl_hist_ms[i - 1] + 1;
+			if (i == RQSL_NR_HIST_BUCKETS - 1) {
+				pr_err("   >= %ums: total %llu (normal %llu, nmi %llu)\n",
+				       start_ms, total_counts[i],
+				       norm_counts[i], nmi_counts[i]);
+			} else {
+				pr_err("   %u-%ums: total %llu (normal %llu, nmi %llu)\n",
+				       start_ms, rqsl_hist_ms[i],
+				       total_counts[i],
+				       norm_counts[i], nmi_counts[i]);
+			}
+		}
+	}
+}
+
 static void bpf_test_rqspinlock_exit(void)
 {
+	WRITE_ONCE(pause, 1);
 	free_rqsl_threads();
 	free_rqsl_evts();
+	rqsl_print_histograms();
 }
 
 module_exit(bpf_test_rqspinlock_exit);
-- 
2.51.0



* [PATCH bpf-next v1 3/3] selftests/bpf: Make CS length configurable for rqspinlock stress test
  2025-11-25  2:07 [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 1/3] selftests/bpf: Relax CPU requirements for " Kumar Kartikeya Dwivedi
  2025-11-25  2:07 ` [PATCH bpf-next v1 2/3] selftests/bpf: Add lock wait time stats to " Kumar Kartikeya Dwivedi
@ 2025-11-25  2:07 ` Kumar Kartikeya Dwivedi
  2025-11-25 23:40 ` [PATCH bpf-next v1 0/3] General enhancements to " patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-11-25  2:07 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

Allow users to configure the critical section delay for both the
task/normal and NMI contexts, defaulting to 20ms and 10ms respectively,
as before.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/test_kmods/bpf_test_rqspinlock.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
index 4ea7ec420e4e..e8dd3fbc6ea5 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
@@ -51,6 +51,16 @@ module_param(test_mode, int, 0644);
 MODULE_PARM_DESC(test_mode,
 		 "rqspinlock test mode: 0 = AA, 1 = ABBA, 2 = ABBCCA");
 
+static int normal_delay = 20;
+module_param(normal_delay, int, 0644);
+MODULE_PARM_DESC(normal_delay,
+		 "rqspinlock critical section length for normal context (20ms default)");
+
+static int nmi_delay = 10;
+module_param(nmi_delay, int, 0644);
+MODULE_PARM_DESC(nmi_delay,
+		 "rqspinlock critical section length for NMI context (10ms default)");
+
 static struct perf_event **rqsl_evts;
 static int rqsl_nevts;
 
@@ -138,7 +148,7 @@ static int rqspinlock_worker_fn(void *arg)
 			start_ns = ktime_get_mono_fast_ns();
 			ret = raw_res_spin_lock_irqsave(worker_lock, flags);
 			rqsl_record_lock_time(ktime_get_mono_fast_ns() - start_ns, false);
-			mdelay(20);
+			mdelay(normal_delay);
 			if (!ret)
 				raw_res_spin_unlock_irqrestore(worker_lock, flags);
 			cpu_relax();
@@ -182,7 +192,7 @@ static void nmi_cb(struct perf_event *event, struct perf_sample_data *data,
 	ret = raw_res_spin_lock_irqsave(locks.nmi_lock, flags);
 	rqsl_record_lock_time(ktime_get_mono_fast_ns() - start_ns, true);
 
-	mdelay(10);
+	mdelay(nmi_delay);
 
 	if (!ret)
 		raw_res_spin_unlock_irqrestore(locks.nmi_lock, flags);
-- 
2.51.0



* Re: [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test
  2025-11-25  2:07 [PATCH bpf-next v1 0/3] General enhancements to rqspinlock stress test Kumar Kartikeya Dwivedi
                   ` (2 preceding siblings ...)
  2025-11-25  2:07 ` [PATCH bpf-next v1 3/3] selftests/bpf: Make CS length configurable for " Kumar Kartikeya Dwivedi
@ 2025-11-25 23:40 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-11-25 23:40 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, ast, andrii, daniel, martin.lau, eddyz87, kkd, kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Tue, 25 Nov 2025 02:07:46 +0000 you wrote:
> Three enhancements, details in commit messages.
> 
> First, the CPU requirements are 2 for AA, 3 for ABBA, and 4 for ABBCCA,
> hence relax the check during module initialization. Second, add a
> per-CPU histogram to capture lock acquisition times to record which
> buckets these acquisitions fall into for the normal task context and NMI
> context.  Anything below 10ms is not printed in detail, but above that
> displays the full breakdown for each context. Finally, make the delay of
> the NMI and task contexts configurable, set to 10 and 20 ms respectively
> by default.
> 
> [...]

Here is the summary with links:
  - [bpf-next,v1,1/3] selftests/bpf: Relax CPU requirements for rqspinlock stress test
    https://git.kernel.org/bpf/bpf-next/c/224de8d5a30e
  - [bpf-next,v1,2/3] selftests/bpf: Add lock wait time stats to rqspinlock stress test
    https://git.kernel.org/bpf/bpf-next/c/6173c1d6208c
  - [bpf-next,v1,3/3] selftests/bpf: Make CS length configurable for rqspinlock stress test
    https://git.kernel.org/bpf/bpf-next/c/88337b587b8b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



