BPF List
* [PATCH bpf-next v1 0/2] Misc rqspinlock updates
@ 2025-10-29 18:18 Kumar Kartikeya Dwivedi
  2025-10-29 18:18 ` [PATCH bpf-next v1 1/2] rqspinlock: Disable queue destruction for deadlocks Kumar Kartikeya Dwivedi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-29 18:18 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

A couple of changes for rqspinlock: the first disables propagation of AA
and ABBA deadlocks to waiters queued behind the deadlocking waiter. A
more detailed rationale is available in the commit log. The second
commit expands the stress test with an ABBCCA mode that reliably
exercises the timeout fallback.

Kumar Kartikeya Dwivedi (2):
  rqspinlock: Disable queue destruction for deadlocks
  selftests/bpf: Add ABBCCA case for rqspinlock stress test

 kernel/bpf/rqspinlock.c                       |  8 ++
 .../selftests/bpf/prog_tests/res_spin_lock.c  |  8 +-
 .../bpf/test_kmods/bpf_test_rqspinlock.c      | 85 ++++++++++++++-----
 3 files changed, 74 insertions(+), 27 deletions(-)


base-commit: 54c134f379ee2816f60130d44e0e386c261dff45
-- 
2.51.0



* [PATCH bpf-next v1 1/2] rqspinlock: Disable queue destruction for deadlocks
  2025-10-29 18:18 [PATCH bpf-next v1 0/2] Misc rqspinlock updates Kumar Kartikeya Dwivedi
@ 2025-10-29 18:18 ` Kumar Kartikeya Dwivedi
  2025-10-29 18:18 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add ABBCCA case for rqspinlock stress test Kumar Kartikeya Dwivedi
  2025-10-30  1:20 ` [PATCH bpf-next v1 0/2] Misc rqspinlock updates patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-29 18:18 UTC (permalink / raw)
  To: bpf
  Cc: Amery Hung, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann,
	Martin KaFai Lau, Eduard Zingerman, kkd, kernel-team

Disable propagation and unwinding of the waiter queue in case the head
waiter detects a deadlock condition, but keep it enabled in case of the
timeout fallback.

Currently, when the head waiter experiences an AA deadlock, it will
signal all its successors in the queue to exit with an error. This is
not ideal for cases where the same lock is held in contexts which can
cause errors in an unrestricted fashion (e.g., BPF programs, or kernel
paths invoked through BPF programs), and core kernel logic which is
written in a correct fashion and does not expect deadlocks.

The same reasoning extends to ABBA situations. Depending on the actual
runtime schedule, one or both of the head waiters involved in an ABBA
situation can detect it and exit directly without terminating their
waiter queue. If the ABBA situation manifests again, waiters will keep
exiting until progress can be made, or a timeout is triggered in case of
more complicated locking dependencies.

We still preserve the queue destruction in case of timeouts, as either
the locking dependencies are too complex to be captured by AA and ABBA
heuristics, or the owner is perpetually stuck. As such, it would be
unwise to continue to apply the timeout for each new head waiter without
terminating the queue, since we may end up waiting for more than 250 ms
in aggregate with all participants in the locking transaction.

The patch itself is fairly simple; we can simply signal our successor to
become the next head waiter, and leave the queue without attempting to
acquire the lock.

With this change, the behavior for waiters whose predecessor experiences
a deadlock changes. Call sites are now guaranteed not to receive errors
when a predecessor encounters a deadlock, as long as the successors
themselves do not participate in one. This should lower the failure rate
for waiters that are not performing improper locking operations but were
simply unlucky enough to queue behind a misbehaving waiter. However,
timeouts remain a possibility and must still be accounted for, so users
cannot rely on errors never occurring.

Suggested-by: Amery Hung <ameryhung@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 kernel/bpf/rqspinlock.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/rqspinlock.c b/kernel/bpf/rqspinlock.c
index 21be48108e96..b94e258bf2b9 100644
--- a/kernel/bpf/rqspinlock.c
+++ b/kernel/bpf/rqspinlock.c
@@ -572,6 +572,14 @@ int __lockfunc resilient_queued_spin_lock_slowpath(rqspinlock_t *lock, u32 val)
 	val = res_atomic_cond_read_acquire(&lock->val, !(VAL & _Q_LOCKED_PENDING_MASK) ||
 					   RES_CHECK_TIMEOUT(ts, ret, _Q_LOCKED_PENDING_MASK));
 
+	/* Disable queue destruction when we detect deadlocks. */
+	if (ret == -EDEADLK) {
+		if (!next)
+			next = smp_cond_load_relaxed(&node->next, (VAL));
+		arch_mcs_spin_unlock_contended(&next->locked);
+		goto err_release_node;
+	}
+
 waitq_timeout:
 	if (ret) {
 		/*
-- 
2.51.0



* [PATCH bpf-next v1 2/2] selftests/bpf: Add ABBCCA case for rqspinlock stress test
  2025-10-29 18:18 [PATCH bpf-next v1 0/2] Misc rqspinlock updates Kumar Kartikeya Dwivedi
  2025-10-29 18:18 ` [PATCH bpf-next v1 1/2] rqspinlock: Disable queue destruction for deadlocks Kumar Kartikeya Dwivedi
@ 2025-10-29 18:18 ` Kumar Kartikeya Dwivedi
  2025-10-30  1:20 ` [PATCH bpf-next v1 0/2] Misc rqspinlock updates patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2025-10-29 18:18 UTC (permalink / raw)
  To: bpf
  Cc: Eduard Zingerman, Amery Hung, Alexei Starovoitov, Andrii Nakryiko,
	Daniel Borkmann, Martin KaFai Lau, kkd, kernel-team

Introduce a new mode for the rqspinlock stress test that exercises a
deadlock which cannot be detected by the AA and ABBA checks, so that the
timeout fallback is always reliably triggered. We need 4 CPUs for this
particular case: CPU 0 is left untouched, and three participant CPUs
trigger the ABBCCA cycle.

Refactor the lock acquisition paths in the module to better reflect the
three modes and choose the right lock depending on the context.

Also drop the ABBA case from running by default as part of test_progs,
since the stress test can consume a significant amount of time.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 .../selftests/bpf/prog_tests/res_spin_lock.c  |  8 +-
 .../bpf/test_kmods/bpf_test_rqspinlock.c      | 85 ++++++++++++++-----
 2 files changed, 66 insertions(+), 27 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
index 8c6c2043a432..f0a8c828f8f1 100644
--- a/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
+++ b/tools/testing/selftests/bpf/prog_tests/res_spin_lock.c
@@ -110,8 +110,8 @@ void serial_test_res_spin_lock_stress(void)
 	ASSERT_OK(load_module("bpf_test_rqspinlock.ko", false), "load module AA");
 	sleep(5);
 	unload_module("bpf_test_rqspinlock", false);
-
-	ASSERT_OK(load_module_params("bpf_test_rqspinlock.ko", "test_ab=1", false), "load module ABBA");
-	sleep(5);
-	unload_module("bpf_test_rqspinlock", false);
+	/*
+	 * Insert bpf_test_rqspinlock.ko manually with test_mode=[1|2] to test
+	 * other cases (ABBA, ABBCCA).
+	 */
 }
diff --git a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
index 769206fc70e4..4cced4bb8af1 100644
--- a/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
+++ b/tools/testing/selftests/bpf/test_kmods/bpf_test_rqspinlock.c
@@ -22,23 +22,61 @@ static struct perf_event_attr hw_attr = {
 
 static rqspinlock_t lock_a;
 static rqspinlock_t lock_b;
+static rqspinlock_t lock_c;
+
+enum rqsl_mode {
+	RQSL_MODE_AA = 0,
+	RQSL_MODE_ABBA,
+	RQSL_MODE_ABBCCA,
+};
+
+static int test_mode = RQSL_MODE_AA;
+module_param(test_mode, int, 0644);
+MODULE_PARM_DESC(test_mode,
+		 "rqspinlock test mode: 0 = AA, 1 = ABBA, 2 = ABBCCA");
 
 static struct perf_event **rqsl_evts;
 static int rqsl_nevts;
 
-static bool test_ab = false;
-module_param(test_ab, bool, 0644);
-MODULE_PARM_DESC(test_ab, "Test ABBA situations instead of AA situations");
-
 static struct task_struct **rqsl_threads;
 static int rqsl_nthreads;
 static atomic_t rqsl_ready_cpus = ATOMIC_INIT(0);
 
 static int pause = 0;
 
-static bool nmi_locks_a(int cpu)
+static const char *rqsl_mode_names[] = {
+	[RQSL_MODE_AA] = "AA",
+	[RQSL_MODE_ABBA] = "ABBA",
+	[RQSL_MODE_ABBCCA] = "ABBCCA",
+};
+
+struct rqsl_lock_pair {
+	rqspinlock_t *worker_lock;
+	rqspinlock_t *nmi_lock;
+};
+
+static struct rqsl_lock_pair rqsl_get_lock_pair(int cpu)
 {
-	return (cpu & 1) && test_ab;
+	int mode = READ_ONCE(test_mode);
+
+	switch (mode) {
+	default:
+	case RQSL_MODE_AA:
+		return (struct rqsl_lock_pair){ &lock_a, &lock_a };
+	case RQSL_MODE_ABBA:
+		if (cpu & 1)
+			return (struct rqsl_lock_pair){ &lock_b, &lock_a };
+		return (struct rqsl_lock_pair){ &lock_a, &lock_b };
+	case RQSL_MODE_ABBCCA:
+		switch (cpu % 3) {
+		case 0:
+			return (struct rqsl_lock_pair){ &lock_a, &lock_b };
+		case 1:
+			return (struct rqsl_lock_pair){ &lock_b, &lock_c };
+		default:
+			return (struct rqsl_lock_pair){ &lock_c, &lock_a };
+		}
+	}
 }
 
 static int rqspinlock_worker_fn(void *arg)
@@ -51,19 +89,17 @@ static int rqspinlock_worker_fn(void *arg)
 		atomic_inc(&rqsl_ready_cpus);
 
 		while (!kthread_should_stop()) {
+			struct rqsl_lock_pair locks = rqsl_get_lock_pair(cpu);
+			rqspinlock_t *worker_lock = locks.worker_lock;
+
 			if (READ_ONCE(pause)) {
 				msleep(1000);
 				continue;
 			}
-			if (nmi_locks_a(cpu))
-				ret = raw_res_spin_lock_irqsave(&lock_b, flags);
-			else
-				ret = raw_res_spin_lock_irqsave(&lock_a, flags);
+			ret = raw_res_spin_lock_irqsave(worker_lock, flags);
 			mdelay(20);
-			if (nmi_locks_a(cpu) && !ret)
-				raw_res_spin_unlock_irqrestore(&lock_b, flags);
-			else if (!ret)
-				raw_res_spin_unlock_irqrestore(&lock_a, flags);
+			if (!ret)
+				raw_res_spin_unlock_irqrestore(worker_lock, flags);
 			cpu_relax();
 		}
 		return 0;
@@ -91,6 +127,7 @@ static int rqspinlock_worker_fn(void *arg)
 static void nmi_cb(struct perf_event *event, struct perf_sample_data *data,
 		   struct pt_regs *regs)
 {
+	struct rqsl_lock_pair locks;
 	int cpu = smp_processor_id();
 	unsigned long flags;
 	int ret;
@@ -98,17 +135,13 @@ static void nmi_cb(struct perf_event *event, struct perf_sample_data *data,
 	if (!cpu || READ_ONCE(pause))
 		return;
 
-	if (nmi_locks_a(cpu))
-		ret = raw_res_spin_lock_irqsave(&lock_a, flags);
-	else
-		ret = raw_res_spin_lock_irqsave(test_ab ? &lock_b : &lock_a, flags);
+	locks = rqsl_get_lock_pair(cpu);
+	ret = raw_res_spin_lock_irqsave(locks.nmi_lock, flags);
 
 	mdelay(10);
 
-	if (nmi_locks_a(cpu) && !ret)
-		raw_res_spin_unlock_irqrestore(&lock_a, flags);
-	else if (!ret)
-		raw_res_spin_unlock_irqrestore(test_ab ? &lock_b : &lock_a, flags);
+	if (!ret)
+		raw_res_spin_unlock_irqrestore(locks.nmi_lock, flags);
 }
 
 static void free_rqsl_threads(void)
@@ -142,13 +175,19 @@ static int bpf_test_rqspinlock_init(void)
 	int i, ret;
 	int ncpus = num_online_cpus();
 
-	pr_err("Mode = %s\n", test_ab ? "ABBA" : "AA");
+	if (test_mode < RQSL_MODE_AA || test_mode > RQSL_MODE_ABBCCA) {
+		pr_err("Invalid mode %d\n", test_mode);
+		return -EINVAL;
+	}
+
+	pr_err("Mode = %s\n", rqsl_mode_names[test_mode]);
 
 	if (ncpus < 3)
 		return -ENOTSUPP;
 
 	raw_res_spin_lock_init(&lock_a);
 	raw_res_spin_lock_init(&lock_b);
+	raw_res_spin_lock_init(&lock_c);
 
 	rqsl_evts = kcalloc(ncpus - 1, sizeof(*rqsl_evts), GFP_KERNEL);
 	if (!rqsl_evts)
-- 
2.51.0



* Re: [PATCH bpf-next v1 0/2] Misc rqspinlock updates
  2025-10-29 18:18 [PATCH bpf-next v1 0/2] Misc rqspinlock updates Kumar Kartikeya Dwivedi
  2025-10-29 18:18 ` [PATCH bpf-next v1 1/2] rqspinlock: Disable queue destruction for deadlocks Kumar Kartikeya Dwivedi
  2025-10-29 18:18 ` [PATCH bpf-next v1 2/2] selftests/bpf: Add ABBCCA case for rqspinlock stress test Kumar Kartikeya Dwivedi
@ 2025-10-30  1:20 ` patchwork-bot+netdevbpf
  2 siblings, 0 replies; 4+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-10-30  1:20 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: bpf, ast, andrii, daniel, martin.lau, eddyz87, kkd, kernel-team

Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 29 Oct 2025 18:18:26 +0000 you wrote:
> A couple of changes for rqspinlock, the first disables propagation of AA
> and ABBA deadlocks to waiters succeeding the deadlocking waiter. A more
> verbose rationale is available in the commit log. The second commit
> expands the stress test to introduce a ABBCCA mode that will reliably
> exercise the timeout fallback.
> 
> Kumar Kartikeya Dwivedi (2):
>   rqspinlock: Disable queue destruction for deadlocks
>   selftests/bpf: Add ABBCCA case for rqspinlock stress test
> 
> [...]

Here is the summary with links:
  - [bpf-next,v1,1/2] rqspinlock: Disable queue destruction for deadlocks
    https://git.kernel.org/bpf/bpf-next/c/7bd6e5ce5be6
  - [bpf-next,v1,2/2] selftests/bpf: Add ABBCCA case for rqspinlock stress test
    https://git.kernel.org/bpf/bpf-next/c/a8a0abf09754

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html


