[PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK

public inbox for sched-ext@lists.linux.dev

* [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock
@ 2026-03-29  0:18 Tejun Heo
  2026-03-29  0:18 ` [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback Tejun Heo
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Tejun Heo @ 2026-03-29  0:18 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel,
	Tejun Heo

Hello,

SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() until the target CPU
reschedules. Because the irq_work runs in hardirq context, the waiting CPU
cannot reschedule, so its own kick_sync never advances. If multiple CPUs
form a wait cycle, they all deadlock. This was reported by Christian while
testing on arm64.

0001 fixes the deadlock by deferring the wait to a balance callback which
drops the rq lock and enables IRQs, allowing IPIs to be processed and
kick_sync to keep advancing during the wait.

0002 adds a selftest that creates a 3-CPU kick_wait cycle to reproduce the
issue.

Based on sched_ext/for-7.0-fixes (db08b1940f4b).

 0001-sched_ext-Fix-SCX_KICK_WAIT-deadlock-by-deferring-wa.patch
 0002-selftests-sched_ext-Add-cyclic-SCX_KICK_WAIT-stress-.patch

 kernel/sched/ext.c                                 |  95 +++++++---
 kernel/sched/sched.h                               |   3 +
 tools/testing/selftests/sched_ext/Makefile         |   1 +
 .../selftests/sched_ext/cyclic_kick_wait.bpf.c     |  68 ++++++++
 .../testing/selftests/sched_ext/cyclic_kick_wait.c | 194 +++++++++++++++++++++
 5 files changed, 336 insertions(+), 25 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
  2026-03-29  0:18 [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Tejun Heo
@ 2026-03-29  0:18 ` Tejun Heo
  2026-03-29 16:26   ` Andrea Righi
  2026-03-29  0:18 ` [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test Tejun Heo
  2026-03-30  8:52 ` [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Christian Loehle
  2 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2026-03-29  0:18 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel,
	Tejun Heo, stable

SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
its own kick_sync never advances. If multiple CPUs form a wait cycle, all
CPUs deadlock.

Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
force the CPU through do_pick_task_scx(), which queues a balance callback
to perform the wait. The balance callback drops the rq lock and enables
IRQs following the sched_core_balance() pattern, so the CPU can process
IPIs while waiting. The local CPU's kick_sync is advanced on entry to
do_pick_task_scx() and continuously during the wait, ensuring any CPU that
starts waiting for us sees the advancement and cannot form cyclic
dependencies.

Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Christian Loehle <christian.loehle@arm.com>
Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c   | 95 ++++++++++++++++++++++++++++++++------------
 kernel/sched/sched.h |  3 ++
 2 files changed, 73 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 26a6ac2f8826..d5bdcdb3f700 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2404,7 +2404,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 {
 	struct scx_sched *sch = scx_root;
 
-	/* see kick_cpus_irq_workfn() */
+	/* see kick_sync_wait_bal_cb() */
 	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
 	update_curr_scx(rq);
@@ -2447,6 +2447,48 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		switch_class(rq, next);
 }
 
+static void kick_sync_wait_bal_cb(struct rq *rq)
+{
+	struct scx_kick_syncs __rcu *ks = __this_cpu_read(scx_kick_syncs);
+	unsigned long *ksyncs = rcu_dereference_sched(ks)->syncs;
+	bool waited;
+	s32 cpu;
+
+	/*
+	 * Drop rq lock and enable IRQs while waiting. IRQs must be enabled
+	 * — a target CPU may be waiting for us to process an IPI (e.g. TLB
+	 * flush) while we wait for its kick_sync to advance.
+	 *
+	 * Also, keep advancing our own kick_sync so that new kick_sync waits
+	 * targeting us, which can start after we drop the lock, cannot form
+	 * cyclic dependencies.
+	 */
+retry:
+	waited = false;
+	for_each_cpu(cpu, rq->scx.cpus_to_sync) {
+		/*
+		 * smp_load_acquire() pairs with smp_store_release() on
+		 * kick_sync updates on the target CPUs.
+		 */
+		if (cpu == cpu_of(rq) ||
+		    smp_load_acquire(&cpu_rq(cpu)->scx.kick_sync) != ksyncs[cpu]) {
+			cpumask_clear_cpu(cpu, rq->scx.cpus_to_sync);
+			continue;
+		}
+
+		raw_spin_rq_unlock_irq(rq);
+		while (READ_ONCE(cpu_rq(cpu)->scx.kick_sync) == ksyncs[cpu]) {
+			smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
+			cpu_relax();
+		}
+		raw_spin_rq_lock_irq(rq);
+		waited = true;
+	}
+
+	if (waited)
+		goto retry;
+}
+
 static struct task_struct *first_local_task(struct rq *rq)
 {
 	return list_first_entry_or_null(&rq->scx.local_dsq.list,
@@ -2460,7 +2502,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
 	bool keep_prev;
 	struct task_struct *p;
 
-	/* see kick_cpus_irq_workfn() */
+	/* see kick_sync_wait_bal_cb() */
 	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
 
 	rq_modified_begin(rq, &ext_sched_class);
@@ -2470,6 +2512,17 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
 	rq_repin_lock(rq, rf);
 	maybe_queue_balance_callback(rq);
 
+	/*
+	 * Defer to a balance callback which can drop rq lock and enable
+	 * IRQs. Waiting directly in the pick path would deadlock against
+	 * CPUs sending us IPIs (e.g. TLB flushes) while we wait for them.
+	 */
+	if (unlikely(rq->scx.kick_sync_pending)) {
+		rq->scx.kick_sync_pending = false;
+		queue_balance_callback(rq, &rq->scx.kick_sync_bal_cb,
+				       kick_sync_wait_bal_cb);
+	}
+
 	/*
 	 * If any higher-priority sched class enqueued a runnable task on
 	 * this rq during balance_one(), abort and return RETRY_TASK, so
@@ -4713,6 +4766,9 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
 		if (!cpumask_empty(rq->scx.cpus_to_wait))
 			dump_line(&ns, "  cpus_to_wait   : %*pb",
 				  cpumask_pr_args(rq->scx.cpus_to_wait));
+		if (!cpumask_empty(rq->scx.cpus_to_sync))
+			dump_line(&ns, "  cpus_to_sync   : %*pb",
+				  cpumask_pr_args(rq->scx.cpus_to_sync));
 
 		used = seq_buf_used(&ns);
 		if (SCX_HAS_OP(sch, dump_cpu)) {
@@ -5610,11 +5666,11 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *ksyncs)
 
 		if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
 			if (cur_class == &ext_sched_class) {
+				cpumask_set_cpu(cpu, this_scx->cpus_to_sync);
 				ksyncs[cpu] = rq->scx.kick_sync;
 				should_wait = true;
-			} else {
-				cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
 			}
+			cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
 		}
 
 		resched_curr(rq);
@@ -5669,27 +5725,15 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
 		cpumask_clear_cpu(cpu, this_scx->cpus_to_kick_if_idle);
 	}
 
-	if (!should_wait)
-		return;
-
-	for_each_cpu(cpu, this_scx->cpus_to_wait) {
-		unsigned long *wait_kick_sync = &cpu_rq(cpu)->scx.kick_sync;
-
-		/*
-		 * Busy-wait until the task running at the time of kicking is no
-		 * longer running. This can be used to implement e.g. core
-		 * scheduling.
-		 *
-		 * smp_cond_load_acquire() pairs with store_releases in
-		 * pick_task_scx() and put_prev_task_scx(). The former breaks
-		 * the wait if SCX's scheduling path is entered even if the same
-		 * task is picked subsequently. The latter is necessary to break
-		 * the wait when $cpu is taken by a higher sched class.
-		 */
-		if (cpu != cpu_of(this_rq))
-			smp_cond_load_acquire(wait_kick_sync, VAL != ksyncs[cpu]);
-
-		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
+	/*
+	 * Can't wait in hardirq — kick_sync can't advance, deadlocking if
+	 * CPUs wait for each other. Defer to kick_sync_wait_bal_cb().
+	 */
+	if (should_wait) {
+		raw_spin_rq_lock(this_rq);
+		this_scx->kick_sync_pending = true;
+		resched_curr(this_rq);
+		raw_spin_rq_unlock(this_rq);
 	}
 }
 
@@ -5794,6 +5838,7 @@ void __init init_sched_ext_class(void)
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_kick_if_idle, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
 		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
+		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_sync, GFP_KERNEL, n));
 		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
 		rq->scx.kick_cpus_irq_work = IRQ_WORK_INIT_HARD(kick_cpus_irq_workfn);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 43bbf0693cca..1ef9ba480f51 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -805,9 +805,12 @@ struct scx_rq {
 	cpumask_var_t		cpus_to_kick_if_idle;
 	cpumask_var_t		cpus_to_preempt;
 	cpumask_var_t		cpus_to_wait;
+	cpumask_var_t		cpus_to_sync;
+	bool			kick_sync_pending;
 	unsigned long		kick_sync;
 	local_t			reenq_local_deferred;
 	struct balance_callback	deferred_bal_cb;
+	struct balance_callback	kick_sync_bal_cb;
 	struct irq_work		deferred_irq_work;
 	struct irq_work		kick_cpus_irq_work;
 	struct scx_dispatch_q	bypass_dsq;
-- 
2.53.0


* [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  2026-03-29  0:18 [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Tejun Heo
  2026-03-29  0:18 ` [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback Tejun Heo
@ 2026-03-29  0:18 ` Tejun Heo
  2026-03-29  9:06   ` Cheng-Yang Chou
  2026-03-30  8:51   ` Christian Loehle
  2026-03-30  8:52 ` [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Christian Loehle
  2 siblings, 2 replies; 9+ messages in thread
From: Tejun Heo @ 2026-03-29  0:18 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel,
	Tejun Heo

Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
enqueue while userspace workers generate continuous scheduling churn via
sched_yield(). Without the preceding fix, this hangs the machine within seconds.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 tools/testing/selftests/sched_ext/Makefile    |   1 +
 .../sched_ext/cyclic_kick_wait.bpf.c          |  68 ++++++
 .../selftests/sched_ext/cyclic_kick_wait.c    | 194 ++++++++++++++++++
 3 files changed, 263 insertions(+)
 create mode 100644 tools/testing/selftests/sched_ext/cyclic_kick_wait.bpf.c
 create mode 100644 tools/testing/selftests/sched_ext/cyclic_kick_wait.c

diff --git a/tools/testing/selftests/sched_ext/Makefile b/tools/testing/selftests/sched_ext/Makefile
index 006300ac6dff..1c9ca328cca1 100644
--- a/tools/testing/selftests/sched_ext/Makefile
+++ b/tools/testing/selftests/sched_ext/Makefile
@@ -188,6 +188,7 @@ auto-test-targets :=			\
 	rt_stall			\
 	test_example			\
 	total_bw			\
+	cyclic_kick_wait		\
 
 testcase-targets := $(addsuffix .o,$(addprefix $(SCXOBJ_DIR)/,$(auto-test-targets)))
 
diff --git a/tools/testing/selftests/sched_ext/cyclic_kick_wait.bpf.c b/tools/testing/selftests/sched_ext/cyclic_kick_wait.bpf.c
new file mode 100644
index 000000000000..cb34d3335917
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/cyclic_kick_wait.bpf.c
@@ -0,0 +1,68 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Stress concurrent SCX_KICK_WAIT calls to reproduce wait-cycle deadlock.
+ *
+ * Three CPUs are designated from userspace. Every enqueue from one of the
+ * three CPUs kicks the next CPU in the ring with SCX_KICK_WAIT, creating
+ * persistent A -> B -> C -> A wait-cycle pressure.
+ */
+#include <scx/common.bpf.h>
+
+char _license[] SEC("license") = "GPL";
+
+const volatile s32 test_cpu_a;
+const volatile s32 test_cpu_b;
+const volatile s32 test_cpu_c;
+
+u64 nr_enqueues;
+u64 nr_wait_kicks;
+
+UEI_DEFINE(uei);
+
+static s32 target_cpu(s32 cpu)
+{
+	if (cpu == test_cpu_a)
+		return test_cpu_b;
+	if (cpu == test_cpu_b)
+		return test_cpu_c;
+	if (cpu == test_cpu_c)
+		return test_cpu_a;
+	return -1;
+}
+
+void BPF_STRUCT_OPS(cyclic_kick_wait_enqueue, struct task_struct *p,
+		    u64 enq_flags)
+{
+	s32 this_cpu = bpf_get_smp_processor_id();
+	s32 tgt;
+
+	__sync_fetch_and_add(&nr_enqueues, 1);
+
+	if (p->flags & PF_KTHREAD) {
+		scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_INF,
+				   enq_flags | SCX_ENQ_PREEMPT);
+		return;
+	}
+
+	scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL, enq_flags);
+
+	tgt = target_cpu(this_cpu);
+	if (tgt < 0 || tgt == this_cpu)
+		return;
+
+	__sync_fetch_and_add(&nr_wait_kicks, 1);
+	scx_bpf_kick_cpu(tgt, SCX_KICK_WAIT);
+}
+
+void BPF_STRUCT_OPS(cyclic_kick_wait_exit, struct scx_exit_info *ei)
+{
+	UEI_RECORD(uei, ei);
+}
+
+SEC(".struct_ops.link")
+struct sched_ext_ops cyclic_kick_wait_ops = {
+	.enqueue		= cyclic_kick_wait_enqueue,
+	.exit			= cyclic_kick_wait_exit,
+	.name			= "cyclic_kick_wait",
+	.timeout_ms		= 1000U,
+};
diff --git a/tools/testing/selftests/sched_ext/cyclic_kick_wait.c b/tools/testing/selftests/sched_ext/cyclic_kick_wait.c
new file mode 100644
index 000000000000..c2e5aa9de715
--- /dev/null
+++ b/tools/testing/selftests/sched_ext/cyclic_kick_wait.c
@@ -0,0 +1,194 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Test SCX_KICK_WAIT forward progress under cyclic wait pressure.
+ *
+ * SCX_KICK_WAIT busy-waits until the target CPU enters the scheduling path.
+ * If multiple CPUs form a wait cycle (A waits for B, B waits for C, C waits
+ * for A), all CPUs deadlock unless the implementation breaks the cycle.
+ *
+ * This test creates that scenario: three CPUs are arranged in a ring. The BPF
+ * scheduler's ops.enqueue() kicks the next CPU in the ring with SCX_KICK_WAIT
+ * on every enqueue. Userspace pins 4 worker threads per CPU that loop calling
+ * sched_yield(), generating a steady stream of enqueues and thus sustained
+ * A->B->C->A kick_wait cycle pressure. The test passes if the system remains
+ * responsive for 5 seconds without the scheduler being killed by the watchdog.
+ */
+#define _GNU_SOURCE
+
+#include <bpf/bpf.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sched.h>
+#include <scx/common.h>
+#include <stdint.h>
+#include <string.h>
+#include <time.h>
+#include <unistd.h>
+
+#include "scx_test.h"
+#include "cyclic_kick_wait.bpf.skel.h"
+
+#define WORKERS_PER_CPU	4
+#define NR_TEST_CPUS	3
+#define NR_WORKERS	(NR_TEST_CPUS * WORKERS_PER_CPU)
+
+struct worker_ctx {
+	pthread_t tid;
+	int cpu;
+	volatile bool stop;
+	volatile __u64 iters;
+	bool started;
+};
+
+static void *worker_fn(void *arg)
+{
+	struct worker_ctx *worker = arg;
+	cpu_set_t mask;
+
+	CPU_ZERO(&mask);
+	CPU_SET(worker->cpu, &mask);
+
+	if (sched_setaffinity(0, sizeof(mask), &mask))
+		return (void *)(uintptr_t)errno;
+
+	while (!worker->stop) {
+		sched_yield();
+		worker->iters++;
+	}
+
+	return NULL;
+}
+
+static int join_worker(struct worker_ctx *worker)
+{
+	void *ret;
+	struct timespec ts;
+	int err;
+
+	if (!worker->started)
+		return 0;
+
+	if (clock_gettime(CLOCK_REALTIME, &ts))
+		return -errno;
+
+	ts.tv_sec += 2;
+	err = pthread_timedjoin_np(worker->tid, &ret, &ts);
+	if (err == ETIMEDOUT)
+		pthread_detach(worker->tid);
+	if (err)
+		return -err;
+
+	if ((uintptr_t)ret)
+		return -(int)(uintptr_t)ret;
+
+	return 0;
+}
+
+static enum scx_test_status setup(void **ctx)
+{
+	struct cyclic_kick_wait *skel;
+
+	skel = cyclic_kick_wait__open();
+	SCX_FAIL_IF(!skel, "Failed to open skel");
+	SCX_ENUM_INIT(skel);
+
+	*ctx = skel;
+	return SCX_TEST_PASS;
+}
+
+static enum scx_test_status run(void *ctx)
+{
+	struct cyclic_kick_wait *skel = ctx;
+	struct worker_ctx workers[NR_WORKERS] = {};
+	struct bpf_link *link = NULL;
+	enum scx_test_status status = SCX_TEST_PASS;
+	int test_cpus[NR_TEST_CPUS];
+	int nr_cpus = 0;
+	cpu_set_t mask;
+	int ret, i;
+
+	if (sched_getaffinity(0, sizeof(mask), &mask)) {
+		SCX_ERR("Failed to get affinity (%d)", errno);
+		return SCX_TEST_FAIL;
+	}
+
+	for (i = 0; i < CPU_SETSIZE; i++) {
+		if (CPU_ISSET(i, &mask))
+			test_cpus[nr_cpus++] = i;
+		if (nr_cpus == NR_TEST_CPUS)
+			break;
+	}
+
+	if (nr_cpus < NR_TEST_CPUS)
+		return SCX_TEST_SKIP;
+
+	skel->rodata->test_cpu_a = test_cpus[0];
+	skel->rodata->test_cpu_b = test_cpus[1];
+	skel->rodata->test_cpu_c = test_cpus[2];
+
+	if (cyclic_kick_wait__load(skel)) {
+		SCX_ERR("Failed to load skel");
+		return SCX_TEST_FAIL;
+	}
+
+	link = bpf_map__attach_struct_ops(skel->maps.cyclic_kick_wait_ops);
+	if (!link) {
+		SCX_ERR("Failed to attach scheduler");
+		return SCX_TEST_FAIL;
+	}
+
+	for (i = 0; i < NR_WORKERS; i++)
+		workers[i].cpu = test_cpus[i / WORKERS_PER_CPU];
+
+	for (i = 0; i < NR_WORKERS; i++) {
+		ret = pthread_create(&workers[i].tid, NULL, worker_fn, &workers[i]);
+		if (ret) {
+			SCX_ERR("Failed to create worker thread %d (%d)", i, ret);
+			status = SCX_TEST_FAIL;
+			goto out;
+		}
+		workers[i].started = true;
+	}
+
+	sleep(5);
+
+	if (skel->data->uei.kind != EXIT_KIND(SCX_EXIT_NONE)) {
+		SCX_ERR("Scheduler exited unexpectedly (kind=%llu code=%lld)",
+			(unsigned long long)skel->data->uei.kind,
+			(long long)skel->data->uei.exit_code);
+		status = SCX_TEST_FAIL;
+	}
+
+out:
+	for (i = 0; i < NR_WORKERS; i++)
+		workers[i].stop = true;
+
+	for (i = 0; i < NR_WORKERS; i++) {
+		ret = join_worker(&workers[i]);
+		if (ret && status == SCX_TEST_PASS) {
+			SCX_ERR("Failed to join worker thread %d (%d)", i, ret);
+			status = SCX_TEST_FAIL;
+		}
+	}
+
+	if (link)
+		bpf_link__destroy(link);
+
+	return status;
+}
+
+static void cleanup(void *ctx)
+{
+	struct cyclic_kick_wait *skel = ctx;
+
+	cyclic_kick_wait__destroy(skel);
+}
+
+struct scx_test cyclic_kick_wait = {
+	.name = "cyclic_kick_wait",
+	.description = "Verify SCX_KICK_WAIT forward progress under a 3-CPU wait cycle",
+	.setup = setup,
+	.run = run,
+	.cleanup = cleanup,
+};
+REGISTER_SCX_TEST(&cyclic_kick_wait)
-- 
2.53.0


* Re: [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  2026-03-29  0:18 ` [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test Tejun Heo
@ 2026-03-29  9:06   ` Cheng-Yang Chou
  2026-03-29 15:52     ` Andrea Righi
  2026-03-30  8:51   ` Christian Loehle
  1 sibling, 1 reply; 9+ messages in thread
From: Cheng-Yang Chou @ 2026-03-29  9:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Vernet, Andrea Righi, Changwoo Min, Christian Loehle,
	Emil Tsalapatis, sched-ext, linux-kernel, Ching-Chun Huang,
	Chia-Ping Tsai

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

Hi Tejun,

On Sat, Mar 28, 2026 at 02:18:56PM -1000, Tejun Heo wrote:
> Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
> scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
> enqueue while userspace workers generate continuous scheduling churn via
> sched_yield(). Without the preceding fix, this hangs the machine within seconds.

I think it would be better to skip this test on older, unpatched kernels.
Sometimes I use my local kernel for testing before the stable patches are
fully integrated, so skipping the test would be a safer approach.

Otherwise, this test can stall the machine and make it impossible to
exit the test runner.

Log:

$ sudo ./runner -t cyclic_kick_wait # on v6.14
===== START =====
TEST: cyclic_kick_wait
DESCRIPTION: Verify SCX_KICK_WAIT forward progress under a 3-CPU wait cycle
OUTPUT:
libbpf: struct_ops cyclic_kick_wait_ops: member cgroup_set_bandwidth not found in kernel, skipping it as it's set to zero
libbpf: struct_ops cyclic_kick_wait_ops: member cgroup_set_idle not found in kernel, skipping it as it's set to zero
libbpf: struct_ops cyclic_kick_wait_ops: member priv not found in kernel, skipping it as it's set to zero
ERR: cyclic_kick_wait.c:169
Failed to join worker thread 0 (-110)

Thanks,
Cheng-Yang


[-- Attachment #2: fix.patch --]
[-- Type: text/x-diff, Size: 1648 bytes --]

From 6f09d2298e547aa7a519d4bff262fc6851a969ea Mon Sep 17 00:00:00 2001
From: Cheng-Yang Chou <yphbchou0911@gmail.com>
Date: Sun, 29 Mar 2026 16:07:05 +0800
Subject: [PATCH] selftests/sched_ext: Skip cyclic_kick_wait on kernels without
 deadlock fix

The cyclic_kick_wait test triggers a deadlock on kernels lacking the
SCX_KICK_WAIT fix, causing the machine to hang or worker threads to
time out (-110).

Use __COMPAT_struct_has_field() to probe vmlinux BTF for
scx_rq.kick_sync_pending, a field introduced by the SCX_KICK_WAIT
deadlock fix. Skip the test on older kernels that lack the fix rather
than hanging the machine.

Example failure on unpatched kernels:
  ERR: cyclic_kick_wait.c:169
  Failed to join worker thread 0 (-110)

Fixes: e9b990b76922 ("selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test")
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
---
 tools/testing/selftests/sched_ext/cyclic_kick_wait.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/sched_ext/cyclic_kick_wait.c b/tools/testing/selftests/sched_ext/cyclic_kick_wait.c
index c2e5aa9de715..b060ffe63ba3 100644
--- a/tools/testing/selftests/sched_ext/cyclic_kick_wait.c
+++ b/tools/testing/selftests/sched_ext/cyclic_kick_wait.c
@@ -88,6 +88,11 @@ static enum scx_test_status setup(void **ctx)
 {
 	struct cyclic_kick_wait *skel;
 
+	if (!__COMPAT_struct_has_field("scx_rq", "kick_sync_pending")) {
+		fprintf(stderr, "Skipping test: kernel lacks SCX_KICK_WAIT deadlock fix\n");
+		return SCX_TEST_SKIP;
+	}
+
 	skel = cyclic_kick_wait__open();
 	SCX_FAIL_IF(!skel, "Failed to open skel");
 	SCX_ENUM_INIT(skel);
-- 
2.43.0


* Re: [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  2026-03-29  9:06   ` Cheng-Yang Chou
@ 2026-03-29 15:52     ` Andrea Righi
  2026-03-30  4:40       ` Cheng-Yang Chou
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Righi @ 2026-03-29 15:52 UTC (permalink / raw)
  To: Cheng-Yang Chou
  Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle,
	Emil Tsalapatis, sched-ext, linux-kernel, Ching-Chun Huang,
	Chia-Ping Tsai

Hi Cheng-Yang,

On Sun, Mar 29, 2026 at 05:06:20PM +0800, Cheng-Yang Chou wrote:
> Hi Tejun,
> 
> On Sat, Mar 28, 2026 at 02:18:56PM -1000, Tejun Heo wrote:
> > Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
> > scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
> > enqueue while userspace workers generate continuous scheduling churn via
> > sched_yield(). Without the preceding fix, this hangs the machine within seconds.
> 
> I think it would be better to skip this test on older, unpatched kernels.
> Sometimes I use my local kernel for testing before the stable patches are
> fully integrated, so skipping the test would be a safer approach.
> 
> Otherwise, this test can stall the machine and make it impossible to
> exit the test runner.

Actually, I disagree. The whole point of the kselftests is to expose kernel
issues, including bugs in older kernels that may need fixes. So, I think we
shouldn't skip the test.

-Andrea


* Re: [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
  2026-03-29  0:18 ` [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback Tejun Heo
@ 2026-03-29 16:26   ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2026-03-29 16:26 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis,
	sched-ext, linux-kernel, stable

Hi Tejun,

On Sat, Mar 28, 2026 at 02:18:55PM -1000, Tejun Heo wrote:
> SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
> smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
> the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
> its own kick_sync never advances. If multiple CPUs form a wait cycle, all
> CPUs deadlock.
> 
> Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
> force the CPU through do_pick_task_scx(), which queues a balance callback
> to perform the wait. The balance callback drops the rq lock and enables
> IRQs following the sched_core_balance() pattern, so the CPU can process
> IPIs while waiting. The local CPU's kick_sync is advanced on entry to
> do_pick_task_scx() and continuously during the wait, ensuring any CPU that
> starts waiting for us sees the advancement and cannot form cyclic
> dependencies.
> 
> Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT")
> Cc: stable@vger.kernel.org # v6.12+
> Reported-by: Christian Loehle <christian.loehle@arm.com>
> Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext.c   | 95 ++++++++++++++++++++++++++++++++------------
>  kernel/sched/sched.h |  3 ++
>  2 files changed, 73 insertions(+), 25 deletions(-)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 26a6ac2f8826..d5bdcdb3f700 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2404,7 +2404,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  {
>  	struct scx_sched *sch = scx_root;
>  
> -	/* see kick_cpus_irq_workfn() */
> +	/* see kick_sync_wait_bal_cb() */
>  	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
>  
>  	update_curr_scx(rq);
> @@ -2447,6 +2447,48 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		switch_class(rq, next);
>  }
>  
> +static void kick_sync_wait_bal_cb(struct rq *rq)
> +{
> +	struct scx_kick_syncs __rcu *ks = __this_cpu_read(scx_kick_syncs);
> +	unsigned long *ksyncs = rcu_dereference_sched(ks)->syncs;
> +	bool waited;
> +	s32 cpu;
> +
> +	/*
> +	 * Drop rq lock and enable IRQs while waiting. IRQs must be enabled
> +	 * — a target CPU may be waiting for us to process an IPI (e.g. TLB

nit: s/—/-/

> +	 * flush) while we wait for its kick_sync to advance.
> +	 *
> +	 * Also, keep advancing our own kick_sync so that new kick_sync waits
> +	 * targeting us, which can start after we drop the lock, cannot form
> +	 * cyclic dependencies.
> +	 */
> +retry:
> +	waited = false;
> +	for_each_cpu(cpu, rq->scx.cpus_to_sync) {
> +		/*
> +		 * smp_load_acquire() pairs with smp_store_release() on
> +		 * kick_sync updates on the target CPUs.
> +		 */
> +		if (cpu == cpu_of(rq) ||
> +		    smp_load_acquire(&cpu_rq(cpu)->scx.kick_sync) != ksyncs[cpu]) {
> +			cpumask_clear_cpu(cpu, rq->scx.cpus_to_sync);
> +			continue;
> +		}

Should we add something like:

		if (cpu != cpu_of(rq) && !cpu_online(cpu)) {
			cpumask_clear_cpu(cpu, rq->scx.cpus_to_sync);
			continue;
		}

> +
> +		raw_spin_rq_unlock_irq(rq);
> +		while (READ_ONCE(cpu_rq(cpu)->scx.kick_sync) == ksyncs[cpu]) {

And here:
			if (cpu != cpu_of(rq) && !cpu_online(cpu))
				break;

(see below)

> +			smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
> +			cpu_relax();
> +		}
> +		raw_spin_rq_lock_irq(rq);
> +		waited = true;
> +	}
> +
> +	if (waited)
> +		goto retry;
> +}
> +
>  static struct task_struct *first_local_task(struct rq *rq)
>  {
>  	return list_first_entry_or_null(&rq->scx.local_dsq.list,
> @@ -2460,7 +2502,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
>  	bool keep_prev;
>  	struct task_struct *p;
>  
> -	/* see kick_cpus_irq_workfn() */
> +	/* see kick_sync_wait_bal_cb() */
>  	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
>  
>  	rq_modified_begin(rq, &ext_sched_class);
> @@ -2470,6 +2512,17 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
>  	rq_repin_lock(rq, rf);
>  	maybe_queue_balance_callback(rq);
>  
> +	/*
> +	 * Defer to a balance callback which can drop rq lock and enable
> +	 * IRQs. Waiting directly in the pick path would deadlock against
> +	 * CPUs sending us IPIs (e.g. TLB flushes) while we wait for them.
> +	 */
> +	if (unlikely(rq->scx.kick_sync_pending)) {
> +		rq->scx.kick_sync_pending = false;
> +		queue_balance_callback(rq, &rq->scx.kick_sync_bal_cb,
> +				       kick_sync_wait_bal_cb);

queue_balance_callback() is a no-op if the rq is in balance_push mode, so
the callback is never queued there. I guess it's ok to just clear
kick_sync_pending in that case if we add the checks above.

> +	}
> +
>  	/*
>  	 * If any higher-priority sched class enqueued a runnable task on
>  	 * this rq during balance_one(), abort and return RETRY_TASK, so
> @@ -4713,6 +4766,9 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
>  		if (!cpumask_empty(rq->scx.cpus_to_wait))
>  			dump_line(&ns, "  cpus_to_wait   : %*pb",
>  				  cpumask_pr_args(rq->scx.cpus_to_wait));
> +		if (!cpumask_empty(rq->scx.cpus_to_sync))
> +			dump_line(&ns, "  cpus_to_sync   : %*pb",
> +				  cpumask_pr_args(rq->scx.cpus_to_sync));
>  
>  		used = seq_buf_used(&ns);
>  		if (SCX_HAS_OP(sch, dump_cpu)) {
> @@ -5610,11 +5666,11 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *ksyncs)
>  
>  		if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
>  			if (cur_class == &ext_sched_class) {
> +				cpumask_set_cpu(cpu, this_scx->cpus_to_sync);
>  				ksyncs[cpu] = rq->scx.kick_sync;
>  				should_wait = true;
> -			} else {
> -				cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
>  			}
> +			cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
>  		}
>  
>  		resched_curr(rq);
> @@ -5669,27 +5725,15 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
>  		cpumask_clear_cpu(cpu, this_scx->cpus_to_kick_if_idle);
>  	}
>  
> -	if (!should_wait)
> -		return;
> -
> -	for_each_cpu(cpu, this_scx->cpus_to_wait) {
> -		unsigned long *wait_kick_sync = &cpu_rq(cpu)->scx.kick_sync;
> -
> -		/*
> -		 * Busy-wait until the task running at the time of kicking is no
> -		 * longer running. This can be used to implement e.g. core
> -		 * scheduling.
> -		 *
> -		 * smp_cond_load_acquire() pairs with store_releases in
> -		 * pick_task_scx() and put_prev_task_scx(). The former breaks
> -		 * the wait if SCX's scheduling path is entered even if the same
> -		 * task is picked subsequently. The latter is necessary to break
> -		 * the wait when $cpu is taken by a higher sched class.
> -		 */
> -		if (cpu != cpu_of(this_rq))
> -			smp_cond_load_acquire(wait_kick_sync, VAL != ksyncs[cpu]);
> -
> -		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
> +	/*
> +	 * Can't wait in hardirq — kick_sync can't advance, deadlocking if
> +	 * CPUs wait for each other. Defer to kick_sync_wait_bal_cb().
> +	 */
> +	if (should_wait) {
> +		raw_spin_rq_lock(this_rq);
> +		this_scx->kick_sync_pending = true;
> +		resched_curr(this_rq);
> +		raw_spin_rq_unlock(this_rq);
>  	}
>  }
>  
> @@ -5794,6 +5838,7 @@ void __init init_sched_ext_class(void)
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_kick_if_idle, GFP_KERNEL, n));
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
> +		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_sync, GFP_KERNEL, n));
>  		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
>  		rq->scx.kick_cpus_irq_work = IRQ_WORK_INIT_HARD(kick_cpus_irq_workfn);
>  
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 43bbf0693cca..1ef9ba480f51 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -805,9 +805,12 @@ struct scx_rq {
>  	cpumask_var_t		cpus_to_kick_if_idle;
>  	cpumask_var_t		cpus_to_preempt;
>  	cpumask_var_t		cpus_to_wait;
> +	cpumask_var_t		cpus_to_sync;
> +	bool			kick_sync_pending;
>  	unsigned long		kick_sync;
>  	local_t			reenq_local_deferred;
>  	struct balance_callback	deferred_bal_cb;
> +	struct balance_callback	kick_sync_bal_cb;
>  	struct irq_work		deferred_irq_work;
>  	struct irq_work		kick_cpus_irq_work;
>  	struct scx_dispatch_q	bypass_dsq;
> -- 
> 2.53.0
> 

Thanks,
-Andrea


* Re: [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  2026-03-29 15:52     ` Andrea Righi
@ 2026-03-30  4:40       ` Cheng-Yang Chou
  0 siblings, 0 replies; 9+ messages in thread
From: Cheng-Yang Chou @ 2026-03-30  4:40 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle,
	Emil Tsalapatis, sched-ext, linux-kernel, Ching-Chun Huang,
	Chia-Ping Tsai

Hi Andrea,

On Sun, Mar 29, 2026 at 05:52:12PM +0200, Andrea Righi wrote:
> Hi Cheng-Yang,
> 
> On Sun, Mar 29, 2026 at 05:06:20PM +0800, Cheng-Yang Chou wrote:
> > Hi Tejun,
> > 
> > On Sat, Mar 28, 2026 at 02:18:56PM -1000, Tejun Heo wrote:
> > > Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
> > > scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
> > > enqueue while userspace workers generate continuous scheduling churn via
> > > sched_yield(). Without the preceding fix, this hangs the machine within seconds.
> > 
> > I think it would be better to skip this test on older, unpatched kernels.
> > Sometimes I use my local kernel for testing before the stable patches are
> > fully integrated, so skipping the test would be a safer approach.
> > 
> > Otherwise, this test can stall the machine and make it impossible to
> > exit the test runner.
> 
> Actually, I disagree. The whole point of the kselftests is to expose kernel
> issues, including bugs in older kernels that may need fixes. So, I think we
> shouldn't skip the test.

I see your point. While the system stall is a bit painful during local
machine testing on unpatched kernels, I agree that kselftests should
remain an effective gatekeeper for kernel issues.

Or maybe the test could return SCX_TEST_FAIL instead?

Without this patch the test can't pass anyway, so failing gracefully might
be a better middle ground than stalling the entire machine.

-- 
Thanks,
Cheng-Yang
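
For illustration only, the "fail gracefully" idea above could be sketched
as a userspace timeout around the test body. This is not code from the
patch or from the selftest framework: run_with_timeout(), body_ok(),
body_hang() and the SIM_TEST_* values are hypothetical stand-ins. The
thread reports that an unpatched kernel hangs the machine, so it is an open
question whether a userspace timeout like this would get to run at all in
that situation.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-ins for the selftest status codes. */
enum sim_status { SIM_TEST_PASS, SIM_TEST_FAIL };

/*
 * Run body() in a child process and poll for its exit every 10ms.
 * If the child has not exited after roughly timeout_ms, kill it and
 * report failure instead of waiting forever.
 */
static enum sim_status run_with_timeout(int (*body)(void), int timeout_ms)
{
	const struct timespec tick = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int status, waited_ms;
	pid_t pid = fork();

	if (pid < 0)
		return SIM_TEST_FAIL;
	if (pid == 0)
		_exit(body() ? 1 : 0);

	for (waited_ms = 0; waited_ms < timeout_ms; waited_ms += 10) {
		pid_t ret = waitpid(pid, &status, WNOHANG);

		if (ret == pid)
			return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ?
				SIM_TEST_PASS : SIM_TEST_FAIL;
		if (ret < 0)
			return SIM_TEST_FAIL;
		nanosleep(&tick, NULL);
	}

	/* Timed out: reap the child so it does not linger. */
	kill(pid, SIGKILL);
	waitpid(pid, &status, 0);
	return SIM_TEST_FAIL;
}

/* Example bodies: one that finishes, one that never returns. */
static int body_ok(void)
{
	return 0;
}

static int body_hang(void)
{
	for (;;)
		pause();
}
```

Calling run_with_timeout(body_hang, 100) returns SIM_TEST_FAIL after about
100ms, while run_with_timeout(body_ok, 1000) returns SIM_TEST_PASS as soon
as the child exits.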


* Re: [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
  2026-03-29  0:18 ` [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test Tejun Heo
  2026-03-29  9:06   ` Cheng-Yang Chou
@ 2026-03-30  8:51   ` Christian Loehle
  1 sibling, 0 replies; 9+ messages in thread
From: Christian Loehle @ 2026-03-30  8:51 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Emil Tsalapatis, sched-ext, linux-kernel

On 3/29/26 00:18, Tejun Heo wrote:
> Add a test that creates a 3-CPU kick_wait cycle (A->B->C->A). A BPF
> scheduler kicks the next CPU in the ring with SCX_KICK_WAIT on every
> enqueue while userspace workers generate continuous scheduling churn via
> sched_yield(). Without the preceding fix, this hangs the machine within seconds.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christian Loehle <christian.loehle@arm.com>
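
As a rough picture of the userspace side described in the commit message
above, workers that generate scheduling churn via sched_yield() might look
like the sketch below. It is not taken from cyclic_kick_wait.c: the names
(churn_worker, run_churn, NR_WORKERS) are hypothetical, the loop is bounded
so the sketch terminates, and no CPU pinning is attempted.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>

#define NR_WORKERS 3	/* hypothetical: one worker per CPU in the ring */

struct churn_arg {
	unsigned long nr_yields;	/* how many times to yield */
	unsigned long done;		/* yields actually performed */
};

/* Each worker repeatedly gives up the CPU to force rescheduling. */
static void *churn_worker(void *data)
{
	struct churn_arg *arg = data;
	unsigned long i;

	for (i = 0; i < arg->nr_yields; i++) {
		sched_yield();
		arg->done++;
	}
	return NULL;
}

/*
 * Start NR_WORKERS yielding threads, wait for them, and return the
 * total number of yields performed by the threads that were started.
 */
static unsigned long run_churn(unsigned long nr_yields)
{
	pthread_t threads[NR_WORKERS];
	struct churn_arg args[NR_WORKERS];
	unsigned long total = 0;
	int i, started = 0;

	for (i = 0; i < NR_WORKERS; i++) {
		args[i].nr_yields = nr_yields;
		args[i].done = 0;
		if (pthread_create(&threads[i], NULL, churn_worker, &args[i]))
			break;
		started++;
	}

	for (i = 0; i < started; i++) {
		pthread_join(threads[i], NULL);
		total += args[i].done;
	}
	return total;
}
```

Each sched_yield() call sends the calling thread back through the
scheduler, which under the test's BPF scheduler would mean another enqueue
and therefore another SCX_KICK_WAIT kick.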


* Re: [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock
  2026-03-29  0:18 [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Tejun Heo
  2026-03-29  0:18 ` [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback Tejun Heo
  2026-03-29  0:18 ` [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test Tejun Heo
@ 2026-03-30  8:52 ` Christian Loehle
  2 siblings, 0 replies; 9+ messages in thread
From: Christian Loehle @ 2026-03-30  8:52 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min
  Cc: Emil Tsalapatis, sched-ext, linux-kernel

On 3/29/26 00:18, Tejun Heo wrote:
> Hello,
> 
> SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() until the target CPU
> reschedules. Because the irq_work runs in hardirq context, the waiting
> CPU's kick_sync never advances, and if multiple CPUs form a wait cycle, all
> deadlock. This was reported by Christian while testing on arm64.
> 
> 0001 fixes the deadlock by deferring the wait to a balance callback which
> drops the rq lock and enables IRQs, allowing IPIs to be processed and
> kick_sync to keep advancing during the wait.
> 
> 0002 adds a selftest that creates a 3-CPU kick_wait cycle to reproduce the
> issue.
> 
> Based on sched_ext/for-7.0-fixes (db08b1940f4b).
> 
>  0001-sched_ext-Fix-SCX_KICK_WAIT-deadlock-by-deferring-wa.patch
>  0002-selftests-sched_ext-Add-cyclic-SCX_KICK_WAIT-stress-.patch
> 
>  kernel/sched/ext.c                                 |  95 +++++++---
>  kernel/sched/sched.h                               |   3 +
>  tools/testing/selftests/sched_ext/Makefile         |   1 +
>  .../selftests/sched_ext/cyclic_kick_wait.bpf.c     |  68 ++++++++
>  .../testing/selftests/sched_ext/cyclic_kick_wait.c | 194 +++++++++++++++++++++
>  5 files changed, 336 insertions(+), 25 deletions(-)
> 
> --
> tejun

For both:
Tested-by: Christian Loehle <christian.loehle@arm.com>

Thanks for fixing this!
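
The wait cycle described in the cover letter quoted above can be modeled
with a small single-threaded toy. This is only a sketch under simplifying
assumptions, not kernel code: sim_ring_completes() and NR_SIM_CPUS are
hypothetical names, each simulated CPU waits on the next one in a ring, and
a per-CPU counter stands in for rq->scx.kick_sync. With bump_while_waiting
set to false the waiters never advance their own counter, as in the hardirq
busy-wait. With it set to true they keep advancing it during the wait, as
the deferred wait loop in patch 1 does.

```c
#include <stdbool.h>

#define NR_SIM_CPUS 3	/* hypothetical: ring A->B->C->A */

/*
 * Step a ring of simulated CPUs in which CPU i waits for the counter
 * of CPU (i + 1) % NR_SIM_CPUS to move past a snapshot.
 *
 * A CPU that is not waiting is assumed to keep scheduling and thus
 * keeps bumping its counter. A waiting CPU bumps its counter only if
 * bump_while_waiting is true.
 *
 * Returns true if all CPUs left their wait within max_steps rounds.
 */
static bool sim_ring_completes(bool bump_while_waiting, int max_steps)
{
	unsigned long kick_sync[NR_SIM_CPUS] = { 0 };
	unsigned long snap[NR_SIM_CPUS];
	bool waiting[NR_SIM_CPUS];
	int cpu, step, nr_waiting = NR_SIM_CPUS;

	/* Every CPU snapshots its target's counter, then starts waiting. */
	for (cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
		snap[cpu] = kick_sync[(cpu + 1) % NR_SIM_CPUS];
		waiting[cpu] = true;
	}

	for (step = 0; step < max_steps && nr_waiting; step++) {
		for (cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
			int target = (cpu + 1) % NR_SIM_CPUS;

			/* The wait ends once the target's counter has moved. */
			if (waiting[cpu] && kick_sync[target] != snap[cpu]) {
				waiting[cpu] = false;
				nr_waiting--;
			}

			if (!waiting[cpu] || bump_while_waiting)
				kick_sync[cpu]++;
		}
	}

	return nr_waiting == 0;
}
```

In this model sim_ring_completes(false, 100) returns false because no
counter ever moves, while sim_ring_completes(true, 100) returns true after
a couple of rounds.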


end of thread, other threads:[~2026-03-30  8:52 UTC | newest]

Thread overview: 9+ messages
2026-03-29  0:18 [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Tejun Heo
2026-03-29  0:18 ` [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback Tejun Heo
2026-03-29 16:26   ` Andrea Righi
2026-03-29  0:18 ` [PATCH 2/2] selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test Tejun Heo
2026-03-29  9:06   ` Cheng-Yang Chou
2026-03-29 15:52     ` Andrea Righi
2026-03-30  4:40       ` Cheng-Yang Chou
2026-03-30  8:51   ` Christian Loehle
2026-03-30  8:52 ` [PATCHSET sched_ext/for-7.0-fixes] sched_ext: Fix SCX_KICK_WAIT deadlock Christian Loehle
