The Linux Kernel Mailing List
* [RFC][PATCH] sched_ext: Allow consuming local tasks when aborting
@ 2026-05-07 13:56 Christian Loehle
  2026-05-08 14:14 ` Andrea Righi
  2026-05-08 15:28 ` Tejun Heo
  0 siblings, 2 replies; 6+ messages in thread
From: Christian Loehle @ 2026-05-07 13:56 UTC (permalink / raw)
  To: sched-ext; +Cc: linux-kernel, tj, void, arighi, changwoo, Christian Loehle

When aborting, consume_dispatch_q() breaks out of the task iteration
loop entirely for non-bypass DSQs. This prevents CPUs from consuming
even their own tasks (where rq == task_rq) from any DSQ.

This causes a deadlock during CPU hotplug:

1. The BPF scheduler's cpu_offline callback calls scx_bpf_exit(),
   setting sch->aborting and queuing the disable_work on the helper
   kthread.

2. The helper kthread and other tasks are stuck on the global or
   user DSQs because bypass mode hasn't been entered yet.

3. No CPU can consume these tasks due to the aborting break, so the
   helper never runs scx_root_disable() -> scx_bypass().

4. The cpuhp thread is stuck in balance_hotplug_wait() because the
   dying CPU's rq never drains.

Tasks on user DSQs are equally affected: BPF schedulers can dispatch
RCU and other critical kthreads to user DSQs, causing RCU stalls when
those tasks become unconsumable.

The aborting check was added to prevent live-locks in the remote task
migration path (consume_remote_task() -> goto retry) and to avoid
holding dsq->lock for too long.

Change the break to skip only remote tasks via continue, allowing each
CPU to still consume tasks already on its own rq. This unblocks the
helper kthread, lets bypass mode activate, and allows both hotplug and
RCU grace periods to complete.

Fixes: 5ebec443fb96 ("sched_ext: Exit dispatch and move operations immediately when aborting")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
RFC:
I suspect this reintroduces the live-lock where a BPF scheduler has a
highly contended DSQ with a lot of tasks: the outer loop keeps holding
dsq->lock, so it still takes too long for bypass mode to activate. Is
there a better way?
I also couldn't trigger a lockup that way. Did I just not have the
right platform (e.g. 2x Intel 8480c)? Should we add a selftest for
this too, then?
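
For illustration only, the difference between the old break and the
proposed continue can be modeled in userspace. This is a minimal sketch,
not kernel code: struct toy_task, enum abort_policy and toy_consume() are
hypothetical names that stand in for the DSQ iteration, and the bypass-DSQ
exemption, locking and the remote migration path are deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task queued on a DSQ; not a kernel type. */
struct toy_task {
	int id;
	int cpu;	/* CPU whose rq the task currently sits on */
};

/* Hypothetical switch between the two behaviors being compared. */
enum abort_policy {
	ABORT_BREAK,		/* old: stop iterating as soon as aborting */
	ABORT_SKIP_REMOTE,	/* proposed: skip only remote tasks */
};

/*
 * Return the id of the first task that this_cpu may consume from the
 * queue, or -1 if nothing is consumable under the given policy.
 */
static int toy_consume(const struct toy_task *q, size_t n, int this_cpu,
		       bool aborting, enum abort_policy policy)
{
	for (size_t i = 0; i < n; i++) {
		bool local = q[i].cpu == this_cpu;

		/* Old behavior: give up on the whole queue. */
		if (aborting && policy == ABORT_BREAK)
			break;

		/* Proposed behavior: pass over remote tasks only. */
		if (aborting && policy == ABORT_SKIP_REMOTE && !local)
			continue;

		/* Local task, or not aborting: the task is consumable. */
		return q[i].id;
	}
	return -1;
}
```

With a queue holding tasks on CPUs 1, 0 and 1 (ids 10, 11, 12) and
this_cpu == 0 while aborting, ABORT_BREAK returns -1 whereas
ABORT_SKIP_REMOTE returns 11, the one task already on the local rq.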

 kernel/sched/ext.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 345aa11b84b2..3cce200708b0 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2463,10 +2463,13 @@ static bool consume_dispatch_q(struct scx_sched *sch, struct rq *rq,
 		 * a contended DSQ, or the outer retry loop can repeatedly race
 		 * against scx_bypass() dequeueing tasks from @dsq trying to put
 		 * the system into the bypass mode. This can easily live-lock the
-		 * machine. If aborting, exit from all non-bypass DSQs.
+		 * machine. If aborting, skip remote tasks from non-bypass DSQs
+		 * but still allow consuming local tasks to prevent deadlocks
+		 * during CPU hotplug where the dying CPU must drain its rq.
 		 */
-		if (unlikely(READ_ONCE(sch->aborting)) && dsq->id != SCX_DSQ_BYPASS)
-			break;
+		if (unlikely(READ_ONCE(sch->aborting)) && dsq->id != SCX_DSQ_BYPASS
+		    && rq != task_rq)
+			continue;
 
 		if (rq == task_rq) {
 			task_unlink_from_dsq(p, dsq);
-- 
2.34.1

