public inbox for linux-kernel@vger.kernel.org
* [PATCH] sched_ext: Fix stale direct dispatch state in ddsp_dsq_id
@ 2026-04-01 21:56 Andrea Righi
  2026-04-01 22:46 ` Tejun Heo
From: Andrea Righi @ 2026-04-01 21:56 UTC
  To: Tejun Heo, David Vernet, Changwoo Min
  Cc: Daniel Hodges, sched-ext, linux-kernel

@p->scx.ddsp_dsq_id can be left set (non-SCX_DSQ_INVALID) in three
scenarios, causing a spurious WARN_ON_ONCE() in mark_direct_dispatch()
when the next wakeup's ops.select_cpu() calls scx_bpf_dsq_insert():

1. Deferred dispatch cancellation: when a task is directly dispatched to
   a remote CPU's local DSQ via ops.select_cpu() or ops.enqueue(), the
   dispatch is deferred (since we can't lock the remote rq while holding
   the current one). If the task is dequeued before
   process_ddsp_deferred_locals() processes the dispatch,
   dispatch_dequeue() unlinks the task from the deferred list but leaves
   ddsp_dsq_id and ddsp_enq_flags set.

   Fix: clear ddsp_dsq_id and ddsp_enq_flags in the !list_empty branch
   of dispatch_dequeue().

2. Holding-cpu dispatch race: when dispatch_to_local_dsq() transfers a
   task to another CPU's local DSQ, it sets holding_cpu and releases
   DISPATCHING before locking the source rq. If dequeue wins the race
   and clears holding_cpu, dispatch_enqueue() is never called and
   ddsp_dsq_id is not cleared.

   Fix: clear ddsp_dsq_id and ddsp_enq_flags when clearing holding_cpu
   in dispatch_dequeue().

3. Cross-scheduler-instance stale state: when an SCX scheduler exits,
   scx_bypass() iterates over all runnable tasks to dequeue/re-enqueue
   them, but sleeping tasks are not on any runqueue and are not touched.
   If a sleeping task had a deferred dispatch in flight (ddsp_dsq_id
   set) at the time the scheduler exited, the state persists. When a new
   scheduler instance loads and calls scx_enable_task() for all tasks,
   it does not reset this leftover state. The next wakeup's
   ops.select_cpu() then sees a non-INVALID ddsp_dsq_id and triggers:

     WARN_ON_ONCE(p->scx.ddsp_dsq_id != SCX_DSQ_INVALID)

   Fix: clear ddsp_dsq_id and ddsp_enq_flags in scx_enable_task() before
   calling ops.enable(), ensuring each new scheduler instance starts
   with a clean direct dispatch state per task.
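
The three scenarios above can be sketched as a small userspace model.
This is only an illustration under simplifying assumptions, not kernel
code: every model_* name and MODEL_* constant is hypothetical, locking
and the real list handling are omitted, and the with_fix flag stands in
for "this patch applied".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace model only. All names below are hypothetical stand-ins for
 * the kernel fields and functions named in the description above.
 */
#define MODEL_DSQ_INVALID	UINT64_MAX
#define MODEL_DSQ_REMOTE	42ULL	/* arbitrary valid-looking DSQ id */

struct model_task {
	uint64_t ddsp_dsq_id;
	uint64_t ddsp_enq_flags;
	bool on_deferred_list;	/* models !list_empty(&p->scx.dsq_list.node) */
	int holding_cpu;
};

static void model_clear_ddsp(struct model_task *p)
{
	p->ddsp_dsq_id = MODEL_DSQ_INVALID;
	p->ddsp_enq_flags = 0;
}

/* Returns true where the kernel would hit the WARN_ON_ONCE(). */
static bool model_mark_direct_dispatch(struct model_task *p, uint64_t dsq_id)
{
	bool stale = p->ddsp_dsq_id != MODEL_DSQ_INVALID;

	p->ddsp_dsq_id = dsq_id;
	return stale;
}

static void model_dispatch_dequeue(struct model_task *p, bool with_fix)
{
	if (p->on_deferred_list) {
		p->on_deferred_list = false;	/* models list_del_init() */
		if (with_fix)
			model_clear_ddsp(p);
	}
	if (p->holding_cpu >= 0) {
		p->holding_cpu = -1;
		if (with_fix)
			model_clear_ddsp(p);
	}
}

static void model_enable_task(struct model_task *p, bool with_fix)
{
	if (with_fix)
		model_clear_ddsp(p);
}

/* Runs scenario 1, 2 or 3; returns true if the next wakeup would warn. */
static bool model_scenario(int nr, bool with_fix)
{
	struct model_task p = { .holding_cpu = -1 };

	model_clear_ddsp(&p);
	model_mark_direct_dispatch(&p, MODEL_DSQ_REMOTE);

	switch (nr) {
	case 1:	/* dequeued while still on the deferred list */
		p.on_deferred_list = true;
		model_dispatch_dequeue(&p, with_fix);
		break;
	case 2:	/* dequeue wins the holding_cpu race */
		p.holding_cpu = 1;
		model_dispatch_dequeue(&p, with_fix);
		break;
	case 3:	/* task sleeps across a scheduler reload */
		model_enable_task(&p, with_fix);
		break;
	}
	return model_mark_direct_dispatch(&p, MODEL_DSQ_REMOTE);
}

static void model_check(void)
{
	for (int nr = 1; nr <= 3; nr++) {
		assert(model_scenario(nr, false));	/* stale state: would warn */
		assert(!model_scenario(nr, true));	/* cleared: no warning */
	}
}
```

Calling model_check() from a test harness exercises each scenario with
and without the clearing step. In the model, as in the description, the
warning condition depends only on whether ddsp_dsq_id was reset on the
cancellation or enable path.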

With all three fixes applied, the WARN_ON_ONCE() in
mark_direct_dispatch() no longer appears to trigger.

Fixes: 5b26f7b920f76 ("sched_ext: Allow SCX_DSQ_LOCAL_ON for direct dispatches")
Cc: stable@vger.kernel.org # v6.12+
Cc: Daniel Hodges <hodgesd@meta.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 26a6ac2f88267..de827ce0ffb74 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -1163,20 +1163,32 @@ static void dispatch_dequeue(struct rq *rq, struct task_struct *p)
 	if (!dsq) {
 		/*
 		 * If !dsq && on-list, @p is on @rq's ddsp_deferred_locals.
-		 * Unlinking is all that's needed to cancel.
+		 * Unlink and clear the deferred dispatch state.
 		 */
-		if (unlikely(!list_empty(&p->scx.dsq_list.node)))
+		if (unlikely(!list_empty(&p->scx.dsq_list.node))) {
 			list_del_init(&p->scx.dsq_list.node);
 
+			p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
+			p->scx.ddsp_enq_flags = 0;
+		}
+
 		/*
 		 * When dispatching directly from the BPF scheduler to a local
 		 * DSQ, the task isn't associated with any DSQ but
 		 * @p->scx.holding_cpu may be set under the protection of
-		 * %SCX_OPSS_DISPATCHING.
+		 * %SCX_OPSS_DISPATCHING. If we win the race and clear
+		 * holding_cpu before dispatch_to_local_dsq() completes, the
+		 * in-flight dispatch is cancelled and dispatch_enqueue() won't
+		 * be called. Clear the stale direct dispatch state here so the
+		 * next wakeup starts clean.
 		 */
-		if (p->scx.holding_cpu >= 0)
+		if (p->scx.holding_cpu >= 0) {
 			p->scx.holding_cpu = -1;
 
+			p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
+			p->scx.ddsp_enq_flags = 0;
+		}
+
 		return;
 	}
 
@@ -2945,6 +2957,9 @@ static void scx_enable_task(struct task_struct *p)
 
 	p->scx.weight = sched_weight_to_cgroup(weight);
 
+	p->scx.ddsp_dsq_id = SCX_DSQ_INVALID;
+	p->scx.ddsp_enq_flags = 0;
+
 	if (SCX_HAS_OP(sch, enable))
 		SCX_CALL_OP_TASK(sch, SCX_KF_REST, enable, rq, p);
 	scx_set_task_state(p, SCX_TASK_ENABLED);
-- 
2.53.0

