public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
@ 2025-01-24  7:24 Andrea Righi
  2025-01-24 19:08 ` Tejun Heo
  0 siblings, 1 reply; 3+ messages in thread
From: Andrea Righi @ 2025-01-24  7:24 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel

While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (i.e., running a
`stress-ng --race-sched 0`):

[   13.413579] =====================================
[   13.413660] WARNING: bad unlock balance detected!
[   13.413729] 6.13.0-virtme #15 Not tainted
[   13.413792] -------------------------------------
[   13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[   13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[   13.414111] but there are no more locks to release!
[   13.414176]
[   13.414176] other info that might help us debug this:
[   13.414258] 1 lock held by kworker/1:1/80:
[   13.414318]  #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[   13.414612]
[   13.414612] stack backtrace:
[   13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[   13.415505] Workqueue:  0x0 (events)
[   13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[   13.415570] Call Trace:
[   13.415700]  <TASK>
[   13.415744]  dump_stack_lvl+0x78/0xe0
[   13.415806]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.415884]  print_unlock_imbalance_bug+0x11b/0x130
[   13.415965]  ? dispatch_to_local_dsq+0x108/0x1a0
[   13.416226]  lock_release+0x231/0x2c0
[   13.416326]  _raw_spin_unlock+0x1b/0x40
[   13.416422]  dispatch_to_local_dsq+0x108/0x1a0
[   13.416554]  flush_dispatch_buf+0x199/0x1d0
[   13.416652]  balance_one+0x194/0x370
[   13.416751]  balance_scx+0x61/0x1e0
[   13.416848]  prev_balance+0x43/0xb0
[   13.416947]  __pick_next_task+0x6b/0x1b0
[   13.417052]  __schedule+0x20d/0x1740

This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue(), when the latter wins we incorrectly assume that the
task has been moved to dst_rq.

Fix this by correctly assuming that task is still in src_rq in this
specific scenario.

Fixes: 4d3ca89bdd31 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
 kernel/sched/ext.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

ChangeLog v1 -> v2:
 - more comments to clarify the race with dequeue
 - rebase to tip

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 3f3f6baac917..92b69c57b400 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2598,7 +2598,10 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		raw_spin_rq_lock(src_rq);
 	}
 
-	/* task_rq couldn't have changed if we're still the holding cpu */
+	/*
+	 * If p->scx.holding_cpu still matches the current CPU, task_rq(p)
+	 * has not changed and we can safely move the task to @dst_rq.
+	 */
 	if (likely(p->scx.holding_cpu == raw_smp_processor_id()) &&
 	    !WARN_ON_ONCE(src_rq != task_rq(p))) {
 		/*
@@ -2617,6 +2620,13 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
 		/* if the destination CPU is idle, wake it up */
 		if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
 			resched_curr(dst_rq);
+	} else {
+		/*
+		 * Otherwise, if dequeue wins the race, we no longer have
+		 * exclusive ownership of the task and we must keep it in
+		 * its original @src_dsq.
+		 */
+		dst_rq = src_rq;
 	}
 
 	/* switch back to @rq lock */
-- 
2.48.1


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-01-25  5:02 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-24  7:24 [PATCH v2] sched_ext: Fix lock imbalance in dispatch_to_local_dsq() Andrea Righi
2025-01-24 19:08 ` Tejun Heo
2025-01-25  5:02   ` Andrea Righi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox