* [PATCH] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
@ 2025-01-23 23:42 Andrea Righi
From: Andrea Righi @ 2025-01-23 23:42 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Changwoo Min; +Cc: linux-kernel
While performing the rq locking dance in dispatch_to_local_dsq(), we may
trigger the following lock imbalance condition, in particular when
multiple tasks are rapidly changing CPU affinity (e.g., running
`stress-ng --race-sched 0`):
[ 13.413579] =====================================
[ 13.413660] WARNING: bad unlock balance detected!
[ 13.413729] 6.13.0-virtme #15 Not tainted
[ 13.413792] -------------------------------------
[ 13.413859] kworker/1:1/80 is trying to release lock (&rq->__lock) at:
[ 13.413954] [<ffffffff873c6c48>] dispatch_to_local_dsq+0x108/0x1a0
[ 13.414111] but there are no more locks to release!
[ 13.414176]
[ 13.414176] other info that might help us debug this:
[ 13.414258] 1 lock held by kworker/1:1/80:
[ 13.414318] #0: ffff8b66feb41698 (&rq->__lock){-.-.}-{2:2}, at: raw_spin_rq_lock_nested+0x20/0x90
[ 13.414612]
[ 13.414612] stack backtrace:
[ 13.415255] CPU: 1 UID: 0 PID: 80 Comm: kworker/1:1 Not tainted 6.13.0-virtme #15
[ 13.415505] Workqueue: 0x0 (events)
[ 13.415567] Sched_ext: dsp_local_on (enabled+all), task: runnable_at=-2ms
[ 13.415570] Call Trace:
[ 13.415700] <TASK>
[ 13.415744] dump_stack_lvl+0x78/0xe0
[ 13.415806] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.415884] print_unlock_imbalance_bug+0x11b/0x130
[ 13.415965] ? dispatch_to_local_dsq+0x108/0x1a0
[ 13.416226] lock_release+0x231/0x2c0
[ 13.416326] _raw_spin_unlock+0x1b/0x40
[ 13.416422] dispatch_to_local_dsq+0x108/0x1a0
[ 13.416554] flush_dispatch_buf+0x199/0x1d0
[ 13.416652] balance_one+0x194/0x370
[ 13.416751] balance_scx+0x61/0x1e0
[ 13.416848] prev_balance+0x43/0xb0
[ 13.416947] __pick_next_task+0x6b/0x1b0
[ 13.417052] __schedule+0x20d/0x1740
This happens because dispatch_to_local_dsq() is racing with
dispatch_dequeue(): when the latter wins, we incorrectly assume that the
task has been moved to dst_rq.
Fix this by correctly assuming that the task is still on src_rq in this
specific scenario.
Fixes: 4d3ca89bdd31 ("sched_ext: Refactor consume_remote_task()")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
kernel/sched/ext.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index a24d48cebfb7..7500b1a26757 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2617,6 +2617,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
/* if the destination CPU is idle, wake it up */
if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
resched_curr(dst_rq);
+ } else {
+ dst_rq = src_rq;
}
/* switch back to @rq lock */
--
2.48.1
* Re: [PATCH] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
@ 2025-01-24 2:21 ` Changwoo Min
From: Changwoo Min @ 2025-01-24 2:21 UTC (permalink / raw)
To: Andrea Righi, Tejun Heo, David Vernet; +Cc: linux-kernel
Hello Andrea,
On 25. 1. 24. 08:42, Andrea Righi wrote:
> kernel/sched/ext.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index a24d48cebfb7..7500b1a26757 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2617,6 +2617,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
> /* if the destination CPU is idle, wake it up */
> if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
> resched_curr(dst_rq);
> + } else {
> + dst_rq = src_rq;
> }
The fix makes sense to me. Since this is a very specific and
tricky case, it would be better to include a detailed comment in
the else branch so anyone can easily understand why it is
necessary.
Regards,
Changwoo Min
* Re: [PATCH] sched_ext: Fix lock imbalance in dispatch_to_local_dsq()
@ 2025-01-24 6:21 ` Andrea Righi
From: Andrea Righi @ 2025-01-24 6:21 UTC (permalink / raw)
To: Changwoo Min; +Cc: Tejun Heo, David Vernet, linux-kernel
On Fri, Jan 24, 2025 at 11:21:33AM +0900, Changwoo Min wrote:
> Hello Andrea,
>
> On 25. 1. 24. 08:42, Andrea Righi wrote:
> > kernel/sched/ext.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> > index a24d48cebfb7..7500b1a26757 100644
> > --- a/kernel/sched/ext.c
> > +++ b/kernel/sched/ext.c
> > @@ -2617,6 +2617,8 @@ static void dispatch_to_local_dsq(struct rq *rq, struct scx_dispatch_q *dst_dsq,
> > /* if the destination CPU is idle, wake it up */
> > if (sched_class_above(p->sched_class, dst_rq->curr->sched_class))
> > resched_curr(dst_rq);
> > + } else {
> > + dst_rq = src_rq;
> > }
>
> The fix makes sense to me. Since this is a very specific and
> tricky case, it will be better to include detailed comments in
> the else part so anyone can easily understand why the else part
> is necessary.
Good idea, I'll send a v2 including a comment in the else part.
Thanks!
-Andrea