From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode
Date: Mon, 10 Nov 2025 09:21:46 -1000
Message-ID: <aRI7SpAS_CQeS-Ph@slm.duckdns.org>
In-Reply-To: <aRGyo6M9AbInZTkb@gpd4>
Hello,
On Mon, Nov 10, 2025 at 10:38:43AM +0100, Andrea Righi wrote:
> > @@ -965,7 +980,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> > !RB_EMPTY_NODE(&p->scx.dsq_priq));
> >
> > if (!is_local) {
> > - raw_spin_lock(&dsq->lock);
> > + raw_spin_lock_nested(&dsq->lock,
> > + (enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> > +
> > if (unlikely(dsq->id == SCX_DSQ_INVALID)) {
> > scx_error(sch, "attempting to dispatch to a destroyed dsq");
> > /* fall back to the global dsq */
>
> Outside the context of the patch we're doing:
>
> /* fall back to the global dsq */
> raw_spin_unlock(&dsq->lock);
> dsq = find_global_dsq(sch, p);
> raw_spin_lock(&dsq->lock);
>
> I think we should preserve the nested lock annotation also when locking
> the global DSQ and do:
>
> raw_spin_lock_nested(&dsq->lock,
> (enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
>
> It seems correct either way, but without this I think we could potentially
> trigger false positive lockdep warnings.
That'd be a bug. I'll add an explicit WARN. I don't think quietly falling
back to the global DSQ makes sense; e.g., the global DSQ isn't even
consumed in bypass mode anymore.
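Andrea's false-positive concern follows from how lockdep keys locks: assuming, as the patch's SINGLE_DEPTH_NESTING annotation implies, that all DSQ locks share one lockdep class, taking a second DSQ lock with subclass 0 while another is held looks like recursive locking, whereas the nested subclass gives the inner acquisition a distinct key. The userspace C sketch below is a toy model of that check, not lockdep itself; the toy_* names and the TOY_ENQ_NESTED bit are hypothetical, and only SINGLE_DEPTH_NESTING's value (1) matches the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of lockdep's "possible recursive locking" check (hypothetical,
 * not kernel code): an acquisition is keyed by (class, subclass), and
 * acquiring a key that is already held is reported.
 */
#define SINGLE_DEPTH_NESTING 1		/* same value as the kernel macro */
#define TOY_ENQ_NESTED (1u << 0)	/* hypothetical bit, not SCX_ENQ_NESTED */
#define TOY_MAX_HELD 8

struct toy_lock {
	int class_id;			/* every DSQ lock shares one class */
};

struct toy_held {
	const struct toy_lock *lock;
	int subclass;
};

static struct toy_held toy_held_locks[TOY_MAX_HELD];
static int toy_nr_held;

/* Subclass selection mirroring the patch's enq_flags check. */
static int toy_subclass(unsigned int enq_flags)
{
	return (enq_flags & TOY_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0;
}

/* Record an acquisition; returns true if lockdep would complain. */
static bool toy_lock_nested(const struct toy_lock *l, int subclass)
{
	bool recursive = false;

	for (int i = 0; i < toy_nr_held; i++)
		if (toy_held_locks[i].lock->class_id == l->class_id &&
		    toy_held_locks[i].subclass == subclass)
			recursive = true;

	assert(toy_nr_held < TOY_MAX_HELD);
	toy_held_locks[toy_nr_held++] = (struct toy_held){ l, subclass };
	return recursive;
}

/* Drop the held-lock entry for @l. */
static void toy_unlock(const struct toy_lock *l)
{
	for (int i = 0; i < toy_nr_held; i++) {
		if (toy_held_locks[i].lock == l) {
			toy_held_locks[i] = toy_held_locks[--toy_nr_held];
			return;
		}
	}
	assert(!"unlocking a lock that is not held");
}
```

Walking through the fallback scenario: with the donor DSQ held at subclass 0, re-locking the global DSQ at subclass 0 is flagged, while re-locking it with toy_subclass(TOY_ENQ_NESTED) is not.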
> > + /*
> > + * Moving $p from one non-local DSQ to another. The source DSQ
> > + * is already locked. Do an abbreviated dequeue and then perform
> > + * enqueue without unlocking $donor_dsq.
> > + *
> > + * We don't want to drop and reacquire the lock on each
> > + * iteration as @donor_dsq can be very long and potentially
> > + * highly contended. Donee DSQs are less likely to be contended.
> > + * The nested locking is safe as only this LB moves tasks
> > + * between bypass DSQs.
> > + */
> > + task_unlink_from_dsq(p, donor_dsq);
> > + p->scx.dsq = NULL;
> > + dispatch_enqueue(sch, donee_dsq, p, SCX_ENQ_NESTED);
>
> Are we racing with dispatch_dequeue() and the holding_cpu dancing here?
>
> If I read correctly, dispatch_dequeue() reads p->scx.dsq without holding
> the lock, then acquires the lock on that DSQ, but between the read and lock
> acquisition, the load balancer can move the task to a different DSQ.
>
> Maybe we should change dispatch_dequeue() as well to verify after locking
> that we locked the correct DSQ, and retry if the task was moved.
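The race described above, and the lock-then-revalidate loop proposed for dispatch_dequeue(), can be modeled in userspace. In this sketch (all rv_* names are hypothetical and pthread mutexes stand in for raw spinlocks) the dequeuer reads the task's DSQ pointer without a lock, takes that DSQ's lock, and retries if the mover changed the pointer in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a DSQ and a task; not kernel types. */
struct rv_dsq {
	pthread_mutex_t lock;
	int nr;				/* queued tasks, protected by @lock */
};

struct rv_task {
	/* Changed only under the owning DSQ's lock but read locklessly,
	 * like p->scx.dsq at the top of dispatch_dequeue(). */
	_Atomic(struct rv_dsq *) dsq;
};

/*
 * Mover (the LB's role): move @p from @from to @to if it is still on
 * @from. Assumes a single mover, so the from->to order cannot ABBA.
 */
static bool rv_move(struct rv_task *p, struct rv_dsq *from, struct rv_dsq *to)
{
	bool moved = false;

	pthread_mutex_lock(&from->lock);
	if (atomic_load(&p->dsq) == from) {
		pthread_mutex_lock(&to->lock);
		from->nr--;
		to->nr++;
		atomic_store(&p->dsq, to);
		pthread_mutex_unlock(&to->lock);
		moved = true;
	}
	pthread_mutex_unlock(&from->lock);
	return moved;
}

/* Dequeue with lock-then-revalidate; returns the DSQ @p left, or NULL. */
static struct rv_dsq *rv_dequeue(struct rv_task *p)
{
	for (;;) {
		struct rv_dsq *dsq = atomic_load(&p->dsq);	/* unlocked read */

		if (!dsq)
			return NULL;
		pthread_mutex_lock(&dsq->lock);
		if (atomic_load(&p->dsq) == dsq) {		/* still there? */
			dsq->nr--;
			atomic_store(&p->dsq, NULL);
			pthread_mutex_unlock(&dsq->lock);
			return dsq;
		}
		/* @p was moved between the read and the lock: retry. */
		pthread_mutex_unlock(&dsq->lock);
	}
}
```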
Right, this is a bug. The LB should hold the source rq lock too. Let me
update the code and add a lockdep annotation.
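Holding the source rq lock in the LB closes the race under one assumption: every path that changes a task's DSQ pointer holds that task's rq lock, so a dequeuer running under the same rq lock reads a pointer that cannot change before it takes the DSQ lock. The single-threaded userspace sketch below models only that invariant; the lb_* names are hypothetical, and lb_assert_rq_held() is a crude stand-in for an annotation such as lockdep_assert_held().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types. */
struct lb_rq {
	pthread_mutex_t lock;
	bool locked;	/* true while @lock is held (by anyone); crude
			 * stand-in for lockdep's held-lock tracking */
};

struct lb_dsq {
	pthread_mutex_t lock;
	int nr;
};

struct lb_task {
	struct lb_rq *rq;	/* rq the task belongs to */
	struct lb_dsq *dsq;	/* written with rq->lock AND the DSQ lock held */
};

static void lb_rq_lock(struct lb_rq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->locked = true;
}

static void lb_rq_unlock(struct lb_rq *rq)
{
	rq->locked = false;
	pthread_mutex_unlock(&rq->lock);
}

/* Analogue of lockdep_assert_held() on the task's rq lock. */
#define lb_assert_rq_held(p) assert((p)->rq->locked)

/*
 * Balancer move. Caller holds @p's (source) rq lock and @donor->lock;
 * @donee->lock is taken nested, as in the patch.
 */
static void lb_move(struct lb_task *p, struct lb_dsq *donor, struct lb_dsq *donee)
{
	lb_assert_rq_held(p);
	assert(p->dsq == donor);
	pthread_mutex_lock(&donee->lock);
	donor->nr--;
	donee->nr++;
	p->dsq = donee;
	pthread_mutex_unlock(&donee->lock);
}

/* Dequeue under @p's rq lock: p->dsq cannot change between read and lock. */
static struct lb_dsq *lb_dequeue(struct lb_task *p)
{
	struct lb_dsq *dsq;

	lb_assert_rq_held(p);
	dsq = p->dsq;
	if (!dsq)
		return NULL;
	pthread_mutex_lock(&dsq->lock);
	dsq->nr--;
	p->dsq = NULL;
	pthread_mutex_unlock(&dsq->lock);
	return dsq;
}
```

Unlike the revalidate-and-retry loop, this keeps the dequeue path a single lookup and moves the cost to the balancer, which has to take the extra rq lock.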
Thanks.
--
tejun