From: Tejun Heo <tj@kernel.org>
To: Andrea Righi <arighi@nvidia.com>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Emil Tsalapatis <etsal@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 13/13] sched_ext: Implement load balancer for bypass mode
Date: Mon, 10 Nov 2025 09:21:46 -1000 [thread overview]
Message-ID: <aRI7SpAS_CQeS-Ph@slm.duckdns.org> (raw)
In-Reply-To: <aRGyo6M9AbInZTkb@gpd4>
Hello,
On Mon, Nov 10, 2025 at 10:38:43AM +0100, Andrea Righi wrote:
> > @@ -965,7 +980,9 @@ static void dispatch_enqueue(struct scx_sched *sch, struct scx_dispatch_q *dsq,
> > !RB_EMPTY_NODE(&p->scx.dsq_priq));
> >
> > if (!is_local) {
> > - raw_spin_lock(&dsq->lock);
> > + raw_spin_lock_nested(&dsq->lock,
> > + (enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
> > +
> > if (unlikely(dsq->id == SCX_DSQ_INVALID)) {
> > scx_error(sch, "attempting to dispatch to a destroyed dsq");
> > /* fall back to the global dsq */
>
> Outside the context of the patch we're doing:
>
> /* fall back to the global dsq */
> raw_spin_unlock(&dsq->lock);
> dsq = find_global_dsq(sch, p);
> raw_spin_lock(&dsq->lock);
>
> I think we should we preserve the nested lock annotation also when locking
> the global DSQ and do:
>
> raw_spin_lock_nested(&dsq->lock,
> (enq_flags & SCX_ENQ_NESTED) ? SINGLE_DEPTH_NESTING : 0);
>
> It seems correct either way, but without this I think we could potentially
> trigger false positive lockdep warnings.
That'd be a bug. I'll add an explicit WARN. I don't think falling back to
global DSQ quietly makes sense - e.g. global DSQ is not even consumed in
bypass mode anymore.
> > + /*
> > + * Moving $p from one non-local DSQ to another. The source DSQ
> > + * is already locked. Do an abbreviated dequeue and then perform
> > + * enqueue without unlocking $donor_dsq.
> > + *
> > + * We don't want to drop and reacquire the lock on each
> > + * iteration as @donor_dsq can be very long and potentially
> > + * highly contended. Donee DSQs are less likely to be contended.
> > + * The nested locking is safe as only this LB moves tasks
> > + * between bypass DSQs.
> > + */
> > + task_unlink_from_dsq(p, donor_dsq);
> > + p->scx.dsq = NULL;
> > + dispatch_enqueue(sch, donee_dsq, p, SCX_ENQ_NESTED);
>
> Are we racing with dispatch_dequeue() and the holding_cpu dancing here?
>
> If I read correctly, dispatch_dequeue() reads p->scx.dsq without holding
> the lock, then acquires the lock on that DSQ, but between the read and lock
> acquisition, the load balancer can move the task to a different DSQ.
>
> Maybe we should change dispatch_dequeue() as well to verify after locking
> that we locked the correct DSQ, and retry if the task was moved.
Right, this is a bug. The LB should hold the source rq lock too. Let me
update the code and add a lockdep annotation.
Thanks.
--
tejun
next prev parent reply other threads:[~2025-11-10 19:21 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-09 18:30 [PATCHSET sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-09 18:31 ` [PATCH 01/13] sched_ext: Don't set ddsp_dsq_id during select_cpu in bypass mode Tejun Heo
2025-11-10 6:57 ` Andrea Righi
2025-11-10 16:08 ` Tejun Heo
2025-11-09 18:31 ` [PATCH 02/13] sched_ext: Make slice values tunable and use shorter slice " Tejun Heo
2025-11-10 7:03 ` Andrea Righi
2025-11-10 7:59 ` Andrea Righi
2025-11-10 16:21 ` Tejun Heo
2025-11-10 16:22 ` Tejun Heo
2025-11-10 8:22 ` Andrea Righi
2025-11-11 14:57 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 03/13] sched_ext: Refactor do_enqueue_task() local and global DSQ paths Tejun Heo
2025-11-10 7:21 ` Andrea Righi
2025-11-09 18:31 ` [PATCH 04/13] sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode Tejun Heo
2025-11-10 7:42 ` Andrea Righi
2025-11-10 16:42 ` Tejun Heo
2025-11-10 17:30 ` Andrea Righi
2025-11-11 15:31 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 05/13] sched_ext: Simplify breather mechanism with scx_aborting flag Tejun Heo
2025-11-10 7:45 ` Andrea Righi
2025-11-11 15:34 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 06/13] sched_ext: Exit dispatch and move operations immediately when aborting Tejun Heo
2025-11-10 8:20 ` Andrea Righi
2025-11-10 18:51 ` Tejun Heo
2025-11-11 15:46 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 07/13] sched_ext: Make scx_exit() and scx_vexit() return bool Tejun Heo
2025-11-10 8:28 ` Andrea Righi
2025-11-11 15:48 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 08/13] sched_ext: Refactor lockup handlers into handle_lockup() Tejun Heo
2025-11-10 8:29 ` Andrea Righi
2025-11-11 15:49 ` Dan Schatzberg
2025-11-09 18:31 ` [PATCH 09/13] sched_ext: Make handle_lockup() propagate scx_verror() result Tejun Heo
2025-11-10 8:29 ` Andrea Righi
2025-11-09 18:31 ` [PATCH 10/13] sched_ext: Hook up hardlockup detector Tejun Heo
2025-11-10 8:31 ` Andrea Righi
2025-11-09 18:31 ` [PATCH 11/13] sched_ext: Add scx_cpu0 example scheduler Tejun Heo
2025-11-10 8:36 ` Andrea Righi
2025-11-10 18:44 ` Tejun Heo
2025-11-10 21:06 ` Andrea Righi
2025-11-10 22:08 ` Tejun Heo
2025-11-09 18:31 ` [PATCH 12/13] sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR Tejun Heo
2025-11-10 8:37 ` Andrea Righi
2025-11-09 18:31 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-10 9:38 ` Andrea Righi
2025-11-10 19:21 ` Tejun Heo [this message]
-- strict thread matches above, loose matches on Subject: below --
2025-11-11 19:18 [PATCHSET v3 sched_ext/for-6.19] sched_ext: Improve bypass mode scalability Tejun Heo
2025-11-11 19:18 ` [PATCH 13/13] sched_ext: Implement load balancer for bypass mode Tejun Heo
2025-11-11 19:30 ` Emil Tsalapatis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aRI7SpAS_CQeS-Ph@slm.duckdns.org \
--to=tj@kernel.org \
--cc=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=etsal@meta.com \
--cc=linux-kernel@vger.kernel.org \
--cc=schatzberg.dan@gmail.com \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.