From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Kuba Piecuch <jpiecuch@google.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
Christian Loehle <christian.loehle@arm.com>,
Daniel Hodges <hodgesd@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Wed, 11 Feb 2026 23:34:54 +0100 [thread overview]
Message-ID: <aY0EDpFD1FSrigB6@gpd4> (raw)
In-Reply-To: <aYzc7ZVkSrtdeHnA@slm.duckdns.org>
On Wed, Feb 11, 2026 at 09:47:57AM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 11, 2026 at 05:06:20PM +0100, Andrea Righi wrote:
> ...
> > > Please use () do clarify ordering between & and &&. It's just visually
> > > confusing. I wonder whether it'd be cleaner to make it take @dsq instead of
> > > @dsq_id and then it can just do:
> > >
> > > return dsq->id == SCX_DSQ_LOCAL || dsq->id == SCX_DSQ_GLOBAL;
> > >
> > > because SCX_DSQ_LOCAL_ON is only used as the designator not as actual DSQ
> > > id, and the above code positively identifies what's terminal.
> >
> > Ok, but we also need to include SCX_DSQ_BYPASS, in that case maybe checking
> > SCX_DSQ_FLAG_BUILTIN is more generic?
>
> Ah, forgot about that. Hmm... we can do:
>
> switch (dsq->id) {
> case SCX_DSQ_LOCAL:
> case SCX_DSQ_GLOBAL:
> case SCX_DSQ_BYPASS:
> return true;
> default:
> return false;
> }
>
> I just feel iffy about not being specific. Easier to make mistakes in the
> future and more difficult to notice after doing so, but I think this point
> is kinda moot. If we break up LOCAL and GLOBAL/BYPASS handling into separate
> paths in dispatch_enqueue(), we won't need this function anyway.
Ack, makes sense.
>
> > > > @@ -1524,6 +1590,12 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> > > >
> > > > switch (opss & SCX_OPSS_STATE_MASK) {
> > > > case SCX_OPSS_NONE:
> > > > + /*
> > > > + * If the task is still in BPF scheduler's custody
> > > > + * (%SCX_TASK_IN_CUSTODY is set) call ops.dequeue().
> > > > + */
> > > > + if (p->scx.flags & SCX_TASK_IN_CUSTODY)
> > > > + call_task_dequeue(sch, rq, p, deq_flags, true);
> > >
> > > Hmm... why is this path necessary? Shouldn't the one that cleared OPSS be
> > > responsible for clearing IN_CUSTODY too?
> >
> > The path that clears OPSS to NONE doesn't always clear IN_CUSTODY: in
> > dispatch_to_local_dsq(), when we're moving a task that was in DISPATCHING
> > to a remote CPU's local DSQ, we only set ops_state to NONE, so a concurrent
> > dequeue can proceed, but we only clear IN_CUSTODY when we later enqueue or
> > move the task. So we can see NONE + IN_CUSTODY here and need to handle it.
> > And we can't clear IN_CUSTODY at the same time we set NONE there, because
> > we don't hold the task's rq lock yet and we can't trigger ops.dequeue().
>
> I see. Can you please add a comment with the above?
Ok.
>
> ...
> > > I think a better place to put this would be inside local_dsq_post_enq() so
> > > that dispatch_enqueue() and move_local_task_to_local_dsq() can share the
> > > path. This would mean breaking out local and global cases in
> > > dispatch_enqueue(). ie. at the end of dispatch_enqueue():
> > >
> > > if (is_local) {
> > > local_dsq_post_enq(...);
> > > } else {
> > > if (dsq->id == SCX_DSQ_GLOBAL)
> > > global_dsq_post_enq(...); /* or open code with comment */
> > > raw_spin_unlock(&dsq->lock);
> > > }
> >
> > Agreed, I'll move this into local_dsq_post_enq() and introduce
> > a global_dsq_post_enq().
>
> Yeah, and as you pointed out, BYPASS.
Ok.
>
> > > > +static bool consume_remote_task(struct scx_sched *sch, struct rq *this_rq,
> > > > + struct task_struct *p,
> > > > + struct scx_dispatch_q *dsq, struct rq *src_rq)
> > > > {
> > > > raw_spin_rq_unlock(this_rq);
> > > >
> > > > if (unlink_dsq_and_lock_src_rq(p, dsq, src_rq)) {
> > > > + /*
> > > > + * Task is moving from a non-local DSQ to a local (terminal) DSQ.
> > > > + * Call ops.dequeue() if the task was in BPF custody.
> > > > + */
> > > > + if (p->scx.flags & SCX_TASK_IN_CUSTODY)
> > > > + call_task_dequeue(sch, src_rq, p, 0, false);
> > >
> > > and this shouldn't be necessary. move_remote_task_to_local_dsq() deactivates
> > > and reactivates the task. The deactivation invokes ops_dequeue() but that
> > > should suppress dequeue invocation as that's internal transfer (this is
> > > discernable from p->on_rq being set to TASK_ON_RQ_MIGRATING) and when it
> > > gets enqueued on the target CPU, dispatch_enqueue() on the local DSQ should
> > > trigger dequeue invocation, right?
> >
> > Should we trigger ops.dequeue() when the task is dequeued inside
> > move_remote_task_to_local_dsq() (in ops_dequeue() on the path triggered by
> > deactivate_task() there) instead of suppressing it and invoking on the
> > target in local_dsq_post_enq()?
> >
> > That way the BPF sees dequeue on the source and then enqueue on the target,
> > we avoid special-casing SCX_TASK_IN_CUSTODY in do_enqueue_task() and the
> > "when to call dequeue" logic stays consistent in ops_dequeue and the
> > terminal local/global post_enq paths.
> >
> > Does it make sense or would you rather suppress it and only invoke on the
> > target when the task lands on the local DSQ??
>
> The end result is about the same because whenever we migrate we're sending
> it to the local DSQ of the destination CPU, so whether we generate the event
> on deactivation of the source CPU or activation on the destination doesn't
> make *whole* lot of difference. However, conceptually, migrations are
> internal events. There isn't anything actionable for the BPF scheduler. The
> reason why ops.dequeue() should be emitted is not because the task is
> changing CPUs (which caused the deactivation) but the fact that it ends up
> in a local DSQ afterwards. I think it'll be cleaner both conceptually and
> code-wise to emit ops.dequeue() only from dispatch_enqueue() and dequeue
> paths.
Does this include core scheduler migrations or just SCX-initiated
migrations (move_remote_task_to_local_dsq())?
Because with core scheduler migrations we trigger ops.enqueue(), so we
should also trigger ops.dequeue(). Or we need to send the task straight to
local to prevent calling ops.enqueue().
Thanks,
-Andrea
next prev parent reply other threads:[~2026-02-11 22:35 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-10 21:26 [PATCHSET v8] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-02-10 21:26 ` [PATCH 1/2] " Andrea Righi
2026-02-10 23:20 ` Tejun Heo
2026-02-11 16:06 ` Andrea Righi
2026-02-11 19:47 ` Tejun Heo
2026-02-11 22:34 ` Andrea Righi [this message]
2026-02-11 22:37 ` Tejun Heo
2026-02-11 22:48 ` Andrea Righi
2026-02-12 10:16 ` Andrea Righi
2026-02-12 14:32 ` Christian Loehle
2026-02-12 15:45 ` Andrea Righi
2026-02-12 17:07 ` Tejun Heo
2026-02-12 18:14 ` Andrea Righi
2026-02-12 18:35 ` Tejun Heo
2026-02-12 22:30 ` Andrea Righi
2026-02-14 10:16 ` Andrea Righi
2026-02-14 17:56 ` Tejun Heo
2026-02-14 19:32 ` Andrea Righi
2026-02-10 23:54 ` Tejun Heo
2026-02-11 16:07 ` Andrea Righi
2026-02-10 21:26 ` [PATCH 2/2] selftests/sched_ext: Add test to validate " Andrea Righi
2026-02-12 17:15 ` Christian Loehle
2026-02-12 18:25 ` Andrea Righi
-- strict thread matches above, loose matches on Subject: below --
2026-02-06 13:54 [PATCHSET v7] sched_ext: Fix " Andrea Righi
2026-02-06 13:54 ` [PATCH 1/2] " Andrea Righi
2026-02-06 20:35 ` Emil Tsalapatis
2026-02-07 9:26 ` Andrea Righi
2026-02-09 17:28 ` Tejun Heo
2026-02-09 19:06 ` Andrea Righi
2026-02-05 15:32 [PATCHSET v6] " Andrea Righi
2026-02-05 15:32 ` [PATCH 1/2] " Andrea Righi
2026-02-05 19:29 ` Kuba Piecuch
2026-02-05 21:32 ` Andrea Righi
2026-02-04 16:05 [PATCHSET v5] " Andrea Righi
2026-02-04 16:05 ` [PATCH 1/2] " Andrea Righi
2026-02-04 22:14 ` Tejun Heo
2026-02-05 9:26 ` Andrea Righi
2026-02-01 9:08 [PATCHSET v4 sched_ext/for-6.20] " Andrea Righi
2026-02-01 9:08 ` [PATCH 1/2] " Andrea Righi
2026-02-01 22:47 ` Christian Loehle
2026-02-02 7:45 ` Andrea Righi
2026-02-02 9:26 ` Andrea Righi
2026-02-02 10:02 ` Christian Loehle
2026-02-02 15:32 ` Andrea Righi
2026-02-02 10:09 ` Christian Loehle
2026-02-02 13:59 ` Kuba Piecuch
2026-02-04 9:36 ` Andrea Righi
2026-02-04 9:51 ` Kuba Piecuch
2026-02-02 11:56 ` Kuba Piecuch
2026-02-04 10:11 ` Andrea Righi
2026-02-04 10:33 ` Kuba Piecuch
2026-01-26 8:41 [PATCHSET v3 sched_ext/for-6.20] " Andrea Righi
2026-01-26 8:41 ` [PATCH 1/2] " Andrea Righi
2026-01-27 16:38 ` Emil Tsalapatis
2026-01-27 16:41 ` Kuba Piecuch
2026-01-30 7:34 ` Andrea Righi
2026-01-30 13:14 ` Kuba Piecuch
2026-01-31 6:54 ` Andrea Righi
2026-01-31 16:45 ` Kuba Piecuch
2026-01-31 17:24 ` Andrea Righi
2026-01-28 21:21 ` Tejun Heo
2026-01-30 11:54 ` Kuba Piecuch
2026-01-31 9:02 ` Andrea Righi
2026-01-31 17:53 ` Kuba Piecuch
2026-01-31 20:26 ` Andrea Righi
2026-02-02 15:19 ` Tejun Heo
2026-02-02 15:30 ` Andrea Righi
2026-02-01 17:43 ` Tejun Heo
2026-02-02 15:52 ` Andrea Righi
2026-02-02 16:23 ` Kuba Piecuch
2026-01-21 12:25 [PATCHSET v2 sched_ext/for-6.20] " Andrea Righi
2026-01-21 12:25 ` [PATCH 1/2] " Andrea Righi
2026-01-21 12:54 ` Christian Loehle
2026-01-21 12:57 ` Andrea Righi
2026-01-22 9:28 ` Kuba Piecuch
2026-01-23 13:32 ` Andrea Righi
2025-12-19 22:43 [PATCH 0/2] sched_ext: Implement proper " Andrea Righi
2025-12-19 22:43 ` [PATCH 1/2] sched_ext: Fix " Andrea Righi
2025-12-28 3:20 ` Emil Tsalapatis
2025-12-29 16:36 ` Andrea Righi
2025-12-29 18:35 ` Emil Tsalapatis
2025-12-28 17:19 ` Tejun Heo
2025-12-28 23:28 ` Tejun Heo
2025-12-28 23:38 ` Tejun Heo
2025-12-29 17:07 ` Andrea Righi
2025-12-29 18:55 ` Emil Tsalapatis
2025-12-28 23:42 ` Tejun Heo
2025-12-29 17:17 ` Andrea Righi
2025-12-29 0:06 ` Tejun Heo
2025-12-29 18:56 ` Andrea Righi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aY0EDpFD1FSrigB6@gpd4 \
--to=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=christian.loehle@arm.com \
--cc=emil@etsalapatis.com \
--cc=hodgesd@meta.com \
--cc=jpiecuch@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@lists.linux.dev \
--cc=tj@kernel.org \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox