From: Andrea Righi <arighi@nvidia.com>
To: Tejun Heo <tj@kernel.org>
Cc: David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Kuba Piecuch <jpiecuch@google.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
Christian Loehle <christian.loehle@arm.com>,
Daniel Hodges <hodgesd@meta.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix ops.dequeue() semantics
Date: Wed, 11 Feb 2026 23:34:54 +0100 [thread overview]
Message-ID: <aY0EDpFD1FSrigB6@gpd4> (raw)
In-Reply-To: <aYzc7ZVkSrtdeHnA@slm.duckdns.org>
On Wed, Feb 11, 2026 at 09:47:57AM -1000, Tejun Heo wrote:
> Hello,
>
> On Wed, Feb 11, 2026 at 05:06:20PM +0100, Andrea Righi wrote:
> ...
> > > Please use () do clarify ordering between & and &&. It's just visually
> > > confusing. I wonder whether it'd be cleaner to make it take @dsq instead of
> > > @dsq_id and then it can just do:
> > >
> > > return dsq->id == SCX_DSQ_LOCAL || dsq->id == SCX_DSQ_GLOBAL;
> > >
> > > because SCX_DSQ_LOCAL_ON is only used as the designator not as actual DSQ
> > > id, and the above code positively identifies what's terminal.
> >
> > Ok, but we also need to include SCX_DSQ_BYPASS, in that case maybe checking
> > SCX_DSQ_FLAG_BUILTIN is more generic?
>
> Ah, forgot about that. Hmm... we can do:
>
> switch (dsq->id) {
> case SCX_DSQ_LOCAL:
> case SCX_DSQ_GLOBAL:
> case SCX_DSQ_BYPASS:
> return true;
> default:
> return false;
> }
>
> I just feel iffy about not being specific. Easier to make mistakes in the
> future and more difficult to notice after doing so, but I think this point
> is kinda moot. If we break up LOCAL and GLOBAL/BYPASS handling into separate
> paths in dispatch_enqueue(), we won't need this function anyway.
Ack, makes sense.
>
> > > > @@ -1524,6 +1590,12 @@ static void ops_dequeue(struct rq *rq, struct task_struct *p, u64 deq_flags)
> > > >
> > > > switch (opss & SCX_OPSS_STATE_MASK) {
> > > > case SCX_OPSS_NONE:
> > > > + /*
> > > > + * If the task is still in BPF scheduler's custody
> > > > + * (%SCX_TASK_IN_CUSTODY is set) call ops.dequeue().
> > > > + */
> > > > + if (p->scx.flags & SCX_TASK_IN_CUSTODY)
> > > > + call_task_dequeue(sch, rq, p, deq_flags, true);
> > >
> > > Hmm... why is this path necessary? Shouldn't the one that cleared OPSS be
> > > responsible for clearing IN_CUSTODY too?
> >
> > The path that clears OPSS to NONE doesn't always clear IN_CUSTODY: in
> > dispatch_to_local_dsq(), when we're moving a task that was in DISPATCHING
> > to a remote CPU's local DSQ, we only set ops_state to NONE, so a concurrent
> > dequeue can proceed, but we only clear IN_CUSTODY when we later enqueue or
> > move the task. So we can see NONE + IN_CUSTODY here and need to handle it.
> > And we can't clear IN_CUSTODY at the same time we set NONE there, because
> > we don't hold the task's rq lock yet and we can't trigger ops.dequeue().
>
> I see. Can you please add a comment with the above?
Ok.
>
> ...
> > > I think a better place to put this would be inside local_dsq_post_enq() so
> > > that dispatch_enqueue() and move_local_task_to_local_dsq() can share the
> > > path. This would mean breaking out local and global cases in
> > > dispatch_enqueue(). ie. at the end of dispatch_enqueue():
> > >
> > > if (is_local) {
> > > local_dsq_post_enq(...);
> > > } else {
> > > if (dsq->id == SCX_DSQ_GLOBAL)
> > > global_dsq_post_enq(...); /* or open code with comment */
> > > raw_spin_unlock(&dsq->lock);
> > > }
> >
> > Agreed, I'll move this into local_dsq_post_enq() and introduce
> > a global_dsq_post_enq().
>
> Yeah, and as you pointed out, BYPASS.
Ok.
>
> > > > +static bool consume_remote_task(struct scx_sched *sch, struct rq *this_rq,
> > > > + struct task_struct *p,
> > > > + struct scx_dispatch_q *dsq, struct rq *src_rq)
> > > > {
> > > > raw_spin_rq_unlock(this_rq);
> > > >
> > > > if (unlink_dsq_and_lock_src_rq(p, dsq, src_rq)) {
> > > > + /*
> > > > + * Task is moving from a non-local DSQ to a local (terminal) DSQ.
> > > > + * Call ops.dequeue() if the task was in BPF custody.
> > > > + */
> > > > + if (p->scx.flags & SCX_TASK_IN_CUSTODY)
> > > > + call_task_dequeue(sch, src_rq, p, 0, false);
> > >
> > > and this shouldn't be necessary. move_remote_task_to_local_dsq() deactivates
> > > and reactivates the task. The deactivation invokes ops_dequeue() but that
> > > should suppress dequeue invocation as that's internal transfer (this is
> > > discernable from p->on_rq being set to TASK_ON_RQ_MIGRATING) and when it
> > > gets enqueued on the target CPU, dispatch_enqueue() on the local DSQ should
> > > trigger dequeue invocation, right?
> >
> > Should we trigger ops.dequeue() when the task is dequeued inside
> > move_remote_task_to_local_dsq() (in ops_dequeue() on the path triggered by
> > deactivate_task() there) instead of suppressing it and invoking on the
> > target in local_dsq_post_enq()?
> >
> > That way the BPF sees dequeue on the source and then enqueue on the target,
> > we avoid special-casing SCX_TASK_IN_CUSTODY in do_enqueue_task() and the
> > "when to call dequeue" logic stays consistent in ops_dequeue and the
> > terminal local/global post_enq paths.
> >
> > Does it make sense or would you rather suppress it and only invoke on the
> > target when the task lands on the local DSQ??
>
> The end result is about the same because whenever we migrate we're sending
> it to the local DSQ of the destination CPU, so whether we generate the event
> on deactivation of the source CPU or activation on the destination doesn't
> make *whole* lot of difference. However, conceptually, migrations are
> internal events. There isn't anything actionable for the BPF scheduler. The
> reason why ops.dequeue() should be emitted is not because the task is
> changing CPUs (which caused the deactivation) but the fact that it ends up
> in a local DSQ afterwards. I think it'll be cleaner both conceptually and
> code-wise to emit ops.dequeue() only from dispatch_enqueue() and dequeue
> paths.
Does this include core scheduler migrations or just SCX-initiated
migrations (move_remote_task_to_local_dsq())?
Because with core scheduler migrations we trigger ops.enqueue(), so we
should also trigger ops.dequeue(). Or we need to send the task straight to
local to prevent calling ops.enqueue().
Thanks,
-Andrea
next prev parent reply other threads:[~2026-02-11 22:35 UTC|newest]
Thread overview: 83+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-02-10 21:26 [PATCHSET v8] sched_ext: Fix ops.dequeue() semantics Andrea Righi
2026-02-10 21:26 ` [PATCH 1/2] " Andrea Righi
2026-02-10 23:20 ` Tejun Heo
2026-02-11 16:06 ` Andrea Righi
2026-02-11 19:47 ` Tejun Heo
2026-02-11 22:34 ` Andrea Righi [this message]
2026-02-11 22:37 ` Tejun Heo
2026-02-11 22:48 ` Andrea Righi
2026-02-12 10:16 ` Andrea Righi
2026-02-12 14:32 ` Christian Loehle
2026-02-12 15:45 ` Andrea Righi
2026-02-12 17:07 ` Tejun Heo
2026-02-12 18:14 ` Andrea Righi
2026-02-12 18:35 ` Tejun Heo
2026-02-12 22:30 ` Andrea Righi
2026-02-14 10:16 ` Andrea Righi
2026-02-14 17:56 ` Tejun Heo
2026-02-14 19:32 ` Andrea Righi
2026-02-10 23:54 ` Tejun Heo
2026-02-11 16:07 ` Andrea Righi
2026-02-10 21:26 ` [PATCH 2/2] selftests/sched_ext: Add test to validate " Andrea Righi
2026-02-12 17:15 ` Christian Loehle
2026-02-12 18:25 ` Andrea Righi
-- strict thread matches above, loose matches on Subject: below --
2026-02-06 13:54 [PATCHSET v7] sched_ext: Fix " Andrea Righi
2026-02-06 13:54 ` [PATCH 1/2] " Andrea Righi
2026-02-06 20:35 ` Emil Tsalapatis
2026-02-07 9:26 ` Andrea Righi
2026-02-09 17:28 ` Tejun Heo
2026-02-09 19:06 ` Andrea Righi
2026-02-05 15:32 [PATCHSET v6] " Andrea Righi
2026-02-05 15:32 ` [PATCH 1/2] " Andrea Righi
2026-02-05 19:29 ` Kuba Piecuch
2026-02-05 21:32 ` Andrea Righi
2026-02-04 16:05 [PATCHSET v5] " Andrea Righi
2026-02-04 16:05 ` [PATCH 1/2] " Andrea Righi
2026-02-04 22:14 ` Tejun Heo
2026-02-05 9:26 ` Andrea Righi
2026-02-01 9:08 [PATCHSET v4 sched_ext/for-6.20] " Andrea Righi
2026-02-01 9:08 ` [PATCH 1/2] " Andrea Righi
2026-02-01 22:47 ` Christian Loehle
2026-02-02 7:45 ` Andrea Righi
2026-02-02 9:26 ` Andrea Righi
2026-02-02 10:02 ` Christian Loehle
2026-02-02 15:32 ` Andrea Righi
2026-02-02 10:09 ` Christian Loehle
2026-02-02 13:59 ` Kuba Piecuch
2026-02-04 9:36 ` Andrea Righi
2026-02-04 9:51 ` Kuba Piecuch
2026-02-02 11:56 ` Kuba Piecuch
2026-02-04 10:11 ` Andrea Righi
2026-02-04 10:33 ` Kuba Piecuch
2026-01-26 8:41 [PATCHSET v3 sched_ext/for-6.20] " Andrea Righi
2026-01-26 8:41 ` [PATCH 1/2] " Andrea Righi
2026-01-27 16:38 ` Emil Tsalapatis
2026-01-27 16:41 ` Kuba Piecuch
2026-01-30 7:34 ` Andrea Righi
2026-01-30 13:14 ` Kuba Piecuch
2026-01-31 6:54 ` Andrea Righi
2026-01-31 16:45 ` Kuba Piecuch
2026-01-31 17:24 ` Andrea Righi
2026-01-28 21:21 ` Tejun Heo
2026-01-30 11:54 ` Kuba Piecuch
2026-01-31 9:02 ` Andrea Righi
2026-01-31 17:53 ` Kuba Piecuch
2026-01-31 20:26 ` Andrea Righi
2026-02-02 15:19 ` Tejun Heo
2026-02-02 15:30 ` Andrea Righi
2026-02-01 17:43 ` Tejun Heo
2026-02-02 15:52 ` Andrea Righi
2026-02-02 16:23 ` Kuba Piecuch
2026-01-21 12:25 [PATCHSET v2 sched_ext/for-6.20] " Andrea Righi
2026-01-21 12:25 ` [PATCH 1/2] " Andrea Righi
2026-01-21 12:54 ` Christian Loehle
2026-01-21 12:57 ` Andrea Righi
2026-01-22 9:28 ` Kuba Piecuch
2026-01-23 13:32 ` Andrea Righi
2025-12-19 22:43 [PATCH 0/2] sched_ext: Implement proper " Andrea Righi
2025-12-19 22:43 ` [PATCH 1/2] sched_ext: Fix " Andrea Righi
2025-12-28 3:20 ` Emil Tsalapatis
2025-12-29 16:36 ` Andrea Righi
2025-12-29 18:35 ` Emil Tsalapatis
2025-12-28 17:19 ` Tejun Heo
2025-12-28 23:28 ` Tejun Heo
2025-12-28 23:38 ` Tejun Heo
2025-12-29 17:07 ` Andrea Righi
2025-12-29 18:55 ` Emil Tsalapatis
2025-12-28 23:42 ` Tejun Heo
2025-12-29 17:17 ` Andrea Righi
2025-12-29 0:06 ` Tejun Heo
2025-12-29 18:56 ` Andrea Righi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aY0EDpFD1FSrigB6@gpd4 \
--to=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=christian.loehle@arm.com \
--cc=emil@etsalapatis.com \
--cc=hodgesd@meta.com \
--cc=jpiecuch@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@lists.linux.dev \
--cc=tj@kernel.org \
--cc=void@manifault.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.