From: Andrea Righi <arighi@nvidia.com>
To: Kuba Piecuch <jpiecuch@google.com>
Cc: Christian Loehle <christian.loehle@arm.com>,
Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
Date: Thu, 9 Apr 2026 16:12:29 +0200
Message-ID: <adezzWWsgrcGGD3j@gpd4>
In-Reply-To: <DHONT6QVA0PH.IP79IYRM8PKK@google.com>

On Thu, Apr 09, 2026 at 01:30:55PM +0000, Kuba Piecuch wrote:
> On Thu Apr 9, 2026 at 9:46 AM UTC, Christian Loehle wrote:
> ...
> >>>
> >>> ops.init_task();  /* A new task is created */
> >>> ops.enable();  /* Enable BPF scheduling for the task */
> >>>
> >>> while (task in SCHED_EXT) {
> >>>     if (task can migrate)
> >>>         ops.select_cpu();  /* Called on wakeup (optimization) */
> >>>
> >>>     ops.runnable();  /* Task becomes ready to run */
> >>>
> >>>     while (task_is_runnable(task)) {
> >>>         if (task is not in a DSQ || task->scx.slice == 0) {
> >>>             ops.enqueue();  /* Task can be added to a DSQ */
> >>>
> >>>             /* Task property change (i.e., affinity, nice, etc.)? */
> >>>             if (sched_change(task)) {
> >>>                 ops.dequeue();  /* Exiting BPF scheduler custody */
> >>>                 ops.quiescent();
> >>>
> >>>                 /* Property change callback, e.g. ops.set_weight() */
> >>>
> >>>                 ops.runnable();
> >>>                 continue;
> >>>             }
> >>>
> >>>             /* Any usable CPU becomes available */
> >>>
> >>>             ops.dispatch();  /* Task is moved to a local DSQ */
> >>>             ops.dequeue();  /* Exiting BPF scheduler custody */
> > Is this true here? Any dispatch followed by a dequeue?
>
> The comment next to ops.dispatch() says the task is moved to a local DSQ,
> so if we assume that, then I think it will always be followed by ops.dequeue().
> Same if we move the task to the global DSQ.

So, ops.dispatch() is not a "task callback", it's a "CPU callback", invoked when
a CPU becomes available, which is why having ops.dispatch() here can be a bit
confusing.

The intent was to describe the workflow where, once the task is enqueued to a
non-terminal DSQ, it can be consumed from ops.dispatch() and, in that case,
ops.dequeue() is also invoked when the task reaches a terminal DSQ.

Not sure if there's a better way to express this concept in the pseudocode.

>
> Of course, you could do something weird like dispatch the task to a user DSQ,
> in which case there won't be a dequeue and the task won't start running, but
> that's weird enough that I don't think we need to consider it.

Right. For the record, scx_rustland_core does something similar: from
ops.dispatch() it consumes a task from a BPF user ring buffer, inserts it into a
user DSQ, and then consumes the first task from that user DSQ via
scx_bpf_dsq_move_to_local().

That's a bit of a special use case, due to the unusual user-space scheduling
part, but the pseudocode is still accurate there, since
scx_bpf_dsq_move_to_local() triggers an ops.dequeue().

>
> You could also have a property change racing with the dispatch which would make
> the dispatch fail and not be followed by a dequeue, but again, we need to draw
> the line somewhere.

Yeah, the thing is that sched_change() introduces a lot of different edge cases
that are difficult to represent in the pseudocode. I guess the best we can do is
describe in prose the concept of "BPF scheduler's custody" and the fact that a
task can temporarily leave that custody when a sched_change() event happens.

>
> So, in other words, any _successful_ dispatch to a _terminal_ DSQ is always
> followed by a dequeue.
>
> Another case that isn't handled here is direct dispatch to a terminal DSQ from
> ops.enqueue(), where we don't get ops.dispatch() or ops.dequeue() and go
> straight to ops.running(). If any of the above cases should be handled in the
> pseudocode, I'd say it's this one.

Right, in fact it should be: any _successful_ dispatch to a _terminal DSQ_ is
followed by ops.dequeue(), if the task was in the BPF scheduler's custody. A
direct dispatch to a terminal DSQ, either from ops.select_cpu() or from
ops.enqueue(), doesn't trigger ops.dequeue(), because the task never enters the
BPF scheduler's custody.
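
To make the custody rule above a bit more concrete, here is a minimal
user-space C sketch. It is only a toy model, not kernel or BPF code: the names
task_model, dsq_kind, model_enqueue(), model_move() and model_sched_change() are
hypothetical and exist only for this illustration. It also assumes that a
property change on a task that is not in custody produces no ops.dequeue().

```c
#include <stdbool.h>

/* Hypothetical DSQ classes for this model; these are not kernel identifiers. */
enum dsq_kind { DSQ_NONE, DSQ_USER, DSQ_LOCAL, DSQ_GLOBAL };

struct task_model {
	bool in_custody;	/* task is held by the BPF scheduler */
	enum dsq_kind dsq;	/* where the task currently sits */
	int dequeue_calls;	/* times ops.dequeue() would have fired */
};

/* Local and global DSQs are treated as terminal in this model. */
bool dsq_is_terminal(enum dsq_kind kind)
{
	return kind == DSQ_LOCAL || kind == DSQ_GLOBAL;
}

/*
 * Models an insertion done from ops.select_cpu() or ops.enqueue().
 * A direct dispatch to a terminal DSQ never enters custody; anything
 * else (user DSQ, or DSQ_NONE for a BPF-side structure) does.
 */
void model_enqueue(struct task_model *t, enum dsq_kind target)
{
	t->dsq = target;
	t->in_custody = !dsq_is_terminal(target);
}

/*
 * Models a move done from ops.dispatch(). "success" is false when the
 * move loses a race (e.g. with a property change); nothing happens then.
 * Only a successful move to a terminal DSQ of a task in custody counts
 * as an ops.dequeue().
 */
bool model_move(struct task_model *t, enum dsq_kind target, bool success)
{
	if (!success)
		return false;

	t->dsq = target;
	if (t->in_custody && dsq_is_terminal(target)) {
		t->dequeue_calls++;
		t->in_custody = false;
	}
	return true;
}

/*
 * Models a sched_change() event: a task in custody leaves it, which
 * counts as an ops.dequeue(). Assumption of this sketch: a task outside
 * custody gets no ops.dequeue() here.
 */
void model_sched_change(struct task_model *t)
{
	if (t->in_custody) {
		t->dequeue_calls++;
		t->in_custody = false;
	}
	t->dsq = DSQ_NONE;
}
```

With this model, model_enqueue(&t, DSQ_LOCAL) leaves dequeue_calls untouched,
while model_enqueue(&t, DSQ_USER) followed by a successful
model_move(&t, DSQ_LOCAL, true) bumps it by one, which matches the
scx_rustland_core flow mentioned earlier.
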
Thanks,
-Andrea