From: Kuba Piecuch <jpiecuch@google.com>
To: Andrea Righi <arighi@nvidia.com>, Kuba Piecuch <jpiecuch@google.com>
Cc: Tejun Heo <tj@kernel.org>, David Vernet <void@manifault.com>,
Changwoo Min <changwoo@igalia.com>,
Christian Loehle <christian.loehle@arm.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
<sched-ext@lists.linux.dev>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
Date: Wed, 08 Apr 2026 14:17:03 +0000
Message-ID: <DHNU5YSWABWC.2B8Q38OKJGVDR@google.com>
In-Reply-To: <adZc8caEfOZw8TLE@gpd4>
On Wed Apr 8, 2026 at 1:49 PM UTC, Andrea Righi wrote:
> On Wed, Apr 08, 2026 at 12:40:09PM +0000, Kuba Piecuch wrote:
>> Hi Andrea,
>>
>> On Wed Apr 8, 2026 at 11:28 AM UTC, Andrea Righi wrote:
>> ...
>> >
>> > Looks good, but I noticed another issue, should we also change the condition up
>> > above as following?
>> >
>> > Documentation/scheduler/sched-ext.rst | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
>> > index 29d36e248f58b..99df4cc982375 100644
>> > --- a/Documentation/scheduler/sched-ext.rst
>> > +++ b/Documentation/scheduler/sched-ext.rst
>> > @@ -423,7 +423,7 @@ by a sched_ext scheduler:
>> > ops.runnable(); /* Task becomes ready to run */
>> >
>> > while (task_is_runnable(task)) {
>> > - if (task is not in a DSQ && task->scx.slice == 0) {
>> > + if (task is not in a DSQ || task->scx.slice == 0) {
>> > ops.enqueue(); /* Task can be added to a DSQ */
>> >
>> > /* Task property change (i.e., affinity, nice, etc.)? */
>> >
>> > Because we trigger ops.enqueue() when the task expired its time slice or it
>> > becomes runnable and has not been added to a DSQ.
>> >
>> > This also represents correctly the sched_change() scenario: a task being
>> > re-enqueued after sched_change() still has its time slice > 0, but we need to
>> > call ops.enqueue() for it.
>>
>> I agree that the condition should be changed, but I'm not sure that this is
>> what it should look like.
>>
>> Is the "task is not in a DSQ" part of the condition there to handle direct
>> dispatch? Apart from direct dispatch from ops.select_cpu(), I wasn't able to
>> come up with a situation where we would reach this condition with the task
>> present on some DSQ.
>
> The intent is to represent the direct dispatch from ops.select_cpu(), since in
> that case ops.enqueue() is skipped.
>
> Honestly I think if we change the && to || in that condition, everything should
> be pretty accurate.

In the case of direct dispatch from ops.select_cpu(), we don't invoke
ops.dispatch() and ops.dequeue() before ops.running(), right? The current
pseudocode calls them unconditionally.
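
A minimal sketch of that point, as a hypothetical C model rather than
kernel code: struct model_task, model_wakeup() and the needs_enqueue_*()
helpers are made-up names, in_dsq stands in for "task is in a DSQ" and
slice for task->scx.slice. It assumes the callback order the current
pseudocode uses (enqueue, dispatch, dequeue, running) and moves
ops.dispatch() and ops.dequeue() into the same branch as ops.enqueue(), so
a task direct-dispatched from ops.select_cpu() goes straight from
ops.runnable() to ops.running().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a task; not struct task_struct. */
struct model_task {
	bool in_dsq; /* already inserted into a DSQ, e.g. by direct dispatch */
	int slice;   /* stand-in for task->scx.slice */
};

/* Condition as currently written in the pseudocode. */
static bool needs_enqueue_and(const struct model_task *t)
{
	return !t->in_dsq && t->slice == 0;
}

/* Condition with && changed to ||, as proposed in this thread. */
static bool needs_enqueue_or(const struct model_task *t)
{
	return !t->in_dsq || t->slice == 0;
}

/* Append a callback name to a space-separated trace in buf. */
static void trace(char *buf, size_t len, const char *op)
{
	if (buf[0] != '\0')
		strncat(buf, " ", len - strlen(buf) - 1);
	strncat(buf, op, len - strlen(buf) - 1);
}

/*
 * Record the callbacks for one wakeup. If direct_dispatch is true, model
 * ops.select_cpu() inserting the task into a DSQ with a fresh slice.
 */
static void model_wakeup(struct model_task *t, bool direct_dispatch,
			 char *buf, size_t len)
{
	buf[0] = '\0';
	trace(buf, len, "select_cpu");
	if (direct_dispatch) {
		t->in_dsq = true;
		t->slice = 20; /* arbitrary non-zero slice */
	}
	trace(buf, len, "runnable");
	if (needs_enqueue_or(t)) {
		/* only the non-direct-dispatch path goes through these */
		trace(buf, len, "enqueue");
		trace(buf, len, "dispatch");
		trace(buf, len, "dequeue");
	}
	trace(buf, len, "running");
}
```

A task that is not in a DSQ but still has slice left (the sched_change()
re-enqueue case from Andrea's mail) is where the two conditions disagree:
needs_enqueue_and() skips ops.enqueue(), needs_enqueue_or() does not.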

Another inaccuracy, unrelated to direct dispatch: property changes can occur
while a task is running, whereas the pseudocode only allows for them while
the task is queued.
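
A hypothetical sketch of what covering both states could look like.
enum model_state and model_property_change() are illustrative names, and
the callback sequence is an assumption to be checked against
kernel/sched/ext.c: a queued task is assumed to leave its DSQ via
ops.dequeue(), a running task is assumed to get ops.stopping() first, and
anything that happens between the re-enqueue and ops.running() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical phases of the pseudocode's inner loop. */
enum model_state { MODEL_QUEUED, MODEL_RUNNING };

/* Append a callback name to a space-separated trace in buf. */
static void trace(char *buf, size_t len, const char *op)
{
	if (buf[0] != '\0')
		strncat(buf, " ", len - strlen(buf) - 1);
	strncat(buf, op, len - strlen(buf) - 1);
}

/*
 * Record the callbacks for a property change (affinity, nice, ...).
 * ASSUMPTION: queued tasks start with ops.dequeue(), running tasks with
 * ops.stopping(); both are then re-enqueued.
 */
static void model_property_change(enum model_state s, char *buf, size_t len)
{
	buf[0] = '\0';
	trace(buf, len, s == MODEL_RUNNING ? "stopping" : "dequeue");
	trace(buf, len, "quiescent");
	trace(buf, len, "runnable");
	trace(buf, len, "enqueue");
	if (s == MODEL_RUNNING)
		trace(buf, len, "running"); /* task resumes on its CPU */
}
```

The point is structural: the property-change branch needs to appear in
both the queued and the running part of the loop, not only the former.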

There's also preemption by a higher-priority sched class, which the loop
condition (task_is_runnable(task) && task->scx.slice > 0) doesn't cover.
It would only do so if task_is_runnable() returned false whenever a
higher-priority sched class has runnable tasks on the CPU, which
contradicts the actual implementation of task_is_runnable() in
include/linux/sched.h.
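
To make the preemption case concrete, a hypothetical C model: struct
model_task, struct model_rq and both helpers are made up, on_rq is only a
rough stand-in for what task_is_runnable() checks, and slice stands in for
task->scx.slice. With, say, an RT task runnable on the CPU, the
pseudocode's condition still evaluates to true even though the sched_ext
task has to stop running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state; not struct rq. */
struct model_rq {
	bool higher_class_runnable; /* e.g. an RT task became runnable here */
};

/* Hypothetical task state; not struct task_struct. */
struct model_task {
	bool on_rq; /* rough stand-in for what task_is_runnable() checks */
	int slice;  /* stand-in for task->scx.slice */
};

/* The pseudocode's loop condition: only the task's own state matters. */
static bool pseudo_keeps_running(const struct model_task *p)
{
	return p->on_rq && p->slice > 0;
}

/*
 * Adds the exit the condition above cannot express: a higher-priority
 * sched class takes the CPU while the task stays runnable with slice left.
 */
static bool model_keeps_running(const struct model_task *p,
				const struct model_rq *rq)
{
	return pseudo_keeps_running(p) && !rq->higher_class_runnable;
}
```

Note that the task stays runnable throughout; only the CPU is taken away,
which is why the check cannot live inside task_is_runnable() itself.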
>
>>
>> A more general comment about the pseudocode: I think it can be useful to
>> introduce someone new to the general flow of the callbacks in sched_ext,
>> but the documentation should be clear that this is a simplified view that
>> makes assumptions about the behavior of the BPF scheduler itself (flags like
>> SCX_OPS_ENQ_LAST, whether the scheduler uses direct dispatch), as well as
>> the overall system (Can sched_ext be preempted by a higher-priority sched
>> class? Can scheduling properties of a task be changed while it's running?)
>> Without stating these assumptions clearly, we risk leaving the reader falsely
>> believing they have a complete understanding.
>
> Of course this schema is not a complete representation of the entire sched_ext
> state machine, if we put everything it'd become too big and complex. I think we
> should just cover the most common use cases here. Maybe we can clarify this in
> the description before this diagram.

Let's agree on which inaccuracies need to be fixed, and I'll send a v2 with
the fixes and an appropriate disclaimer attached to the pseudocode.