* [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle
@ 2026-04-06 11:47 Andrea Righi
2026-04-06 14:49 ` Emil Tsalapatis
` (2 more replies)
0 siblings, 3 replies; 12+ messages in thread
From: Andrea Righi @ 2026-04-06 11:47 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Changwoo Min
Cc: Christian Loehle, Kuba Piecuch, Emil Tsalapatis, sched-ext,
linux-kernel
Document ops.dequeue() in the sched_ext task lifecycle now that its
semantics are well-defined.
Also update the pseudo-code to use task_is_runnable() consistently and
clarify the case where ops.dispatch() does not refill the time slice.
Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics")
Signed-off-by: Andrea Righi <arighi@nvidia.com>
---
Documentation/scheduler/sched-ext.rst | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 404b4e4c33f7e..9f03650abfeba 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -422,23 +422,29 @@ by a sched_ext scheduler:

         ops.runnable();         /* Task becomes ready to run */

-        while (task is runnable) {
+        while (task_is_runnable(task)) {
             if (task is not in a DSQ && task->scx.slice == 0) {
                 ops.enqueue();  /* Task can be added to a DSQ */

-                /* Any usable CPU becomes available */
+                /* Task property change (i.e., affinity, nice, etc.)? */
+                if (sched_change(task)) {
+                    ops.dequeue(); /* Exiting BPF scheduler custody */
+                    continue;
+                }
+            }

-                ops.dispatch(); /* Task is moved to a local DSQ */
+            /* Any usable CPU becomes available */
+
+            ops.dispatch();     /* Task is moved to a local DSQ */
+            ops.dequeue();      /* Exiting BPF scheduler custody */

-                ops.dequeue();  /* Exiting BPF scheduler */
-            }
             ops.running();      /* Task starts running on its assigned CPU */

-            while task_is_runnable(p) {
-                while (task->scx.slice > 0 && task_is_runnable(p))
-                    ops.tick(); /* Called every 1/HZ seconds */
+            while (task_is_runnable(task) && task->scx.slice > 0) {
+                ops.tick();     /* Called every 1/HZ seconds */

-                ops.dispatch(); /* task->scx.slice can be refilled */
+                if (task->scx.slice == 0)
+                    ops.dispatch(); /* task->scx.slice can be refilled */
             }

             ops.stopping();     /* Task stops running (time slice expires or wait) */
--
2.53.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle
2026-04-06 11:47 [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle Andrea Righi
@ 2026-04-06 14:49 ` Emil Tsalapatis
2026-04-06 19:08 ` Andrea Righi
2026-04-06 18:09 ` Tejun Heo
2026-04-07 9:54 ` Kuba Piecuch
2 siblings, 1 reply; 12+ messages in thread
From: Emil Tsalapatis @ 2026-04-06 14:49 UTC (permalink / raw)
To: Andrea Righi, Tejun Heo, David Vernet, Changwoo Min
Cc: Christian Loehle, Kuba Piecuch, Emil Tsalapatis, sched-ext,
linux-kernel

On Mon Apr 6, 2026 at 7:47 AM EDT, Andrea Righi wrote:
> Document ops.dequeue() in the sched_ext task lifecycle now that its
> semantics are well-defined.
>
> Also update the pseudo-code to use task_is_runnable() consistently and
> clarify the case where ops.dispatch() does not refill the time slice.
>
> Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics")

Is the Fixes: tag appropriate here? It's not like the original patch
introduced a bug by fixing ops.dequeue().

Otherwise the state machine looks fine to me!

Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>

> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
> Documentation/scheduler/sched-ext.rst | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index 404b4e4c33f7e..9f03650abfeba 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -422,23 +422,29 @@ by a sched_ext scheduler:
>
>          ops.runnable();         /* Task becomes ready to run */
>
> -        while (task is runnable) {
> +        while (task_is_runnable(task)) {
>              if (task is not in a DSQ && task->scx.slice == 0) {
>                  ops.enqueue();  /* Task can be added to a DSQ */
>
> -                /* Any usable CPU becomes available */
> +                /* Task property change (i.e., affinity, nice, etc.)? */
> +                if (sched_change(task)) {
> +                    ops.dequeue(); /* Exiting BPF scheduler custody */
> +                    continue;
> +                }
> +            }
>
> -                ops.dispatch(); /* Task is moved to a local DSQ */
> +            /* Any usable CPU becomes available */
> +
> +            ops.dispatch();     /* Task is moved to a local DSQ */
> +            ops.dequeue();      /* Exiting BPF scheduler custody */
>
> -                ops.dequeue();  /* Exiting BPF scheduler */
> -            }
>              ops.running();      /* Task starts running on its assigned CPU */
>
> -            while task_is_runnable(p) {
> -                while (task->scx.slice > 0 && task_is_runnable(p))
> -                    ops.tick(); /* Called every 1/HZ seconds */
> +            while (task_is_runnable(task) && task->scx.slice > 0) {
> +                ops.tick();     /* Called every 1/HZ seconds */
>
> -                ops.dispatch(); /* task->scx.slice can be refilled */
> +                if (task->scx.slice == 0)
> +                    ops.dispatch(); /* task->scx.slice can be refilled */
>              }
>
>              ops.stopping();     /* Task stops running (time slice expires or wait) */
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle 2026-04-06 14:49 ` Emil Tsalapatis @ 2026-04-06 19:08 ` Andrea Righi 0 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2026-04-06 19:08 UTC (permalink / raw) To: Emil Tsalapatis Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle, Kuba Piecuch, sched-ext, linux-kernel Hi Emil, On Mon, Apr 06, 2026 at 10:49:18AM -0400, Emil Tsalapatis wrote: > On Mon Apr 6, 2026 at 7:47 AM EDT, Andrea Righi wrote: > > Document ops.dequeue() in the sched_ext task lifecycle now that its > > semantics are well-defined. > > > > Also update the pseudo-code to use task_is_runnable() consistently and > > clarify the case where ops.dispatch() does not refill the time slice. > > > > Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics") > > Is the Fixes: tag appropriate here? It's not like the original patch > introduced a bug by fixing ops.dequeue(). Yeah, the intent here was to make sure this commit isn't applied without ebf1ccff79c4 (otherwise the state machine would be inaccurate), but that shouldn't happen, so it's probably reasonable to drop the Fixes line. Thanks, -Andrea > > Otherwise the state machine looks fine to me! 
> > Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> > > > Signed-off-by: Andrea Righi <arighi@nvidia.com> > > --- > > Documentation/scheduler/sched-ext.rst | 24 +++++++++++++++--------- > > 1 file changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst > > index 404b4e4c33f7e..9f03650abfeba 100644 > > --- a/Documentation/scheduler/sched-ext.rst > > +++ b/Documentation/scheduler/sched-ext.rst > > @@ -422,23 +422,29 @@ by a sched_ext scheduler: > > > > ops.runnable(); /* Task becomes ready to run */ > > > > - while (task is runnable) { > > + while (task_is_runnable(task)) { > > if (task is not in a DSQ && task->scx.slice == 0) { > > ops.enqueue(); /* Task can be added to a DSQ */ > > > > - /* Any usable CPU becomes available */ > > + /* Task property change (i.e., affinity, nice, etc.)? */ > > + if (sched_change(task)) { > > + ops.dequeue(); /* Exiting BPF scheduler custody */ > > + continue; > > + } > > + } > > > > - ops.dispatch(); /* Task is moved to a local DSQ */ > > + /* Any usable CPU becomes available */ > > + > > + ops.dispatch(); /* Task is moved to a local DSQ */ > > + ops.dequeue(); /* Exiting BPF scheduler custody */ > > > > - ops.dequeue(); /* Exiting BPF scheduler */ > > - } > > ops.running(); /* Task starts running on its assigned CPU */ > > > > - while task_is_runnable(p) { > > - while (task->scx.slice > 0 && task_is_runnable(p)) > > - ops.tick(); /* Called every 1/HZ seconds */ > > + while (task_is_runnable(task) && task->scx.slice > 0) { > > + ops.tick(); /* Called every 1/HZ seconds */ > > > > - ops.dispatch(); /* task->scx.slice can be refilled */ > > + if (task->scx.slice == 0) > > + ops.dispatch(); /* task->scx.slice can be refilled */ > > } > > > > ops.stopping(); /* Task stops running (time slice expires or wait) */ > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle
2026-04-06 11:47 [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle Andrea Righi
2026-04-06 14:49 ` Emil Tsalapatis
@ 2026-04-06 18:09 ` Tejun Heo
2026-04-07 9:54 ` Kuba Piecuch
2 siblings, 0 replies; 12+ messages in thread
From: Tejun Heo @ 2026-04-06 18:09 UTC (permalink / raw)
To: Andrea Righi, David Vernet, Changwoo Min
Cc: Christian Loehle, Kuba Piecuch, Emil Tsalapatis, sched-ext,
linux-kernel

> sched_ext: Documentation: Add ops.dequeue() to task lifecycle

Applied to sched_ext/for-7.1 with Emil's Reviewed-by added and the Fixes: tag
dropped per his comment.

Thanks.

--
tejun
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle
2026-04-06 11:47 [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle Andrea Righi
2026-04-06 14:49 ` Emil Tsalapatis
2026-04-06 18:09 ` Tejun Heo
@ 2026-04-07 9:54 ` Kuba Piecuch
2026-04-07 16:31 ` Andrea Righi
2 siblings, 1 reply; 12+ messages in thread
From: Kuba Piecuch @ 2026-04-07 9:54 UTC (permalink / raw)
To: Andrea Righi, Tejun Heo, David Vernet, Changwoo Min
Cc: Christian Loehle, Kuba Piecuch, Emil Tsalapatis, sched-ext,
linux-kernel

Hi Andrea,

On Mon Apr 6, 2026 at 11:47 AM UTC, Andrea Righi wrote:
> Document ops.dequeue() in the sched_ext task lifecycle now that its
> semantics are well-defined.
>
> Also update the pseudo-code to use task_is_runnable() consistently and
> clarify the case where ops.dispatch() does not refill the time slice.
>
> Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics")
> Signed-off-by: Andrea Righi <arighi@nvidia.com>
> ---
> Documentation/scheduler/sched-ext.rst | 24 +++++++++++++++---------
> 1 file changed, 15 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index 404b4e4c33f7e..9f03650abfeba 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -422,23 +422,29 @@ by a sched_ext scheduler:
>
>          ops.runnable();         /* Task becomes ready to run */
>
> -        while (task is runnable) {
> +        while (task_is_runnable(task)) {
>              if (task is not in a DSQ && task->scx.slice == 0) {
>                  ops.enqueue();  /* Task can be added to a DSQ */
>
> -                /* Any usable CPU becomes available */
> +                /* Task property change (i.e., affinity, nice, etc.)? */
> +                if (sched_change(task)) {
> +                    ops.dequeue(); /* Exiting BPF scheduler custody */

Doesn't the task also go through quiescent -> runnable here? The full path
being dequeue -> quiescent -> (actual property change) -> runnable -> enqueue.

I guess we should be accurate here since quiescent and runnable are present
elsewhere in the pseudocode.

> +                    continue;
> +                }
> +            }
>
> -                ops.dispatch(); /* Task is moved to a local DSQ */
> +            /* Any usable CPU becomes available */
> +
> +            ops.dispatch();     /* Task is moved to a local DSQ */

s/local/terminal/?

> +            ops.dequeue();      /* Exiting BPF scheduler custody */
>
> -                ops.dequeue();  /* Exiting BPF scheduler */
> -            }
>              ops.running();      /* Task starts running on its assigned CPU */
>
> -            while task_is_runnable(p) {
> -                while (task->scx.slice > 0 && task_is_runnable(p))
> -                    ops.tick(); /* Called every 1/HZ seconds */
> +            while (task_is_runnable(task) && task->scx.slice > 0) {
> +                ops.tick();     /* Called every 1/HZ seconds */
>
> -                ops.dispatch(); /* task->scx.slice can be refilled */
> +                if (task->scx.slice == 0)
> +                    ops.dispatch(); /* task->scx.slice can be refilled */
>              }
>
>              ops.stopping();     /* Task stops running (time slice expires or wait) */

Thanks,
Kuba
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle 2026-04-07 9:54 ` Kuba Piecuch @ 2026-04-07 16:31 ` Andrea Righi 2026-04-08 9:18 ` [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() Kuba Piecuch 0 siblings, 1 reply; 12+ messages in thread From: Andrea Righi @ 2026-04-07 16:31 UTC (permalink / raw) To: Kuba Piecuch Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel Hi Kuba, On Tue, Apr 07, 2026 at 09:54:22AM +0000, Kuba Piecuch wrote: > Hi Andrea, > > On Mon Apr 6, 2026 at 11:47 AM UTC, Andrea Righi wrote: > > Document ops.dequeue() in the sched_ext task lifecycle now that its > > semantics are well-defined. > > > > Also update the pseudo-code to use task_is_runnable() consistently and > > clarify the case where ops.dispatch() does not refill the time slice. > > > > Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics") > > Signed-off-by: Andrea Righi <arighi@nvidia.com> > > --- > > Documentation/scheduler/sched-ext.rst | 24 +++++++++++++++--------- > > 1 file changed, 15 insertions(+), 9 deletions(-) > > > > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst > > index 404b4e4c33f7e..9f03650abfeba 100644 > > --- a/Documentation/scheduler/sched-ext.rst > > +++ b/Documentation/scheduler/sched-ext.rst > > @@ -422,23 +422,29 @@ by a sched_ext scheduler: > > > > ops.runnable(); /* Task becomes ready to run */ > > > > - while (task is runnable) { > > + while (task_is_runnable(task)) { > > if (task is not in a DSQ && task->scx.slice == 0) { > > ops.enqueue(); /* Task can be added to a DSQ */ > > > > - /* Any usable CPU becomes available */ > > + /* Task property change (i.e., affinity, nice, etc.)? */ > > + if (sched_change(task)) { > > + ops.dequeue(); /* Exiting BPF scheduler custody */ > > Doesn't the task also go through quiescent -> runnable here? 
The full path > being dequeue -> quiescent -> (actual property change) -> runnable -> enqueue. > > I guess we should be accurate here since quiescent and runnable are present > elsewhere in the pseudocode. Ah yes, we need to add ops.quiescent() and ops.runnable() here. Tejun already applied this patch to his branch, can you send another patch on top of this? > > > + continue; > > + } > > + } > > > > - ops.dispatch(); /* Task is moved to a local DSQ */ > > + /* Any usable CPU becomes available */ > > + > > + ops.dispatch(); /* Task is moved to a local DSQ */ > > s/local/terminal/? Technically it'd be correct to say "terminal", but typically we use scx_bpf_move_to_local() here, which moves the task to a local DSQ. Then it may fallback into SCX_DSQ_GLOBAL if something goes wrong, but, from a logical perspective, the intention is to move the task to local DSQ at this point. So, I'm not sure if saying "terminal" here would be more confusing than helpful... but I don't have a strong opinion on that. Thanks, -Andrea > > > + ops.dequeue(); /* Exiting BPF scheduler custody */ > > > > - ops.dequeue(); /* Exiting BPF scheduler */ > > - } > > ops.running(); /* Task starts running on its assigned CPU */ > > > > - while task_is_runnable(p) { > > - while (task->scx.slice > 0 && task_is_runnable(p)) > > - ops.tick(); /* Called every 1/HZ seconds */ > > + while (task_is_runnable(task) && task->scx.slice > 0) { > > + ops.tick(); /* Called every 1/HZ seconds */ > > > > - ops.dispatch(); /* task->scx.slice can be refilled */ > > + if (task->scx.slice == 0) > > + ops.dispatch(); /* task->scx.slice can be refilled */ > > } > > > > ops.stopping(); /* Task stops running (time slice expires or wait) */ > > Thanks, > Kuba ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
2026-04-07 16:31 ` Andrea Righi
@ 2026-04-08 9:18 ` Kuba Piecuch
2026-04-08 11:28 ` Andrea Righi
0 siblings, 1 reply; 12+ messages in thread
From: Kuba Piecuch @ 2026-04-08 9:18 UTC (permalink / raw)
To: Tejun Heo, Andrea Righi
Cc: David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis,
sched-ext, linux-kernel, Kuba Piecuch

When a queued task has one of its scheduling properties changed
(e.g. nice, affinity), it goes through dequeue() -> quiescent() ->
(property change callback, e.g. ops.set_weight()) -> runnable() ->
enqueue().

The existing documentation only mentions dequeue() and enqueue() on that
path, so add the missing callbacks.

Fixes: a4f61f0a1afd ("sched_ext: Documentation: Add ops.dequeue() to task lifecycle")
Signed-off-by: Kuba Piecuch <jpiecuch@google.com>
---
Documentation/scheduler/sched-ext.rst | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index ec594ae8086de..b5c70f4cfc352 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -429,6 +429,11 @@ by a sched_ext scheduler:
                 /* Task property change (i.e., affinity, nice, etc.)? */
                 if (sched_change(task)) {
                     ops.dequeue(); /* Exiting BPF scheduler custody */
+                    ops.quiescent();
+
+                    /* Property change callback, e.g. ops.set_weight() */
+
+                    ops.runnable();
                     continue;
                 }
             }
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 12+ messages in thread
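The callback order described in the commit message above can be checked
with a small user-space sketch. This is only an illustration, not kernel
code: struct cb_trace, cb_record(), trace_sched_change() and
cb_trace_matches() are hypothetical helpers, and "set_weight" stands in for
whichever property change callback applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical trace recorder; not part of the sched_ext API. */
#define CB_TRACE_MAX 16

struct cb_trace {
	const char *calls[CB_TRACE_MAX];
	size_t len;
};

/* Append one callback name to the trace. */
static void cb_record(struct cb_trace *t, const char *name)
{
	assert(t->len < CB_TRACE_MAX);
	t->calls[t->len++] = name;
}

/*
 * Record the sequence the commit message gives for a property change on
 * a queued task. "set_weight" is one example of a property change
 * callback; the final "enqueue" is the re-enqueue after the change.
 */
static void trace_sched_change(struct cb_trace *t)
{
	cb_record(t, "dequeue");
	cb_record(t, "quiescent");
	cb_record(t, "set_weight");
	cb_record(t, "runnable");
	cb_record(t, "enqueue");
}

/* Return true if the trace holds exactly the n expected names, in order. */
static bool cb_trace_matches(const struct cb_trace *t,
			     const char *const *expected, size_t n)
{
	if (t->len != n)
		return false;
	for (size_t i = 0; i < n; i++)
		if (strcmp(t->calls[i], expected[i]) != 0)
			return false;
	return true;
}
```

Calling trace_sched_change() on an empty trace and comparing the result
against the five names from the commit message is enough to confirm that
quiescent() and runnable() bracket the property change callback.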
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
2026-04-08 9:18 ` [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() Kuba Piecuch
@ 2026-04-08 11:28 ` Andrea Righi
2026-04-08 12:40 ` Kuba Piecuch
0 siblings, 1 reply; 12+ messages in thread
From: Andrea Righi @ 2026-04-08 11:28 UTC (permalink / raw)
To: Kuba Piecuch
Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle,
Emil Tsalapatis, sched-ext, linux-kernel

Hi Kuba,

On Wed, Apr 08, 2026 at 09:18:21AM +0000, Kuba Piecuch wrote:
> When a queued task has one of its scheduling properties changed
> (e.g. nice, affinity), it goes through dequeue() -> quiescent() ->
> (property change callback, e.g. ops.set_weight()) -> runnable() ->
> enqueue().
>
> The existing documentation only mentions dequeue() and enqueue() on that
> path, so add the missing callbacks.
>
> Fixes: a4f61f0a1afd ("sched_ext: Documentation: Add ops.dequeue() to task lifecycle")
> Signed-off-by: Kuba Piecuch <jpiecuch@google.com>
> ---
> Documentation/scheduler/sched-ext.rst | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index ec594ae8086de..b5c70f4cfc352 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -429,6 +429,11 @@ by a sched_ext scheduler:

Looks good, but I noticed another issue, should we also change the condition up
above as following?

Documentation/scheduler/sched-ext.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
index 29d36e248f58b..99df4cc982375 100644
--- a/Documentation/scheduler/sched-ext.rst
+++ b/Documentation/scheduler/sched-ext.rst
@@ -423,7 +423,7 @@ by a sched_ext scheduler:
         ops.runnable();         /* Task becomes ready to run */

         while (task_is_runnable(task)) {
-            if (task is not in a DSQ && task->scx.slice == 0) {
+            if (task is not in a DSQ || task->scx.slice == 0) {
                 ops.enqueue();  /* Task can be added to a DSQ */

                 /* Task property change (i.e., affinity, nice, etc.)? */

Because we trigger ops.enqueue() when the task expired its time slice or it
becomes runnable and has not been added to a DSQ.

This also represents correctly the sched_change() scenario: a task being
re-enqueued after sched_change() still has its time slice > 0, but we need to
call ops.enqueue() for it.

Thanks,
-Andrea

>                  /* Task property change (i.e., affinity, nice, etc.)? */
>                  if (sched_change(task)) {
>                      ops.dequeue(); /* Exiting BPF scheduler custody */
> +                    ops.quiescent();
> +
> +                    /* Property change callback, e.g. ops.set_weight() */
> +
> +                    ops.runnable();
>                      continue;
>                  }
>              }
> --
> 2.53.0.1213.gd9a14994de-goog
>
^ permalink raw reply related [flat|nested] 12+ messages in thread
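The difference between the two forms of the condition can be made concrete
with a minimal user-space sketch. It models only the two inputs the
pseudo-code reads; struct task_model and the helper names are hypothetical
and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two inputs the pseudo-code condition reads. */
struct task_model {
	bool in_dsq;              /* task already sits on some DSQ */
	unsigned long long slice; /* remaining time slice */
};

/* Condition before the proposed change: both parts must hold. */
static bool enqueue_with_and(const struct task_model *t)
{
	return !t->in_dsq && t->slice == 0;
}

/* Condition after the proposed change: either part is enough. */
static bool enqueue_with_or(const struct task_model *t)
{
	return !t->in_dsq || t->slice == 0;
}

/* Walk through the scenarios named in the message above. */
static void check_enqueue_scenarios(void)
{
	/* Re-enqueue after sched_change(): not on a DSQ, slice still > 0. */
	struct task_model requeued = { .in_dsq = false, .slice = 5 };
	/* Direct dispatch from ops.select_cpu(): already on a DSQ. */
	struct task_model direct = { .in_dsq = true, .slice = 5 };
	/* Slice expired and task not on a DSQ. */
	struct task_model expired = { .in_dsq = false, .slice = 0 };

	assert(!enqueue_with_and(&requeued)); /* && misses the re-enqueue */
	assert(enqueue_with_or(&requeued));   /* || calls ops.enqueue() */

	assert(!enqueue_with_and(&direct));   /* both forms skip enqueue */
	assert(!enqueue_with_or(&direct));

	assert(enqueue_with_and(&expired));   /* both forms enqueue */
	assert(enqueue_with_or(&expired));
}
```

Under this model the only scenario where the two forms disagree is the
re-enqueue after sched_change(), which is the case the proposed `||` is
meant to cover.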
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
2026-04-08 11:28 ` Andrea Righi
@ 2026-04-08 12:40 ` Kuba Piecuch
2026-04-08 13:49 ` Andrea Righi
0 siblings, 1 reply; 12+ messages in thread
From: Kuba Piecuch @ 2026-04-08 12:40 UTC (permalink / raw)
To: Andrea Righi, Kuba Piecuch
Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle,
Emil Tsalapatis, sched-ext, linux-kernel

Hi Andrea,

On Wed Apr 8, 2026 at 11:28 AM UTC, Andrea Righi wrote:
...
>
> Looks good, but I noticed another issue, should we also change the condition up
> above as following?
>
> Documentation/scheduler/sched-ext.rst | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
> index 29d36e248f58b..99df4cc982375 100644
> --- a/Documentation/scheduler/sched-ext.rst
> +++ b/Documentation/scheduler/sched-ext.rst
> @@ -423,7 +423,7 @@ by a sched_ext scheduler:
>          ops.runnable();         /* Task becomes ready to run */
>
>          while (task_is_runnable(task)) {
> -            if (task is not in a DSQ && task->scx.slice == 0) {
> +            if (task is not in a DSQ || task->scx.slice == 0) {
>                  ops.enqueue();  /* Task can be added to a DSQ */
>
>                  /* Task property change (i.e., affinity, nice, etc.)? */
>
> Because we trigger ops.enqueue() when the task expired its time slice or it
> becomes runnable and has not been added to a DSQ.
>
> This also represents correctly the sched_change() scenario: a task being
> re-enqueued after sched_change() still has its time slice > 0, but we need to
> call ops.enqueue() for it.

I agree that the condition should be changed, but I'm not sure that this is
what it should look like.

Is the "task is not in a DSQ" part of the condition there to handle direct
dispatch? Apart from direct dispatch from ops.select_cpu(), I wasn't able to
come up with a situation where we would reach this condition with the task
present on some DSQ.

A more general comment about the pseudocode: I think it can be useful to
introduce someone new to the general flow of the callbacks in sched_ext,
but the documentation should be clear that this is a simplified view that
makes assumptions about the behavior of the BPF scheduler itself (flags like
SCX_OPS_ENQ_LAST, whether the scheduler uses direct dispatch), as well as
the overall system (Can sched_ext be preempted by a higher-priority sched
class? Can scheduling properties of a task be changed while it's running?)
Without stating these assumptions clearly, we risk leaving the reader falsely
believing they have a complete understanding.

Thanks,
Kuba
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() 2026-04-08 12:40 ` Kuba Piecuch @ 2026-04-08 13:49 ` Andrea Righi 2026-04-08 14:17 ` Kuba Piecuch 0 siblings, 1 reply; 12+ messages in thread From: Andrea Righi @ 2026-04-08 13:49 UTC (permalink / raw) To: Kuba Piecuch Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel On Wed, Apr 08, 2026 at 12:40:09PM +0000, Kuba Piecuch wrote: > Hi Andrea, > > On Wed Apr 8, 2026 at 11:28 AM UTC, Andrea Righi wrote: > ... > > > > Looks good, but I noticed another issue, should we also change the condition up > > above as following? > > > > Documentation/scheduler/sched-ext.rst | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst > > index 29d36e248f58b..99df4cc982375 100644 > > --- a/Documentation/scheduler/sched-ext.rst > > +++ b/Documentation/scheduler/sched-ext.rst > > @@ -423,7 +423,7 @@ by a sched_ext scheduler: > > ops.runnable(); /* Task becomes ready to run */ > > > > while (task_is_runnable(task)) { > > - if (task is not in a DSQ && task->scx.slice == 0) { > > + if (task is not in a DSQ || task->scx.slice == 0) { > > ops.enqueue(); /* Task can be added to a DSQ */ > > > > /* Task property change (i.e., affinity, nice, etc.)? */ > > > > Because we trigger ops.enqueue() when the task expired its time slice or it > > becomes runnable and has not been added to a DSQ. > > > > This also represents correctly the sched_change() scenario: a task being > > re-enqueued after sched_change() still has its time slice > 0, but we need to > > call ops.enqueue() for it. > > I agree that the condition should be changed, but I'm not sure that this is > what it should look like. > > Is the "task is not in a DSQ" part of the condition there to handle direct > dispatch? 
Apart from direct dispatch from ops.select_cpu(), I wasn't able to > come up with a situation where we would reach this condition with the task > present on some DSQ. The intent is to represent the direct dispatch from ops.select_cpu(), since in that case ops.enqueue() is skipped. Honestly I think if we change the && to || in that condition, everything should be pretty accurate. > > A more general comment about the pseudocode: I think it can be useful to > introduce someone new to the general flow of the callbacks in sched_ext, > but the documentation should be clear that this is a simplified view that > makes assumptions about the behavior of the BPF scheduler itself (flags like > SCX_OPS_ENQ_LAST, whether the scheduler uses direct dispatch), as well as > the overall system (Can sched_ext be preempted by a higher-priority sched > class? Can scheduling properties of a task be changed while it's running?) > Without stating these assumptions clearly, we risk leaving the reader falsely > believing they have a complete understanding. Of course this schema is not a complete representation of the entire sched_ext state machine, if we put everything it'd become too big and complex. I think we should just cover the most common use cases here. Maybe we can clarify this in the description before this diagram. Thanks, -Andrea ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable()
2026-04-08 13:49 ` Andrea Righi
@ 2026-04-08 14:17 ` Kuba Piecuch
2026-04-08 14:54 ` Andrea Righi
0 siblings, 1 reply; 12+ messages in thread
From: Kuba Piecuch @ 2026-04-08 14:17 UTC (permalink / raw)
To: Andrea Righi, Kuba Piecuch
Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle,
Emil Tsalapatis, sched-ext, linux-kernel

On Wed Apr 8, 2026 at 1:49 PM UTC, Andrea Righi wrote:
> On Wed, Apr 08, 2026 at 12:40:09PM +0000, Kuba Piecuch wrote:
>> Hi Andrea,
>>
>> On Wed Apr 8, 2026 at 11:28 AM UTC, Andrea Righi wrote:
>> ...
>> >
>> > Looks good, but I noticed another issue, should we also change the condition up
>> > above as following?
>> >
>> > Documentation/scheduler/sched-ext.rst | 2 +-
>> > 1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst
>> > index 29d36e248f58b..99df4cc982375 100644
>> > --- a/Documentation/scheduler/sched-ext.rst
>> > +++ b/Documentation/scheduler/sched-ext.rst
>> > @@ -423,7 +423,7 @@ by a sched_ext scheduler:
>> >          ops.runnable();         /* Task becomes ready to run */
>> >
>> >          while (task_is_runnable(task)) {
>> > -            if (task is not in a DSQ && task->scx.slice == 0) {
>> > +            if (task is not in a DSQ || task->scx.slice == 0) {
>> >                  ops.enqueue();  /* Task can be added to a DSQ */
>> >
>> >                  /* Task property change (i.e., affinity, nice, etc.)? */
>> >
>> > Because we trigger ops.enqueue() when the task expired its time slice or it
>> > becomes runnable and has not been added to a DSQ.
>> >
>> > This also represents correctly the sched_change() scenario: a task being
>> > re-enqueued after sched_change() still has its time slice > 0, but we need to
>> > call ops.enqueue() for it.
>>
>> I agree that the condition should be changed, but I'm not sure that this is
>> what it should look like.
>>
>> Is the "task is not in a DSQ" part of the condition there to handle direct
>> dispatch? Apart from direct dispatch from ops.select_cpu(), I wasn't able to
>> come up with a situation where we would reach this condition with the task
>> present on some DSQ.
>
> The intent is to represent the direct dispatch from ops.select_cpu(), since in
> that case ops.enqueue() is skipped.
>
> Honestly I think if we change the && to || in that condition, everything should
> be pretty accurate.

In the case of direct dispatch from ops.select_cpu() we don't invoke
ops.dispatch() and ops.dequeue() before ops.running(), right? The current
pseudocode calls them unconditionally.

Another inaccuracy not related to direct dispatch: property changes can occur
while a task is running, while the pseudocode only allows for property changes
while a task is queued.

There's also preemption by a higher sched class, which is not covered in the
loop condition (task_is_runnable(task) && task->scx.slice > 0), unless we take
task_is_runnable() to return false if there's a higher-priority sched class
with runnable tasks on the CPU, though that would be in conflict with the
actual implementation of task_is_runnable() in include/linux/sched.h.

>
>>
>> A more general comment about the pseudocode: I think it can be useful to
>> introduce someone new to the general flow of the callbacks in sched_ext,
>> but the documentation should be clear that this is a simplified view that
>> makes assumptions about the behavior of the BPF scheduler itself (flags like
>> SCX_OPS_ENQ_LAST, whether the scheduler uses direct dispatch), as well as
>> the overall system (Can sched_ext be preempted by a higher-priority sched
>> class? Can scheduling properties of a task be changed while it's running?)
>> Without stating these assumptions clearly, we risk leaving the reader falsely
>> believing they have a complete understanding.
>
> Of course this schema is not a complete representation of the entire sched_ext
> state machine, if we put everything it'd become too big and complex. I think we
> should just cover the most common use cases here. Maybe we can clarify this in
> the description before this diagram.

Let's agree on what inaccuracies need to be fixed and I'll send a v2 with fixes
and attach an appropriate disclaimer to the pseudocode.
^ permalink raw reply [flat|nested] 12+ messages in thread
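The direct-dispatch point raised in this message can be pictured with a
short user-space sketch. It encodes only the reading discussed here (a task
dispatched directly from ops.select_cpu() reaches ops.running() without
ops.enqueue(), ops.dispatch() or ops.dequeue() in between) and is not a
statement about the kernel implementation. struct wake_path, wake_push(),
wakeup_path() and wake_path_contains() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical recorder for the callbacks seen on one wakeup. */
#define WAKE_PATH_MAX 8

struct wake_path {
	const char *calls[WAKE_PATH_MAX];
	size_t len;
};

/* Append one callback name to the path. */
static void wake_push(struct wake_path *p, const char *name)
{
	assert(p->len < WAKE_PATH_MAX);
	p->calls[p->len++] = name;
}

/*
 * Simplified wakeup-to-running path. With direct dispatch the task is
 * assumed to be placed on a DSQ from ops.select_cpu(), so the
 * enqueue/dispatch/dequeue steps are skipped.
 */
static void wakeup_path(struct wake_path *p, bool direct_dispatch)
{
	wake_push(p, "select_cpu");
	wake_push(p, "runnable");
	if (!direct_dispatch) {
		wake_push(p, "enqueue");
		wake_push(p, "dispatch");
		wake_push(p, "dequeue");
	}
	wake_push(p, "running");
}

/* Return true if the named callback appears anywhere in the path. */
static bool wake_path_contains(const struct wake_path *p, const char *name)
{
	for (size_t i = 0; i < p->len; i++)
		if (strcmp(p->calls[i], name) == 0)
			return true;
	return false;
}
```

In this model the two paths share select_cpu, runnable and running, and
differ only in the three steps guarded by the direct_dispatch flag, which is
the inaccuracy pointed out for pseudocode that calls ops.dispatch() and
ops.dequeue() unconditionally.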
* Re: [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() 2026-04-08 14:17 ` Kuba Piecuch @ 2026-04-08 14:54 ` Andrea Righi 0 siblings, 0 replies; 12+ messages in thread From: Andrea Righi @ 2026-04-08 14:54 UTC (permalink / raw) To: Kuba Piecuch Cc: Tejun Heo, David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis, sched-ext, linux-kernel Hi Kuba, On Wed, Apr 08, 2026 at 02:17:03PM +0000, Kuba Piecuch wrote: > On Wed Apr 8, 2026 at 1:49 PM UTC, Andrea Righi wrote: > > On Wed, Apr 08, 2026 at 12:40:09PM +0000, Kuba Piecuch wrote: > >> Hi Andrea, > >> > >> On Wed Apr 8, 2026 at 11:28 AM UTC, Andrea Righi wrote: > >> ... > >> > > >> > Looks good, but I noticed another issue, should we also change the condition up > >> > above as following? > >> > > >> > Documentation/scheduler/sched-ext.rst | 2 +- > >> > 1 file changed, 1 insertion(+), 1 deletion(-) > >> > > >> > diff --git a/Documentation/scheduler/sched-ext.rst b/Documentation/scheduler/sched-ext.rst > >> > index 29d36e248f58b..99df4cc982375 100644 > >> > --- a/Documentation/scheduler/sched-ext.rst > >> > +++ b/Documentation/scheduler/sched-ext.rst > >> > @@ -423,7 +423,7 @@ by a sched_ext scheduler: > >> > ops.runnable(); /* Task becomes ready to run */ > >> > > >> > while (task_is_runnable(task)) { > >> > - if (task is not in a DSQ && task->scx.slice == 0) { > >> > + if (task is not in a DSQ || task->scx.slice == 0) { > >> > ops.enqueue(); /* Task can be added to a DSQ */ > >> > > >> > /* Task property change (i.e., affinity, nice, etc.)? */ > >> > > >> > Because we trigger ops.enqueue() when the task expired its time slice or it > >> > becomes runnable and has not been added to a DSQ. > >> > > >> > This also represents correctly the sched_change() scenario: a task being > >> > re-enqueued after sched_change() still has its time slice > 0, but we need to > >> > call ops.enqueue() for it. 
> >>
> >> I agree that the condition should be changed, but I'm not sure that this is
> >> what it should look like.
> >>
> >> Is the "task is not in a DSQ" part of the condition there to handle direct
> >> dispatch? Apart from direct dispatch from ops.select_cpu(), I wasn't able to
> >> come up with a situation where we would reach this condition with the task
> >> present on some DSQ.
> >
> > The intent is to represent the direct dispatch from ops.select_cpu(), since in
> > that case ops.enqueue() is skipped.
> >
> > Honestly I think if we change the && to || in that condition, everything should
> > be pretty accurate.
>
> In the case of direct dispatch from ops.select_cpu() we don't invoke
> ops.dispatch() and ops.dequeue() before ops.running(), right? The current
> pseudocode calls them unconditionally.

We can move ops.dispatch() and ops.dequeue() inside the
if (task is not in a DSQ || task->scx.slice == 0) block.

>
> Another inaccuracy not related to direct dispatch: property changes can occur
> while a task is running, while the pseudocode only allows for property changes
> while a task is queued.

Sure... but again, modelling all the possible scenarios would make the
pseudocode completely unreadable. IMHO it'd be better to give an overview of
the most common use cases here and clarify in the description that the diagram
doesn't cover all the possible scenarios. This one is a special use case that,
personally, I wouldn't cover in the pseudocode.

>
> There's also preemption by a higher sched class, which is not covered in the
> loop condition (task_is_runnable(task) && task->scx.slice > 0), unless we take
> task_is_runnable() to return false if there's a higher-priority sched class
> with runnable tasks on the CPU, though that would be in conflict with the
> actual implementation of task_is_runnable() in include/linux/sched.h.

Ditto.

>
> >
> >>
> >> A more general comment about the pseudocode: I think it can be useful to
> >> introduce someone new to the general flow of the callbacks in sched_ext,
> >> but the documentation should be clear that this is a simplified view that
> >> makes assumptions about the behavior of the BPF scheduler itself (flags like
> >> SCX_OPS_ENQ_LAST, whether the scheduler uses direct dispatch), as well as
> >> the overall system (Can sched_ext be preempted by a higher-priority sched
> >> class? Can scheduling properties of a task be changed while it's running?)
> >> Without stating these assumptions clearly, we risk leaving the reader falsely
> >> believing they have a complete understanding.
> >
> > Of course this schema is not a complete representation of the entire sched_ext
> > state machine, if we put everything it'd become too big and complex. I think we
> > should just cover the most common use cases here. Maybe we can clarify this in
> > the description before this diagram.
>
> Let's agree on what inaccuracies need to be fixed and I'll send a v2 with fixes
> and attach an appropriate disclaimer to the pseudocode.

If we move ops.dispatch() + ops.dequeue() inside the ops.enqueue() block I
think the pseudocode becomes "fairly" accurate. At least more accurate than
what we have right now. It won't be perfect, but it can help newer sched_ext
devs get an overview of the task lifecycle without going too much into
implementation details.

So, to recap, what do you think about this?

    ops.init_task();            /* A new task is created */
    ops.enable();               /* Enable BPF scheduling for the task */

    while (task in SCHED_EXT) {
        if (task can migrate)
            ops.select_cpu();   /* Called on wakeup (optimization) */

        ops.runnable();         /* Task becomes ready to run */

        while (task_is_runnable(task)) {
            if (task is not in a DSQ || task->scx.slice == 0) {
                ops.enqueue();  /* Task can be added to a DSQ */

                /* Task property change (i.e., affinity, nice, etc.)? */
                if (sched_change(task)) {
                    ops.dequeue(); /* Exiting BPF scheduler custody */
                    ops.quiescent();

                    /* Property change callback, e.g. ops.set_weight() */

                    ops.runnable();
                    continue;
                }

                /* Any usable CPU becomes available */

                ops.dispatch(); /* Task is moved to a local DSQ */
                ops.dequeue();  /* Exiting BPF scheduler custody */
            }

            ops.running();      /* Task starts running on its assigned CPU */

            while (task_is_runnable(task) && task->scx.slice > 0) {
                ops.tick();     /* Called every 1/HZ seconds */

                if (task->scx.slice == 0)
                    ops.dispatch(); /* task->scx.slice can be refilled */
            }

            ops.stopping();     /* Task stops running (time slice expires or wait) */
        }

        ops.quiescent();        /* Task releases its assigned CPU (wait) */
    }

    ops.disable();              /* Disable BPF scheduling for the task */
    ops.exit_task();            /* Task is destroyed */

Thanks,
-Andrea

^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2026-04-08 14:54 UTC | newest]

Thread overview: 12+ messages
2026-04-06 11:47 [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add ops.dequeue() to task lifecycle Andrea Righi
2026-04-06 14:49 ` Emil Tsalapatis
2026-04-06 19:08 ` Andrea Righi
2026-04-06 18:09 ` Tejun Heo
2026-04-07 9:54 ` Kuba Piecuch
2026-04-07 16:31 ` Andrea Righi
2026-04-08 9:18 ` [PATCH sched_ext/for-7.1] sched_ext: Documentation: Add missing calls to quiescent(), runnable() Kuba Piecuch
2026-04-08 11:28 ` Andrea Righi
2026-04-08 12:40 ` Kuba Piecuch
2026-04-08 13:49 ` Andrea Righi
2026-04-08 14:17 ` Kuba Piecuch
2026-04-08 14:54 ` Andrea Righi