* [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
@ 2026-06-24 23:55 Tejun Heo
2026-06-25 9:45 ` Kuba Piecuch
2026-06-29 19:43 ` [PATCH v2] " Tejun Heo
0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2026-06-24 23:55 UTC
To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel
put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.
Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which
skips balance() for the CPU, so balance_one() never runs and a runnable
task can drop to idle with ENQ_LAST unset. Skip the warning when core
scheduling is enabled.
Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext/ext.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
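
For readers following along, the gating added by this patch can be spelled out as a small truth table. The block below is a minimal userspace sketch under stated assumptions, not kernel code: `struct model_ctx`, `model_warns_unpatched()` and `model_warns_patched()` are hypothetical names, and the two booleans only stand in for sched_core_enabled(rq) and the SCX_OPS_ENQ_LAST flag test. It assumes the surrounding branch (a runnable task being replaced by a lower sched_class) has already been taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
struct model_ctx {
	bool core_sched_enabled;	/* stands in for sched_core_enabled(rq) */
	bool enq_last_set;		/* stands in for ops.flags & SCX_OPS_ENQ_LAST */
};

/* Behaviour before the patch: warn whenever ENQ_LAST is unset. */
static bool model_warns_unpatched(const struct model_ctx *c)
{
	assert(c);
	return !c->enq_last_set;
}

/* Behaviour with the patch: the warning is skipped under core scheduling. */
static bool model_warns_patched(const struct model_ctx *c)
{
	assert(c);
	if (c->core_sched_enabled)
		return false;
	return !c->enq_last_set;
}
```

With core scheduling on and ENQ_LAST unset, the unpatched model warns and the patched one does not. With core scheduling off the two agree. The gate is deliberately coarse: it suppresses the warning for every switch on a core-sched-enabled rq, not only for forced-idle ones.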
* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Kuba Piecuch @ 2026-06-25 9:45 UTC
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel

Hi Tejun,

On Wed Jun 24, 2026 at 11:55 PM UTC, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which
> skips balance() for the CPU, so balance_one() never runs and a runnable

Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
So IMO it should read "... skips pick_task_scx() for the CPU, ...".

> task can drop to idle with ENQ_LAST unset. Skip the warning when core
> scheduling is enabled.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext/ext.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))

Is there a more precise check that we could do to determine if this switch is
due to core-sched forcing the CPU idle? I was thinking about
rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
forced idle, so it's not a reliable signal about any particular CPU.

> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);

This patch made me think a bit about core-sched interactions and I have a
concern about IMMED tasks staying on local DSQ when the CPU is forced idle.

I wasn't able to quickly convince myself that an IMMED task will be reenqueued
in the case where a CPU running an SCX task has an IMMED task enqueued in its
local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
is on the local DSQ.
Looks like we might need a call to schedule_reenq_local() somewhere in here
(in a separate patch, of course). WDYT?

Thanks,
Kuba

* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Tejun Heo @ 2026-06-29 19:42 UTC
To: Kuba Piecuch, David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel

Hello, Kuba.

On Thu, Jun 25, 2026 at 09:45:30AM +0000, Kuba Piecuch wrote:
> Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
> So IMO it should read "... skips pick_task_scx() for the CPU, ...".

Right, fixed in v2.

> Is there a more precise check that we could do to determine if this switch is
> due to core-sched forcing the CPU idle? I was thinking about
> rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
> forced idle, so it's not a reliable signal about any particular CPU.

I couldn't find anything with better granularity either, so v2 keeps the
sched_core_enabled() gate.

> I wasn't able to quickly convince myself that an IMMED task will be reenqueued
> in the case where a CPU running an SCX task has an IMMED task enqueued in its
> local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
> is on the local DSQ.
> Looks like we might need a call to schedule_reenq_local() somewhere in here
> (in a separate patch, of course). WDYT?

Makes sense, and not too surprising. For now maybe we just note it in the
comment.

Thanks.

--
tejun
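
The granularity point discussed above (a core-wide forced-idle count says nothing about one particular CPU) can be illustrated with a toy model. This is a sketch under assumptions, not the kernel's data layout: `struct model_core`, `MODEL_NR_SIBLINGS`, `model_forceidle_count()` and `model_cpu_forced_idle()` are hypothetical names, and the count only loosely mirrors what the thread says rq->core->core_forceidle_count represents.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_SIBLINGS 2	/* hypothetical: one two-way SMT core */

/* Hypothetical model of one SMT core; not the kernel's struct rq. */
struct model_core {
	bool forced_idle[MODEL_NR_SIBLINGS];	/* per-sibling ground truth */
};

/* Core-wide count: how many siblings are forced idle right now. */
static unsigned int model_forceidle_count(const struct model_core *core)
{
	unsigned int n = 0;

	assert(core);
	for (int i = 0; i < MODEL_NR_SIBLINGS; i++)
		if (core->forced_idle[i])
			n++;
	return n;
}

/* Per-CPU answer; the count alone cannot recover this. */
static bool model_cpu_forced_idle(const struct model_core *core, int cpu)
{
	assert(core);
	assert(cpu >= 0 && cpu < MODEL_NR_SIBLINGS);
	return core->forced_idle[cpu];
}
```

Two cores with forced-idle state {true, false} and {false, true} both report a count of 1, yet sibling 0 is forced idle in only one of them. A check built on the count alone would therefore misfire for one of the two siblings.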

* [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Tejun Heo @ 2026-06-29 19:43 UTC
To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Tejun Heo, Emil Tsalapatis, linux-kernel, Kuba Piecuch

put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.

Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which skips
pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
is enabled.

v2: Reworded the description per Kuba Piecuch's review. No code change.

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext/ext.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);

* Re: [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Andrea Righi @ 2026-06-29 20:06 UTC
To: Tejun Heo
Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis, linux-kernel, Kuba Piecuch

Hi Tejun,

On Mon, Jun 29, 2026 at 09:43:55AM -1000, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which skips
> pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
> can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
> is enabled.
>
> v2: Reworded the description per Kuba Piecuch's review. No code change.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext/ext.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))
> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));

I was wondering if this could be a better check:

	WARN_ON_ONCE(sched_cpu_cookie_match(rq, p) &&
		     !(sch->ops.flags & SCX_OPS_ENQ_LAST));

With this:
- the surrounding branch establishes that runnable p is being replaced by
  idle (class below sched_ext),
- the core picker has already stored the selected cookie,
- if p does not match that cookie, this CPU is being forced idle by core
  scheduling,
- when core scheduling is disabled, sched_cpu_cookie_match() returns true, so
  we preserve the warning
- when core scheduling is enabled and the cookies match the warning is also
  preserved.

In theory it should work, unless I'm missing some other edge cases.

Thanks,
-Andrea

>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);
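
The cookie-match check suggested in Andrea Righi's reply to v2 can be tried out with a small userspace model. This is a sketch under assumptions, not kernel code: `struct model_rq`, `struct model_task`, `model_cookie_match()` and `model_warns_cookie()` are hypothetical names. The match helper follows the behaviour described in that message (it returns true when core scheduling is disabled, and otherwise compares the selected cookie with the task's cookie). Like the real check, it assumes the branch where a runnable task is being replaced by a lower sched_class has already been taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the rq and task state the check reads. */
struct model_rq {
	bool core_sched_enabled;	/* stands in for sched_core_enabled(rq) */
	unsigned long core_cookie;	/* cookie selected by the core-wide pick */
};

struct model_task {
	unsigned long core_cookie;	/* cookie of the task being put */
};

/* Models the described semantics of sched_cpu_cookie_match(). */
static bool model_cookie_match(const struct model_rq *rq,
			       const struct model_task *p)
{
	assert(rq && p);
	if (!rq->core_sched_enabled)
		return true;
	return rq->core_cookie == p->core_cookie;
}

/* Suggested condition: warn only if cookies match and ENQ_LAST is unset. */
static bool model_warns_cookie(const struct model_rq *rq,
			       const struct model_task *p, bool enq_last_set)
{
	return model_cookie_match(rq, p) && !enq_last_set;
}
```

In this model the warning is suppressed only when core scheduling is enabled and the task's cookie differs from the selected one, which is the forced-idle case. It stays armed when core scheduling is off or the cookies match, which is narrower than gating on sched_core_enabled() alone.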