* [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
@ 2026-06-24 23:55 Tejun Heo
2026-06-25 9:45 ` Kuba Piecuch
2026-06-29 19:43 ` [PATCH v2] " Tejun Heo
0 siblings, 2 replies; 5+ messages in thread
From: Tejun Heo @ 2026-06-24 23:55 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel
put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.

Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which
skips balance() for the CPU, so balance_one() never runs and a runnable
task can drop to idle with ENQ_LAST unset. Skip the warning when core
scheduling is enabled.

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext/ext.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
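
For readers following along, the effect of the hunk above can be modeled in plain userspace C. This is only a sketch under stated assumptions: `struct model_state`, `model_warns_before()` and `model_warns_after()` are hypothetical names that do not exist in the kernel, and the two booleans stand in for `sched_core_enabled(rq)` and the `SCX_OPS_ENQ_LAST` flag test. It models only the warning condition inside the "runnable task drops to a lower sched_class" branch, nothing else.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the real check reads. */
struct model_state {
	bool core_sched_enabled;	/* models sched_core_enabled(rq) */
	bool enq_last_set;		/* models sch->ops.flags & SCX_OPS_ENQ_LAST */
};

/* Before the patch: warn whenever ENQ_LAST is unset. */
static bool model_warns_before(const struct model_state *s)
{
	return !s->enq_last_set;
}

/* After the patch: same condition, but skipped when core-sched is enabled. */
static bool model_warns_after(const struct model_state *s)
{
	if (s->core_sched_enabled)
		return false;
	return !s->enq_last_set;
}
```

In this model the two functions differ only when core scheduling is enabled and ENQ_LAST is unset, which is the forced-idle case the description is about.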
* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
2026-06-24 23:55 [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx() Tejun Heo
@ 2026-06-25 9:45 ` Kuba Piecuch
2026-06-29 19:42 ` Tejun Heo
2026-06-29 19:43 ` [PATCH v2] " Tejun Heo
1 sibling, 1 reply; 5+ messages in thread
From: Kuba Piecuch @ 2026-06-25 9:45 UTC (permalink / raw)
To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel
Hi Tejun,
On Wed Jun 24, 2026 at 11:55 PM UTC, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which
> skips balance() for the CPU, so balance_one() never runs and a runnable
Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
So IMO it should read "... skips pick_task_scx() for the CPU, ...".
> task can drop to idle with ENQ_LAST unset. Skip the warning when core
> scheduling is enabled.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext/ext.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))
Is there a more precise check that we could do to determine if this switch is
due to core-sched forcing the CPU idle? I was thinking about
rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
forced idle, so it's not a reliable signal about any particular CPU.
> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);

This patch made me think a bit about core-sched interactions and I have
a concern about IMMED tasks staying on local DSQ when the CPU is forced idle.

I wasn't able to quickly convince myself that an IMMED task will be reenqueued
in the case where a CPU running an SCX task has an IMMED task enqueued in its
local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
is on the local DSQ.

Looks like we might need a call to schedule_reenq_local() somewhere in here
(in a separate patch, of course). WDYT?

Thanks,
Kuba
* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
2026-06-25 9:45 ` Kuba Piecuch
@ 2026-06-29 19:42 ` Tejun Heo
0 siblings, 0 replies; 5+ messages in thread
From: Tejun Heo @ 2026-06-29 19:42 UTC (permalink / raw)
To: Kuba Piecuch, David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Emil Tsalapatis, linux-kernel
Hello, Kuba.

On Thu, Jun 25, 2026 at 09:45:30AM +0000, Kuba Piecuch wrote:
> Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
> So IMO it should read "... skips pick_task_scx() for the CPU, ...".

Right, fixed in v2.

> Is there a more precise check that we could do to determine if this switch is
> due to core-sched forcing the CPU idle? I was thinking about
> rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
> forced idle, so it's not a reliable signal about any particular CPU.

I couldn't find anything with better granularity either, so v2 keeps the
sched_core_enabled() gate.

> I wasn't able to quickly convince myself that an IMMED task will be reenqueued
> in the case where a CPU running an SCX task has an IMMED task enqueued in its
> local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
> is on the local DSQ.
> Looks like we might need a call to schedule_reenq_local() somewhere in here
> (in a separate patch, of course). WDYT?

Makes sense, and not too surprising. For now maybe we just note it in the
comment.

Thanks.

--
tejun
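
The granularity problem discussed above can be illustrated with a small userspace model. This is a sketch only, built on Kuba's reading ("IIUC") that `core_forceidle_count` is a core-wide number: `struct model_core`, `model_forceidle_count()` and `model_guess_forced_idle()` are hypothetical names, not kernel code, and the per-sibling `forced_idle[]` array is ground truth that the model keeps purely for comparison.

```c
#include <stdbool.h>

#define MODEL_NR_SIBLINGS 2

/* Hypothetical model of state shared by the SMT siblings of one core. */
struct model_core {
	bool forced_idle[MODEL_NR_SIBLINGS];	/* ground truth per sibling */
};

/* Models a core-wide count: how many siblings are currently forced idle. */
static unsigned int model_forceidle_count(const struct model_core *core)
{
	unsigned int i, count = 0;

	for (i = 0; i < MODEL_NR_SIBLINGS; i++)
		if (core->forced_idle[i])
			count++;
	return count;
}

/* The only thing a per-CPU check could conclude from the count alone. */
static bool model_guess_forced_idle(const struct model_core *core)
{
	return model_forceidle_count(core) > 0;
}
```

Two cores whose forced-idle sibling differs produce the same count, so in this model the guess is identical for both even though the answer for sibling 0 is not.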
* [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
2026-06-24 23:55 [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx() Tejun Heo
2026-06-25 9:45 ` Kuba Piecuch
@ 2026-06-29 19:43 ` Tejun Heo
2026-06-29 20:06 ` Andrea Righi
1 sibling, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2026-06-29 19:43 UTC (permalink / raw)
To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
Cc: Tejun Heo, Emil Tsalapatis, linux-kernel, Kuba Piecuch
put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.

Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which skips
pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
is enabled.

v2: Reworded the description per Kuba Piecuch's review. No code change.

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
kernel/sched/ext/ext.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
* Re: [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
2026-06-29 19:43 ` [PATCH v2] " Tejun Heo
@ 2026-06-29 20:06 ` Andrea Righi
0 siblings, 0 replies; 5+ messages in thread
From: Andrea Righi @ 2026-06-29 20:06 UTC (permalink / raw)
To: Tejun Heo
Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis,
linux-kernel, Kuba Piecuch
Hi Tejun,
On Mon, Jun 29, 2026 at 09:43:55AM -1000, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which skips
> pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
> can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
> is enabled.
>
> v2: Reworded the description per Kuba Piecuch's review. No code change.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
> kernel/sched/ext/ext.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))
> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));

I was wondering if this could be a better check:

	WARN_ON_ONCE(sched_cpu_cookie_match(rq, p) &&
		     !(sch->ops.flags & SCX_OPS_ENQ_LAST));

With this:
 - the surrounding branch establishes that runnable p is being replaced by
   idle (class below sched_ext),
 - the core picker has already stored the selected cookie,
 - if p does not match that cookie, this CPU is being forced idle by core
   scheduling,
 - when core scheduling is disabled, sched_cpu_cookie_match() returns true, so
   we preserve the warning,
 - when core scheduling is enabled and the cookies match, the warning is also
   preserved.

In theory it should work, unless I'm missing some other edge cases.

Thanks,
-Andrea

>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);
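
To compare the two gates discussed in this thread side by side, here is a userspace sketch. It is a model under stated assumptions, not kernel code: `struct model_case`, `model_cookie_match()`, `model_warns_v2()` and `model_warns_cookie()` are hypothetical names, and the cookie comparison is assumed to be a plain equality test that returns true when core scheduling is disabled, as described in Andrea's list above.

```c
#include <stdbool.h>

/* Hypothetical inputs to the warning condition. */
struct model_case {
	bool core_sched_enabled;	/* models sched_core_enabled(rq) */
	unsigned long picked_cookie;	/* cookie stored by the core picker */
	unsigned long task_cookie;	/* cookie of the task being put */
	bool enq_last_set;		/* models the SCX_OPS_ENQ_LAST test */
};

/* Assumed behavior of the cookie check: always true when core-sched is off. */
static bool model_cookie_match(const struct model_case *c)
{
	if (!c->core_sched_enabled)
		return true;
	return c->picked_cookie == c->task_cookie;
}

/* v2 gate: never warn while core scheduling is enabled. */
static bool model_warns_v2(const struct model_case *c)
{
	return !c->core_sched_enabled && !c->enq_last_set;
}

/* Proposed gate: suppress the warning only on a cookie mismatch. */
static bool model_warns_cookie(const struct model_case *c)
{
	return model_cookie_match(c) && !c->enq_last_set;
}
```

In this model the two gates agree when core scheduling is disabled and when the cookies mismatch (the forced-idle case). They differ only when core scheduling is enabled, the cookies match, and ENQ_LAST is unset, where the proposed gate would still warn.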