Sched_ext development
* [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Tejun Heo @ 2026-06-24 23:55 UTC
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel

put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.

Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which
skips balance() for the CPU, so balance_one() never runs and a runnable
task can drop to idle with ENQ_LAST unset. Skip the warning when core
scheduling is enabled.

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext/ext.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
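
Editorial illustration (not part of the patch): a minimal user-space sketch of how the gate changes the warning condition. The names MODEL_OPS_ENQ_LAST, warns_before() and warns_after() are hypothetical stand-ins, and the flag value is arbitrary; only the shape of the condition is taken from the hunk above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SCX_OPS_ENQ_LAST; the bit value is illustrative only. */
#define MODEL_OPS_ENQ_LAST (1u << 1)

/* Before the patch: warn whenever the scheduler did not set ENQ_LAST. */
static bool warns_before(unsigned int ops_flags)
{
	return !(ops_flags & MODEL_OPS_ENQ_LAST);
}

/* After the patch: the same check, but only when core scheduling is disabled. */
static bool warns_after(bool core_sched_enabled, unsigned int ops_flags)
{
	if (core_sched_enabled)
		return false;
	return !(ops_flags & MODEL_OPS_ENQ_LAST);
}
```

With core scheduling enabled, warns_after() is always false, so the gate also suppresses the warning for switches that were not caused by forced idle. That trade-off is what the replies below discuss.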


* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Kuba Piecuch @ 2026-06-25  9:45 UTC
  To: Tejun Heo, David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel

Hi Tejun,

On Wed Jun 24, 2026 at 11:55 PM UTC, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
>
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which
> skips balance() for the CPU, so balance_one() never runs and a runnable

Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
So IMO it should read "... skips pick_task_scx() for the CPU, ...".

> task can drop to idle with ENQ_LAST unset. Skip the warning when core
> scheduling is enabled.
>
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext/ext.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))

Is there a more precise check that we could do to determine if this switch is
due to core-sched forcing the CPU idle? I was thinking about
rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
forced idle, so it's not a reliable signal about any particular CPU.
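
Editorial illustration (not part of the original message): a toy user-space model of why a core-wide counter cannot identify which sibling is forced idle. The struct and function names are hypothetical and do not mirror the kernel's struct rq. The model only assumes, as stated above, that the count is shared across the siblings of a core.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SMT_WIDTH 2

/* Hypothetical model of one SMT core; not the kernel's data layout. */
struct model_core {
	bool cpu_forced_idle[MODEL_SMT_WIDTH];	/* ground truth per sibling */
	unsigned int forceidle_count;		/* core-wide total shared by all siblings */
};

/* Recompute the core-wide total from the per-sibling ground truth. */
static void model_core_update(struct model_core *core)
{
	unsigned int i;

	core->forceidle_count = 0;
	for (i = 0; i < MODEL_SMT_WIDTH; i++)
		if (core->cpu_forced_idle[i])
			core->forceidle_count++;
}

/* What a check based only on the shared count would conclude for any sibling. */
static bool count_suggests_forced_idle(const struct model_core *core)
{
	return core->forceidle_count > 0;
}
```

If only sibling 1 is forced idle, the count is 1 and count_suggests_forced_idle() is true for the whole core, including sibling 0, which is not forced idle.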

> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);

This patch made me think a bit about core-sched interactions and I have
a concern about IMMED tasks staying on local DSQ when the CPU is forced idle.
I wasn't able to quickly convince myself that an IMMED task will be reenqueued
in the case where a CPU running an SCX task has an IMMED task enqueued in its
local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
is on the local DSQ.
Looks like we might need a call to schedule_reenq_local() somewhere in here
(in a separate patch, of course). WDYT?

Thanks,
Kuba


* Re: [PATCH] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Tejun Heo @ 2026-06-29 19:42 UTC
  To: Kuba Piecuch, David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel

Hello, Kuba.

On Thu, Jun 25, 2026 at 09:45:30AM +0000, Kuba Piecuch wrote:
> Nit: balance_one() doesn't happen in balance() anymore, it happens in pick.
> So IMO it should read "... skips pick_task_scx() for the CPU, ...".

Right, fixed in v2.

> Is there a more precise check that we could do to determine if this switch is
> due to core-sched forcing the CPU idle? I was thinking about
> rq->core->core_forceidle_count, but IIUC that's the core-wide number of CPUs
> forced idle, so it's not a reliable signal about any particular CPU.

I couldn't find anything with better granularity either, so v2 keeps the
sched_core_enabled() gate.

> I wasn't able to quickly convince myself that an IMMED task will be reenqueued
> in the case where a CPU running an SCX task has an IMMED task enqueued in its
> local DSQ by a remote CPU, and the CPU is forced idle while the IMMED task
> is on the local DSQ.
> Looks like we might need a call to schedule_reenq_local() somewhere in here
> (in a separate patch, of course). WDYT?

Makes sense, and not too surprising. For now maybe we just note it in the
comment.

Thanks.

-- 
tejun


* [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Tejun Heo @ 2026-06-29 19:43 UTC
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Tejun Heo, Emil Tsalapatis, linux-kernel, Kuba Piecuch

put_prev_task_scx() warns when a runnable task drops to a lower
sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
otherwise keep it running.

Under core scheduling that assumption is wrong: a forced-idle SMT sibling
reschedules through the core_pick fast path in pick_next_task(), which skips
pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
is enabled.

v2: Reworded the description per Kuba Piecuch's review. No code change.

Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext/ext.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
index 9c9cb9d08bca..503c4d2105ee 100644
--- a/kernel/sched/ext/ext.c
+++ b/kernel/sched/ext/ext.c
@@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
 		 * which should trigger an explicit follow-up scheduling event.
 		 */
 		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
-			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
+			/* core-sched can force cpu idle while @p is runnable */
+			if (!sched_core_enabled(rq))
+				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
 			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
 		} else {
 			do_enqueue_task(rq, p, 0, -1);
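
Editorial illustration (not part of the patch): a toy user-space model of the fast path described in the changelog. All names here (model_rq, model_class_pick(), model_pick_next()) are hypothetical. The sketch assumes only what the description states: when a pick has already been stored for the CPU, it is used directly and the class-level pick is not called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_task {
	const char *name;
};

/* Hypothetical per-CPU state; not the kernel's struct rq. */
struct model_rq {
	struct model_task *core_pick;	/* pick already stored for this CPU, if any */
	struct model_task *runnable;	/* task the class-level pick would choose */
	struct model_task idle;		/* stands in for the idle task */
	unsigned int class_pick_calls;	/* how often the class-level pick ran */
};

/* Stands in for the class-level pick, where balancing would happen. */
static struct model_task *model_class_pick(struct model_rq *rq)
{
	rq->class_pick_calls++;
	return rq->runnable ? rq->runnable : &rq->idle;
}

/* Fast path: consume a stored pick without consulting the class-level pick. */
static struct model_task *model_pick_next(struct model_rq *rq)
{
	if (rq->core_pick) {
		struct model_task *next = rq->core_pick;

		rq->core_pick = NULL;
		return next;
	}
	return model_class_pick(rq);
}
```

When core_pick points at the idle task while runnable is set, model_pick_next() returns idle and class_pick_calls stays at 0. This is the situation in which a runnable task drops to idle without the class-level pick having had a chance to keep it running.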


* Re: [PATCH v2] sched_ext: Don't warn on core-sched forced idle in put_prev_task_scx()
From: Andrea Righi @ 2026-06-29 20:06 UTC
  To: Tejun Heo
  Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis,
	linux-kernel, Kuba Piecuch

Hi Tejun,

On Mon, Jun 29, 2026 at 09:43:55AM -1000, Tejun Heo wrote:
> put_prev_task_scx() warns when a runnable task drops to a lower
> sched_class without SCX_OPS_ENQ_LAST, assuming balance_one() would
> otherwise keep it running.
> 
> Under core scheduling that assumption is wrong: a forced-idle SMT sibling
> reschedules through the core_pick fast path in pick_next_task(), which skips
> pick_task_scx() for the CPU, so balance_one() never runs and a runnable task
> can drop to idle with ENQ_LAST unset. Skip the warning when core scheduling
> is enabled.
> 
> v2: Reworded the description per Kuba Piecuch's review. No code change.
> 
> Fixes: 7c65ae81ea86 ("sched_ext: Don't call put_prev_task_scx() before picking the next task")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>  kernel/sched/ext/ext.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/ext/ext.c b/kernel/sched/ext/ext.c
> index 9c9cb9d08bca..503c4d2105ee 100644
> --- a/kernel/sched/ext/ext.c
> +++ b/kernel/sched/ext/ext.c
> @@ -3092,7 +3092,9 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 * which should trigger an explicit follow-up scheduling event.
>  		 */
>  		if (next && sched_class_above(&ext_sched_class, next->sched_class)) {
> -			WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));
> +			/* core-sched can force cpu idle while @p is runnable */
> +			if (!sched_core_enabled(rq))
> +				WARN_ON_ONCE(!(sch->ops.flags & SCX_OPS_ENQ_LAST));

I was wondering if this could be a better check:

 WARN_ON_ONCE(sched_cpu_cookie_match(rq, p) &&
               !(sch->ops.flags & SCX_OPS_ENQ_LAST));

With this:
 - the surrounding branch establishes that runnable p is being replaced by
   idle (class below sched_ext),
 - the core picker has already stored the selected cookie,
 - if p does not match that cookie, this CPU is being forced idle by core
   scheduling,
 - when core scheduling is disabled, sched_cpu_cookie_match() returns true, so
   we preserve the warning
 - when core scheduling is enabled and the cookies match the warning is also
   preserved.

In theory it should work, unless I'm missing some other edge cases.
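
Editorial illustration (not part of the original message): a small user-space comparison of the v2 gate and the cookie-match gate proposed above. The struct and function names are hypothetical. model_cookie_match() encodes only the behavior listed in the bullets: it returns true when core scheduling is disabled, and otherwise compares the task's cookie with the cookie selected for the core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs to the warning decision. */
struct model_case {
	bool core_sched_enabled;
	unsigned long core_cookie;	/* cookie selected by the core-wide pick */
	unsigned long task_cookie;	/* cookie of the task being put */
	bool enq_last;			/* whether ENQ_LAST is set */
};

/* Models the described behavior of the cookie-match helper. */
static bool model_cookie_match(const struct model_case *c)
{
	if (!c->core_sched_enabled)
		return true;
	return c->core_cookie == c->task_cookie;
}

/* v2 gate: warn only when core scheduling is disabled. */
static bool warns_v2(const struct model_case *c)
{
	return !c->core_sched_enabled && !c->enq_last;
}

/* Proposed gate: warn unless the task's cookie does not match. */
static bool warns_cookie(const struct model_case *c)
{
	return model_cookie_match(c) && !c->enq_last;
}
```

In this model the two gates agree when core scheduling is disabled and when the cookies differ. They diverge only when core scheduling is enabled and the cookies match: warns_v2() stays silent, while warns_cookie() still warns.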

Thanks,
-Andrea

>  			do_enqueue_task(rq, p, SCX_ENQ_LAST, -1);
>  		} else {
>  			do_enqueue_task(rq, p, 0, -1);

