linux-kernel.vger.kernel.org archive mirror
* [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits
@ 2025-05-21 22:23 Tejun Heo
  2025-05-22  7:27 ` Andrea Righi
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Tejun Heo @ 2025-05-21 22:23 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, linux-kernel, sched-ext

BPF schedulers that use both builtin CPU idle mechanism and
ops.update_idle() may want to use the latter to create interlocking between
ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the
idle bit or ops.update_idle() sees the task queued somewhere. This can
prevent race conditions where CPUs go idle while tasks are waiting in DSQs.

For such interlocking to work, ops.update_idle() must be called after
builtin CPU masks are updated. Relocate the invocation. Currently, there are
no ordering requirements on transitions from idle and this relocation isn't
expected to make meaningful differences in that direction.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext_idle.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index ae30de383913..66da03cc0b33 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -738,16 +738,6 @@ void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
 
 	lockdep_assert_rq_held(rq);
 
-	/*
-	 * Trigger ops.update_idle() only when transitioning from a task to
-	 * the idle thread and vice versa.
-	 *
-	 * Idle transitions are indicated by do_notify being set to true,
-	 * managed by put_prev_task_idle()/set_next_task_idle().
-	 */
-	if (SCX_HAS_OP(sch, update_idle) && do_notify && !scx_rq_bypassing(rq))
-		SCX_CALL_OP(sch, SCX_KF_REST, update_idle, rq, cpu_of(rq), idle);
-
 	/*
 	 * Update the idle masks:
 	 * - for real idle transitions (do_notify == true)
@@ -765,6 +755,21 @@ void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
 	if (static_branch_likely(&scx_builtin_idle_enabled))
 		if (do_notify || is_idle_task(rq->curr))
 			update_builtin_idle(cpu, idle);
+
+	/*
+	 * Trigger ops.update_idle() only when transitioning from a task to
+	 * the idle thread and vice versa.
+	 *
+	 * Idle transitions are indicated by do_notify being set to true,
+	 * managed by put_prev_task_idle()/set_next_task_idle().
+	 *
+	 * This must come after builtin idle update so that BPF schedulers can
+	 * create interlocking between ops.update_idle() and ops.enqueue() -
+	 * either enqueue() sees the idle bit or update_idle() sees the task
+	 * that enqueue() queued.
+	 */
+	if (SCX_HAS_OP(sch, update_idle) && do_notify && !scx_rq_bypassing(rq))
+		SCX_CALL_OP(sch, SCX_KF_REST, update_idle, rq, cpu_of(rq), idle);
 }
 
 static void reset_idle_masks(struct sched_ext_ops *ops)
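
The interlocking described in the commit message can be illustrated with a
small userspace model. This is only a sketch under stated assumptions:
struct sim_cpu, sim_set_idle_bit(), sim_update_idle_cb(), sim_enqueue() and
task_noticed() are hypothetical names, not kernel or sched_ext APIs, and C11
atomics stand in for whatever synchronization a real BPF scheduler would use.
It replays the interleaving in which an enqueue lands between the two steps
of the idle path and reports whether either side notices the other.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names exist in the kernel. */
struct sim_cpu {
	atomic_bool idle_bit;	/* stands in for the builtin idle bit */
	atomic_int nr_queued;	/* stands in for tasks waiting in a DSQ */
};

/* Models update_builtin_idle(cpu, true). */
static void sim_set_idle_bit(struct sim_cpu *c)
{
	atomic_store(&c->idle_bit, true);
}

/* Models an ops.update_idle() that looks for queued work; true if it sees any. */
static bool sim_update_idle_cb(struct sim_cpu *c)
{
	return atomic_load(&c->nr_queued) > 0;
}

/* Models an ops.enqueue() that queues first, then test-and-clears the idle bit. */
static bool sim_enqueue(struct sim_cpu *c)
{
	atomic_fetch_add(&c->nr_queued, 1);
	return atomic_exchange(&c->idle_bit, false);
}

/*
 * Replay an enqueue that lands between the two idle-path steps.
 * Returns true if either side noticed the other, i.e. the task is not stranded.
 */
static bool task_noticed(bool callback_after_bit)
{
	struct sim_cpu c;
	bool idle_saw_task, enq_saw_idle;

	atomic_init(&c.idle_bit, false);
	atomic_init(&c.nr_queued, 0);

	if (callback_after_bit) {	/* ordering after this patch */
		sim_set_idle_bit(&c);
		enq_saw_idle = sim_enqueue(&c);
		idle_saw_task = sim_update_idle_cb(&c);
	} else {			/* ordering before this patch */
		idle_saw_task = sim_update_idle_cb(&c);
		enq_saw_idle = sim_enqueue(&c);
		sim_set_idle_bit(&c);
	}
	return idle_saw_task || enq_saw_idle;
}
```

In this model task_noticed(true) returns true and task_noticed(false) returns
false: with the old ordering the callback runs before the task is queued and
the enqueue runs before the idle bit is set, so the CPU ends up idle with a
task still waiting.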



* Re: [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits
  2025-05-21 22:23 [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits Tejun Heo
@ 2025-05-22  7:27 ` Andrea Righi
  2025-05-22  8:29 ` Changwoo Min
  2025-05-22 19:26 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Andrea Righi @ 2025-05-22  7:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: David Vernet, Changwoo Min, linux-kernel, sched-ext

On Wed, May 21, 2025 at 12:23:06PM -1000, Tejun Heo wrote:
> BPF schedulers that use both builtin CPU idle mechanism and
> ops.update_idle() may want to use the latter to create interlocking between
> ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the
> idle bit or ops.update_idle() sees the task queued somewhere. This can
> prevent race conditions where CPUs go idle while tasks are waiting in DSQs.
> 
> For such interlocking to work, ops.update_idle() must be called after
> builtin CPU masks are updated. Relocate the invocation. Currently, there are
> no ordering requirements on transitions from idle and this relocation isn't
> expected to make meaningful differences in that direction.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good and it also makes sense semantically: potentially any action
performed in ops.update_idle() should be able to override the built-in idle
state, not the other way around.

For example, if we call scx_bpf_test_and_clear_cpu_idle(cpu) from within
ops.update_idle(), I would expect that to effectively "exclude" the CPU
from the idle selection, since the intention is to override the built-in
idle state. But that's not what happens if we update the idle
cpumasks after ops.update_idle(). With this patch applied, it works as
expected.
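
A minimal model of this case, as a sketch only: idle_bit,
sim_test_and_clear_idle() and idle_bit_after_transition() are hypothetical
stand-ins, with sim_test_and_clear_idle() playing the role of a call to
scx_bpf_test_and_clear_cpu_idle(cpu) made from ops.update_idle(). It shows
which value of the idle bit survives a task-to-idle transition under each
ordering.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's builtin idle bit; not a kernel structure. */
static bool idle_bit;

/* Stand-in for scx_bpf_test_and_clear_cpu_idle(): return old value, clear bit. */
static bool sim_test_and_clear_idle(void)
{
	bool was_idle = idle_bit;

	idle_bit = false;
	return was_idle;
}

/* Returns the idle bit left behind once a task-to-idle transition is processed. */
static bool idle_bit_after_transition(bool callback_after_bit)
{
	idle_bit = false;

	if (callback_after_bit) {		/* ordering after this patch */
		idle_bit = true;		/* builtin idle update */
		(void)sim_test_and_clear_idle();	/* ops.update_idle() */
	} else {				/* ordering before this patch */
		(void)sim_test_and_clear_idle();	/* ops.update_idle() */
		idle_bit = true;		/* builtin idle update */
	}
	return idle_bit;
}
```

Here idle_bit_after_transition(true) returns false, so the callback's clear
sticks, while idle_bit_after_transition(false) returns true because the
builtin update overwrites it.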

Maybe we should mention this aspect as well in the commit message,
something like this (feel free to rephrase/ignore):

  This also makes the ops.update_idle() behavior semantically consistent:
  any action performed in this callback should be able to override the
  builtin idle state, not the other way around.

In any case:

Reviewed-and-tested-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea



* Re: [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits
  2025-05-21 22:23 [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits Tejun Heo
  2025-05-22  7:27 ` Andrea Righi
@ 2025-05-22  8:29 ` Changwoo Min
  2025-05-22 19:26 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Changwoo Min @ 2025-05-22  8:29 UTC (permalink / raw)
  To: Tejun Heo, David Vernet, Andrea Righi, linux-kernel, sched-ext

Thank you, Tejun, for the change!
The change makes sense semantically.

Acked-by: Changwoo Min <changwoo@igalia.com>

On 5/22/25 07:23, Tejun Heo wrote:
> BPF schedulers that use both builtin CPU idle mechanism and
> ops.update_idle() may want to use the latter to create interlocking between
> ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the
> idle bit or ops.update_idle() sees the task queued somewhere. This can
> prevent race conditions where CPUs go idle while tasks are waiting in DSQs.
> 
> For such interlocking to work, ops.update_idle() must be called after
> builtin CPU masks are updated. Relocate the invocation. Currently, there are
> no ordering requirements on transitions from idle and this relocation isn't
> expected to make meaningful differences in that direction.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>



* Re: [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits
  2025-05-21 22:23 [PATCH sched_ext/for-6.16] sched_ext: Call ops.update_idle() after updating builtin idle bits Tejun Heo
  2025-05-22  7:27 ` Andrea Righi
  2025-05-22  8:29 ` Changwoo Min
@ 2025-05-22 19:26 ` Tejun Heo
  2 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2025-05-22 19:26 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, linux-kernel, sched-ext

Applied to sched_ext/for-6.16 with commit message update suggested by
Andrea.

------ 8< ------
From 273cc949655c70001778eb0b9e7db993df845912 Mon Sep 17 00:00:00 2001
From: Tejun Heo <tj@kernel.org>
Date: Wed, 21 May 2025 12:23:06 -1000
Subject: [PATCH] sched_ext: Call ops.update_idle() after updating builtin idle
 bits

BPF schedulers that use both builtin CPU idle mechanism and
ops.update_idle() may want to use the latter to create interlocking between
ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the
idle bit or ops.update_idle() sees the task queued somewhere. This can
prevent race conditions where CPUs go idle while tasks are waiting in DSQs.

For such interlocking to work, ops.update_idle() must be called after
builtin CPU masks are updated. Relocate the invocation. Currently, there are
no ordering requirements on transitions from idle and this relocation isn't
expected to make meaningful differences in that direction.

This also makes the ops.update_idle() behavior semantically consistent:
any action performed in this callback should be able to override the
builtin idle state, not the other way around.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-and-tested-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Changwoo Min <changwoo@igalia.com>
---
 kernel/sched/ext_idle.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/ext_idle.c b/kernel/sched/ext_idle.c
index ae30de383913..66da03cc0b33 100644
--- a/kernel/sched/ext_idle.c
+++ b/kernel/sched/ext_idle.c
@@ -738,16 +738,6 @@ void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
 
 	lockdep_assert_rq_held(rq);
 
-	/*
-	 * Trigger ops.update_idle() only when transitioning from a task to
-	 * the idle thread and vice versa.
-	 *
-	 * Idle transitions are indicated by do_notify being set to true,
-	 * managed by put_prev_task_idle()/set_next_task_idle().
-	 */
-	if (SCX_HAS_OP(sch, update_idle) && do_notify && !scx_rq_bypassing(rq))
-		SCX_CALL_OP(sch, SCX_KF_REST, update_idle, rq, cpu_of(rq), idle);
-
 	/*
 	 * Update the idle masks:
 	 * - for real idle transitions (do_notify == true)
@@ -765,6 +755,21 @@ void __scx_update_idle(struct rq *rq, bool idle, bool do_notify)
 	if (static_branch_likely(&scx_builtin_idle_enabled))
 		if (do_notify || is_idle_task(rq->curr))
 			update_builtin_idle(cpu, idle);
+
+	/*
+	 * Trigger ops.update_idle() only when transitioning from a task to
+	 * the idle thread and vice versa.
+	 *
+	 * Idle transitions are indicated by do_notify being set to true,
+	 * managed by put_prev_task_idle()/set_next_task_idle().
+	 *
+	 * This must come after builtin idle update so that BPF schedulers can
+	 * create interlocking between ops.update_idle() and ops.enqueue() -
+	 * either enqueue() sees the idle bit or update_idle() sees the task
+	 * that enqueue() queued.
+	 */
+	if (SCX_HAS_OP(sch, update_idle) && do_notify && !scx_rq_bypassing(rq))
+		SCX_CALL_OP(sch, SCX_KF_REST, update_idle, rq, cpu_of(rq), idle);
 }
 
 static void reset_idle_masks(struct sched_ext_ops *ops)
-- 
2.49.0


