public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled
@ 2024-09-20 19:41 Pat Somaru
  2024-09-23 15:43 ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: Pat Somaru @ 2024-09-20 19:41 UTC (permalink / raw)
  To: tj, void; +Cc: linux-kernel, kernel-team, sched-ext, peterz, Pat Somaru

Disable the rq empty path when scx is enabled. SCX must consult the
BPF scheduler (via the dispatch path in balance) to determine if rq
is empty.

This fixes stalls when scx is enabled.

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Fixes: 3dcac251b066 ("sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()")
---
 kernel/sched/core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c415d61a646b5..6a1f662a5e118 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6591,7 +6591,11 @@ static void __sched notrace __schedule(int sched_mode)
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (sched_mode == SM_IDLE) {
-		if (!rq->nr_running) {
+		/*
+		 * scx needs to check the bpf scheduler to determine
+		 * if rq is empty, so disable this path for it.
+		 */
+		if (!rq->nr_running && !scx_enabled()) {
 			next = prev;
 			goto picked;
 		}
-- 
2.44.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled
  2024-09-20 19:41 [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled Pat Somaru
@ 2024-09-23 15:43 ` Tejun Heo
  2024-09-24  3:40   ` K Prateek Nayak
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2024-09-23 15:43 UTC (permalink / raw)
  To: Pat Somaru; +Cc: void, linux-kernel, kernel-team, sched-ext, peterz

Applied to sched_ext/for-6.12-fixes with minor edits:
------ 8< ------
From edf1c586e92675c4e0eb27758fcdb55a56838de1 Mon Sep 17 00:00:00 2001
From: Pat Somaru <patso@likewhatevs.io>
Date: Fri, 20 Sep 2024 15:41:59 -0400
Subject: [PATCH] sched, sched_ext: Disable SM_IDLE/rq empty path when
 scx_enabled()

Disable the rq empty path when scx is enabled. SCX must consult the BPF
scheduler (via the dispatch path in balance) to determine if rq is empty.

This fixes stalls when scx is enabled.

Signed-off-by: Pat Somaru <patso@likewhatevs.io>
Fixes: 3dcac251b066 ("sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()")
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b6cc1cf499d6..43e453ab7e20 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6591,7 +6591,8 @@ static void __sched notrace __schedule(int sched_mode)
 	 */
 	prev_state = READ_ONCE(prev->__state);
 	if (sched_mode == SM_IDLE) {
-		if (!rq->nr_running) {
+		/* SCX must consult the BPF scheduler to tell if rq is empty */
+		if (!rq->nr_running && !scx_enabled()) {
 			next = prev;
 			goto picked;
 		}
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled
  2024-09-23 15:43 ` Tejun Heo
@ 2024-09-24  3:40   ` K Prateek Nayak
  2024-09-24 22:21     ` Tejun Heo
  0 siblings, 1 reply; 5+ messages in thread
From: K Prateek Nayak @ 2024-09-24  3:40 UTC (permalink / raw)
  To: Tejun Heo; +Cc: void, linux-kernel, kernel-team, sched-ext, peterz, Pat Somaru

Hello Tejun,

Just seeking some clarification here; the reasoning to bypass SM_IDLE
fast-path looks sound otherwise.

On 9/23/2024 9:13 PM, Tejun Heo wrote:
> Applied to sched_ext/for-6.12-fixes with minor edits:
> ------ 8< ------
>  From edf1c586e92675c4e0eb27758fcdb55a56838de1 Mon Sep 17 00:00:00 2001
> From: Pat Somaru <patso@likewhatevs.io>
> Date: Fri, 20 Sep 2024 15:41:59 -0400
> Subject: [PATCH] sched, sched_ext: Disable SM_IDLE/rq empty path when
>   scx_enabled()
> 
> Disable the rq empty path when scx is enabled. SCX must consult the BPF
> scheduler (via the dispatch path in balance) to determine if rq is empty.
> 
> This fixes stalls when scx is enabled.
> 
> Signed-off-by: Pat Somaru <patso@likewhatevs.io>
> Fixes: 3dcac251b066 ("sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()")
> Signed-off-by: Tejun Heo <tj@kernel.org>
> ---
>   kernel/sched/core.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b6cc1cf499d6..43e453ab7e20 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6591,7 +6591,8 @@ static void __sched notrace __schedule(int sched_mode)
>   	 */
>   	prev_state = READ_ONCE(prev->__state);
>   	if (sched_mode == SM_IDLE) {
> -		if (!rq->nr_running) {
> +		/* SCX must consult the BPF scheduler to tell if rq is empty */

I was wondering if sched_ext case could simply do:

		if (scx_enabled())
			prev_balance(rq, prev, rf);

and use "rq->scx.flags" to skip balancing in balance_scx() later when
__pick_next_task() calls prev_balance() but (and please correct me if
I'm wrong here) balance_scx() calls balance_one() which can call
consume_dispatch_q() to pick a task from global / user-defined dispatch
queue, and in doing so, it does not update "rq->nr_running".

I could only see add_nr_running() being called from enqueue_task_scx()
and this is even before the ext core calls do_enqueue_task() which hooks
into the bpf layer which makes the decision where the task actually
goes.

Is my understanding correct that whichever CPU is the target for the
enqueue_task_scx() callback initially is the one that accounts the
enqueue in "rq->nr_running" until the task is dequeued or did I miss
something?

> +		if (!rq->nr_running && !scx_enabled()) {
>   			next = prev;
>   			goto picked;
>   		}

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled
  2024-09-24  3:40   ` K Prateek Nayak
@ 2024-09-24 22:21     ` Tejun Heo
  2024-09-25  3:17       ` K Prateek Nayak
  0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2024-09-24 22:21 UTC (permalink / raw)
  To: K Prateek Nayak
  Cc: void, linux-kernel, kernel-team, sched-ext, peterz, Pat Somaru

Hello,

On Tue, Sep 24, 2024 at 09:10:02AM +0530, K Prateek Nayak wrote:
> >   	prev_state = READ_ONCE(prev->__state);
> >   	if (sched_mode == SM_IDLE) {
> > -		if (!rq->nr_running) {
> > +		/* SCX must consult the BPF scheduler to tell if rq is empty */
> 
> I was wondering if sched_ext case could simply do:
> 
> 		if (scx_enabled())
> 			prev_balance(rq, prev, rf);
> 
> and use "rq->scx.flags" to skip balancing in balance_scx() later when
> __pick_next_task() calls prev_balance() but (and please correct me if
> I'm wrong here) balance_scx() calls balance_one() which can call
> consume_dispatch_q() to pick a task from global / user-defined dispatch
> queue, and in doing so, it does not update "rq->nr_running".

Hmm... would that be a meaningful optimization? prev_balance() calls into
SCX's dispatch path and there can be quite a bit going on there. I'm not
sure whether it'd worth much to save a trip through __pick_next_task().

> I could only see add_nr_running() being called from enqueue_task_scx()
> and this is even before the ext core calls do_enqueue_task() which hooks
> into the bpf layer which makes the decision where the task actually
> goes.
> 
> Is my understanding correct that whichever CPU is the target for the
> enqueue_task_scx() callback initially is the one that accounts the
> enqueue in "rq->nr_running" until the task is dequeued or did I miss
> something?

Whenever a task is dispatched to a local DSQ of a CPU including from
balance_one(), if the task is not on that CPU already,
move_remote_task_to_local_dsq() is called which migrates the task to the
target CPU by deactivating and then re-activating it. As deactivating and
re-activating involves dequeueing and re-enqueueing, rq->running gets
updated accordingly.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled
  2024-09-24 22:21     ` Tejun Heo
@ 2024-09-25  3:17       ` K Prateek Nayak
  0 siblings, 0 replies; 5+ messages in thread
From: K Prateek Nayak @ 2024-09-25  3:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: void, linux-kernel, kernel-team, sched-ext, peterz, Pat Somaru

Hello Tejun,

On 9/25/2024 3:51 AM, Tejun Heo wrote:
> Hello,
> 
> On Tue, Sep 24, 2024 at 09:10:02AM +0530, K Prateek Nayak wrote:
>>>    	prev_state = READ_ONCE(prev->__state);
>>>    	if (sched_mode == SM_IDLE) {
>>> -		if (!rq->nr_running) {
>>> +		/* SCX must consult the BPF scheduler to tell if rq is empty */
>>
>> I was wondering if sched_ext case could simply do:
>>
>> 		if (scx_enabled())
>> 			prev_balance(rq, prev, rf);
>>
>> and use "rq->scx.flags" to skip balancing in balance_scx() later when
>> __pick_next_task() calls prev_balance() but (and please correct me if
>> I'm wrong here) balance_scx() calls balance_one() which can call
>> consume_dispatch_q() to pick a task from global / user-defined dispatch
>> queue, and in doing so, it does not update "rq->nr_running".
> 
> Hmm... would that be a meaningful optimization? prev_balance() calls into
> SCX's dispatch path and there can be quite a bit going on there. I'm not
> sure whether it'd worth much to save a trip through __pick_next_task().

Probably not worth it given balance_scx() is indeed very complex and can
release and re-acquire the rq-lock (I don't believe it should be a
problem in SM_IDLE path but the given he complexity, I could have easily
missed something again :)

> 
>> I could only see add_nr_running() being called from enqueue_task_scx()
>> and this is even before the ext core calls do_enqueue_task() which hooks
>> into the bpf layer which makes the decision where the task actually
>> goes.
>>
>> Is my understanding correct that whichever CPU is the target for the
>> enqueue_task_scx() callback initially is the one that accounts the
>> enqueue in "rq->nr_running" until the task is dequeued or did I miss
>> something?
> 
> Whenever a task is dispatched to a local DSQ of a CPU including from
> balance_one(), if the task is not on that CPU already,
> move_remote_task_to_local_dsq() is called which migrates the task to the
> target CPU by deactivating and then re-activating it. As deactivating and
> re-activating involves dequeueing and re-enqueueing, rq->running gets
> updated accordingly.

Ah! I gave up too soon going down the call chain. Thank you for
clarifying.

> 
> Thanks.
> 

-- 
Thanks and Regards,
Prateek

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-09-25  3:18 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-20 19:41 [PATCH sched_ext/for-6.12-fixes] Disable SM_IDLE/rq empty path when scx_enabled Pat Somaru
2024-09-23 15:43 ` Tejun Heo
2024-09-24  3:40   ` K Prateek Nayak
2024-09-24 22:21     ` Tejun Heo
2024-09-25  3:17       ` K Prateek Nayak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox