* [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0
From: Henry Huang @ 2025-01-07 4:25 UTC (permalink / raw)
To: tj, void
Cc: 谈鉴锋, Yan Yan(cailing), linux-kernel,
Henry Huang
In our environment we have various types of tasks. Some tasks can keep
running after their slice is exhausted, while others need to be dispatched
into the global DSQ for rescheduling. Therefore, we set %SCX_OPS_ENQ_LAST.

However, we encountered a problem: because put_prev_task_scx() is executed
after pick_task_scx(), @prev only has the opportunity to be dispatched into
the local DSQ in put_prev_task_scx(). Since pick_task_scx() returns NULL,
the CPU enters the idle state instead of running @prev.

Our current workaround is to set @prev->scx.slice in ops.dispatch() and to
call scx_bpf_kick_cpu(cpu, 0) to trigger a reschedule. This approach,
however, introduces some overhead.
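The workaround can be sketched roughly as follows. This is a minimal,
untested BPF sketch, not our actual scheduler: wa_dispatch, SHARED_DSQ and
the use of SCX_SLICE_DFL are illustrative assumptions.

```c
/*
 * Hypothetical ops.dispatch() sketch of the workaround: when nothing
 * else can be dispatched and @prev is still queued, refresh its slice
 * and kick the CPU so the scheduling path runs again and keeps @prev.
 */
void BPF_STRUCT_OPS(wa_dispatch, s32 cpu, struct task_struct *prev)
{
	/* try to move a queued task into the local DSQ first */
	if (scx_bpf_consume(SHARED_DSQ))
		return;

	/* nothing else to run: let @prev continue by granting a new slice */
	if (prev && (prev->scx.flags & SCX_TASK_QUEUED)) {
		prev->scx.slice = SCX_SLICE_DFL;
		/* trigger a reschedule so the refreshed slice takes effect */
		scx_bpf_kick_cpu(cpu, 0);
	}
}
```

The scx_bpf_kick_cpu() call is what makes this a workaround rather than a
fix: it forces an extra pass through the scheduling path just to pick up
the refreshed slice.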
Our solution: when %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0, it is
enough to set %SCX_RQ_BAL_KEEP in balance_one() to ensure that
pick_task_scx() picks the correct task.
Henry Huang (1):
sched_ext: keep running prev when prev->scx.slice != 0
kernel/sched/ext.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--
1.8.3.1
* [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0
From: Henry Huang @ 2025-01-07 4:25 UTC (permalink / raw)
To: tj, void
Cc: 谈鉴锋, Yan Yan(cailing), linux-kernel,
Henry Huang
When %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0,
@prev will be dispatched into the local DSQ in put_prev_task_scx().
However, pick_task_scx() is executed before put_prev_task_scx(),
so it will not pick @prev.
Set %SCX_RQ_BAL_KEEP in balance_one() to ensure that pick_task_scx()
can pick @prev.
Signed-off-by: Henry Huang <henry.hj@antgroup.com>
---
kernel/sched/ext.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 81da76a..5f6eb45 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2837,10 +2837,15 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
/*
* Didn't find another task to run. Keep running @prev unless
* %SCX_OPS_ENQ_LAST is in effect.
+ *
+ * If %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0 (configured
+ * in ops.dispatch()), @prev would be dispatched into the local DSQ in
+ * put_prev_task_scx() (executed after pick_task_scx()). Set
+ * %SCX_RQ_BAL_KEEP to ensure that pick_task_scx() picks @prev.
*/
if ((prev->scx.flags & SCX_TASK_QUEUED) &&
(!static_branch_unlikely(&scx_ops_enq_last) ||
- scx_rq_bypassing(rq))) {
+ scx_rq_bypassing(rq) || prev->scx.slice)) {
rq->scx.flags |= SCX_RQ_BAL_KEEP;
goto has_tasks;
}
--
1.8.3.1
* Re: [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0
From: Tejun Heo @ 2025-01-07 21:20 UTC (permalink / raw)
To: Henry Huang
Cc: void, 谈鉴锋, Yan Yan(cailing), linux-kernel
Hello,
On Tue, Jan 07, 2025 at 12:25:55PM +0800, Henry Huang wrote:
> When %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0,
> @prev will be dispatched into the local DSQ in put_prev_task_scx().
> However, pick_task_scx() is executed before put_prev_task_scx(),
> so it will not pick @prev.
> Set %SCX_RQ_BAL_KEEP in balance_one() to ensure that pick_task_scx()
> can pick @prev.
>
> Signed-off-by: Henry Huang <henry.hj@antgroup.com>
> ---
> kernel/sched/ext.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 81da76a..5f6eb45 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2837,10 +2837,15 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
> /*
> * Didn't find another task to run. Keep running @prev unless
> * %SCX_OPS_ENQ_LAST is in effect.
> + *
> + * If %SCX_OPS_ENQ_LAST is set and prev->scx.slice != 0 (configured
> + * in ops.dispatch()), @prev would be dispatched into the local DSQ in
> + * put_prev_task_scx() (executed after pick_task_scx()). Set
> + * %SCX_RQ_BAL_KEEP to ensure that pick_task_scx() picks @prev.
> */
> if ((prev->scx.flags & SCX_TASK_QUEUED) &&
> (!static_branch_unlikely(&scx_ops_enq_last) ||
> - scx_rq_bypassing(rq))) {
> + scx_rq_bypassing(rq) || prev->scx.slice)) {
Updating current->scx.slice from ops.dispatch() is the recommended way of
extending the current execution, and the current behavior is just buggy,
especially when scx_ops_enq_last is set.
While the above change fixes the case where ops.dispatch() updates
current->scx.slice without dispatching any task, it's still theoretically
wrong in that if ops.dispatch() updates current->scx.slice and dispatches
tasks, we should keep running current before moving on to other tasks.
To fix this properly, I think what should be done is adding something like
the following. (untested and we probably should cache SCX_TASK_QUEUED
testing result). Can you test whether the following fixes the issues you're
seeing and if so update the patch accordingly?
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 19d2699cf638..48deb5d5510e 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -2813,6 +2813,10 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
flush_dispatch_buf(rq);
+ if ((prev->scx.flags & SCX_TASK_QUEUED) && prev->scx.slice) {
+ rq->scx.flags |= SCX_RQ_BAL_KEEP;
+ goto has_tasks;
+ }
if (rq->scx.local_dsq.nr)
goto has_tasks;
if (consume_global_dsq(rq))
Thanks.
--
tejun
* Re: [PATCH v1] sched_ext: keep running prev when prev->scx.slice != 0
From: Henry Huang @ 2025-01-08 4:09 UTC (permalink / raw)
To: tj
Cc: Henry Huang, 谈鉴锋, linux-kernel, void,
Yan Yan(cailing)
On Tue, 7 Jan 2025 11:20:58 -1000, Tejun Heo wrote:
> Update current->scx.slice from ops.dispatch() is the recommended way of
> extending the current execution and the current behavior is just buggy
> especially when scx_ops_enq_last is set.
>
> While the above change fixes the case where ops.dispatch() updates
> current->scx.slice without dispatching any task, it's still theoretically
> wrong in that if ops.dispatch() updates current->scx.slice and dispatches
> tasks, we should keep running current before moving onto other tasks.
>
> To fix this properly, I think what should be done is adding something like
> the following. (untested and we probably should cache SCX_TASK_QUEUED
> testing result). Can you test whether the following fixes the issues you're
> seeing and if so update the patch accordingly?
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 19d2699cf638..48deb5d5510e 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2813,6 +2813,10 @@ static int balance_one(struct rq *rq, struct task_struct *prev)
>
> flush_dispatch_buf(rq);
>
> + if ((prev->scx.flags & SCX_TASK_QUEUED) && prev->scx.slice) {
> + rq->scx.flags |= SCX_RQ_BAL_KEEP;
> + goto has_tasks;
> + }
> if (rq->scx.local_dsq.nr)
> goto has_tasks;
> if (consume_global_dsq(rq))
Thanks, I'll try this. If it works fine, I'll update this patch soon.
--
Henry