* [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 6:24 ` K Prateek Nayak
2026-06-24 15:12 ` [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection Vincent Guittot
` (4 subsequent siblings)
5 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
If a shorter slice task can preempt current at wakeup, make sure that the
decision is not overwritten in the meantime by setting the task as the
next buddy. This still requires the waking task to be eligible when the
scheduler actually picks the next task to run.
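To illustrate what the next buddy buys here, the sketch below is a toy model, not kernel code: the structs and helpers (`toy_se`, `toy_rq`, `toy_eligible()`, `toy_pick_next()`) are hypothetical. It assumes a simplified pick that honours the buddy only while it is eligible and otherwise takes the earliest eligible deadline, which is roughly the behaviour the commit message relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entity: vruntime and deadline in ns, plus weight. */
struct toy_se {
	int64_t vruntime;
	int64_t deadline;
	int64_t weight;
};

/* Hypothetical runqueue: queued entities plus an optional next buddy. */
struct toy_rq {
	struct toy_se *se;
	size_t nr;
	struct toy_se *next; /* set at wakeup, in the spirit of set_next_buddy() */
};

/* Eligible if vruntime <= weighted average vruntime (division-free form). */
static bool toy_eligible(const struct toy_rq *rq, const struct toy_se *se)
{
	int64_t sum_wv = 0, sum_w = 0;

	for (size_t i = 0; i < rq->nr; i++) {
		sum_wv += rq->se[i].vruntime * rq->se[i].weight;
		sum_w += rq->se[i].weight;
	}
	assert(sum_w > 0);
	return sum_wv >= se->vruntime * sum_w;
}

/* Honour the buddy only while it is eligible; else earliest eligible deadline. */
static struct toy_se *toy_pick_next(struct toy_rq *rq)
{
	struct toy_se *best = NULL;

	if (rq->next && toy_eligible(rq, rq->next))
		return rq->next;

	for (size_t i = 0; i < rq->nr; i++) {
		struct toy_se *se = &rq->se[i];

		if (toy_eligible(rq, se) && (!best || se->deadline < best->deadline))
			best = se;
	}
	return best;
}
```

With three equal-weight entities {v=0, d=10}, {v=0, d=5} and {v=10, d=3}, the plain pick returns the d=5 entity, a buddy at v=0 is honoured, and a buddy at v=10 is ignored because it is not eligible.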
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d78467ec6ee1..83bce5a04f3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
preempt:
if (preempt_action == PREEMPT_WAKEUP_SHORT) {
cancel_protect_slice(se);
- clear_buddies(cfs_rq, se);
+ set_next_buddy(&p->se);
}
resched_curr_lazy(rq);
--
2.43.0
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
@ 2026-06-25 6:24 ` K Prateek Nayak
2026-06-25 12:40 ` Vincent Guittot
0 siblings, 1 reply; 20+ messages in thread
From: K Prateek Nayak @ 2026-06-25 6:24 UTC
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d78467ec6ee1..83bce5a04f3d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> preempt:
> if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> cancel_protect_slice(se);
> - clear_buddies(cfs_rq, se);
> + set_next_buddy(&p->se);
> }
On a tangential note, I just noticed set_preempt_buddy() has two unused
parameters. Seems to have been like that since it was introduced in
commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
EEVDF goals").
Perhaps this can be included in the series too as a cleanup:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7c541f27a1ed..34b3888c4ccf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9755,9 +9755,7 @@ enum preempt_wakeup_action {
PREEMPT_WAKEUP_RESCHED, /* Force reschedule. */
};
-static inline bool
-set_preempt_buddy(struct cfs_rq *cfs_rq, int wake_flags,
- struct sched_entity *pse, struct sched_entity *se)
+static inline bool set_preempt_buddy(struct cfs_rq *cfs_rq, struct sched_entity *pse)
{
/*
* Keep existing buddy if the deadline is sooner than pse.
@@ -9903,9 +9901,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
goto update;
/* Prefer picking wakee soon if appropriate. */
- if (sched_feat(NEXT_BUDDY) &&
- set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
-
+ if (sched_feat(NEXT_BUDDY) && set_preempt_buddy(cfs_rq, pse)) {
/*
* Decide whether to obey WF_SYNC hint for a new buddy. Old
* buddies are ignored as they may not be relevant to the
--
Thanks and Regards,
Prateek
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-25 6:24 ` K Prateek Nayak
@ 2026-06-25 12:40 ` Vincent Guittot
2026-06-25 12:43 ` Peter Zijlstra
0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:40 UTC
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d78467ec6ee1..83bce5a04f3d 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > preempt:
> > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > cancel_protect_slice(se);
> > - clear_buddies(cfs_rq, se);
> > + set_next_buddy(&p->se);
> > }
>
> On a tangential note, I just noticed set_preempt_buddy() has two unused
> parameters. Seems to have been like that since it was introduced in
> commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> EEVDF goals").
>
> Perhaps this can be included in the series too as a cleanup:
I would even go further and remove it. The NEXT_BUDDY feature is broken anyway.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7c541f27a1ed..34b3888c4ccf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9755,9 +9755,7 @@ enum preempt_wakeup_action {
> PREEMPT_WAKEUP_RESCHED, /* Force reschedule. */
> };
>
> -static inline bool
> -set_preempt_buddy(struct cfs_rq *cfs_rq, int wake_flags,
> - struct sched_entity *pse, struct sched_entity *se)
> +static inline bool set_preempt_buddy(struct cfs_rq *cfs_rq, struct sched_entity *pse)
> {
> /*
> * Keep existing buddy if the deadline is sooner than pse.
> @@ -9903,9 +9901,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> goto update;
>
> /* Prefer picking wakee soon if appropriate. */
> - if (sched_feat(NEXT_BUDDY) &&
> - set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
> -
> + if (sched_feat(NEXT_BUDDY) && set_preempt_buddy(cfs_rq, pse)) {
> /*
> * Decide whether to obey WF_SYNC hint for a new buddy. Old
> * buddies are ignored as they may not be relevant to the
> --
> Thanks and Regards,
> Prateek
>
2026-06-25 12:40 ` Vincent Guittot
@ 2026-06-25 12:43 ` Peter Zijlstra
0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2026-06-25 12:43 UTC
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 02:40:34PM +0200, Vincent Guittot wrote:
> On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >
> > Hello Vincent,
> >
> > On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d78467ec6ee1..83bce5a04f3d 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > > preempt:
> > > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > > cancel_protect_slice(se);
> > > - clear_buddies(cfs_rq, se);
> > > + set_next_buddy(&p->se);
> > > }
> >
> > On a tangential note, I just noticed set_preempt_buddy() has two unused
> > parameters. Seems to have been like that since it was introduced in
> > commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> > EEVDF goals").
> >
> > Perhaps this can be included in the series too as a cleanup:
>
> I would even go further and remove it. The NEXT_BUDDY feature is broken anyway.
I thought Mel wanted to try again, but he's been somewhat silent on
matters. Mel?
* [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
` (3 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Take into account the lag of the current task when updating the slice
protection, in order to ensure that the absolute value of lags remains
in the range [0 : slice+tick].
A task that already has a negative lag will see its protection reduced,
whereas a task with a positive lag keeps full slice protection.
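A minimal numeric sketch of the patched update_protect_slice(), assuming a plain `delta * 1024 / weight` conversion in place of calc_delta_fair() and ignoring kernel load scaling; all `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NICE_0_LOAD 1024ULL /* assumption: unscaled nice-0 weight */

/* Wrap-safe minimum of two vruntimes, in the spirit of min_vruntime(). */
static uint64_t toy_min_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0 ? a : b;
}

/* Convert a wall-clock slice into virtual time for an entity of @weight. */
static uint64_t toy_calc_delta_fair(uint64_t delta, uint64_t weight)
{
	assert(weight > 0);
	return delta * TOY_NICE_0_LOAD / weight;
}

/*
 * Model of the patched update: the protection window starts from
 * min(vruntime, avg_vruntime) instead of vruntime, so an entity with
 * negative lag (vruntime ahead of the average) gets a shorter window.
 */
static uint64_t toy_update_protect_slice(uint64_t vprot, uint64_t vruntime,
					 uint64_t avg_vruntime, uint64_t min_slice,
					 uint64_t weight)
{
	uint64_t start = toy_min_vruntime(vruntime, avg_vruntime);

	return toy_min_vruntime(vprot, start + toy_calc_delta_fair(min_slice, weight));
}
```

For weight 1024, a 3 ms minimum slice and avg_vruntime at 10 ms, an entity at 12 ms (negative lag) gets vprot = 13 ms instead of 15 ms, while an entity at 8 ms (positive lag) keeps the full window up to 11 ms.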
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83bce5a04f3d..8639086e5d9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1099,8 +1099,9 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
u64 slice = cfs_rq_min_slice(cfs_rq);
+ u64 vruntime = min_vruntime(se->vruntime, avg_vruntime(cfs_rq));
- se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(slice, se));
+ se->vprot = min_vruntime(se->vprot, vruntime + calc_delta_fair(slice, se));
}
static inline bool protect_slice(struct sched_entity *se)
--
2.43.0
* [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
2026-06-24 15:12 ` [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
` (2 subsequent siblings)
5 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Even if resched is already set, we might want to update or even cancel
the slice protection and ensure that the newly waking task will be the
next one to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8639086e5d9e..854f3a9f1d80 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9804,7 +9804,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(rq->curr))
+ if (!sched_feat(PREEMPT_SHORT) && test_tsk_need_resched(rq->curr))
return;
if (!sched_feat(WAKEUP_PREEMPTION))
--
2.43.0
* [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (2 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 6:00 ` K Prateek Nayak
2026-06-24 15:12 ` [PATCH 5/6 v3] sched/eevdf: Always update slice protection Vincent Guittot
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
5 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
If a short slice task is eligible but will not be the next one to be
picked, cancel the slice protection so that the short slice task becomes
the next to run sooner.
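The eligibility test this patch relies on can be sketched as below. This is a hypothetical flattened model: keys are vruntime minus zero_vruntime, curr is simply one of the entries, and cancelling protection is modelled as pulling vprot back to current's vruntime. The division-free comparison follows the approach of the kernel's vruntime_eligible(), but every name and structure here is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened view of a cfs_rq entry. */
struct toy_ent {
	int64_t key;    /* vruntime - zero_vruntime */
	int64_t weight;
};

/* key <= \Sum key_i*w_i / \Sum w_i, rewritten to avoid the division. */
static bool toy_entity_eligible(const struct toy_ent *ents, size_t nr, int64_t key)
{
	int64_t sum_wkey = 0, sum_w = 0;

	for (size_t i = 0; i < nr; i++) {
		sum_wkey += ents[i].key * ents[i].weight;
		sum_w += ents[i].weight;
	}
	assert(sum_w > 0);
	return sum_wkey >= key * sum_w;
}

/*
 * Decision sketched by this patch: when the wakee has a shorter slice and
 * is eligible, drop current's protection even if the wakee is not the next
 * entity to be picked. Cancelling is modelled as vprot = vruntime.
 */
static uint64_t toy_vprot_after_wakeup(uint64_t curr_vruntime, uint64_t curr_vprot,
				       bool preempt_short, bool wakee_eligible)
{
	if (preempt_short && wakee_eligible)
		return curr_vruntime; /* protection cancelled */
	return curr_vprot;
}
```

With keys {-4, 0, 4} and weights {1, 1, 2}, the weighted average key is 1, so an entity at key 1 is eligible and one at key 2 is not.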
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 854f3a9f1d80..719aa53851e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9816,18 +9816,13 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
cse_is_idle = se_is_idle(se);
pse_is_idle = se_is_idle(pse);
+ nse = se;
/*
* Preempt an idle entity in favor of a non-idle entity (and don't preempt
* in the inverse case).
*/
- if (cse_is_idle && !pse_is_idle) {
- /*
- * When non-idle entity preempt an idle entity,
- * don't give idle entity slice protection.
- */
- preempt_action = PREEMPT_WAKEUP_SHORT;
+ if (cse_is_idle && !pse_is_idle)
goto preempt;
- }
if (cse_is_idle != pse_is_idle)
return;
@@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if (!nse && cfs_rq->nr_queued)
goto pick;
+ /*
+ * If @p is eligible but not the next task to run then cancel protection
+ * to prevent large scheduling latency
+ */
+ if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
+ goto preempt;
+
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
return;
preempt:
- if (preempt_action == PREEMPT_WAKEUP_SHORT) {
- cancel_protect_slice(se);
+ cancel_protect_slice(se);
+
+ if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
set_next_buddy(&p->se);
- }
resched_curr_lazy(rq);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 20+ messages in thread* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
@ 2026-06-25 6:00 ` K Prateek Nayak
2026-06-25 12:40 ` Vincent Guittot
0 siblings, 1 reply; 20+ messages in thread
From: K Prateek Nayak @ 2026-06-25 6:00 UTC
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> if (!nse && cfs_rq->nr_queued)
> goto pick;
>
> + /*
> + * If @p is eligible but not the next task to run then cancel protection
> + * to prevent large scheduling latency
> + */
> + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
> + goto preempt;
We handle the "pse->slice < se->slice" case before the "pse->sched_delayed"
case and jump to "pick", but pse can get dequeued as part of
pick_next_entity() if it was delayed and picked.
I think we can reach here for PREEMPT_WAKEUP_SHORT after pse has been
completely dequeued from the cfs_rq. If p is a task on the root cfs_rq, we
could have blocked the task entirely, and ideally it shouldn't be referenced
here.
Since a wakeup of a delayed entity / on a delayed hierarchy will call
wakeup_preempt() anyway, I think we should return early (or jump directly
to update) if we see "pse->sched_delayed".
> +
> if (sched_feat(RUN_TO_PARITY))
> update_protect_slice(cfs_rq, se);
>
> return;
>
> preempt:
> - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> - cancel_protect_slice(se);
> + cancel_protect_slice(se);
> +
> + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
> set_next_buddy(&p->se);
> - }
>
> resched_curr_lazy(rq);
> }
--
Thanks and Regards,
Prateek
* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-25 6:00 ` K Prateek Nayak
@ 2026-06-25 12:40 ` Vincent Guittot
0 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:40 UTC
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 08:00, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > if (!nse && cfs_rq->nr_queued)
> > goto pick;
> >
> > + /*
> > + * If @p is eligible but not the next task to run then cancel protection
> > + * to prevent large scheduling latency
> > + */
> > + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
> > + goto preempt;
>
> We handle "pse->slice < se->slice" case before "pse->sched_delayed" case
> and jump to "pick", but pse can get dequeued as a part of
> pick_next_entity() if it was delayed and picked.
yes
>
> I think we can reach here for PREEMPT_WAKEUP_SHORT after pse is
> completely dequeued from cfs_rq. If p is a task on root cfs_rq, we could
> have blocked the task entirely and ideally it shouldn't be referenced
> here.
I tried to find which use case calls wakeup_preempt_fair() for a
sched_delayed task but couldn't find one.
>
> Since a wakeup of a delayed entity / on a delayed hierarchy will call
> wakeup_preempt() anyway, I think we should return early (or jump directly
> to update) if we see "pse->sched_delayed".
Fair enough. But we still need to consider the FORK case.
>
> > +
> > if (sched_feat(RUN_TO_PARITY))
> > update_protect_slice(cfs_rq, se);
> >
> > return;
> >
> > preempt:
> > - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > - cancel_protect_slice(se);
> > + cancel_protect_slice(se);
> > +
> > + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
> > set_next_buddy(&p->se);
> > - }
> >
> > resched_curr_lazy(rq);
> > }
>
> --
> Thanks and Regards,
> Prateek
>
* [PATCH 5/6 v3] sched/eevdf: Always update slice protection
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (3 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
5 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Even if p will not preempt current, it modifies avg_vruntime and
possibly the min slice. Make sure to update the slice protection with the
updated figures. For example, SCHED_BATCH and SCHED_IDLE tasks can
otherwise accumulate a lag larger than their slice and eventually delay
the scheduling of a normal task whose deadline will be later.
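To see why a non-preempting wakeup still matters, here is a small model (hypothetical names, keys relative to zero_vruntime) of the weighted average before and after a wakee is enqueued. The floor-style rounding follows the "sign flips effective floor / ceiling" trick used elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued entity. */
struct toy_ent {
	int64_t key;    /* vruntime - zero_vruntime */
	int64_t weight;
};

/* Weighted average key, rounded toward -infinity. */
static int64_t toy_avg_key(const struct toy_ent *ents, size_t nr)
{
	int64_t sum_wkey = 0, sum_w = 0;

	for (size_t i = 0; i < nr; i++) {
		sum_wkey += ents[i].key * ents[i].weight;
		sum_w += ents[i].weight;
	}
	assert(sum_w > 0);

	/* C division truncates toward zero; bias negative values to get a floor. */
	if (sum_wkey < 0)
		sum_wkey -= sum_w - 1;
	return sum_wkey / sum_w;
}
```

Enqueuing a third equal-weight entity at key -20 moves the average from 5 to -4 even though nothing is preempted. Since the previous patch starts the protection window from min(vruntime, avg_vruntime), a protection computed from the old figures is stale.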
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 719aa53851e4..f972987618e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9824,17 +9824,18 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if (cse_is_idle && !pse_is_idle)
goto preempt;
+ cfs_rq = cfs_rq_of(se);
+ update_curr(cfs_rq);
+
if (cse_is_idle != pse_is_idle)
- return;
+ goto update;
/*
* BATCH and IDLE tasks do not preempt others.
*/
if (unlikely(!normal_policy(p->policy)))
- return;
+ goto update;
- cfs_rq = cfs_rq_of(se);
- update_curr(cfs_rq);
/*
* If @p has a shorter slice than current and @p is eligible, override
* current's slice protection in order to allow preemption.
@@ -9851,7 +9852,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* EEVDF to forcibly queue an ineligible task.
*/
if ((wake_flags & WF_FORK) || pse->sched_delayed)
- return;
+ goto update;
/* Prefer picking wakee soon if appropriate. */
if (sched_feat(NEXT_BUDDY) &&
@@ -9897,7 +9898,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
*/
if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
goto preempt;
-
+update:
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
--
2.43.0
* [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (4 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 5/6 v3] sched/eevdf: Always update slice protection Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:33 ` Peter Zijlstra
5 siblings, 2 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
When a task with a shorter slice is enqueued, protect the running task,
which has a longer slice, only until it becomes ineligible instead of for
a full slice. This speeds up switching to other tasks until the task with
the shortest slice is scheduled, and keeps that task from waiting for too
many full slices before it runs.
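The V' formula in the comment below can be exercised in isolation. This sketch is a stand-alone model of eligible_vruntime() that takes the cfs_rq fields as plain integers; it assumes an unscaled weight (the patch goes through avg_vruntime_weight(), which is not modelled here) and uses C division in place of div64_long(). The `+ 1` is kept as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of eligible_vruntime(): @sum_w_vruntime and
 * @sum_weight describe the other queued entities (se is not in the tree),
 * @zero_vruntime is v0 and @se_weight is w_se.
 */
static uint64_t toy_eligible_vruntime(uint64_t zero_vruntime, int64_t sum_w_vruntime,
				      int64_t sum_weight, int64_t se_weight)
{
	int64_t delta = 0;

	assert(sum_weight >= 0 && se_weight > 0);

	if (sum_weight) {
		int64_t runtime = sum_w_vruntime;
		int64_t weight = sum_weight + se_weight;

		/* C division truncates toward zero; bias negative values to get a floor. */
		if (runtime < 0)
			runtime -= weight - 1;

		delta = runtime / weight;
	}
	/* With no other entity, delta stays 0 and the average is v0 itself. */

	/* Unsigned addition wraps, matching u64 vruntime arithmetic. */
	return zero_vruntime + (uint64_t)delta + 1;
}
```

For example, with sum_w_vruntime = 100, sum_weight = 10, w_se = 10 and v0 = 1000, the result is 1000 + 100/20 + 1 = 1006; with sum_w_vruntime = -15 and weights 2 + 2, the floor of -15/4 is -4, giving 997.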
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 52 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f972987618e7..7c541f27a1ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -813,6 +813,48 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
return cfs_rq->zero_vruntime;
}
+/*
+ * Compute the vruntime until which the entity remains eligible when it runs
+ * or is about to run on the CPU. We use this value to set vprot to the min
+ * value until which other entities would not be picked anyway.
+ * \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
+ * \Sum w_i
+ *
+ * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
+ * in the rb tree and next is already dequeued so
+ *
+ * cfs_rq->sum_w_vruntime
+ * V' = ------------------------- + v0
+ * cfs_rq->sum_weight + w_se
+
+ */
+static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ struct sched_entity *curr = cfs_rq->curr;
+ long weight = cfs_rq->sum_weight;
+ s64 delta = 0;
+
+ if (weight) {
+ s64 runtime = cfs_rq->sum_w_vruntime;
+
+ weight += avg_vruntime_weight(cfs_rq, se->load.weight);
+
+ /* sign flips effective floor / ceiling */
+ if (runtime < 0)
+ runtime -= (weight - 1);
+
+ delta = div64_long(runtime, weight);
+ } else {
+ /*
+ * When there is but one element, it is the average.
+ */
+ delta = 0;
+ }
+
+ return cfs_rq->zero_vruntime + delta + 1;
+}
+
static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
/*
@@ -1090,8 +1132,14 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
slice = cfs_rq_min_slice(cfs_rq);
slice = min(slice, se->slice);
- if (slice != se->slice)
- vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+
+ /* If there are shorter slices than se's one */
+ if (slice != se->slice) {
+ if (sched_feat(PREEMPT_SHORT))
+ vprot = min_vruntime(vprot, eligible_vruntime(cfs_rq, se));
+ else
+ vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+ }
se->vprot = vprot;
}
--
2.43.0
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
@ 2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 12:51 ` Vincent Guittot
2026-06-25 8:33 ` Peter Zijlstra
1 sibling, 2 replies; 20+ messages in thread
From: K Prateek Nayak @ 2026-06-25 7:37 UTC
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> +/*
> + * Compute the vruntime until which the entity remains eligible when it runs
> + * or is about to run on the CPU. We use this value to set vprot to the min
> + * value until which other entities would not be picked anyway.
> + * \Sum (v_i - v0)*w_i
> + * V = ------------------- + v0
> + * \Sum w_i
> + *
> + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> + * in the rb tree and next is already dequeued so
> + *
> + * cfs_rq->sum_w_vruntime
> + * V' = ------------------------- + v0
> + * cfs_rq->sum_weight + w_se
> +
nit.
^ Is that a stray line, or is a leading "*" missing from that comment
line?
> + */
> +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> + struct sched_entity *curr = cfs_rq->curr;
curr seems to be unused here and is NULL anyways when
set_protect_slice() is called ;-)
> + long weight = cfs_rq->sum_weight;
> + s64 delta = 0;
> +
> + if (weight) {
> + s64 runtime = cfs_rq->sum_w_vruntime;
> +
> + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> +
> + /* sign flips effective floor / ceiling */
> + if (runtime < 0)
> + runtime -= (weight - 1);
> +
> + delta = div64_long(runtime, weight);
> + } else {
> + /*
> + * When there is but one element, it is the average.
> + */
> + delta = 0;
Even with a single entity, se->vruntime can still diverge from
cfs_rq->zero_vruntime.
The last avg_vruntime() call for the cfs_rq was in update_entity_lag() during
the last dequeue, while se->on_rq was still set for the dequeuing entity.
Should this be entity_key(cfs_rq, se) instead?
> + }
> +
> + return cfs_rq->zero_vruntime + delta + 1;
> +}
> +
> static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
>
> /*
--
Thanks and Regards,
Prateek
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 7:37 ` K Prateek Nayak
@ 2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 10:09 ` Peter Zijlstra
2026-06-25 14:55 ` Vincent Guittot
2026-06-25 12:51 ` Vincent Guittot
1 sibling, 2 replies; 20+ messages in thread
From: Peter Zijlstra @ 2026-06-25 8:37 UTC
To: K Prateek Nayak
Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
>
> curr seems to be unused here and is NULL anyways when
> set_protect_slice() is called ;-)
Ah, but it is not with the flat patches on, which is why I was a little
confused ;-)
That said; I now see se == curr. So let me go have another look at all
that.
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 8:37 ` Peter Zijlstra
@ 2026-06-25 10:09 ` Peter Zijlstra
2026-06-25 12:57 ` Vincent Guittot
2026-06-25 14:55 ` Vincent Guittot
1 sibling, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2026-06-25 10:09 UTC
To: K Prateek Nayak
Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
>
> > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > +{
> > > + struct sched_entity *curr = cfs_rq->curr;
> >
> > curr seems to be unused here and is NULL anyways when
> > set_protect_slice() is called ;-)
>
> Ah, but it is not with the flat patches on, which is why I was a little
> confused ;-)
>
> That said; I now see se == curr. So let me go have another look at all
> that.
I might be slow -- it is definitely way too warm already -- but I'm not
seeing why you don't want avg_vruntime() here.
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 10:09 ` Peter Zijlstra
@ 2026-06-25 12:57 ` Vincent Guittot
2026-06-25 12:59 ` Vincent Guittot
0 siblings, 1 reply; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:57 UTC
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 12:10, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> >
> > > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > +{
> > > > + struct sched_entity *curr = cfs_rq->curr;
> > >
> > > curr seems to be unused here and is NULL anyways when
> > > set_protect_slice() is called ;-)
> >
> > Ah, but it is not with the flat patches on, which is why I was a little
> > confused ;-)
> >
> > That said; I now see se == curr. So let me go have another look at all
> > that.
>
> I might be slow -- it is definitely way too warm already -- but I'm not
> seeing why you don't want avg_vruntime() here.
It is somewhat related to avg_vruntime(), except that I don't want the
current avg_vruntime but the avg_vruntime at the point where entity_key(se)
will be zero and se will become ineligible.
If I use the current avg_vruntime(), then once se has run enough for its
vruntime to reach the (now old) avg_vruntime, the new avg_vruntime will have
moved forward and se's vruntime will still be eligible.
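The argument above can be checked with a small model (hypothetical names): the other queued entities are summarised by their weighted key sum S and total weight W, and se has weight w and key k relative to zero_vruntime, so the average key is (S + k*w) / (W + w).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Weighted average key with se at key @k: (S + k*w) / (W + w), floored. */
static int64_t toy_avg_with_se(int64_t S, int64_t W, int64_t k, int64_t w)
{
	int64_t num = S + k * w;
	int64_t den = W + w;

	assert(den > 0);
	if (num < 0)
		num -= den - 1; /* floor for negative numerators */
	return num / den;
}

/* se is eligible while k <= (S + k*w) / (W + w), checked without division. */
static bool toy_se_eligible(int64_t S, int64_t W, int64_t k, int64_t w)
{
	return S + k * w >= k * (W + w);
}
```

With S = 100, W = 10 and w = 10, the average is 5 while se sits at key 0; once se has run to key 5, the average has moved to 7 (7.5 rounded down) and se is still eligible, which is the effect described above.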
>
>
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 12:57 ` Vincent Guittot
@ 2026-06-25 12:59 ` Vincent Guittot
0 siblings, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:59 UTC
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 14:57, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 25 Jun 2026 at 12:10, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> > > On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> > >
> > > > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > > +{
> > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > >
> > > > curr seems to be unused here and is NULL anyways when
> > > > set_protect_slice() is called ;-)
> > >
> > > Ah, but it is not with the flat patches on, which is why I was a little
> > > confused ;-)
> > >
> > > That said; I now see se == curr. So let me go have another look at all
> > > that.
> >
> > I might be slow -- it is definitely way too warm already -- but I'm not
> > seeing why you don't want avg_vruntime() here.
>
> It is somewhat related to avg_vruntime(), except that I don't want the
> current avg_vruntime but the avg_vruntime at the point where entity_key(se)
> will be zero and se will become ineligible.
>
> If I use the current avg_vruntime(), then once se has run enough for its
> vruntime to reach the (now old) avg_vruntime, the new avg_vruntime will have
> moved forward and se's vruntime will still be eligible.
And I should rename it ineligible_vruntime() because of the +1.
>
>
> >
> >
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 10:09 ` Peter Zijlstra
@ 2026-06-25 14:55 ` Vincent Guittot
1 sibling, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 14:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 10:37, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
>
> > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > +{
> > > + struct sched_entity *curr = cfs_rq->curr;
> >
> > curr seems to be unused here and is NULL anyways when
> > set_protect_slice() is called ;-)
>
> Ah, but it is not with the flat patches on, which is why I was a little
> confused ;-)
Yeah, the flat hierarchy adds another level of complexity to tracking
scheduling latency; that will be the next step.
>
> That said; I now see se == curr. So let me go have another look at all
> that.
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:37 ` Peter Zijlstra
@ 2026-06-25 12:51 ` Vincent Guittot
1 sibling, 0 replies; 20+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:51 UTC (permalink / raw)
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 09:37, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > +/*
> > + * Compute the vruntime until which the entity remains eligible when it runs
> > + * or is about to run on the CPU. We use this value to set vprot to the min
> > + * value until which other entities would not be picked anyway.
> > + * \Sum (v_i - v0)*w_i
> > + * V = ------------------- + v0
> > + * \Sum w_i
> > + *
> > + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> > + * in the rb tree and next is already dequeued so
> > + *
> > + * cfs_rq->sum_w_vruntime
> > + * V' = ------------------------- + v0
> > + * cfs_rq->sum_weight + w_se
> > +
>
> nit.
>
> ^ is that a stray line or a Missing * at the beginning of the comment
> line?
yes
>
> > + */
> > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
>
> curr seems to be unused here and is NULL anyways when
> set_protect_slice() is called ;-)
Yeah, I remember seeing the warning and forgot to remove it
>
> > + long weight = cfs_rq->sum_weight;
> > + s64 delta = 0;
> > +
> > + if (weight) {
> > + s64 runtime = cfs_rq->sum_w_vruntime;
> > +
> > + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> > +
> > + /* sign flips effective floor / ceiling */
> > + if (runtime < 0)
> > + runtime -= (weight - 1);
> > +
> > + delta = div64_long(runtime, weight);
> > + } else {
> > + /*
> > + * When there is but one element, it is the average.
> > + */
> > + delta = 0;
>
> Even with a single entity, the se->vruntime can still diverge from
> cfs_rq->zero_vruntime
>
> The last avg_vruntime() call for cfs_rq was at update_entity_lag() during
> the last dequeue, while se->on_rq was still set for the dequeuing entity.
>
> Should this be entity_key(cfs_rq, se) instead?
Probably, although that case should never happen.
>
> > + }
> > +
> > + return cfs_rq->zero_vruntime + delta + 1;
> > +}
> > +
> > static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
> >
> > /*
> --
> Thanks and Regards,
> Prateek
>
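A userspace sketch of the arithmetic in the posted eligible_vruntime(), with the single-entity branch changed to use se's own key as suggested in review. `toy_cfs_rq`, `floor_div()` and `toy_ineligible_vruntime()` are hypothetical names; weights are taken as-is instead of going through avg_vruntime_weight().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cfs_rq fields the patch reads. */
struct toy_cfs_rq {
	uint64_t zero_vruntime;  /* v0 */
	int64_t sum_w_vruntime;  /* \Sum (v_i - v0) * w_i, se excluded */
	long sum_weight;         /* \Sum w_i, se excluded */
};

/* Floor division for den > 0; plain C '/' truncates toward zero. */
static int64_t floor_div(int64_t num, int64_t den)
{
	if (num < 0)
		num -= den - 1;
	return num / den;
}

static uint64_t toy_ineligible_vruntime(const struct toy_cfs_rq *cfs_rq,
					uint64_t se_vruntime, long se_weight)
{
	int64_t delta;

	if (cfs_rq->sum_weight) {
		/* V' - v0 with se counted at key 0 (adds weight only). */
		delta = floor_div(cfs_rq->sum_w_vruntime,
				  cfs_rq->sum_weight + se_weight);
	} else {
		/* Only se: use its own key rather than assuming it sits at v0. */
		delta = (int64_t)(se_vruntime - cfs_rq->zero_vruntime);
	}

	/* The +1 is why the thread suggests the name ineligible_vruntime. */
	return cfs_rq->zero_vruntime + delta + 1;
}
```

With v0 = 1000, sum_w_vruntime = 102400 and sum_weight = 1024, a weight-1024 se gives 1000 + 50 + 1 = 1051; a negative sum such as -102401 floors to -51 rather than truncating to -50.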
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
2026-06-25 7:37 ` K Prateek Nayak
@ 2026-06-25 8:33 ` Peter Zijlstra
1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2026-06-25 8:33 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, linux-kernel, qyousef
On Wed, Jun 24, 2026 at 05:12:29PM +0200, Vincent Guittot wrote:
> When a task with a shorter slice is enqueued, we protect the running
> task, which has a longer slice, only until it becomes ineligible instead
> of for a full slice. This speeds up switching to other tasks until the
> task with the shortest slice is scheduled, and helps that task avoid
> waiting for too many full slices before running.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> kernel/sched/fair.c | 52 +++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f972987618e7..7c541f27a1ed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -813,6 +813,48 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
> return cfs_rq->zero_vruntime;
> }
>
> +/*
> + * Compute the vruntime until which the entity remains eligible when it runs
> + * or is about to run on the CPU. We use this value to set vprot to the min
> + * value until which other entities would not be picked anyway.
> + * \Sum (v_i - v0)*w_i
> + * V = ------------------- + v0
> + * \Sum w_i
> + *
> + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> + * in the rb tree and next is already dequeued so
> + *
> + * cfs_rq->sum_w_vruntime
> + * V' = ------------------------- + v0
> + * cfs_rq->sum_weight + w_se
> +
> + */
> +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> + struct sched_entity *curr = cfs_rq->curr;
> + long weight = cfs_rq->sum_weight;
> + s64 delta = 0;
'curr' goes unused in this function, did you want:
if (curr && !curr->on_rq)
curr = NULL;
> +
> + if (weight) {
> + s64 runtime = cfs_rq->sum_w_vruntime;
if (curr) {
unsigned long w = avg_vruntime_weight(cfs_rq, curr->load.weight);
runtime += entity_key(cfs_rq, curr) * w;
weight += w;
}
?
> +
> + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> +
> + /* sign flips effective floor / ceiling */
> + if (runtime < 0)
> + runtime -= (weight - 1);
> +
> + delta = div64_long(runtime, weight);
> + } else {
> + /*
> + * When there is but one element, it is the average.
> + */
> + delta = 0;
> + }
> +
> + return cfs_rq->zero_vruntime + delta + 1;
> +}
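A userspace sketch of the variant suggested inline above, in which a queued curr contributes its key and weight, similar to how avg_vruntime() folds in curr. All `toy_*` names are hypothetical and weights are unscaled. As noted elsewhere in the thread, with the flat patches se can be curr, in which case this sketch would add that entity's weight twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_se {
	uint64_t vruntime;
	long weight;
	bool on_rq;
};

struct toy_cfs_rq {
	uint64_t zero_vruntime;  /* v0 */
	int64_t sum_w_vruntime;  /* \Sum (v_i - v0) * w_i over the rb tree */
	long sum_weight;         /* \Sum w_i over the rb tree */
	struct toy_se *curr;     /* running entity, not in the rb tree */
};

static uint64_t toy_eligible_vruntime(const struct toy_cfs_rq *cfs_rq,
				      const struct toy_se *se)
{
	const struct toy_se *curr = cfs_rq->curr;
	long weight = cfs_rq->sum_weight;
	int64_t delta = 0;

	/* A curr that is no longer queued does not contribute. */
	if (curr && !curr->on_rq)
		curr = NULL;

	if (weight) {
		int64_t runtime = cfs_rq->sum_w_vruntime;

		/* Fold in curr's key and weight. */
		if (curr) {
			runtime += (int64_t)(curr->vruntime - cfs_rq->zero_vruntime) *
				   curr->weight;
			weight += curr->weight;
		}

		/* se is counted at key 0: it adds weight but no runtime. */
		weight += se->weight;

		/* C division truncates toward zero; bias negatives to floor. */
		if (runtime < 0)
			runtime -= weight - 1;

		delta = runtime / weight;
	}

	return cfs_rq->zero_vruntime + delta + 1;
}
```

With a dequeued curr the result matches the original formula (1051 for the numbers used earlier); once a curr at key 100 with weight 1024 is queued, the divisor grows to 3072 and the result becomes 1000 + 66 + 1 = 1067.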