* [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 6:24 ` K Prateek Nayak
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2026-06-24 15:12 ` [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection Vincent Guittot
` (4 subsequent siblings)
5 siblings, 2 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
If a shorter slice task can preempt current at wakeup, set it as the next
buddy so that the decision is not overwritten before the next pick. This
still requires the waking task to remain eligible when the scheduler
actually picks the next task to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d78467ec6ee1..83bce5a04f3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
preempt:
if (preempt_action == PREEMPT_WAKEUP_SHORT) {
cancel_protect_slice(se);
- clear_buddies(cfs_rq, se);
+ set_next_buddy(&p->se);
}
resched_curr_lazy(rq);
--
2.43.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
@ 2026-06-25 6:24 ` K Prateek Nayak
2026-06-25 12:40 ` Vincent Guittot
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
1 sibling, 1 reply; 44+ messages in thread
From: K Prateek Nayak @ 2026-06-25 6:24 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d78467ec6ee1..83bce5a04f3d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> preempt:
> if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> cancel_protect_slice(se);
> - clear_buddies(cfs_rq, se);
> + set_next_buddy(&p->se);
> }
On a tangential note, I just noticed set_preempt_buddy() has two unused
parameters. Seems to have been like that since it was introduced in
commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
EEVDF goals").
Perhaps this can be included in the series too as a cleanup:
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7c541f27a1ed..34b3888c4ccf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9755,9 +9755,7 @@ enum preempt_wakeup_action {
PREEMPT_WAKEUP_RESCHED, /* Force reschedule. */
};
-static inline bool
-set_preempt_buddy(struct cfs_rq *cfs_rq, int wake_flags,
- struct sched_entity *pse, struct sched_entity *se)
+static inline bool set_preempt_buddy(struct cfs_rq *cfs_rq, struct sched_entity *pse)
{
/*
* Keep existing buddy if the deadline is sooner than pse.
@@ -9903,9 +9901,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
goto update;
/* Prefer picking wakee soon if appropriate. */
- if (sched_feat(NEXT_BUDDY) &&
- set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
-
+ if (sched_feat(NEXT_BUDDY) && set_preempt_buddy(cfs_rq, pse)) {
/*
* Decide whether to obey WF_SYNC hint for a new buddy. Old
* buddies are ignored as they may not be relevant to the
--
Thanks and Regards,
Prateek
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-25 6:24 ` K Prateek Nayak
@ 2026-06-25 12:40 ` Vincent Guittot
2026-06-25 12:43 ` Peter Zijlstra
2026-06-26 6:54 ` Peter Zijlstra
0 siblings, 2 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:40 UTC (permalink / raw)
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index d78467ec6ee1..83bce5a04f3d 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > preempt:
> > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > cancel_protect_slice(se);
> > - clear_buddies(cfs_rq, se);
> > + set_next_buddy(&p->se);
> > }
>
> On a tangential note, I just noticed set_preempt_buddy() has two unused
> parameters. Seems to have been like that since it was introduced in
> commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> EEVDF goals").
>
> Perhaps this can be included in the series too as a cleanup:
I would even go further and remove it. The NEXT_BUDDY feature is broken anyway.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7c541f27a1ed..34b3888c4ccf 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9755,9 +9755,7 @@ enum preempt_wakeup_action {
> PREEMPT_WAKEUP_RESCHED, /* Force reschedule. */
> };
>
> -static inline bool
> -set_preempt_buddy(struct cfs_rq *cfs_rq, int wake_flags,
> - struct sched_entity *pse, struct sched_entity *se)
> +static inline bool set_preempt_buddy(struct cfs_rq *cfs_rq, struct sched_entity *pse)
> {
> /*
> * Keep existing buddy if the deadline is sooner than pse.
> @@ -9903,9 +9901,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> goto update;
>
> /* Prefer picking wakee soon if appropriate. */
> - if (sched_feat(NEXT_BUDDY) &&
> - set_preempt_buddy(cfs_rq, wake_flags, pse, se)) {
> -
> + if (sched_feat(NEXT_BUDDY) && set_preempt_buddy(cfs_rq, pse)) {
> /*
> * Decide whether to obey WF_SYNC hint for a new buddy. Old
> * buddies are ignored as they may not be relevant to the
> --
> Thanks and Regards,
> Prateek
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-25 12:40 ` Vincent Guittot
@ 2026-06-25 12:43 ` Peter Zijlstra
2026-06-26 6:54 ` Peter Zijlstra
1 sibling, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-25 12:43 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 02:40:34PM +0200, Vincent Guittot wrote:
> On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >
> > Hello Vincent,
> >
> > On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d78467ec6ee1..83bce5a04f3d 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > > preempt:
> > > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > > cancel_protect_slice(se);
> > > - clear_buddies(cfs_rq, se);
> > > + set_next_buddy(&p->se);
> > > }
> >
> > On a tangential note, I just noticed set_preempt_buddy() has two unused
> > parameters. Seems to have been like that since it was introduced in
> > commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> > EEVDF goals").
> >
> > Perhaps this can be included in the series too as a cleanup:
>
> I would even go further and remove it. The NEXT_BUDDY feature is broken anyway
I thought Mel wanted to try again, but he's been somewhat silent on
matters. Mel?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-25 12:40 ` Vincent Guittot
2026-06-25 12:43 ` Peter Zijlstra
@ 2026-06-26 6:54 ` Peter Zijlstra
2026-06-26 7:02 ` Vincent Guittot
1 sibling, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 6:54 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 02:40:34PM +0200, Vincent Guittot wrote:
> On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >
> > Hello Vincent,
> >
> > On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index d78467ec6ee1..83bce5a04f3d 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > > preempt:
> > > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > > cancel_protect_slice(se);
> > > - clear_buddies(cfs_rq, se);
> > > + set_next_buddy(&p->se);
> > > }
> >
> > On a tangential note, I just noticed set_preempt_buddy() has two unused
> > parameters. Seems to have been like that since it was introduced in
> > commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> > EEVDF goals").
> >
> > Perhaps this can be included in the series too as a cleanup:
>
> I would even go further and remove it. The NEXT_BUDDY feature is broken anyway
Anyway, the reason I mentioned using it, is that it only sets next if
the new entry has an earlier deadline.
But I suppose you want to violate the strict deadline order, but in that
case we should still order on slice length. We should not set next when
we already have one that is a shorter slice, no?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-26 6:54 ` Peter Zijlstra
@ 2026-06-26 7:02 ` Vincent Guittot
2026-06-26 8:08 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 7:02 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 08:54, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 25, 2026 at 02:40:34PM +0200, Vincent Guittot wrote:
> > On Thu, 25 Jun 2026 at 08:24, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > >
> > > Hello Vincent,
> > >
> > > On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > > index d78467ec6ee1..83bce5a04f3d 100644
> > > > --- a/kernel/sched/fair.c
> > > > +++ b/kernel/sched/fair.c
> > > > @@ -9903,7 +9903,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > > > preempt:
> > > > if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > > > cancel_protect_slice(se);
> > > > - clear_buddies(cfs_rq, se);
> > > > + set_next_buddy(&p->se);
> > > > }
> > >
> > > On a tangential note, I just noticed set_preempt_buddy() has two unused
> > > parameters. Seems to have been like that since it was introduced in
> > > commit e837456fdca8 ("sched/fair: Reimplement NEXT_BUDDY to align with
> > > EEVDF goals").
> > >
> > > Perhaps this can be included in the series too as a cleanup:
> >
> > I would even go further and remove it. The NEXT_BUDDY feature is broken anyway
>
> Anyway, the reason I mentioned using it, is that it only sets next if
> the new entry has an earlier deadline.
>
> But I suppose you want to violate the strict deadline order, but in that
yeah, and make sure that next is eligible
> case we should still order on slice length. We should not set next when
> we already have one that is a shorter slice, no?
set_next_buddy() is only called for PREEMPT_WAKEUP_SHORT which is only
used when (pse->slice < se->slice) with this patchset
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-26 7:02 ` Vincent Guittot
@ 2026-06-26 8:08 ` Peter Zijlstra
2026-06-26 8:55 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 8:08 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, Jun 26, 2026 at 09:02:09AM +0200, Vincent Guittot wrote:
> > But I suppose you want to violate the strict deadline order, but in that
>
> yeah, and make sure that next is eligible
>
> > case we should still order on slice length. We should not set next when
> > we already have one that is a shorter slice, no?
>
> set_next_buddy() is only called for PREEMPT_WAKEUP_SHORT which is only
> used when (pse->slice < se->slice) with this patchset
Right, but you can get multiple wakeups, and in that case you want to
preserve the shortest slice, right?
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short
2026-06-26 8:08 ` Peter Zijlstra
@ 2026-06-26 8:55 ` Vincent Guittot
0 siblings, 0 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 8:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 10:08, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 26, 2026 at 09:02:09AM +0200, Vincent Guittot wrote:
>
> > > But I suppose you want to violate the strict deadline order, but in that
> >
> > yeah, and make sure that next is eligible
> >
> > > case we should still order on slice length. We should not set next when
> > > we already have one that is a shorter slice, no?
> >
> > set_next_buddy() is only called for PREEMPT_WAKEUP_SHORT which is only
> > used when (pse->slice < se->slice) with this patchset
>
> Right, but you can get multiple wakeups, and in that case you want to
> preserve the shortest slice, right?
Fair enough
^ permalink raw reply [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/fair: Set next buddy for preempt short
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
2026-06-25 6:24 ` K Prateek Nayak
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 8f97b627ff1c9b0f2eba71b326ac09326bf6b4ea
Gitweb: https://git.kernel.org/tip/8f97b627ff1c9b0f2eba71b326ac09326bf6b4ea
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:24 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:54 +02:00
sched/fair: Set next buddy for preempt short
If a shorter slice task can preempt current at wakeup, set it as the next
buddy so that the decision is not overwritten before the next pick. This
still requires the waking task to remain eligible when the scheduler
actually picks the next task to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-2-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 90d7f83..976cc3c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9757,6 +9757,15 @@ static inline bool set_preempt_buddy(struct cfs_rq *cfs_rq, struct sched_entity
return true;
}
+static inline bool set_short_buddy(struct cfs_rq *cfs_rq, struct sched_entity *pse)
+{
+ if (cfs_rq->next && cfs_rq->next->slice < pse->slice)
+ return false;
+
+ set_next_buddy(cfs_rq, pse);
+ return true;
+}
+
/*
* WF_SYNC|WF_TTWU indicates the waker expects to sleep but it is not
* strictly enforced because the hint is either misunderstood or
@@ -9931,7 +9940,7 @@ pick:
preempt:
if (preempt_action == PREEMPT_WAKEUP_SHORT) {
cancel_protect_slice(se);
- clear_buddies(cfs_rq, se);
+ set_short_buddy(cfs_rq, pse);
}
resched_curr_lazy(rq);
^ permalink raw reply related [flat|nested] 44+ messages in thread
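As a rough illustration of the rule that set_short_buddy() adds in the merged
version above (the multiple-wakeup case raised in review), here is a small
user-space C model. It is a sketch under simplifying assumptions, not kernel
code: the model_* structures and functions are hypothetical stand-ins for
struct sched_entity and struct cfs_rq, and eligibility is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct sched_entity / struct cfs_rq. */
struct model_se {
	uint64_t slice;		/* requested slice, in ns */
};

struct model_cfs_rq {
	struct model_se *next;	/* next buddy, NULL when unset */
};

/* Same comparison as set_short_buddy() in the diff above. */
static inline bool model_set_short_buddy(struct model_cfs_rq *cfs_rq,
					 struct model_se *pse)
{
	/* Keep the existing buddy if its slice is strictly shorter. */
	if (cfs_rq->next && cfs_rq->next->slice < pse->slice)
		return false;

	cfs_rq->next = pse;
	return true;
}

/*
 * Replay three back-to-back wakeups (1 ms, 2 ms, 0.5 ms slices) and
 * return the slice of the buddy that survives.
 */
static inline uint64_t model_replay_wakeups(void)
{
	struct model_se a = { .slice = 1000000 };
	struct model_se b = { .slice = 2000000 };
	struct model_se c = { .slice =  500000 };
	struct model_cfs_rq cfs_rq = { .next = NULL };

	model_set_short_buddy(&cfs_rq, &a);	/* no buddy yet: A is set */
	model_set_short_buddy(&cfs_rq, &b);	/* A is shorter: A is kept */
	model_set_short_buddy(&cfs_rq, &c);	/* C is shorter: C replaces A */

	assert(cfs_rq.next != NULL);
	return cfs_rq.next->slice;
}
```

In this model the 2 ms wakeup does not displace the 1 ms buddy, while the
0.5 ms wakeup does, so the shortest slice seen so far is preserved. A wakee
with a slice equal to the current buddy's replaces it, because the comparison
is strict.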
* [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
` (3 subsequent siblings)
5 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Take into account the lag of current task when updating the slice
protection in order to ensure that the absolute value of lags will remain
in the range [0 : slice+tick].
A task that already has a negative lag will see its protection reduced
whereas a task with positive lag will keep a full slice protection.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 83bce5a04f3d..8639086e5d9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1099,8 +1099,9 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
u64 slice = cfs_rq_min_slice(cfs_rq);
+ u64 vruntime = min_vruntime(se->vruntime, avg_vruntime(cfs_rq));
- se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(slice, se));
+ se->vprot = min_vruntime(se->vprot, vruntime + calc_delta_fair(slice, se));
}
static inline bool protect_slice(struct sched_entity *se)
--
2.43.0
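To make the effect of the clamp concrete, the following user-space C sketch
models the patched update_protect_slice(). It rests on assumptions that are
not in the patch: the entity is taken to have nice-0 weight, so that
calc_delta_fair(slice, se) is just slice, the avg argument stands in for
avg_vruntime(cfs_rq), and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Wraparound-safe minimum of two virtual times, in the style of min_vruntime(). */
static inline uint64_t model_min_vruntime(uint64_t a, uint64_t b)
{
	return (int64_t)(b - a) < 0 ? b : a;
}

/*
 * Model of the patched update_protect_slice() for a nice-0 entity.
 * The protection end is computed from min(vruntime, avg) rather than
 * from vruntime alone, then clamped against the existing vprot.
 */
static inline uint64_t model_update_vprot(uint64_t vprot, uint64_t vruntime,
					  uint64_t avg, uint64_t slice)
{
	uint64_t base = model_min_vruntime(vruntime, avg);

	return model_min_vruntime(vprot, base + slice);
}

/* Remaining protected virtual time; zero or negative means no protection. */
static inline int64_t model_remaining(uint64_t vprot, uint64_t vruntime)
{
	return (int64_t)(vprot - vruntime);
}
```

Taking avg = 1000 and slice = 300 as example values: an entity at
vruntime = 900 (positive lag) gets vprot = 1200, a full slice ahead of its own
vruntime, whereas an entity at vruntime = 1100 (negative lag of 100) gets
vprot = 1300, which leaves only 200 of protection.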
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/eevdf: Take into account current's lag when updating slice protection
2026-06-24 15:12 ` [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection Vincent Guittot
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
0 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 49f87b4d0be8b440a040e998b155ed5379223871
Gitweb: https://git.kernel.org/tip/49f87b4d0be8b440a040e998b155ed5379223871
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:25 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:54 +02:00
sched/eevdf: Take into account current's lag when updating slice protection
Take into account the lag of current task when updating the slice
protection in order to ensure that the absolute value of lags will remain
in the range [0 : slice+tick].
A task that already has a negative lag will see its protection reduced
whereas a task with positive lag will keep a full slice protection.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-3-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 976cc3c..cadbb11 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1067,8 +1067,9 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
u64 slice = cfs_rq_min_slice(cfs_rq);
+ u64 vruntime = min_vruntime(se->vruntime, avg_vruntime(cfs_rq));
- se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(slice, se));
+ se->vprot = min_vruntime(se->vprot, vruntime + calc_delta_fair(slice, se));
}
static inline bool protect_slice(struct sched_entity *se)
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
2026-06-24 15:12 ` [PATCH 1/6 v3] sched/fair: Set next buddy for preempt short Vincent Guittot
2026-06-24 15:12 ` [PATCH 2/6 v3] sched/eevdf: Take into account current's lag when updating slice protection Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-26 7:21 ` Peter Zijlstra
2026-06-30 9:03 ` [tip: sched/core] sched/eevdf: Update slice protection even when resched is already set tip-bot2 for Vincent Guittot
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
` (2 subsequent siblings)
5 siblings, 2 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Even if resched is already set, we might want to update or even cancel
the slice protection and ensure that the newly waking task will be the
next one to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8639086e5d9e..854f3a9f1d80 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9804,7 +9804,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(rq->curr))
+ if (!sched_feat(PREEMPT_SHORT) && test_tsk_need_resched(rq->curr))
return;
if (!sched_feat(WAKEUP_PREEMPTION))
--
2.43.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
@ 2026-06-26 7:21 ` Peter Zijlstra
2026-06-26 7:46 ` Peter Zijlstra
2026-06-30 9:03 ` [tip: sched/core] sched/eevdf: Update slice protection even when resched is already set tip-bot2 for Vincent Guittot
1 sibling, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 7:21 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, linux-kernel, qyousef
On Wed, Jun 24, 2026 at 05:12:26PM +0200, Vincent Guittot wrote:
> Even if resched is already set, we might want to update or even cancel
> the slice protection and ensure that the newly waking task will be the
> next one to run.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8639086e5d9e..854f3a9f1d80 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9804,7 +9804,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> * prevents us from potentially nominating it as a false LAST_BUDDY
> * below.
> */
> - if (test_tsk_need_resched(rq->curr))
> + if (!sched_feat(PREEMPT_SHORT) && test_tsk_need_resched(rq->curr))
> return;
>
> if (!sched_feat(WAKEUP_PREEMPTION))
This one is leading to boot splats for me -- let me try and figure out
why.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set
2026-06-26 7:21 ` Peter Zijlstra
@ 2026-06-26 7:46 ` Peter Zijlstra
2026-06-30 9:03 ` [tip: sched/core] sched/core: Fix inter-class wakeup_preempt() tip-bot2 for Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 7:46 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, linux-kernel, qyousef
On Fri, Jun 26, 2026 at 09:21:28AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 24, 2026 at 05:12:26PM +0200, Vincent Guittot wrote:
> > Even if resched is already set, we might want to update or even cancel
> > the slice protection and ensure that the newly waking task will be the
> > next one to run.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8639086e5d9e..854f3a9f1d80 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9804,7 +9804,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > * prevents us from potentially nominating it as a false LAST_BUDDY
> > * below.
> > */
> > - if (test_tsk_need_resched(rq->curr))
> > + if (!sched_feat(PREEMPT_SHORT) && test_tsk_need_resched(rq->curr))
> > return;
> >
> > if (!sched_feat(WAKEUP_PREEMPTION))
>
> This one is leading to boot splats for me -- let me try and figure out
> why.
Bah!
Subject: sched/core: Fix inter-class wakeup_preempt()
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri Jun 26 09:36:52 CEST 2026
The way wakeup_preempt() works since commit 704069649b5b ("sched/core: Rework
sched_class::wakeup_preempt() and rq_modified_*()") is that it will call
rq->next_class->wakeup_preempt(rq, p) when p is of an equal or higher class,
and raise ->next_class when higher.
This means that:
  running idle task
  wakeup fair-A
    (next_class == idle)
    if (sched_class_above(fair, idle)) {
      wakeup_preempt_idle(fair-A);
      resched_curr(rq);
      next_class = fair;
    }
  wakeup fair-B
    (next_class == fair)
    if (fair == fair)
      wakeup_preempt_fair(fair-B);
      (but current is idle)
All wakeup_preempt_$class() methods, except for wakeup_preempt_scx() (for whom
this was built), ignore cross-class wakeups by testing if @p is of the right
class, but per the above case, they should also check current.
This is mostly harmless in the current form, but will lead to trouble with
later patches.
Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/deadline.c | 6 ++++--
kernel/sched/fair.c | 3 ++-
kernel/sched/rt.c | 3 ++-
3 files changed, 8 insertions(+), 4 deletions(-)
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2733,15 +2733,17 @@ static int balance_dl(struct rq *rq, str
*/
static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags)
{
+ struct task_struct *donor = rq->donor;
/*
* Can only get preempted by stop-class, and those should be
* few and short lived, doesn't really make sense to push
* anything away for that.
*/
- if (p->sched_class != &dl_sched_class)
+ if (p->sched_class != &dl_sched_class ||
+ donor->sched_class != &dl_sched_class)
return;
- if (dl_entity_preempt(&p->dl, &rq->donor->dl)) {
+ if (dl_entity_preempt(&p->dl, &donor->dl)) {
resched_curr(rq);
return;
}
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9778,7 +9778,8 @@ static void wakeup_preempt_fair(struct r
/*
* XXX Getting preempted by higher class, try and find idle CPU?
*/
- if (p->sched_class != &fair_sched_class)
+ if (p->sched_class != &fair_sched_class ||
+ donor->sched_class != &fair_sched_class)
return;
if (unlikely(se == pse))
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1629,7 +1629,8 @@ static void wakeup_preempt_rt(struct rq
/*
* XXX If we're preempted by DL, queue a push?
*/
- if (p->sched_class != &rt_sched_class)
+ if (p->sched_class != &rt_sched_class ||
+ donor->sched_class != &rt_sched_class)
return;
if (p->prio < donor->prio) {
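The idle/fair-A/fair-B sequence from the changelog above can be replayed with
a small user-space C model. This is only a sketch of the dispatch as the
changelog describes it, under stated assumptions: the model_* types are
hypothetical, class ranks are plain enum values, only the fair handler is
modelled, and the check_donor flag is an invented switch for comparing the
behaviour without and with the added donor->sched_class test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical class ranks; a higher value means a higher class. */
enum model_class { MODEL_IDLE, MODEL_FAIR, MODEL_RT, MODEL_DL };

struct model_rq {
	enum model_class donor_class;	/* class of the running (donor) task */
	enum model_class next_class;	/* highest class seen since the last pick */
	bool need_resched;
};

/*
 * Returns true when the fair handler would go on to compare the wakee
 * against the donor as if both were fair tasks.
 */
static inline bool model_wakeup_preempt_fair(const struct model_rq *rq,
					     enum model_class p_class,
					     bool check_donor)
{
	if (p_class != MODEL_FAIR)
		return false;
	if (check_donor && rq->donor_class != MODEL_FAIR)
		return false;	/* the check added by the patch */
	return true;
}

/* Equal class: call the handler. Higher class: resched and raise next_class. */
static inline bool model_wakeup_preempt(struct model_rq *rq,
					enum model_class p_class,
					bool check_donor)
{
	if (p_class == rq->next_class) {
		if (p_class == MODEL_FAIR)
			return model_wakeup_preempt_fair(rq, p_class, check_donor);
		return false;
	}
	if (p_class > rq->next_class) {
		/* The old class' handler would run here; it is not modelled. */
		rq->need_resched = true;
		rq->next_class = p_class;
	}
	return false;
}
```

Starting from an idle donor with next_class at idle, the first fair wakeup
only sets need_resched and raises next_class. The second fair wakeup reaches
the fair handler even though the donor is still the idle task: without the
donor check the handler proceeds, and with it the handler returns early.
Before patch 3/6, the early return on test_tsk_need_resched(rq->curr) would
presumably have hidden this case, since resched is already set by then.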
^ permalink raw reply [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/core: Fix inter-class wakeup_preempt()
2026-06-26 7:46 ` Peter Zijlstra
@ 2026-06-30 9:03 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: fa02b2868420d9f33d64ddcb15ef0f96880b2a6d
Gitweb: https://git.kernel.org/tip/fa02b2868420d9f33d64ddcb15ef0f96880b2a6d
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 26 Jun 2026 09:36:52 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:51 +02:00
sched/core: Fix inter-class wakeup_preempt()
The way wakeup_preempt() works since commit 704069649b5b ("sched/core: Rework
sched_class::wakeup_preempt() and rq_modified_*()") is that it will call
rq->next_class->wakeup_preempt(rq, p) when p is of an equal or higher class,
and raise ->next_class when higher.
This means that:
  running idle task
  wakeup fair-A
    (next_class == idle)
    if (sched_class_above(fair, idle)) {
      wakeup_preempt_idle(fair-A);
      resched_curr(rq);
      next_class = fair;
    }
  wakeup fair-B
    (next_class == fair)
    if (fair == fair)
      wakeup_preempt_fair(fair-B);
      (but current is idle)
All wakeup_preempt_$class() methods, except for wakeup_preempt_scx() (for whom
this was built), ignore cross-class wakeups by testing if @p is of the right
class, but per the above case, they should also check current.
This is mostly harmless in the current form, but will lead to trouble with
later patches.
Fixes: 704069649b5b ("sched/core: Rework sched_class::wakeup_preempt() and rq_modified_*()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260626074605.GB2568396%40noisy.programming.kicks-ass.net
---
kernel/sched/deadline.c | 6 ++++--
kernel/sched/fair.c | 3 ++-
kernel/sched/rt.c | 3 ++-
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0f858b9..a3003b0 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2733,15 +2733,17 @@ static int balance_dl(struct rq *rq, struct rq_flags *rf)
*/
static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p, int flags)
{
+ struct task_struct *donor = rq->donor;
/*
* Can only get preempted by stop-class, and those should be
* few and short lived, doesn't really make sense to push
* anything away for that.
*/
- if (p->sched_class != &dl_sched_class)
+ if (p->sched_class != &dl_sched_class ||
+ donor->sched_class != &dl_sched_class)
return;
- if (dl_entity_preempt(&p->dl, &rq->donor->dl)) {
+ if (dl_entity_preempt(&p->dl, &donor->dl)) {
resched_curr(rq);
return;
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d78467e..a384f16 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9778,7 +9778,8 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
/*
* XXX Getting preempted by higher class, try and find idle CPU?
*/
- if (p->sched_class != &fair_sched_class)
+ if (p->sched_class != &fair_sched_class ||
+ donor->sched_class != &fair_sched_class)
return;
if (unlikely(se == pse))
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index e474c31..e6e5f8a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1629,7 +1629,8 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
/*
* XXX If we're preempted by DL, queue a push?
*/
- if (p->sched_class != &rt_sched_class)
+ if (p->sched_class != &rt_sched_class ||
+ donor->sched_class != &rt_sched_class)
return;
if (p->prio < donor->prio) {
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/eevdf: Update slice protection even when resched is already set
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
2026-06-26 7:21 ` Peter Zijlstra
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 7ad0ec7890b306dab20e53a02cadd867c92ea513
Gitweb: https://git.kernel.org/tip/7ad0ec7890b306dab20e53a02cadd867c92ea513
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:26 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:54 +02:00
sched/eevdf: Update slice protection even when resched is already set
Even if resched is already set, we might want to update or even cancel
the slice protection and ensure that the newly waking task will be the
next one to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-4-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cadbb11..af207f0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9848,7 +9848,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(rq->curr))
+ if (!sched_feat(PREEMPT_SHORT) && test_tsk_need_resched(rq->curr))
return;
if (!sched_feat(WAKEUP_PREEMPTION))
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (2 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 3/6 v3] sched/eevdf: Update slice protection even when resched is already set Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 6:00 ` K Prateek Nayak
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2026-06-24 15:12 ` [PATCH 5/6 v3] sched/eevdf: Always update slice protection Vincent Guittot
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
5 siblings, 2 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
If a short slice task will not be the next to be picked but is eligible,
we cancel the slice protection to bring forward the point at which the
short slice task will be the next to run.
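
As a rough illustration of the decision described above, the sketch below
models only the tail of wakeup_preempt_fair() as it appears in this v3 patch.
The toy_* names and the three boolean inputs are assumptions made for the
example; the idle-entity path and the feature flags are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the wakeup path ends up doing. */
struct toy_outcome {
	bool cancel_protection;	/* cancel_protect_slice() on current */
	bool set_next_buddy;	/* nominate the waking task as next buddy */
	bool resched;		/* request a reschedule of current */
};

/*
 * short_slice: the wakeup was classified as a short-slice preemption
 * p_is_next:   the pick returned the waking task
 * p_eligible:  the waking task is eligible
 */
static struct toy_outcome toy_short_wakeup(bool short_slice, bool p_is_next,
					   bool p_eligible)
{
	struct toy_outcome out = { false, false, false };

	/* Preempt when @p is picked, or when a short-slice @p is eligible. */
	if (p_is_next || (short_slice && p_eligible)) {
		out.cancel_protection = true;
		/* The buddy is only set when @p really is the next pick. */
		out.set_next_buddy = short_slice && p_is_next;
		out.resched = true;
	}

	/* Otherwise current keeps its (possibly updated) slice protection. */
	return out;
}
```

The case this patch adds is (short_slice, !p_is_next, p_eligible): protection
is cancelled and a reschedule is requested, but no buddy is set because some
other entity is due to run first.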
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 854f3a9f1d80..719aa53851e4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9816,18 +9816,13 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
cse_is_idle = se_is_idle(se);
pse_is_idle = se_is_idle(pse);
+ nse = se;
/*
* Preempt an idle entity in favor of a non-idle entity (and don't preempt
* in the inverse case).
*/
- if (cse_is_idle && !pse_is_idle) {
- /*
- * When non-idle entity preempt an idle entity,
- * don't give idle entity slice protection.
- */
- preempt_action = PREEMPT_WAKEUP_SHORT;
+ if (cse_is_idle && !pse_is_idle)
goto preempt;
- }
if (cse_is_idle != pse_is_idle)
return;
@@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if (!nse && cfs_rq->nr_queued)
goto pick;
+ /*
+ * If @p is eligible but not the next task to run then cancel protection
+ * to prevent large scheduling latency
+ */
+ if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
+ goto preempt;
+
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
return;
preempt:
- if (preempt_action == PREEMPT_WAKEUP_SHORT) {
- cancel_protect_slice(se);
+ cancel_protect_slice(se);
+
+ if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
set_next_buddy(&p->se);
- }
resched_curr_lazy(rq);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
@ 2026-06-25 6:00 ` K Prateek Nayak
2026-06-25 12:40 ` Vincent Guittot
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
1 sibling, 1 reply; 44+ messages in thread
From: K Prateek Nayak @ 2026-06-25 6:00 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> if (!nse && cfs_rq->nr_queued)
> goto pick;
>
> + /*
> + * If @p is eligible but not the next task to run then cancel protection
> + * to prevent large scheduling latency
> + */
> + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
> + goto preempt;
We handle "pse->slice < se->slice" case before "pse->sched_delayed" case
and jump to "pick", but pse can get dequeued as a part of
pick_next_entity() if it was delayed and picked.
I think we can reach here for PREEMPT_WAKEUP_SHORT after pse is
completely dequeued from cfs_rq. If p is a task on root cfs_rq, we could
have blocked the task entirely and ideally it shouldn't be referenced
here.
Since a wakeup of delayed entity / on delayed hierarchy will call
wakeup_preempt() anyways, I think we should return early or
directly jump to update if we see "pse->sched_delayed".
> +
> if (sched_feat(RUN_TO_PARITY))
> update_protect_slice(cfs_rq, se);
>
> return;
>
> preempt:
> - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> - cancel_protect_slice(se);
> + cancel_protect_slice(se);
> +
> + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
> set_next_buddy(&p->se);
> - }
>
> resched_curr_lazy(rq);
> }
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-25 6:00 ` K Prateek Nayak
@ 2026-06-25 12:40 ` Vincent Guittot
2026-06-26 5:51 ` Shubhang Kaushik
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:40 UTC (permalink / raw)
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 08:00, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> > if (!nse && cfs_rq->nr_queued)
> > goto pick;
> >
> > + /*
> > + * If @p is eligible but not the next task to run then cancel protection
> > + * to prevent large scheduling latency
> > + */
> > + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
> > + goto preempt;
>
> We handle "pse->slice < se->slice" case before "pse->sched_delayed" case
> and jump to "pick", but pse can get dequeued as a part of
> pick_next_entity() if it was delayed and picked.
yes
>
> I think we can reach here for PREEMPT_WAKEUP_SHORT after pse is
> completely dequeued from cfs_rq. If p is a task on root cfs_rq, we could
> have blocked the task entirely and ideally it shouldn't be referenced
> here.
I tried to find which use case calls wakeup_preempt_fair() for a
sched_delayed task but can't find it
>
> Since a wakeup of delayed entity / on delayed hierarchy will call
> wakeup_preempt() anyways, I think we should return early if we should
> directly jump to update if we see "pse->sched_delayed".
Fair enough. But we still need to consider the FORK case
>
> > +
> > if (sched_feat(RUN_TO_PARITY))
> > update_protect_slice(cfs_rq, se);
> >
> > return;
> >
> > preempt:
> > - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> > - cancel_protect_slice(se);
> > + cancel_protect_slice(se);
> > +
> > + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
> > set_next_buddy(&p->se);
> > - }
> >
> > resched_curr_lazy(rq);
> > }
>
> --
> Thanks and Regards,
> Prateek
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-25 12:40 ` Vincent Guittot
@ 2026-06-26 5:51 ` Shubhang Kaushik
2026-06-26 7:39 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Shubhang Kaushik @ 2026-06-26 5:51 UTC (permalink / raw)
To: Vincent Guittot, K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
Hi Vincent,
On Thu, 25 Jun 2026, Vincent Guittot wrote:
> On Thu, 25 Jun 2026 at 08:00, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>>
>> Hello Vincent,
>>
>> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
>>> @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
>>> if (!nse && cfs_rq->nr_queued)
>>> goto pick;
>>>
>>> + /*
>>> + * If @p is eligible but not the next task to run then cancel protection
>>> + * to prevent large scheduling latency
>>> + */
>>> + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
>>> + goto preempt;
>>
>> We handle "pse->slice < se->slice" case before "pse->sched_delayed" case
>> and jump to "pick", but pse can get dequeued as a part of
>> pick_next_entity() if it was delayed and picked.
>
> yes
>
>>
>> I think we can reach here for PREEMPT_WAKEUP_SHORT after pse is
>> completely dequeued from cfs_rq. If p is a task on root cfs_rq, we could
>> have blocked the task entirely and ideally it shouldn't be referenced
>> here.
>
> I tried to find which use case calls wakeup_preempt_fair() for a
> sched_delayed task but can't find it
>
>>
>> Since a wakeup of delayed entity / on delayed hierarchy will call
>> wakeup_preempt() anyways, I think we should return early if we should
>> directly jump to update if we see "pse->sched_delayed".
>
> Fair enough. But we still need to consider the FORK case
>
I tested the v3 series on Ampere Altra using cyclictest with high
concurrency rt-app and hackbench load matrices. I tried to follow your
same setup for the testsuite as described in the cover letter
with the baseline as tip/sched/core.
The maximum latency changed as follows:-
- cyclictest: 112us -> 62us
- cyclictest + rt-app: 108us -> 166us
- cyclictest + hackbench: 173us -> 154us
The patchset improved the cyclictest-only and hackbench cases but regressed the rt-app tail.
Would the regression be caused by the interaction between the larger
competing slices and repeated short task wakeups? These repeatedly tend
to cancel or recompute the running entity's protection.
>>
>>> +
>>> if (sched_feat(RUN_TO_PARITY))
>>> update_protect_slice(cfs_rq, se);
>>>
>>> return;
>>>
>>> preempt:
>>> - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
>>> - cancel_protect_slice(se);
>>> + cancel_protect_slice(se);
>>> +
>>> + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
>>> set_next_buddy(&p->se);
>>> - }
>>>
>>> resched_curr_lazy(rq);
>>> }
>>
>> --
>> Thanks and Regards,
>> Prateek
>>
>
Regards,
Shubhang Kaushik
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-26 5:51 ` Shubhang Kaushik
@ 2026-06-26 7:39 ` Vincent Guittot
0 siblings, 0 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 7:39 UTC (permalink / raw)
To: Shubhang Kaushik
Cc: K Prateek Nayak, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hi Shubhang,
On Fri, 26 Jun 2026 at 07:52, Shubhang Kaushik
<shubhang@os.amperecomputing.com> wrote:
>
> Hi Vincent,
>
> On Thu, 25 Jun 2026, Vincent Guittot wrote:
>
> > On Thu, 25 Jun 2026 at 08:00, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >>
> >> Hello Vincent,
> >>
> >> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> >>> @@ -9896,16 +9891,23 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
> >>> if (!nse && cfs_rq->nr_queued)
> >>> goto pick;
> >>>
> >>> + /*
> >>> + * If @p is eligible but not the next task to run then cancel protection
> >>> + * to prevent large scheduling latency
> >>> + */
> >>> + if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
> >>> + goto preempt;
> >>
> >> We handle "pse->slice < se->slice" case before "pse->sched_delayed" case
> >> and jump to "pick", but pse can get dequeued as a part of
> >> pick_next_entity() if it was delayed and picked.
> >
> > yes
> >
> >>
> >> I think we can reach here for PREEMPT_WAKEUP_SHORT after pse is
> >> completely dequeued from cfs_rq. If p is a task on root cfs_rq, we could
> >> have blocked the task entirely and ideally it shouldn't be referenced
> >> here.
> >
> > I tried to find which use case calls wakeup_preempt_fair() for a
> > sched_delayed task but can't find it
> >
> >>
> >> Since a wakeup of delayed entity / on delayed hierarchy will call
> >> wakeup_preempt() anyways, I think we should return early if we should
> >> directly jump to update if we see "pse->sched_delayed".
> >
> > Fair enough. But we still need to consider the FORK case
> >
>
> I tested the v3 series on Ampere Altra using cyclictest with high
> concurrency rt-app and hackbench load matrices. I tried to follow your
> same setup for the testsuite as described in the cover letter
> with the baseline as tip/sched/core.
>
> The maximum latency changed as follows:-
> - cyclictest: 112us -> 62us
> - cyclictest + rt-app: 108us -> 166us
> - cyclictest + hackbench: 173us -> 154us
Thanks for testing. Your max values were already really short compared
to my figures. It doesn't seem that cyclictest is really affected by
other activities. You could increase the hackbench load, for example
with more groups.
Also, how long did you run your tests? I ran each test for 133 seconds
to try to catch most of the corner cases
>
> The patchset improved the two cases above but regresses the rt-app tail.
> Would the regression be caused by the interaction between the larger
> competing slices and repeated short task wakeups? These repeatedly tend
> to cancel or recompute the running entity's protection.
I don't think this has an impact.
I haven't included the average and median values because they were similar.
- median value remains the same: 57us
- avg value was better with the use case involving rt-app: 231us vs 168us
>
> >>
> >>> +
> >>> if (sched_feat(RUN_TO_PARITY))
> >>> update_protect_slice(cfs_rq, se);
> >>>
> >>> return;
> >>>
> >>> preempt:
> >>> - if (preempt_action == PREEMPT_WAKEUP_SHORT) {
> >>> - cancel_protect_slice(se);
> >>> + cancel_protect_slice(se);
> >>> +
> >>> + if (preempt_action == PREEMPT_WAKEUP_SHORT && nse == pse)
> >>> set_next_buddy(&p->se);
> >>> - }
> >>>
> >>> resched_curr_lazy(rq);
> >>> }
> >>
> >> --
> >> Thanks and Regards,
> >> Prateek
> >>
> >
> Regards,
> Shubhang Kaushik
^ permalink raw reply [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/eevdf: Cancel slice protection if short slice task is eligible
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
2026-06-25 6:00 ` K Prateek Nayak
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: ba0d3bf5f97b74ee1cf8ea6cc35efe5a052514a4
Gitweb: https://git.kernel.org/tip/ba0d3bf5f97b74ee1cf8ea6cc35efe5a052514a4
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:27 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:55 +02:00
sched/eevdf: Cancel slice protection if short slice task is eligible
If a short slice task will not be the next to be picked but is eligible,
we cancel the slice protection to speedup the time when the short slice
task will be the next to run.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-5-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af207f0..9046b2e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9859,18 +9859,13 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
cse_is_idle = se_is_idle(se);
pse_is_idle = se_is_idle(pse);
+ nse = se;
/*
* Preempt an idle entity in favor of a non-idle entity (and don't preempt
* in the inverse case).
*/
- if (cse_is_idle && !pse_is_idle) {
- /*
- * When non-idle entity preempt an idle entity,
- * don't give idle entity slice protection.
- */
- preempt_action = PREEMPT_WAKEUP_SHORT;
+ if (cse_is_idle && !pse_is_idle)
goto preempt;
- }
if (cse_is_idle != pse_is_idle)
return;
@@ -9933,16 +9928,23 @@ pick:
goto preempt;
}
+ /*
+ * If @p is eligible but not the next task to run then cancel protection
+ * to prevent large scheduling latency
+ */
+ if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
+ goto preempt;
+
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
return;
preempt:
- if (preempt_action == PREEMPT_WAKEUP_SHORT) {
- cancel_protect_slice(se);
+ cancel_protect_slice(se);
+
+ if (preempt_action == PREEMPT_WAKEUP_SHORT)
set_short_buddy(cfs_rq, pse);
- }
resched_curr_lazy(rq);
}
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 5/6 v3] sched/eevdf: Always update slice protection
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (3 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 4/6 v3] sched/eevdf: Cancel slice protection if short slice task is eligible Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
5 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
Even if p will not preempt current, it modifies the avg_vruntime and
possibly the min slice. Make sure to update the slice protection with the
updated figures. As an example, Batch and Sched Idle tasks can otherwise
get a larger lag than their slice and finally delay the scheduling of a
normal task, whose deadline will be later.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 719aa53851e4..f972987618e7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9824,17 +9824,18 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if (cse_is_idle && !pse_is_idle)
goto preempt;
+ cfs_rq = cfs_rq_of(se);
+ update_curr(cfs_rq);
+
if (cse_is_idle != pse_is_idle)
- return;
+ goto update;
/*
* BATCH and IDLE tasks do not preempt others.
*/
if (unlikely(!normal_policy(p->policy)))
- return;
+ goto update;
- cfs_rq = cfs_rq_of(se);
- update_curr(cfs_rq);
/*
* If @p has a shorter slice than current and @p is eligible, override
* current's slice protection in order to allow preemption.
@@ -9851,7 +9852,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* EEVDF to forcibly queue an ineligible task.
*/
if ((wake_flags & WF_FORK) || pse->sched_delayed)
- return;
+ goto update;
/* Prefer picking wakee soon if appropriate. */
if (sched_feat(NEXT_BUDDY) &&
@@ -9897,7 +9898,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
*/
if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
goto preempt;
-
+update:
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
--
2.43.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/eevdf: Always update slice protection
2026-06-24 15:12 ` [PATCH 5/6 v3] sched/eevdf: Always update slice protection Vincent Guittot
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
0 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 7cd2ef17de5e11612001e98b6319f1793239b243
Gitweb: https://git.kernel.org/tip/7cd2ef17de5e11612001e98b6319f1793239b243
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:28 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:55 +02:00
sched/eevdf: Always update slice protection
Even if p will not preempt current, it modifies the avg_vruntime and
possibly the min slice. Make sure to update the slice protection with the
updated figures. As an example, Batch and Sched Idle tasks can otherwise
get a larger lag than their slice and finally delay the scheduling of a
normal task, whose deadline will be later.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-6-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9046b2e..2547a97 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9867,16 +9867,17 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
if (cse_is_idle && !pse_is_idle)
goto preempt;
+ update_curr_fair(rq);
+
if (cse_is_idle != pse_is_idle)
- return;
+ goto update;
/*
* BATCH and IDLE tasks do not preempt others.
*/
if (unlikely(!normal_policy(p->policy)))
- return;
+ goto update;
- update_curr_fair(rq);
/*
* If @p has a shorter slice than current and @p is eligible, override
* current's slice protection in order to allow preemption.
@@ -9893,7 +9894,7 @@ static void wakeup_preempt_fair(struct rq *rq, struct task_struct *p, int wake_f
* EEVDF to forcibly queue an ineligible task.
*/
if ((wake_flags & WF_FORK) || pse->sched_delayed)
- return;
+ goto update;
/* Prefer picking wakee soon if appropriate. */
if (sched_feat(NEXT_BUDDY) && set_preempt_buddy(cfs_rq, pse)) {
@@ -9934,7 +9935,7 @@ pick:
*/
if (preempt_action == PREEMPT_WAKEUP_SHORT && entity_eligible(cfs_rq, pse))
goto preempt;
-
+update:
if (sched_feat(RUN_TO_PARITY))
update_protect_slice(cfs_rq, se);
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 [PATCH 0/6 v3] sched/eevdf: Improve scheduling latency of short slice task Vincent Guittot
` (4 preceding siblings ...)
2026-06-24 15:12 ` [PATCH 5/6 v3] sched/eevdf: Always update slice protection Vincent Guittot
@ 2026-06-24 15:12 ` Vincent Guittot
2026-06-25 7:37 ` K Prateek Nayak
` (2 more replies)
5 siblings, 3 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-24 15:12 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, kprateek.nayak, linux-kernel, qyousef
Cc: Vincent Guittot
When a task with a shorter slice is enqueued, we protect the running
task, which has a longer slice, only until it becomes ineligible instead
of for a full slice. This speeds up the switch to other tasks until the
task with the shortest slice is scheduled, and helps that task avoid
waiting for too many full slices before running.
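
For readers who want to try the numbers, the sketch below mirrors the
arithmetic of eligible_vruntime() as posted in this patch, in plain user-space
C. The toy_* names and the struct are hypothetical, and the weight is used
as-is; the scaling that avg_vruntime_weight() may apply in the kernel is not
modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the cfs_rq fields used by the computation. */
struct toy_cfs_rq {
	uint64_t zero_vruntime;	/* v0 */
	int64_t sum_w_vruntime;	/* \Sum (v_i - v0) * w_i over queued entities */
	long sum_weight;	/* \Sum w_i over queued entities */
};

/* Division rounding towards minus infinity, as the patch does by hand. */
static int64_t toy_div_floor(int64_t n, long d)
{
	if (n < 0)
		n -= (d - 1);
	return n / d;
}

/*
 * V' = sum_w_vruntime / (sum_weight + w_se) + v0, plus one so that the
 * returned vruntime is the first value past the computed average.
 */
static uint64_t toy_eligible_vruntime(const struct toy_cfs_rq *cfs_rq, long w_se)
{
	int64_t delta = 0;

	assert(w_se > 0);

	/* An empty queue leaves delta at zero, as in the patch. */
	if (cfs_rq->sum_weight)
		delta = toy_div_floor(cfs_rq->sum_w_vruntime,
				      cfs_rq->sum_weight + w_se);

	return cfs_rq->zero_vruntime + (uint64_t)delta + 1;
}
```

For example, with v0 = 1000, sum_w_vruntime = 600, sum_weight = 2 and
w_se = 1, delta is 600 / 3 = 200 and the function returns 1201. With
sum_w_vruntime = -7 the floor division gives -3, so the result is 998.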
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/fair.c | 52 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f972987618e7..7c541f27a1ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -813,6 +813,48 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
return cfs_rq->zero_vruntime;
}
+/*
+ * Compute the vruntime until which the entity remains eligible when it runs
+ * or is about to run on the CPU. We use this value to set vprot to the min
+ * value until which other entities would not be picked anyway.
+ * \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
+ * \Sum w_i
+ *
+ * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
+ * in the rb tree and next is already dequeued so
+ *
+ * cfs_rq->sum_w_vruntime
+ * V' = ------------------------- + v0
+ * cfs_rq->sum_weight + w_se
+
+ */
+static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ struct sched_entity *curr = cfs_rq->curr;
+ long weight = cfs_rq->sum_weight;
+ s64 delta = 0;
+
+ if (weight) {
+ s64 runtime = cfs_rq->sum_w_vruntime;
+
+ weight += avg_vruntime_weight(cfs_rq, se->load.weight);
+
+ /* sign flips effective floor / ceiling */
+ if (runtime < 0)
+ runtime -= (weight - 1);
+
+ delta = div64_long(runtime, weight);
+ } else {
+ /*
+ * When there is but one element, it is the average.
+ */
+ delta = 0;
+ }
+
+ return cfs_rq->zero_vruntime + delta + 1;
+}
+
static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
/*
@@ -1090,8 +1132,14 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
slice = cfs_rq_min_slice(cfs_rq);
slice = min(slice, se->slice);
- if (slice != se->slice)
- vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+
+ /* If there are shorter slices than se's one */
+ if (slice != se->slice) {
+ if (sched_feat(PREEMPT_SHORT))
+ vprot = min_vruntime(vprot, eligible_vruntime(cfs_rq, se));
+ else
+ vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+ }
se->vprot = vprot;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
@ 2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 12:51 ` Vincent Guittot
2026-06-25 8:33 ` Peter Zijlstra
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2 siblings, 2 replies; 44+ messages in thread
From: K Prateek Nayak @ 2026-06-25 7:37 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, linux-kernel, qyousef
Hello Vincent,
On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> +/*
> + * Compute the vruntime until which the entity remains eligible when it runs
> + * or is about to run on the CPU. We use this value to set vprot to the min
> + * value until which other entities would not be picked anyway.
> + * \Sum (v_i - v0)*w_i
> + * V = ------------------- + v0
> + * \Sum w_i
> + *
> + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> + * in the rb tree and next is already dequeued so
> + *
> + * cfs_rq->sum_w_vruntime
> + * V' = ------------------------- + v0
> + * cfs_rq->sum_weight + w_se
> +
nit.
^ is that a stray line or a Missing * at the beginning of the comment
line?
> + */
> +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> + struct sched_entity *curr = cfs_rq->curr;
curr seems to be unused here and is NULL anyways when
set_protect_slice() is called ;-)
> + long weight = cfs_rq->sum_weight;
> + s64 delta = 0;
> +
> + if (weight) {
> + s64 runtime = cfs_rq->sum_w_vruntime;
> +
> + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> +
> + /* sign flips effective floor / ceiling */
> + if (runtime < 0)
> + runtime -= (weight - 1);
> +
> + delta = div64_long(runtime, weight);
> + } else {
> + /*
> + * When there is but one element, it is the average.
> + */
> + delta = 0;
Even with a single entity, the se->vruntime can still diverge from
cfs_rq->zero_vruntime
Last avg_vruntime() call for cfs_rq was at update_entity_lag() during
last dequeue while se->on_rq was still set for the dequeuing entity.
Should this be entity_key(cfs_rq, se) instead?
> + }
> +
> + return cfs_rq->zero_vruntime + delta + 1;
> +}
> +
> static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
>
> /*
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 7:37 ` K Prateek Nayak
@ 2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 10:09 ` Peter Zijlstra
2026-06-25 14:55 ` Vincent Guittot
2026-06-25 12:51 ` Vincent Guittot
1 sibling, 2 replies; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-25 8:37 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
>
> curr seems to be unused here and is NULL anyways when
> set_protect_slice() is called ;-)
Ah, but it is not with the flat patches on, which is why I was a little
confused ;-)
That said; I now see se == curr. So let me go have another look at all
that.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 8:37 ` Peter Zijlstra
@ 2026-06-25 10:09 ` Peter Zijlstra
2026-06-25 12:57 ` Vincent Guittot
2026-06-25 14:55 ` Vincent Guittot
1 sibling, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-25 10:09 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
>
> > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > +{
> > > + struct sched_entity *curr = cfs_rq->curr;
> >
> > curr seems to be unused here and is NULL anyways when
> > set_protect_slice() is called ;-)
>
> Ah, but it is not with the flat patches on, which is why I was a little
> confused ;-)
>
> That said; I now see se == curr. So let me go have another look at all
> that.
I might be slow -- it is definitely waay too warm already -- but I'm not
seeing how you don't want avg_vruntime() here.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 10:09 ` Peter Zijlstra
@ 2026-06-25 12:57 ` Vincent Guittot
2026-06-25 12:59 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 12:10, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> >
> > > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > +{
> > > > + struct sched_entity *curr = cfs_rq->curr;
> > >
> > > curr seems to be unused here and is NULL anyways when
> > > set_protect_slice() is called ;-)
> >
> > Ah, but it is not with the flat patches on, which is why I was a little
> > confused ;-)
> >
> > That said; I now see se == curr. So let me go have another look at all
> > that.
>
> I might be slow -- it is definitely waay to warm already -- but I'm not
> seeing how you don't want avg_vruntime() here.
It is related to avg_vruntime(), except that I don't want the current
avg_vruntime but the avg_vruntime at the point where entity_key(se)
reaches zero and se is about to become ineligible.
If I use the current avg_vruntime(), then once se has run enough for
its vruntime to reach the (now old) avg_vruntime, the new avg_vruntime
will have moved forward and se will still be eligible.
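To illustrate the moving-average argument with numbers, here is a minimal userspace sketch. It is not kernel code: struct ent and the helpers avg_all(), avg_others() and eligible() are hypothetical names made up for the example, v0 is taken as 0, and the eligibility test is assumed to be "v_j <= V".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a sched_entity: only what the model needs. */
struct ent {
	int64_t vruntime;
	int64_t weight;
};

/* Weighted average vruntime V over all n entities (v0 taken as 0). */
static int64_t avg_all(const struct ent *e, int n)
{
	int64_t sum = 0, w = 0;

	for (int i = 0; i < n; i++) {
		sum += e[i].vruntime * e[i].weight;
		w += e[i].weight;
	}
	assert(w > 0);
	return sum / w;
}

/* Weighted average over everyone except j: the point where v_j == V. */
static int64_t avg_others(const struct ent *e, int n, int j)
{
	int64_t sum = 0, w = 0;

	for (int i = 0; i < n; i++) {
		if (i == j)
			continue;
		sum += e[i].vruntime * e[i].weight;
		w += e[i].weight;
	}
	assert(w > 0);
	return sum / w;
}

/*
 * Assumed eligibility rule: e[j] is eligible while v_j <= V.
 * Compare v_j * W against the weighted sum to avoid division rounding.
 */
static bool eligible(const struct ent *e, int n, int j)
{
	int64_t sum = 0, w = 0;

	for (int i = 0; i < n; i++) {
		sum += e[i].vruntime * e[i].weight;
		w += e[i].weight;
	}
	return e[j].vruntime * w <= sum;
}
```

Take three entities with (vruntime, weight) of (100, 2), (500, 1) and (700, 1), and let the first one run. The current average is 350, but by the time its vruntime reaches 350 the average has moved to 475, so it is still eligible. It stays eligible up to 600, the weighted average of the other two, and becomes ineligible at 601.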
>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 12:57 ` Vincent Guittot
@ 2026-06-25 12:59 ` Vincent Guittot
2026-06-25 22:28 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:59 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 14:57, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Thu, 25 Jun 2026 at 12:10, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> > > On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> > >
> > > > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > > +{
> > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > >
> > > > curr seems to be unused here and is NULL anyways when
> > > > set_protect_slice() is called ;-)
> > >
> > > Ah, but it is not with the flat patches on, which is why I was a little
> > > confused ;-)
> > >
> > > That said; I now see se == curr. So let me go have another look at all
> > > that.
> >
> > I might be slow -- it is definitely waay to warm already -- but I'm not
> > seeing how you don't want avg_vruntime() here.
>
> It is somehow related to avg_vruntime() except that I don't want the
> current avg_vruntime but the avg_vruntime when entity_key(se) will be
> null and se will become ineligible
>
> If I use current avg_vruntime(), once se will have run enough to get
> its vruntime == (now old) avg_vruntime, the new avg_vruntime will have
> move forward and the se's vruntime will still be eligible
And I should name it ineligible_vruntime because of the +1
>
>
> >
> >
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 12:59 ` Vincent Guittot
@ 2026-06-25 22:28 ` Peter Zijlstra
2026-06-26 3:57 ` K Prateek Nayak
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-25 22:28 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, Jun 25, 2026 at 02:59:16PM +0200, Vincent Guittot wrote:
> On Thu, 25 Jun 2026 at 14:57, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Thu, 25 Jun 2026 at 12:10, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Thu, Jun 25, 2026 at 10:37:20AM +0200, Peter Zijlstra wrote:
> > > > On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
> > > >
> > > > > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > > > > +{
> > > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > > >
> > > > > curr seems to be unused here and is NULL anyways when
> > > > > set_protect_slice() is called ;-)
> > > >
> > > > Ah, but it is not with the flat patches on, which is why I was a little
> > > > confused ;-)
> > > >
> > > > That said; I now see se == curr. So let me go have another look at all
> > > > that.
> > >
> > > I might be slow -- it is definitely waay to warm already -- but I'm not
> > > seeing how you don't want avg_vruntime() here.
> >
> > It is somehow related to avg_vruntime() except that I don't want the
> > current avg_vruntime but the avg_vruntime when entity_key(se) will be
> > null and se will become ineligible
> >
> > If I use current avg_vruntime(), once se will have run enough to get
> > its vruntime == (now old) avg_vruntime, the new avg_vruntime will have
> > move forward and the se's vruntime will still be eligible
>
> And I should name it ineligible_vruntime because of the +1
I've ended up with something like so.
---
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -777,6 +777,67 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
return cfs_rq->zero_vruntime;
}
+/*
+ * \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
+ * \Sum w_i
+ *
+ * Let W = \Sum w_i, and move v_j such that 'v_j == V', thus:
+ *
+ * V = 1/W * {(v_j - v0)*w_j + \Sum_i!=j (v_i - v0)*w_i} + v0
+ *
+ * v_j = 1/W * {(v_j - v0)*w_j + \Sum_i!=j (v_i - v0)*w_i} + v0
+ *
+ * v_j = 1/W * (v_j - v0)*w_j + 1/W * \Sum_i!=j (v_i - v0)*w_i + v0
+ *
+ * v_j - 1/W * (v_j - v0)*w_j = 1/W * \Sum_i!=j (v_i - v0)*w_i + v0
+ *
+ * v_j*W - (v_j - v0)*w_j = \Sum_i!=j (v_i - v0)*w_i + v0*W
+ *
+ * v_j*(W - w_j) + v0*w_j = \Sum_i!=j (v_i - v0)*w_i + v0*W
+ *
+ * v_j*(W - w_j) = \Sum_i!=j (v_i - v0)*w_i + v0*(W - w_j)
+ *
+ * \Sum_i!=j (v_i - v0)*w_i
+ * v_j = ------------------------ + v0
+ * W - w_j
+ *
+ * When v_j happens to be curr, then '\Sum_i!=j (v_i - v0)*w_i'
+ * is cfs_rq->sum_w_vruntime, and 'W - w_j' is cfs_rq->sum_weight, since curr
+ * is not included in the sum.
+ */
+static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
+{
+ struct sched_entity *curr = cfs_rq->curr;
+ long weight = cfs_rq->sum_weight;
+ s64 delta = 0;
+
+ if (curr && !curr->on_rq)
+ curr = NULL;
+
+ /*
+ * This is called from set_next_task_fair(.first=true) /
+ * set_protect_slice() so curr had better be set and on_rq.
+ */
+ WARN_ON_ONCE(!curr);
+
+ if (weight) {
+ s64 runtime = cfs_rq->sum_w_vruntime;
+
+ /*
+ * Do not add @curr to obtain the effective '- w_j' terms.
+ */
+
+ /* sign flips effective floor / ceiling */
+ if (runtime < 0)
+ runtime -= (weight - 1);
+
+ delta = div64_long(runtime, weight);
+ }
+
+ return cfs_rq->zero_vruntime + delta + 1;
+}
+
static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
/*
@@ -1058,8 +1119,14 @@ static inline void set_protect_slice(str
slice = cfs_rq_min_slice(cfs_rq);
slice = min(slice, se->slice);
- if (slice != se->slice)
- vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+
+ /* If there are shorter slices than se's one */
+ if (slice != se->slice) {
+ if (sched_feat(PREEMPT_SHORT))
+ vprot = min_vruntime(vprot, ineligible_vruntime(cfs_rq));
+ else
+ vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+ }
se->vprot = vprot;
}
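The rounding step in ineligible_vruntime() can be tried out in userspace with the sketch below. It is only a model: div_floor() and ineligible_point() are hypothetical names, plain int64_t division stands in for div64_long(), and the reading that the bias turns C's truncating division into a floor for negative numerators is an interpretation of the "sign flips effective floor / ceiling" comment.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Floor division for a positive divisor. C's '/' truncates toward zero,
 * so a negative numerator is biased down by (den - 1) first.
 */
static int64_t div_floor(int64_t num, int64_t den)
{
	assert(den > 0);
	if (num < 0)
		num -= (den - 1);
	return num / den;
}

/*
 * Hypothetical model of the return value: zero + floor(sum / weight) + 1.
 * With no weight queued, delta stays 0, as in the patch.
 */
static int64_t ineligible_point(int64_t zero, int64_t sum_w_vruntime,
				int64_t sum_weight)
{
	int64_t delta = 0;

	if (sum_weight)
		delta = div_floor(sum_w_vruntime, sum_weight);

	return zero + delta + 1;
}
```

For example, div_floor(-7, 2) gives -4 where plain truncation would give -3, so ineligible_point(1000, -7, 2) is 997 and ineligible_point(1000, 7, 2) is 1004.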
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 22:28 ` Peter Zijlstra
@ 2026-06-26 3:57 ` K Prateek Nayak
2026-06-26 6:41 ` Peter Zijlstra
2026-06-26 7:05 ` Vincent Guittot
0 siblings, 2 replies; 44+ messages in thread
From: K Prateek Nayak @ 2026-06-26 3:57 UTC (permalink / raw)
To: Peter Zijlstra, Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, qyousef
Hello Peter,
On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> +{
> + struct sched_entity *curr = cfs_rq->curr;
> + long weight = cfs_rq->sum_weight;
> + s64 delta = 0;
> +
> + if (curr && !curr->on_rq)
> + curr = NULL;
> +
> + /*
> + * This is called from set_next_task_fair(.first=true) /
> + * set_protect_slice() so curr had better be set and on_rq.
> + */
> + WARN_ON_ONCE(!curr);
set_protect_slice() is indeed called from set_next_entity(.first=true)
but it is done after __dequeue_entity() and before "cfs_rq->curr" is
set (both sched/flat and sched/core have the same pattern).
You should be hitting this splat very easily unless you have moved
set_protect_slice() after the setting of cfs_rq->curr in your tree.
> +
> + if (weight) {
> + s64 runtime = cfs_rq->sum_w_vruntime;
> +
> + /*
> + * Do not add @curr to obtain the effective '- w_j' terms.
> + */
> +
> + /* sign flips effective floor / ceiling */
> + if (runtime < 0)
> + runtime -= (weight - 1);
> +
> + delta = div64_long(runtime, weight);
> + }
> +
> + return cfs_rq->zero_vruntime + delta + 1;
> +}
> +
> static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
>
> /*
> @@ -1058,8 +1119,14 @@ static inline void set_protect_slice(str
> slice = cfs_rq_min_slice(cfs_rq);
>
> slice = min(slice, se->slice);
> - if (slice != se->slice)
> - vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
> +
> + /* If there are shorter slices than se's one */
> + if (slice != se->slice) {
I guess that condition was protecting a simple boot test, but if I run:
for i in 10000000 20000000 30000000 40000000 50000000;
do
sudo taskset -c 0 chrt -v -o -T $i 0 loop& # Simple while(1) loop
done
I see the splat:
------------[ cut here ]------------
!curr
WARNING: kernel/sched/fair.c:858 at set_next_task_fair+0x1d1/0x870, CPU#0: kworker/0:5/3750
...
> + if (sched_feat(PREEMPT_SHORT))
> + vprot = min_vruntime(vprot, ineligible_vruntime(cfs_rq));
> + else
> + vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
> + }
>
> se->vprot = vprot;
> }
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 3:57 ` K Prateek Nayak
@ 2026-06-26 6:41 ` Peter Zijlstra
2026-06-26 7:05 ` Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 6:41 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Vincent Guittot, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, Jun 26, 2026 at 09:27:41AM +0530, K Prateek Nayak wrote:
> Hello Peter,
>
> On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
> > + long weight = cfs_rq->sum_weight;
> > + s64 delta = 0;
> > +
> > + if (curr && !curr->on_rq)
> > + curr = NULL;
> > +
> > + /*
> > + * This is called from set_next_task_fair(.first=true) /
> > + * set_protect_slice() so curr had better be set and on_rq.
> > + */
> > + WARN_ON_ONCE(!curr);
>
> set_protect_slice() is indeed called from set_next_entity(.first=true)
> but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> set (both sched/flat and sched/core have the same pattern).
>
> You should be hitting this splat very easily unless you have moved
> set_protect_slice() after the setting of cfs_rq->curr in your tree.
| static void set_next_task_fair(struct rq *rq, struct task_struct *p, bool first)
| {
| struct sched_entity *se = &p->se;
| bool throttled = false;
| struct cfs_rq *cfs_rq = &rq->cfs;
| unsigned long weight = NICE_0_LOAD;
| bool on_rq = se->on_rq;
|
| clear_buddies(cfs_rq, se);
|
| if (on_rq)
| __dequeue_entity(cfs_rq, se);
XXX
|
| for_each_sched_entity(se) {
| cfs_rq = cfs_rq_of(se);
|
| if (!IS_ENABLED(CONFIG_FAIR_GROUP_SCHED) ||
| !first || !cfs_rq->h_curr)
| set_next_entity(cfs_rq, se);
|
| /* ensure bandwidth has been allocated on our new cfs_rq */
| throttled |= account_cfs_rq_runtime(cfs_rq, 0);
|
| if (on_rq)
| weight = __calc_prop_weight(cfs_rq, se, weight);
| }
|
| if (throttled)
| task_throttle_setup_work(p);
|
| se = &p->se;
| cfs_rq->curr = se;
XXX
|
| if (on_rq) {
| reweight_eevdf(cfs_rq, se, weight, se->on_rq);
| if (first)
| set_protect_slice(cfs_rq, se);
XXX
| }
|
| if (task_on_rq_queued(p)) {
| /*
| * Move the next running task to the front of the list, so our
| * cfs_tasks list becomes MRU one.
| */
| list_move(&se->group_node, &rq->cfs_tasks);
| }
| if (!first)
| return;
|
| WARN_ON_ONCE(se->sched_delayed);
|
| if (hrtick_enabled_fair(rq))
| hrtick_start_fair(rq, p);
|
| update_misfit_status(p, rq);
| sched_fair_update_stop_tick(rq, p);
| }
Is what it looks like here. Anyway, let me actually build it and try ;-)
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 3:57 ` K Prateek Nayak
2026-06-26 6:41 ` Peter Zijlstra
@ 2026-06-26 7:05 ` Vincent Guittot
2026-06-26 7:18 ` Peter Zijlstra
1 sibling, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 7:05 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Peter,
>
> On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
> > + long weight = cfs_rq->sum_weight;
> > + s64 delta = 0;
> > +
> > + if (curr && !curr->on_rq)
> > + curr = NULL;
> > +
> > + /*
> > + * This is called from set_next_task_fair(.first=true) /
> > + * set_protect_slice() so curr had better be set and on_rq.
> > + */
> > + WARN_ON_ONCE(!curr);
>
> set_protect_slice() is indeed called from set_next_entity(.first=true)
> but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> set (both sched/flat and sched/core have the same pattern).
Yes, I confirm that set_protect_slice() is called before setting curr
in tip/sched/core but I think that Peter added flat hierarchy patches
in his tree
>
> You should be hitting this splat very easily unless you have moved
> set_protect_slice() after the setting of cfs_rq->curr in your tree.
>
> > +
> > + if (weight) {
> > + s64 runtime = cfs_rq->sum_w_vruntime;
> > +
> > + /*
> > + * Do not add @curr to obtain the effective '- w_j' terms.
> > + */
> > +
> > + /* sign flips effective floor / ceiling */
> > + if (runtime < 0)
> > + runtime -= (weight - 1);
> > +
> > + delta = div64_long(runtime, weight);
> > + }
> > +
> > + return cfs_rq->zero_vruntime + delta + 1;
> > +}
> > +
> > static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
> >
> > /*
> > @@ -1058,8 +1119,14 @@ static inline void set_protect_slice(str
> > slice = cfs_rq_min_slice(cfs_rq);
> >
> > slice = min(slice, se->slice);
> > - if (slice != se->slice)
> > - vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
> > +
> > + /* If there are shorter slices than se's one */
> > + if (slice != se->slice) {
>
> I guess that condition was protecting a simple boot test but If I run:
>
> for i in 10000000 20000000 30000000 40000000 50000000;
> do
> sudo taskset -c 0 chrt -v -o -T $i 0 loop& # Simple while(1) loop
> done
>
>
> I see the splat:
>
> ------------[ cut here ]------------
> !curr
> WARNING: kernel/sched/fair.c:858 at set_next_task_fair+0x1d1/0x870, CPU#0: kworker/0:5/3750
> ...
>
> > + if (sched_feat(PREEMPT_SHORT))
> > + vprot = min_vruntime(vprot, ineligible_vruntime(cfs_rq));
> > + else
> > + vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
> > + }
> >
> > se->vprot = vprot;
> > }
>
> --
> Thanks and Regards,
> Prateek
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 7:05 ` Vincent Guittot
@ 2026-06-26 7:18 ` Peter Zijlstra
2026-06-26 7:54 ` Peter Zijlstra
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 7:18 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, Jun 26, 2026 at 09:05:29AM +0200, Vincent Guittot wrote:
> On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> >
> > Hello Peter,
> >
> > On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > > +{
> > > + struct sched_entity *curr = cfs_rq->curr;
> > > + long weight = cfs_rq->sum_weight;
> > > + s64 delta = 0;
> > > +
> > > + if (curr && !curr->on_rq)
> > > + curr = NULL;
> > > +
> > > + /*
> > > + * This is called from set_next_task_fair(.first=true) /
> > > + * set_protect_slice() so curr had better be set and on_rq.
> > > + */
> > > + WARN_ON_ONCE(!curr);
> >
> > set_protect_slice() is indeed called from set_next_entity(.first=true)
> > but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> > set (both sched/flat and sched/core have the same pattern).
>
> Yes, I confirm that set_protect_slice() is called before setting curr
> in tip/sched/core but I think that Peter added flat hierarchy patches
> in his tree
Indeed I did; I was planning to stick them in this cycle, so I figure I
should be doing these patches on top.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 7:18 ` Peter Zijlstra
@ 2026-06-26 7:54 ` Peter Zijlstra
2026-06-26 8:55 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-26 7:54 UTC (permalink / raw)
To: Vincent Guittot
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, Jun 26, 2026 at 09:18:32AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 26, 2026 at 09:05:29AM +0200, Vincent Guittot wrote:
> > On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > >
> > > Hello Peter,
> > >
> > > On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > > > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > > > +{
> > > > + struct sched_entity *curr = cfs_rq->curr;
> > > > + long weight = cfs_rq->sum_weight;
> > > > + s64 delta = 0;
> > > > +
> > > > + if (curr && !curr->on_rq)
> > > > + curr = NULL;
> > > > +
> > > > + /*
> > > > + * This is called from set_next_task_fair(.first=true) /
> > > > + * set_protect_slice() so curr had better be set and on_rq.
> > > > + */
> > > > + WARN_ON_ONCE(!curr);
> > >
> > > set_protect_slice() is indeed called from set_next_entity(.first=true)
> > > but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> > > set (both sched/flat and sched/core have the same pattern).
> >
> > Yes, I confirm that set_protect_slice() is called before setting curr
> > in tip/sched/core but I think that Peter added flat hierarchy patches
> > in his tree
>
> Indeed I did; I was planning to stick them in this cycle, so I figure I
> should be doing these patches on top.
I pushed out the now booting pile of gunk into queue:sched/core.
I've made some rando edits to your (Vincent's) patches, if you disagree
or want changes, let me know. This really is just my quilt stack of the
moment for sched/core and it can easily be changed.
/me wonders how the fuck it can be 28C before 10am and goes in search of
more water.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 7:54 ` Peter Zijlstra
@ 2026-06-26 8:55 ` Vincent Guittot
2026-06-26 13:55 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 8:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 09:55, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jun 26, 2026 at 09:18:32AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 26, 2026 at 09:05:29AM +0200, Vincent Guittot wrote:
> > > On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > > >
> > > > Hello Peter,
> > > >
> > > > On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > > > > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > > > > +{
> > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > > > + long weight = cfs_rq->sum_weight;
> > > > > + s64 delta = 0;
> > > > > +
> > > > > + if (curr && !curr->on_rq)
> > > > > + curr = NULL;
> > > > > +
> > > > > + /*
> > > > > + * This is called from set_next_task_fair(.first=true) /
> > > > > + * set_protect_slice() so curr had better be set and on_rq.
> > > > > + */
> > > > > + WARN_ON_ONCE(!curr);
> > > >
> > > > set_protect_slice() is indeed called from set_next_entity(.first=true)
> > > > but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> > > > set (both sched/flat and sched/core have the same pattern).
> > >
> > > Yes, I confirm that set_protect_slice() is called before setting curr
> > > in tip/sched/core but I think that Peter added flat hierarchy patches
> > > in his tree
> >
> > Indeed I did; I was planning to stick them in this cycle, so I figure I
> > should be doing these patches on top.
>
> I pushed out the now booting pile of gunk into queue:sched/core.
>
> I've made some rando edits to your (Vincent's) patches, if you disagree
> or want changes, let me know. This really is just my quilt stack of the
> moment for sched/core and it can easily be changed.
Okay, I'm going to have a look
>
> /me wonders how the fuck it can be 28C before 10am and goes in search of
> more water.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 8:55 ` Vincent Guittot
@ 2026-06-26 13:55 ` Vincent Guittot
2026-06-29 14:25 ` Vincent Guittot
0 siblings, 1 reply; 44+ messages in thread
From: Vincent Guittot @ 2026-06-26 13:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 10:55, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Fri, 26 Jun 2026 at 09:55, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Jun 26, 2026 at 09:18:32AM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 26, 2026 at 09:05:29AM +0200, Vincent Guittot wrote:
> > > > On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > > > >
> > > > > Hello Peter,
> > > > >
> > > > > On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > > > > > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > > > > > +{
> > > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > > > > + long weight = cfs_rq->sum_weight;
> > > > > > + s64 delta = 0;
> > > > > > +
> > > > > > + if (curr && !curr->on_rq)
> > > > > > + curr = NULL;
> > > > > > +
> > > > > > + /*
> > > > > > + * This is called from set_next_task_fair(.first=true) /
> > > > > > + * set_protect_slice() so curr had better be set and on_rq.
> > > > > > + */
> > > > > > + WARN_ON_ONCE(!curr);
> > > > >
> > > > > set_protect_slice() is indeed called from set_next_entity(.first=true)
> > > > > but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> > > > > set (both sched/flat and sched/core have the same pattern).
> > > >
> > > > Yes, I confirm that set_protect_slice() is called before setting curr
> > > > in tip/sched/core but I think that Peter added flat hierarchy patches
> > > > in his tree
> > >
> > > Indeed I did; I was planning to stick them in this cycle, so I figure I
> > > should be doing these patches on top.
> >
> > I pushed out the now booting pile of gunk into queue:sched/core.
> >
> > I've made some rando edits to your (Vincent's) patches, if you disagree
> > or want changes, let me know. This really is just my quilt stack of the
> > moment for sched/core and it can easily be changed.
>
> Okay, I'm going to have a look
The patches look good to me:
- The addition of set_short_buddy() makes sense
- The fix of ineligible_vruntime too
>
>
> >
> > /me wonders how the fuck it can be 28C before 10am and goes in search of
> > more water.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-26 13:55 ` Vincent Guittot
@ 2026-06-29 14:25 ` Vincent Guittot
0 siblings, 0 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-29 14:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Fri, 26 Jun 2026 at 15:55, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
>
> On Fri, 26 Jun 2026 at 10:55, Vincent Guittot
> <vincent.guittot@linaro.org> wrote:
> >
> > On Fri, 26 Jun 2026 at 09:55, Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Jun 26, 2026 at 09:18:32AM +0200, Peter Zijlstra wrote:
> > > > On Fri, Jun 26, 2026 at 09:05:29AM +0200, Vincent Guittot wrote:
> > > > > On Fri, 26 Jun 2026 at 05:57, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> > > > > >
> > > > > > Hello Peter,
> > > > > >
> > > > > > On 6/26/2026 3:58 AM, Peter Zijlstra wrote:
> > > > > > > +static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
> > > > > > > +{
> > > > > > > + struct sched_entity *curr = cfs_rq->curr;
> > > > > > > + long weight = cfs_rq->sum_weight;
> > > > > > > + s64 delta = 0;
> > > > > > > +
> > > > > > > + if (curr && !curr->on_rq)
> > > > > > > + curr = NULL;
> > > > > > > +
> > > > > > > + /*
> > > > > > > + * This is called from set_next_task_fair(.first=true) /
> > > > > > > + * set_protect_slice() so curr had better be set and on_rq.
> > > > > > > + */
> > > > > > > + WARN_ON_ONCE(!curr);
> > > > > >
> > > > > > set_protect_slice() is indeed called from set_next_entity(.first=true)
> > > > > > but it is done after __dequeue_entity() and before "cfs_rq->curr" is
> > > > > > set (both sched/flat and sched/core have the same pattern).
> > > > >
> > > > > Yes, I confirm that set_protect_slice() is called before setting curr
> > > > > in tip/sched/core but I think that Peter added flat hierarchy patches
> > > > > in his tree
> > > >
> > > > Indeed I did; I was planning to stick them in this cycle, so I figure I
> > > > should be doing these patches on top.
> > >
> > > I pushed out the now booting pile of gunk into queue:sched/core.
> > >
> > > I've made some rando edits to your (Vincent's) patches, if you disagree
> > > or want changes, let me know. This really is just my quilt stack of the
> > > moment for sched/core and it can easily be changed.
> >
> > Okay, I'm going to have a look
>
> The patches look good to me:
> - The addition of set_short_buddy() makes sense
> - The fix of ineligible_vruntime too
>
I ran a bunch of perf and latency tests at root level with your
sched/core on top of 7.2-rc2 and the results look good
> >
> >
> > >
> > > /me wonders how the fuck it can be 28C before 10am and goes in search of
> > > more water.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 8:37 ` Peter Zijlstra
2026-06-25 10:09 ` Peter Zijlstra
@ 2026-06-25 14:55 ` Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 14:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, mingo, juri.lelli, dietmar.eggemann, rostedt,
bsegall, mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 10:37, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Jun 25, 2026 at 01:07:43PM +0530, K Prateek Nayak wrote:
>
> > > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > > +{
> > > + struct sched_entity *curr = cfs_rq->curr;
> >
> > curr seems to be unused here and is NULL anyways when
> > set_protect_slice() is called ;-)
>
> Ah, but it is not with the flat patches on, which is why I was a little
> confused ;-)
Yeah, flat hierarchy has another level of complexity in tracking
scheduling latency and that will be the next step
>
> That said; I now see se == curr. So let me go have another look at all
> that.
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:37 ` Peter Zijlstra
@ 2026-06-25 12:51 ` Vincent Guittot
1 sibling, 0 replies; 44+ messages in thread
From: Vincent Guittot @ 2026-06-25 12:51 UTC (permalink / raw)
To: K Prateek Nayak
Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel, qyousef
On Thu, 25 Jun 2026 at 09:37, K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello Vincent,
>
> On 6/24/2026 8:42 PM, Vincent Guittot wrote:
> > +/*
> > + * Compute the vruntime until which the entity remains eligible when it runs
> > + * or is about to run on the CPU. We use this value to set vprot to the min
> > + * value until which other entities would not be picked anyway.
> > + * \Sum (v_i - v0)*w_i
> > + * V = ------------------- + v0
> > + * \Sum w_i
> > + *
> > + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> > + * in the rb tree and next is already dequeued so
> > + *
> > + * cfs_rq->sum_w_vruntime
> > + * V' = ------------------------- + v0
> > + * cfs_rq->sum_weight + w_se
> > +
>
> nit.
>
> ^ is that a stray line or a Missing * at the beginning of the comment
> line?
yes
>
> > + */
> > +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > +{
> > + struct sched_entity *curr = cfs_rq->curr;
>
> curr seems to be unused here and is NULL anyways when
> set_protect_slice() is called ;-)
Yeah, I remember seeing the warning and forgot to remove it
>
> > + long weight = cfs_rq->sum_weight;
> > + s64 delta = 0;
> > +
> > + if (weight) {
> > + s64 runtime = cfs_rq->sum_w_vruntime;
> > +
> > + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> > +
> > + /* sign flips effective floor / ceiling */
> > + if (runtime < 0)
> > + runtime -= (weight - 1);
> > +
> > + delta = div64_long(runtime, weight);
> > + } else {
> > + /*
> > + * When there is but one element, it is the average.
> > + */
> > + delta = 0;
>
> Even with a single entity, the se->vruntime can still diverge from
> cfs_rq->zero_vruntime
>
> Last avg_vruntime() call for cfs_rq was at update_entity_lag() during
> last dequeue while se->on_rq was still set for the dequeuing entity.
>
> Should this be entity_key(cfs_rq, se) instead?
Probably, although that case should never happen
>
> > + }
> > +
> > + return cfs_rq->zero_vruntime + delta + 1;
> > +}
> > +
> > static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
> >
> > /*
> --
> Thanks and Regards,
> Prateek
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
2026-06-25 7:37 ` K Prateek Nayak
@ 2026-06-25 8:33 ` Peter Zijlstra
2026-06-30 9:03 ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2 siblings, 0 replies; 44+ messages in thread
From: Peter Zijlstra @ 2026-06-25 8:33 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, linux-kernel, qyousef
On Wed, Jun 24, 2026 at 05:12:29PM +0200, Vincent Guittot wrote:
> When a task with a shorter slice is enqueued, we protect the running
> task which has a longer slice until it becomes ineligible instead of a
> full slice in order to speed up the switch to other tasks until the task
> with the shortest slice is scheduled. This helps the task avoid waiting
> for too many full slices before running.
>
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
> kernel/sched/fair.c | 52 +++++++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 50 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f972987618e7..7c541f27a1ed 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -813,6 +813,48 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
> return cfs_rq->zero_vruntime;
> }
>
> +/*
> + * Compute the vruntime until which the entity remains eligible when it runs
> + * or is about to run on the CPU. We use this value to set vprot to the min
> + * value until which other entities would not be picked anyway.
> + * \Sum (v_i - v0)*w_i
> + * V = ------------------- + v0
> + * \Sum w_i
> + *
> + * We want V' for (v_se - v0) == 0. Previous entity has already been enqueued
> + * in the rb tree and next is already dequeued so
> + *
> + * cfs_rq->sum_w_vruntime
> + * V' = ------------------------- + v0
> + * cfs_rq->sum_weight + w_se
> +
> + */
> +static u64 eligible_vruntime(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
> + struct sched_entity *curr = cfs_rq->curr;
> + long weight = cfs_rq->sum_weight;
> + s64 delta = 0;
'curr' goes unused in this function, did you want:
if (curr && !curr->on_rq)
curr = NULL;
> +
> + if (weight) {
> + s64 runtime = cfs_rq->sum_w_vruntime;
if (curr) {
unsigned long w = avg_vruntime_weight(cfs_rq, curr->load.weight);
runtime += entity_key(cfs_rq, curr) * w;
weight += w;
}
?
> +
> + weight += avg_vruntime_weight(cfs_rq, se->load.weight);
> +
> + /* sign flips effective floor / ceiling */
> + if (runtime < 0)
> + runtime -= (weight - 1);
> +
> + delta = div64_long(runtime, weight);
> + } else {
> + /*
> + * When there is but one element, it is the average.
> + */
> + delta = 0;
> + }
> +
> + return cfs_rq->zero_vruntime + delta + 1;
> +}
^ permalink raw reply [flat|nested] 44+ messages in thread
* [tip: sched/core] sched/eevdf: Speedup short slice task scheduling
2026-06-24 15:12 ` [PATCH 6/6 v3] sched/eevdf: Speedup short slice task scheduling Vincent Guittot
2026-06-25 7:37 ` K Prateek Nayak
2026-06-25 8:33 ` Peter Zijlstra
@ 2026-06-30 9:03 ` tip-bot2 for Vincent Guittot
2 siblings, 0 replies; 44+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2026-06-30 9:03 UTC (permalink / raw)
To: linux-tip-commits
Cc: Vincent Guittot, Peter Zijlstra (Intel), K Prateek Nayak, x86,
linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 0d1e7a2bab35dbc898e8a6aa29fd074dafbbccda
Gitweb: https://git.kernel.org/tip/0d1e7a2bab35dbc898e8a6aa29fd074dafbbccda
Author: Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate: Wed, 24 Jun 2026 17:12:29 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 30 Jun 2026 10:56:55 +02:00
sched/eevdf: Speedup short slice task scheduling
When a task with a shorter slice is enqueued, we protect the running
task which has a longer slice until it becomes ineligible instead of a
full slice in order to speed up the switch to other tasks until the task
with the shortest slice is scheduled. This helps the task avoid waiting
for too many full slices before running.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260624151229.1710703-7-vincent.guittot@linaro.org
---
kernel/sched/fair.c | 71 ++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 69 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2547a97..c575910 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -777,6 +777,67 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
return cfs_rq->zero_vruntime;
}
+/*
+ * \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
+ * \Sum w_i
+ *
+ * Let W = \Sum w_i, and move v_j such that 'v_j == V', thus:
+ *
+ * V = 1/W * {(v_j - v0)*w_j + \Sum_i!=j (v_i - v0)*w_i} + v0
+ *
+ * v_j = 1/W * {(v_j - v0)*w_j + \Sum_i!=j (v_i - v0)*w_i} + v0
+ *
+ * v_j = 1/W * (v_j - v0)*w_j + 1/W * \Sum_i!=j (v_i - v0)*w_i + v0
+ *
+ * v_j - 1/W * (v_j - v0)*w_j = 1/W * \Sum_i!=j (v_i - v0)*w_i + v0
+ *
+ * v_j*W - (v_j - v0)*w_j = \Sum_i!=j (v_i - v0)*w_i + v0*W
+ *
+ * v_j*(W - w_j) + v0*w_j = \Sum_i!=j (v_i - v0)*w_i + v0*W
+ *
+ * v_j*(W - w_j) = \Sum_i!=j (v_i - v0)*w_i + v0*(W - w_j)
+ *
+ * \Sum_i!=j (v_i - v0)*w_i
+ * v_j = ------------------------ + v0
+ * W - w_j
+ *
+ * When v_j happens to be curr, then '\Sum_i!=j (v_i - v0)*w_i'
+ * is cfs_rq->sum_w_vruntime, and 'W - w_j' is cfs_rq->sum_weight, since curr
+ * is not included in the sum.
+ */
+static u64 ineligible_vruntime(struct cfs_rq *cfs_rq)
+{
+ struct sched_entity *curr = cfs_rq->curr;
+ long weight = cfs_rq->sum_weight;
+ s64 delta = 0;
+
+ if (curr && !curr->on_rq)
+ curr = NULL;
+
+ /*
+ * This is called from set_next_task_fair(.first=true) /
+ * set_protect_slice() so curr had better be set and on_rq.
+ */
+ WARN_ON_ONCE(!curr);
+
+ if (weight) {
+ s64 runtime = cfs_rq->sum_w_vruntime;
+
+ /*
+ * Do not add @curr to obtain the effective '- w_j' terms.
+ */
+
+ /* sign flips effective floor / ceiling */
+ if (runtime < 0)
+ runtime -= (weight - 1);
+
+ delta = div64_long(runtime, weight);
+ }
+
+ return cfs_rq->zero_vruntime + delta + 1;
+}
+
static inline u64 cfs_rq_max_slice(struct cfs_rq *cfs_rq);
/*
@@ -1058,8 +1119,14 @@ static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity
slice = cfs_rq_min_slice(cfs_rq);
slice = min(slice, se->slice);
- if (slice != se->slice)
- vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+
+ /* If there are shorter slices than se's one */
+ if (slice != se->slice) {
+ if (sched_feat(PREEMPT_SHORT))
+ vprot = min_vruntime(vprot, ineligible_vruntime(cfs_rq));
+ else
+ vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+ }
se->vprot = vprot;
}
^ permalink raw reply related [flat|nested] 44+ messages in thread