* [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices
@ 2025-07-04 14:36 Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
This follows up on the attempt to better track the maximum lag of tasks in
the presence of different slice durations:
[1] https://lore.kernel.org/all/20250418151225.3006867-1-vincent.guittot@linaro.org/
Since v1, tracking of the max slice has been removed from the patchset
because we now ensure that the lag of an entity remains in the range of:
[-(slice + tick) : (slice + tick)] with run to parity
and
[max(-slice, -(0.7ms + tick)) : max(slice, 0.7ms + tick)] without run to parity
As a result, there is no need to track the max slice of enqueued entities anymore.
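As an illustration of this invariant, here is a minimal sketch in kernel
style of the bound the tests below check; lag_within_bounds() and the 0.7ms
constant are illustrative, not part of the patches (TICK_NSEC is the tick
period):

	/*
	 * Illustrative only: the lag bound the series aims to maintain.
	 * lag and slice are in nanoseconds; 0.7ms is the minimum quantum
	 * used when run to parity is disabled.
	 */
	static bool lag_within_bounds(s64 lag, u64 slice, bool run_to_parity)
	{
		u64 quantum = 700 * NSEC_PER_USEC;
		u64 lo, hi;

		if (run_to_parity) {
			lo = hi = slice + TICK_NSEC;
		} else {
			lo = min(slice, quantum + TICK_NSEC);
			hi = max(slice, quantum + TICK_NSEC);
		}

		return -(s64)lo <= lag && lag <= (s64)hi;
	}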
Patch 1 is a simple cleanup to ease following changes.
Patch 2 fixes the lag for NO_RUN_TO_PARITY. It has been put first because of
its simplicity. The running task gets a minimum protection of 0.7ms before
EEVDF looks for another task.
Patch 3 ensures that the protection is canceled only if the waking task
will be selected by pick_task_fair(). This case was mentioned by Peter
while reviewing v1.
Patch 4 modifies the duration of the protection to take into account the
shortest slice of the enqueued tasks instead of the slice of the running task.
Patch 5 fixes the case of tasks not being eligible at wakeup or after
migrating but with a shorter slice. We need to update the duration of the
protection to not exceed the lag.
Patch 6 fixes the case of tasks that are still eligible after the protected
period but must let others run so as not to exceed the lag limit. This has
been highlighted in a test with delayed entities being dequeued with a
positive lag larger than their slice, but it can happen for delayed-dequeue
entities too.
The patchset has been tested with rt-app on 37 different use cases; some are
simple and should never trigger any problem, but they have been kept to
increase the test coverage. The tests have been run on a Dragonboard RB5
with affinity set to the biggest cores. The lag has been checked when the
entity's lag is updated at dequeue and every time we check whether an
entity is eligible.
                 RUN_TO_PARITY    NO_RUN_TO_PARITY
                 lag error        lag error
  mainline       14/37            14/37
  + patch 1-2    14/37             0/37
  + patch 3-5     1/37             0/37
  + patch 6       0/37             0/37
Vincent Guittot (6):
sched/fair: Use protect_slice() instead of direct comparison
sched/fair: Fix NO_RUN_TO_PARITY case
sched/fair: Remove spurious shorter slice preemption
sched/fair: Limit run to parity to the min slice of enqueued entities
sched/fair: Fix entity's lag with run to parity
sched/fair: Always trigger resched at the end of a protected period
kernel/sched/fair.c | 94 ++++++++++++++++++++++++---------------------
1 file changed, 50 insertions(+), 44 deletions(-)
--
2.43.0
* [PATCH v2 1/6] sched/fair: Use protect_slice() instead of direct comparison
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-04 14:36 ` [PATCH 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
Replace the open-coded comparison with the protect_slice() helper.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dhaval Giani (AMD) <dhaval@gianis.ca>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e2963efe800..43712403ec98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1161,7 +1161,7 @@ static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity
if (!sched_feat(PREEMPT_SHORT))
return false;
- if (curr->vlag == curr->deadline)
+ if (protect_slice(curr))
return false;
return !entity_eligible(cfs_rq, curr);
--
2.43.0
* [PATCH 2/6] sched/fair: Fix NO_RUN_TO_PARITY case
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
EEVDF expects the scheduler to allocate a time quantum to the selected
entity and then pick a new entity for the next quantum.
Although this notion of time quantum is not strictly achievable in our case,
we can ensure a minimum runtime for each task most of the time and pick a
new entity after this minimum time has elapsed.
Reuse the slice protection of run to parity to ensure such a runtime
quantum.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 30 +++++++++++++++++++-----------
1 file changed, 19 insertions(+), 11 deletions(-)
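A note on the conversion used below (not part of the diff): the quantum is
a wall-clock duration, and set_protect_slice() turns it into vruntime units
with calc_delta_fair(), which scales by the entity's weight, roughly:

	/*
	 * Sketch: calc_delta_fair() scales a wall-clock delta into
	 * vruntime units, roughly delta * NICE_0_LOAD / se->load.weight,
	 * so the protected window corresponds to a fixed wall-clock
	 * quantum regardless of the entity's weight.
	 */
	u64 vquantum = calc_delta_fair(quantum, se);
	u64 vprot_end = se->vruntime + vquantum; /* protection ends here */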
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43712403ec98..0c1abb079ebb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,23 +882,34 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
}
/*
- * HACK, stash a copy of deadline at the point of pick in vlag,
- * which isn't used until dequeue.
+ * HACK, Set the vruntime up to which an entity can run before picking another
+ * one, in vlag, which isn't used until dequeue.
+ * In case of run to parity, we let the entity run to its deadline.
+ * When run to parity is disabled, we give a minimum quantum to the running
+ * entity to ensure progress.
*/
static inline void set_protect_slice(struct sched_entity *se)
{
- se->vlag = se->deadline;
+ u64 quantum = se->slice;
+
+ if (!sched_feat(RUN_TO_PARITY))
+ quantum = min(quantum, normalized_sysctl_sched_base_slice);
+
+ if (quantum != se->slice)
+ se->vlag = min(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
+ else
+ se->vlag = se->deadline;
}
static inline bool protect_slice(struct sched_entity *se)
{
- return se->vlag == se->deadline;
+ return ((s64)(se->vlag - se->vruntime) > 0);
}
static inline void cancel_protect_slice(struct sched_entity *se)
{
if (protect_slice(se))
- se->vlag = se->deadline + 1;
+ se->vlag = se->vruntime;
}
/*
@@ -937,7 +948,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL;
- if (sched_feat(RUN_TO_PARITY) && curr && protect_slice(curr))
+ if (curr && protect_slice(curr))
return curr;
/* Pick the leftmost entity if it's eligible */
@@ -1156,11 +1167,8 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
cgroup_account_cputime(p, delta_exec);
}
-static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity *curr)
+static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
{
- if (!sched_feat(PREEMPT_SHORT))
- return false;
-
if (protect_slice(curr))
return false;
@@ -1248,7 +1256,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
if (cfs_rq->nr_queued == 1)
return;
- if (resched || did_preempt_short(cfs_rq, curr)) {
+ if (resched || resched_next_quantum(cfs_rq, curr)) {
resched_curr_lazy(rq);
clear_buddies(cfs_rq, curr);
}
--
2.43.0
* [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
2025-07-04 14:36 ` [PATCH 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-05 14:15 ` kernel test robot
2025-07-04 14:36 ` [PATCH v2 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
` (3 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
Even if the waking task can preempt current, it might not be the one
selected by pick_task_fair(). Check that the waking task would actually be
selected before cancelling the slice protection.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 44 ++++++++++++++------------------------------
1 file changed, 14 insertions(+), 30 deletions(-)
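In compressed form, the new wakeup logic amounts to the following; this is
an illustrative summary of the diff below, not a verbatim excerpt:

	/* Illustrative summary, see the real hunks below. */
	do_preempt_short = sched_feat(PREEMPT_SHORT) && pse->slice < se->slice;

	/* Ignore curr's protection while picking iff @p could preempt short. */
	if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse) {
		/* Only now is it safe to drop curr's slice protection. */
		if (do_preempt_short)
			cancel_protect_slice(se);
		resched_curr_lazy(rq);
	}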
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0c1abb079ebb..2e624a769b86 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -931,7 +931,7 @@ static inline void cancel_protect_slice(struct sched_entity *se)
*
* Which allows tree pruning through eligibility.
*/
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
{
struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
struct sched_entity *se = __pick_first_entity(cfs_rq);
@@ -948,7 +948,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL;
- if (curr && protect_slice(curr))
+ if (curr && protect && protect_slice(curr))
return curr;
/* Pick the leftmost entity if it's eligible */
@@ -992,6 +992,11 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
return best;
}
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+ return __pick_eevdf(cfs_rq, true);
+}
+
struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
{
struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
@@ -1175,27 +1180,6 @@ static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_enti
return !entity_eligible(cfs_rq, curr);
}
-static inline bool do_preempt_short(struct cfs_rq *cfs_rq,
- struct sched_entity *pse, struct sched_entity *se)
-{
- if (!sched_feat(PREEMPT_SHORT))
- return false;
-
- if (pse->slice >= se->slice)
- return false;
-
- if (!entity_eligible(cfs_rq, pse))
- return false;
-
- if (entity_before(pse, se))
- return true;
-
- if (!entity_eligible(cfs_rq, se))
- return true;
-
- return false;
-}
-
/*
* Used by other classes to account runtime.
*/
@@ -8666,6 +8650,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
struct sched_entity *se = &donor->se, *pse = &p->se;
struct cfs_rq *cfs_rq = task_cfs_rq(donor);
int cse_is_idle, pse_is_idle;
+ bool do_preempt_short = false;
if (unlikely(se == pse))
return;
@@ -8714,7 +8699,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
* When non-idle entity preempt an idle entity,
* don't give idle entity slice protection.
*/
- cancel_protect_slice(se);
+ do_preempt_short = true;
goto preempt;
}
@@ -8732,22 +8717,21 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
/*
* If @p has a shorter slice than current and @p is eligible, override
* current's slice protection in order to allow preemption.
- *
- * Note that even if @p does not turn out to be the most eligible
- * task at this moment, current's slice protection will be lost.
*/
- if (do_preempt_short(cfs_rq, pse, se))
- cancel_protect_slice(se);
+ do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
/*
* If @p has become the most eligible task, force preemption.
*/
- if (pick_eevdf(cfs_rq) == pse)
+ if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
goto preempt;
return;
preempt:
+ if (do_preempt_short)
+ cancel_protect_slice(se);
+
resched_curr_lazy(rq);
}
--
2.43.0
* [PATCH v2 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
` (2 preceding siblings ...)
2025-07-04 14:36 ` [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
Run to parity ensures that current will get a chance to run its full
slice in one go, but this can create large latency and/or lag for
entities with a shorter slice that have exhausted their previous slice
and are waiting to run their next one.
Clamp the run to parity to the shortest slice of all enqueued entities.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
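For reference, cfs_rq_min_slice() used below is an existing helper that
returns the shortest slice among the queued (and running) entities;
conceptually something like the following sketch, though the actual
implementation in the tree may differ in detail:

	/*
	 * Conceptual sketch of cfs_rq_min_slice(): the shortest slice of
	 * all queued entities, maintained through the augmented rbtree
	 * (se->min_slice), plus the currently running entity.
	 */
	static inline u64 cfs_rq_min_slice(struct cfs_rq *cfs_rq)
	{
		struct sched_entity *root = __pick_root_entity(cfs_rq);
		struct sched_entity *curr = cfs_rq->curr;
		u64 min_slice = ~0ULL;

		if (curr && curr->on_rq)
			min_slice = curr->slice;

		if (root)
			min_slice = min(min_slice, root->min_slice);

		return min_slice;
	}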
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2e624a769b86..2a0348b0cc3d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -884,16 +884,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
/*
* HACK, Set the vruntime up to which an entity can run before picking another
* one, in vlag, which isn't used until dequeue.
- * In case of run to parity, we let the entity run to its deadline.
+ * In case of run to parity, we use the shortest slice of the enqueued entities
+ * to define the longest runtime.
* When run to parity is disabled, we give a minimum quantum to the running
* entity to ensure progress.
*/
static inline void set_protect_slice(struct sched_entity *se)
{
- u64 quantum = se->slice;
+ u64 quantum;
- if (!sched_feat(RUN_TO_PARITY))
- quantum = min(quantum, normalized_sysctl_sched_base_slice);
+ if (sched_feat(RUN_TO_PARITY))
+ quantum = cfs_rq_min_slice(cfs_rq_of(se));
+ else
+ quantum = normalized_sysctl_sched_base_slice;
+ quantum = min(quantum, se->slice);
if (quantum != se->slice)
se->vlag = min(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
--
2.43.0
* [PATCH v2 5/6] sched/fair: Fix entity's lag with run to parity
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
` (3 preceding siblings ...)
2025-07-04 14:36 ` [PATCH v2 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
2025-07-07 14:14 ` [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Peter Zijlstra
6 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
When an entity is enqueued without preempting current, we must ensure
that the slice protection is updated to take into account the slice
duration of the newly enqueued task so that its lag will not exceed
its slice (+ tick).
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)
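A worked example of the problem (illustrative, assuming nice-0 entities so
that vruntime advances at wall-clock rate):

	/*
	 * Illustration, not part of the diff: curr has slice = 3ms and is
	 * protected until vruntime + 3ms. A task with a 0.7ms slice is
	 * enqueued but is not eligible, so it cannot preempt. Without the
	 * update below it could wait up to 3ms and accumulate ~3ms of lag,
	 * larger than its slice. update_protect_slice() clamps curr's
	 * protection:
	 *
	 *	se->vlag = min(se->vlag, se->vruntime + vslice(0.7ms))
	 *
	 * so curr is rescheduled within 0.7ms and the waker's lag stays
	 * bounded by its slice (+ tick).
	 */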
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a0348b0cc3d..58f25f0be25f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -889,12 +889,12 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
* When run to parity is disabled, we give a minimum quantum to the running
* entity to ensure progress.
*/
-static inline void set_protect_slice(struct sched_entity *se)
+static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
u64 quantum;
if (sched_feat(RUN_TO_PARITY))
- quantum = cfs_rq_min_slice(cfs_rq_of(se));
+ quantum = cfs_rq_min_slice(cfs_rq);
else
quantum = normalized_sysctl_sched_base_slice;
quantum = min(quantum, se->slice);
@@ -905,6 +905,13 @@ static inline void set_protect_slice(struct sched_entity *se)
se->vlag = se->deadline;
}
+static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+ u64 quantum = cfs_rq_min_slice(cfs_rq);
+
+ se->vlag = min(se->vlag, (s64)(se->vruntime + calc_delta_fair(quantum, se)));
+}
+
static inline bool protect_slice(struct sched_entity *se)
{
return ((s64)(se->vlag - se->vruntime) > 0);
@@ -5468,7 +5475,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
__dequeue_entity(cfs_rq, se);
update_load_avg(cfs_rq, se, UPDATE_TG);
- set_protect_slice(se);
+ set_protect_slice(cfs_rq, se);
}
update_stats_curr_start(cfs_rq, se);
@@ -8730,6 +8737,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
goto preempt;
+ if (sched_feat(RUN_TO_PARITY) && do_preempt_short)
+ update_protect_slice(cfs_rq, se);
+
return;
preempt:
--
2.43.0
* [PATCH v2 6/6] sched/fair: Always trigger resched at the end of a protected period
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
` (4 preceding siblings ...)
2025-07-04 14:36 ` [PATCH v2 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
@ 2025-07-04 14:36 ` Vincent Guittot
2025-07-07 14:14 ` [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Peter Zijlstra
6 siblings, 0 replies; 11+ messages in thread
From: Vincent Guittot @ 2025-07-04 14:36 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, dhaval, linux-kernel
Cc: Vincent Guittot
Always trigger a resched at the end of a protected period, even if the
entity is still eligible. It can happen that an entity remains eligible at
the end of the protected period but must let an entity with a shorter slice
run in order to keep its lag shorter than its slice. This is particularly
true with run to parity, which tries to maximize the lag.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
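Why the unconditional resched is cheap (an illustrative note, not part of
the diff):

	/*
	 * Illustration: curr can still be eligible when its protected
	 * window ends, e.g. when a delayed entity was dequeued with a
	 * large positive lag. Rescheduling unconditionally simply lets
	 * pick_eevdf() re-evaluate; if curr is still the best choice it
	 * is picked again, otherwise a shorter-slice entity gets to run
	 * before curr's lag exceeds the bound.
	 */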
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 58f25f0be25f..debcc136f1c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1188,7 +1188,7 @@ static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_enti
if (protect_slice(curr))
return false;
- return !entity_eligible(cfs_rq, curr);
+ return true;
}
/*
--
2.43.0
* Re: [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption
2025-07-04 14:36 ` [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
@ 2025-07-05 14:15 ` kernel test robot
0 siblings, 0 replies; 11+ messages in thread
From: kernel test robot @ 2025-07-05 14:15 UTC (permalink / raw)
To: Vincent Guittot, mingo, peterz, juri.lelli, dietmar.eggemann,
rostedt, bsegall, mgorman, vschneid, dhaval, linux-kernel
Cc: oe-kbuild-all, Vincent Guittot
Hi Vincent,
kernel test robot noticed the following build warnings:
[auto build test WARNING on tip/sched/core]
[also build test WARNING on peterz-queue/sched/core linus/master v6.16-rc4 next-20250704]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vincent-Guittot/sched-fair-Use-protect_slice-instead-of-direct-comparison/20250704-223850
base: tip/sched/core
patch link: https://lore.kernel.org/r/20250704143612.998419-4-vincent.guittot%40linaro.org
patch subject: [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption
config: s390-randconfig-r072-20250705 (https://download.01.org/0day-ci/archive/20250705/202507052215.5HznxNGI-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 11.5.0
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202507052215.5HznxNGI-lkp@intel.com/
smatch warnings:
kernel/sched/fair.c:8721 check_preempt_wakeup_fair() warn: inconsistent indenting
vim +8721 kernel/sched/fair.c
8643
8644 /*
8645 * Preempt the current task with a newly woken task if needed:
8646 */
8647 static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
8648 {
8649 struct task_struct *donor = rq->donor;
8650 struct sched_entity *se = &donor->se, *pse = &p->se;
8651 struct cfs_rq *cfs_rq = task_cfs_rq(donor);
8652 int cse_is_idle, pse_is_idle;
8653 bool do_preempt_short = false;
8654
8655 if (unlikely(se == pse))
8656 return;
8657
8658 /*
8659 * This is possible from callers such as attach_tasks(), in which we
8660 * unconditionally wakeup_preempt() after an enqueue (which may have
8661 * lead to a throttle). This both saves work and prevents false
8662 * next-buddy nomination below.
8663 */
8664 if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
8665 return;
8666
8667 if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && !pse->sched_delayed) {
8668 set_next_buddy(pse);
8669 }
8670
8671 /*
8672 * We can come here with TIF_NEED_RESCHED already set from new task
8673 * wake up path.
8674 *
8675 * Note: this also catches the edge-case of curr being in a throttled
8676 * group (e.g. via set_curr_task), since update_curr() (in the
8677 * enqueue of curr) will have resulted in resched being set. This
8678 * prevents us from potentially nominating it as a false LAST_BUDDY
8679 * below.
8680 */
8681 if (test_tsk_need_resched(rq->curr))
8682 return;
8683
8684 if (!sched_feat(WAKEUP_PREEMPTION))
8685 return;
8686
8687 find_matching_se(&se, &pse);
8688 WARN_ON_ONCE(!pse);
8689
8690 cse_is_idle = se_is_idle(se);
8691 pse_is_idle = se_is_idle(pse);
8692
8693 /*
8694 * Preempt an idle entity in favor of a non-idle entity (and don't preempt
8695 * in the inverse case).
8696 */
8697 if (cse_is_idle && !pse_is_idle) {
8698 /*
8699 * When non-idle entity preempt an idle entity,
8700 * don't give idle entity slice protection.
8701 */
8702 do_preempt_short = true;
8703 goto preempt;
8704 }
8705
8706 if (cse_is_idle != pse_is_idle)
8707 return;
8708
8709 /*
8710 * BATCH and IDLE tasks do not preempt others.
8711 */
8712 if (unlikely(!normal_policy(p->policy)))
8713 return;
8714
8715 cfs_rq = cfs_rq_of(se);
8716 update_curr(cfs_rq);
8717 /*
8718 * If @p has a shorter slice than current and @p is eligible, override
8719 * current's slice protection in order to allow preemption.
8720 */
> 8721 do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
8722
8723 /*
8724 * If @p has become the most eligible task, force preemption.
8725 */
8726 if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
8727 goto preempt;
8728
8729 return;
8730
8731 preempt:
8732 if (do_preempt_short)
8733 cancel_protect_slice(se);
8734
8735 resched_curr_lazy(rq);
8736 }
8737
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
` (5 preceding siblings ...)
2025-07-04 14:36 ` [PATCH v2 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
@ 2025-07-07 14:14 ` Peter Zijlstra
2025-07-07 15:49 ` Vincent Guittot
6 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2025-07-07 14:14 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, dhaval, linux-kernel
On Fri, Jul 04, 2025 at 04:36:06PM +0200, Vincent Guittot wrote:
> This follows the attempt to better track maximum lag of tasks in presence
> of different slices duration:
> [1] https://lore.kernel.org/all/20250418151225.3006867-1-vincent.guittot@linaro.org/
>
> Since v1, tracking of the max slice has been removed from the patchset
> because we now ensure that the lag of an entity remains in the range of:
>
> [-(slice + tick) : (slice + tick)] with run to parity
> and
> [max(-slice, -(0.7ms + tick)) : max(slice, 0.7ms + tick)] without run to parity
>
> As a result, there is no need to track the max slice of enqueued entities anymore.
>
> Patch 1 is a simple cleanup to ease following changes.
>
> Patch 2 fixes the lag for NO_RUN_TO_PARITY. It has been put first because of
> its simplicity. The running task gets a minimum protection of 0.7ms before
> EEVDF looks for another task.
The usage of min() on vruntimes is broken; it doesn't work right in the
face of wrapping; use min_vruntime().
Also, perhaps it is time to better document this vlag abuse.
> Patch 3 ensures that the protection is canceled only if the waking task
> will be selected by pick_task_fair(). This case was mentioned by Peter
> while reviewing v1.
>
> Patch 4 modifies the duration of the protection to take into account the
> shortest slice of the enqueued tasks instead of the slice of the running task.
>
> Patch 5 fixes the case of tasks not being eligible at wakeup or after
> migrating but with a shorter slice. We need to update the duration of the
> protection to not exceed the lag.
This has issues with non-determinism; specifically,
update_protect_slice() will use the current ->vruntime, and as such
can unduly push forward the protection window.
> Patch 6 fixes the case of tasks that are still eligible after the protected
> period but must let others run so as not to exceed the lag limit. This has
> been highlighted in a test with delayed entities being dequeued with a
> positive lag larger than their slice, but it can happen for delayed-dequeue
> entities too.
At this point resched_next_quantum() becomes !protect_slice() and can be
removed.
How about something like so? I've probably wrecked the whole
!RUN_TO_PARITY thing -- so that needs to be put back in.
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -579,7 +579,15 @@ struct sched_entity {
u64 sum_exec_runtime;
u64 prev_sum_exec_runtime;
u64 vruntime;
- s64 vlag;
+ union {
+ /*
+ * When !@on_rq this field is vlag.
+ * When cfs_rq->curr == se (which implies @on_rq)
+ * this field is vprot. See protect_slice().
+ */
+ s64 vlag;
+ u64 vprot;
+ };
u64 slice;
u64 nr_migrations;
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,45 +882,36 @@ struct sched_entity *__pick_first_entity
}
/*
- * HACK, Set the vruntime up to which an entity can run before picking another
- * one, in vlag, which isn't used until dequeue.
- * In case of run to parity, we use the shortest slice of the enqueued entities
- * to define the longest runtime.
- * When run to parity is disabled, we give a minimum quantum to the running
- * entity to ensure progress.
+ * Take a snapshot of the vruntime at the point the task gets scheduled.
*/
static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- u64 quantum;
-
- if (sched_feat(RUN_TO_PARITY))
- quantum = cfs_rq_min_slice(cfs_rq);
- else
- quantum = normalized_sysctl_sched_base_slice;
- quantum = min(quantum, se->slice);
-
- if (quantum != se->slice)
- se->vlag = min(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
- else
- se->vlag = se->deadline;
+ se->vprot = se->vruntime;
}
-static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
+/*
+ * Should we still run @se? It is allowed to run until either se->deadline or
+ * until se->vprot + min_vslice, whichever comes first.
+ */
+static inline bool protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- u64 quantum = cfs_rq_min_slice(cfs_rq);
+ u64 min_vslice, deadline = se->deadline;
+ u64 min_slice = cfs_rq_min_slice(cfs_rq);
- se->vlag = min(se->vlag, (s64)(se->vruntime + calc_delta_fair(quantum, se)));
-}
+ if (min_slice != se->slice) {
+ min_vslice = calc_delta_fair(min_slice, se);
+ deadline = min_vruntime(se->deadline, se->vprot + min_vslice);
+ }
-static inline bool protect_slice(struct sched_entity *se)
-{
- return ((s64)(se->vlag - se->vruntime) > 0);
+ WARN_ON_ONCE(!se->on_rq);
+
+ return ((s64)(deadline - se->vruntime) > 0);
}
-static inline void cancel_protect_slice(struct sched_entity *se)
+static inline void cancel_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- if (protect_slice(se))
- se->vlag = se->vruntime;
+ if (protect_slice(cfs_rq, se))
+ se->vprot = se->vruntime - calc_delta_fair(NSEC_PER_SEC, se);
}
/*
@@ -959,7 +950,7 @@ static struct sched_entity *__pick_eevdf
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
curr = NULL;
- if (curr && protect && protect_slice(curr))
+ if (curr && protect && protect_slice(cfs_rq, curr))
return curr;
/* Pick the leftmost entity if it's eligible */
@@ -1183,14 +1174,6 @@ static inline void update_curr_task(stru
cgroup_account_cputime(p, delta_exec);
}
-static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
-{
- if (protect_slice(curr))
- return false;
-
- return true;
-}
-
/*
* Used by other classes to account runtime.
*/
@@ -1251,7 +1234,7 @@ static void update_curr(struct cfs_rq *c
if (cfs_rq->nr_queued == 1)
return;
- if (resched || resched_next_quantum(cfs_rq, curr)) {
+ if (resched || !protect_slice(cfs_rq, curr)) {
resched_curr_lazy(rq);
clear_buddies(cfs_rq, curr);
}
@@ -8729,7 +8712,7 @@ static void check_preempt_wakeup_fair(st
* If @p has a shorter slice than current and @p is eligible, override
* current's slice protection in order to allow preemption.
*/
- do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
+ do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
/*
* If @p has become the most eligible task, force preemption.
@@ -8737,14 +8720,11 @@ static void check_preempt_wakeup_fair(st
if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
goto preempt;
- if (sched_feat(RUN_TO_PARITY) && do_preempt_short)
- update_protect_slice(cfs_rq, se);
-
return;
preempt:
if (do_preempt_short)
- cancel_protect_slice(se);
+ cancel_protect_slice(cfs_rq, se);
resched_curr_lazy(rq);
}
* Re: [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices
2025-07-07 14:14 ` [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Peter Zijlstra
@ 2025-07-07 15:49 ` Vincent Guittot
2025-07-08 18:44 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Vincent Guittot @ 2025-07-07 15:49 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, dhaval, linux-kernel
On Mon, 7 Jul 2025 at 16:14, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Jul 04, 2025 at 04:36:06PM +0200, Vincent Guittot wrote:
> > This follows the attempt to better track maximum lag of tasks in presence
> > of different slices duration:
> > [1] https://lore.kernel.org/all/20250418151225.3006867-1-vincent.guittot@linaro.org/
> >
> > Since v1, tracking of the max slice has been removed from the patchset
> > because we now ensure that the lag of an entity remains in the range of:
> >
> > [-(slice + tick) : (slice + tick)] with run to parity
> > and
> > [max(-slice, -(0.7ms + tick)) : max(slice, 0.7ms + tick)] without run to parity
> >
> > As a result, there is no need to track the max slice of enqueued entities anymore.
> >
> > Patch 1 is a simple cleanup to ease following changes.
> >
> > Patch 2 fixes the lag for NO_RUN_TO_PARITY. It has been put first because of
> > its simplicity. The running task gets a minimum protection of 0.7ms before
> > EEVDF looks for another task.
>
> The usage of min() on vruntimes is broken; it doesn't work right in the
> face of wrapping; use min_vruntime().
Good point. Don't know why I didn't use it
>
> Also, perhaps it is time to better document this vlag abuse.
will add something
>
> > Patch 3 ensures that the protection is canceled only if the waking task
> > will be selected by pick_task_fair(). This case was mentioned by Peter
> > while reviewing v1.
> >
> > Patch 4 modifies the duration of the protection to take into account the
> > shortest slice of the enqueued tasks instead of the slice of the running task.
> >
> > Patch 5 fixes the case of tasks not being eligible at wakeup or after
> > migrating but with a shorter slice. We need to update the duration of the
> > protection to not exceed the lag.
>
> This has issues with non-determinism; specifically,
> update_protected_slice() will use the current ->vruntime, and as such
> can unduly push forward the protection window.
	se->vprot = min_vruntime(se->vprot,
				 se->vruntime + calc_delta_fair(quantum, se));
The min_vruntime() (previously min()) against the current vprot (previously
vlag) should prevent pushing the protection forward. We can only reduce the
vlag further, never increase it.
>
> > Patch 6 fixes the case of tasks that are still eligible after the protected
> > period but must let others run so as not to exceed the lag limit. This has
> > been highlighted in a test with delayed entities being dequeued with a
> > positive lag larger than their slice, but it can happen for delayed-dequeue
> > entities too.
>
> At this point resched_next_quantum() becomes !protec_slice() and can be
> removed.
yes, it was the last remaining fix and I forgot to simplify
>
> How about something like so? I've probably wrecked the whole
> !RUN_TO_PARITY thing -- so that needs to be put back in.
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -579,7 +579,15 @@ struct sched_entity {
> u64 sum_exec_runtime;
> u64 prev_sum_exec_runtime;
> u64 vruntime;
> - s64 vlag;
> + union {
> + /*
> + * When !@on_rq this field is vlag.
> + * When cfs_rq->curr == se (which implies @on_rq)
> + * this field is vprot. See protect_slice().
> + */
> + s64 vlag;
> + u64 vprot;
> + };
> u64 slice;
>
> u64 nr_migrations;
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -882,45 +882,36 @@ struct sched_entity *__pick_first_entity
> }
>
> /*
> - * HACK, Set the vruntime up to which an entity can run before picking another
> - * one, in vlag, which isn't used until dequeue.
> - * In case of run to parity, we use the shortest slice of the enqueued entities
> - * to define the longest runtime.
> - * When run to parity is disabled, we give a minimum quantum to the running
> - * entity to ensure progress.
> + * Take a snapshot of the vruntime at the point the task gets scheduled.
> */
> static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> - u64 quantum;
> -
> - if (sched_feat(RUN_TO_PARITY))
> - quantum = cfs_rq_min_slice(cfs_rq);
> - else
> - quantum = normalized_sysctl_sched_base_slice;
> - quantum = min(quantum, se->slice);
> -
> - if (quantum != se->slice)
> - se->vlag = min(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
> - else
> - se->vlag = se->deadline;
> + se->vprot = se->vruntime;
> }
>
> -static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +/*
> + * Should we still run @se? It is allowed to run until either se->deadline or
> + * until se->vprot + min_vslice, whichever comes first.
> + */
> +static inline bool protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> - u64 quantum = cfs_rq_min_slice(cfs_rq);
> + u64 min_vslice, deadline = se->deadline;
> + u64 min_slice = cfs_rq_min_slice(cfs_rq);
>
> - se->vlag = min(se->vlag, (s64)(se->vruntime + calc_delta_fair(quantum, se)));
> -}
> + if (min_slice != se->slice) {
> + min_vslice = calc_delta_fair(min_slice, se);
> + deadline = min_vruntime(se->deadline, se->vprot + min_vslice);
I didn't go in that direction because protect_slice() is called far
more often than set_protect_slice() or update_protect_slice()
> + }
>
> -static inline bool protect_slice(struct sched_entity *se)
> -{
> - return ((s64)(se->vlag - se->vruntime) > 0);
> + WARN_ON_ONCE(!se->on_rq);
> +
> + return ((s64)(deadline - se->vruntime) > 0);
> }
>
> -static inline void cancel_protect_slice(struct sched_entity *se)
> +static inline void cancel_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
> - if (protect_slice(se))
> - se->vlag = se->vruntime;
> + if (protect_slice(cfs_rq, se))
> + se->vprot = se->vruntime - calc_delta_fair(NSEC_PER_SEC, se);
> }
>
> /*
> @@ -959,7 +950,7 @@ static struct sched_entity *__pick_eevdf
> if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> curr = NULL;
>
> - if (curr && protect && protect_slice(curr))
> + if (curr && protect && protect_slice(cfs_rq, curr))
> return curr;
>
> /* Pick the leftmost entity if it's eligible */
> @@ -1183,14 +1174,6 @@ static inline void update_curr_task(stru
> cgroup_account_cputime(p, delta_exec);
> }
>
> -static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
> -{
> - if (protect_slice(curr))
> - return false;
> -
> - return true;
> -}
> -
> /*
> * Used by other classes to account runtime.
> */
> @@ -1251,7 +1234,7 @@ static void update_curr(struct cfs_rq *c
> if (cfs_rq->nr_queued == 1)
> return;
>
> - if (resched || resched_next_quantum(cfs_rq, curr)) {
> + if (resched || !protect_slice(cfs_rq, curr)) {
> resched_curr_lazy(rq);
> clear_buddies(cfs_rq, curr);
> }
> @@ -8729,7 +8712,7 @@ static void check_preempt_wakeup_fair(st
> * If @p has a shorter slice than current and @p is eligible, override
> * current's slice protection in order to allow preemption.
> */
> - do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
> + do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
>
> /*
> * If @p has become the most eligible task, force preemption.
> @@ -8737,14 +8720,11 @@ static void check_preempt_wakeup_fair(st
> if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
> goto preempt;
>
> - if (sched_feat(RUN_TO_PARITY) && do_preempt_short)
> - update_protect_slice(cfs_rq, se);
> -
> return;
>
> preempt:
> if (do_preempt_short)
> - cancel_protect_slice(se);
> + cancel_protect_slice(cfs_rq, se);
>
> resched_curr_lazy(rq);
> }
* Re: [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices
2025-07-07 15:49 ` Vincent Guittot
@ 2025-07-08 18:44 ` Peter Zijlstra
0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2025-07-08 18:44 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, dhaval, linux-kernel
On Mon, Jul 07, 2025 at 05:49:05PM +0200, Vincent Guittot wrote:
> > > Patch 5 fixes the case of tasks not being eligible at wakeup or after
> > > migrating but with a shorter slice. We need to update the duration of the
> > > protection to not exceed the lag.
> >
> > This has issues with non-determinism; specifically,
> > update_protect_slice() will use the current ->vruntime, and as such
> > can unduly push forward the protection window.
>
> se->vprot = min_vruntime(se->vprot, (se->vruntime +
> calc_delta_fair(quantum, se)));
>
> the min_vruntime (previously min) with current vprot (previously vlag)
> should prevent pushing forward the protection. We can only reduce
> further the vlag but never increase it
Fair enough.
> > +/*
> > + * Should we still run @se? It is allowed to run until either se->deadline or
> > + * until se->vprot + min_vslice, whichever comes first.
> > + */
> > +static inline bool protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> > {
> > + u64 min_vslice, deadline = se->deadline;
> > + u64 min_slice = cfs_rq_min_slice(cfs_rq);
> >
> > + if (min_slice != se->slice) {
> > + min_vslice = calc_delta_fair(min_slice, se);
> > + deadline = min_vruntime(se->deadline, se->vprot + min_vslice);
>
> I didn't go into that direction because protect_slice() is call far
> more often than set_protect_slice() or update_protect_slice()
Right.
> > + }
> >
> > + WARN_ON_ONCE(!se->on_rq);
> > +
> > + return ((s64)(deadline - se->vruntime) > 0);
> > }
Anyway, I see you posted a new version. Let me go have a look.
end of thread
Thread overview: 11+ messages
2025-07-04 14:36 [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
2025-07-04 14:36 ` [PATCH 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
2025-07-05 14:15 ` kernel test robot
2025-07-04 14:36 ` [PATCH v2 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
2025-07-04 14:36 ` [PATCH v2 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
2025-07-07 14:14 ` [PATCH v2 0/6] sched/fair: Manage lag and run to parity with different slices Peter Zijlstra
2025-07-07 15:49 ` Vincent Guittot
2025-07-08 18:44 ` Peter Zijlstra