linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices
@ 2025-07-08 16:56 Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
                   ` (5 more replies)
  0 siblings, 6 replies; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

This follows up on the attempt to better track the maximum lag of tasks in
the presence of different slice durations:
[1]  https://lore.kernel.org/all/20250418151225.3006867-1-vincent.guittot@linaro.org/

Since v2:
- Fixed some typos.
- Created a union of vlag and vprot to help clarify the usage of the
  field, as suggested by Peter
- Use min_vruntime instead of min
- Removed resched_next_quantum(), which is equal to !protect_slice()

Since v1, tracking of the max slice has been removed from the patchset
because we now ensure that the lag of an entity remains in the range:

  [-(slice + tick) : (slice + tick)]                          with run to parity
and
  [max(-slice, -(0.7ms + tick)) : max(slice, 0.7ms + tick)]   without run to parity

As a result, there is no longer any need to track the max slice of enqueued
entities.
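
The bound that the tests below verify can be sketched as a small helper
(plain C for illustration only, not kernel code; all values are in
nanoseconds and the 0.7ms quantum is hardcoded):

static int lag_within_bound(long long lag, long long slice, long long tick,
			    int run_to_parity)
{
	long long quantum = 700000LL + tick;	/* 0.7ms minimum quantum + tick */
	long long bound;

	if (run_to_parity)
		bound = slice + tick;
	else
		bound = slice > quantum ? slice : quantum;

	/* the lag must stay within [-bound : bound] */
	return -bound <= lag && lag <= bound;
}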

Patch 1 is a simple cleanup to ease the following changes.

Patch 2 fixes the lag for NO_RUN_TO_PARITY. It has been put first because of
its simplicity. The running task gets a minimum protection of 0.7ms before
EEVDF looks for another task.

Patch 3 ensures that the protection is canceled only if the waking task
will be selected by pick_task_fair. This case was mentioned by Peter while
reviewing v1.

Patch 4 modifies the duration of the protection to take into account the
shortest slice of enqueued tasks instead of the slice of the running task.

Patch 5 fixes the case of tasks that are not eligible at wakeup or after
migrating but have a shorter slice. We need to update the duration of the
protection so that their lag does not exceed the limit.

Patch 6 fixes the case of tasks that are still eligible after the protected
period but must let others run so as not to exceed the lag limit. This has
been highlighted in a test with delayed entities being dequeued with a
positive lag larger than their slice, but it can happen for a delayed
dequeue entity too.

The patchset has been tested with rt-app on 37 different use cases; some are
simple and should never trigger any problem but have been kept to increase
the test coverage. The tests have been run on dragon rb5 with affinity set
to the biggest cores. The lag has been checked when the entity's lag is
updated at dequeue and every time we check whether an entity is eligible.

             RUN_TO_PARITY    NO_RUN_TO_PARITY
             lag error        lag error
mainline       14/37            14/37
+ patch 1-2    14/37             0/37
+ patch 3-5     1/37             0/37
+ patch 6       0/37             0/37

Vincent Guittot (6):
  sched/fair: Use protect_slice() instead of direct comparison
  sched/fair: Fix NO_RUN_TO_PARITY case
  sched/fair: Remove spurious shorter slice preemption
  sched/fair: Limit run to parity to the min slice of enqueued entities
  sched/fair: Fix entity's lag with run to parity
  sched/fair: Always trigger resched at the end of a protected period

 include/linux/sched.h | 10 ++++-
 kernel/sched/fair.c   | 96 +++++++++++++++++++++----------------------
 2 files changed, 56 insertions(+), 50 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

Replace the test by the relevant protect_slice() function.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Dhaval Giani (AMD) <dhaval@gianis.ca>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e2963efe800..43712403ec98 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1161,7 +1161,7 @@ static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity
 	if (!sched_feat(PREEMPT_SHORT))
 		return false;
 
-	if (curr->vlag == curr->deadline)
+	if (protect_slice(curr))
 		return false;
 
 	return !entity_eligible(cfs_rq, curr);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-09  9:17   ` Peter Zijlstra
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

EEVDF expects the scheduler to allocate a time quantum to the selected
entity and then pick a new entity for the next quantum.
Although this notion of time quantum is not strictly doable in our case,
we can ensure a minimum runtime for each task most of the time and pick a
new entity after a minimum time has elapsed.
Reuse the slice protection of run to parity to ensure such a runtime
quantum.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 include/linux/sched.h | 10 +++++++++-
 kernel/sched/fair.c   | 30 +++++++++++++++++++-----------
 2 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index eec6b225e9d1..75579f2fb009 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -583,7 +583,15 @@ struct sched_entity {
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
 	u64				vruntime;
-	s64				vlag;
+	union {
+		/*
+		 * When !@on_rq this field is vlag.
+		 * When cfs_rq->curr == se (which implies @on_rq)
+		 * this field is vprot. See protect_slice().
+		 */
+		s64                     vlag;
+		u64                     vprot;
+	};
 	u64				slice;
 
 	u64				nr_migrations;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43712403ec98..97cf99bb71d6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,23 +882,34 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 }
 
 /*
- * HACK, stash a copy of deadline at the point of pick in vlag,
- * which isn't used until dequeue.
+ * Set the vruntime up to which an entity can run before looking
+ * for another entity to pick.
+ * In case of run to parity, we protect the entity up to its deadline.
+ * When run to parity is disabled, we give a minimum quantum to the running
+ * entity to ensure progress.
  */
 static inline void set_protect_slice(struct sched_entity *se)
 {
-	se->vlag = se->deadline;
+	u64 quantum = se->slice;
+
+	if (!sched_feat(RUN_TO_PARITY))
+		quantum = min(quantum, normalized_sysctl_sched_base_slice);
+
+	if (quantum != se->slice)
+		se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
+	else
+		se->vprot = se->deadline;
 }
 
 static inline bool protect_slice(struct sched_entity *se)
 {
-	return se->vlag == se->deadline;
+	return ((s64)(se->vprot - se->vruntime) > 0);
 }
 
 static inline void cancel_protect_slice(struct sched_entity *se)
 {
 	if (protect_slice(se))
-		se->vlag = se->deadline + 1;
+		se->vprot = se->vruntime;
 }
 
 /*
@@ -937,7 +948,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
 
-	if (sched_feat(RUN_TO_PARITY) && curr && protect_slice(curr))
+	if (curr && protect_slice(curr))
 		return curr;
 
 	/* Pick the leftmost entity if it's eligible */
@@ -1156,11 +1167,8 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
 	cgroup_account_cputime(p, delta_exec);
 }
 
-static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity *curr)
+static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
-	if (!sched_feat(PREEMPT_SHORT))
-		return false;
-
 	if (protect_slice(curr))
 		return false;
 
@@ -1248,7 +1256,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	if (cfs_rq->nr_queued == 1)
 		return;
 
-	if (resched || did_preempt_short(cfs_rq, curr)) {
+	if (resched || resched_next_quantum(cfs_rq, curr)) {
 		resched_curr_lazy(rq);
 		clear_buddies(cfs_rq, curr);
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 3/6] sched/fair: Remove spurious shorter slice preemption
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

Even if the waking task can preempt current, it might not be the one
selected by pick_task_fair. Check that the waking task will be selected
if we cancel the slice protection before doing so.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 44 ++++++++++++++------------------------------
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 97cf99bb71d6..7e82b357763a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -931,7 +931,7 @@ static inline void cancel_protect_slice(struct sched_entity *se)
  *
  * Which allows tree pruning through eligibility.
  */
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *se = __pick_first_entity(cfs_rq);
@@ -948,7 +948,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
 
-	if (curr && protect_slice(curr))
+	if (curr && protect && protect_slice(curr))
 		return curr;
 
 	/* Pick the leftmost entity if it's eligible */
@@ -992,6 +992,11 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	return best;
 }
 
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+	return __pick_eevdf(cfs_rq, true);
+}
+
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
@@ -1175,27 +1180,6 @@ static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_enti
 	return !entity_eligible(cfs_rq, curr);
 }
 
-static inline bool do_preempt_short(struct cfs_rq *cfs_rq,
-				    struct sched_entity *pse, struct sched_entity *se)
-{
-	if (!sched_feat(PREEMPT_SHORT))
-		return false;
-
-	if (pse->slice >= se->slice)
-		return false;
-
-	if (!entity_eligible(cfs_rq, pse))
-		return false;
-
-	if (entity_before(pse, se))
-		return true;
-
-	if (!entity_eligible(cfs_rq, se))
-		return true;
-
-	return false;
-}
-
 /*
  * Used by other classes to account runtime.
  */
@@ -8666,6 +8650,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	struct sched_entity *se = &donor->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
+	bool do_preempt_short = false;
 
 	if (unlikely(se == pse))
 		return;
@@ -8714,7 +8699,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 		 * When non-idle entity preempt an idle entity,
 		 * don't give idle entity slice protection.
 		 */
-		cancel_protect_slice(se);
+		do_preempt_short = true;
 		goto preempt;
 	}
 
@@ -8732,22 +8717,21 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	/*
 	 * If @p has a shorter slice than current and @p is eligible, override
 	 * current's slice protection in order to allow preemption.
-	 *
-	 * Note that even if @p does not turn out to be the most eligible
-	 * task at this moment, current's slice protection will be lost.
 	 */
-	if (do_preempt_short(cfs_rq, pse, se))
-		cancel_protect_slice(se);
+	do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
 
 	/*
 	 * If @p has become the most eligible task, force preemption.
 	 */
-	if (pick_eevdf(cfs_rq) == pse)
+	if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
 		goto preempt;
 
 	return;
 
 preempt:
+	if (do_preempt_short)
+		cancel_protect_slice(se);
+
 	resched_curr_lazy(rq);
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
                   ` (2 preceding siblings ...)
  2025-07-08 16:56 ` [PATCH v3 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-10  6:59   ` Madadi Vineeth Reddy
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
  5 siblings, 2 replies; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

Run to parity ensures that current will get a chance to run its full
slice in one go but this can create large latency and/or lag for
entities with shorter slice that have exhausted their previous slice
and wait to run their next slice.

Clamp the run to parity to the shortest slice of all enqueued entities.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7e82b357763a..85238f2e026a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -884,16 +884,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 /*
  * Set the vruntime up to which an entity can run before looking
  * for another entity to pick.
- * In case of run to parity, we protect the entity up to its deadline.
+ * In case of run to parity, we use the shortest slice of the enqueued
+ * entities to set the protected period.
  * When run to parity is disabled, we give a minimum quantum to the running
  * entity to ensure progress.
  */
 static inline void set_protect_slice(struct sched_entity *se)
 {
-	u64 quantum = se->slice;
+	u64 quantum;
 
-	if (!sched_feat(RUN_TO_PARITY))
-		quantum = min(quantum, normalized_sysctl_sched_base_slice);
+	if (sched_feat(RUN_TO_PARITY))
+		quantum = cfs_rq_min_slice(cfs_rq_of(se));
+	else
+		quantum = normalized_sysctl_sched_base_slice;
+	quantum = min(quantum, se->slice);
 
 	if (quantum != se->slice)
 		se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 5/6] sched/fair: Fix entity's lag with run to parity
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
                   ` (3 preceding siblings ...)
  2025-07-08 16:56 ` [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  2025-07-08 16:56 ` [PATCH v3 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
  5 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

When an entity is enqueued without preempting current, we must ensure
that the slice protection is updated to take into account the slice
duration of the newly enqueued task so that its lag will not exceed
its slice (+ tick).

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 85238f2e026a..d815609526d5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -889,12 +889,12 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  * When run to parity is disabled, we give a minimum quantum to the running
  * entity to ensure progress.
  */
-static inline void set_protect_slice(struct sched_entity *se)
+static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	u64 quantum;
 
 	if (sched_feat(RUN_TO_PARITY))
-		quantum = cfs_rq_min_slice(cfs_rq_of(se));
+		quantum = cfs_rq_min_slice(cfs_rq);
 	else
 		quantum = normalized_sysctl_sched_base_slice;
 	quantum = min(quantum, se->slice);
@@ -905,6 +905,13 @@ static inline void set_protect_slice(struct sched_entity *se)
 		se->vprot = se->deadline;
 }
 
+static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	u64 quantum = cfs_rq_min_slice(cfs_rq);
+
+	se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(quantum, se));
+}
+
 static inline bool protect_slice(struct sched_entity *se)
 {
 	return ((s64)(se->vprot - se->vruntime) > 0);
@@ -5468,7 +5475,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		__dequeue_entity(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 
-		set_protect_slice(se);
+		set_protect_slice(cfs_rq, se);
 	}
 
 	update_stats_curr_start(cfs_rq, se);
@@ -8730,6 +8737,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
 	if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
 		goto preempt;
 
+	if (sched_feat(RUN_TO_PARITY) && do_preempt_short)
+		update_protect_slice(cfs_rq, se);
+
 	return;
 
 preempt:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH v3 6/6] sched/fair: Always trigger resched at the end of a protected period
  2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
                   ` (4 preceding siblings ...)
  2025-07-08 16:56 ` [PATCH v3 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
@ 2025-07-08 16:56 ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  5 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-08 16:56 UTC (permalink / raw)
  To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel
  Cc: Vincent Guittot

Always trigger a resched after a protected period even if the entity is
still eligible. It can happen that an entity remains eligible at the end
of the protected period but must let an entity with a shorter slice run
in order to keep its lag shorter than its slice. This is particularly true
with run to parity, which tries to maximize the lag.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d815609526d5..fbe969adf287 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1183,14 +1183,6 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
 	cgroup_account_cputime(p, delta_exec);
 }
 
-static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
-{
-	if (protect_slice(curr))
-		return false;
-
-	return !entity_eligible(cfs_rq, curr);
-}
-
 /*
  * Used by other classes to account runtime.
  */
@@ -1251,7 +1243,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	if (cfs_rq->nr_queued == 1)
 		return;
 
-	if (resched || resched_next_quantum(cfs_rq, curr)) {
+	if (resched || !protect_slice(curr)) {
 		resched_curr_lazy(rq);
 		clear_buddies(cfs_rq, curr);
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case
  2025-07-08 16:56 ` [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
@ 2025-07-09  9:17   ` Peter Zijlstra
  2025-07-09  9:32     ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  1 sibling, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2025-07-09  9:17 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, dhaval, linux-kernel

On Tue, Jul 08, 2025 at 06:56:26PM +0200, Vincent Guittot wrote:

>  static inline void set_protect_slice(struct sched_entity *se)
>  {
> -	se->vlag = se->deadline;
> +	u64 quantum = se->slice;
> +
> +	if (!sched_feat(RUN_TO_PARITY))
> +		quantum = min(quantum, normalized_sysctl_sched_base_slice);
> +
> +	if (quantum != se->slice)
> +		se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
> +	else
> +		se->vprot = se->deadline;
>  }

I've done s/quantum/slice/ on the whole series. In the end this thing:

> +static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)

is gone, and *_protect_slice() has slice in the name, and it's mostly
assigned from slice-named variables.

Final form ends up looking like so:

static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	u64 slice = normalized_sysctl_sched_base_slice;
	u64 vprot = se->deadline;

	if (sched_feat(RUN_TO_PARITY))
		slice = cfs_rq_min_slice(cfs_rq);

	slice = min(slice, se->slice);
	if (slice != se->slice)
		vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));

	se->vprot = vprot;
}
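
(Note that for a nice-0 weight entity calc_delta_fair() returns the delta
unchanged, so the window above reduces to plain nanoseconds: with
NO_RUN_TO_PARITY the task is protected for at most one
normalized_sysctl_sched_base_slice of runtime, and with RUN_TO_PARITY for at
most the shortest slice among the enqueued entities, capped at the deadline
in both cases.)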

I'll run a few compiles and then push out to queue/sched/core (and stick
the ttwu bits in queue/sched/ttwu -- as I should've done earlier).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case
  2025-07-09  9:17   ` Peter Zijlstra
@ 2025-07-09  9:32     ` Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2025-07-09  9:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, dhaval, linux-kernel

On Wed, 9 Jul 2025 at 11:17, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jul 08, 2025 at 06:56:26PM +0200, Vincent Guittot wrote:
>
> >  static inline void set_protect_slice(struct sched_entity *se)
> >  {
> > -     se->vlag = se->deadline;
> > +     u64 quantum = se->slice;
> > +
> > +     if (!sched_feat(RUN_TO_PARITY))
> > +             quantum = min(quantum, normalized_sysctl_sched_base_slice);
> > +
> > +     if (quantum != se->slice)
> > +             se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
> > +     else
> > +             se->vprot = se->deadline;
> >  }
>
> I've done s/quantum/slice/ on the whole series. In the end this thing:
>
> > +static inline bool resched_next_quantum(struct cfs_rq *cfs_rq, struct sched_entity *curr)
>
> is gone, and *_protect_slice() has slice in the name, and its mostly
> assigned from slice named variables.
>
> Final form ends up looking like so:
>
> static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
> {
>         u64 slice = normalized_sysctl_sched_base_slice;
>         u64 vprot = se->deadline;
>
>         if (sched_feat(RUN_TO_PARITY))
>                 slice = cfs_rq_min_slice(cfs_rq);
>
>         slice = min(slice, se->slice);
>         if (slice != se->slice)
>                 vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
>
>         se->vprot = vprot;
> }

ok, looks good to me

>
> I'll run a few compiles and then push out to queue/sched/core (and stick
> the ttwu bits in queue/sched/ttwu -- as I should've done earlier).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-08 16:56 ` [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
@ 2025-07-10  6:59   ` Madadi Vineeth Reddy
  2025-07-10 10:40     ` Vincent Guittot
  2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
  1 sibling, 1 reply; 23+ messages in thread
From: Madadi Vineeth Reddy @ 2025-07-10  6:59 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel, Madadi Vineeth Reddy

Hi Vincent,

On 08/07/25 22:26, Vincent Guittot wrote:
> Run to parity ensures that current will get a chance to run its full
> slice in one go but this can create large latency and/or lag for
> entities with shorter slice that have exhausted their previous slice
> and wait to run their next slice.
> 
> Clamp the run to parity to the shortest slice of all enqueued entities.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
>  kernel/sched/fair.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7e82b357763a..85238f2e026a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -884,16 +884,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
>  /*
>   * Set the vruntime up to which an entity can run before looking
>   * for another entity to pick.
> - * In case of run to parity, we protect the entity up to its deadline.
> + * In case of run to parity, we use the shortest slice of the enqueued
> + * entities to set the protected period.
>   * When run to parity is disabled, we give a minimum quantum to the running
>   * entity to ensure progress.
>   */

If I set my task’s custom slice to a larger value but another task has a smaller slice,
this change will cap my protected window to the smaller slice. Does that mean my custom
slice is no longer honored?
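
(For reference, a custom slice is typically requested through
sched_setattr(); below is a minimal userspace sketch, assuming a kernel
where sched_attr::sched_runtime is used as the slice hint for SCHED_OTHER
tasks. The struct layout mirrors the uapi header, the last two util-clamp
fields are omitted, and error handling is elided.)

#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;		/* slice hint in ns for SCHED_OTHER */
	uint64_t sched_deadline;
	uint64_t sched_period;
};

/* Request a custom slice (in ns) for the calling task. */
static int set_custom_slice(uint64_t slice_ns)
{
	struct sched_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.sched_policy = 0;		/* SCHED_OTHER */
	attr.sched_runtime = slice_ns;	/* e.g. 10ULL * 1000 * 1000 for 10ms */

	return syscall(SYS_sched_setattr, 0, &attr, 0);
}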

Thanks,
Madadi Vineeth Reddy

>  static inline void set_protect_slice(struct sched_entity *se)
>  {
> -	u64 quantum = se->slice;
> +	u64 quantum;
>  
> -	if (!sched_feat(RUN_TO_PARITY))
> -		quantum = min(quantum, normalized_sysctl_sched_base_slice);
> +	if (sched_feat(RUN_TO_PARITY))
> +		quantum = cfs_rq_min_slice(cfs_rq_of(se));
> +	else
> +		quantum = normalized_sysctl_sched_base_slice;
> +	quantum = min(quantum, se->slice);
>  
>  	if (quantum != se->slice)
>  		se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-10  6:59   ` Madadi Vineeth Reddy
@ 2025-07-10 10:40     ` Vincent Guittot
  2025-07-10 12:34       ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-10 10:40 UTC (permalink / raw)
  To: Madadi Vineeth Reddy
  Cc: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, dhaval, linux-kernel

On Thu, 10 Jul 2025 at 09:00, Madadi Vineeth Reddy
<vineethr@linux.ibm.com> wrote:
>
> Hi Vincent,
>
> On 08/07/25 22:26, Vincent Guittot wrote:
> > Run to parity ensures that current will get a chance to run its full
> > slice in one go but this can create large latency and/or lag for
> > entities with shorter slice that have exhausted their previous slice
> > and wait to run their next slice.
> >
> > Clamp the run to parity to the shortest slice of all enqueued entities.
> >
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >  kernel/sched/fair.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 7e82b357763a..85238f2e026a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -884,16 +884,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> >  /*
> >   * Set the vruntime up to which an entity can run before looking
> >   * for another entity to pick.
> > - * In case of run to parity, we protect the entity up to its deadline.
> > + * In case of run to parity, we use the shortest slice of the enqueued
> > + * entities to set the protected period.
> >   * When run to parity is disabled, we give a minimum quantum to the running
> >   * entity to ensure progress.
> >   */
>
> If I set my task’s custom slice to a larger value but another task has a smaller slice,
> this change will cap my protected window to the smaller slice. Does that mean my custom
> slice is no longer honored?

What do you mean by honored? EEVDF never mandates that a request of
size slice will be done in one go. The slice mainly defines the deadline
and orders the entities; it does not guarantee that your slice will
always run in one go. Run to parity tries to minimize the number of
context switches between runnable tasks but must not break fairness and
the lag theorem. So if your task A has a slice of 10ms and task B wakes
up with a slice of 1ms, B will preempt A because its deadline is
earlier. If task B still wants to run after its slice is exhausted, it
will not be eligible and task A will run until task B becomes eligible
again, which takes as long as task B's slice.
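
As a worked example, assuming both tasks have nice-0 weight so vruntime
advances at wall-clock rate: once B has consumed its 1ms slice, its vruntime
is 1ms ahead of A's, the average vruntime sits 0.5ms behind B, so B's lag is
-0.5ms and it is no longer eligible. A then runs for roughly 1ms (B's slice)
until the average catches up with B's vruntime, at which point B is eligible
again and, having the earlier deadline, preempts A. Each task still gets its
fair 50% share; B simply receives it in 1ms chunks regardless of A's 10ms
slice.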




>
> Thanks,
> Madadi Vineeth Reddy
>
> >  static inline void set_protect_slice(struct sched_entity *se)
> >  {
> > -     u64 quantum = se->slice;
> > +     u64 quantum;
> >
> > -     if (!sched_feat(RUN_TO_PARITY))
> > -             quantum = min(quantum, normalized_sysctl_sched_base_slice);
> > +     if (sched_feat(RUN_TO_PARITY))
> > +             quantum = cfs_rq_min_slice(cfs_rq_of(se));
> > +     else
> > +             quantum = normalized_sysctl_sched_base_slice;
> > +     quantum = min(quantum, se->slice);
> >
> >       if (quantum != se->slice)
> >               se->vprot = min_vruntime(se->deadline, se->vruntime + calc_delta_fair(quantum, se));
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-10 10:40     ` Vincent Guittot
@ 2025-07-10 12:34       ` Peter Zijlstra
  2025-07-13 18:17         ` Madadi Vineeth Reddy
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2025-07-10 12:34 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Madadi Vineeth Reddy, mingo, juri.lelli, dietmar.eggemann,
	rostedt, bsegall, mgorman, vschneid, dhaval, linux-kernel


> > If I set my task’s custom slice to a larger value but another task has a smaller slice,
> > this change will cap my protected window to the smaller slice. Does that mean my custom
> > slice is no longer honored?
> 
> What do you mean by honored ? EEVDF never mandates that a request of
> size slice will be done in one go. Slice mainly defines the deadline
> and orders the entities but not that it will always run your slice in
> one go. Run to parity tries to minimize the number of context switches
> between runnable tasks but must not break fairness and lag theorem.So
> If your task A has a slice of 10ms and task B wakes up with a slice of
> 1ms. B will preempt A because its deadline is earlier. If task B still
> wants to run after its slice is exhausted, it will not be eligible and
> task A will run until task B becomes eligible, which is as long as
> task B's slice.

Right. Additionally, if you don't want wakeup preemption, we've got
SCHED_BATCH for you.
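
(For instance, with util-linux: "chrt -b 0 <command>" starts a task under
SCHED_BATCH and "chrt -b -p 0 <pid>" switches an existing one; batch tasks
do not trigger wakeup preemption when they wake up.)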

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Always trigger resched at the end of a protected period
  2025-07-08 16:56 ` [PATCH v3 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     0b9ca2dcabc3c8816a6ee75599cab7bef3330609
Gitweb:        https://git.kernel.org/tip/0b9ca2dcabc3c8816a6ee75599cab7bef3330609
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:30 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:23 +02:00

sched/fair: Always trigger resched at the end of a protected period

Always trigger a resched after a protected period even if the entity is
still eligible. It can happen that an entity remains eligible at the end
of the protected period but must let an entity with a shorter slice run
in order to keep its lag shorter than its slice. This is particularly true
with run to parity, which tries to maximize the lag.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250708165630.1948751-7-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1660960..20a8456 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1182,14 +1182,6 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
 	cgroup_account_cputime(p, delta_exec);
 }
 
-static inline bool resched_next_slice(struct cfs_rq *cfs_rq, struct sched_entity *curr)
-{
-	if (protect_slice(curr))
-		return false;
-
-	return !entity_eligible(cfs_rq, curr);
-}
-
 /*
  * Used by other classes to account runtime.
  */
@@ -1250,7 +1242,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	if (cfs_rq->nr_queued == 1)
 		return;
 
-	if (resched || resched_next_slice(cfs_rq, curr)) {
+	if (resched || !protect_slice(curr)) {
 		resched_curr_lazy(rq);
 		clear_buddies(cfs_rq, curr);
 	}

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Fix entity's lag with run to parity
  2025-07-08 16:56 ` [PATCH v3 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3a0baa8e6c570c252999cb651398a88f8f990b4a
Gitweb:        https://git.kernel.org/tip/3a0baa8e6c570c252999cb651398a88f8f990b4a
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:29 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:23 +02:00

sched/fair: Fix entity's lag with run to parity

When an entity is enqueued without preempting current, we must ensure
that the slice protection is updated to take into account the slice
duration of the newly enqueued task so that its lag will not exceed
its slice (+ tick).

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250708165630.1948751-6-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 45e057f..1660960 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -889,13 +889,13 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
  * When run to parity is disabled, we give a minimum quantum to the running
  * entity to ensure progress.
  */
-static inline void set_protect_slice(struct sched_entity *se)
+static inline void set_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	u64 slice = normalized_sysctl_sched_base_slice;
 	u64 vprot = se->deadline;
 
 	if (sched_feat(RUN_TO_PARITY))
-		slice = cfs_rq_min_slice(cfs_rq_of(se));
+		slice = cfs_rq_min_slice(cfs_rq);
 
 	slice = min(slice, se->slice);
 	if (slice != se->slice)
@@ -904,6 +904,13 @@ static inline void set_protect_slice(struct sched_entity *se)
 	se->vprot = vprot;
 }
 
+static inline void update_protect_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	u64 slice = cfs_rq_min_slice(cfs_rq);
+
+	se->vprot = min_vruntime(se->vprot, se->vruntime + calc_delta_fair(slice, se));
+}
+
 static inline bool protect_slice(struct sched_entity *se)
 {
 	return ((s64)(se->vprot - se->vruntime) > 0);
@@ -5467,7 +5474,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		__dequeue_entity(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 
-		set_protect_slice(se);
+		set_protect_slice(cfs_rq, se);
 	}
 
 	update_stats_curr_start(cfs_rq, se);
@@ -8720,6 +8727,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 	if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
 		goto preempt;
 
+	if (sched_feat(RUN_TO_PARITY) && do_preempt_short)
+		update_protect_slice(cfs_rq, se);
+
 	return;
 
 preempt:

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-08 16:56 ` [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
  2025-07-10  6:59   ` Madadi Vineeth Reddy
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     052c3d87c82ea4ee83232b747512847b4e8c9976
Gitweb:        https://git.kernel.org/tip/052c3d87c82ea4ee83232b747512847b4e8c9976
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:28 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:23 +02:00

sched/fair: Limit run to parity to the min slice of enqueued entities

Run to parity ensures that current will get a chance to run its full
slice in one go but this can create large latency and/or lag for
entities with shorter slice that have exhausted their previous slice
and wait to run their next slice.

Clamp the run to parity to the shortest slice of all enqueued entities.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250708165630.1948751-5-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 96718b3..45e057f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -884,18 +884,20 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 /*
  * Set the vruntime up to which an entity can run before looking
  * for another entity to pick.
- * In case of run to parity, we protect the entity up to its deadline.
+ * In case of run to parity, we use the shortest slice of the enqueued
+ * entities to set the protected period.
  * When run to parity is disabled, we give a minimum quantum to the running
  * entity to ensure progress.
  */
 static inline void set_protect_slice(struct sched_entity *se)
 {
-	u64 slice = se->slice;
+	u64 slice = normalized_sysctl_sched_base_slice;
 	u64 vprot = se->deadline;
 
-	if (!sched_feat(RUN_TO_PARITY))
-		slice = min(slice, normalized_sysctl_sched_base_slice);
+	if (sched_feat(RUN_TO_PARITY))
+		slice = cfs_rq_min_slice(cfs_rq_of(se));
 
+	slice = min(slice, se->slice);
 	if (slice != se->slice)
 		vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Remove spurious shorter slice preemption
  2025-07-08 16:56 ` [PATCH v3 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9de74a9850b9468ac2f515bfbe0844e0bfae869d
Gitweb:        https://git.kernel.org/tip/9de74a9850b9468ac2f515bfbe0844e0bfae869d
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:27 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:22 +02:00

sched/fair: Remove spurious shorter slice preemption

Even if the waking task can preempt current, it might not be the one
selected by pick_task_fair. Check that the waking task will be selected
if we cancel the slice protection before doing so.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250708165630.1948751-4-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 44 ++++++++++++++------------------------------
 1 file changed, 14 insertions(+), 30 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8d288df..96718b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -932,7 +932,7 @@ static inline void cancel_protect_slice(struct sched_entity *se)
  *
  * Which allows tree pruning through eligibility.
  */
-static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq, bool protect)
 {
 	struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct sched_entity *se = __pick_first_entity(cfs_rq);
@@ -949,7 +949,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
 
-	if (curr && protect_slice(curr))
+	if (curr && protect && protect_slice(curr))
 		return curr;
 
 	/* Pick the leftmost entity if it's eligible */
@@ -993,6 +993,11 @@ found:
 	return best;
 }
 
+static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
+{
+	return __pick_eevdf(cfs_rq, true);
+}
+
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
 	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
@@ -1176,27 +1181,6 @@ static inline bool resched_next_slice(struct cfs_rq *cfs_rq, struct sched_entity
 	return !entity_eligible(cfs_rq, curr);
 }
 
-static inline bool do_preempt_short(struct cfs_rq *cfs_rq,
-				    struct sched_entity *pse, struct sched_entity *se)
-{
-	if (!sched_feat(PREEMPT_SHORT))
-		return false;
-
-	if (pse->slice >= se->slice)
-		return false;
-
-	if (!entity_eligible(cfs_rq, pse))
-		return false;
-
-	if (entity_before(pse, se))
-		return true;
-
-	if (!entity_eligible(cfs_rq, se))
-		return true;
-
-	return false;
-}
-
 /*
  * Used by other classes to account runtime.
  */
@@ -8658,6 +8642,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 	struct sched_entity *se = &donor->se, *pse = &p->se;
 	struct cfs_rq *cfs_rq = task_cfs_rq(donor);
 	int cse_is_idle, pse_is_idle;
+	bool do_preempt_short = false;
 
 	if (unlikely(se == pse))
 		return;
@@ -8706,7 +8691,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 		 * When non-idle entity preempt an idle entity,
 		 * don't give idle entity slice protection.
 		 */
-		cancel_protect_slice(se);
+		do_preempt_short = true;
 		goto preempt;
 	}
 
@@ -8724,22 +8709,21 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int 
 	/*
 	 * If @p has a shorter slice than current and @p is eligible, override
 	 * current's slice protection in order to allow preemption.
-	 *
-	 * Note that even if @p does not turn out to be the most eligible
-	 * task at this moment, current's slice protection will be lost.
 	 */
-	if (do_preempt_short(cfs_rq, pse, se))
-		cancel_protect_slice(se);
+	do_preempt_short = sched_feat(PREEMPT_SHORT) && (pse->slice < se->slice);
 
 	/*
 	 * If @p has become the most eligible task, force preemption.
 	 */
-	if (pick_eevdf(cfs_rq) == pse)
+	if (__pick_eevdf(cfs_rq, !do_preempt_short) == pse)
 		goto preempt;
 
 	return;
 
 preempt:
+	if (do_preempt_short)
+		cancel_protect_slice(se);
+
 	resched_curr_lazy(rq);
 }
 

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Fix NO_RUN_TO_PARITY case
  2025-07-08 16:56 ` [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
  2025-07-09  9:17   ` Peter Zijlstra
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  1 sibling, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     74eec63661d46a7153d04c2e0249eeb76cc76d44
Gitweb:        https://git.kernel.org/tip/74eec63661d46a7153d04c2e0249eeb76cc76d44
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:26 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:22 +02:00

sched/fair: Fix NO_RUN_TO_PARITY case

EEVDF expects the scheduler to allocate a time quantum to the selected
entity and then pick a new entity for the next quantum.
Although this notion of time quantum is not strictly doable in our case,
we can ensure a minimum runtime for each task most of the time and pick a
new entity after a minimum time has elapsed.
Reuse the slice protection of run to parity to ensure such a runtime
quantum.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20250708165630.1948751-3-vincent.guittot@linaro.org
---
 include/linux/sched.h | 10 +++++++++-
 kernel/sched/fair.c   | 31 ++++++++++++++++++++-----------
 2 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 4802fcf..5592138 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -583,7 +583,15 @@ struct sched_entity {
 	u64				sum_exec_runtime;
 	u64				prev_sum_exec_runtime;
 	u64				vruntime;
-	s64				vlag;
+	union {
+		/*
+		 * When !@on_rq this field is vlag.
+		 * When cfs_rq->curr == se (which implies @on_rq)
+		 * this field is vprot. See protect_slice().
+		 */
+		s64                     vlag;
+		u64                     vprot;
+	};
 	u64				slice;
 
 	u64				nr_migrations;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 43fe5c8..8d288df 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,23 +882,35 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 }
 
 /*
- * HACK, stash a copy of deadline at the point of pick in vlag,
- * which isn't used until dequeue.
+ * Set the vruntime up to which an entity can run before looking
+ * for another entity to pick.
+ * In case of run to parity, we protect the entity up to its deadline.
+ * When run to parity is disabled, we give a minimum quantum to the running
+ * entity to ensure progress.
  */
 static inline void set_protect_slice(struct sched_entity *se)
 {
-	se->vlag = se->deadline;
+	u64 slice = se->slice;
+	u64 vprot = se->deadline;
+
+	if (!sched_feat(RUN_TO_PARITY))
+		slice = min(slice, normalized_sysctl_sched_base_slice);
+
+	if (slice != se->slice)
+		vprot = min_vruntime(vprot, se->vruntime + calc_delta_fair(slice, se));
+
+	se->vprot = vprot;
 }
 
 static inline bool protect_slice(struct sched_entity *se)
 {
-	return se->vlag == se->deadline;
+	return ((s64)(se->vprot - se->vruntime) > 0);
 }
 
 static inline void cancel_protect_slice(struct sched_entity *se)
 {
 	if (protect_slice(se))
-		se->vlag = se->deadline + 1;
+		se->vprot = se->vruntime;
 }
 
 /*
@@ -937,7 +949,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
 	if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
 		curr = NULL;
 
-	if (sched_feat(RUN_TO_PARITY) && curr && protect_slice(curr))
+	if (curr && protect_slice(curr))
 		return curr;
 
 	/* Pick the leftmost entity if it's eligible */
@@ -1156,11 +1168,8 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
 	cgroup_account_cputime(p, delta_exec);
 }
 
-static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity *curr)
+static inline bool resched_next_slice(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 {
-	if (!sched_feat(PREEMPT_SHORT))
-		return false;
-
 	if (protect_slice(curr))
 		return false;
 
@@ -1248,7 +1257,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 	if (cfs_rq->nr_queued == 1)
 		return;
 
-	if (resched || did_preempt_short(cfs_rq, curr)) {
+	if (resched || resched_next_slice(cfs_rq, curr)) {
 		resched_curr_lazy(rq);
 		clear_buddies(cfs_rq, curr);
 	}

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [tip: sched/core] sched/fair: Use protect_slice() instead of direct comparison
  2025-07-08 16:56 ` [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
@ 2025-07-10 12:46   ` tip-bot2 for Vincent Guittot
  0 siblings, 0 replies; 23+ messages in thread
From: tip-bot2 for Vincent Guittot @ 2025-07-10 12:46 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Vincent Guittot, Peter Zijlstra (Intel), Dhaval Giani (AMD), x86,
	linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9cdb4fe20cd239c848b5c3f5753d83a9443ba329
Gitweb:        https://git.kernel.org/tip/9cdb4fe20cd239c848b5c3f5753d83a9443ba329
Author:        Vincent Guittot <vincent.guittot@linaro.org>
AuthorDate:    Tue, 08 Jul 2025 18:56:25 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 09 Jul 2025 13:40:22 +02:00

sched/fair: Use protect_slice() instead of direct comparison

Replace the test by the relevant protect_slice() function.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dhaval Giani (AMD) <dhaval@gianis.ca>
Link: https://lkml.kernel.org/r/20250708165630.1948751-2-vincent.guittot@linaro.org
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a1350c5..43fe5c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1161,7 +1161,7 @@ static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity 
 	if (!sched_feat(PREEMPT_SHORT))
 		return false;
 
-	if (curr->vlag == curr->deadline)
+	if (protect_slice(curr))
 		return false;
 
 	return !entity_eligible(cfs_rq, curr);

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-10 12:34       ` Peter Zijlstra
@ 2025-07-13 18:17         ` Madadi Vineeth Reddy
  2025-07-20 10:57           ` Madadi Vineeth Reddy
  2025-07-21  9:06           ` Vincent Guittot
  0 siblings, 2 replies; 23+ messages in thread
From: Madadi Vineeth Reddy @ 2025-07-13 18:17 UTC (permalink / raw)
  To: Peter Zijlstra, Vincent Guittot
  Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, dhaval, linux-kernel, Madadi Vineeth Reddy

Hi Vincent, Peter

On 10/07/25 18:04, Peter Zijlstra wrote:
> 
>>> If I set my task’s custom slice to a larger value but another task has a smaller slice,
>>> this change will cap my protected window to the smaller slice. Does that mean my custom
>>> slice is no longer honored?
>>
>> What do you mean by honored ? EEVDF never mandates that a request of
>> size slice will be done in one go. Slice mainly defines the deadline
>> and orders the entities but not that it will always run your slice in
>> one go. Run to parity tries to minimize the number of context switches
>> between runnable tasks but must not break fairness and lag theorem.So
>> If your task A has a slice of 10ms and task B wakes up with a slice of
>> 1ms. B will preempt A because its deadline is earlier. If task B still
>> wants to run after its slice is exhausted, it will not be eligible and
>> task A will run until task B becomes eligible, which is as long as
>> task B's slice.
> 
> Right. Added if you don't want wakeup preemption, we've got SCHED_BATCH
> for you.

Thanks for the explanation. I understand now that the slice is only used for
deadline calculation and the ordering of eligible tasks.

Before your patch, I observed that each task ran for its full custom slice
before preemption, which led me to assume that slice directly controlled
uninterrupted runtime.

With the patch series applied and RUN_TO_PARITY=true, I now see the expected behavior:
- Default slice (~2.8 ms): tasks run ~3 ms each.
- Increasing one task’s slice doesn’t extend its single-run duration; it remains ~3 ms.
- Decreasing one task’s slice shortens everyone’s run to that new minimum.

With this patch series and NO_RUN_TO_PARITY, I see runtimes near 1 ms (CONFIG_HZ=1000).

However, without your patches, I was still seeing ~3 ms runs even with NO_RUN_TO_PARITY,
which confused me because I expected runtime to drop to ~1 ms (preempt at every tick)
rather than run up to the default slice.

Without your patches and with RUN_TO_PARITY, the behavior is as expected: a task
runs up to its slice when eligible.

I ran these with 16 stress‑ng threads pinned to one CPU.

Please let me know if my understanding is incorrect, and why I was still seeing ~3 ms
runtimes with NO_RUN_TO_PARITY before this patch series.

Thanks,
Madadi Vineeth Reddy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-13 18:17         ` Madadi Vineeth Reddy
@ 2025-07-20 10:57           ` Madadi Vineeth Reddy
  2025-07-21  9:11             ` Vincent Guittot
  2025-07-21  9:06           ` Vincent Guittot
  1 sibling, 1 reply; 23+ messages in thread
From: Madadi Vineeth Reddy @ 2025-07-20 10:57 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, dhaval, linux-kernel,
	Madadi Vineeth Reddy

On 13/07/25 23:47, Madadi Vineeth Reddy wrote:
> Hi Vincent, Peter
> 
> On 10/07/25 18:04, Peter Zijlstra wrote:
>>
>>>> If I set my task’s custom slice to a larger value but another task has a smaller slice,
>>>> this change will cap my protected window to the smaller slice. Does that mean my custom
>>>> slice is no longer honored?
>>>
>>> What do you mean by honored ? EEVDF never mandates that a request of
>>> size slice will be done in one go. Slice mainly defines the deadline
>>> and orders the entities but not that it will always run your slice in
>>> one go. Run to parity tries to minimize the number of context switches
>>> between runnable tasks but must not break fairness and lag theorem.So
>>> If your task A has a slice of 10ms and task B wakes up with a slice of
>>> 1ms. B will preempt A because its deadline is earlier. If task B still
>>> wants to run after its slice is exhausted, it will not be eligible and
>>> task A will run until task B becomes eligible, which is as long as
>>> task B's slice.
>>
>> Right. Added if you don't want wakeup preemption, we've got SCHED_BATCH
>> for you.
> 
> Thanks for the explanation. Understood now that slice is only for deadline
> calculation and ordering for eligible tasks.
> 
> Before your patch, I observed that each task ran for its full custom slice
> before preemption, which led me to assume that slice directly controlled
> uninterrupted runtime.
> 
> With the patch series applied and RUN_TO_PARITY=true, I now see the expected behavior:
> - Default slice (~2.8 ms): tasks run ~3 ms each.
> - Increasing one task’s slice doesn’t extend its single‐run duration—it remains ~3 ms.
> - Decreasing one tasks’ slice shortens everyone’s run to that new minimum.
> 
> With this patch series, With NO_RUN_TO_PARITY, I see runtimes near 1 ms (CONFIG_HZ=1000).
> 
> However, without your patches, I was still seeing ~3 ms runs even with NO_RUN_TO_PARITY,
> which confused me because I expected runtime to drop to ~1 ms (preempt at every tick)
> rather than run up to the default slice.
> 
> Without your patches and having RUN_TO_PARITY is as expected. Task running till it's
> slice when eligible.
> 
> I ran these with 16 stress‑ng threads pinned to one CPU.
> 
> Please let me know if my understanding is incorrect, and why I was still seeing ~3 ms
> runtimes with NO_RUN_TO_PARITY before this patch series.
> 

Hi Vincent,

Just following up on my earlier question: with the patch applied (and RUN_TO_PARITY=true),
reducing one task's slice now clamps the runtime of all tasks on that runqueue to the new
minimum. (By "runtime" I mean the continuous time a task runs before preemption.) Could this
negatively impact throughput-oriented workloads where the remaining threads need a longer
run time before preemption?

I understand that slice only affects deadline ordering, but I'm curious about its
effect in scenarios like this.

Thanks,
Madadi Vineeth Reddy

> Thanks,
> Madadi Vineeth Reddy


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-13 18:17         ` Madadi Vineeth Reddy
  2025-07-20 10:57           ` Madadi Vineeth Reddy
@ 2025-07-21  9:06           ` Vincent Guittot
  1 sibling, 0 replies; 23+ messages in thread
From: Vincent Guittot @ 2025-07-21  9:06 UTC (permalink / raw)
  To: Madadi Vineeth Reddy
  Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, dhaval, linux-kernel

Hi Madadi,

Sorry for the late reply but I have limited network access at the moment.

On Sun, 13 Jul 2025 at 20:17, Madadi Vineeth Reddy
<vineethr@linux.ibm.com> wrote:
>
> Hi Vincent, Peter
>
> On 10/07/25 18:04, Peter Zijlstra wrote:
> >
> >>> If I set my task’s custom slice to a larger value but another task has a smaller slice,
> >>> this change will cap my protected window to the smaller slice. Does that mean my custom
> >>> slice is no longer honored?
> >>
> >> What do you mean by honored ? EEVDF never mandates that a request of
> >> size slice will be done in one go. Slice mainly defines the deadline
> >> and orders the entities but not that it will always run your slice in
> >> one go. Run to parity tries to minimize the number of context switches
> >> between runnable tasks but must not break fairness and lag theorem.So
> >> If your task A has a slice of 10ms and task B wakes up with a slice of
> >> 1ms. B will preempt A because its deadline is earlier. If task B still
> >> wants to run after its slice is exhausted, it will not be eligible and
> >> task A will run until task B becomes eligible, which is as long as
> >> task B's slice.
> >
> > Right. Added if you don't want wakeup preemption, we've got SCHED_BATCH
> > for you.
>
> Thanks for the explanation. Understood now that slice is only for deadline
> calculation and ordering for eligible tasks.
>
> Before your patch, I observed that each task ran for its full custom slice
> before preemption, which led me to assume that slice directly controlled
> uninterrupted runtime.
>
> With the patch series applied and RUN_TO_PARITY=true, I now see the expected behavior:
> - Default slice (~2.8 ms): tasks run ~3 ms each.
> - Increasing one task’s slice doesn’t extend its single‐run duration—it remains ~3 ms.
> - Decreasing one tasks’ slice shortens everyone’s run to that new minimum.
>
> With this patch series, With NO_RUN_TO_PARITY, I see runtimes near 1 ms (CONFIG_HZ=1000).
>
> However, without your patches, I was still seeing ~3 ms runs even with NO_RUN_TO_PARITY,
> which confused me because I expected runtime to drop to ~1 ms (preempt at every tick)
> rather than run up to the default slice.
>
> Without your patches and having RUN_TO_PARITY is as expected. Task running till it's
> slice when eligible.
>
> I ran these with 16 stress‑ng threads pinned to one CPU.
>
> Please let me know if my understanding is incorrect, and why I was still seeing ~3 ms
> runtimes with NO_RUN_TO_PARITY before this patch series.

Before my patchset, both NO_RUN_TO_PARITY and RUN_TO_PARITY were wrong.
Patch 2 fixes NO_RUN_TO_PARITY and the other patches fix RUN_TO_PARITY.

>
> Thanks,
> Madadi Vineeth Reddy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-20 10:57           ` Madadi Vineeth Reddy
@ 2025-07-21  9:11             ` Vincent Guittot
  2025-07-21 15:22               ` Madadi Vineeth Reddy
  0 siblings, 1 reply; 23+ messages in thread
From: Vincent Guittot @ 2025-07-21  9:11 UTC (permalink / raw)
  To: Madadi Vineeth Reddy
  Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, dhaval, linux-kernel

On Sun, 20 Jul 2025 at 12:57, Madadi Vineeth Reddy
<vineethr@linux.ibm.com> wrote:
>
> On 13/07/25 23:47, Madadi Vineeth Reddy wrote:
> > Hi Vincent, Peter
> >
> > On 10/07/25 18:04, Peter Zijlstra wrote:
> >>
> >>>> If I set my task’s custom slice to a larger value but another task has a smaller slice,
> >>>> this change will cap my protected window to the smaller slice. Does that mean my custom
> >>>> slice is no longer honored?
> >>>
> >>> What do you mean by honored ? EEVDF never mandates that a request of
> >>> size slice will be done in one go. Slice mainly defines the deadline
> >>> and orders the entities but not that it will always run your slice in
> >>> one go. Run to parity tries to minimize the number of context switches
> >>> between runnable tasks but must not break fairness and lag theorem.So
> >>> If your task A has a slice of 10ms and task B wakes up with a slice of
> >>> 1ms. B will preempt A because its deadline is earlier. If task B still
> >>> wants to run after its slice is exhausted, it will not be eligible and
> >>> task A will run until task B becomes eligible, which is as long as
> >>> task B's slice.
> >>
> >> Right. Added if you don't want wakeup preemption, we've got SCHED_BATCH
> >> for you.
> >
> > Thanks for the explanation. Understood now that slice is only for deadline
> > calculation and ordering for eligible tasks.
> >
> > Before your patch, I observed that each task ran for its full custom slice
> > before preemption, which led me to assume that slice directly controlled
> > uninterrupted runtime.
> >
> > With the patch series applied and RUN_TO_PARITY=true, I now see the expected behavior:
> > - Default slice (~2.8 ms): tasks run ~3 ms each.
> > - Increasing one task’s slice doesn’t extend its single‐run duration—it remains ~3 ms.
> > - Decreasing one tasks’ slice shortens everyone’s run to that new minimum.
> >
> > With this patch series, With NO_RUN_TO_PARITY, I see runtimes near 1 ms (CONFIG_HZ=1000).
> >
> > However, without your patches, I was still seeing ~3 ms runs even with NO_RUN_TO_PARITY,
> > which confused me because I expected runtime to drop to ~1 ms (preempt at every tick)
> > rather than run up to the default slice.
> >
> > Without your patches and having RUN_TO_PARITY is as expected. Task running till it's
> > slice when eligible.
> >
> > I ran these with 16 stress‑ng threads pinned to one CPU.
> >
> > Please let me know if my understanding is incorrect, and why I was still seeing ~3 ms
> > runtimes with NO_RUN_TO_PARITY before this patch series.
> >
>
> Hi Vincent,
>
> Just following up on my earlier question: with the patch applied (and RUN_TO_PARITY=true),
> reducing one task’s slice now clamps the runtime of all tasks on that runqueue to the new
> minimum.(By “runtime” I mean the continuous time a task runs before preemption.). Could this
> negatively impact throughput oriented workloads where remaining threads need longer run time
> before preemption?

Probably, but it is also expected that tasks with shorter slices don't want
to run forever. The shorter runtime only applies while such a task is
runnable, and that task should run first (or nearly first) and then go back
to sleep, so its impact should be small. I agree that if you have an
always-running task which sets its slice to 1ms, it will increase the number
of context switches for the other tasks with longer slices, but we can't do
much against that.
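
As a rough back-of-the-envelope illustration, assuming 16 always-runnable
tasks pinned to one CPU and that every run lasts the full protected window:

  default window ~2.8 ms  ->  about 1000/2.8 ~= 360 context switches/s on that CPU
  clamped window  1.0 ms  ->  about 1000 context switches/s, roughly 2.8x more

which is the extra cost described above.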

>
> I understand that slice is only for ordering of deadlines but just curious about it's
> effect in scenarios like this.
>
> Thanks,
> Madadi Vineeth Reddy
>
> > Thanks,
> > Madadi Vineeth Reddy
>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities
  2025-07-21  9:11             ` Vincent Guittot
@ 2025-07-21 15:22               ` Madadi Vineeth Reddy
  0 siblings, 0 replies; 23+ messages in thread
From: Madadi Vineeth Reddy @ 2025-07-21 15:22 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, mingo, juri.lelli, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, dhaval, linux-kernel,
	Madadi Vineeth Reddy

Hi Vincent,

On 21/07/25 14:41, Vincent Guittot wrote:
> On Sun, 20 Jul 2025 at 12:57, Madadi Vineeth Reddy
> <vineethr@linux.ibm.com> wrote:
>>
>> On 13/07/25 23:47, Madadi Vineeth Reddy wrote:
>>> Hi Vincent, Peter
>>>
>>> On 10/07/25 18:04, Peter Zijlstra wrote:
>>>>
>>>>>> If I set my task’s custom slice to a larger value but another task has a smaller slice,
>>>>>> this change will cap my protected window to the smaller slice. Does that mean my custom
>>>>>> slice is no longer honored?
>>>>>
>>>>> What do you mean by honored ? EEVDF never mandates that a request of
>>>>> size slice will be done in one go. Slice mainly defines the deadline
>>>>> and orders the entities but not that it will always run your slice in
>>>>> one go. Run to parity tries to minimize the number of context switches
>>>>> between runnable tasks but must not break fairness and lag theorem.So
>>>>> If your task A has a slice of 10ms and task B wakes up with a slice of
>>>>> 1ms. B will preempt A because its deadline is earlier. If task B still
>>>>> wants to run after its slice is exhausted, it will not be eligible and
>>>>> task A will run until task B becomes eligible, which is as long as
>>>>> task B's slice.
>>>>
>>>> Right. Added if you don't want wakeup preemption, we've got SCHED_BATCH
>>>> for you.
>>>
>>> Thanks for the explanation. Understood now that slice is only for deadline
>>> calculation and ordering for eligible tasks.
>>>
>>> Before your patch, I observed that each task ran for its full custom slice
>>> before preemption, which led me to assume that slice directly controlled
>>> uninterrupted runtime.
>>>
>>> With the patch series applied and RUN_TO_PARITY=true, I now see the expected behavior:
>>> - Default slice (~2.8 ms): tasks run ~3 ms each.
>>> - Increasing one task’s slice doesn’t extend its single‐run duration—it remains ~3 ms.
>>> - Decreasing one tasks’ slice shortens everyone’s run to that new minimum.
>>>
>>> With this patch series, With NO_RUN_TO_PARITY, I see runtimes near 1 ms (CONFIG_HZ=1000).
>>>
>>> However, without your patches, I was still seeing ~3 ms runs even with NO_RUN_TO_PARITY,
>>> which confused me because I expected runtime to drop to ~1 ms (preempt at every tick)
>>> rather than run up to the default slice.
>>>
>>> Without your patches and having RUN_TO_PARITY is as expected. Task running till it's
>>> slice when eligible.
>>>
>>> I ran these with 16 stress‑ng threads pinned to one CPU.
>>>
>>> Please let me know if my understanding is incorrect, and why I was still seeing ~3 ms
>>> runtimes with NO_RUN_TO_PARITY before this patch series.
>>>
>>
>> Hi Vincent,
>>
>> Just following up on my earlier question: with the patch applied (and RUN_TO_PARITY=true),
>> reducing one task’s slice now clamps the runtime of all tasks on that runqueue to the new
>> minimum.(By “runtime” I mean the continuous time a task runs before preemption.). Could this
>> negatively impact throughput oriented workloads where remaining threads need longer run time
>> before preemption?
> 
> Probably, it is also expected that tasks which have shorter slices,
> don't want to run forever. The shorter runtime will only apply while
> the task is runnable and this task should run 1st or almost and go
> back to sleep so its impact should be small. I agree that if you have
> an always running task which sets its slice to 1ms it will increase
> number of context switch for other tasks which don't have a longer
> slice but we can't do much against that
> 
>>
>> I understand that slice is only for ordering of deadlines but just curious about it's
>> effect in scenarios like this.

Understood, thank you for the clarification. Since fairness is the first priority, I see
that there's not much that can be done in the "always running" case.

Thanks again for the detailed explanation.

Thanks,
Madadi Vineeth Reddy

>>
>> Thanks,
>> Madadi Vineeth Reddy
>>
>>> Thanks,
>>> Madadi Vineeth Reddy
>>


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-07-21 15:23 UTC | newest]

Thread overview: 23+ messages
2025-07-08 16:56 [PATCH v3 0/6] sched/fair: Manage lag and run to parity with different slices Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 1/6] sched/fair: Use protect_slice() instead of direct comparison Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 2/6] sched/fair: Fix NO_RUN_TO_PARITY case Vincent Guittot
2025-07-09  9:17   ` Peter Zijlstra
2025-07-09  9:32     ` Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 3/6] sched/fair: Remove spurious shorter slice preemption Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 4/6] sched/fair: Limit run to parity to the min slice of enqueued entities Vincent Guittot
2025-07-10  6:59   ` Madadi Vineeth Reddy
2025-07-10 10:40     ` Vincent Guittot
2025-07-10 12:34       ` Peter Zijlstra
2025-07-13 18:17         ` Madadi Vineeth Reddy
2025-07-20 10:57           ` Madadi Vineeth Reddy
2025-07-21  9:11             ` Vincent Guittot
2025-07-21 15:22               ` Madadi Vineeth Reddy
2025-07-21  9:06           ` Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 5/6] sched/fair: Fix entity's lag with run to parity Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
2025-07-08 16:56 ` [PATCH v3 6/6] sched/fair: Always trigger resched at the end of a protected period Vincent Guittot
2025-07-10 12:46   ` [tip: sched/core] " tip-bot2 for Vincent Guittot
