* [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue
@ 2024-11-28 9:27 Vincent Guittot
2024-11-28 9:27 ` [PATCH 1/9] sched/eevdf: More PELT vs DELAYED_DEQUEUE Vincent Guittot
` (8 more replies)
0 siblings, 9 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
Delayed dequeued feature keeps a sleeping sched_entitiy enqueued until its
lag has elapsed. As a result, it stays also visible in the statistics that
are used to balance the system and in particular the field h_nr_running.
This serie fixes those metrics by creating a new h_nr_enqueued that tracks
all enqueued tasks and restore the behavior of h_nr_running i.e. tracking
the number of fair tasks that want to run.
h_nr_running is used in several places to make decision on load balance:
- PELT runnable_avg
- deciding if a group is overloaded or has spare capacity
- numa stats
- reduced capacity management
- load balance between groups
While fixing h_nr_running, some fields have been renamed to follow the
same pattern. We now have:
- cfs.h_nr_running : running tasks in the hierarchy
- cfs.h_nr_enqueued : enqueued tasks in the hierarchy either running or
delayed dequeue
- cfs.h_nr_idle : enqueued sched idle tasks in the hierarchy
cfs.nr_running has been rename cfs.nr_enqueued because it includes the
delayed dequeued entities
The unused cfs.idle_nr_running has been removed
Load balance compares the number of running tasks when selecting the
busiest group or runqueue and tries to migrate a runnable task and not a
sleeping delayed dequeue one.
It should be noticed that this serie doesn't fix the problem of delayed
dequeued tasks that can't migrate at wakeup.
Some additional cleanups have been added:
- move variable declaration at the beginning of pick_next_entity()
- sched_can_stop_tick() should use cfs.h_nr_enqueued instead of
cfs.nr_enqueued (previously cfs.nr_running) to know how many tasks
are running in the whole hierarchy instead of how many entities at
root level
Peter Zijlstra (1):
sched/eevdf: More PELT vs DELAYED_DEQUEUE
Vincent Guittot (8):
sched/fair: Add new cfs_rq.h_nr_enqueued
sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
sched/fair: Remove unused cfs_rq.idle_nr_running
sched/fair: Rename cfs_rq.nr_running into nr_enqueued
sched/fair: Removed unsued cfs_rq.h_nr_delayed
sched/fair: Do not try to migrate delayed dequeue task
sched/fair: Fix sched_can_stop_tick() for fair tasks
sched/fair: Fix variable declaration position
kernel/sched/core.c | 4 +-
kernel/sched/debug.c | 13 ++-
kernel/sched/fair.c | 214 +++++++++++++++++++++++++------------------
kernel/sched/sched.h | 8 +-
4 files changed, 138 insertions(+), 101 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH 1/9] sched/eevdf: More PELT vs DELAYED_DEQUEUE
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued Vincent Guittot
` (7 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
From: Peter Zijlstra <peterz@infradead.org>
Vincent and Dietmar noted that while
commit fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE") fixes
the entity runnable stats, it does not adjust the cfs_rq runnable stats,
which are based off of h_nr_running.
Track h_nr_delayed such that we can discount those and adjust the
signal.
Fixes: fc1892becd56 ("sched/eevdf: Fixup PELT vs DELAYED_DEQUEUE")
Reported-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Closes: https://lore.kernel.org/lkml/a9a45193-d0c6-4ba2-a822-464ad30b550e@arm.com/
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Closes: https://lore.kernel.org/lkml/CAKfTPtCNUvWE_GX5LyvTF-WdxUT=ZgvZZv-4t=eWntg5uOFqiQ@mail.gmail.com/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[ Fixes checkpatch warnings ]
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20240906104525.GG4928@noisy.programming.kicks-ass.net
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
kernel/sched/debug.c | 1 +
kernel/sched/fair.c | 51 +++++++++++++++++++++++++++++++++++++++-----
kernel/sched/pelt.c | 2 +-
kernel/sched/sched.h | 8 +++++--
4 files changed, 54 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a48b2a701ec2..a1be00a988bf 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -845,6 +845,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread));
SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
+ SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
cfs_rq->idle_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "idle_h_nr_running",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fbdca89c677f..dc43a8daea35 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5465,9 +5465,33 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se)
static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq);
-static inline void finish_delayed_dequeue_entity(struct sched_entity *se)
+static void set_delayed(struct sched_entity *se)
+{
+ se->sched_delayed = 1;
+ for_each_sched_entity(se) {
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ cfs_rq->h_nr_delayed++;
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+ }
+}
+
+static void clear_delayed(struct sched_entity *se)
{
se->sched_delayed = 0;
+ for_each_sched_entity(se) {
+ struct cfs_rq *cfs_rq = cfs_rq_of(se);
+
+ cfs_rq->h_nr_delayed--;
+ if (cfs_rq_throttled(cfs_rq))
+ break;
+ }
+}
+
+static inline void finish_delayed_dequeue_entity(struct sched_entity *se)
+{
+ clear_delayed(se);
if (sched_feat(DELAY_ZERO) && se->vlag > 0)
se->vlag = 0;
}
@@ -5497,7 +5521,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (cfs_rq->next == se)
cfs_rq->next = NULL;
update_load_avg(cfs_rq, se, 0);
- se->sched_delayed = 1;
+ set_delayed(se);
return false;
}
}
@@ -5911,7 +5935,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long task_delta, idle_task_delta, dequeue = 1;
+ long task_delta, idle_task_delta, delayed_delta, dequeue = 1;
long rq_h_nr_running = rq->cfs.h_nr_running;
raw_spin_lock(&cfs_b->lock);
@@ -5944,6 +5968,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
task_delta = cfs_rq->h_nr_running;
idle_task_delta = cfs_rq->idle_h_nr_running;
+ delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
int flags;
@@ -5967,6 +5992,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running -= task_delta;
qcfs_rq->idle_h_nr_running -= idle_task_delta;
+ qcfs_rq->h_nr_delayed -= delayed_delta;
if (qcfs_rq->load.weight) {
/* Avoid re-evaluating load for this entity: */
@@ -5989,6 +6015,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running -= task_delta;
qcfs_rq->idle_h_nr_running -= idle_task_delta;
+ qcfs_rq->h_nr_delayed -= delayed_delta;
}
/* At this point se is NULL and we are at root level*/
@@ -6014,7 +6041,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long task_delta, idle_task_delta;
+ long task_delta, idle_task_delta, delayed_delta;
long rq_h_nr_running = rq->cfs.h_nr_running;
se = cfs_rq->tg->se[cpu_of(rq)];
@@ -6050,6 +6077,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
task_delta = cfs_rq->h_nr_running;
idle_task_delta = cfs_rq->idle_h_nr_running;
+ delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
@@ -6067,6 +6095,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running += task_delta;
qcfs_rq->idle_h_nr_running += idle_task_delta;
+ qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(qcfs_rq))
@@ -6084,6 +6113,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running += task_delta;
qcfs_rq->idle_h_nr_running += idle_task_delta;
+ qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(qcfs_rq))
@@ -6937,7 +6967,7 @@ requeue_delayed_entity(struct sched_entity *se)
}
update_load_avg(cfs_rq, se, 0);
- se->sched_delayed = 0;
+ clear_delayed(se);
}
/*
@@ -6951,6 +6981,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
int idle_h_nr_running = task_has_idle_policy(p);
+ int h_nr_delayed = 0;
int task_new = !(flags & ENQUEUE_WAKEUP);
int rq_h_nr_running = rq->cfs.h_nr_running;
u64 slice = 0;
@@ -6977,6 +7008,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (p->in_iowait)
cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
+ if (task_new)
+ h_nr_delayed = !!se->sched_delayed;
+
for_each_sched_entity(se) {
if (se->on_rq) {
if (se->sched_delayed)
@@ -6999,6 +7033,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
cfs_rq->h_nr_running++;
cfs_rq->idle_h_nr_running += idle_h_nr_running;
+ cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
idle_h_nr_running = 1;
@@ -7022,6 +7057,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
cfs_rq->h_nr_running++;
cfs_rq->idle_h_nr_running += idle_h_nr_running;
+ cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
idle_h_nr_running = 1;
@@ -7084,6 +7120,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
struct task_struct *p = NULL;
int idle_h_nr_running = 0;
int h_nr_running = 0;
+ int h_nr_delayed = 0;
struct cfs_rq *cfs_rq;
u64 slice = 0;
@@ -7091,6 +7128,8 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
p = task_of(se);
h_nr_running = 1;
idle_h_nr_running = task_has_idle_policy(p);
+ if (!task_sleep && !task_delayed)
+ h_nr_delayed = !!se->sched_delayed;
} else {
cfs_rq = group_cfs_rq(se);
slice = cfs_rq_min_slice(cfs_rq);
@@ -7108,6 +7147,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+ cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
idle_h_nr_running = h_nr_running;
@@ -7146,6 +7186,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+ cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
idle_h_nr_running = h_nr_running;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index fc07382361a8..fee75cc2c47b 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -321,7 +321,7 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq)
{
if (___update_load_sum(now, &cfs_rq->avg,
scale_load_down(cfs_rq->load.weight),
- cfs_rq->h_nr_running,
+ cfs_rq->h_nr_running - cfs_rq->h_nr_delayed,
cfs_rq->curr != NULL)) {
___update_load_avg(&cfs_rq->avg, 1);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 76f5f53a645f..1e494af2cd23 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -649,6 +649,7 @@ struct cfs_rq {
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
unsigned int idle_nr_running; /* SCHED_IDLE */
unsigned int idle_h_nr_running; /* SCHED_IDLE */
+ unsigned int h_nr_delayed;
s64 avg_vruntime;
u64 avg_load;
@@ -898,8 +899,11 @@ struct dl_rq {
static inline void se_update_runnable(struct sched_entity *se)
{
- if (!entity_is_task(se))
- se->runnable_weight = se->my_q->h_nr_running;
+ if (!entity_is_task(se)) {
+ struct cfs_rq *cfs_rq = se->my_q;
+
+ se->runnable_weight = cfs_rq->h_nr_running - cfs_rq->h_nr_delayed;
+ }
}
static inline long se_runnable(struct sched_entity *se)
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
2024-11-28 9:27 ` [PATCH 1/9] sched/eevdf: More PELT vs DELAYED_DEQUEUE Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:56 ` Peter Zijlstra
2024-11-28 9:27 ` [PATCH 3/9] sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle Vincent Guittot
` (6 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
With delayed dequeued feature, a sleeping sched_entity remains enqueued
in the rq until its lag has elapsed. As a result, it stays also visible
in the statistics that are used to balance the system and in particular
the field h_nr_running when the sched_entity is associated to a task.
Create a new h_nr_enqueued that tracks all enqueued tasks and restore the
behavior of h_nr_running i.e. tracking the number of fair tasks that want
to run.
h_nr_running is used in several places to make decision on load balance:
- PELT runnable_avg
- deciding if a group is overloaded or has spare capacity
- numa stats
- reduced capacity management
- load balance
- nohz kick
It should be noticed that the rq->nr_running still counts the delayed
dequeued tasks as delayed dequeue is a fair feature that is meaningless
at core level.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/core.c | 2 +-
kernel/sched/debug.c | 5 +--
kernel/sched/fair.c | 81 +++++++++++++++++++++++++++-----------------
kernel/sched/pelt.c | 2 +-
kernel/sched/sched.h | 8 ++---
5 files changed, 57 insertions(+), 41 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 95e40895a519..425739bbdc63 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6018,7 +6018,7 @@ __pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
* opportunity to pull in more work from other CPUs.
*/
if (likely(!sched_class_above(prev->sched_class, &fair_sched_class) &&
- rq->nr_running == rq->cfs.h_nr_running)) {
+ rq->nr_running == rq->cfs.h_nr_enqueued)) {
p = pick_next_task_fair(rq, prev, rf);
if (unlikely(p == RETRY_TASK))
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a1be00a988bf..6b8cd869a2f4 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -379,7 +379,7 @@ static ssize_t sched_fair_server_write(struct file *filp, const char __user *ubu
return -EINVAL;
}
- if (rq->cfs.h_nr_running) {
+ if (rq->cfs.h_nr_enqueued) {
update_rq_clock(rq);
dl_server_stop(&rq->fair_server);
}
@@ -392,7 +392,7 @@ static ssize_t sched_fair_server_write(struct file *filp, const char __user *ubu
printk_deferred("Fair server disabled in CPU %d, system may crash due to starvation.\n",
cpu_of(rq));
- if (rq->cfs.h_nr_running)
+ if (rq->cfs.h_nr_enqueued)
dl_server_start(&rq->fair_server);
}
@@ -845,6 +845,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread));
SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
+ SEQ_printf(m, " .%-30s: %d\n", "h_nr_enqueued", cfs_rq->h_nr_enqueued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
cfs_rq->idle_nr_running);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dc43a8daea35..6b7afb69d8ff 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5471,18 +5471,21 @@ static void set_delayed(struct sched_entity *se)
for_each_sched_entity(se) {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ cfs_rq->h_nr_running--;
cfs_rq->h_nr_delayed++;
if (cfs_rq_throttled(cfs_rq))
break;
}
}
-static void clear_delayed(struct sched_entity *se)
+static void clear_delayed(struct sched_entity *se, bool running)
{
se->sched_delayed = 0;
for_each_sched_entity(se) {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
+ if (running)
+ cfs_rq->h_nr_running++;
cfs_rq->h_nr_delayed--;
if (cfs_rq_throttled(cfs_rq))
break;
@@ -5491,7 +5494,7 @@ static void clear_delayed(struct sched_entity *se)
static inline void finish_delayed_dequeue_entity(struct sched_entity *se)
{
- clear_delayed(se);
+ clear_delayed(se, false);
if (sched_feat(DELAY_ZERO) && se->vlag > 0)
se->vlag = 0;
}
@@ -5935,8 +5938,8 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long task_delta, idle_task_delta, delayed_delta, dequeue = 1;
- long rq_h_nr_running = rq->cfs.h_nr_running;
+ long running_delta, enqueued_delta, idle_task_delta, delayed_delta, dequeue = 1;
+ long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
raw_spin_lock(&cfs_b->lock);
/* This will start the period timer if necessary */
@@ -5966,7 +5969,8 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
walk_tg_tree_from(cfs_rq->tg, tg_throttle_down, tg_nop, (void *)rq);
rcu_read_unlock();
- task_delta = cfs_rq->h_nr_running;
+ running_delta = cfs_rq->h_nr_running;
+ enqueued_delta = cfs_rq->h_nr_enqueued;
idle_task_delta = cfs_rq->idle_h_nr_running;
delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
@@ -5988,9 +5992,10 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
dequeue_entity(qcfs_rq, se, flags);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_running;
+ idle_task_delta = cfs_rq->h_nr_enqueued;
- qcfs_rq->h_nr_running -= task_delta;
+ qcfs_rq->h_nr_running -= running_delta;
+ qcfs_rq->h_nr_enqueued -= enqueued_delta;
qcfs_rq->idle_h_nr_running -= idle_task_delta;
qcfs_rq->h_nr_delayed -= delayed_delta;
@@ -6011,18 +6016,19 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
se_update_runnable(se);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_running;
+ idle_task_delta = cfs_rq->h_nr_enqueued;
- qcfs_rq->h_nr_running -= task_delta;
+ qcfs_rq->h_nr_running -= running_delta;
+ qcfs_rq->h_nr_enqueued -= enqueued_delta;
qcfs_rq->idle_h_nr_running -= idle_task_delta;
qcfs_rq->h_nr_delayed -= delayed_delta;
}
/* At this point se is NULL and we are at root level*/
- sub_nr_running(rq, task_delta);
+ sub_nr_running(rq, enqueued_delta);
/* Stop the fair server if throttling resulted in no runnable tasks */
- if (rq_h_nr_running && !rq->cfs.h_nr_running)
+ if (rq_h_nr_enqueued && !rq->cfs.h_nr_enqueued)
dl_server_stop(&rq->fair_server);
done:
/*
@@ -6041,8 +6047,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long task_delta, idle_task_delta, delayed_delta;
- long rq_h_nr_running = rq->cfs.h_nr_running;
+ long running_delta, enqueued_delta, idle_task_delta, delayed_delta;
+ long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
se = cfs_rq->tg->se[cpu_of(rq)];
@@ -6075,7 +6081,8 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
goto unthrottle_throttle;
}
- task_delta = cfs_rq->h_nr_running;
+ running_delta = cfs_rq->h_nr_running;
+ enqueued_delta = cfs_rq->h_nr_enqueued;
idle_task_delta = cfs_rq->idle_h_nr_running;
delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
@@ -6091,9 +6098,10 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_running;
+ idle_task_delta = cfs_rq->h_nr_enqueued;
- qcfs_rq->h_nr_running += task_delta;
+ qcfs_rq->h_nr_running += running_delta;
+ qcfs_rq->h_nr_enqueued += enqueued_delta;
qcfs_rq->idle_h_nr_running += idle_task_delta;
qcfs_rq->h_nr_delayed += delayed_delta;
@@ -6109,9 +6117,10 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
se_update_runnable(se);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_running;
+ idle_task_delta = cfs_rq->h_nr_enqueued;
- qcfs_rq->h_nr_running += task_delta;
+ qcfs_rq->h_nr_running += running_delta;
+ qcfs_rq->h_nr_enqueued += enqueued_delta;
qcfs_rq->idle_h_nr_running += idle_task_delta;
qcfs_rq->h_nr_delayed += delayed_delta;
@@ -6121,11 +6130,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
}
/* Start the fair server if un-throttling resulted in new runnable tasks */
- if (!rq_h_nr_running && rq->cfs.h_nr_running)
+ if (!rq_h_nr_enqueued && rq->cfs.h_nr_enqueued)
dl_server_start(&rq->fair_server);
/* At this point se is NULL and we are at root level*/
- add_nr_running(rq, task_delta);
+ add_nr_running(rq, enqueued_delta);
unthrottle_throttle:
assert_list_leaf_cfs_rq(rq);
@@ -6840,7 +6849,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
SCHED_WARN_ON(task_rq(p) != rq);
- if (rq->cfs.h_nr_running > 1) {
+ if (rq->cfs.h_nr_enqueued > 1) {
u64 ran = se->sum_exec_runtime - se->prev_sum_exec_runtime;
u64 slice = se->slice;
s64 delta = slice - ran;
@@ -6967,7 +6976,7 @@ requeue_delayed_entity(struct sched_entity *se)
}
update_load_avg(cfs_rq, se, 0);
- clear_delayed(se);
+ clear_delayed(se, true);
}
/*
@@ -6983,7 +6992,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
int idle_h_nr_running = task_has_idle_policy(p);
int h_nr_delayed = 0;
int task_new = !(flags & ENQUEUE_WAKEUP);
- int rq_h_nr_running = rq->cfs.h_nr_running;
+ int rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
u64 slice = 0;
/*
@@ -7031,7 +7040,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
enqueue_entity(cfs_rq, se, flags);
slice = cfs_rq_min_slice(cfs_rq);
- cfs_rq->h_nr_running++;
+ if (!h_nr_delayed)
+ cfs_rq->h_nr_running++;
+ cfs_rq->h_nr_enqueued++;
cfs_rq->idle_h_nr_running += idle_h_nr_running;
cfs_rq->h_nr_delayed += h_nr_delayed;
@@ -7055,7 +7066,9 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
se->slice = slice;
slice = cfs_rq_min_slice(cfs_rq);
- cfs_rq->h_nr_running++;
+ if (!h_nr_delayed)
+ cfs_rq->h_nr_running++;
+ cfs_rq->h_nr_enqueued++;
cfs_rq->idle_h_nr_running += idle_h_nr_running;
cfs_rq->h_nr_delayed += h_nr_delayed;
@@ -7067,7 +7080,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
goto enqueue_throttle;
}
- if (!rq_h_nr_running && rq->cfs.h_nr_running) {
+ if (!rq_h_nr_enqueued && rq->cfs.h_nr_enqueued) {
/* Account for idle runtime */
if (!rq->nr_running)
dl_server_update_idle_time(rq, rq->curr);
@@ -7114,7 +7127,7 @@ static void set_next_buddy(struct sched_entity *se);
static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
{
bool was_sched_idle = sched_idle_rq(rq);
- int rq_h_nr_running = rq->cfs.h_nr_running;
+ int rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
bool task_sleep = flags & DEQUEUE_SLEEP;
bool task_delayed = flags & DEQUEUE_DELAYED;
struct task_struct *p = NULL;
@@ -7145,7 +7158,9 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
break;
}
- cfs_rq->h_nr_running -= h_nr_running;
+ if (!h_nr_delayed)
+ cfs_rq->h_nr_running -= h_nr_running;
+ cfs_rq->h_nr_enqueued -= h_nr_running;
cfs_rq->idle_h_nr_running -= idle_h_nr_running;
cfs_rq->h_nr_delayed -= h_nr_delayed;
@@ -7184,7 +7199,9 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
se->slice = slice;
slice = cfs_rq_min_slice(cfs_rq);
- cfs_rq->h_nr_running -= h_nr_running;
+ if (!h_nr_delayed)
+ cfs_rq->h_nr_running -= h_nr_running;
+ cfs_rq->h_nr_enqueued -= h_nr_running;
cfs_rq->idle_h_nr_running -= idle_h_nr_running;
cfs_rq->h_nr_delayed -= h_nr_delayed;
@@ -7198,7 +7215,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
sub_nr_running(rq, h_nr_running);
- if (rq_h_nr_running && !rq->cfs.h_nr_running)
+ if (rq_h_nr_enqueued && !rq->cfs.h_nr_enqueued)
dl_server_stop(&rq->fair_server);
/* balance early to pull high priority tasks */
@@ -12862,7 +12879,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
pulled_task = 1;
/* Is there a task of a high priority class? */
- if (this_rq->nr_running != this_rq->cfs.h_nr_running)
+ if (this_rq->nr_running != this_rq->cfs.h_nr_enqueued)
pulled_task = -1;
out:
@@ -13549,7 +13566,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
parent_cfs_rq->idle_nr_running--;
}
- idle_task_delta = grp_cfs_rq->h_nr_running -
+ idle_task_delta = grp_cfs_rq->h_nr_enqueued -
grp_cfs_rq->idle_h_nr_running;
if (!cfs_rq_is_idle(grp_cfs_rq))
idle_task_delta *= -1;
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index fee75cc2c47b..fc07382361a8 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -321,7 +321,7 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq)
{
if (___update_load_sum(now, &cfs_rq->avg,
scale_load_down(cfs_rq->load.weight),
- cfs_rq->h_nr_running - cfs_rq->h_nr_delayed,
+ cfs_rq->h_nr_running,
cfs_rq->curr != NULL)) {
___update_load_avg(&cfs_rq->avg, 1);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1e494af2cd23..b5fe4a622822 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -647,6 +647,7 @@ struct cfs_rq {
struct load_weight load;
unsigned int nr_running;
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
+ unsigned int h_nr_enqueued;
unsigned int idle_nr_running; /* SCHED_IDLE */
unsigned int idle_h_nr_running; /* SCHED_IDLE */
unsigned int h_nr_delayed;
@@ -899,11 +900,8 @@ struct dl_rq {
static inline void se_update_runnable(struct sched_entity *se)
{
- if (!entity_is_task(se)) {
- struct cfs_rq *cfs_rq = se->my_q;
-
- se->runnable_weight = cfs_rq->h_nr_running - cfs_rq->h_nr_delayed;
- }
+ if (!entity_is_task(se))
+ se->runnable_weight = se->my_q->h_nr_running;
}
static inline long se_runnable(struct sched_entity *se)
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 3/9] sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
2024-11-28 9:27 ` [PATCH 1/9] sched/eevdf: More PELT vs DELAYED_DEQUEUE Vincent Guittot
2024-11-28 9:27 ` [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 4/9] sched/fair: Remove unused cfs_rq.idle_nr_running Vincent Guittot
` (5 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
Use same naming convention as others starting with h_nr_* and rename
idle_h_nr_running into h_nr_idle.
The "running" is not correct anymore as it includes delayed dequeue tasks
as well.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/debug.c | 4 ++--
kernel/sched/fair.c | 52 ++++++++++++++++++++++----------------------
kernel/sched/sched.h | 2 +-
3 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 6b8cd869a2f4..63ec08c8ccf1 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -849,8 +849,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
cfs_rq->idle_nr_running);
- SEQ_printf(m, " .%-30s: %d\n", "idle_h_nr_running",
- cfs_rq->idle_h_nr_running);
+ SEQ_printf(m, " .%-30s: %d\n", "h_nr_idle",
+ cfs_rq->h_nr_idle);
SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
#ifdef CONFIG_SMP
SEQ_printf(m, " .%-30s: %lu\n", "load_avg",
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6b7afb69d8ff..2cd2651305ae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5938,7 +5938,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long running_delta, enqueued_delta, idle_task_delta, delayed_delta, dequeue = 1;
+ long running_delta, enqueued_delta, idle_delta, delayed_delta, dequeue = 1;
long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
raw_spin_lock(&cfs_b->lock);
@@ -5971,7 +5971,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
running_delta = cfs_rq->h_nr_running;
enqueued_delta = cfs_rq->h_nr_enqueued;
- idle_task_delta = cfs_rq->idle_h_nr_running;
+ idle_delta = cfs_rq->h_nr_idle;
delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
@@ -5992,11 +5992,11 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
dequeue_entity(qcfs_rq, se, flags);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_enqueued;
+ idle_delta = cfs_rq->h_nr_enqueued;
qcfs_rq->h_nr_running -= running_delta;
qcfs_rq->h_nr_enqueued -= enqueued_delta;
- qcfs_rq->idle_h_nr_running -= idle_task_delta;
+ qcfs_rq->h_nr_idle -= idle_delta;
qcfs_rq->h_nr_delayed -= delayed_delta;
if (qcfs_rq->load.weight) {
@@ -6016,11 +6016,11 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
se_update_runnable(se);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_enqueued;
+ idle_delta = cfs_rq->h_nr_enqueued;
qcfs_rq->h_nr_running -= running_delta;
qcfs_rq->h_nr_enqueued -= enqueued_delta;
- qcfs_rq->idle_h_nr_running -= idle_task_delta;
+ qcfs_rq->h_nr_idle -= idle_delta;
qcfs_rq->h_nr_delayed -= delayed_delta;
}
@@ -6047,7 +6047,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long running_delta, enqueued_delta, idle_task_delta, delayed_delta;
+ long running_delta, enqueued_delta, idle_delta, delayed_delta;
long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
se = cfs_rq->tg->se[cpu_of(rq)];
@@ -6083,7 +6083,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
running_delta = cfs_rq->h_nr_running;
enqueued_delta = cfs_rq->h_nr_enqueued;
- idle_task_delta = cfs_rq->idle_h_nr_running;
+ idle_delta = cfs_rq->h_nr_idle;
delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
@@ -6098,11 +6098,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
enqueue_entity(qcfs_rq, se, ENQUEUE_WAKEUP);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_enqueued;
+ idle_delta = cfs_rq->h_nr_enqueued;
qcfs_rq->h_nr_running += running_delta;
qcfs_rq->h_nr_enqueued += enqueued_delta;
- qcfs_rq->idle_h_nr_running += idle_task_delta;
+ qcfs_rq->h_nr_idle += idle_delta;
qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
@@ -6117,11 +6117,11 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
se_update_runnable(se);
if (cfs_rq_is_idle(group_cfs_rq(se)))
- idle_task_delta = cfs_rq->h_nr_enqueued;
+ idle_delta = cfs_rq->h_nr_enqueued;
qcfs_rq->h_nr_running += running_delta;
qcfs_rq->h_nr_enqueued += enqueued_delta;
- qcfs_rq->idle_h_nr_running += idle_task_delta;
+ qcfs_rq->h_nr_idle += idle_delta;
qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
@@ -6937,7 +6937,7 @@ static inline void check_update_overutilized_status(struct rq *rq) { }
/* Runqueue only has SCHED_IDLE tasks enqueued */
static int sched_idle_rq(struct rq *rq)
{
- return unlikely(rq->nr_running == rq->cfs.idle_h_nr_running &&
+ return unlikely(rq->nr_running == rq->cfs.h_nr_idle &&
rq->nr_running);
}
@@ -6989,7 +6989,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
- int idle_h_nr_running = task_has_idle_policy(p);
+ int h_nr_idle = task_has_idle_policy(p);
int h_nr_delayed = 0;
int task_new = !(flags & ENQUEUE_WAKEUP);
int rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
@@ -7043,11 +7043,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!h_nr_delayed)
cfs_rq->h_nr_running++;
cfs_rq->h_nr_enqueued++;
- cfs_rq->idle_h_nr_running += idle_h_nr_running;
+ cfs_rq->h_nr_idle += h_nr_idle;
cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
- idle_h_nr_running = 1;
+ h_nr_idle = 1;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
@@ -7069,11 +7069,11 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
if (!h_nr_delayed)
cfs_rq->h_nr_running++;
cfs_rq->h_nr_enqueued++;
- cfs_rq->idle_h_nr_running += idle_h_nr_running;
+ cfs_rq->h_nr_idle += h_nr_idle;
cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
- idle_h_nr_running = 1;
+ h_nr_idle = 1;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
@@ -7131,7 +7131,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
bool task_sleep = flags & DEQUEUE_SLEEP;
bool task_delayed = flags & DEQUEUE_DELAYED;
struct task_struct *p = NULL;
- int idle_h_nr_running = 0;
+ int h_nr_idle = 0;
int h_nr_running = 0;
int h_nr_delayed = 0;
struct cfs_rq *cfs_rq;
@@ -7140,7 +7140,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (entity_is_task(se)) {
p = task_of(se);
h_nr_running = 1;
- idle_h_nr_running = task_has_idle_policy(p);
+ h_nr_idle = task_has_idle_policy(p);
if (!task_sleep && !task_delayed)
h_nr_delayed = !!se->sched_delayed;
} else {
@@ -7161,11 +7161,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (!h_nr_delayed)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->h_nr_enqueued -= h_nr_running;
- cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+ cfs_rq->h_nr_idle -= h_nr_idle;
cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
- idle_h_nr_running = h_nr_running;
+ h_nr_idle = h_nr_running;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
@@ -7202,11 +7202,11 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (!h_nr_delayed)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->h_nr_enqueued -= h_nr_running;
- cfs_rq->idle_h_nr_running -= idle_h_nr_running;
+ cfs_rq->h_nr_idle -= h_nr_idle;
cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
- idle_h_nr_running = h_nr_running;
+ h_nr_idle = h_nr_running;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(cfs_rq))
@@ -13567,7 +13567,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
}
idle_task_delta = grp_cfs_rq->h_nr_enqueued -
- grp_cfs_rq->idle_h_nr_running;
+ grp_cfs_rq->h_nr_idle;
if (!cfs_rq_is_idle(grp_cfs_rq))
idle_task_delta *= -1;
@@ -13577,7 +13577,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
if (!se->on_rq)
break;
- cfs_rq->idle_h_nr_running += idle_task_delta;
+ cfs_rq->h_nr_idle += idle_task_delta;
/* Already accounted at parent level and above. */
if (cfs_rq_is_idle(cfs_rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b5fe4a622822..8727bfb0e080 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -649,7 +649,7 @@ struct cfs_rq {
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
unsigned int h_nr_enqueued;
unsigned int idle_nr_running; /* SCHED_IDLE */
- unsigned int idle_h_nr_running; /* SCHED_IDLE */
+ unsigned int h_nr_idle; /* SCHED_IDLE */
unsigned int h_nr_delayed;
s64 avg_vruntime;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 4/9] sched/fair: Remove unused cfs_rq.idle_nr_running
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (2 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 3/9] sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 5/9] sched/fair: Rename cfs_rq.nr_running into nr_enqueued Vincent Guittot
` (4 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
cfs_rq.idle_nr_running field is not used anywhere so we can remove the
useless associated computation
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/debug.c | 2 --
kernel/sched/fair.c | 14 +-------------
kernel/sched/sched.h | 1 -
3 files changed, 1 insertion(+), 16 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 63ec08c8ccf1..0c8e21f9039c 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -847,8 +847,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_enqueued", cfs_rq->h_nr_enqueued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
- SEQ_printf(m, " .%-30s: %d\n", "idle_nr_running",
- cfs_rq->idle_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_idle",
cfs_rq->h_nr_idle);
SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2cd2651305ae..65492399f200 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3674,8 +3674,6 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
}
#endif
cfs_rq->nr_running++;
- if (se_is_idle(se))
- cfs_rq->idle_nr_running++;
}
static void
@@ -3689,8 +3687,6 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
}
#endif
cfs_rq->nr_running--;
- if (se_is_idle(se))
- cfs_rq->idle_nr_running--;
}
/*
@@ -13547,7 +13543,7 @@ int sched_group_set_idle(struct task_group *tg, long idle)
for_each_possible_cpu(i) {
struct rq *rq = cpu_rq(i);
struct sched_entity *se = tg->se[i];
- struct cfs_rq *parent_cfs_rq, *grp_cfs_rq = tg->cfs_rq[i];
+ struct cfs_rq *grp_cfs_rq = tg->cfs_rq[i];
bool was_idle = cfs_rq_is_idle(grp_cfs_rq);
long idle_task_delta;
struct rq_flags rf;
@@ -13558,14 +13554,6 @@ int sched_group_set_idle(struct task_group *tg, long idle)
if (WARN_ON_ONCE(was_idle == cfs_rq_is_idle(grp_cfs_rq)))
goto next_cpu;
- if (se->on_rq) {
- parent_cfs_rq = cfs_rq_of(se);
- if (cfs_rq_is_idle(grp_cfs_rq))
- parent_cfs_rq->idle_nr_running++;
- else
- parent_cfs_rq->idle_nr_running--;
- }
-
idle_task_delta = grp_cfs_rq->h_nr_enqueued -
grp_cfs_rq->h_nr_idle;
if (!cfs_rq_is_idle(grp_cfs_rq))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8727bfb0e080..d795202a7174 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -648,7 +648,6 @@ struct cfs_rq {
unsigned int nr_running;
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
unsigned int h_nr_enqueued;
- unsigned int idle_nr_running; /* SCHED_IDLE */
unsigned int h_nr_idle; /* SCHED_IDLE */
unsigned int h_nr_delayed;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 5/9] sched/fair: Rename cfs_rq.nr_running into nr_enqueued
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (3 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 4/9] sched/fair: Remove unused cfs_rq.idle_nr_running Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed Vincent Guittot
` (3 subsequent siblings)
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
Rename cfs_rq.nr_running into cfs_rq.nr_enqueued which better reflects the
reality as the value includes both the ready to run tasks and the delayed
dequeue tasks.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/core.c | 2 +-
kernel/sched/debug.c | 2 +-
kernel/sched/fair.c | 38 +++++++++++++++++++-------------------
kernel/sched/sched.h | 4 ++--
4 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 425739bbdc63..daaa20f01ad4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1341,7 +1341,7 @@ bool sched_can_stop_tick(struct rq *rq)
if (scx_enabled() && !scx_can_stop_tick(rq))
return false;
- if (rq->cfs.nr_running > 1)
+ if (rq->cfs.nr_enqueued > 1)
return false;
/*
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 0c8e21f9039c..8f5273043c16 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -843,7 +843,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SPLIT_NS(right_vruntime));
spread = right_vruntime - left_vruntime;
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "spread", SPLIT_NS(spread));
- SEQ_printf(m, " .%-30s: %d\n", "nr_running", cfs_rq->nr_running);
+ SEQ_printf(m, " .%-30s: %d\n", "nr_enqueued", cfs_rq->nr_enqueued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_enqueued", cfs_rq->h_nr_enqueued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 65492399f200..a96a771d8e61 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -915,7 +915,7 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
* We can safely skip eligibility check if there is only one entity
* in this cfs_rq, saving some cycles.
*/
- if (cfs_rq->nr_running == 1)
+ if (cfs_rq->nr_enqueued == 1)
return curr && curr->on_rq ? curr : se;
if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
@@ -1247,7 +1247,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
account_cfs_rq_runtime(cfs_rq, delta_exec);
- if (cfs_rq->nr_running == 1)
+ if (cfs_rq->nr_enqueued == 1)
return;
if (resched || did_preempt_short(cfs_rq, curr)) {
@@ -3673,7 +3673,7 @@ account_entity_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
list_add(&se->group_node, &rq->cfs_tasks);
}
#endif
- cfs_rq->nr_running++;
+ cfs_rq->nr_enqueued++;
}
static void
@@ -3686,7 +3686,7 @@ account_entity_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se)
list_del_init(&se->group_node);
}
#endif
- cfs_rq->nr_running--;
+ cfs_rq->nr_enqueued--;
}
/*
@@ -5220,7 +5220,7 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
static inline bool cfs_rq_is_decayed(struct cfs_rq *cfs_rq)
{
- return !cfs_rq->nr_running;
+ return !cfs_rq->nr_enqueued;
}
#define UPDATE_TG 0x0
@@ -5276,7 +5276,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
*
* EEVDF: placement strategy #1 / #2
*/
- if (sched_feat(PLACE_LAG) && cfs_rq->nr_running && se->vlag) {
+ if (sched_feat(PLACE_LAG) && cfs_rq->nr_enqueued && se->vlag) {
struct sched_entity *curr = cfs_rq->curr;
unsigned long load;
@@ -5425,7 +5425,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
__enqueue_entity(cfs_rq, se);
se->on_rq = 1;
- if (cfs_rq->nr_running == 1) {
+ if (cfs_rq->nr_enqueued == 1) {
check_enqueue_throttle(cfs_rq);
if (!throttled_hierarchy(cfs_rq)) {
list_add_leaf_cfs_rq(cfs_rq);
@@ -5573,7 +5573,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
if (flags & DEQUEUE_DELAYED)
finish_delayed_dequeue_entity(se);
- if (cfs_rq->nr_running == 0)
+ if (cfs_rq->nr_enqueued == 0)
update_idle_cfs_rq_clock_pelt(cfs_rq);
return true;
@@ -5921,7 +5921,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)
list_del_leaf_cfs_rq(cfs_rq);
SCHED_WARN_ON(cfs_rq->throttled_clock_self);
- if (cfs_rq->nr_running)
+ if (cfs_rq->nr_enqueued)
cfs_rq->throttled_clock_self = rq_clock(rq);
}
cfs_rq->throttle_count++;
@@ -6033,7 +6033,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
*/
cfs_rq->throttled = 1;
SCHED_WARN_ON(cfs_rq->throttled_clock);
- if (cfs_rq->nr_running)
+ if (cfs_rq->nr_enqueued)
cfs_rq->throttled_clock = rq_clock(rq);
return true;
}
@@ -6136,7 +6136,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
assert_list_leaf_cfs_rq(rq);
/* Determine whether we need to wake up potentially idle CPU: */
- if (rq->curr == rq->idle && rq->cfs.nr_running)
+ if (rq->curr == rq->idle && rq->cfs.nr_enqueued)
resched_curr(rq);
}
@@ -6437,7 +6437,7 @@ static __always_inline void return_cfs_rq_runtime(struct cfs_rq *cfs_rq)
if (!cfs_bandwidth_used())
return;
- if (!cfs_rq->runtime_enabled || cfs_rq->nr_running)
+ if (!cfs_rq->runtime_enabled || cfs_rq->nr_enqueued)
return;
__return_cfs_rq_runtime(cfs_rq);
@@ -6960,14 +6960,14 @@ requeue_delayed_entity(struct sched_entity *se)
if (sched_feat(DELAY_ZERO)) {
update_entity_lag(cfs_rq, se);
if (se->vlag > 0) {
- cfs_rq->nr_running--;
+ cfs_rq->nr_enqueued--;
if (se != cfs_rq->curr)
__dequeue_entity(cfs_rq, se);
se->vlag = 0;
place_entity(cfs_rq, se, 0);
if (se != cfs_rq->curr)
__enqueue_entity(cfs_rq, se);
- cfs_rq->nr_running++;
+ cfs_rq->nr_enqueued++;
}
}
@@ -8900,7 +8900,7 @@ static struct task_struct *pick_task_fair(struct rq *rq)
again:
cfs_rq = &rq->cfs;
- if (!cfs_rq->nr_running)
+ if (!cfs_rq->nr_enqueued)
return NULL;
do {
@@ -9017,7 +9017,7 @@ static struct task_struct *__pick_next_task_fair(struct rq *rq, struct task_stru
static bool fair_server_has_tasks(struct sched_dl_entity *dl_se)
{
- return !!dl_se->rq->cfs.nr_running;
+ return !!dl_se->rq->cfs.nr_enqueued;
}
static struct task_struct *fair_server_pick_task(struct sched_dl_entity *dl_se)
@@ -9807,7 +9807,7 @@ static bool __update_blocked_fair(struct rq *rq, bool *done)
if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq)) {
update_tg_load_avg(cfs_rq);
- if (cfs_rq->nr_running == 0)
+ if (cfs_rq->nr_enqueued == 0)
update_idle_cfs_rq_clock_pelt(cfs_rq);
if (cfs_rq == &rq->cfs)
@@ -12989,7 +12989,7 @@ static inline void task_tick_core(struct rq *rq, struct task_struct *curr)
* MIN_NR_TASKS_DURING_FORCEIDLE - 1 tasks and use that to check
* if we need to give up the CPU.
*/
- if (rq->core->core_forceidle_count && rq->cfs.nr_running == 1 &&
+ if (rq->core->core_forceidle_count && rq->cfs.nr_enqueued == 1 &&
__entity_slice_used(&curr->se, MIN_NR_TASKS_DURING_FORCEIDLE))
resched_curr(rq);
}
@@ -13133,7 +13133,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
if (!task_on_rq_queued(p))
return;
- if (rq->cfs.nr_running == 1)
+ if (rq->cfs.nr_enqueued == 1)
return;
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d795202a7174..7d99d18e8984 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -645,7 +645,7 @@ struct balance_callback {
/* CFS-related fields in a runqueue */
struct cfs_rq {
struct load_weight load;
- unsigned int nr_running;
+ unsigned int nr_enqueued;
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
unsigned int h_nr_enqueued;
unsigned int h_nr_idle; /* SCHED_IDLE */
@@ -2566,7 +2566,7 @@ static inline bool sched_rt_runnable(struct rq *rq)
static inline bool sched_fair_runnable(struct rq *rq)
{
- return rq->cfs.nr_running > 0;
+ return rq->cfs.nr_enqueued > 0;
}
extern struct task_struct *pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (4 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 5/9] sched/fair: Rename cfs_rq.nr_running into nr_enqueued Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 10:03 ` Peter Zijlstra
2024-11-28 9:27 ` [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task Vincent Guittot
` (2 subsequent siblings)
8 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
h_nr_delayed is not used anymore. We now have
- h_nr_running which tracks tasks ready to run
- h_nr_enqueued which tracks enqueued tasks either ready to run or delayed
dequeue
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/debug.c | 1 -
kernel/sched/fair.c | 41 ++++++++++++++---------------------------
kernel/sched/sched.h | 1 -
3 files changed, 14 insertions(+), 29 deletions(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 8f5273043c16..84f623b9d4af 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -846,7 +846,6 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %d\n", "nr_enqueued", cfs_rq->nr_enqueued);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_running", cfs_rq->h_nr_running);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_enqueued", cfs_rq->h_nr_enqueued);
- SEQ_printf(m, " .%-30s: %d\n", "h_nr_delayed", cfs_rq->h_nr_delayed);
SEQ_printf(m, " .%-30s: %d\n", "h_nr_idle",
cfs_rq->h_nr_idle);
SEQ_printf(m, " .%-30s: %ld\n", "load", cfs_rq->load.weight);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a96a771d8e61..1b4f1b610543 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5468,21 +5468,18 @@ static void set_delayed(struct sched_entity *se)
struct cfs_rq *cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running--;
- cfs_rq->h_nr_delayed++;
if (cfs_rq_throttled(cfs_rq))
break;
}
}
-static void clear_delayed(struct sched_entity *se, bool running)
+static void clear_delayed(struct sched_entity *se)
{
se->sched_delayed = 0;
for_each_sched_entity(se) {
struct cfs_rq *cfs_rq = cfs_rq_of(se);
- if (running)
- cfs_rq->h_nr_running++;
- cfs_rq->h_nr_delayed--;
+ cfs_rq->h_nr_running++;
if (cfs_rq_throttled(cfs_rq))
break;
}
@@ -5490,7 +5487,7 @@ static void clear_delayed(struct sched_entity *se, bool running)
static inline void finish_delayed_dequeue_entity(struct sched_entity *se)
{
- clear_delayed(se, false);
+ se->sched_delayed = 0;
if (sched_feat(DELAY_ZERO) && se->vlag > 0)
se->vlag = 0;
}
@@ -5934,7 +5931,7 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long running_delta, enqueued_delta, idle_delta, delayed_delta, dequeue = 1;
+ long running_delta, enqueued_delta, idle_delta, dequeue = 1;
long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
raw_spin_lock(&cfs_b->lock);
@@ -5968,7 +5965,6 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
running_delta = cfs_rq->h_nr_running;
enqueued_delta = cfs_rq->h_nr_enqueued;
idle_delta = cfs_rq->h_nr_idle;
- delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
int flags;
@@ -5993,7 +5989,6 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running -= running_delta;
qcfs_rq->h_nr_enqueued -= enqueued_delta;
qcfs_rq->h_nr_idle -= idle_delta;
- qcfs_rq->h_nr_delayed -= delayed_delta;
if (qcfs_rq->load.weight) {
/* Avoid re-evaluating load for this entity: */
@@ -6017,7 +6012,6 @@ static bool throttle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running -= running_delta;
qcfs_rq->h_nr_enqueued -= enqueued_delta;
qcfs_rq->h_nr_idle -= idle_delta;
- qcfs_rq->h_nr_delayed -= delayed_delta;
}
/* At this point se is NULL and we are at root level*/
@@ -6043,7 +6037,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);
struct cfs_bandwidth *cfs_b = tg_cfs_bandwidth(cfs_rq->tg);
struct sched_entity *se;
- long running_delta, enqueued_delta, idle_delta, delayed_delta;
+ long running_delta, enqueued_delta, idle_delta;
long rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
se = cfs_rq->tg->se[cpu_of(rq)];
@@ -6080,7 +6074,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
running_delta = cfs_rq->h_nr_running;
enqueued_delta = cfs_rq->h_nr_enqueued;
idle_delta = cfs_rq->h_nr_idle;
- delayed_delta = cfs_rq->h_nr_delayed;
for_each_sched_entity(se) {
struct cfs_rq *qcfs_rq = cfs_rq_of(se);
@@ -6099,7 +6092,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running += running_delta;
qcfs_rq->h_nr_enqueued += enqueued_delta;
qcfs_rq->h_nr_idle += idle_delta;
- qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(qcfs_rq))
@@ -6118,7 +6110,6 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
qcfs_rq->h_nr_running += running_delta;
qcfs_rq->h_nr_enqueued += enqueued_delta;
qcfs_rq->h_nr_idle += idle_delta;
- qcfs_rq->h_nr_delayed += delayed_delta;
/* end evaluation on encountering a throttled cfs_rq */
if (cfs_rq_throttled(qcfs_rq))
@@ -6972,7 +6963,7 @@ requeue_delayed_entity(struct sched_entity *se)
}
update_load_avg(cfs_rq, se, 0);
- clear_delayed(se, true);
+ clear_delayed(se);
}
/*
@@ -6986,7 +6977,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
int h_nr_idle = task_has_idle_policy(p);
- int h_nr_delayed = 0;
+ int se_delayed = 0;
int task_new = !(flags & ENQUEUE_WAKEUP);
int rq_h_nr_enqueued = rq->cfs.h_nr_enqueued;
u64 slice = 0;
@@ -7014,7 +7005,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
cpufreq_update_util(rq, SCHED_CPUFREQ_IOWAIT);
if (task_new)
- h_nr_delayed = !!se->sched_delayed;
+ se_delayed = !!se->sched_delayed;
for_each_sched_entity(se) {
if (se->on_rq) {
@@ -7036,11 +7027,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
enqueue_entity(cfs_rq, se, flags);
slice = cfs_rq_min_slice(cfs_rq);
- if (!h_nr_delayed)
+ if (!se_delayed)
cfs_rq->h_nr_running++;
cfs_rq->h_nr_enqueued++;
cfs_rq->h_nr_idle += h_nr_idle;
- cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = 1;
@@ -7062,11 +7052,10 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
se->slice = slice;
slice = cfs_rq_min_slice(cfs_rq);
- if (!h_nr_delayed)
+ if (!se_delayed)
cfs_rq->h_nr_running++;
cfs_rq->h_nr_enqueued++;
cfs_rq->h_nr_idle += h_nr_idle;
- cfs_rq->h_nr_delayed += h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = 1;
@@ -7129,7 +7118,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
struct task_struct *p = NULL;
int h_nr_idle = 0;
int h_nr_running = 0;
- int h_nr_delayed = 0;
+ int se_delayed = 0;
struct cfs_rq *cfs_rq;
u64 slice = 0;
@@ -7138,7 +7127,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
h_nr_running = 1;
h_nr_idle = task_has_idle_policy(p);
if (!task_sleep && !task_delayed)
- h_nr_delayed = !!se->sched_delayed;
+ se_delayed = !!se->sched_delayed;
} else {
cfs_rq = group_cfs_rq(se);
slice = cfs_rq_min_slice(cfs_rq);
@@ -7154,11 +7143,10 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
break;
}
- if (!h_nr_delayed)
+ if (!se_delayed)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->h_nr_enqueued -= h_nr_running;
cfs_rq->h_nr_idle -= h_nr_idle;
- cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = h_nr_running;
@@ -7195,11 +7183,10 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
se->slice = slice;
slice = cfs_rq_min_slice(cfs_rq);
- if (!h_nr_delayed)
+ if (!se_delayed)
cfs_rq->h_nr_running -= h_nr_running;
cfs_rq->h_nr_enqueued -= h_nr_running;
cfs_rq->h_nr_idle -= h_nr_idle;
- cfs_rq->h_nr_delayed -= h_nr_delayed;
if (cfs_rq_is_idle(cfs_rq))
h_nr_idle = h_nr_running;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7d99d18e8984..0b297242eb5d 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -649,7 +649,6 @@ struct cfs_rq {
unsigned int h_nr_running; /* SCHED_{NORMAL,BATCH,IDLE} */
unsigned int h_nr_enqueued;
unsigned int h_nr_idle; /* SCHED_IDLE */
- unsigned int h_nr_delayed;
s64 avg_vruntime;
u64 avg_load;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (5 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:49 ` Peter Zijlstra
2024-11-28 9:27 ` [PATCH 8/9] sched/fair: Fix sched_can_stop_tick() for fair tasks Vincent Guittot
2024-11-28 9:27 ` [PATCH 9/9] sched/fair: Fix variable declaration position Vincent Guittot
8 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
Migrating a delayed dequeued task doesn't help in balancing the number
of runnable tasks in the system.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1b4f1b610543..9d80f3a61082 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9405,11 +9405,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
/*
* We do not migrate tasks that are:
- * 1) throttled_lb_pair, or
- * 2) cannot be migrated to this CPU due to cpus_ptr, or
- * 3) running (obviously), or
- * 4) are cache-hot on their current CPU.
+ * 1) delayed dequeued, or
+ * 2) throttled_lb_pair, or
+ * 3) cannot be migrated to this CPU due to cpus_ptr, or
+ * 4) running (obviously), or
+ * 5) are cache-hot on their current CPU.
*/
+ if (p->se.sched_delayed)
+ return 0;
+
if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 8/9] sched/fair: Fix sched_can_stop_tick() for fair tasks
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (6 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 9/9] sched/fair: Fix variable declaration position Vincent Guittot
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
We can't stop the tick of a rq if there are at least 2 tasks enqueued in
the whole hierarchy and not only at the root cfs rq.
rq->cfs.nr_running tracks the number of sched_entity at one level
whereas rq->cfs.h_nr_enqueued tracks all enqueued tasks in the
hierarchy.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index daaa20f01ad4..263e0b9bb7d5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1341,7 +1341,7 @@ bool sched_can_stop_tick(struct rq *rq)
if (scx_enabled() && !scx_can_stop_tick(rq))
return false;
- if (rq->cfs.nr_enqueued > 1)
+ if (rq->cfs.h_nr_enqueued > 1)
return false;
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* [PATCH 9/9] sched/fair: Fix variable declaration position
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
` (7 preceding siblings ...)
2024-11-28 9:27 ` [PATCH 8/9] sched/fair: Fix sched_can_stop_tick() for fair tasks Vincent Guittot
@ 2024-11-28 9:27 ` Vincent Guittot
8 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 9:27 UTC (permalink / raw)
To: mingo, peterz, juri.lelli, dietmar.eggemann, rostedt, bsegall,
mgorman, vschneid, linux-kernel
Cc: kprateek.nayak, pauld, efault, luis.machado, Vincent Guittot
Move variable declaration at the beginning of the function
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
kernel/sched/fair.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9d80f3a61082..47faaed3ad92 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5632,6 +5632,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags);
static struct sched_entity *
pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
{
+ struct sched_entity *se;
/*
* Enabling NEXT_BUDDY will affect latency but not fairness.
*/
@@ -5642,7 +5643,7 @@ pick_next_entity(struct rq *rq, struct cfs_rq *cfs_rq)
return cfs_rq->next;
}
- struct sched_entity *se = pick_eevdf(cfs_rq);
+ se = pick_eevdf(cfs_rq);
if (se->sched_delayed) {
dequeue_entities(rq, se, DEQUEUE_SLEEP | DEQUEUE_DELAYED);
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task
2024-11-28 9:27 ` [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task Vincent Guittot
@ 2024-11-28 9:49 ` Peter Zijlstra
2024-11-28 10:03 ` Vincent Guittot
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 9:49 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 10:27:48AM +0100, Vincent Guittot wrote:
> Migrating a delayed dequeued task doesn't help in balancing the number
> of runnable tasks in the system.
But it can help balance the weight; furthermore, by moving them to a
lighter queue, they'll get picked sooner and disappear sooner.
Perhaps make it: p->se.sched_delayed && !env->sd->nr_balance_failed ?
> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1b4f1b610543..9d80f3a61082 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9405,11 +9405,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
>
> /*
> * We do not migrate tasks that are:
> - * 1) throttled_lb_pair, or
> - * 2) cannot be migrated to this CPU due to cpus_ptr, or
> - * 3) running (obviously), or
> - * 4) are cache-hot on their current CPU.
> + * 1) delayed dequeued, or
> + * 2) throttled_lb_pair, or
> + * 3) cannot be migrated to this CPU due to cpus_ptr, or
> + * 4) running (obviously), or
> + * 5) are cache-hot on their current CPU.
> */
> + if (p->se.sched_delayed)
> + return 0;
> +
> if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> return 0;
>
> --
> 2.43.0
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued
2024-11-28 9:27 ` [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued Vincent Guittot
@ 2024-11-28 9:56 ` Peter Zijlstra
2024-11-28 10:24 ` Vincent Guittot
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 9:56 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 10:27:43AM +0100, Vincent Guittot wrote:
> With delayed dequeued feature, a sleeping sched_entity remains enqueued
> in the rq until its lag has elapsed. As a result, it stays also visible
> in the statistics that are used to balance the system and in particular
> the field h_nr_running when the sched_entity is associated to a task.
>
> Create a new h_nr_enqueued that tracks all enqueued tasks and restore the
> behavior of h_nr_running i.e. tracking the number of fair tasks that want
> to run.
Isn't h_nr_enqueued := h_nr_running - h_nr_delayed ? Does it really make
sense to have another variable that is so trivially computable?
Also naming; h_nr_enqueued isn't really adequate I feel, because the
whole problem is that the delayed tasks are still very much enqueued.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task
2024-11-28 9:49 ` Peter Zijlstra
@ 2024-11-28 10:03 ` Vincent Guittot
2024-11-28 10:15 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 10:03 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, 28 Nov 2024 at 10:49, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 28, 2024 at 10:27:48AM +0100, Vincent Guittot wrote:
> > Migrating a delayed dequeued task doesn't help in balancing the number
> > of runnable tasks in the system.
>
> But it can help balance the weight; furthermore, by moving them to a
> lighter queue, they'll get picked sooner and disappear sooner.
When groups are not overloaded, we don't compare load but only running
tasks t balance them across cpus
It's only when both src and dst groups are overloaded that we look at
the load and the weight
>
> Perhaps make it: p->se.sched_delayed && !env->sd->nr_balance_failed ?
So we could take into account which type of migration with
env->migration_type == migrate_load
In this case, migrating a delayed dequeue task would help
>
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> > kernel/sched/fair.c | 12 ++++++++----
> > 1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1b4f1b610543..9d80f3a61082 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9405,11 +9405,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> >
> > /*
> > * We do not migrate tasks that are:
> > - * 1) throttled_lb_pair, or
> > - * 2) cannot be migrated to this CPU due to cpus_ptr, or
> > - * 3) running (obviously), or
> > - * 4) are cache-hot on their current CPU.
> > + * 1) delayed dequeued, or
> > + * 2) throttled_lb_pair, or
> > + * 3) cannot be migrated to this CPU due to cpus_ptr, or
> > + * 4) running (obviously), or
> > + * 5) are cache-hot on their current CPU.
> > */
> > + if (p->se.sched_delayed)
> > + return 0;
> > +
> > if (throttled_lb_pair(task_group(p), env->src_cpu, env->dst_cpu))
> > return 0;
> >
> > --
> > 2.43.0
> >
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed
2024-11-28 9:27 ` [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed Vincent Guittot
@ 2024-11-28 10:03 ` Peter Zijlstra
2024-11-28 10:15 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 10:03 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 10:27:47AM +0100, Vincent Guittot wrote:
> h_nr_delayed is not used anymore. We now have
> - h_nr_running which tracks tasks ready to run
> - h_nr_enqueued which tracks enqueued tasks either ready to run or delayed
> dequeue
Oh, now I see where you're going.
Let me read the lot again, because this sure as hell was a confusing
swizzle.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed
2024-11-28 10:03 ` Peter Zijlstra
@ 2024-11-28 10:15 ` Peter Zijlstra
2024-11-28 10:32 ` Vincent Guittot
0 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 10:15 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 11:03:48AM +0100, Peter Zijlstra wrote:
> On Thu, Nov 28, 2024 at 10:27:47AM +0100, Vincent Guittot wrote:
> > h_nr_delayed is not used anymore. We now have
> > - h_nr_running which tracks tasks ready to run
> > - h_nr_enqueued which tracks enqueued tasks either ready to run or delayed
> > dequeue
>
> Oh, now I see where you're going.
>
> Let me read the lot again, because this sure as hell was a confusing
> swizzle.
So the first patch adds h_nr_delayed.
Then confusion
Then we end up with:
h_nr_enqueued = h_nr_running + h_nr_delayed
Where h_nr_enqueued is part of rq->nr_running (and somewhere along the
way you rename and remove some idle numbers).
Can't we structure it like:
- add h_nr_delayed
- rename h_nr_running to h_nr_queued
- add h_nr_runnable = h_nr_queued - h_nr_delayed
- use h_hr_runnable
- remove h_nr_delayed
- clean up idle muck
And I'm assuming this ordering is because people want h_nr_delayed
backported. Because the even more sensible order would be something
like:
- rename h_nr_running into h_nr_queued
- add h_nr_runnable (being h_nr_queued - h_nr_delayed, without ever
having had h_nr_delayed).
- use h_nr_runnable
- clean up idle muck
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task
2024-11-28 10:03 ` Vincent Guittot
@ 2024-11-28 10:15 ` Peter Zijlstra
0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 10:15 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 11:03:44AM +0100, Vincent Guittot wrote:
> On Thu, 28 Nov 2024 at 10:49, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Nov 28, 2024 at 10:27:48AM +0100, Vincent Guittot wrote:
> > > Migrating a delayed dequeued task doesn't help in balancing the number
> > > of runnable tasks in the system.
> >
> > But it can help balance the weight; furthermore, by moving them to a
> > lighter queue, they'll get picked sooner and disappear sooner.
>
> When groups are not overloaded, we don't compare load but only running
> tasks t balance them across cpus
>
> It's only when both src and dst groups are overloaded that we look at
> the load and the weight
>
> >
> > Perhaps make it: p->se.sched_delayed && !env->sd->nr_balance_failed ?
>
> So we could take into account which type of migration with
> env->migration_type == migrate_load
Yeah that makes sense.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued
2024-11-28 9:56 ` Peter Zijlstra
@ 2024-11-28 10:24 ` Vincent Guittot
2024-11-28 10:27 ` Peter Zijlstra
0 siblings, 1 reply; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 10:24 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, 28 Nov 2024 at 10:56, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 28, 2024 at 10:27:43AM +0100, Vincent Guittot wrote:
> > With delayed dequeued feature, a sleeping sched_entity remains enqueued
> > in the rq until its lag has elapsed. As a result, it stays also visible
> > in the statistics that are used to balance the system and in particular
> > the field h_nr_running when the sched_entity is associated to a task.
> >
> > Create a new h_nr_enqueued that tracks all enqueued tasks and restore the
> > behavior of h_nr_running i.e. tracking the number of fair tasks that want
> > to run.
>
> Isn't h_nr_enqueued := h_nr_running - h_nr_delayed ? Does it really make
> sense to have another variable that is so trivially computable?
I changed h_nr_running to track only running tasks and not delayed dequeue
With this patch we have:
h_nr_enqueued := h_nr_running + h_nr_delayed
And I remove h_nr_delayed in a later patch to keep only h_nr_enqueued
and h_nr_running
>
> Also naming; h_nr_enqueued isn't really adequate I feel, because the
> whole problem is that the delayed tasks are still very much enqueued.
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued
2024-11-28 10:24 ` Vincent Guittot
@ 2024-11-28 10:27 ` Peter Zijlstra
0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2024-11-28 10:27 UTC (permalink / raw)
To: Vincent Guittot
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, Nov 28, 2024 at 11:24:45AM +0100, Vincent Guittot wrote:
> On Thu, 28 Nov 2024 at 10:56, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Thu, Nov 28, 2024 at 10:27:43AM +0100, Vincent Guittot wrote:
> > > With delayed dequeued feature, a sleeping sched_entity remains enqueued
> > > in the rq until its lag has elapsed. As a result, it stays also visible
> > > in the statistics that are used to balance the system and in particular
> > > the field h_nr_running when the sched_entity is associated to a task.
> > >
> > > Create a new h_nr_enqueued that tracks all enqueued tasks and restore the
> > > behavior of h_nr_running i.e. tracking the number of fair tasks that want
> > > to run.
> >
> > Isn't h_nr_enqueued := h_nr_running - h_nr_delayed ? Does it really make
> > sense to have another variable that is so trivially computable?
>
> I changed h_nr_running to track only running tasks and not delayed dequeue
Yes, that was hidden so well, I couldn't even find it on the second
reading :/
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed
2024-11-28 10:15 ` Peter Zijlstra
@ 2024-11-28 10:32 ` Vincent Guittot
0 siblings, 0 replies; 19+ messages in thread
From: Vincent Guittot @ 2024-11-28 10:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, juri.lelli, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, linux-kernel, kprateek.nayak, pauld, efault,
luis.machado
On Thu, 28 Nov 2024 at 11:15, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Nov 28, 2024 at 11:03:48AM +0100, Peter Zijlstra wrote:
> > On Thu, Nov 28, 2024 at 10:27:47AM +0100, Vincent Guittot wrote:
> > > h_nr_delayed is not used anymore. We now have
> > > - h_nr_running which tracks tasks ready to run
> > > - h_nr_enqueued which tracks enqueued tasks either ready to run or delayed
> > > dequeue
> >
> > Oh, now I see where you're going.
> >
> > Let me read the lot again, because this sure as hell was a confusing
> > swizzle.
>
> So the first patch adds h_nr_delayed.
>
> Then confusion
I started from your patch that adds h_nr_delayed and added on top the
steps to move to h_nr_enqueued and h_nr_running to make it easier to
understand the changes
>
> Then we end up with:
>
> h_nr_enqueued = h_nr_running + h_nr_delayed
>
> Where h_nr_enqueued is part of rq->nr_running (and somewhere along the
> way you rename and remove some idle numbers).
>
> Can't we structure it like:
>
> - add h_nr_delayed
> - rename h_nr_running to h_nr_queued
> - add h_nr_runnable = h_nr_queued - h_nr_delayed
> - use h_hr_runnable
> - remove h_nr_delayed
>
> - clean up idle muck
>
I can reorder the patches following the above
>
> And I'm assuming this ordering is because people want h_nr_delayed
> backported. Because the even more sensible order would be something
> like:
>
> - rename h_nr_running into h_nr_queued
> - add h_nr_runnable (being h_nr_queued - h_nr_delayed, without ever
> having had h_nr_delayed).
> - use h_nr_runnable
>
> - clean up idle muck
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2024-11-28 10:32 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-28 9:27 [PATCH 0/9] sched/fair: Fix statistics with delayed dequeue Vincent Guittot
2024-11-28 9:27 ` [PATCH 1/9] sched/eevdf: More PELT vs DELAYED_DEQUEUE Vincent Guittot
2024-11-28 9:27 ` [PATCH 2/9] sched/fair: Add new cfs_rq.h_nr_enqueued Vincent Guittot
2024-11-28 9:56 ` Peter Zijlstra
2024-11-28 10:24 ` Vincent Guittot
2024-11-28 10:27 ` Peter Zijlstra
2024-11-28 9:27 ` [PATCH 3/9] sched/fair: Rename cfs_rq.idle_h_nr_running into h_nr_idle Vincent Guittot
2024-11-28 9:27 ` [PATCH 4/9] sched/fair: Remove unused cfs_rq.idle_nr_running Vincent Guittot
2024-11-28 9:27 ` [PATCH 5/9] sched/fair: Rename cfs_rq.nr_running into nr_enqueued Vincent Guittot
2024-11-28 9:27 ` [PATCH 6/9] sched/fair: Removed unsued cfs_rq.h_nr_delayed Vincent Guittot
2024-11-28 10:03 ` Peter Zijlstra
2024-11-28 10:15 ` Peter Zijlstra
2024-11-28 10:32 ` Vincent Guittot
2024-11-28 9:27 ` [PATCH 7/9] sched/fair: Do not try to migrate delayed dequeue task Vincent Guittot
2024-11-28 9:49 ` Peter Zijlstra
2024-11-28 10:03 ` Vincent Guittot
2024-11-28 10:15 ` Peter Zijlstra
2024-11-28 9:27 ` [PATCH 8/9] sched/fair: Fix sched_can_stop_tick() for fair tasks Vincent Guittot
2024-11-28 9:27 ` [PATCH 9/9] sched/fair: Fix variable declaration position Vincent Guittot
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.