* [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix
2026-04-01 13:20 [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
@ 2026-04-01 13:20 ` Peter Zijlstra
2026-04-01 14:55 ` Vincent Guittot
2026-04-02 11:46 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2026-04-01 13:20 ` [PATCH 2/2] sched/debug: Fix avg_vruntime() usage Peter Zijlstra
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2026-04-01 13:20 UTC (permalink / raw)
To: jstultz, kprateek.nayak
Cc: linux-kernel, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").

The combination of yield and that commit was specific enough to
hypothesize the following scenario:

Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must be in
between these two entities.

Therefore, the running task will be eligible, and be promoted a full
slice (all the tasks do is yield after all). This causes it to jump over
the other task; now the other task is eligible and current no longer
is. So we schedule.

Since we remain runnable, there is no {de,en}queue. All we have is the
__{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
fingered commit, those two no longer move zero_vruntime.

All that moves zero_vruntime are the tick and a full {de,en}queue.

This means that if the two leapfrogging tasks can advance fast enough
to reach the overflow point within one tick's worth of time, we're up a
creek.

Additionally, when multiple cgroups are involved, there is no guarantee
the tick will in fact hit every cgroup in a timely manner. Statistically
speaking it will, but those same statistics do not rule out the
possibility of one cgroup not getting a tick for a significant amount of
time -- however unlikely.

Therefore, just like with the yield() case, force an update at the end
of every slice. This ensures the update is never more than a single
slice behind and the whole thing is within 2 lag bounds as per the
comment on entity_key().
Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: John Stultz <jstultz@google.com>
---
kernel/sched/fair.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -707,7 +707,7 @@ void update_zero_vruntime(struct cfs_rq
* Called in:
* - place_entity() -- before enqueue
* - update_entity_lag() -- before dequeue
- * - entity_tick()
+ * - update_deadline() -- slice expiration
*
* This means it is one entry 'behind' but that puts it close enough to where
* the bound on entity_key() is at most two lag bounds.
@@ -1131,6 +1131,7 @@ static bool update_deadline(struct cfs_r
* EEVDF: vd_i = ve_i + r_i / w_i
*/
se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+ avg_vruntime(cfs_rq);
/*
* The task has consumed its request, reschedule.
@@ -5593,11 +5594,6 @@ entity_tick(struct cfs_rq *cfs_rq, struc
update_load_avg(cfs_rq, curr, UPDATE_TG);
update_cfs_group(curr);
- /*
- * Pulls along cfs_rq::zero_vruntime.
- */
- avg_vruntime(cfs_rq);
-
#ifdef CONFIG_SCHED_HRTICK
/*
* queued ticks are scheduled to match the slice, so don't bother
@@ -9128,7 +9124,7 @@ static void yield_task_fair(struct rq *r
*/
if (entity_eligible(cfs_rq, se)) {
se->vruntime = se->deadline;
- se->deadline += calc_delta_fair(se->slice, se);
+ update_deadline(cfs_rq, se);
}
}
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix
2026-04-01 13:20 ` [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix Peter Zijlstra
@ 2026-04-01 14:55 ` Vincent Guittot
2026-04-02 11:46 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 10+ messages in thread
From: Vincent Guittot @ 2026-04-01 14:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: jstultz, kprateek.nayak, linux-kernel, mingo, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
On Wed, 1 Apr 2026 at 15:24, Peter Zijlstra <peterz@infradead.org> wrote:
>
> John reported that stress-ng-yield could make his machine unhappy and
> managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
> zero_vruntime tracking").
>
> The combination of yield and that commit was specific enough to
> hypothesize the following scenario:
>
> Suppose we have 2 runnable tasks, both doing yield. Then one will be
> eligible and one will not be, because the average position must be in
> between these two entities.
>
> Therefore, the running task will be eligible, and be promoted a full
> slice (all the tasks do is yield after all). This causes it to jump over
> the other task; now the other task is eligible and current no longer
> is. So we schedule.
>
> Since we remain runnable, there is no {de,en}queue. All we have is the
> __{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
> fingered commit, those two no longer move zero_vruntime.
>
> All that moves zero_vruntime are the tick and a full {de,en}queue.
>
> This means that if the two leapfrogging tasks can advance fast enough
> to reach the overflow point within one tick's worth of time, we're up a
> creek.
>
> Additionally, when multiple cgroups are involved, there is no guarantee
> the tick will in fact hit every cgroup in a timely manner. Statistically
> speaking it will, but those same statistics do not rule out the
> possibility of one cgroup not getting a tick for a significant amount of
> time -- however unlikely.
>
> Therefore, just like with the yield() case, force an update at the end
> of every slice. This ensures the update is never more than a single
> slice behind and the whole thing is within 2 lag bounds as per the
> comment on entity_key().
>
> Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
> Reported-by: John Stultz <jstultz@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Tested-by: John Stultz <jstultz@google.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> ---
> kernel/sched/fair.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -707,7 +707,7 @@ void update_zero_vruntime(struct cfs_rq
> * Called in:
> * - place_entity() -- before enqueue
> * - update_entity_lag() -- before dequeue
> - * - entity_tick()
> + * - update_deadline() -- slice expiration
> *
> * This means it is one entry 'behind' but that puts it close enough to where
> * the bound on entity_key() is at most two lag bounds.
> @@ -1131,6 +1131,7 @@ static bool update_deadline(struct cfs_r
> * EEVDF: vd_i = ve_i + r_i / w_i
> */
> se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
> + avg_vruntime(cfs_rq);
>
> /*
> * The task has consumed its request, reschedule.
> @@ -5593,11 +5594,6 @@ entity_tick(struct cfs_rq *cfs_rq, struc
> update_load_avg(cfs_rq, curr, UPDATE_TG);
> update_cfs_group(curr);
>
> - /*
> - * Pulls along cfs_rq::zero_vruntime.
> - */
> - avg_vruntime(cfs_rq);
> -
> #ifdef CONFIG_SCHED_HRTICK
> /*
> * queued ticks are scheduled to match the slice, so don't bother
> @@ -9128,7 +9124,7 @@ static void yield_task_fair(struct rq *r
> */
> if (entity_eligible(cfs_rq, se)) {
> se->vruntime = se->deadline;
> - se->deadline += calc_delta_fair(se->slice, se);
> + update_deadline(cfs_rq, se);
> }
> }
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread

* [tip: sched/urgent] sched/fair: Fix zero_vruntime tracking fix
2026-04-01 13:20 ` [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix Peter Zijlstra
2026-04-01 14:55 ` Vincent Guittot
@ 2026-04-02 11:46 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 10+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-04-02 11:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: John Stultz, Peter Zijlstra (Intel), Vincent Guittot,
K Prateek Nayak, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: 1319ea57529e131822bab56bf417c8edc2db9ae8
Gitweb: https://git.kernel.org/tip/1319ea57529e131822bab56bf417c8edc2db9ae8
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 01 Apr 2026 15:20:20 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 02 Apr 2026 13:42:43 +02:00
sched/fair: Fix zero_vruntime tracking fix
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").

The combination of yield and that commit was specific enough to
hypothesize the following scenario:

Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must be in
between these two entities.

Therefore, the running task will be eligible, and be promoted a full
slice (all the tasks do is yield after all). This causes it to jump over
the other task; now the other task is eligible and current no longer
is. So we schedule.

Since we remain runnable, there is no {de,en}queue. All we have is the
__{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
fingered commit, those two no longer move zero_vruntime.

All that moves zero_vruntime are the tick and a full {de,en}queue.

This means that if the two leapfrogging tasks can advance fast enough
to reach the overflow point within one tick's worth of time, we're up a
creek.

Additionally, when multiple cgroups are involved, there is no guarantee
the tick will in fact hit every cgroup in a timely manner. Statistically
speaking it will, but those same statistics do not rule out the
possibility of one cgroup not getting a tick for a significant amount of
time -- however unlikely.

Therefore, just like with the yield() case, force an update at the end
of every slice. This ensures the update is never more than a single
slice behind and the whole thing is within 2 lag bounds as per the
comment on entity_key().
Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260401132355.081530332@infradead.org
---
kernel/sched/fair.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bf948db..ab41147 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -707,7 +707,7 @@ void update_zero_vruntime(struct cfs_rq *cfs_rq, s64 delta)
* Called in:
* - place_entity() -- before enqueue
* - update_entity_lag() -- before dequeue
- * - entity_tick()
+ * - update_deadline() -- slice expiration
*
* This means it is one entry 'behind' but that puts it close enough to where
* the bound on entity_key() is at most two lag bounds.
@@ -1131,6 +1131,7 @@ static bool update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
* EEVDF: vd_i = ve_i + r_i / w_i
*/
se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
+ avg_vruntime(cfs_rq);
/*
* The task has consumed its request, reschedule.
@@ -5593,11 +5594,6 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
update_load_avg(cfs_rq, curr, UPDATE_TG);
update_cfs_group(curr);
- /*
- * Pulls along cfs_rq::zero_vruntime.
- */
- avg_vruntime(cfs_rq);
-
#ifdef CONFIG_SCHED_HRTICK
/*
* queued ticks are scheduled to match the slice, so don't bother
@@ -9128,7 +9124,7 @@ static void yield_task_fair(struct rq *rq)
*/
if (entity_eligible(cfs_rq, se)) {
se->vruntime = se->deadline;
- se->deadline += calc_delta_fair(se->slice, se);
+ update_deadline(cfs_rq, se);
}
}
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/2] sched/debug: Fix avg_vruntime() usage
2026-04-01 13:20 [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
2026-04-01 13:20 ` [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix Peter Zijlstra
@ 2026-04-01 13:20 ` Peter Zijlstra
2026-04-01 14:13 ` Vincent Guittot
2026-04-02 11:46 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2026-04-01 13:26 ` [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
2026-04-01 17:26 ` John Stultz
3 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2026-04-01 13:20 UTC (permalink / raw)
To: jstultz, kprateek.nayak
Cc: linux-kernel, peterz, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").

The commit in question changes avg_vruntime() from a function that is
a pure reader into a function that updates variables. This turns an
unlocked sched/debug usage of this function from a minor mistake into
a data corruptor.
Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: John Stultz <jstultz@google.com>
---
kernel/sched/debug.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -902,6 +902,7 @@ static void print_rq(struct seq_file *m,
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+ u64 avruntime;
struct sched_entity *last, *first, *root;
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
@@ -925,6 +926,7 @@ void print_cfs_rq(struct seq_file *m, in
if (last)
right_vruntime = last->vruntime;
zero_vruntime = cfs_rq->zero_vruntime;
+ avruntime = avg_vruntime(cfs_rq);
raw_spin_rq_unlock_irqrestore(rq, flags);
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
@@ -934,7 +936,7 @@ void print_cfs_rq(struct seq_file *m, in
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
SPLIT_NS(zero_vruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
- SPLIT_NS(avg_vruntime(cfs_rq)));
+ SPLIT_NS(avruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
SPLIT_NS(right_vruntime));
spread = right_vruntime - left_vruntime;
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] sched/debug: Fix avg_vruntime() usage
2026-04-01 13:20 ` [PATCH 2/2] sched/debug: Fix avg_vruntime() usage Peter Zijlstra
@ 2026-04-01 14:13 ` Vincent Guittot
2026-04-01 16:14 ` Peter Zijlstra
2026-04-02 11:46 ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
1 sibling, 1 reply; 10+ messages in thread
From: Vincent Guittot @ 2026-04-01 14:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: jstultz, kprateek.nayak, linux-kernel, mingo, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
On Wed, 1 Apr 2026 at 15:24, Peter Zijlstra <peterz@infradead.org> wrote:
>
> John reported that stress-ng-yield could make his machine unhappy and
> managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
> zero_vruntime tracking").
>
> The commit in question changes avg_vruntime() from a function that is
> a pure reader into a function that updates variables. This turns an
> unlocked sched/debug usage of this function from a minor mistake into
> a data corruptor.
>
> Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
> Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
> Reported-by: John Stultz <jstultz@google.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Tested-by: John Stultz <jstultz@google.com>
> ---
> kernel/sched/debug.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/kernel/sched/debug.c
> +++ b/kernel/sched/debug.c
> @@ -902,6 +902,7 @@ static void print_rq(struct seq_file *m,
> void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
> {
> s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
> + u64 avruntime;
> struct sched_entity *last, *first, *root;
> struct rq *rq = cpu_rq(cpu);
> unsigned long flags;
> @@ -925,6 +926,7 @@ void print_cfs_rq(struct seq_file *m, in
> if (last)
> right_vruntime = last->vruntime;
> zero_vruntime = cfs_rq->zero_vruntime;
> + avruntime = avg_vruntime(cfs_rq);
Minor comment:

Do you intentionally save zero_vruntime before calling avg_vruntime(),
which will update zero_vruntime?

That could make sense, to take a snapshot before it is modified by
print_cfs_rq(), but I'm afraid the call to debugfs will anyway trigger
an update before we save and display the value.

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
> raw_spin_rq_unlock_irqrestore(rq, flags);
>
> SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
> @@ -934,7 +936,7 @@ void print_cfs_rq(struct seq_file *m, in
> SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
> SPLIT_NS(zero_vruntime));
> SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
> - SPLIT_NS(avg_vruntime(cfs_rq)));
> + SPLIT_NS(avruntime));
> SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
> SPLIT_NS(right_vruntime));
> spread = right_vruntime - left_vruntime;
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] sched/debug: Fix avg_vruntime() usage
2026-04-01 14:13 ` Vincent Guittot
@ 2026-04-01 16:14 ` Peter Zijlstra
0 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2026-04-01 16:14 UTC (permalink / raw)
To: Vincent Guittot
Cc: jstultz, kprateek.nayak, linux-kernel, mingo, juri.lelli,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
On Wed, Apr 01, 2026 at 04:13:06PM +0200, Vincent Guittot wrote:
> On Wed, 1 Apr 2026 at 15:24, Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > John reported that stress-ng-yield could make his machine unhappy and
> > managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
> > zero_vruntime tracking").
> >
> > The commit in question changes avg_vruntime() from a function that is
> > a pure reader into a function that updates variables. This turns an
> > unlocked sched/debug usage of this function from a minor mistake into
> > a data corruptor.
> >
> > Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
> > Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
> > Reported-by: John Stultz <jstultz@google.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Tested-by: John Stultz <jstultz@google.com>
> > ---
> > kernel/sched/debug.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > --- a/kernel/sched/debug.c
> > +++ b/kernel/sched/debug.c
> > @@ -902,6 +902,7 @@ static void print_rq(struct seq_file *m,
> > void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
> > {
> > s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
> > + u64 avruntime;
> > struct sched_entity *last, *first, *root;
> > struct rq *rq = cpu_rq(cpu);
> > unsigned long flags;
> > @@ -925,6 +926,7 @@ void print_cfs_rq(struct seq_file *m, in
> > if (last)
> > right_vruntime = last->vruntime;
> > zero_vruntime = cfs_rq->zero_vruntime;
> > + avruntime = avg_vruntime(cfs_rq);
>
> Minor comment:
> Do you intentionally save zero_vruntime before calling avg_vruntime(),
> which will update zero_vruntime?
> That could make sense, to take a snapshot before it is modified by
> print_cfs_rq(), but I'm afraid the call to debugfs will anyway trigger
> an update before we save and display the value.
Intentional might be a big word, but yeah, printing the same value twice
seemed pointless. This way you can at least see where it came from or
something.
> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Thanks!
^ permalink raw reply [flat|nested] 10+ messages in thread
* [tip: sched/urgent] sched/debug: Fix avg_vruntime() usage
2026-04-01 13:20 ` [PATCH 2/2] sched/debug: Fix avg_vruntime() usage Peter Zijlstra
2026-04-01 14:13 ` Vincent Guittot
@ 2026-04-02 11:46 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 10+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-04-02 11:46 UTC (permalink / raw)
To: linux-tip-commits
Cc: John Stultz, Peter Zijlstra (Intel), Vincent Guittot,
K Prateek Nayak, x86, linux-kernel
The following commit has been merged into the sched/urgent branch of tip:
Commit-ID: e08d007f9d813616ce7093600bc4fdb9c9d81d89
Gitweb: https://git.kernel.org/tip/e08d007f9d813616ce7093600bc4fdb9c9d81d89
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 01 Apr 2026 15:20:21 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 02 Apr 2026 13:42:43 +02:00
sched/debug: Fix avg_vruntime() usage
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").
The commit in question changes avg_vruntime() from a function that is
a pure reader into a function that updates variables. This turns an
unlocked sched/debug usage of this function from a minor mistake into
a data corruptor.
Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
Fixes: b3d99f43c72b ("sched/fair: Fix zero_vruntime tracking")
Reported-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260401132355.196370805@infradead.org
---
kernel/sched/debug.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index b24f40f..15bf45b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -902,6 +902,7 @@ static void print_rq(struct seq_file *m, struct rq *rq, int rq_cpu)
void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
{
s64 left_vruntime = -1, zero_vruntime, right_vruntime = -1, left_deadline = -1, spread;
+ u64 avruntime;
struct sched_entity *last, *first, *root;
struct rq *rq = cpu_rq(cpu);
unsigned long flags;
@@ -925,6 +926,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
if (last)
right_vruntime = last->vruntime;
zero_vruntime = cfs_rq->zero_vruntime;
+ avruntime = avg_vruntime(cfs_rq);
raw_spin_rq_unlock_irqrestore(rq, flags);
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "left_deadline",
@@ -934,7 +936,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "zero_vruntime",
SPLIT_NS(zero_vruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "avg_vruntime",
- SPLIT_NS(avg_vruntime(cfs_rq)));
+ SPLIT_NS(avruntime));
SEQ_printf(m, " .%-30s: %Ld.%06ld\n", "right_vruntime",
SPLIT_NS(right_vruntime));
spread = right_vruntime - left_vruntime;
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] sched/urgent: zero_vruntime fixes
2026-04-01 13:20 [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
2026-04-01 13:20 ` [PATCH 1/2] sched/fair: Fix zero_vruntime tracking fix Peter Zijlstra
2026-04-01 13:20 ` [PATCH 2/2] sched/debug: Fix avg_vruntime() usage Peter Zijlstra
@ 2026-04-01 13:26 ` Peter Zijlstra
2026-04-01 17:26 ` John Stultz
3 siblings, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2026-04-01 13:26 UTC (permalink / raw)
To: jstultz, kprateek.nayak
Cc: linux-kernel, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
On Wed, Apr 01, 2026 at 03:20:19PM +0200, Peter Zijlstra wrote:
> Hi!
>
> Slightly modified and with a changelog on patches from the thread here:
>
> https://lkml.kernel.org/r/CANDhNCr3ooATiBgcnq8CAZ+AwzvmHeoKhjvfy=awF1RKFHydCA@mail.gmail.com
>
> Where John found stress-ng-yield made mainline sad.
>
> I have retained the tested-by tags even though I've slightly modified the
> patches; holler if you disagree. Local testing shows it's good, and my VM was
> very quick to panic once I had the right ingredients.
Should now live in: queue/sched/urgent
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/2] sched/urgent: zero_vruntime fixes
2026-04-01 13:20 [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
` (2 preceding siblings ...)
2026-04-01 13:26 ` [PATCH 0/2] sched/urgent: zero_vruntime fixes Peter Zijlstra
@ 2026-04-01 17:26 ` John Stultz
3 siblings, 0 replies; 10+ messages in thread
From: John Stultz @ 2026-04-01 17:26 UTC (permalink / raw)
To: Peter Zijlstra
Cc: kprateek.nayak, linux-kernel, mingo, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid
On Wed, Apr 1, 2026 at 6:24 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Slightly modified and with a changelog on patches from the thread here:
>
> https://lkml.kernel.org/r/CANDhNCr3ooATiBgcnq8CAZ+AwzvmHeoKhjvfy=awF1RKFHydCA@mail.gmail.com
>
> Where John found stress-ng-yield made mainline sad.
>
> I have retained the tested-by tags even though I've slightly modified the
> patches; holler if you disagree. Local testing shows it's good, and my VM was
> very quick to panic once I had the right ingredients.
Testing ran well overnight! Thank you again for chasing all of these
edge cases down so fast!
-john
^ permalink raw reply [flat|nested] 10+ messages in thread