From: Peter Zijlstra <peterz@infradead.org>
To: John Stultz <jstultz@google.com>
Cc: mingo@kernel.org, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, linux-kernel@vger.kernel.org,
wangtao554@huawei.com, quzicheng@huawei.com,
kprateek.nayak@amd.com, dsmythies@telus.net,
shubhang@os.amperecomputing.com,
Suleiman Souhlal <suleiman@google.com>
Subject: Re: [PATCH v2 1/7] sched/fair: Fix zero_vruntime tracking
Date: Mon, 30 Mar 2026 12:10:18 +0200
Message-ID: <20260330101018.GN3738786@noisy.programming.kicks-ass.net>
In-Reply-To: <CANDhNCr3ooATiBgcnq8CAZ+AwzvmHeoKhjvfy=awF1RKFHydCA@mail.gmail.com>

On Fri, Mar 27, 2026 at 10:44:28PM -0700, John Stultz wrote:
> On Wed, Feb 18, 2026 at 11:58 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > It turns out that zero_vruntime tracking is broken when there is but a single
> > task running. Current update paths are through __{en,de}queue_entity(), and
> > when there is but a single task, pick_next_task() will always return that one
> > task, and put_prev_set_next_task() will end up in neither function.
> >
> > This can cause entity_key() to grow indefinitely large and cause overflows,
> > leading to much pain and suffering.
> >
> > Furthermore, doing update_zero_vruntime() from __{de,en}queue_entity(), which
> > are called from {set_next,put_prev}_entity() has problems because:
> >
> > - set_next_entity() calls __dequeue_entity() before it does cfs_rq->curr = se.
> > This means the avg_vruntime() will see the removal but not current, missing
> > the entity for accounting.
> >
> > - put_prev_entity() calls __enqueue_entity() before it does cfs_rq->curr =
> > NULL. This means the avg_vruntime() will see the addition *and* current,
> > leading to double accounting.
> >
> > Both cases are incorrect/inconsistent.
> >
> > Noting that avg_vruntime is already called on each {en,de}queue, remove the
> > explicit avg_vruntime() calls (which removes an extra 64bit division for each
> > {en,de}queue) and have avg_vruntime() update zero_vruntime itself.
> >
> > Additionally, have the tick call avg_vruntime() -- discarding the result, but
> > for the side-effect of updating zero_vruntime.
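To make the two accounting windows described above concrete, here is a minimal user-space model. All names are hypothetical and none of this is kernel code. The model computes the weighted average vruntime over the tree plus the separately tracked current entity and, following the idea in the patch for avg_vruntime(), moves zero_vruntime to that average as a side effect. The kernel's real arithmetic (scaled weights, rounding, incrementally maintained sums) differs; this only illustrates which entities get counted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model entity; not struct sched_entity. */
struct model_se {
	uint64_t vruntime;
	long weight;
};

/* Hypothetical model runqueue: curr is kept outside the tree. */
struct model_cfs_rq {
	uint64_t zero_vruntime;		/* reference point for keys */
	const struct model_se *tree;	/* queued entities, excluding curr */
	size_t nr;
	const struct model_se *curr;	/* NULL when nothing is running */
};

/* Key of an entity relative to zero_vruntime, as a signed value. */
static int64_t model_key(const struct model_cfs_rq *rq,
			 const struct model_se *se)
{
	return (int64_t)(se->vruntime - rq->zero_vruntime);
}

/*
 * Weighted average vruntime over the tree plus curr. As a side effect,
 * zero_vruntime is moved to the average so later keys stay small.
 */
static uint64_t model_avg_vruntime(struct model_cfs_rq *rq)
{
	int64_t sum = 0;
	long load = 0;

	for (size_t i = 0; i < rq->nr; i++) {
		sum += model_key(rq, &rq->tree[i]) * rq->tree[i].weight;
		load += rq->tree[i].weight;
	}
	if (rq->curr) {
		sum += model_key(rq, rq->curr) * rq->curr->weight;
		load += rq->curr->weight;
	}
	if (load)
		rq->zero_vruntime += sum / load;	/* truncates toward zero */
	return rq->zero_vruntime;
}
```

With A = {1000, 1} running and B = {3000, 1} queued, the average is 2000. In the put_prev_entity() window (A already back in the tree but still curr) the model yields 1666 because A is counted twice, and in the set_next_entity() window (A dequeued, curr not yet set) it yields 3000 because A is missing.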
>
> Hey all,
>
> So in stress testing with my full proxy-exec series, I was
> occasionally tripping over the situation where __pick_eevdf() returns
> NULL, which quickly crashes.
> The backtrace is usually due to stress-ng stress-ng-yield test:

Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must lie
between these two entities.

Therefore, the running task must be the eligible one, and on yield it is
promoted a full slice (all the tasks do is yield, after all). This makes
it jump over the other task: now the other task is eligible and it no
longer is. So we schedule.
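The eligibility argument can be sketched as follows, assuming (as EEVDF defines it) that an entity is eligible when its vruntime is not past the load-weighted average. Keys are taken relative to zero_vruntime and the comparison is cross-multiplied to avoid a division; the names are hypothetical. Note that key * load is the product that overflows once keys are allowed to grow without bound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model entity: key = vruntime - zero_vruntime. */
struct model_ent {
	int64_t key;
	long weight;
};

/*
 * Eligible iff key <= sum(key_i * w_i) / sum(w_i), evaluated as
 * key * load <= sum(key_i * w_i) so that no division is needed.
 * A large enough key makes key * load overflow int64_t.
 */
static bool model_eligible(const struct model_ent *all, int n,
			   const struct model_ent *se)
{
	int64_t avg = 0;
	long load = 0;

	for (int i = 0; i < n; i++) {
		avg += all[i].key * all[i].weight;
		load += all[i].weight;
	}
	return avg >= se->key * load;
}
```

With keys 0 and 30 at equal weight only the first entity is eligible; once it yields and its key jumps to 60, the roles swap, which is the leapfrog described above.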

Since we are runnable, there is no dequeue or enqueue. All we have is
the __enqueue_entity() and __dequeue_entity() from put_prev_task() /
set_next_task(). But per the fingered commit, those two no longer move
zero_vruntime ahead.

The only things that move zero_vruntime are the tick and a full dequeue
or enqueue.

This means that if the two tasks playing leapfrog can advance fast enough
to reach the overflow point within one tick's worth of time, we're up a
creek.
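A toy simulation of the leapfrog, again with hypothetical names and equal weights. The running task is the one with the lower vruntime (with equal weights, the eligible one); each yield sets vruntime = deadline and pushes the deadline out by one slice, mirroring the yield_task_fair() hunk in the patch. zero_vruntime is either left alone, as happens between ticks, or moved to the average after every yield, which is what calling avg_vruntime(cfs_rq) from the yield path is meant to achieve.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-task model of the yield leapfrog, equal weights. */
struct model_task {
	int64_t vruntime;
	int64_t deadline;
};

static int64_t abs64(int64_t x)
{
	return x < 0 ? -x : x;
}

/*
 * Run 'rounds' yields and return the largest |vruntime - zero_vruntime|
 * seen. If update_zero is set, zero_vruntime is moved to the average
 * after each yield; otherwise it stays where it started.
 */
static int64_t leapfrog_max_key(int rounds, int64_t slice, bool update_zero)
{
	struct model_task t[2] = {
		{ .vruntime = 0,         .deadline = slice },
		{ .vruntime = slice / 2, .deadline = slice / 2 + slice },
	};
	int64_t zero = 0, max_key = 0;

	for (int i = 0; i < rounds; i++) {
		/* The lower vruntime is the eligible, hence running, task. */
		struct model_task *curr =
			t[0].vruntime <= t[1].vruntime ? &t[0] : &t[1];

		/* Yield: forfeit the rest of the slice. */
		curr->vruntime = curr->deadline;
		curr->deadline += slice;

		if (update_zero)
			zero = (t[0].vruntime + t[1].vruntime) / 2;

		for (int j = 0; j < 2; j++) {
			int64_t k = abs64(t[j].vruntime - zero);

			if (k > max_key)
				max_key = k;
		}
	}
	return max_key;
}
```

With slice = 100 and 1000 yields, the largest key reaches 50050 when zero_vruntime never moves, but stays at 25 when it is updated after every yield. Without updates the key grows linearly with the number of yields, so only the tick bounds it.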

If this is indeed the case, then the patch below should cure things.

This also means that running a HZ=100 config will increase the chances
of hitting this compared to HZ=1000, since the tick that updates
zero_vruntime comes ten times less often.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9298f49f842c..c7daaf941b26 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -9307,6 +9307,7 @@ static void yield_task_fair(struct rq *rq)
 	if (entity_eligible(cfs_rq, se)) {
 		se->vruntime = se->deadline;
 		se->deadline += calc_delta_fair(se->slice, se);
+		avg_vruntime(cfs_rq);
 	}
 }
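The HZ remark is simple arithmetic: the window in which only the tick moves zero_vruntime is one tick period, 1/HZ seconds, so at HZ=100 it is ten times longer than at HZ=1000. A sketch, where the per-yield cost is a hypothetical parameter and not a measured number:

```c
#include <stdint.h>

/* Tick period in nanoseconds for a given HZ value. */
static uint64_t tick_period_ns(unsigned int hz)
{
	return 1000000000ULL / hz;
}

/*
 * Hypothetical: how many yields fit in one tick if each yield plus the
 * resulting context switch costs yield_cost_ns. Between ticks, each of
 * those yields can push vruntime further from a stale zero_vruntime.
 */
static uint64_t yields_per_tick(unsigned int hz, uint64_t yield_cost_ns)
{
	return tick_period_ns(hz) / yield_cost_ns;
}
```

For any fixed per-yield cost, HZ=100 allows ten times as many yields between ticks as HZ=1000.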