From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
mingo@kernel.org, juri.lelli@redhat.com,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
linux-kernel@vger.kernel.org, wangtao554@huawei.com,
quzicheng@huawei.com, dsmythies@telus.net,
shubhang@os.amperecomputing.com
Subject: Re: [PATCH v2 5/7] sched/fair: Increase weight bits for avg_vruntime
Date: Tue, 7 Apr 2026 14:00:52 +0200
Message-ID: <20260407120052.GG3738010@noisy.programming.kicks-ass.net>
In-Reply-To: <b004fa56-b4b8-4cd9-9431-a576f629f31d@amd.com>
On Fri, Apr 03, 2026 at 09:32:22AM +0530, K Prateek Nayak wrote:
> On 4/2/2026 4:26 PM, K Prateek Nayak wrote:
> >> That is, something like the below... But with a comment ofc :-)
> >>
> >> Does that make sense?
> >
> > Let me go queue an overnight test to see if I trip that warning or
> > not.
>
> Didn't trip any warning and the machine is still up and running
> after 15 hours, so feel free to include:
>
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Perhaps the comment can read something like:
>
> /*
> * A heavy entity can pull the avg_vruntime close to its
> * vruntime post enqueue but the zero_vruntime point is
> * only updated at the next update_deadline() / enqueue
> * / dequeue.
> *
> * Until then, sum_w_vruntime grows quadratically in
> * the entity's weight (w_i):
> *
> * sum_w_vruntime -= (lag_i * (W + w_i) / W) * w_i
> *
> * If w_i > W, it is beneficial to pull the
> * zero_vruntime towards the entity's vruntime (V_i)
> * since sum_w_vruntime would then only grow by
> * (lag_i * (W + w_i)), which consumes fewer bits than
> * leaving the zero_vruntime at the pre-enqueue avg_vruntime.
> */
> if (weight > load)
> update_zero = true;
>
> Feel free to reword as you see fit :-)
I've made it like so. You did all the hard work after all. Thanks!
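The argument in the quoted comment can be worked through for the two-entity case reported below. This is a sketch under simplifying assumptions: the only other entity is curr, with weight W, sitting exactly at V = v_0, so the weighted sum is 0 before the enqueue. \ell_i is the placed entity's vlag and w_i its weight. S_stay and S_move are names introduced here for the resulting weighted sum when v_0 is left at V and when v_0 is moved to the new entity.

```latex
\ell' = \ell_i \,\frac{W + w_i}{W}, \qquad v_i = V - \ell'

% v_0 left at V: only the new entity has a non-zero key
S_{\mathrm{stay}} = (v_i - V)\, w_i = -\ell_i \,\frac{(W + w_i)\, w_i}{W}
\qquad (\text{quadratic in } w_i \text{ for } w_i \gg W)

% v_0 moved to v_i: only curr has a non-zero key
S_{\mathrm{move}} = (V - v_i)\, W = \ell_i \,(W + w_i)
\qquad (\text{linear in } w_i)

% Values from the report: \ell_i = 57498,\ W = 37,\ w_i = 90891264
\ell' = 141245081754, \quad
S_{\mathrm{stay}} \approx -1.28 \times 10^{19} < -2^{63}, \quad
S_{\mathrm{move}} = 5226068024898 \approx 5.2 \times 10^{12}
```

In this model |S_stay| / |S_move| is exactly w_i / W, which is one way to read the `weight > load` test: moving the zero point yields the smaller magnitude precisely when the new entity outweighs the rest of the queue.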
---
Subject: sched/fair: Avoid overflow in enqueue_entity()
From: K Prateek Nayak <kprateek.nayak@amd.com>
Date: Tue Apr 7 13:36:17 CEST 2026
Here is one scenario which was triggered when running:
stress-ng --yield=32 -t 10000000s&
while true; do perf bench sched messaging -p -t -l 100000 -g 16; done
on a 256-CPU machine, about an hour into the run:
__enqeue_entity: entity_key(-141245081754) weight(90891264) overflow_mul(5608800059305154560) vlag(57498) delayed?(0)
cfs_rq: zero_vruntime(3809707759657809) sum_w_vruntime(0) sum_weight(0) nr_queued(1)
cfs_rq->curr: entity_key(0) vruntime(3809707759657809) deadline(3809723966988476) weight(37)
The above comes from __enqueue_entity() after a place_entity(). Breaking
this down:
vlag_initial = 57498
vlag = (57498 * (37 + 90891264)) / 37 = 141,245,081,754
vruntime = 3809707759657809 - 141245081754 = 3,809,566,514,576,055
entity_key(se, cfs_rq) = -141,245,081,754
Now, multiplying the entity_key by its own weight results in
5,608,800,059,305,154,560 (same as what overflow_mul() suggests), but
in Python, without overflow, this would be -12,837,944,014,404,397,056,
whose magnitude exceeds 2^63 (about 9.22e18) and therefore cannot be
represented in an s64.
Avoid the overflow (without doing the division for avg_vruntime()) by moving
zero_vruntime to the new entity when it is heavier than the rest of the
cfs_rq (weight > load).
Fixes: 4823725d9d1d ("sched/fair: Increase weight bits for avg_vruntime")
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
[peterz: suggested 'weight > load' condition]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/fair.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5352,6 +5352,7 @@ static void
place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
u64 vslice, vruntime = avg_vruntime(cfs_rq);
+ bool update_zero = false;
s64 lag = 0;
if (!se->custom_slice)
@@ -5368,7 +5369,7 @@ place_entity(struct cfs_rq *cfs_rq, stru
*/
if (sched_feat(PLACE_LAG) && cfs_rq->nr_queued && se->vlag) {
struct sched_entity *curr = cfs_rq->curr;
- long load;
+ long load, weight;
lag = se->vlag;
@@ -5428,14 +5429,41 @@ place_entity(struct cfs_rq *cfs_rq, stru
if (curr && curr->on_rq)
load += avg_vruntime_weight(cfs_rq, curr->load.weight);
- lag *= load + avg_vruntime_weight(cfs_rq, se->load.weight);
+ weight = avg_vruntime_weight(cfs_rq, se->load.weight);
+ lag *= load + weight;
if (WARN_ON_ONCE(!load))
load = 1;
lag = div64_long(lag, load);
+
+ /*
+ * A heavy entity (relative to the tree) will pull the
+ * avg_vruntime close to its vruntime position on enqueue. But
+ * the zero_vruntime point is only updated at the next
+ * update_deadline()/place_entity()/update_entity_lag().
+ *
+ * Specifically (see the comment near avg_vruntime_weight()):
+ *
+ * sum_w_vruntime = \Sum (v_i - v0) * w_i
+ *
+ * Note that if v0 is near a light entity, both terms will be
+ * small for the light entity, while in that case both terms
+ * are large for the heavy entity, leading to risk of
+ * overflow.
+ *
+ * OTOH if v0 is near the heavy entity, then the difference is
+ * larger for the light entity, but the factor is small, while
+ * for the heavy entity the difference is small but the factor
+ * is large, thereby avoiding the multiplication overflow.
+ */
+ if (weight > load)
+ update_zero = true;
}
se->vruntime = vruntime - lag;
+ if (update_zero)
+ update_zero_vruntime(cfs_rq, -lag);
+
if (sched_feat(PLACE_REL_DEADLINE) && se->rel_deadline) {
se->deadline += se->vruntime;
se->rel_deadline = 0;