From: Peter Zijlstra <peterz@infradead.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Zhan Xusheng <zhanxusheng1024@gmail.com>,
Ingo Molnar <mingo@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Juri Lelli <juri.lelli@redhat.com>,
linux-kernel@vger.kernel.org,
Zhan Xusheng <zhanxusheng@xiaomi.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Ben Segall <bsegall@google.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mel Gorman <mgorman@suse.de>,
hca@linux.ibm.com, gor@linux.ibm.com, agordeev@linux.ibm.com,
borntraeger@linux.ibm.com, svens@linux.ibm.com,
maddy@linux.ibm.com, mpe@ellerman.id.au, chleroy@kernel.org
Subject: Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
Date: Mon, 4 May 2026 13:22:39 +0200 [thread overview]
Message-ID: <20260504112239.GA1174357@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <20260501104006.GA3102624@noisy.programming.kicks-ass.net>
On Fri, May 01, 2026 at 12:40:06PM +0200, Peter Zijlstra wrote:
> Anyway, I had a poke around with godbolt, and the below seems to
> generate the best code for things like x86_64 and arm64.
>
> Specifically, the __builtin_mul_overflow() already has to compute the
> 128 bit product anyway for most architectures, so using that directly
> then leads to saner asm and easier to understand code.
>
> AFAICT HPPA64 is the only 64bit architecture that doesn't implement
> __int128 and will thus be demoted to doing what we do on 32bit.
I forgot we had ARCH_SUPPORTS_INT128, and I suppose this had better
check that. Now, s390 is a bit weird and excludes GCC even though that
definitely supports __int128. Supposedly there was a issue, but perhaps
modern GCC has this fixed?
Also, PowerPC does not seem to set this at all.
Anyway, I've then ended up with this, it keeps all 64bit architectures
on the extra precision, but incurs an extra division for all that do not
have __int128 selected.
---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 728965851842..01405f6011b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,11 +882,11 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
*
* lag_i >= 0 -> V >= v_i
*
- * \Sum (v_i - v)*w_i
- * V = ------------------ + v
+ * \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
* \Sum w_i
*
- * lag_i >= 0 -> \Sum (v_i - v)*w_i >= (v_i - v)*(\Sum w_i)
+ * lag_i >= 0 -> \Sum (v_i - v0)*w_i >= (v_i - v0)*(\Sum w_i)
*
* Note: using 'avg_vruntime() > se->vruntime' is inaccurate due
* to the loss in precision caused by the division.
@@ -894,7 +894,7 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
{
struct sched_entity *curr = cfs_rq->curr;
- s64 avg = cfs_rq->sum_w_vruntime;
+ s64 key, avg = cfs_rq->sum_w_vruntime;
long load = cfs_rq->sum_weight;
if (curr && curr->on_rq) {
@@ -904,7 +904,35 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
load += weight;
}
- return avg >= vruntime_op(vruntime, "-", cfs_rq->zero_vruntime) * load;
+ key = vruntime_op(vruntime, "-", cfs_rq->zero_vruntime);
+
+ /*
+ * The worst case term for @key includes 'NSEC_TICK * NICE_0_LOAD'
+ * and @load obviously includes NICE_0_LOAD. NSEC_TICK is around 24
+ * bits, while NICE_0_LOAD is 20 on 64bit and 10 otherwise.
+ *
+ * This gives that on 64bit the product will be at least 64bit which
+ * overflows s64, while on 32bit it will only be 44bits and should fit
+ * comfortably.
+ */
+#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+ /* An __int128 mult should be cheaper than a division. */
+ return avg >= (__int128)key * load;
+#else
+ /*
+ * Since the divisor is @load, which is guaranteed positive, the
+ * inequality: avg >= key * load, can be rewritten into a division
+ * like: avg/load > key || (avg/load == key && avg%load >= 0).
+ */
+ s64 div = avg / load;
+ if (div > key)
+ return true;
+ return div == key && avg % load >= 0;
+#endif
+#else /* 32bit */
+ return avg >= key * load;
+#endif
}
int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
next prev parent reply other threads:[~2026-05-04 11:22 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-15 14:57 [PATCH] sched/fair: Fix overflow in vruntime_eligible() Zhan Xusheng
2026-04-28 14:49 ` [PATCH RESEND] " Zhan Xusheng
2026-04-28 16:17 ` K Prateek Nayak
2026-04-28 17:35 ` Peter Zijlstra
2026-04-29 5:03 ` K Prateek Nayak
2026-04-29 7:31 ` Zhan Xusheng
2026-05-01 10:40 ` Peter Zijlstra
2026-05-01 10:48 ` Peter Zijlstra
2026-05-01 13:05 ` David Laight
2026-05-01 13:43 ` Peter Zijlstra
2026-05-04 11:22 ` Peter Zijlstra [this message]
2026-05-04 13:16 ` Heiko Carstens
2026-05-04 14:57 ` Peter Zijlstra
2026-05-04 17:40 ` Stefan Schulze Frielinghaus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260504112239.GA1174357@noisy.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=agordeev@linux.ibm.com \
--cc=borntraeger@linux.ibm.com \
--cc=bsegall@google.com \
--cc=chleroy@kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=gor@linux.ibm.com \
--cc=hca@linux.ibm.com \
--cc=juri.lelli@redhat.com \
--cc=kprateek.nayak@amd.com \
--cc=linux-kernel@vger.kernel.org \
--cc=maddy@linux.ibm.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=mpe@ellerman.id.au \
--cc=rostedt@goodmis.org \
--cc=svens@linux.ibm.com \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=zhanxusheng1024@gmail.com \
--cc=zhanxusheng@xiaomi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox