From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 4 May 2026 13:22:39 +0200
From: Peter Zijlstra
To: K Prateek Nayak
Cc: Zhan Xusheng, Ingo Molnar, Vincent Guittot, Juri Lelli,
	linux-kernel@vger.kernel.org, Dietmar Eggemann, Valentin Schneider,
	Ben Segall, Steven Rostedt, Mel Gorman, hca@linux.ibm.com,
	gor@linux.ibm.com, agordeev@linux.ibm.com, borntraeger@linux.ibm.com,
	svens@linux.ibm.com, maddy@linux.ibm.com, mpe@ellerman.id.au,
	chleroy@kernel.org
Subject: Re: [PATCH RESEND] sched/fair: Fix overflow in vruntime_eligible()
Message-ID: <20260504112239.GA1174357@noisy.programming.kicks-ass.net>
References: <20260415145742.10359-1-zhanxusheng@xiaomi.com>
 <20260428144951.121765-1-zhanxusheng@xiaomi.com>
 <57e94d7e-b8fb-49bc-a914-8a7401e43af3@amd.com>
 <20260428173521.GK3126523@noisy.programming.kicks-ass.net>
 <4870ea19-78d4-44b2-9f18-14c3f8782726@amd.com>
 <20260501104006.GA3102624@noisy.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20260501104006.GA3102624@noisy.programming.kicks-ass.net>

On Fri, May 01, 2026 at 12:40:06PM +0200, Peter
Zijlstra wrote:
> Anyway, I had a poke around with godbolt, and the below seems to
> generate the best code for things like x86_64 and arm64.
>
> Specifically, the __builtin_mul_overflow() already has to compute the
> 128 bit product anyway for most architectures, so using that directly
> then leads to saner asm and easier to understand code.
>
> AFAICT HPPA64 is the only 64bit architecture that doesn't implement
> __int128 and will thus be demoted to doing what we do on 32bit.

I forgot we had ARCH_SUPPORTS_INT128, and I suppose this had better
check that.

Now, s390 is a bit weird and excludes GCC even though that definitely
supports __int128. Supposedly there was an issue, but perhaps modern GCC
has this fixed? Also, PowerPC does not seem to set this at all.

Anyway, I've then ended up with this; it keeps all 64bit architectures
on the extra precision, but incurs an extra division for all that do not
have __int128 selected.

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 728965851842..01405f6011b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -882,11 +882,11 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
  *
  * lag_i >= 0 -> V >= v_i
  *
- *     \Sum (v_i - v)*w_i
- * V = ------------------ + v
+ *     \Sum (v_i - v0)*w_i
+ * V = ------------------- + v0
  *          \Sum w_i
  *
- * lag_i >= 0 -> \Sum (v_i - v)*w_i >= (v_i - v)*(\Sum w_i)
+ * lag_i >= 0 -> \Sum (v_i - v0)*w_i >= (v_i - v0)*(\Sum w_i)
  *
  * Note: using 'avg_vruntime() > se->vruntime' is inaccurate due
  * to the loss in precision caused by the division.
@@ -894,7 +894,7 @@ bool update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
 {
 	struct sched_entity *curr = cfs_rq->curr;
-	s64 avg = cfs_rq->sum_w_vruntime;
+	s64 key, avg = cfs_rq->sum_w_vruntime;
 	long load = cfs_rq->sum_weight;
 
 	if (curr && curr->on_rq) {
@@ -904,7 +904,35 @@ static int vruntime_eligible(struct cfs_rq *cfs_rq, u64 vruntime)
 		load += weight;
 	}
 
-	return avg >= vruntime_op(vruntime, "-", cfs_rq->zero_vruntime) * load;
+	key = vruntime_op(vruntime, "-", cfs_rq->zero_vruntime);
+
+	/*
+	 * The worst case term for @key includes 'NSEC_TICK * NICE_0_LOAD'
+	 * and @load obviously includes NICE_0_LOAD. NSEC_TICK is around 24
+	 * bits, while NICE_0_LOAD is 20 on 64bit and 10 otherwise.
+	 *
+	 * This gives that on 64bit the product will be at least 64bit which
+	 * overflows s64, while on 32bit it will only be 44bits and should fit
+	 * comfortably.
+	 */
+#ifdef CONFIG_64BIT
+#ifdef CONFIG_ARCH_SUPPORTS_INT128
+	/* An __int128 mult should be cheaper than a division. */
+	return avg >= (__int128)key * load;
+#else
+	/*
+	 * Since the divisor is @load, which is guaranteed positive, the
+	 * inequality: avg >= key * load, can be rewritten into a division
+	 * like: avg/load > key || (avg/load == key && avg%load >= 0).
+	 */
+	s64 div = avg / load;
+	if (div > key)
+		return true;
+	return div == key && avg % load >= 0;
+#endif
+#else /* 32bit */
+	return avg >= key * load;
+#endif
 }
 
 int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)