From: Peter Zijlstra <peterz@infradead.org>
To: Mario Roy <marioeroy@gmail.com>
Cc: Chris Mason <clm@meta.com>,
Joseph Salisbury <joseph.salisbury@oracle.com>,
Adam Li <adamli@os.amperecomputing.com>,
Hazem Mohamed Abuelfotoh <abuehaze@amazon.com>,
Josh Don <joshdon@google.com>,
mingo@redhat.com, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
vschneid@redhat.com, linux-kernel@vger.kernel.org,
kprateek.nayak@amd.com
Subject: Re: [PATCH 4/4] sched/fair: Proportional newidle balance
Date: Tue, 27 Jan 2026 16:17:48 +0100
Message-ID: <20260127151748.GA1079264@noisy.programming.kicks-ass.net>
In-Reply-To: <20260127104041.GD217302@noisy.programming.kicks-ass.net>

On Tue, Jan 27, 2026 at 11:40:41AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 23, 2026 at 12:03:06PM +0100, Peter Zijlstra wrote:
> > On Fri, Jan 23, 2026 at 11:50:46AM +0100, Peter Zijlstra wrote:
> > > On Sun, Jan 18, 2026 at 03:46:22PM -0500, Mario Roy wrote:
> > > > The patch "Proportional newidle balance" introduced a regression
> > > > with Linux 6.12.65 and 6.18.5. There is noticeable regression with
> > > > easyWave testing. [1]
> > > >
> > > > The CPU is AMD Threadripper 9960X CPU (24/48). I followed the source
> > > > to install easyWave [2]. That is fetching the two tar.gz archives.
> > >
> > > What is the actual configuration of that chip? Is it like 3*8 or 4*6
> > > (CCX wise). A quick google couldn't find me the answer :/
> >
> > Obviously I found it right after sending this. It's a 4x6 config.
> > Meaning it needs newidle to balance between those 4 domains.
>
> So with the below patch on top of my Xeon w7-2495X (which is 24-core
> 48-thread) I too have 4 LLC :-)
>
> And I think I can see a slight difference, but nowhere near as terrible.
>
> Let me go stick some tracing on.

Does this help some?

Turns out, this easywave thing has a very low newidle rate, but then
also a fairly low success rate. But since it doesn't do it that often,
the cost isn't that significant so we might as well always do it etc..

This adds a second term to the ratio computation that takes time into
account. For low rate newidle this term will dominate, while for higher
rate the success ratio is more important.

Chris, afaict this still DTRT for schbench, but if this works for Mario,
could you also re-run things at your end?

[ the 4 'second' thing is a bit random, but looking at the timings
  between easywave and schbench this seems to be a reasonable middle
  ground. Although I think 8 'seconds' -- 23 shift -- would also work.

  That would give:

    1024 -   8 s -   64 Hz
     512 -   4 s -  128 Hz
     256 -   2 s -  256 Hz
     128 -   1 s -  512 Hz
      64 -  .5 s - 1024 Hz
      32 - .25 s - 2048 Hz
]
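
For readers who want to check the numbers in these tables, here is a minimal userspace sketch of the arithmetic, not the kernel code itself. The helper names `newidle_ratio()` and `newidle_rate_hz()` are hypothetical. It assumes `delta` is in nanoseconds (as returned by `sched_clock()`), that a quoted 'second' is 2^30 ns, and that roughly 512 calls separate two updates because `newidle_call` is halved from 1024 at each update. That last assumption is an inference from the tables, not something the mail states.

```c
#include <stdint.h>

#define NI_RATIO_MAX	1024
#define NI_CALLS_WINDOW	512	/* assumed: calls between two ratio updates */

/*
 * Hypothetical helper: time term (delta >> shift) plus the success count,
 * clamped to 1024, following the arithmetic in the patch below.
 */
static int newidle_ratio(int64_t delta_ns, unsigned int success,
			 unsigned int shift)
{
	int64_t ratio;

	if (delta_ns < 0)
		delta_ns = 0;

	ratio = (delta_ns >> shift) + success;

	return ratio > NI_RATIO_MAX ? NI_RATIO_MAX : (int)ratio;
}

/*
 * Hypothetical helper: approximate newidle rate for a window of delta_ns,
 * counting a 'second' as 2^30 ns like the tables do.
 */
static int64_t newidle_rate_hz(int64_t delta_ns)
{
	if (delta_ns <= 0)
		return 0;

	return ((int64_t)NI_CALLS_WINDOW << 30) / delta_ns;
}
```

With shift 22, a window of 2^32 ns (about 4 s) saturates the ratio at 1024 by itself and corresponds to about 128 Hz; with shift 23 the same window contributes only 512.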
---
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index 45c0022b91ce..a1e1032426dc 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -95,6 +95,7 @@ struct sched_domain {
unsigned int newidle_call;
unsigned int newidle_success;
unsigned int newidle_ratio;
+ u64 newidle_stamp;
u64 max_newidle_lb_cost;
unsigned long last_decay_max_lb_cost;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index eca642295c4b..ab9cf06c6a76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -12224,8 +12224,31 @@ static inline void update_newidle_stats(struct sched_domain *sd, unsigned int su
sd->newidle_call++;
sd->newidle_success += success;
if (sd->newidle_call >= 1024) {
- sd->newidle_ratio = sd->newidle_success;
+ u64 now = sched_clock();
+ s64 delta = now - sd->newidle_stamp;
+ sd->newidle_stamp = now;
+ int ratio = 0;
+
+ if (delta < 0)
+ delta = 0;
+
+ if (sched_feat(NI_RATE)) {
+ /*
+ * ratio delta freq
+ *
+ * 1024 - 4 s - 128 Hz
+ * 512 - 2 s - 256 Hz
+ * 256 - 1 s - 512 Hz
+ * 128 - .5 s - 1024 Hz
+ * 64 - .25 s - 2048 Hz
+ */
+ ratio = delta >> 22;
+ }
+
+ ratio += sd->newidle_success;
+
+ sd->newidle_ratio = min(1024, ratio);
sd->newidle_call /= 2;
sd->newidle_success /= 2;
}
@@ -12932,7 +12959,7 @@ static int sched_balance_newidle(struct rq *this_rq, struct rq_flags *rf)
if (sd->flags & SD_BALANCE_NEWIDLE) {
unsigned int weight = 1;
- if (sched_feat(NI_RANDOM)) {
+ if (sched_feat(NI_RANDOM) && sd->newidle_ratio < 1024) {
/*
* Throw a 1k sided dice; and only run
* newidle_balance according to the success
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 980d92bab8ab..7aba7523c6c1 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -126,3 +126,4 @@ SCHED_FEAT(LATENCY_WARN, false)
* Do newidle balancing proportional to its success rate using randomization.
*/
SCHED_FEAT(NI_RANDOM, true)
+SCHED_FEAT(NI_RATE, true)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index cf643a5ddedd..05741f18f334 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -4,6 +4,7 @@
*/
#include <linux/sched/isolation.h>
+#include <linux/sched/clock.h>
#include <linux/bsearch.h>
#include "sched.h"
@@ -1637,6 +1638,7 @@ sd_init(struct sched_domain_topology_level *tl,
struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu);
int sd_id, sd_weight, sd_flags = 0;
struct cpumask *sd_span;
+ u64 now = sched_clock();
sd_weight = cpumask_weight(tl->mask(tl, cpu));
@@ -1674,6 +1676,7 @@ sd_init(struct sched_domain_topology_level *tl,
.newidle_call = 512,
.newidle_success = 256,
.newidle_ratio = 512,
+ .newidle_stamp = now,
.max_newidle_lb_cost = 0,
.last_decay_max_lb_cost = jiffies,
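
As a rough illustration of how the clamped ratio interacts with the dice throw in sched_balance_newidle(), the sketch below models the gate in userspace. The function name and the exact comparison are assumptions made for illustration; the hunk above only shows the added `sd->newidle_ratio < 1024` check, not the full body of the NI_RANDOM branch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the gate. d1k is a roll in [0, 1023].
 * A ratio of 1024 (or NI_RANDOM disabled) skips the dice and always
 * balances; otherwise balance with probability of roughly ratio/1024.
 * The real comparison in fair.c may differ in detail.
 */
static bool newidle_should_balance(unsigned int ratio, uint32_t d1k,
				   bool ni_random)
{
	if (!ni_random || ratio >= 1024)
		return true;

	return d1k < ratio;
}
```

Under this model, a domain whose time term alone reaches 1024 (a low newidle rate) never pays for the dice throw and always attempts the balance, which is the behaviour described for easywave.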