* [RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state
From: Shubhang Kaushik @ 2026-02-04  0:49 UTC
  To: Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar,
	Thomas Gleixner, Vincent Guittot, Valentin Schneider
  Cc: dietmar.eggemann, bsegall, mgorman, rostedt, Shubhang Kaushik,
	Christoph Lameter, linux-kernel, Shubhang Kaushik, Adam Li

Under CONFIG_NO_HZ_FULL, the scheduler tick can get stopped earlier via
tick_nohz_full_stop_tick() before the CPU subsequently enters the idle
path. In this case, tick_nohz_idle_stop_tick() observes TS_FLAG_STOPPED
already set and skips nohz_balance_enter_idle() because the !was_stopped
condition assumes tick-stop and idle-entry are coupled.
This leaves a tickless idle CPU absent from nohz.idle_cpus_mask, making
it invisible to NOHZ idle load balancing while periodic balancing is
also suppressed.

The patch fixes this by decoupling tick-stop transition accounting from
scheduler bookkeeping. idle_jiffies remains updated only on the
tick-stop transition, while nohz_balance_enter_idle() is invoked
whenever a CPU enters idle with the tick already stopped, relying on its
existing idempotent guard to avoid duplicate registration.

Tested on Ampere Altra on 6.19.0-rc8 with CONFIG_NO_HZ_FULL enabled:
- This change improves load distribution by ensuring that tickless idle
  CPUs are visible to NOHZ idle load balancing. In llama-batched-bench,
  throughput improves by up to ~14% across multiple thread counts.
- Hackbench single-process results improve by 5% and multi-process
  results improve by up to ~26%, consistent with reduced scheduler
  jitter and earlier utilization of fully idle cores.
  No regressions observed.

Signed-off-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
Signed-off-by: Adam Li <adamli@os.amperecomputing.com>
Reviewed-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Reviewed-by: Shubhang Kaushik <shubhang@os.amperecomputing.com>
---
This is a resend of the original patch to ensure visibility.
Previous resend: https://lkml.org/lkml/2025/8/21/170
Original thread: https://lkml.org/lkml/2025/8/21/171

The patch addresses a performance regression in NOHZ idle load balancing 
observed under CONFIG_NO_HZ_FULL, where idle CPUs were becoming 
invisible to the balancer.
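
For reviewers who want to see the state transition in isolation, here is a
minimal userspace sketch of the condition change described in the commit
message. It is not kernel code: struct cpu_model and every model_*() helper
are hypothetical stand-ins introduced only for illustration. tick_stopped
models TS_FLAG_STOPPED, in_idle_mask models membership in
nohz.idle_cpus_mask, and the early return in model_balance_enter_idle() is
an assumed simplification of the idempotent guard mentioned above.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state involved; not the kernel's types. */
struct cpu_model {
	bool tick_stopped;          /* models TS_FLAG_STOPPED */
	bool in_idle_mask;          /* models presence in nohz.idle_cpus_mask */
	unsigned long idle_jiffies; /* snapshot taken on the tick-stop transition */
	unsigned long last_jiffies;
};

/* Models nohz_balance_enter_idle(): registering twice is harmless. */
static void model_balance_enter_idle(struct cpu_model *c)
{
	if (c->in_idle_mask)
		return;
	c->in_idle_mask = true;
}

/* Models the tick being stopped by the nohz_full path before idle entry. */
static void model_full_stop_tick(struct cpu_model *c)
{
	c->tick_stopped = true;
}

/* Old condition: registration is tied to the tick-stop transition. */
static void model_idle_stop_tick_old(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;
	if (!was_stopped && c->tick_stopped) {
		c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}

/* New condition: only the idle_jiffies update depends on the transition. */
static void model_idle_stop_tick_new(struct cpu_model *c)
{
	bool was_stopped = c->tick_stopped;

	c->tick_stopped = true;
	if (c->tick_stopped) {
		if (!was_stopped)
			c->idle_jiffies = c->last_jiffies;
		model_balance_enter_idle(c);
	}
}
```

In this model, calling model_full_stop_tick() followed by
model_idle_stop_tick_old() leaves in_idle_mask false, which corresponds to
the invisible idle CPU. The same sequence with model_idle_stop_tick_new()
sets in_idle_mask while leaving idle_jiffies untouched.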
---
 kernel/time/tick-sched.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 2f8a7923fa279409ffe950f770ff2eac868f6ece..eee6fcebe78c2f8d93464a55fe332e12fe9c164e 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1250,8 +1250,9 @@ void tick_nohz_idle_stop_tick(void)
 		ts->idle_sleeps++;
 		ts->idle_expires = expires;
 
-		if (!was_stopped && tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
-			ts->idle_jiffies = ts->last_jiffies;
+		if (tick_sched_flag_test(ts, TS_FLAG_STOPPED)) {
+			if (!was_stopped)
+				ts->idle_jiffies = ts->last_jiffies;
 			nohz_balance_enter_idle(cpu);
 		}
 	} else {

---
base-commit: 18f7fcd5e69a04df57b563360b88be72471d6b62
change-id: 20260203-fix-nohz-idle-b2838276cb91

Best regards,
-- 
Shubhang Kaushik <shubhang@os.amperecomputing.com>

