public inbox for linux-kernel@vger.kernel.org
From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Christian Loehle <christian.loehle@arm.com>
Subject: [patch 2/2] sched/idle: Make default_idle_call() NOHZ aware
Date: Sun, 01 Mar 2026 20:30:51 +0100
Message-ID: <20260301192915.171574741@kernel.org>
In-Reply-To: 20260301191959.406218221@kernel.org

Guests fall back to default_idle_call() as there is no cpuidle driver
available to them by default. That causes a problem in fully loaded
scenarios where CPUs go briefly idle for a couple of microseconds:

tick_nohz_idle_stop_tick() is invoked unconditionally, which means that unless
there is a timer pending within the next tick, the tick is stopped and then
restarted a couple of microseconds later when the idle condition goes away.
That requires programming the clockevent device twice, which implies a VM
exit for each reprogramming.
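
As a rough illustration of that cost, here is a toy model (hypothetical
names, not kernel code). It assumes exactly one clockevent reprogramming to
stop the tick and one to restart it, and that each reprogramming costs one
VM exit in a guest, so N brief idle periods with no timer due in the next
tick cost 2N reprogramming operations:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model, not kernel code: counts clockevent reprogramming operations
 * when the tick is stopped on every idle entry. Assumes one reprogramming
 * to stop the tick and one to restart it; in a guest each one is assumed
 * to trap to the host (one VM exit).
 */
struct clockevent_model {
	unsigned long reprograms;	/* == VM exits in a guest (assumed) */
	bool tick_stopped;
};

static void model_program(struct clockevent_model *ce)
{
	ce->reprograms++;
}

/* One brief idle period with the unconditional tick stop. */
static void model_idle_unconditional(struct clockevent_model *ce,
				     bool timer_due_next_tick)
{
	/* A timer due within the next tick keeps the tick running. */
	if (!timer_due_next_tick) {
		model_program(ce);		/* stop the tick */
		ce->tick_stopped = true;
	}

	/* ... a few microseconds later the CPU has work again ... */

	if (ce->tick_stopped) {
		model_program(ce);		/* restart the tick */
		ce->tick_stopped = false;
	}
	assert(!ce->tick_stopped);
}

/* Run n brief idle periods and return the number of reprogramming events. */
static unsigned long model_brief_idles(unsigned int n)
{
	struct clockevent_model ce = { 0 };

	for (unsigned int i = 0; i < n; i++)
		model_idle_unconditional(&ce, false);
	return ce.reprograms;
}
```

For example, model_brief_idles(1000) returns 2000 in this model.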

It was suggested to remove the tick_nohz_idle_stop_tick() invocation from
the default idle code, but that would be counterproductive. It would prevent
the host from going into deeper idle states when the guest CPU is fully
idle, as the guest would have to maintain the periodic tick.

Cure this by implementing a trivial moving average filter which keeps track
of the recent idle residency time and only stops the tick when the average
is larger than a tick period.
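
A minimal userspace sketch of such a filter, assuming a 4-entry ring with a
running sum and HZ=1000 (HYPOTHETICAL_TICK_NSEC stands in for the kernel's
TICK_NSEC and is not a kernel symbol). Comparing the sum against
TICK_NSEC * entries is equivalent to comparing the average against one tick
and avoids a division on the idle path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch, not the kernel code. HYPOTHETICAL_TICK_NSEC assumes
 * HZ=1000 (1 ms tick); the kernel derives TICK_NSEC from the configured HZ.
 */
#define HYPOTHETICAL_TICK_NSEC	1000000ULL
#define IDLE_DUR_ENTRIES	4	/* must be a power of two */
#define IDLE_DUR_MASK		(IDLE_DUR_ENTRIES - 1)

_Static_assert((IDLE_DUR_ENTRIES & IDLE_DUR_MASK) == 0,
	       "IDLE_DUR_ENTRIES must be a power of two");

struct idle_filter {
	uint64_t duration[IDLE_DUR_ENTRIES];	/* ring of recent idle times (ns) */
	uint64_t sum;				/* running sum of the ring */
	unsigned int idx;			/* next slot to overwrite */
};

/* average > one tick  <=>  sum > tick * IDLE_DUR_ENTRIES */
static bool idle_filter_should_stop_tick(const struct idle_filter *f)
{
	return f->sum > HYPOTHETICAL_TICK_NSEC * IDLE_DUR_ENTRIES;
}

/* Replace the oldest sample with the new one in O(1). */
static void idle_filter_update(struct idle_filter *f, int64_t delta)
{
	unsigned int idx = f->idx;

	if (delta < 0)			/* clamp a bogus negative interval */
		delta = 0;
	assert(idx < IDLE_DUR_ENTRIES);
	/* Unsigned wraparound cancels out because sum >= duration[idx]. */
	f->sum += (uint64_t)delta - f->duration[idx];
	f->duration[idx] = (uint64_t)delta;
	f->idx = (idx + 1) & IDLE_DUR_MASK;
}
```

With four 5 us samples the sum is 20 us, far below the 4 ms threshold, so
the tick is kept. A single 10 ms idle period afterwards pushes the sum above
the threshold, so in this sketch one long idle period is enough to allow
stopping the tick again.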

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
---
 kernel/sched/idle.c |   65 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 57 insertions(+), 8 deletions(-)

--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -105,12 +105,7 @@ static inline void cond_tick_broadcast_e
 static inline void cond_tick_broadcast_exit(void) { }
 #endif /* !CONFIG_GENERIC_CLOCKEVENTS_BROADCAST_IDLE */
 
-/**
- * default_idle_call - Default CPU idle routine.
- *
- * To use when the cpuidle framework cannot be used.
- */
-static void __cpuidle default_idle_call(void)
+static void __cpuidle __default_idle_call(void)
 {
 	instrumentation_begin();
 	if (!current_clr_polling_and_test()) {
@@ -130,6 +125,61 @@ static void __cpuidle default_idle_call(
 	instrumentation_end();
 }
 
+#ifdef CONFIG_NO_HZ_COMMON
+
+/* Limit to 4 entries so it fits in a cache line */
+#define IDLE_DUR_ENTRIES	4
+#define IDLE_DUR_MASK		(IDLE_DUR_ENTRIES - 1)
+
+struct idle_nohz_data {
+	u64		duration[IDLE_DUR_ENTRIES];
+	u64		entry_time;
+	u64		sum;
+	unsigned int	idx;
+};
+
+static DEFINE_PER_CPU_ALIGNED(struct idle_nohz_data, nohz_data);
+
+/**
+ * default_idle_call - Default CPU idle routine.
+ *
+ * To use when the cpuidle framework cannot be used.
+ */
+static void default_idle_call(void)
+{
+	struct idle_nohz_data *nd = this_cpu_ptr(&nohz_data);
+	unsigned int idx = nd->idx;
+	s64 delta;
+
+	/*
+	 * If the CPU spends more than a tick on average in idle, try to stop
+	 * the tick.
+	 */
+	if (nd->sum > TICK_NSEC * IDLE_DUR_ENTRIES)
+		tick_nohz_idle_stop_tick();
+
+	__default_idle_call();
+
+	/*
+	 * Build a moving average of the time spent in idle to prevent stopping
+	 * the tick on a loaded system which only goes idle briefly.
+	 */
+	delta = max_t(s64, sched_clock() - nd->entry_time, 0);
+	nd->sum += delta - nd->duration[idx];
+	nd->duration[idx] = delta;
+	nd->idx = (idx + 1) & IDLE_DUR_MASK;
+}
+
+static void default_idle_enter(void)
+{
+	this_cpu_write(nohz_data.entry_time, sched_clock());
+}
+
+#else  /* CONFIG_NO_HZ_COMMON */
+static inline void default_idle_call(void) { __default_idle_call(); }
+static inline void default_idle_enter(void) { }
+#endif /* !CONFIG_NO_HZ_COMMON */
+
 static int call_cpuidle_s2idle(struct cpuidle_driver *drv,
 			       struct cpuidle_device *dev,
 			       u64 max_latency_ns)
@@ -186,8 +236,6 @@ static void cpuidle_idle_call(void)
 	}
 
 	if (cpuidle_not_available(drv, dev)) {
-		tick_nohz_idle_stop_tick();
-
 		default_idle_call();
 		goto exit_idle;
 	}
@@ -276,6 +324,7 @@ static void do_idle(void)
 
 	__current_set_polling();
 	tick_nohz_idle_enter();
+	default_idle_enter();
 
 	while (!need_resched()) {
 



Thread overview: 29+ messages
2026-03-01 19:30 [patch 0/2] sched/idle: Prevent pointless NOHZ transitions in default_idle_call() Thomas Gleixner
2026-03-01 19:30 ` [patch 1/2] sched/idle: Make default_idle_call() static Thomas Gleixner
2026-03-01 19:30 ` Thomas Gleixner [this message]
2026-03-02  6:05   ` [patch 2/2] sched/idle: Make default_idle_call() NOHZ aware K Prateek Nayak
2026-03-02 10:43   ` Frederic Weisbecker
2026-03-02 11:03     ` Christian Loehle
2026-03-02 11:11       ` Frederic Weisbecker
2026-03-02 11:39         ` Christian Loehle
2026-03-04  3:35           ` Qais Yousef
2026-03-02 11:03   ` Christian Loehle
2026-03-02 21:25     ` Rafael J. Wysocki
2026-03-04  3:03       ` Qais Yousef
2026-03-06 21:21         ` Rafael J. Wysocki
2026-03-06 21:31           ` Rafael J. Wysocki
2026-03-07 16:25             ` Rafael J. Wysocki
2026-03-10  3:54               ` Qais Yousef
2026-03-10  9:18                 ` Christian Loehle
2026-03-10 15:03                   ` Qais Yousef
2026-03-10 15:09                     ` Rafael J. Wysocki
2026-03-10 15:14                       ` Qais Yousef
2026-03-07 16:12         ` [PATCH v1] sched: idle: Make skipping governor callbacks more consistent Rafael J. Wysocki
2026-03-09  9:13           ` Christian Loehle
2026-03-09 12:26             ` Rafael J. Wysocki
2026-03-10  3:57               ` Qais Yousef
2026-03-09 12:44           ` Aboorva Devarajan
2026-03-10 14:28           ` Frederic Weisbecker
2026-03-02 12:17   ` [patch 2/2] sched/idle: Make default_idle_call() NOHZ aware Peter Zijlstra
2026-03-02 12:19     ` Peter Zijlstra
2026-03-02 21:23       ` Rafael J. Wysocki
