public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>,
	John Stultz <jstultz@google.com>, Stephen Boyd <sboyd@kernel.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	x86@kernel.org, Peter Zijlstra <peterz@infradead.org>,
	Frederic Weisbecker <frederic@kernel.org>,
	Eric Dumazet <edumazet@google.com>
Subject: [patch 28/48] hrtimer: Optimize for local timers
Date: Tue, 24 Feb 2026 17:37:28 +0100	[thread overview]
Message-ID: <20260224163430.607935269@kernel.org> (raw)
In-Reply-To: 20260224163022.795809588@kernel.org

The decision whether to keep timers on the local CPU or on the CPU they are
associated to is suboptimal and causes the expensive switch_hrtimer_base()
mechanism to be invoked more than necessary. This is especially true for
pinned timers.

Rewrite the decision logic so that the current base is kept if:

   1) The callback is running on the base

   2) The timer is associated to the local CPU and the first expiring timer as
      that allows to optimize for reprogramming avoidance

   3) The timer is associated to the local CPU and pinned

   4) The timer is associated to the local CPU and timer migration is
      disabled.

Only #2 was covered by the original code, but especially #3 makes a
difference for high frequency rearming timers like the scheduler hrtick
timer. If timer migration is disabled, then #4 avoids most of the base
switches.

Signed-off-by: Thomas Gleixner <tglx@kernel.org
---
 kernel/time/hrtimer.c |  101 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 36 deletions(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1147,7 +1147,7 @@ static void __remove_hrtimer(struct hrti
 }
 
 static inline bool remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base,
-				 bool restart, bool keep_local)
+				 bool restart, bool keep_base)
 {
 	bool queued_state = timer->is_queued;
 
@@ -1177,7 +1177,7 @@ static inline bool remove_hrtimer(struct
 		if (!restart)
 			queued_state = HRTIMER_STATE_INACTIVE;
 		else
-			reprogram &= !keep_local;
+			reprogram &= !keep_base;
 
 		__remove_hrtimer(timer, base, queued_state, reprogram);
 		return true;
@@ -1220,29 +1220,57 @@ static void hrtimer_update_softirq_timer
 	hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram);
 }
 
+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+static __always_inline bool hrtimer_prefer_local(bool is_local, bool is_first, bool is_pinned)
+{
+	if (static_branch_likely(&timers_migration_enabled)) {
+		/*
+		 * If it is local and the first expiring timer keep it on the local
+		 * CPU to optimize reprogramming of the clockevent device. Also
+		 * avoid switch_hrtimer_base() overhead when local and pinned.
+		 */
+		if (!is_local)
+			return false;
+		return is_first || is_pinned;
+	}
+	return is_local;
+}
+#else
+static __always_inline bool hrtimer_prefer_local(bool is_local, bool is_first, bool is_pinned)
+{
+	return is_local;
+}
+#endif
+
+static inline bool hrtimer_keep_base(struct hrtimer *timer, bool is_local, bool is_first,
+				     bool is_pinned)
+{
+	/* If the timer is running the callback it has to stay on its CPU base. */
+	if (unlikely(timer->base->running == timer))
+		return true;
+
+	return hrtimer_prefer_local(is_local, is_first, is_pinned);
+}
+
 static bool __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, u64 delta_ns,
 				     const enum hrtimer_mode mode, struct hrtimer_clock_base *base)
 {
 	struct hrtimer_cpu_base *this_cpu_base = this_cpu_ptr(&hrtimer_bases);
-	struct hrtimer_clock_base *new_base;
-	bool force_local, first, was_armed;
+	bool is_pinned, first, was_first, was_armed, keep_base = false;
+	struct hrtimer_cpu_base *cpu_base = base->cpu_base;
 
-	/*
-	 * If the timer is on the local cpu base and is the first expiring
-	 * timer then this might end up reprogramming the hardware twice
-	 * (on removal and on enqueue). To avoid that prevent the reprogram
-	 * on removal, keep the timer local to the current CPU and enforce
-	 * reprogramming after it is queued no matter whether it is the new
-	 * first expiring timer again or not.
-	 */
-	force_local = base->cpu_base == this_cpu_base;
-	force_local &= base->cpu_base->next_timer == timer;
+	was_first = cpu_base->next_timer == timer;
+	is_pinned = !!(mode & HRTIMER_MODE_PINNED);
 
 	/*
-	 * Don't force local queuing if this enqueue happens on a unplugged
-	 * CPU after hrtimer_cpu_dying() has been invoked.
+	 * Don't keep it local if this enqueue happens on a unplugged CPU
+	 * after hrtimer_cpu_dying() has been invoked.
 	 */
-	force_local &= this_cpu_base->online;
+	if (likely(this_cpu_base->online)) {
+		bool is_local = cpu_base == this_cpu_base;
+
+		keep_base = hrtimer_keep_base(timer, is_local, was_first, is_pinned);
+	}
 
 	/*
 	 * Remove an active timer from the queue. In case it is not queued
@@ -1254,8 +1282,11 @@ static bool __hrtimer_start_range_ns(str
 	 * reprogramming later if it was the first expiring timer.  This
 	 * avoids programming the underlying clock event twice (once at
 	 * removal and once after enqueue).
+	 *
+	 * @keep_base is also true if the timer callback is running on a
+	 * remote CPU and for local pinned timers.
 	 */
-	was_armed = remove_hrtimer(timer, base, true, force_local);
+	was_armed = remove_hrtimer(timer, base, true, keep_base);
 
 	if (mode & HRTIMER_MODE_REL)
 		tim = ktime_add_safe(tim, __hrtimer_cb_get_time(base->clockid));
@@ -1265,21 +1296,21 @@ static bool __hrtimer_start_range_ns(str
 	hrtimer_set_expires_range_ns(timer, tim, delta_ns);
 
 	/* Switch the timer base, if necessary: */
-	if (!force_local)
-		new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
-	else
-		new_base = base;
+	if (!keep_base) {
+		base = switch_hrtimer_base(timer, base, is_pinned);
+		cpu_base = base->cpu_base;
+	}
 
-	first = enqueue_hrtimer(timer, new_base, mode, was_armed);
+	first = enqueue_hrtimer(timer, base, mode, was_armed);
 
 	/*
 	 * If the hrtimer interrupt is running, then it will reevaluate the
 	 * clock bases and reprogram the clock event device.
 	 */
-	if (new_base->cpu_base->in_hrtirq)
+	if (cpu_base->in_hrtirq)
 		return false;
 
-	if (!force_local) {
+	if (!was_first || cpu_base != this_cpu_base) {
 		/*
 		 * If the current CPU base is online, then the timer is never
 		 * queued on a remote CPU if it would be the first expiring
@@ -1288,7 +1319,7 @@ static bool __hrtimer_start_range_ns(str
 		 * re-evaluate the first expiring timer after completing the
 		 * callbacks.
 		 */
-		if (hrtimer_base_is_online(this_cpu_base))
+		if (likely(hrtimer_base_is_online(this_cpu_base)))
 			return first;
 
 		/*
@@ -1296,11 +1327,8 @@ static bool __hrtimer_start_range_ns(str
 		 * already offline. If the timer is the first to expire,
 		 * kick the remote CPU to reprogram the clock event.
 		 */
-		if (first) {
-			struct hrtimer_cpu_base *new_cpu_base = new_base->cpu_base;
-
-			smp_call_function_single_async(new_cpu_base->cpu, &new_cpu_base->csd);
-		}
+		if (first)
+			smp_call_function_single_async(cpu_base->cpu, &cpu_base->csd);
 		return false;
 	}
 
@@ -1314,16 +1342,17 @@ static bool __hrtimer_start_range_ns(str
 	 * required.
 	 */
 	if (timer->is_lazy) {
-		if (new_base->cpu_base->expires_next <= hrtimer_get_expires(timer))
+		if (cpu_base->expires_next <= hrtimer_get_expires(timer))
 			return false;
 	}
 
 	/*
-	 * Timer was forced to stay on the current CPU to avoid
-	 * reprogramming on removal and enqueue. Force reprogram the
-	 * hardware by evaluating the new first expiring timer.
+	 * Timer was the first expiring timer and forced to stay on the
+	 * current CPU to avoid reprogramming on removal and enqueue. Force
+	 * reprogram the hardware by evaluating the new first expiring
+	 * timer.
 	 */
-	hrtimer_force_reprogram(new_base->cpu_base, /* skip_equal */ true);
+	hrtimer_force_reprogram(cpu_base, /* skip_equal */ true);
 	return false;
 }
 


  parent reply	other threads:[~2026-02-24 16:37 UTC|newest]

Thread overview: 128+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-24 16:35 [patch 00/48] hrtimer,sched: General optimizations and hrtick enablement Thomas Gleixner
2026-02-24 16:35 ` [patch 01/48] sched/eevdf: Fix HRTICK duration Thomas Gleixner
2026-02-28 15:37   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-03-20 14:59     ` Shrikanth Hegde
2026-03-20 15:38       ` Peter Zijlstra
2026-03-20 15:40         ` Shrikanth Hegde
2026-02-24 16:35 ` [patch 02/48] sched/fair: Simplify hrtick_update() Thomas Gleixner
2026-02-28 15:37   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra (Intel)
2026-02-24 16:35 ` [patch 03/48] sched/fair: Make hrtick resched hard Thomas Gleixner
2026-02-28 15:37   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra (Intel)
2026-02-24 16:35 ` [patch 04/48] sched: Avoid ktime_get() indirection Thomas Gleixner
2026-02-28 15:37   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:35 ` [patch 05/48] hrtimer: Avoid pointless reprogramming in __hrtimer_start_range_ns() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:35 ` [patch 06/48] hrtimer: Provide a static branch based hrtimer_hres_enabled() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:35 ` [patch 07/48] sched: Use hrtimer_highres_enabled() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:35 ` [patch 08/48] sched: Optimize hrtimer handling Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:35 ` [patch 09/48] sched/hrtick: Avoid tiny hrtick rearms Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 10/48] hrtimer: Provide LAZY_REARM mode Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:36 ` [patch 11/48] sched/hrtick: Mark hrtick timer LAZY_REARM Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:36 ` [patch 12/48] tick/sched: Avoid hrtimer_cancel/start() sequence Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 13/48] clockevents: Remove redundant CLOCK_EVT_FEAT_KTIME Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 14/48] timekeeping: Allow inlining clocksource::read() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 15/48] x86: Inline TSC reads in timekeeping Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 16/48] x86/apic: Remove pointless fence in lapic_next_deadline() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 17/48] x86/apic: Avoid the PVOPS indirection for the TSC deadline timer Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 18/48] timekeeping: Provide infrastructure for coupled clockevents Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 19/48] clockevents: Provide support for clocksource coupled comparators Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-03-03 18:44   ` [patch 19/48] " Michael Kelley
2026-03-03 19:14     ` Peter Zijlstra
2026-03-23  4:24     ` Michael Kelley
2026-03-23 21:36       ` Thomas Gleixner
2026-03-24  0:22         ` mhklkml
2026-03-24  3:37           ` Michael Kelley
2026-03-24 17:24             ` Thomas Gleixner
2026-03-24 17:34               ` Peter Zijlstra
2026-02-24 16:36 ` [patch 20/48] x86/apic: Enable TSC coupled programming mode Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-03-03  1:29   ` [patch 20/48] " Nathan Chancellor
2026-03-03 14:37     ` Thomas Gleixner
2026-03-03 14:45       ` Thomas Gleixner
2026-03-03 17:38       ` Nathan Chancellor
2026-03-03 20:21         ` Thomas Gleixner
2026-03-03 21:30           ` Nathan Chancellor
2026-03-04 18:40             ` Thomas Gleixner
2026-03-04 18:49               ` [patch 20/48] clocksource: Update clocksource::freq_khz on registration Thomas Gleixner
2026-03-04 19:10                 ` Borislav Petkov
2026-03-04 22:57                 ` Nathan Chancellor
2026-03-05 16:47                 ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-03-03 21:56         ` [PATCH] Subject: timekeeping: Initialize the coupled clocksource conversion completely Thomas Gleixner
2026-03-03 23:16           ` John Stultz
2026-03-05 16:47           ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 21/48] hrtimer: Add debug object init assertion Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:36 ` [patch 22/48] hrtimer: Reduce trace noise in hrtimer_start() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 23/48] hrtimer: Use guards where appropriate Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 24/48] hrtimer: Cleanup coding style and comments Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 25/48] hrtimer: Evaluate timer expiry only once Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 26/48] hrtimer: Replace the bitfield in hrtimer_cpu_base Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 27/48] hrtimer: Convert state and properties to boolean Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` Thomas Gleixner [this message]
2026-02-28 15:36   ` [tip: sched/hrtick] hrtimer: Optimize for local timers tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 29/48] hrtimer: Use NOHZ information for locality Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 30/48] hrtimer: Separate remove/enqueue handling for local timers Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 31/48] hrtimer: Add hrtimer_rearm tracepoint Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 32/48] hrtimer: Re-arrange hrtimer_interrupt() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:37 ` [patch 33/48] hrtimer: Rename hrtimer_cpu_base::in_hrtirq to deferred_rearm Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:37 ` [patch 34/48] hrtimer: Prepare stubs for deferred rearming Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:38 ` [patch 35/48] entry: Prepare for deferred hrtimer rearming Thomas Gleixner
2026-02-27 15:57   ` Christian Loehle
2026-02-27 16:25     ` Peter Zijlstra
2026-02-27 16:32       ` Christian Loehle
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:38 ` [patch 36/48] softirq: " Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:38 ` [patch 37/48] sched/core: " Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:38 ` [patch 38/48] hrtimer: Push reprogramming timers into the interrupt return path Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-24 16:38 ` [patch 39/48] hrtimer: Avoid re-evaluation when nothing changed Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 40/48] hrtimer: Keep track of first expiring timer per clock base Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 41/48] hrtimer: Rework next event evaluation Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 42/48] hrtimer: Simplify run_hrtimer_queues() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 43/48] hrtimer: Optimize for_each_active_base() Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 44/48] rbtree: Provide rbtree with links Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 45/48] timerqueue: Provide linked timerqueue Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:38 ` [patch 46/48] hrtimer: Use " Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:39 ` [patch 47/48] hrtimer: Try to modify timers in place Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Thomas Gleixner
2026-02-24 16:39 ` [patch 48/48] sched: Default enable HRTICK when deferred rearming is enabled Thomas Gleixner
2026-02-28 15:36   ` [tip: sched/hrtick] " tip-bot2 for Peter Zijlstra
2026-02-25 15:25 ` [patch 00/48] hrtimer,sched: General optimizations and hrtick enablement Peter Zijlstra
2026-02-25 16:02   ` Thomas Gleixner
2026-03-04 15:59 ` Christian Loehle

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260224163430.607935269@kernel.org \
    --to=tglx@kernel.org \
    --cc=anna-maria@linutronix.de \
    --cc=bsegall@google.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=dietmar.eggemann@arm.com \
    --cc=edumazet@google.com \
    --cc=frederic@kernel.org \
    --cc=jstultz@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sboyd@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox