[patch 00/12] hrtimers: Prevent hrtimer interrupt starvation

* [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
@ 2026-04-07  8:54 Thomas Gleixner
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
                   ` (13 more replies)
  0 siblings, 14 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space:

  https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/

He provided a reproducer, which sets up a timerfd based timer and then
rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in an endless loop of programming the delta value to the minimum value
defined by the clock event device, before the timer interrupt can fire,
which starves the interrupt and consequently triggers the lockup detector
because the hrtimer callback of the lockup mechanism is never invoked.

The first patch in the series changes the clockevent set next event
mechanism to prevent reprogramming of the clockevent device when the
minimum delta value was programmed unless the new delta is larger than
that. It's a less convoluted variant of the patch which was posted in the
above linked thread and was confirmed to prevent the starvation problem.
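
The gating logic of the first patch can be modelled in user space roughly
like this. It is a simplified sketch of the decision flow in the posted
diff, not the kernel code: the struct, the function names and the write
counter are made up for illustration, and the set_next_event() failure
handling as well as the nanoseconds to ticks conversion are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant clock_event_device fields */
struct model_clockevent {
	int64_t		min_delta_ns;
	int64_t		max_delta_ns;
	int64_t		programmed_ns;		/* last delta written to the "hardware" */
	bool		next_event_forced;	/* a minimum delta event is pending */
	unsigned long	hw_writes;		/* number of simulated hardware writes */
};

/* Returns true if the device was programmed, false if the write was skipped */
static bool model_program_event(struct model_clockevent *dev, int64_t delta_ns)
{
	/* Deltas above the minimum are programmed as before, clamped to the maximum */
	if (delta_ns > dev->min_delta_ns) {
		if (delta_ns > dev->max_delta_ns)
			delta_ns = dev->max_delta_ns;
		dev->programmed_ns = delta_ns;
		dev->hw_writes++;
		return true;
	}

	/* A forced minimum delta event is already pending: do not reprogram */
	if (dev->next_event_forced)
		return false;

	dev->programmed_ns = dev->min_delta_ns;
	dev->next_event_forced = true;
	dev->hw_writes++;
	return true;
}

/* The interrupt handlers clear the flag before processing expired timers */
static void model_interrupt(struct model_clockevent *dev)
{
	dev->next_event_forced = false;
}
```

In this model a tight loop of expired requests results in exactly one
hardware write until the interrupt has been handled.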

But that's only to be considered the last resort because it results in an
insane amount of avoidable hrtimer interrupts.

The problem with user controlled timers is that the input value is only
sanity checked for validity of the provided timespec and clamped to the
maximum allowable range. For performance reasons of in-kernel usage, there
is no check whether a timer which is about to be armed has already expired
at enqueue time.

The rest of the series addresses this by providing a separate interface to
arm user controlled timers. This works the same way as the existing
hrtimer_start_range_ns(), but if the timer ends up as the first timer in
the clock base after enqueue, it performs additional checks:

      - Whether the timer becomes the first expiring timer in the CPU base.

        If not, the timer is considered to expire in the future as there is
        already an earlier event programmed.

      - Whether the timer has expired already by comparing the expiry value
        against current time.

        If it is expired, the timer is removed from the clock base and the
        function returns false, so that the caller can handle it. That's
        required because the function cannot invoke the callback as that
        might need to acquire a lock which is held by the caller.
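
The two checks above boil down to the following decision, shown here as a
stand-alone sketch with plain integers instead of ktime_t. The function
name and parameter names are hypothetical; the real check operates on the
timer and its CPU base under the base lock.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the timer stays queued, false if it is considered
 * expired at enqueue time and has to be handled by the caller.
 *
 * @expires:		expiry time of the new timer
 * @expires_next:	expiry time of the currently programmed first event
 * @now:		current time
 */
static bool model_user_timer_stays_queued(int64_t expires, int64_t expires_next,
					   int64_t now)
{
	/* Not the first expiring timer: an earlier event is already programmed */
	if (expires >= expires_next)
		return true;

	/* First expiring timer: keep it only if it expires in the future */
	return expires > now;
}
```

Only a timer which is both first to expire and already in the past takes
the slow path through the caller.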

This function is then used for the user controlled timer arming interfaces,
mainly by converting the hrtimer sleeper over to it. That affects a few
in-kernel users too, but the overhead is minimal in that case and it spares
a tedious whack-a-mole game all over the tree.

The other usage sites in posixtimers, alarmtimers and timerfd are converted
as well, which should cover the vast majority of user space controllable
timers as far as my investigation goes.

The series applies against Linus' tree and is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1

There needs to be some discussion about the scope of backporting. The first
patch preventing the stall is obviously a backport candidate. The rest of
the series is open to debate, but in my opinion it should be backported as
well, as it prevents stupid or malicious user space from generating tons of
pointless timer interrupts.

Thanks,

	tglx
---
 drivers/power/supply/charger-manager.c |   12 +-
 fs/timerfd.c                           |  115 +++++++++++++++-----------
 include/linux/alarmtimer.h             |    9 +-
 include/linux/clockchips.h             |    2 
 include/linux/hrtimer.h                |   20 +++-
 include/trace/events/timer.h           |   13 +++
 kernel/time/alarmtimer.c               |   70 +++++++---------
 kernel/time/clockevents.c              |   23 +++--
 kernel/time/hrtimer.c                  |  142 +++++++++++++++++++++++++++++----
 kernel/time/posix-cpu-timers.c         |   18 ++--
 kernel/time/posix-timers.c             |   35 +++++---
 kernel/time/posix-timers.h             |    4 
 kernel/time/tick-common.c              |    1 
 kernel/time/tick-sched.c               |    1 
 net/netfilter/xt_IDLETIMER.c           |   24 ++++-
 15 files changed, 341 insertions(+), 148 deletions(-)


* [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07  9:42   ` Peter Zijlstra
                     ` (2 more replies)
  2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
                   ` (12 subsequent siblings)
  13 siblings, 3 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

From: Thomas Gleixner <tglx@kernel.org>

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space. He provided a reproducer, which sets up a timerfd based
timer and then rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in an endless loop of programming the delta value to the minimum value
defined by the clock event device, before the timer interrupt can fire,
which starves the interrupt and consequently triggers the lockup detector
because the hrtimer callback of the lockup mechanism is never invoked.

As a first step to prevent this, avoid reprogramming the clock event device
when:
     - a forced minimum delta event is pending
     - the new expiry delta is less than or equal to the minimum delta

Thanks to Calvin for providing the reproducer and to Borislav for testing
and providing data from his Zen5 machine.

The problem is not limited to Zen5, but depending on the underlying
clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
it is not necessarily observable.

This change serves only as the last resort and further changes will be made
to prevent this scenario earlier in the call chain as far as possible.

Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
---
V2: Simplified the clockevents code - Peter
---
 include/linux/clockchips.h |    2 ++
 kernel/time/clockevents.c  |   23 +++++++++++++++--------
 kernel/time/hrtimer.c      |    1 +
 kernel/time/tick-common.c  |    1 +
 kernel/time/tick-sched.c   |    1 +
 5 files changed, 20 insertions(+), 8 deletions(-)
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -80,6 +80,7 @@ enum clock_event_state {
  * @shift:		nanoseconds to cycles divisor (power of two)
  * @state_use_accessors:current state of the device, assigned by the core code
  * @features:		features
+ * @next_event_forced:	True if the last programming was a forced event
  * @retries:		number of forced programming retries
  * @set_state_periodic:	switch state to periodic
  * @set_state_oneshot:	switch state to oneshot
@@ -108,6 +109,7 @@ struct clock_event_device {
 	u32			shift;
 	enum clock_event_state	state_use_accessors;
 	unsigned int		features;
+	unsigned int		next_event_forced;
 	unsigned long		retries;
 
 	int			(*set_state_periodic)(struct clock_event_device *);
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -172,6 +172,7 @@ void clockevents_shutdown(struct clock_e
 {
 	clockevents_switch_state(dev, CLOCK_EVT_STATE_SHUTDOWN);
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 }
 
 /**
@@ -305,7 +306,6 @@ int clockevents_program_event(struct clo
 {
 	unsigned long long clc;
 	int64_t delta;
-	int rc;
 
 	if (WARN_ON_ONCE(expires < 0))
 		return -ETIME;
@@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
 		return dev->set_next_ktime(expires, dev);
 
 	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
-	if (delta <= 0)
-		return force ? clockevents_program_min_delta(dev) : -ETIME;
 
-	delta = min(delta, (int64_t) dev->max_delta_ns);
-	delta = max(delta, (int64_t) dev->min_delta_ns);
+	if (delta > (int64_t)dev->min_delta_ns) {
+		delta = min(delta, (int64_t) dev->max_delta_ns);
+		clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+		if (!dev->set_next_event((unsigned long) clc, dev))
+			return 0;
+	}
 
-	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
-	rc = dev->set_next_event((unsigned long) clc, dev);
+	if (dev->next_event_forced)
+		return 0;
 
-	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
+	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
+		if (!force || clockevents_program_min_delta(dev))
+			return -ETIME;
+	}
+	dev->next_event_forced = 1;
+	return 0;
 }
 
 /*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1888,6 +1888,7 @@ void hrtimer_interrupt(struct clock_even
 	BUG_ON(!cpu_base->hres_active);
 	cpu_base->nr_events++;
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 
 	raw_spin_lock_irqsave(&cpu_base->lock, flags);
 	entry_time = now = hrtimer_update_base(cpu_base);
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -110,6 +110,7 @@ void tick_handle_periodic(struct clock_e
 	int cpu = smp_processor_id();
 	ktime_t next = dev->next_event;
 
+	dev->next_event_forced = 0;
 	tick_periodic(cpu);
 
 	/*
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1513,6 +1513,7 @@ static void tick_nohz_lowres_handler(str
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
 
 	dev->next_event = KTIME_MAX;
+	dev->next_event_forced = 0;
 
 	if (likely(tick_nohz_handler(&ts->sched_timer) == HRTIMER_RESTART))
 		tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);



* [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07  9:54   ` Peter Zijlstra
  2026-04-07  9:57   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
                   ` (11 subsequent siblings)
  13 siblings, 2 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Peter Zijlstra, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space. He provided a reproducer, which sets up a timerfd based
timer and then rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in an endless loop of programming the delta value to the minimum value
defined by the clock event device, before the timer interrupt can fire,
which starves the interrupt and consequently triggers the lockup detector
because the hrtimer callback of the lockup mechanism is never invoked.

The clockevents code already has a last resort mechanism to prevent that,
but it's sensible to catch such issues before trying to reprogram the clock
event device.

Provide a variant of hrtimer_start_range_ns(), which sanity checks the
timer after queueing it. It does not do so before queueing because the
timer might be armed and therefore needs to be dequeued. Also the check is
placed at the latest possible point, so that the last resort prevention in
the clock event code is avoided as much as possible.

If the timer is already expired _before_ the clock event is reprogrammed,
remove the timer from the queue and signal to the caller that the operation
failed by returning false.

That allows the caller to take immediate action without going through the
loops and hoops of the hrtimer interrupt.

The queueing code can't invoke the timer callback as the caller might hold
a lock which is taken in the callback.

Add a tracepoint which allows analyzing the expired at start situation.

Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 include/linux/hrtimer.h      |   20 +++++-
 include/trace/events/timer.h |   13 ++++
 kernel/time/hrtimer.c        |  135 ++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 151 insertions(+), 17 deletions(-)

--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -230,6 +230,9 @@ static inline void destroy_hrtimer_on_st
 extern void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 				   u64 range_ns, const enum hrtimer_mode mode);
 
+extern bool hrtimer_start_range_ns_user(struct hrtimer *timer, ktime_t tim,
+					u64 range_ns, const enum hrtimer_mode mode);
+
 /**
  * hrtimer_start - (re)start an hrtimer
  * @timer:	the timer to be added
@@ -247,17 +250,28 @@ static inline void hrtimer_start(struct
 extern int hrtimer_cancel(struct hrtimer *timer);
 extern int hrtimer_try_to_cancel(struct hrtimer *timer);
 
-static inline void hrtimer_start_expires(struct hrtimer *timer,
-					 enum hrtimer_mode mode)
+static inline void hrtimer_start_expires(struct hrtimer *timer, enum hrtimer_mode mode)
 {
-	u64 delta;
 	ktime_t soft, hard;
+	u64 delta;
+
 	soft = hrtimer_get_softexpires(timer);
 	hard = hrtimer_get_expires(timer);
 	delta = ktime_to_ns(ktime_sub(hard, soft));
 	hrtimer_start_range_ns(timer, soft, delta, mode);
 }
 
+static inline bool hrtimer_start_expires_user(struct hrtimer *timer, enum hrtimer_mode mode)
+{
+	ktime_t soft, hard;
+	u64 delta;
+
+	soft = hrtimer_get_softexpires(timer);
+	hard = hrtimer_get_expires(timer);
+	delta = ktime_to_ns(ktime_sub(hard, soft));
+	return hrtimer_start_range_ns_user(timer, soft, delta, mode);
+}
+
 void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl,
 				   enum hrtimer_mode mode);
 
--- a/include/trace/events/timer.h
+++ b/include/trace/events/timer.h
@@ -297,6 +297,19 @@ DECLARE_EVENT_CLASS(hrtimer_class,
 );
 
 /**
+ * hrtimer_start_expired - Invoked when an expired timer was started
+ * @hrtimer:	pointer to struct hrtimer
+ *
+ * Preceded by a hrtimer_start tracepoint.
+ */
+DEFINE_EVENT(hrtimer_class, hrtimer_start_expired,
+
+	TP_PROTO(struct hrtimer *hrtimer),
+
+	TP_ARGS(hrtimer)
+);
+
+/**
  * hrtimer_expire_exit - called immediately after the hrtimer callback returns
  * @hrtimer:	pointer to struct hrtimer
  *
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1215,6 +1215,12 @@ hrtimer_update_softirq_timer(struct hrti
 	hrtimer_reprogram(cpu_base->softirq_next_timer, reprogram);
 }
 
+enum {
+	HRTIMER_REPROGRAM_NONE,
+	HRTIMER_REPROGRAM,
+	HRTIMER_REPROGRAM_FORCE,
+};
+
 static int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
 				    u64 delta_ns, const enum hrtimer_mode mode,
 				    struct hrtimer_clock_base *base)
@@ -1276,7 +1282,7 @@ static int __hrtimer_start_range_ns(stru
 		 * expiring timer there.
 		 */
 		if (hrtimer_base_is_online(this_cpu_base))
-			return first;
+			return first ? HRTIMER_REPROGRAM : HRTIMER_REPROGRAM_NONE;
 
 		/*
 		 * Timer was enqueued remote because the current base is
@@ -1296,8 +1302,24 @@ static int __hrtimer_start_range_ns(stru
 	 * reprogramming on removal and enqueue. Force reprogram the
 	 * hardware by evaluating the new first expiring timer.
 	 */
-	hrtimer_force_reprogram(new_base->cpu_base, 1);
-	return 0;
+	return HRTIMER_REPROGRAM_FORCE;
+}
+
+static int hrtimer_start_range_ns_common(struct hrtimer *timer, ktime_t tim,
+					 u64 delta_ns, const enum hrtimer_mode mode,
+					 struct hrtimer_clock_base *base)
+{
+	/*
+	 * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft
+	 * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard
+	 * expiry mode because unmarked timers are moved to softirq expiry.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		WARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);
+	else
+		WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);
+
+	return __hrtimer_start_range_ns(timer, tim, delta_ns, mode, base);
 }
 
 /**
@@ -1315,25 +1337,110 @@ void hrtimer_start_range_ns(struct hrtim
 	struct hrtimer_clock_base *base;
 	unsigned long flags;
 
-	/*
-	 * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft
-	 * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard
-	 * expiry mode because unmarked timers are moved to softirq expiry.
-	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
-		WARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);
-	else
-		WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);
-
 	base = lock_hrtimer_base(timer, &flags);
 
-	if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
+	switch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {
+	case HRTIMER_REPROGRAM:
 		hrtimer_reprogram(timer, true);
+		break;
+	case HRTIMER_REPROGRAM_FORCE:
+		hrtimer_force_reprogram(timer->base->cpu_base, 1);
+		break;
+	}
 
 	unlock_hrtimer_base(timer, &flags);
 }
 EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);
 
+static inline bool hrtimer_check_user_timer(struct hrtimer *timer)
+{
+	struct hrtimer_cpu_base *cpu_base = timer->base->cpu_base;
+	ktime_t expires;
+
+	/*
+	 * This uses soft expires because that's the user provided
+	 * expiry time, while expires can be further in the future
+	 * due to a slack value added to the user expiry time.
+	 */
+	expires = hrtimer_get_softexpires(timer);
+
+	/* Convert to monotonic */
+	expires = ktime_sub(expires, timer->base->offset);
+
+	/*
+	 * Check whether this timer will end up as the first expiring timer in
+	 * the CPU base. If not, no further checks required as it's then
+	 * guaranteed to expire in the future.
+	 */
+	if (expires >= cpu_base->expires_next)
+		return true;
+
+	/* Validate that the expiry time is in the future. */
+	if (expires > ktime_get())
+		return true;
+
+	debug_deactivate(timer);
+	__remove_hrtimer(timer, timer->base, HRTIMER_STATE_INACTIVE, false);
+	trace_hrtimer_start_expired(timer);
+	return false;
+}
+
+static bool hrtimer_reprogram_user(struct hrtimer *timer)
+{
+	if (!hrtimer_check_user_timer(timer))
+		return false;
+	hrtimer_reprogram(timer, true);
+	return true;
+}
+
+static bool hrtimer_force_reprogram_user(struct hrtimer *timer)
+{
+	bool ret = hrtimer_check_user_timer(timer);
+
+	/*
+	 * The base must always be reevaluated, independent of the result
+	 * above because the timer was the first pending timer.
+	 */
+	hrtimer_force_reprogram(timer->base->cpu_base, 1);
+	return ret;
+}
+
+/**
+ * hrtimer_start_range_ns_user - (re)start a user controlled hrtimer
+ * @timer:	the timer to be added
+ * @tim:	expiry time
+ * @delta_ns:	"slack" range for the timer
+ * @mode:	timer mode: absolute (HRTIMER_MODE_ABS) or
+ *		relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED);
+ *		softirq based mode is considered for debug purpose only!
+ *
+ * Returns: True when the timer was queued, false if it was already expired
+ *
+ * This function cannot invoke the timer callback for expired timers as it might
+ * be called under a lock which the timer callback needs to acquire. So the
+ * caller has to handle that case.
+ */
+bool hrtimer_start_range_ns_user(struct hrtimer *timer, ktime_t tim,
+				 u64 delta_ns, const enum hrtimer_mode mode)
+{
+	struct hrtimer_clock_base *base;
+	unsigned long flags;
+	bool ret = true;
+
+	base = lock_hrtimer_base(timer, &flags);
+	switch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {
+	case HRTIMER_REPROGRAM:
+		ret = hrtimer_reprogram_user(timer);
+		break;
+	case HRTIMER_REPROGRAM_FORCE:
+		ret = hrtimer_force_reprogram_user(timer);
+		break;
+	}
+	unlock_hrtimer_base(timer, &flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(hrtimer_start_range_ns_user);
+
 /**
  * hrtimer_try_to_cancel - try to deactivate a timer
  * @timer:	hrtimer to stop



* [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
  2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07  9:59   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Peter Zijlstra, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Most hrtimer sleepers are user controlled and user space can hand in
arbitrary expiry values as long as they are valid timespecs. If the expiry
value is in the past then this requires a full loop through reprogramming
the clock event device, taking the hrtimer interrupt, waking the task and
reprogramming again.

Use hrtimer_start_expires_user() which avoids the full round trip by
checking the timer for expiry on enqueue.
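
The effect on a sleeper can be sketched with a small user space model. All
names below are hypothetical and the expiry check is reduced to a single
comparison. The model assumes a caller which only goes to sleep while the
sleeper's task pointer is still set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and hrtimer_sleeper */
struct model_task {
	bool running;
};

struct model_sleeper {
	struct model_task *task;
};

/* Reduced stand-in for the expiry check: false means "already expired" */
static bool model_start_expires_user(long expires, long now)
{
	return expires > now;
}

/* Mirrors the shape of the change: on expiry, clear task and mark it running */
static void model_sleeper_start(struct model_sleeper *sl, long expires, long now)
{
	if (!model_start_expires_user(expires, now)) {
		sl->task->running = true;
		sl->task = NULL;
	}
}

/* A caller in this model sleeps only while the task pointer is still set */
static bool model_would_sleep(const struct model_sleeper *sl)
{
	return sl->task != NULL;
}
```

With an expiry in the past the model never reports a sleep, so neither a
timer interrupt nor a wakeup is needed to get the task going again.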

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/hrtimer.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -2152,7 +2152,11 @@ void hrtimer_sleeper_start_expires(struc
 	if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard)
 		mode |= HRTIMER_MODE_HARD;
 
-	hrtimer_start_expires(&sl->timer, mode);
+	/* If already expired, clear the task pointer and set current state to running */
+	if (!hrtimer_start_expires_user(&sl->timer, mode)) {
+		sl->task = NULL;
+		__set_current_state(TASK_RUNNING);
+	}
 }
 EXPORT_SYMBOL_GPL(hrtimer_sleeper_start_expires);
 



* [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (2 preceding siblings ...)
  2026-04-07  8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07 10:00   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: John Stultz, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Calvin Owens, Peter Zijlstra, Ingo Molnar,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

In order to catch expiry times which are already in the past the
timer_arm() and timer_rearm() callbacks need to be able to report back to
the caller whether the timer has been queued or not.

Change the function signature and let all implementations return true for
now. While at it simplify posix_cpu_timer_rearm().

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/alarmtimer.c       |    6 ++++--
 kernel/time/posix-cpu-timers.c |   18 ++++++++++--------
 kernel/time/posix-timers.c     |    6 ++++--
 kernel/time/posix-timers.h     |    4 ++--
 4 files changed, 20 insertions(+), 14 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -523,12 +523,13 @@ static void alarm_handle_timer(struct al
  * alarm_timer_rearm - Posix timer callback for rearming timer
  * @timr:	Pointer to the posixtimer data struct
  */
-static void alarm_timer_rearm(struct k_itimer *timr)
+static bool alarm_timer_rearm(struct k_itimer *timr)
 {
 	struct alarm *alarm = &timr->it.alarm.alarmtimer;
 
 	timr->it_overrun += alarm_forward_now(alarm, timr->it_interval);
 	alarm_start(alarm, alarm->node.expires);
+	return true;
 }
 
 /**
@@ -584,7 +585,7 @@ static void alarm_timer_wait_running(str
  * @absolute:	Expiry value is absolute time
  * @sigev_none:	Posix timer does not deliver signals
  */
-static void alarm_timer_arm(struct k_itimer *timr, ktime_t expires,
+static bool alarm_timer_arm(struct k_itimer *timr, ktime_t expires,
 			    bool absolute, bool sigev_none)
 {
 	struct alarm *alarm = &timr->it.alarm.alarmtimer;
@@ -596,6 +597,7 @@ static void alarm_timer_arm(struct k_iti
 		alarm->node.expires = expires;
 	else
 		alarm_start(&timr->it.alarm.alarmtimer, expires);
+	return true;
 }
 
 /**
--- a/kernel/time/posix-cpu-timers.c
+++ b/kernel/time/posix-cpu-timers.c
@@ -19,7 +19,7 @@
 
 #include "posix-timers.h"
 
-static void posix_cpu_timer_rearm(struct k_itimer *timer);
+static bool posix_cpu_timer_rearm(struct k_itimer *timer);
 
 void posix_cputimers_group_init(struct posix_cputimers *pct, u64 cpu_limit)
 {
@@ -1011,24 +1011,27 @@ static void check_process_timers(struct
 /*
  * This is called from the signal code (via posixtimer_rearm)
  * when the last timer signal was delivered and we have to reload the timer.
+ *
+ * Return true unconditionally so the core code assumes the timer to be
+ * armed. Otherwise it would requeue the signal.
  */
-static void posix_cpu_timer_rearm(struct k_itimer *timer)
+static bool posix_cpu_timer_rearm(struct k_itimer *timer)
 {
 	clockid_t clkid = CPUCLOCK_WHICH(timer->it_clock);
-	struct task_struct *p;
 	struct sighand_struct *sighand;
+	struct task_struct *p;
 	unsigned long flags;
 	u64 now;
 
-	rcu_read_lock();
+	guard(rcu)();
 	p = cpu_timer_task_rcu(timer);
 	if (!p)
-		goto out;
+		return true;
 
 	/* Protect timer list r/w in arm_timer() */
 	sighand = lock_task_sighand(p, &flags);
 	if (unlikely(sighand == NULL))
-		goto out;
+		return true;
 
 	/*
 	 * Fetch the current sample and update the timer's expiry time.
@@ -1045,8 +1048,7 @@ static void posix_cpu_timer_rearm(struct
 	 */
 	arm_timer(timer, p);
 	unlock_task_sighand(p, &flags);
-out:
-	rcu_read_unlock();
+	return true;
 }
 
 /**
--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -288,12 +288,13 @@ static inline int timer_overrun_to_int(s
 	return (int)timr->it_overrun_last;
 }
 
-static void common_hrtimer_rearm(struct k_itimer *timr)
+static bool common_hrtimer_rearm(struct k_itimer *timr)
 {
 	struct hrtimer *timer = &timr->it.real.timer;
 
 	timr->it_overrun += hrtimer_forward_now(timer, timr->it_interval);
 	hrtimer_restart(timer);
+	return true;
 }
 
 static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
@@ -795,7 +796,7 @@ SYSCALL_DEFINE1(timer_getoverrun, timer_
 		return timer_overrun_to_int(scoped_timer);
 }
 
-static void common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
+static bool common_hrtimer_arm(struct k_itimer *timr, ktime_t expires,
 			       bool absolute, bool sigev_none)
 {
 	struct hrtimer *timer = &timr->it.real.timer;
@@ -822,6 +823,7 @@ static void common_hrtimer_arm(struct k_
 
 	if (!sigev_none)
 		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+	return true;
 }
 
 static int common_hrtimer_try_to_cancel(struct k_itimer *timr)
--- a/kernel/time/posix-timers.h
+++ b/kernel/time/posix-timers.h
@@ -27,11 +27,11 @@ struct k_clock {
 	int	(*timer_del)(struct k_itimer *timr);
 	void	(*timer_get)(struct k_itimer *timr,
 			     struct itimerspec64 *cur_setting);
-	void	(*timer_rearm)(struct k_itimer *timr);
+	bool	(*timer_rearm)(struct k_itimer *timr);
 	s64	(*timer_forward)(struct k_itimer *timr, ktime_t now);
 	ktime_t	(*timer_remaining)(struct k_itimer *timr, ktime_t now);
 	int	(*timer_try_to_cancel)(struct k_itimer *timr);
-	void	(*timer_arm)(struct k_itimer *timr, ktime_t expires,
+	bool	(*timer_arm)(struct k_itimer *timr, ktime_t expires,
 			     bool absolute, bool sigev_none);
 	void	(*timer_wait_running)(struct k_itimer *timr);
 };



* [patch 05/12] posix-timers: Handle the timer_[re]arm() return value
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (3 preceding siblings ...)
  2026-04-07  8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07 10:01   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC
  To: LKML
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Peter Zijlstra, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

The [re]arm callbacks will return true when the timer was queued and false
if it was already expired at enqueue time.

In both cases the call sites can trivially queue the signal right there,
when the timer was already expired.
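
As a rough user space model of the arm path, assuming made-up names and a
counter in place of the real signal queueing, the handling looks like this:

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the posix timer state */
enum model_status {
	MODEL_TIMER_DISARMED,
	MODEL_TIMER_ARMED,
};

struct model_timer {
	enum model_status	status;
	unsigned int		signals_queued;
};

/*
 * @queued:	return value of the arm callback, false if already expired
 * @sigev_none:	timer does not deliver signals
 */
static void model_timer_set(struct model_timer *t, bool queued, bool sigev_none)
{
	if (queued) {
		if (!sigev_none)
			t->status = MODEL_TIMER_ARMED;
	} else {
		/* Timer was already expired: queue the signal right away */
		t->signals_queued++;
	}
}
```

The expired case thereby delivers its signal from the arming context
instead of waiting for a timer interrupt.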

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/posix-timers.c |   22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -299,6 +299,8 @@ static bool common_hrtimer_rearm(struct
 
 static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
 {
+	bool queued;
+
 	guard(spinlock)(&timr->it_lock);
 
 	/*
@@ -312,12 +314,18 @@ static bool __posixtimer_deliver_signal(
 	if (!timr->it_interval || WARN_ON_ONCE(timr->it_status != POSIX_TIMER_REQUEUE_PENDING))
 		return true;
 
-	timr->kclock->timer_rearm(timr);
-	timr->it_status = POSIX_TIMER_ARMED;
+	/* timer_rearm() updates timr::it_overrun */
+	queued = timr->kclock->timer_rearm(timr);
+
 	timr->it_overrun_last = timr->it_overrun;
 	timr->it_overrun = -1LL;
 	++timr->it_signal_seq;
 	info->si_overrun = timer_overrun_to_int(timr);
+
+	if (queued)
+		timr->it_status = POSIX_TIMER_ARMED;
+	else
+		posix_timer_queue_signal(timr);
 	return true;
 }
 
@@ -905,9 +913,13 @@ int common_timer_set(struct k_itimer *ti
 		expires = timens_ktime_to_host(timr->it_clock, expires);
 	sigev_none = timr->it_sigev_notify == SIGEV_NONE;
 
-	kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none);
-	if (!sigev_none)
-		timr->it_status = POSIX_TIMER_ARMED;
+	if (kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none)) {
+		if (!sigev_none)
+			timr->it_status = POSIX_TIMER_ARMED;
+	} else {
+		/* Timer was already expired, queue the signal */
+		posix_timer_queue_signal(timr);
+	}
 	return 0;
 }
 



* [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (4 preceding siblings ...)
  2026-04-07  8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07 10:01   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC (permalink / raw)
  To: LKML
  Cc: Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Peter Zijlstra, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Switch the arm and rearm callbacks for hrtimer based posix timers over to
hrtimer_start_expires_user() so that already expired timers are not
queued. Hand the result back to the caller, which then queues the signal.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
---
 kernel/time/posix-timers.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

--- a/kernel/time/posix-timers.c
+++ b/kernel/time/posix-timers.c
@@ -293,8 +293,7 @@ static bool common_hrtimer_rearm(struct
 	struct hrtimer *timer = &timr->it.real.timer;
 
 	timr->it_overrun += hrtimer_forward_now(timer, timr->it_interval);
-	hrtimer_restart(timer);
-	return true;
+	return hrtimer_start_expires_user(timer, HRTIMER_MODE_ABS);
 }
 
 static bool __posixtimer_deliver_signal(struct kernel_siginfo *info, struct k_itimer *timr)
@@ -829,9 +828,11 @@ static bool common_hrtimer_arm(struct k_
 		expires = ktime_add_safe(expires, hrtimer_cb_get_time(timer));
 	hrtimer_set_expires(timer, expires);
 
-	if (!sigev_none)
-		hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
-	return true;
+	/* For sigev_none pretend that the timer is queued */
+	if (sigev_none)
+		return true;
+
+	return hrtimer_start_expires_user(timer, HRTIMER_MODE_ABS);
 }
 
 static int common_hrtimer_try_to_cancel(struct k_itimer *timr)



* [patch 07/12] alarmtimer: Provide alarmtimer_start()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (5 preceding siblings ...)
  2026-04-07  8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07 10:04   ` Peter Zijlstra
  2026-04-07  8:54 ` [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start() Thomas Gleixner
                   ` (6 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Stephen Boyd, Calvin Owens, Peter Zijlstra,
	Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Alarm timers utilize hrtimers for normal operation and only switch to the
RTC on suspend. In order to catch already expired timers early and without
going through a timer interrupt cycle, provide a new start function which
internally uses hrtimer_start_range_ns_user().

If hrtimer_start_range_ns_user() detects an already expired timer, it does
not queue it. In that case remove the timer from the alarm base as well.

Return whether the timer was queued to the caller so that it can handle the
early expiry.
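
The enqueue, start, dequeue sequence can be sketched in userspace as follows. All names are hypothetical models: `model_alarm_base` stands in for `struct alarm_base`, `model_hrtimer_start_user()` for hrtimer_start_range_ns_user(), and a counter replaces the timerqueue. Locking and the saturating ktime_add_safe() arithmetic are left out.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_alarm_base {
	int64_t now;		/* stand-in for base->get_ktime() */
	unsigned int queued;	/* number of alarms on the base */
};

struct model_alarm {
	int64_t expires;
	bool on_base;
};

/* Stand-in for hrtimer_start_range_ns_user(): refuses expired timers. */
static bool model_hrtimer_start_user(int64_t expires, int64_t now)
{
	return expires > now;
}

static bool model_alarmtimer_start(struct model_alarm_base *base,
				   struct model_alarm *alarm,
				   int64_t expires, bool relative)
{
	/* Plain addition here; the kernel uses ktime_add_safe() */
	if (relative)
		expires += base->now;

	/* Enqueue first, as the patch does */
	alarm->expires = expires;
	alarm->on_base = true;
	base->queued++;

	/* Already expired: undo the enqueue and tell the caller */
	if (!model_hrtimer_start_user(expires, base->now)) {
		alarm->on_base = false;
		base->queued--;
		return false;
	}
	return true;
}
```

The point of the model is that a false return leaves the base as if the alarm had never been started, so the caller is solely responsible for handling the expiry.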

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Stephen Boyd <sboyd@kernel.org>
---
 include/linux/alarmtimer.h |    6 ++++++
 kernel/time/alarmtimer.c   |   28 ++++++++++++++++++++++++++++
 2 files changed, 34 insertions(+)

--- a/include/linux/alarmtimer.h
+++ b/include/linux/alarmtimer.h
@@ -42,8 +42,14 @@ struct alarm {
 	void			*data;
 };
 
+static __always_inline ktime_t alarm_get_expires(struct alarm *alarm)
+{
+	return alarm->node.expires;
+}
+
 void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
 		void (*function)(struct alarm *, ktime_t));
+bool alarmtimer_start(struct alarm *alarm, ktime_t expires, bool relative);
 void alarm_start(struct alarm *alarm, ktime_t start);
 void alarm_start_relative(struct alarm *alarm, ktime_t start);
 void alarm_restart(struct alarm *alarm);
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -365,6 +365,34 @@ void alarm_start_relative(struct alarm *
 }
 EXPORT_SYMBOL_GPL(alarm_start_relative);
 
+/**
+ * alarmtimer_start - Sets an alarm to fire
+ * @alarm:	Pointer to alarm to set
+ * @expires:	Expiry time
+ * @relative:	True if @expires is relative
+ *
+ * Returns: True if the alarm was queued. False if it already expired
+ */
+bool alarmtimer_start(struct alarm *alarm, ktime_t expires, bool relative)
+{
+	struct alarm_base *base = &alarm_bases[alarm->type];
+
+	if (relative)
+		expires = ktime_add_safe(expires, base->get_ktime());
+
+	trace_alarmtimer_start(alarm, base->get_ktime());
+
+	guard(spinlock_irqsave)(&base->lock);
+	alarm->node.expires = expires;
+	alarmtimer_enqueue(base, alarm);
+	if (!hrtimer_start_range_ns_user(&alarm->timer, expires, 0, HRTIMER_MODE_ABS)) {
+		alarmtimer_dequeue(base, alarm);
+		return false;
+	}
+	return true;
+}
+EXPORT_SYMBOL_GPL(alarmtimer_start);
+
 void alarm_restart(struct alarm *alarm)
 {
 	struct alarm_base *base = &alarm_bases[alarm->type];



* [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (6 preceding siblings ...)
  2026-04-07  8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07  8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Stephen Boyd, Calvin Owens, Peter Zijlstra,
	Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Use the new alarmtimer_start() for arming and rearming posix interval
timers and for clock_nanosleep() so that already expired timers do not go
through the full timer interrupt cycle.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Stephen Boyd <sboyd@kernel.org>
---
 kernel/time/alarmtimer.c |   20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -556,8 +556,7 @@ static bool alarm_timer_rearm(struct k_i
 	struct alarm *alarm = &timr->it.alarm.alarmtimer;
 
 	timr->it_overrun += alarm_forward_now(alarm, timr->it_interval);
-	alarm_start(alarm, alarm->node.expires);
-	return true;
+	return alarmtimer_start(alarm, alarm->node.expires, false);
 }
 
 /**
@@ -621,11 +620,16 @@ static bool alarm_timer_arm(struct k_iti
 
 	if (!absolute)
 		expires = ktime_add_safe(expires, base->get_ktime());
-	if (sigev_none)
+
+	/*
+	 * sigev_none needs to update the expires value and pretend
+	 * that the timer is queued
+	 */
+	if (sigev_none) {
 		alarm->node.expires = expires;
-	else
-		alarm_start(&timr->it.alarm.alarmtimer, expires);
-	return true;
+		return true;
+	}
+	return alarmtimer_start(&timr->it.alarm.alarmtimer, expires, false);
 }
 
 /**
@@ -732,7 +736,9 @@ static int alarmtimer_do_nsleep(struct a
 	alarm->data = (void *)current;
 	do {
 		set_current_state(TASK_INTERRUPTIBLE);
-		alarm_start(alarm, absexp);
+		if (!alarmtimer_start(alarm, absexp, false))
+			alarm->data = NULL;
+
 		if (likely(alarm->data))
 			schedule();
 



* [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (7 preceding siblings ...)
  2026-04-07  8:54 ` [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start() Thomas Gleixner
@ 2026-04-07  8:54 ` Thomas Gleixner
  2026-04-07 10:09   ` Peter Zijlstra
  2026-04-07  8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:54 UTC (permalink / raw)
  To: LKML
  Cc: Alexander Viro, Christian Brauner, Jan Kara, Anna-Maria Behnsen,
	Frederic Weisbecker, linux-fsdevel, Calvin Owens, Peter Zijlstra,
	Ingo Molnar, John Stultz, Stephen Boyd, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

Like any other user controlled interface, timerfd based timers can be
programmed with expiry times in the past or very small intervals.

Both hrtimer and alarmtimer provide new interfaces which return the queued
state of the timer. If the timer was already expired, then let the callsite
handle the timerfd context update so that the full round trip through the
hrtimer interrupt is avoided.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Thomas Gleixner <tglx@kernel.org>
Cc: linux-fsdevel@vger.kernel.org
---
 fs/timerfd.c |  115 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 66 insertions(+), 49 deletions(-)

--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -55,6 +55,15 @@ static inline bool isalarm(struct timerf
 		ctx->clockid == CLOCK_BOOTTIME_ALARM;
 }
 
+static void __timerfd_triggered(struct timerfd_ctx *ctx)
+{
+	lockdep_assert_held(&ctx->wqh.lock);
+
+	ctx->expired = 1;
+	ctx->ticks++;
+	wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+}
+
 /*
  * This gets called when the timer event triggers. We set the "expired"
  * flag, but we do not re-arm the timer (in case it's necessary,
@@ -62,13 +71,8 @@ static inline bool isalarm(struct timerf
  */
 static void timerfd_triggered(struct timerfd_ctx *ctx)
 {
-	unsigned long flags;
-
-	spin_lock_irqsave(&ctx->wqh.lock, flags);
-	ctx->expired = 1;
-	ctx->ticks++;
-	wake_up_locked_poll(&ctx->wqh, EPOLLIN);
-	spin_unlock_irqrestore(&ctx->wqh.lock, flags);
+	guard(spinlock_irqsave)(&ctx->wqh.lock);
+	__timerfd_triggered(ctx);
 }
 
 static enum hrtimer_restart timerfd_tmrproc(struct hrtimer *htmr)
@@ -184,15 +188,52 @@ static ktime_t timerfd_get_remaining(str
 	return remaining < 0 ? 0: remaining;
 }
 
+static void timerfd_alarm_start(struct timerfd_ctx *ctx, ktime_t exp, bool relative)
+{
+	/* Start the timer. If it's expired already, handle the callback. */
+	if (!alarmtimer_start(&ctx->t.alarm, exp, relative))
+		__timerfd_triggered(ctx);
+}
+
+static u64 timerfd_alarm_restart(struct timerfd_ctx *ctx)
+{
+	u64 ticks = alarm_forward_now(&ctx->t.alarm, ctx->tintv) - 1;
+
+	timerfd_alarm_start(ctx, alarm_get_expires(&ctx->t.alarm), false);
+	return ticks;
+}
+
+static void timerfd_hrtimer_start(struct timerfd_ctx *ctx, ktime_t exp,
+				  const enum hrtimer_mode mode)
+{
+	/* Start the timer. If it's expired already, handle the callback. */
+	if (!hrtimer_start_range_ns_user(&ctx->t.tmr, exp, 0, mode))
+		__timerfd_triggered(ctx);
+}
+
+static u64 timerfd_hrtimer_restart(struct timerfd_ctx *ctx)
+{
+	u64 ticks = hrtimer_forward_now(&ctx->t.tmr, ctx->tintv) - 1;
+
+	timerfd_hrtimer_start(ctx, hrtimer_get_expires(&ctx->t.tmr), HRTIMER_MODE_ABS);
+	return ticks;
+}
+
+static u64 timerfd_restart(struct timerfd_ctx *ctx)
+{
+	if (isalarm(ctx))
+		return timerfd_alarm_restart(ctx);
+	return timerfd_hrtimer_restart(ctx);
+}
+
 static int timerfd_setup(struct timerfd_ctx *ctx, int flags,
 			 const struct itimerspec64 *ktmr)
 {
+	int clockid = ctx->clockid;
 	enum hrtimer_mode htmode;
 	ktime_t texp;
-	int clockid = ctx->clockid;
 
-	htmode = (flags & TFD_TIMER_ABSTIME) ?
-		HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
+	htmode = (flags & TFD_TIMER_ABSTIME) ? HRTIMER_MODE_ABS: HRTIMER_MODE_REL;
 
 	texp = timespec64_to_ktime(ktmr->it_value);
 	ctx->expired = 0;
@@ -206,20 +247,15 @@ static int timerfd_setup(struct timerfd_
 			   timerfd_alarmproc);
 	} else {
 		hrtimer_setup(&ctx->t.tmr, timerfd_tmrproc, clockid, htmode);
-		hrtimer_set_expires(&ctx->t.tmr, texp);
 	}
 
 	if (texp != 0) {
 		if (flags & TFD_TIMER_ABSTIME)
 			texp = timens_ktime_to_host(clockid, texp);
-		if (isalarm(ctx)) {
-			if (flags & TFD_TIMER_ABSTIME)
-				alarm_start(&ctx->t.alarm, texp);
-			else
-				alarm_start_relative(&ctx->t.alarm, texp);
-		} else {
-			hrtimer_start(&ctx->t.tmr, texp, htmode);
-		}
+		if (isalarm(ctx))
+			timerfd_alarm_start(ctx, texp, !(flags & TFD_TIMER_ABSTIME));
+		else
+			timerfd_hrtimer_start(ctx, texp, htmode);
 
 		if (timerfd_canceled(ctx))
 			return -ECANCELED;
@@ -287,27 +323,19 @@ static ssize_t timerfd_read_iter(struct
 	}
 
 	if (ctx->ticks) {
-		ticks = ctx->ticks;
+		unsigned int expired = ctx->expired;
 
-		if (ctx->expired && ctx->tintv) {
-			/*
-			 * If tintv != 0, this is a periodic timer that
-			 * needs to be re-armed. We avoid doing it in the timer
-			 * callback to avoid DoS attacks specifying a very
-			 * short timer period.
-			 */
-			if (isalarm(ctx)) {
-				ticks += alarm_forward_now(
-					&ctx->t.alarm, ctx->tintv) - 1;
-				alarm_restart(&ctx->t.alarm);
-			} else {
-				ticks += hrtimer_forward_now(&ctx->t.tmr,
-							     ctx->tintv) - 1;
-				hrtimer_restart(&ctx->t.tmr);
-			}
-		}
+		ticks = ctx->ticks;
 		ctx->expired = 0;
 		ctx->ticks = 0;
+
+		/*
+		 * If tintv != 0, this is a periodic timer that needs to be
+		 * re-armed. We avoid doing it in the timer callback to avoid
+		 * DoS attacks specifying a very short timer period.
+		 */
+		if (expired && ctx->tintv)
+			ticks += timerfd_restart(ctx);
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
 	if (ticks) {
@@ -526,18 +554,7 @@ static int do_timerfd_gettime(int ufd, s
 	spin_lock_irq(&ctx->wqh.lock);
 	if (ctx->expired && ctx->tintv) {
 		ctx->expired = 0;
-
-		if (isalarm(ctx)) {
-			ctx->ticks +=
-				alarm_forward_now(
-					&ctx->t.alarm, ctx->tintv) - 1;
-			alarm_restart(&ctx->t.alarm);
-		} else {
-			ctx->ticks +=
-				hrtimer_forward_now(&ctx->t.tmr, ctx->tintv)
-				- 1;
-			hrtimer_restart(&ctx->t.tmr);
-		}
+		ctx->ticks += timerfd_restart(ctx);
 	}
 	t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
 	t->it_interval = ktime_to_timespec64(ctx->tintv);



* [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (8 preceding siblings ...)
  2026-04-07  8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
@ 2026-04-07  8:55 ` Thomas Gleixner
  2026-04-07 10:11   ` Peter Zijlstra
  2026-04-07  8:55 ` [patch 11/12] netfilter: xt_IDLETIMER: " Thomas Gleixner
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:55 UTC (permalink / raw)
  To: LKML
  Cc: Sebastian Reichel, linux-pm, Calvin Owens, Peter Zijlstra,
	Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar, John Stultz,
	Stephen Boyd, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

The existing alarm_start() interface is replaced with the new
alarmtimer_start() mechanism, which no longer queues an already
expired timer and returns the state. Adjust the code to utilize this.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Sebastian Reichel <sre@kernel.org>
Cc: linux-pm@vger.kernel.org
---
 drivers/power/supply/charger-manager.c |   12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

--- a/drivers/power/supply/charger-manager.c
+++ b/drivers/power/supply/charger-manager.c
@@ -881,7 +881,7 @@ static bool cm_setup_timer(void)
 	mutex_unlock(&cm_list_mtx);
 
 	if (timer_req && cm_timer) {
-		ktime_t now, add;
+		ktime_t exp;
 
 		/*
 		 * Set alarm with the polling interval (wakeup_ms)
@@ -893,14 +893,16 @@ static bool cm_setup_timer(void)
 
 		pr_info("Charger Manager wakeup timer: %u ms\n", wakeup_ms);
 
-		now = ktime_get_boottime();
-		add = ktime_set(wakeup_ms / MSEC_PER_SEC,
+		exp = ktime_set(wakeup_ms / MSEC_PER_SEC,
 				(wakeup_ms % MSEC_PER_SEC) * NSEC_PER_MSEC);
-		alarm_start(cm_timer, ktime_add(now, add));
 
 		cm_suspend_duration_ms = wakeup_ms;
 
-		return true;
+		/*
+		 * The timer should always be queued as the timeout is at least
+		 * two seconds out. Handle it correctly nevertheless.
+		 */
+		return alarmtimer_start(cm_timer, exp, true);
 	}
 	return false;
 }



* [patch 11/12] netfilter: xt_IDLETIMER: Switch to alarmtimer_start()
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (9 preceding siblings ...)
  2026-04-07  8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
@ 2026-04-07  8:55 ` Thomas Gleixner
  2026-04-07  8:55 ` [patch 12/12] alarmtimer: Remove unused interfaces Thomas Gleixner
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:55 UTC (permalink / raw)
  To: LKML
  Cc: Pablo Neira Ayuso, Florian Westphal, Phil Sutter, netfilter-devel,
	coreteam, Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm

The existing alarm_start() interface is replaced with the new
alarmtimer_start() mechanism, which no longer queues an already
expired timer and returns the state.

Adjust the code to utilize this so it schedules the work in the case that
the timer was already expired. Unlikely to happen as the timeout is at
least a second, but not impossible especially with virtualization.

No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Florian Westphal <fw@strlen.de>
Cc: Phil Sutter <phil@nwl.cc>
Cc: netfilter-devel@vger.kernel.org
Cc: coreteam@netfilter.org
---
 net/netfilter/xt_IDLETIMER.c |   24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -115,6 +115,21 @@ static void idletimer_tg_alarmproc(struc
 	schedule_work(&timer->work);
 }
 
+static void idletimer_start_alarm_ktime(struct idletimer_tg *timer, ktime_t timeout)
+{
+	/*
+	 * The timer should always be queued as @timeout should be at least one
+	 * second, but handle it correctly in any case. Virt will manage!
+	 */
+	if (!alarmtimer_start(&timer->alarm, timeout, true))
+		schedule_work(&timer->work);
+}
+
+static void idletimer_start_alarm_sec(struct idletimer_tg *timer, unsigned int seconds)
+{
+	idletimer_start_alarm_ktime(timer, ktime_set(seconds, 0));
+}
+
 static int idletimer_check_sysfs_name(const char *name, unsigned int size)
 {
 	int ret;
@@ -220,12 +235,10 @@ static int idletimer_tg_create_v1(struct
 	INIT_WORK(&info->timer->work, idletimer_tg_work);
 
 	if (info->timer->timer_type & XT_IDLETIMER_ALARM) {
-		ktime_t tout;
 		alarm_init(&info->timer->alarm, ALARM_BOOTTIME,
 			   idletimer_tg_alarmproc);
 		info->timer->alarm.data = info->timer;
-		tout = ktime_set(info->timeout, 0);
-		alarm_start_relative(&info->timer->alarm, tout);
+		idletimer_start_alarm_sec(info->timer, info->timeout);
 	} else {
 		timer_setup(&info->timer->timer, idletimer_tg_expired, 0);
 		mod_timer(&info->timer->timer,
@@ -271,8 +284,7 @@ static unsigned int idletimer_tg_target_
 		 info->label, info->timeout);
 
 	if (info->timer->timer_type & XT_IDLETIMER_ALARM) {
-		ktime_t tout = ktime_set(info->timeout, 0);
-		alarm_start_relative(&info->timer->alarm, tout);
+		idletimer_start_alarm_sec(info->timer, info->timeout);
 	} else {
 		mod_timer(&info->timer->timer,
 				secs_to_jiffies(info->timeout) + jiffies);
@@ -384,7 +396,7 @@ static int idletimer_tg_checkentry_v1(co
 			if (ktimespec.tv_sec > 0) {
 				pr_debug("time_expiry_remaining %lld\n",
 					 ktimespec.tv_sec);
-				alarm_start_relative(&info->timer->alarm, tout);
+				idletimer_start_alarm_ktime(info->timer, tout);
 			}
 		} else {
 				mod_timer(&info->timer->timer,



* [patch 12/12] alarmtimer: Remove unused interfaces
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (10 preceding siblings ...)
  2026-04-07  8:55 ` [patch 11/12] netfilter: xt_IDLETIMER: " Thomas Gleixner
@ 2026-04-07  8:55 ` Thomas Gleixner
  2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
  2026-04-07 17:38 ` Calvin Owens
  13 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07  8:55 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Stephen Boyd, Calvin Owens, Peter Zijlstra,
	Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

All alarmtimer users are converted to alarmtimer_start(). Remove the now
unused interfaces.

Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: John Stultz <jstultz@google.com>
Cc: Stephen Boyd <sboyd@kernel.org>
---
 include/linux/alarmtimer.h |    3 ---
 kernel/time/alarmtimer.c   |   44 --------------------------------------------
 2 files changed, 47 deletions(-)

--- a/include/linux/alarmtimer.h
+++ b/include/linux/alarmtimer.h
@@ -50,9 +50,6 @@ static __always_inline ktime_t alarm_get
 void alarm_init(struct alarm *alarm, enum alarmtimer_type type,
 		void (*function)(struct alarm *, ktime_t));
 bool alarmtimer_start(struct alarm *alarm, ktime_t expires, bool relative);
-void alarm_start(struct alarm *alarm, ktime_t start);
-void alarm_start_relative(struct alarm *alarm, ktime_t start);
-void alarm_restart(struct alarm *alarm);
 int alarm_try_to_cancel(struct alarm *alarm);
 int alarm_cancel(struct alarm *alarm);
 
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -333,39 +333,6 @@ void alarm_init(struct alarm *alarm, enu
 EXPORT_SYMBOL_GPL(alarm_init);
 
 /**
- * alarm_start - Sets an absolute alarm to fire
- * @alarm: ptr to alarm to set
- * @start: time to run the alarm
- */
-void alarm_start(struct alarm *alarm, ktime_t start)
-{
-	struct alarm_base *base = &alarm_bases[alarm->type];
-
-	scoped_guard(spinlock_irqsave, &base->lock) {
-		alarm->node.expires = start;
-		alarmtimer_enqueue(base, alarm);
-		hrtimer_start(&alarm->timer, alarm->node.expires, HRTIMER_MODE_ABS);
-	}
-
-	trace_alarmtimer_start(alarm, base->get_ktime());
-}
-EXPORT_SYMBOL_GPL(alarm_start);
-
-/**
- * alarm_start_relative - Sets a relative alarm to fire
- * @alarm: ptr to alarm to set
- * @start: time relative to now to run the alarm
- */
-void alarm_start_relative(struct alarm *alarm, ktime_t start)
-{
-	struct alarm_base *base = &alarm_bases[alarm->type];
-
-	start = ktime_add_safe(start, base->get_ktime());
-	alarm_start(alarm, start);
-}
-EXPORT_SYMBOL_GPL(alarm_start_relative);
-
-/**
  * alarmtimer_start - Sets an alarm to fire
  * @alarm:	Pointer to alarm to set
  * @expires:	Expiry time
@@ -393,17 +360,6 @@ bool alarmtimer_start(struct alarm *alar
 }
 EXPORT_SYMBOL_GPL(alarmtimer_start);
 
-void alarm_restart(struct alarm *alarm)
-{
-	struct alarm_base *base = &alarm_bases[alarm->type];
-
-	guard(spinlock_irqsave)(&base->lock);
-	hrtimer_set_expires(&alarm->timer, alarm->node.expires);
-	hrtimer_restart(&alarm->timer);
-	alarmtimer_enqueue(base, alarm);
-}
-EXPORT_SYMBOL_GPL(alarm_restart);
-
 /**
  * alarm_try_to_cancel - Tries to cancel an alarm timer
  * @alarm: ptr to alarm to be canceled



* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
@ 2026-04-07  9:42   ` Peter Zijlstra
  2026-04-07 11:30     ` Thomas Gleixner
  2026-04-07 14:00   ` Frederic Weisbecker
  2026-04-07 14:33   ` Thomas Gleixner
  2 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07  9:42 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:


> @@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
>  		return dev->set_next_ktime(expires, dev);
>  
>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
> -	if (delta <= 0)
> -		return force ? clockevents_program_min_delta(dev) : -ETIME;
>  
> -	delta = min(delta, (int64_t) dev->max_delta_ns);
> -	delta = max(delta, (int64_t) dev->min_delta_ns);
>  
> -	clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
> -	rc = dev->set_next_event((unsigned long) clc, dev);
>  
> -	return (rc && force) ? clockevents_program_min_delta(dev) : rc;
>  }

> @@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
>  		return dev->set_next_ktime(expires, dev);
>  
>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
>  
> +	if (delta > (int64_t)dev->min_delta_ns) {
> +		delta = min(delta, (int64_t) dev->max_delta_ns);
> +		clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
> +		if (!dev->set_next_event((unsigned long) clc, dev))
> +			return 0;
> +	}
>  
> +	if (dev->next_event_forced)
> +		return 0;
>  
> +	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
> +		if (!force || clockevents_program_min_delta(dev))
> +			return -ETIME;
> +	}
> +	dev->next_event_forced = 1;
> +	return 0;
>  }

Looking at the implementation of clockevents_program_min_delta() doing
that dev->set_next_event(dev->min_delta_ticks,) right before it seems a
bit daft.

But yes, this is effectively also what the old code did.

The only thing that seems to be different, is that the old code would
return the ->set_next_event() error code, rather than 0 in the !force
case.
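
For readers following along, here is a userspace model of the new control flow quoted above. It is a sketch under assumptions: `model_clockevent` and its helpers are invented names, the device never fails to program, the mult/shift conversion is dropped, and `model_interrupt()` clearing the flag is an assumption since that part of the patch is not quoted here.

```c
#include <stdbool.h>
#include <stdint.h>

struct model_clockevent {
	int64_t min_delta_ns;
	int64_t max_delta_ns;
	bool next_event_forced;
	unsigned int programmed;	/* number of device writes */
};

/* Stand-in for dev->set_next_event(); always succeeds in this model. */
static int model_set_next_event(struct model_clockevent *dev, int64_t delta_ns)
{
	(void)delta_ns;
	dev->programmed++;
	return 0;
}

static int model_program_event(struct model_clockevent *dev,
			       int64_t expires, int64_t now)
{
	int64_t delta = expires - now;

	/* Far enough out: program the real delta */
	if (delta > dev->min_delta_ns) {
		if (delta > dev->max_delta_ns)
			delta = dev->max_delta_ns;
		if (!model_set_next_event(dev, delta))
			return 0;
	}

	/* Minimum delta is already pending: leave the device alone */
	if (dev->next_event_forced)
		return 0;

	model_set_next_event(dev, dev->min_delta_ns);
	dev->next_event_forced = true;
	return 0;
}

/* Assumption: the timer interrupt clears the flag. */
static void model_interrupt(struct model_clockevent *dev)
{
	dev->next_event_forced = false;
}
```

With this model, any number of back-to-back calls with an expiry in the past results in a single device write until the modeled interrupt runs, which is the starvation fix in a nutshell.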


* Re: [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user()
  2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
@ 2026-04-07  9:54   ` Peter Zijlstra
  2026-04-07 11:32     ` Thomas Gleixner
  2026-04-07  9:57   ` Peter Zijlstra
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07  9:54 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:22AM +0200, Thomas Gleixner wrote:

> +enum {
> +	HRTIMER_REPROGRAM_NONE,
> +	HRTIMER_REPROGRAM,
> +	HRTIMER_REPROGRAM_FORCE,
> +};

> +static int hrtimer_start_range_ns_common(struct hrtimer *timer, ktime_t tim,
> +					 u64 delta_ns, const enum hrtimer_mode mode,
> +					 struct hrtimer_clock_base *base)

> @@ -1315,25 +1337,110 @@ void hrtimer_start_range_ns(struct hrtim
>  	struct hrtimer_clock_base *base;
>  	unsigned long flags;
>  
> -	/*
> -	 * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft
> -	 * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard
> -	 * expiry mode because unmarked timers are moved to softirq expiry.
> -	 */
> -	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> -		WARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);
> -	else
> -		WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);
> -
>  	base = lock_hrtimer_base(timer, &flags);
>  
> -	if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
> +	switch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {
> +	case HRTIMER_REPROGRAM:
>  		hrtimer_reprogram(timer, true);
> +		break;
> +	case HRTIMER_REPROGRAM_FORCE:
> +		hrtimer_force_reprogram(timer->base->cpu_base, 1);
> +		break;
> +	}
>  
>  	unlock_hrtimer_base(timer, &flags);
>  }

Something is going to figure out that hrtimer_start_range_ns_common() is
really returning that enum and then complain you don't handle NONE :-)

Anyway, to me it would make sense to instead pass that value to
hrtimer_reprogram() as the second argument. But this works I suppose.
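
One possible shape for the first remark, sketched in plain C with made-up names (`model_reprogram`, `model_base`, `model_apply()`): the action is carried as the enum type and the switch names every enumerator, so a compiler checking switch coverage has nothing to report. Whether the series ends up doing this is not stated in the thread.

```c
enum model_reprogram {
	MODEL_REPROGRAM_NONE,
	MODEL_REPROGRAM,
	MODEL_REPROGRAM_FORCE,
};

struct model_base {
	unsigned int reprogrammed;
	unsigned int force_reprogrammed;
};

/* Every enumerator has a case, including the do-nothing one. */
static void model_apply(struct model_base *base, enum model_reprogram action)
{
	switch (action) {
	case MODEL_REPROGRAM_NONE:
		break;
	case MODEL_REPROGRAM:
		base->reprogrammed++;		/* hrtimer_reprogram() */
		break;
	case MODEL_REPROGRAM_FORCE:
		base->force_reprogrammed++;	/* hrtimer_force_reprogram() */
		break;
	}
}
```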


* Re: [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user()
  2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
  2026-04-07  9:54   ` Peter Zijlstra
@ 2026-04-07  9:57   ` Peter Zijlstra
  2026-04-07 11:34     ` Thomas Gleixner
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07  9:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:22AM +0200, Thomas Gleixner wrote:

> +static inline bool hrtimer_check_user_timer(struct hrtimer *timer)
> +{
> +	struct hrtimer_cpu_base *cpu_base = timer->base->cpu_base;
> +	ktime_t expires;
> +
> +	/*
> +	 * This uses soft expires because that's the user provided
> +	 * expiry time, while expires can be further in the past
> +	 * due to a slack value added to the user expiry time.
> +	 */
> +	expires = hrtimer_get_softexpires(timer);
> +
> +	/* Convert to monotonic */
> +	expires = ktime_sub(expires, timer->base->offset);
> +
> +	/*
> +	 * Check whether this timer will end up as the first expiring timer in
> +	 * the CPU base. If not, no further checks required as it's then
> +	 * guaranteed to expire in the future.
> +	 */
> +	if (expires >= cpu_base->expires_next)
> +		return true;
> +
> +	/* Validate that the expiry time is in the future. */
> +	if (expires > ktime_get())
> +		return true;
> +
> +	debug_deactivate(timer);
> +	__remove_hrtimer(timer, timer->base, HRTIMER_STATE_INACTIVE, false);
> +	trace_hrtimer_start_expired(timer);
> +	return false;
> +}
> +
> +static bool hrtimer_reprogram_user(struct hrtimer *timer)
> +{
> +	if (!hrtimer_check_user_timer(timer))
> +		return false;
> +	hrtimer_reprogram(timer, true);
> +	return true;
> +}
> +
> +static bool hrtimer_force_reprogram_user(struct hrtimer *timer)
> +{
> +	bool ret = hrtimer_check_user_timer(timer);
> +
> +	/*
> +	 * The base must always be reevaluated, independent of the result
> +	 * above because the timer was the first pending timer.
> +	 */
> +	hrtimer_force_reprogram(timer->base->cpu_base, 1);
> +	return ret;
> +}
> +
> +/**
> + * hrtimer_start_range_ns_user - (re)start an user controlled hrtimer
> + * @timer:	the timer to be added
> + * @tim:	expiry time
> + * @delta_ns:	"slack" range for the timer
> + * @mode:	timer mode: absolute (HRTIMER_MODE_ABS) or
> + *		relative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED);
> + *		softirq based mode is considered for debug purpose only!
> + *
> + * Returns: True when the timer was queued, false if it was already expired
> + *
> + * This function cannot invoke the timer callback for expired timers as it might
> + * be called under a lock which the timer callback needs to acquire. So the
> + * caller has to handle that case.
> + */
> +bool hrtimer_start_range_ns_user(struct hrtimer *timer, ktime_t tim,
> +				 u64 delta_ns, const enum hrtimer_mode mode)
> +{
> +	struct hrtimer_clock_base *base;
> +	unsigned long flags;
> +	bool ret = true;
> +
> +	base = lock_hrtimer_base(timer, &flags);
> +	switch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {
> +	case HRTIMER_REPROGRAM:
> +		ret = hrtimer_reprogram_user(timer);
> +		break;
> +	case HRTIMER_REPROGRAM_FORCE:
> +		ret = hrtimer_force_reprogram_user(timer);
> +		break;
> +	}
> +	unlock_hrtimer_base(timer, &flags);
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(hrtimer_start_range_ns_user);

Can we do that hrtimer_check_user_timer() in
hrtimer_start_range_ns_user() and then not duplicate
hrtimer_*reprogram() ?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers
  2026-04-07  8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
@ 2026-04-07  9:59   ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07  9:59 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:27AM +0200, Thomas Gleixner wrote:
> Most hrtimer sleepers are user controlled and user space can hand arbitrary
> expiry values in as long as they are valid timespecs. If the expiry value
> is in the past then this requires a full loop through reprogramming the
> clock event device, taking the hrtimer interrupt, waking the task and
> reprogram again.
> 
> Use hrtimer_start_expires_user() which avoids the full round trip by
> checking the timer for expiry on enqueue.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Cc: Frederic Weisbecker <frederic@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

> ---
>  kernel/time/hrtimer.c |    6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> --- a/kernel/time/hrtimer.c
> +++ b/kernel/time/hrtimer.c
> @@ -2152,7 +2152,11 @@ void hrtimer_sleeper_start_expires(struc
>  	if (IS_ENABLED(CONFIG_PREEMPT_RT) && sl->timer.is_hard)
>  		mode |= HRTIMER_MODE_HARD;
>  
> -	hrtimer_start_expires(&sl->timer, mode);
> +	/* If already expired, clear the task pointer and set current state to running */
> +	if (!hrtimer_start_expires_user(&sl->timer, mode)) {
> +		sl->task = NULL;
> +		__set_current_state(TASK_RUNNING);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(hrtimer_sleeper_start_expires);
>  
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value
  2026-04-07  8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
@ 2026-04-07 10:00   ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Stultz, Stephen Boyd, Anna-Maria Behnsen,
	Frederic Weisbecker, Calvin Owens, Ingo Molnar, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:33AM +0200, Thomas Gleixner wrote:
> In order to catch expiry times which are already in the past the
> timer_arm() and timer_rearm() callbacks need to be able to report back to
> the caller whether the timer has been queued or not.
> 
> Change the function signature and let all implementations return true for
> now. While at it simplify posix_cpu_timer_rearm().
> 
> No functional change intended.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 05/12] posix-timers: Handle the timer_[re]arm() return value
  2026-04-07  8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
@ 2026-04-07 10:01   ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:38AM +0200, Thomas Gleixner wrote:
> The [re]arm callbacks will return true when the timer was queued and false
> if it was already expired at enqueue time.
> 
> In both cases the call sites can trivially queue the signal right there,
> when the timer was already expired.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user()
  2026-04-07  8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
@ 2026-04-07 10:01   ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Anna-Maria Behnsen, Frederic Weisbecker, Calvin Owens,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:43AM +0200, Thomas Gleixner wrote:
> Switch the arm and rearm callbacks for hrtimer based posix timers over to
> hrtimer_start_expires_user() so that already expired timers are not
> queued. Hand the result back to the caller, which then queues the signal.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 07/12] alarmtimer: Provide alarmtimer_start()
  2026-04-07  8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
@ 2026-04-07 10:04   ` Peter Zijlstra
  2026-04-07 11:34     ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, John Stultz, Stephen Boyd, Calvin Owens, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:48AM +0200, Thomas Gleixner wrote:
> Alarm timers utilize hrtimers for normal operation and only switch to the
> RTC on suspend. In order to catch already expired timers early and without
> going through a timer interrupt cycle, provide a new start function which
> internally uses hrtimer_start_range_ns_user().
> 
> If hrtimer_start_range_ns_user() detects an already expired timer, it does
> not queue it. In that case remove the timer from the alarm base as well.
> 
> Return the status queued or not back to the caller to handle the early
> expiry.
> 
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>

Not familiar with this code, but my head hurts from the:

alarm_
alarm_timer_
alarmtimer_

prefixes, what's what?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions
  2026-04-07  8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
@ 2026-04-07 10:09   ` Peter Zijlstra
  2026-04-07 11:41     ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:09 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Alexander Viro, Christian Brauner, Jan Kara,
	Anna-Maria Behnsen, Frederic Weisbecker, linux-fsdevel,
	Calvin Owens, Ingo Molnar, John Stultz, Stephen Boyd,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:54:58AM +0200, Thomas Gleixner wrote:

> +static u64 timerfd_alarm_restart(struct timerfd_ctx *ctx)
> +{
> +	u64 ticks = alarm_forward_now(&ctx->t.alarm, ctx->tintv) - 1;

(still confused on the alarm_forward_now() vs alarmtimer_start()
namespacing)

> +
> +	timerfd_alarm_start(ctx, alarm_get_expires(&ctx->t.alarm), false);
> +	return ticks;
> +}
> +
> +static void timerfd_hrtimer_start(struct timerfd_ctx *ctx, ktime_t exp,
> +				  const enum hrtimer_mode mode)
> +{
> +	/* Start the timer. If it's expired already, handle the callback. */
> +	if (!hrtimer_start_range_ns_user(&ctx->t.tmr, exp, 0, mode))
> +		__timerfd_triggered(ctx);
> +}
> +
> +static u64 timerfd_hrtimer_restart(struct timerfd_ctx *ctx)
> +{
> +	u64 ticks = hrtimer_forward_now(&ctx->t.tmr, ctx->tintv) - 1;
> +
> +	timerfd_hrtimer_start(ctx, hrtimer_get_expires(&ctx->t.tmr), HRTIMER_MODE_ABS);
> +	return ticks;
> +}

> -		if (ctx->expired && ctx->tintv) {
> -			/*
> -			 * If tintv != 0, this is a periodic timer that
> -			 * needs to be re-armed. We avoid doing it in the timer
> -			 * callback to avoid DoS attacks specifying a very
> -			 * short timer period.
> -			 */
> -			if (isalarm(ctx)) {
> -				ticks += alarm_forward_now(
> -					&ctx->t.alarm, ctx->tintv) - 1;
> -				alarm_restart(&ctx->t.alarm);
> -			} else {
> -				ticks += hrtimer_forward_now(&ctx->t.tmr,
> -							     ctx->tintv) - 1;
> -				hrtimer_restart(&ctx->t.tmr);
> -			}
> -		}
> +		ticks = ctx->ticks;
>  		ctx->expired = 0;
>  		ctx->ticks = 0;
> +
> +		/*
> +		 * If tintv != 0, this is a periodic timer that needs to be
> +		 * re-armed. We avoid doing it in the timer callback to avoid
> +		 * DoS attacks specifying a very short timer period.
> +		 */
> +		if (expired && ctx->tintv)
> +			ticks += timerfd_restart(ctx);
>  	}
>  	spin_unlock_irq(&ctx->wqh.lock);
>  	if (ticks) {
> @@ -526,18 +554,7 @@ static int do_timerfd_gettime(int ufd, s
>  	spin_lock_irq(&ctx->wqh.lock);
>  	if (ctx->expired && ctx->tintv) {
>  		ctx->expired = 0;
> -
> -		if (isalarm(ctx)) {
> -			ctx->ticks +=
> -				alarm_forward_now(
> -					&ctx->t.alarm, ctx->tintv) - 1;
> -			alarm_restart(&ctx->t.alarm);
> -		} else {
> -			ctx->ticks +=
> -				hrtimer_forward_now(&ctx->t.tmr, ctx->tintv)
> -				- 1;

(argh!)

> -			hrtimer_restart(&ctx->t.tmr);
> -		}
> +		ctx->ticks += timerfd_restart(ctx);
>  	}
>  	t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
>  	t->it_interval = ktime_to_timespec64(ctx->tintv);

What's with the -1 thing?

Anyway, this looks about right.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start()
  2026-04-07  8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
@ 2026-04-07 10:11   ` Peter Zijlstra
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 10:11 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Sebastian Reichel, linux-pm, Calvin Owens,
	Anna-Maria Behnsen, Frederic Weisbecker, Ingo Molnar, John Stultz,
	Stephen Boyd, Alexander Viro, Christian Brauner, Jan Kara,
	linux-fsdevel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 10:55:03AM +0200, Thomas Gleixner wrote:

> +		exp = ktime_set(wakeup_ms / MSEC_PER_SEC,
>  				(wakeup_ms % MSEC_PER_SEC) * NSEC_PER_MSEC);

Surely we can write this less insane?

  exp = wakeup_ms * NSEC_PER_MSEC;

comes to mind? And yes, we then seem to lose that KTIME_SEC_MAX check,
but urgh.
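
A quick userspace sketch of the difference, under stated assumptions: the
demo_* helpers are stand-ins modelled on my reading of ktime_set(), and
uint64_t for wakeup_ms is assumed for the example, not taken from the
driver. Both forms agree for sane values, only the saturation at
KTIME_SEC_MAX goes away:

```c
#include <stdint.h>

#define DEMO_MSEC_PER_SEC	1000LL
#define DEMO_NSEC_PER_MSEC	1000000LL
#define DEMO_NSEC_PER_SEC	1000000000LL
#define DEMO_KTIME_MAX		INT64_MAX
#define DEMO_KTIME_SEC_MAX	(DEMO_KTIME_MAX / DEMO_NSEC_PER_SEC)

/* Model of ktime_set(): saturates instead of overflowing on huge seconds. */
static int64_t demo_ktime_set(int64_t secs, unsigned long nsecs)
{
	if (secs >= DEMO_KTIME_SEC_MAX)
		return DEMO_KTIME_MAX;
	return secs * DEMO_NSEC_PER_SEC + (int64_t)nsecs;
}

/* The split form from the patch: seconds plus remaining nanoseconds. */
static int64_t demo_exp_split(uint64_t wakeup_ms)
{
	return demo_ktime_set((int64_t)(wakeup_ms / DEMO_MSEC_PER_SEC),
			      (unsigned long)((wakeup_ms % DEMO_MSEC_PER_SEC) *
					      DEMO_NSEC_PER_MSEC));
}

/* The direct form: plain multiplication, no saturation check. */
static int64_t demo_exp_direct(uint64_t wakeup_ms)
{
	return (int64_t)(wakeup_ms * DEMO_NSEC_PER_MSEC);
}
```

Whether the missing saturation matters depends on how large wakeup_ms can
get in practice.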

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07  9:42   ` Peter Zijlstra
@ 2026-04-07 11:30     ` Thomas Gleixner
  2026-04-07 11:49       ` Peter Zijlstra
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 11:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 11:42, Peter Zijlstra wrote:
> On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:
>> @@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
>>  		return dev->set_next_ktime(expires, dev);
>>  
>>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
>>  
>> +	if (delta > (int64_t)dev->min_delta_ns) {
>> +		delta = min(delta, (int64_t) dev->max_delta_ns);
>> +		clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
>> +		if (!dev->set_next_event((unsigned long) clc, dev))
>> +			return 0;
>> +	}
>>  
>> +	if (dev->next_event_forced)
>> +		return 0;
>>  
>> +	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
>> +		if (!force || clockevents_program_min_delta(dev))
>> +			return -ETIME;
>> +	}
>> +	dev->next_event_forced = 1;
>> +	return 0;
>>  }
>
> Looking at the implementation of clockevents_program_min_delta() doing
> that dev->set_next_event(dev->min_delta_ticks,) right before it seems a
> bit daft.
>
> But yes, this is effectively also what the old code did.

Yes. I looked at that and didn't come up with a good plan.

> The only thing that seems to be different, is that the old code would
> return the ->set_next_event() error code, rather than 0 in the !force
> case.

You mean when dev->next_event_forced is set and the set_event() callback
above failed?

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user()
  2026-04-07  9:54   ` Peter Zijlstra
@ 2026-04-07 11:32     ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 11:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 11:54, Peter Zijlstra wrote:
> On Tue, Apr 07, 2026 at 10:54:22AM +0200, Thomas Gleixner wrote:
>> -	if (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))
>> +	switch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {
>> +	case HRTIMER_REPROGRAM:
>>  		hrtimer_reprogram(timer, true);
>> +		break;
>> +	case HRTIMER_REPROGRAM_FORCE:
>> +		hrtimer_force_reprogram(timer->base->cpu_base, 1);
>> +		break;
>> +	}
>>  
>>  	unlock_hrtimer_base(timer, &flags);
>>  }
>
> Something is going to figure out that hrtimer_start_range_ns_common() is
> really returning that enum and then complain you don't handle NONE :-)

:)

> Anyway, to me it would make sense to instead pass that value to
> hrtimer_reprogram() as the second argument. But this works I suppose.

I can do that too. Splitting it this way made me more comfortable to
validate the logic I was implementing.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user()
  2026-04-07  9:57   ` Peter Zijlstra
@ 2026-04-07 11:34     ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 11:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 11:57, Peter Zijlstra wrote:
> On Tue, Apr 07, 2026 at 10:54:22AM +0200, Thomas Gleixner wrote:
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(hrtimer_start_range_ns_user);
>
> Can we do that hrtimer_check_user_timer() in
> hrtimer_start_range_ns_user() and then not duplicate
> hrtimer_*reprogram() ?

We probably can. Let me have a look.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 07/12] alarmtimer: Provide alarmtimer_start()
  2026-04-07 10:04   ` Peter Zijlstra
@ 2026-04-07 11:34     ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 11:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, John Stultz, Stephen Boyd, Calvin Owens, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 12:04, Peter Zijlstra wrote:
> On Tue, Apr 07, 2026 at 10:54:48AM +0200, Thomas Gleixner wrote:
>> Alarm timers utilize hrtimers for normal operation and only switch to the
>> RTC on suspend. In order to catch already expired timers early and without
>> going through a timer interrupt cycle, provide a new start function which
>> internally uses hrtimer_start_range_ns_user().
>> 
>> If hrtimer_start_range_ns_user() detects an already expired timer, it does
>> not queue it. In that case remove the timer from the alarm base as well.
>> 
>> Return the status queued or not back to the caller to handle the early
>> expiry.
>> 
>> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
>
> Not familiar with this code, but my head hurts from the:
>
> alarm_
> alarm_timer_
> alarmtimer_
>
> prefixes, what's what?

Yeah. I should have named it alarm_timer_start(). Let me fix this.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions
  2026-04-07 10:09   ` Peter Zijlstra
@ 2026-04-07 11:41     ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 11:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Alexander Viro, Christian Brauner, Jan Kara,
	Anna-Maria Behnsen, Frederic Weisbecker, linux-fsdevel,
	Calvin Owens, Ingo Molnar, John Stultz, Stephen Boyd,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

On Tue, Apr 07 2026 at 12:09, Peter Zijlstra wrote:
>> -			ctx->ticks +=
>> -				hrtimer_forward_now(&ctx->t.tmr, ctx->tintv)
>> -				- 1;
>
> (argh!)
>
>> -			hrtimer_restart(&ctx->t.tmr);
>> -		}
>> +		ctx->ticks += timerfd_restart(ctx);
>>  	}
>>  	t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
>>  	t->it_interval = ktime_to_timespec64(ctx->tintv);
>
> What's with the -1 thing?

Magic :)

Reading the timerfd returns the number of expired ticks since the last
read or since the timer was armed.

The expiry callback increments ticks by one, hrtimer_forward_now()
returns the number of expired ticks relative to the previous expiry
time. So it would double account that.

Not pretty, but we need to increment ticks in the callback because of
non-interval timers as for those we don't invoke the forwarding.
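
As a rough userspace sketch of that accounting: the demo_* names are made
up, and demo_forward() only mimics my description of the
hrtimer_forward_now() return value above, it is not the kernel
implementation.

```c
#include <stdint.h>

/* Hypothetical stand-in for a periodic timer; not the kernel's struct. */
struct demo_timer {
	uint64_t expires;	/* absolute expiry time in ns */
	uint64_t interval;	/* period in ns */
};

/*
 * Move the expiry past 'now' in whole intervals and return the number of
 * intervals added, i.e. expired ticks relative to the previous expiry.
 */
static uint64_t demo_forward(struct demo_timer *t, uint64_t now)
{
	uint64_t overruns;

	if (now < t->expires)
		return 0;
	overruns = (now - t->expires) / t->interval + 1;
	t->expires += overruns * t->interval;
	return overruns;
}

/* Ticks seen by a read after expiry. Only valid once the timer expired. */
static uint64_t demo_read_ticks(struct demo_timer *t, uint64_t now)
{
	uint64_t ticks = 1;	/* accounted by the expiry callback */

	/* The first forwarded interval is the expiry the callback counted. */
	ticks += demo_forward(t, now) - 1;
	return ticks;
}
```

With expires = 100, interval = 10 and a read at 125, the timer expired at
100, 110 and 120. The callback counted one, forwarding returns three, so
without the -1 the read would report four.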

Thanks,

        tglx



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07 11:30     ` Thomas Gleixner
@ 2026-04-07 11:49       ` Peter Zijlstra
  2026-04-07 13:59         ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Zijlstra @ 2026-04-07 11:49 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07, 2026 at 01:30:42PM +0200, Thomas Gleixner wrote:
> On Tue, Apr 07 2026 at 11:42, Peter Zijlstra wrote:
> > On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:
> >> @@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
> >>  		return dev->set_next_ktime(expires, dev);
> >>  
> >>  	delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
> >>  
> >> +	if (delta > (int64_t)dev->min_delta_ns) {
> >> +		delta = min(delta, (int64_t) dev->max_delta_ns);
> >> +		clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
> >> +		if (!dev->set_next_event((unsigned long) clc, dev))
> >> +			return 0;
> >> +	}
> >>  
> >> +	if (dev->next_event_forced)
> >> +		return 0;
> >>  
> >> +	if (dev->set_next_event(dev->min_delta_ticks, dev)) {
> >> +		if (!force || clockevents_program_min_delta(dev))
> >> +			return -ETIME;
> >> +	}
> >> +	dev->next_event_forced = 1;
> >> +	return 0;
> >>  }
> >
> > Looking at the implementation of clockevents_program_min_delta() doing
> > that dev->set_next_event(dev->min_delta_ticks,) right before it seems a
> > bit daft.
> >
> > But yes, this is effectively also what the old code did.
> 
> yes. I looked at that and didn't come up with a good plan.
> 
> > The only thing that seems to be different, is that the old code would
> > return the ->set_next_event() error code, rather than 0 in the !force
> > case.
> 
> You mean when dev->next_event_forced is set and the set_event() callback
> above failed?

next_event_forced = 0;
force = 0;

Then the old code would return rc (return value of ->set_next_event),
while the new code will return -ETIME.

(not 0 like I said).

I suppose ->set_next_event() will only ever fail with -ETIME?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07 11:49       ` Peter Zijlstra
@ 2026-04-07 13:59         ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Calvin Owens, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 13:49, Peter Zijlstra wrote:
> On Tue, Apr 07, 2026 at 01:30:42PM +0200, Thomas Gleixner wrote:
>> > The only thing that seems to be different, is that the old code would
>> > return the ->set_next_event() error code, rather than 0 in the !force
>> > case.
>> 
>> You mean when dev->next_event_forced is set and the set_event() callback
>> above failed?
>
> next_event_forced = 0;
> force = 0;
>
> Then the old code would return rc (return value of ->set_next_event),
> while the new code will return -ETIME.
>
> (not 0 like I said).

Ah. Now it makes sense :)

> I suppose ->set_next_event() will only ever fail with -ETIME?

Yes.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
  2026-04-07  9:42   ` Peter Zijlstra
@ 2026-04-07 14:00   ` Frederic Weisbecker
  2026-04-07 16:08     ` Thomas Gleixner
  2026-04-07 14:33   ` Thomas Gleixner
  2 siblings, 1 reply; 39+ messages in thread
From: Frederic Weisbecker @ 2026-04-07 14:00 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

Le Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner a écrit :
> From: Thomas Gleixner <tglx@kernel.org>
> 
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space. He provided a reproducer, which sets up a timerfd based
> timer and then rearms it in a loop with an absolute expiry time of 1ns.
> 
> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
> 
> As a first step to prevent this, avoid reprogramming the clock event device
> when:
>      - a forced minimum delta event is pending
>      - the new expiry delta is less than or equal to the minimum delta
> 
> Thanks to Calvin for providing the reproducer and to Borislav for testing
> and providing data from his Zen5 machine.
> 
> The problem is not limited to Zen5, but depending on the underlying
> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
> not necessarily observable.
> 
> This change serves only as the last resort and further changes will be made
> to prevent this scenario earlier in the call chain as far as possible.
> 
> Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
> Reported-by: Calvin Owens <calvin@wbinvd.org>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
> ---
> V2: Simplified the clockevents code - Peter

Isn't it possible to rely on dev->next_event instead? In the above scenario,
a subsequent 0 delta would not reprogram if dev->next_event is already earlier
than the value returned by the new call to ktime_get()?

Thanks.

-- 
Frederic Weisbecker
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
  2026-04-07  9:42   ` Peter Zijlstra
  2026-04-07 14:00   ` Frederic Weisbecker
@ 2026-04-07 14:33   ` Thomas Gleixner
  2 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 14:33 UTC (permalink / raw)
  To: LKML
  Cc: Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

Calvin!

On Tue, Apr 07 2026 at 10:54, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@kernel.org>
>
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space. He provided a reproducer, which sets up a timerfd based
> timer and then rearms it in a loop with an absolute expiry time of 1ns.
>
> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
>
> As a first step to prevent this, avoid reprogramming the clock event device
> when:
>      - a forced minimum delta event is pending
>      - the new expiry delta is less than or equal to the minimum delta
>
> Thanks to Calvin for providing the reproducer and to Borislav for testing
> and providing data from his Zen5 machine.
>
> The problem is not limited to Zen5, but depending on the underlying
> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
> not necessarily observable.
>
> This change serves only as the last resort and further changes will be made
> to prevent this scenario earlier in the call chain as far as possible.

It'd be great if you could re-test this one independently of the other
changes, so we can get that on the way ASAP.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (11 preceding siblings ...)
  2026-04-07  8:55 ` [patch 12/12] alarmtimer: Remove unused interfaces Thomas Gleixner
@ 2026-04-07 14:43 ` Thomas Gleixner
  2026-04-07 16:17   ` Thomas Gleixner
  2026-04-07 17:38 ` Calvin Owens
  13 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 14:43 UTC (permalink / raw)
  To: LKML
  Cc: Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

On Tue, Apr 07 2026 at 10:54, Thomas Gleixner wrote:
> There needs to be some discussion about the scope of backporting. The first
> patch preventing the stall is obviously a backport candidate. The remaining
> series can be obviously argued about, but in my opinion it should be
> backported as well as it prevents stupid or malicious user space from
> generating tons of pointless timer interrupts.

Peter and I just discussed it over IRC. With the clockevents prevention
in place, stupid/malicious code pretty much affects only the user space
task itself. As the timer is forced to expire once the clockevent device
has been force armed, it won't have other side effects: device
interrupts and IPIs are not blocked out and are, in the worst case,
marginally delayed by the high frequency timer interrupt.

Once the task is scheduled out that subsides as there is nothing which
re-arms the timer anymore.

So we should be fine with backporting the clockevents fix and leave the
other parts of the series for upstream only. I still need to investigate
how all of that affects the pending changes vs. TSC deadline timer (and
similar devices) which are not going to reach that modified clockevents
code anymore.

Thanks,

        tglx


* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07 14:00   ` Frederic Weisbecker
@ 2026-04-07 16:08     ` Thomas Gleixner
  2026-04-07 18:01       ` Thomas Gleixner
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 16:08 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 16:00, Frederic Weisbecker wrote:
> On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:
>> From: Thomas Gleixner <tglx@kernel.org>
>> 
>> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
>> up in user space. He provided a reproducer, which sets up a timerfd based
>> timer and then rearms it in a loop with an absolute expiry time of 1ns.
>> 
>> As the expiry time is in the past, the timer ends up as the first expiring
>> timer in the per CPU hrtimer base and the clockevent device is programmed
>> with the minimum delta value. If the machine is fast enough, this ends up
>> in an endless loop of programming the delta value to the minimum value
>> defined by the clock event device, before the timer interrupt can fire,
>> which starves the interrupt and consequently triggers the lockup detector
>> because the hrtimer callback of the lockup mechanism is never invoked.
>> 
>> As a first step to prevent this, avoid reprogramming the clock event device
>> when:
>>      - a forced minimum delta event is pending
>>      - the new expiry delta is less than or equal to the minimum delta
>> 
>> Thanks to Calvin for providing the reproducer and to Borislav for testing
>> and providing data from his Zen5 machine.
>> 
>> The problem is not limited to Zen5, but depending on the underlying
>> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed
>> not necessarily observable.
>> 
>> This change serves only as the last resort and further changes will be made
>> to prevent this scenario earlier in the call chain as far as possible.
>> 
>> Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
>> Reported-by: Calvin Owens <calvin@wbinvd.org>
>> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
>> Cc: Frederic Weisbecker <frederic@kernel.org>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
>> ---
>> V2: Simplified the clockevents code - Peter
>
> Isn't it possible to rely on dev->next_event instead? In the above scenario,
> subsequent 0 delta would not reprogram if dev->next_event is already below
> the new call to ktime_get() ?

It does if force is set and that is set when hrtimer calls into it:

	if (delta <= 0)
		return force ? clockevents_program_min_delta(dev) : -ETIME;

I can't change that for various reasons.

But we always need the flag which tells us that the programming was
forced in order to prevent the above scenario. And delta <= 0 is not the
only way to achieve that. You can have a delta > 0 and < min_delta
and achieve the same effect. That needs more effort at the call site, but
it's trivially doable as the time from system call entry to reprogramming
is pretty constant.

As I had to introduce the flag and prevent the other scenario I just
consolidated everything into one code path.
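
To illustrate the consolidated path, here is a minimal user-space sketch.
It is not the patch itself: struct model_dev, its fields and the model_*()
functions are hypothetical names, and clearing the flag from the interrupt
handler is an assumption made for the model. It only shows the idea that
once a forced minimum delta event is pending, any further request with
delta <= min_delta leaves the device alone:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a clock event device; not the kernel's struct. */
struct model_dev {
	int64_t      min_delta_ns;   /* smallest delta the device can be armed with */
	bool         forced_pending; /* a forced minimum-delta event is armed */
	unsigned int hw_writes;      /* number of times the "hardware" was programmed */
};

/* Arm the device with its minimum delta and remember that this was forced. */
static int model_program_min_delta(struct model_dev *dev)
{
	dev->forced_pending = true;
	dev->hw_writes++;
	return 0;
}

/* Assumption: the interrupt handler clears the flag once the event fired. */
static void model_interrupt(struct model_dev *dev)
{
	dev->forced_pending = false;
}

/* Returns 0 on success, -ETIME for an expired event when force is not set. */
static int model_program_event(struct model_dev *dev, int64_t delta_ns, bool force)
{
	assert(dev->min_delta_ns > 0);

	/* Expiry beyond the minimum delta: regular reprogramming. */
	if (delta_ns > dev->min_delta_ns) {
		dev->forced_pending = false;
		dev->hw_writes++;
		return 0;
	}

	/* Already expired and the caller did not ask for a forced event. */
	if (delta_ns <= 0 && !force)
		return -ETIME;

	/* A minimum-delta event is already armed: do not push it out again. */
	if (dev->forced_pending)
		return 0;

	return model_program_min_delta(dev);
}
```

In this model a loop which re-arms with an expired time touches the
"hardware" exactly once until model_interrupt() runs, no matter whether
the requested delta is <= 0 or somewhere between 0 and min_delta.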

Thanks,

        tglx


* Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
  2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
@ 2026-04-07 16:17   ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 16:17 UTC (permalink / raw)
  To: LKML
  Cc: Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Frederic Weisbecker, Ingo Molnar, John Stultz, Stephen Boyd,
	Alexander Viro, Christian Brauner, Jan Kara, linux-fsdevel,
	Sebastian Reichel, linux-pm, Pablo Neira Ayuso, Florian Westphal,
	Phil Sutter, netfilter-devel, coreteam

On Tue, Apr 07 2026 at 16:43, Thomas Gleixner wrote:
> On Tue, Apr 07 2026 at 10:54, Thomas Gleixner wrote:
>> There needs to be some discussion about the scope of backporting. The first
>> patch preventing the stall is obviously a backport candidate. The remaining
>> series can be obviously argued about, but in my opinion it should be
>> backported as well as it prevents stupid or malicious user space from
>> generating tons of pointless timer interrupts.
>
> Peter and I just discussed it over IRC. With the clockevents prevention
> in place, stupid/malicious code pretty much affects only the user space
> task itself. As the timer is forced to expire once the clockevent device
> has been force armed, it won't have other side effects: device
> interrupts and IPIs are not blocked and are, in the worst case,
> marginally delayed by the high frequency timer interrupt.
>
> Once the task is scheduled out that subsides as there is nothing which
> re-arms the timer anymore.
>
> So we should be fine with backporting the clockevents fix and leave the
> other parts of the series for upstream only. I still need to investigate
> how all of that affects the pending changes vs. TSC deadline timer (and
> similar devices) which are not going to reach that modified clockevents
> code anymore.

It's pretty much the same as the above with the difference that a timer
armed in the past will result in an instantaneous interrupt as the
coupled event devices must provide a less than or equal comparator. So
again the task can only delay itself with hrtimer interrupts.

Thanks,

        tglx




* Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
  2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
                   ` (12 preceding siblings ...)
  2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
@ 2026-04-07 17:38 ` Calvin Owens
  2026-04-07 18:03   ` Thomas Gleixner
  13 siblings, 1 reply; 39+ messages in thread
From: Calvin Owens @ 2026-04-07 17:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space:
> 
>   https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
> 
> He provided a reproducer, which sets up a timerfd based timer and then
> rearms it in a loop with an absolute expiry time of 1ns.

The original AMD machines survive the reproducer with this series.

Tested-by: Calvin Owens <calvin@wbinvd.org>

I'm happy to test subsets of it and stable backports too, if that's
helpful, just let me know.

Thanks,
Calvin

> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
> 
> The first patch in the series changes the clockevent set next event
> mechanism to prevent reprogramming of the clockevent device when the
> minimum delta value was programmed unless the new delta is larger than
> that. It's a less convoluted variant of the patch which was posted in the
> above linked thread and was confirmed to prevent the starvation problem.
> 
> But that's only to be considered the last resort because it results in an
> insane amount of avoidable hrtimer interrupts.
> 
> The problem of user controlled timers is that the input value is only
> sanity checked vs. validity of the provided timespec and clamped to be in
> the maximum allowable range. But for performance reasons for in kernel
> usage there is no check whether a to be armed timer might have been expired
> already at enqueue time.
> 
> The rest of the series addresses this by providing a separate interface to
> arm user controlled timers. This works the same way as the existing
> hrtimer_start_range_ns(), but in case that the timer ends up as the first
> timer in the clock base after enqueue it provides additional checks:
> 
>       - Whether the timer becomes the first expiring timer in the CPU base.
> 
>       	If not the timer is considered to expire in the future as there is
> 	already an earlier event programmed.
> 
>       - Whether the timer has expired already by comparing the expiry value
>         against current time.
> 
> 	If it is expired, the timer is removed from the clock base and the
> 	function returns false, so that the caller can handle it. That's
> 	required because the function cannot invoke the callback as that
> 	might need to acquire a lock which is held by the caller.
> 
> This function is then used for the user controlled timer arming interfaces
> mainly by converting hrtimer sleeper over to it. That affects a few in
> kernel users too, but the overhead is minimal in that case and it spares a
> tedious whack the mole game all over the tree.
> 
> The other usage sites in posixtimers, alarmtimers and timerfd are converted
> as well, which should cover the vast majority of user space controllable
> timers as far as my investigation goes.
> 
> The series applies against Linux tree and is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1
> 
> There needs to be some discussion about the scope of backporting. The first
> patch preventing the stall is obviously a backport candidate. The remaining
> series can be obviously argued about, but in my opinion it should be
> backported as well as it prevents stupid or malicious user space from
> generating tons of pointless timer interrupts.
> 
> Thanks,
> 
> 	tglx
> ---
>  drivers/power/supply/charger-manager.c |   12 +-
>  fs/timerfd.c                           |  115 +++++++++++++++-----------
>  include/linux/alarmtimer.h             |    9 +-
>  include/linux/clockchips.h             |    2 
>  include/linux/hrtimer.h                |   20 +++-
>  include/trace/events/timer.h           |   13 +++
>  kernel/time/alarmtimer.c               |   70 +++++++---------
>  kernel/time/clockevents.c              |   23 +++--
>  kernel/time/hrtimer.c                  |  142 +++++++++++++++++++++++++++++----
>  kernel/time/posix-cpu-timers.c         |   18 ++--
>  kernel/time/posix-timers.c             |   35 +++++---
>  kernel/time/posix-timers.h             |    4 
>  kernel/time/tick-common.c              |    1 
>  kernel/time/tick-sched.c               |    1 
>  net/netfilter/xt_IDLETIMER.c           |   24 ++++-
>  15 files changed, 341 insertions(+), 148 deletions(-)
> 
> 


* Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
  2026-04-07 16:08     ` Thomas Gleixner
@ 2026-04-07 18:01       ` Thomas Gleixner
  0 siblings, 0 replies; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 18:01 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: LKML, Calvin Owens, Peter Zijlstra, Anna-Maria Behnsen,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 18:08, Thomas Gleixner wrote:
> As I had to introduce the flag and prevent the other scenario I just
> consolidated everything into one code path.

Just for the record: this whole -ETIME handling is restricted to
HPET-like devices, where the hardware people decided it's a brilliant
idea to build an 'equal' comparator and then followed up by making the
write posted to reduce the cost of the write. The original direct write
was already flawed vs. NMI/SMI, but the delayed commit into the
comparator made it insanely broken.

AFAICT that's a handful of devices in the zoo of clockevent IPs the kernel
supports, so I'm really not too worried about the impact of this change.
Any sane hardware will never hit that code path, so there is no point in
optimizing for it.
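
For readers not familiar with the comparator problem, here is a small
user-space sketch. It models no real device: ticks_until_fire(), its
parameters and the start value of the counter are made up for
illustration. The counter keeps running while the posted write is in
flight, so an 'equal' comparator misses a target which the counter has
already passed, while a 'less than or equal' comparator fires right away:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: returns the number of ticks between computing the
 * compare value and the comparator firing, or -1 if it did not fire
 * within the horizon.
 */
static int64_t ticks_until_fire(bool equal_only, uint32_t delta,
				uint32_t write_latency, uint32_t horizon)
{
	uint32_t counter = 1000;        /* arbitrary start value */
	uint32_t cmp = counter + delta; /* target computed by software */

	/* The counter advances while the posted write is in flight. */
	counter += write_latency;

	for (uint32_t t = 0; t <= horizon; t++, counter++) {
		/*
		 * 'equal' matches only the exact value. 'less than or equal'
		 * matches once the counter reached or passed the target; the
		 * signed difference keeps that wrap safe on the usual two's
		 * complement implementations.
		 */
		bool hit = equal_only ? (counter == cmp)
				      : ((int32_t)(counter - cmp) >= 0);
		if (hit)
			return (int64_t)write_latency + t;
	}
	return -1;
}
```

With delta = 2 and write_latency = 5 the 'equal' variant does not fire
within the horizon, which is the case software has to detect and report
as -ETIME, while the 'less than or equal' variant fires as soon as the
write lands.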

Thanks,

        tglx


* Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
  2026-04-07 17:38 ` Calvin Owens
@ 2026-04-07 18:03   ` Thomas Gleixner
  2026-04-07 18:35     ` Calvin Owens
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Gleixner @ 2026-04-07 18:03 UTC (permalink / raw)
  To: Calvin Owens
  Cc: LKML, Peter Zijlstra, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tue, Apr 07 2026 at 10:38, Calvin Owens wrote:
> On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
>> He provided a reproducer, which sets up a timerfd based timer and then
>> rearms it in a loop with an absolute expiry time of 1ns.
>
> The original AMD machines survive the reproducer with this series.
>
> Tested-by: Calvin Owens <calvin@wbinvd.org>
>
> I'm happy to test subsets of it and stable backports too, if that's
> helpful, just let me know.

We'll only backport the first patch, so confirming that it still
prevents the issue would be nice. The rest is slated for upstream only.

Thanks,

        tglx


* Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
  2026-04-07 18:03   ` Thomas Gleixner
@ 2026-04-07 18:35     ` Calvin Owens
  0 siblings, 0 replies; 39+ messages in thread
From: Calvin Owens @ 2026-04-07 18:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Peter Zijlstra, Anna-Maria Behnsen, Frederic Weisbecker,
	Ingo Molnar, John Stultz, Stephen Boyd, Alexander Viro,
	Christian Brauner, Jan Kara, linux-fsdevel, Sebastian Reichel,
	linux-pm, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	netfilter-devel, coreteam

On Tuesday 04/07 at 20:03 +0200, Thomas Gleixner wrote:
> On Tue, Apr 07 2026 at 10:38, Calvin Owens wrote:
> > On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
> >> He provided a reproducer, which sets up a timerfd based timer and then
> >> rearms it in a loop with an absolute expiry time of 1ns.
> >
> > The original AMD machines survive the reproducer with this series.
> >
> > Tested-by: Calvin Owens <calvin@wbinvd.org>
> >
> > I'm happy to test subsets of it and stable backports too, if that's
> > helpful, just let me know.
> 
> We'll only backport the first patch, so confirming that it still
> prevents the issue would be nice. The rest is slated for upstream only.

Confirmed, [1/12] alone passes.

Thanks,
Calvin


end of thread, other threads:[~2026-04-07 18:35 UTC | newest]

Thread overview: 39+ messages
2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
2026-04-07  9:42   ` Peter Zijlstra
2026-04-07 11:30     ` Thomas Gleixner
2026-04-07 11:49       ` Peter Zijlstra
2026-04-07 13:59         ` Thomas Gleixner
2026-04-07 14:00   ` Frederic Weisbecker
2026-04-07 16:08     ` Thomas Gleixner
2026-04-07 18:01       ` Thomas Gleixner
2026-04-07 14:33   ` Thomas Gleixner
2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
2026-04-07  9:54   ` Peter Zijlstra
2026-04-07 11:32     ` Thomas Gleixner
2026-04-07  9:57   ` Peter Zijlstra
2026-04-07 11:34     ` Thomas Gleixner
2026-04-07  8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
2026-04-07  9:59   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
2026-04-07 10:00   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
2026-04-07 10:01   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
2026-04-07 10:01   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
2026-04-07 10:04   ` Peter Zijlstra
2026-04-07 11:34     ` Thomas Gleixner
2026-04-07  8:54 ` [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start() Thomas Gleixner
2026-04-07  8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
2026-04-07 10:09   ` Peter Zijlstra
2026-04-07 11:41     ` Thomas Gleixner
2026-04-07  8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
2026-04-07 10:11   ` Peter Zijlstra
2026-04-07  8:55 ` [patch 11/12] netfilter: xt_IDLETIMER: " Thomas Gleixner
2026-04-07  8:55 ` [patch 12/12] alarmtimer: Remove unused interfaces Thomas Gleixner
2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07 16:17   ` Thomas Gleixner
2026-04-07 17:38 ` Calvin Owens
2026-04-07 18:03   ` Thomas Gleixner
2026-04-07 18:35     ` Calvin Owens
