From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Calvin Owens <calvin@wbinvd.org>,
Peter Zijlstra <peterz@infradead.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>, John Stultz <jstultz@google.com>,
Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, Sebastian Reichel <sre@kernel.org>,
linux-pm@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: [patch 01/12] clockevents: Prevent timer interrupt starvation
Date: Tue, 07 Apr 2026 10:54:17 +0200
Message-ID: <20260407083247.562657657@kernel.org>
In-Reply-To: 20260407083219.478203185@kernel.org
From: Thomas Gleixner <tglx@kernel.org>

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space. He provided a reproducer, which sets up a timerfd based
timer and then rearms it in a loop with an absolute expiry time of 1ns.

As the expiry time is in the past, the timer ends up as the first expiring
timer in the per-CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. Each reprogramming moves the hardware
deadline out to "now + minimum delta", so if the next rearm arrives before
that delta has elapsed, the deadline keeps moving ahead of the interrupt.
If the machine is fast enough, this ends up in an endless loop in which the
device is reprogrammed with its minimum delta before the timer interrupt
can fire. That starves the interrupt and consequently triggers the lockup
detector, because the hrtimer callback of the lockup mechanism is never
invoked.

As a first step to prevent this, avoid reprogramming the clock event device
when both of the following are true:
- a forced minimum delta event is already pending
- the new expiry delta is less than or equal to the minimum delta
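
To make the new control flow concrete, here is a small user-space model that compares the old force path of clockevents_program_event() with the patched flow. Everything named model_* is hypothetical and only mirrors the control flow in the diff. The device parameters are illustrative assumptions: one tick per nanosecond (mult = 1, shift = 0) and min_delta_ns = 1000. The fake device accepts every value, so the retry path through clockevents_program_min_delta() is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct clock_event_device. */
struct model_ced {
	int64_t min_delta_ns;
	int64_t max_delta_ns;
	uint32_t mult;
	uint32_t shift;
	unsigned long min_delta_ticks;
	unsigned int next_event_forced;
	unsigned int nr_writes;		/* how often the "hardware" was written */
	unsigned long last_ticks;	/* last value written to the "hardware" */
};

/* Fake hardware: always accepts the value and counts the write. */
static int model_set_next_event(unsigned long ticks, struct model_ced *dev)
{
	dev->last_ticks = ticks;
	dev->nr_writes++;
	return 0;
}

/* Nanoseconds to device ticks, as in the diff: (delta * mult) >> shift. */
static unsigned long model_ns_to_ticks(const struct model_ced *dev, int64_t delta)
{
	return (unsigned long)(((unsigned long long)delta * dev->mult) >> dev->shift);
}

/* Old force path: clamp delta into [min, max] and always write the device. */
static int model_program_old(struct model_ced *dev, int64_t delta)
{
	if (delta < dev->min_delta_ns)
		delta = dev->min_delta_ns;
	if (delta > dev->max_delta_ns)
		delta = dev->max_delta_ns;
	return model_set_next_event(model_ns_to_ticks(dev, delta), dev);
}

/* Patched flow: leave a pending forced minimum delta event alone. */
static int model_program_new(struct model_ced *dev, int64_t delta)
{
	if (delta > dev->min_delta_ns) {
		if (delta > dev->max_delta_ns)
			delta = dev->max_delta_ns;
		if (!model_set_next_event(model_ns_to_ticks(dev, delta), dev))
			return 0;
	}
	if (dev->next_event_forced)
		return 0;
	model_set_next_event(dev->min_delta_ticks, dev);
	dev->next_event_forced = 1;
	return 0;
}

/* The patched interrupt handlers clear the flag on entry. */
static void model_interrupt(struct model_ced *dev)
{
	dev->next_event_forced = 0;
}

/* Rearm with an expiry in the past, as the timerfd reproducer does. */
void model_demo(void)
{
	struct model_ced old_dev = { .min_delta_ns = 1000, .max_delta_ns = 1000000000,
				     .mult = 1, .shift = 0, .min_delta_ticks = 1000 };
	struct model_ced new_dev = old_dev;

	for (int i = 0; i < 1000; i++) {
		model_program_old(&old_dev, -1);
		model_program_new(&new_dev, -1);
	}
	assert(old_dev.nr_writes == 1000);	/* every rearm pushes the deadline out */
	assert(new_dev.nr_writes == 1);		/* the pending forced event is kept */

	model_interrupt(&new_dev);		/* interrupt fired, flag cleared */
	model_program_new(&new_dev, -1);
	assert(new_dev.nr_writes == 2);
}
```

In this model, 1000 rearms with an expiry in the past cause 1000 device writes with the old flow but only one with the new flow, until model_interrupt() clears next_event_forced and allows the next forced programming.
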

Thanks to Calvin for providing the reproducer and to Borislav for testing
and providing data from his Zen5 machine.

The problem is not limited to Zen5, but depending on the underlying clock
event device (e.g. the TSC deadline timer on Intel) and the CPU speed, it
is not necessarily observable.

This change serves only as a last resort; further changes will be made to
prevent this scenario earlier in the call chain as far as possible.

Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
Reported-by: Calvin Owens <calvin@wbinvd.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
---
V2: Simplified the clockevents code - Peter
---
include/linux/clockchips.h | 2 ++
kernel/time/clockevents.c | 23 +++++++++++++++--------
kernel/time/hrtimer.c | 1 +
kernel/time/tick-common.c | 1 +
kernel/time/tick-sched.c | 1 +
5 files changed, 20 insertions(+), 8 deletions(-)
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -80,6 +80,7 @@ enum clock_event_state {
* @shift: nanoseconds to cycles divisor (power of two)
* @state_use_accessors:current state of the device, assigned by the core code
* @features: features
+ * @next_event_forced: True if the last programming was a forced event
* @retries: number of forced programming retries
* @set_state_periodic: switch state to periodic
* @set_state_oneshot: switch state to oneshot
@@ -108,6 +109,7 @@ struct clock_event_device {
u32 shift;
enum clock_event_state state_use_accessors;
unsigned int features;
+ unsigned int next_event_forced;
unsigned long retries;
int (*set_state_periodic)(struct clock_event_device *);
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -172,6 +172,7 @@ void clockevents_shutdown(struct clock_e
{
clockevents_switch_state(dev, CLOCK_EVT_STATE_SHUTDOWN);
dev->next_event = KTIME_MAX;
+ dev->next_event_forced = 0;
}
/**
@@ -305,7 +306,6 @@ int clockevents_program_event(struct clo
{
unsigned long long clc;
int64_t delta;
- int rc;
if (WARN_ON_ONCE(expires < 0))
return -ETIME;
@@ -324,16 +324,23 @@ int clockevents_program_event(struct clo
return dev->set_next_ktime(expires, dev);
delta = ktime_to_ns(ktime_sub(expires, ktime_get()));
- if (delta <= 0)
- return force ? clockevents_program_min_delta(dev) : -ETIME;
- delta = min(delta, (int64_t) dev->max_delta_ns);
- delta = max(delta, (int64_t) dev->min_delta_ns);
+ if (delta > (int64_t)dev->min_delta_ns) {
+ delta = min(delta, (int64_t) dev->max_delta_ns);
+ clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
+ if (!dev->set_next_event((unsigned long) clc, dev))
+ return 0;
+ }
- clc = ((unsigned long long) delta * dev->mult) >> dev->shift;
- rc = dev->set_next_event((unsigned long) clc, dev);
+ if (dev->next_event_forced)
+ return 0;
- return (rc && force) ? clockevents_program_min_delta(dev) : rc;
+ if (dev->set_next_event(dev->min_delta_ticks, dev)) {
+ if (!force || clockevents_program_min_delta(dev))
+ return -ETIME;
+ }
+ dev->next_event_forced = 1;
+ return 0;
}
/*
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -1888,6 +1888,7 @@ void hrtimer_interrupt(struct clock_even
BUG_ON(!cpu_base->hres_active);
cpu_base->nr_events++;
dev->next_event = KTIME_MAX;
+ dev->next_event_forced = 0;
raw_spin_lock_irqsave(&cpu_base->lock, flags);
entry_time = now = hrtimer_update_base(cpu_base);
--- a/kernel/time/tick-common.c
+++ b/kernel/time/tick-common.c
@@ -110,6 +110,7 @@ void tick_handle_periodic(struct clock_e
int cpu = smp_processor_id();
ktime_t next = dev->next_event;
+ dev->next_event_forced = 0;
tick_periodic(cpu);
/*
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1513,6 +1513,7 @@ static void tick_nohz_lowres_handler(str
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
dev->next_event = KTIME_MAX;
+ dev->next_event_forced = 0;
if (likely(tick_nohz_handler(&ts->sched_timer) == HRTIMER_RESTART))
tick_program_event(hrtimer_get_expires(&ts->sched_timer), 1);