From: Thomas Gleixner <tglx@kernel.org>
To: LKML <linux-kernel@vger.kernel.org>
Cc: Calvin Owens <calvin@wbinvd.org>,
Peter Zijlstra <peterz@infradead.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>, John Stultz <jstultz@google.com>,
Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, Sebastian Reichel <sre@kernel.org>,
linux-pm@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
Date: Tue, 07 Apr 2026 10:54:12 +0200
Message-ID: <20260407083219.478203185@kernel.org>

Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
up in user space:
https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/

He provided a reproducer which sets up a timerfd-based timer and then
rearms it in a loop with an absolute expiry time of 1ns.
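
A minimal user space sketch of such a reproducer, for illustration only.
It assumes CLOCK_MONOTONIC (the clock used by the actual reproducer is not
stated here). The function name rearm_in_past() and the bounded iteration
count are hypothetical additions, so that the sketch terminates instead of
wedging an affected machine.

```c
#define _GNU_SOURCE
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical reproducer sketch: repeatedly arm a timerfd with an
 * absolute expiry of 1ns, which is always in the past. Bounded by
 * @iterations so that it terminates. Returns 0 on success, -1 on error.
 */
int rearm_in_past(unsigned long iterations)
{
	struct itimerspec its = {
		.it_interval = { .tv_sec = 0, .tv_nsec = 0 },	/* one-shot */
		.it_value    = { .tv_sec = 0, .tv_nsec = 1 },	/* absolute 1ns */
	};
	int fd = timerfd_create(CLOCK_MONOTONIC, 0);

	if (fd < 0)
		return -1;

	for (unsigned long i = 0; i < iterations; i++) {
		/* TFD_TIMER_ABSTIME: it_value is an absolute clock value */
		if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL) < 0) {
			close(fd);
			return -1;
		}
	}
	close(fd);
	return 0;
}
```

Calling e.g. rearm_in_past(1000000) exercises the pattern. An unbounded
loop is what can turn it into the lockup described below on an affected,
fast enough machine.
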
As the expiry time is in the past, the timer ends up as the first expiring
timer in the per-CPU hrtimer base and the clockevent device is programmed
with the minimum delta value. If the machine is fast enough, this ends up
in an endless loop of reprogramming the clockevent device with the minimum
delta defined by the device before the timer interrupt can fire. That
starves the interrupt and consequently triggers the lockup detector,
because the hrtimer callback of the lockup mechanism is never invoked.

The first patch in the series changes the clockevents set-next-event
mechanism to prevent reprogramming of the clockevent device once the
minimum delta value has been programmed, unless the new delta is larger
than that. It's a less convoluted variant of the patch which was posted in
the thread linked above and which was confirmed to prevent the starvation
problem.
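
The starvation and the effect of the first patch can be illustrated with a
toy user space model. This is not the clockevents code: struct
toy_clockevent, its forced flag and the toy_*() functions are hypothetical,
and the model assumes that each rearm iteration takes less time than the
device's minimum delta. The forced flag stands for "the minimum delta value
was programmed" and is assumed to be cleared when the interrupt arrives.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of a clock event device; all names are hypothetical. */
struct toy_clockevent {
	uint64_t min_delta;	/* minimum programmable delta (ns) */
	uint64_t expires;	/* absolute time at which the interrupt fires */
	bool armed;		/* an event is currently programmed */
	bool forced;		/* the pending event was clamped to min_delta */
};

/* Program an event for absolute time @expiry, called at time @now. */
static void toy_program(struct toy_clockevent *dev, uint64_t now,
			uint64_t expiry, bool skip_forced)
{
	uint64_t delta = expiry > now ? expiry - now : 0;

	if (delta > dev->min_delta) {
		/* Larger than the minimum delta: program it as requested. */
		dev->expires = expiry;
		dev->forced = false;
	} else {
		/*
		 * With the modelled mitigation, a pending forced min-delta
		 * event is left alone instead of being pushed out again.
		 */
		if (skip_forced && dev->armed && dev->forced)
			return;
		dev->expires = now + dev->min_delta;
		dev->forced = true;
	}
	dev->armed = true;
}

/*
 * Simulate @iterations rearm calls with an absolute expiry of 1ns, each
 * costing @loop_cost ns. Returns the number of delivered interrupts.
 */
unsigned long toy_simulate(unsigned long iterations, uint64_t loop_cost,
			   uint64_t min_delta, bool skip_forced)
{
	struct toy_clockevent dev = { .min_delta = min_delta };
	uint64_t now = 0;
	unsigned long irqs = 0;

	for (unsigned long i = 0; i < iterations; i++) {
		now += loop_cost;
		/* Deliver the interrupt if the programmed event is due. */
		if (dev.armed && now >= dev.expires) {
			irqs++;
			dev.armed = false;
			dev.forced = false;
		}
		/* The user space loop rearms with an expiry in the past. */
		toy_program(&dev, now, 1, skip_forced);
	}
	return irqs;
}
```

With a 500ns loop cost and a 1000ns minimum delta,
toy_simulate(10, 500, 1000, false) returns 0 because every rearm pushes the
event out again, while toy_simulate(10, 500, 1000, true) returns 4: the
interrupt now fires once per minimum delta, which ends the starvation but
still produces a steady stream of avoidable interrupts.
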
But that should only be considered a last resort, because it still results
in an insane amount of avoidable hrtimer interrupts.

The problem with user controlled timers is that the input value is only
sanity checked for validity of the provided timespec and clamped to the
maximum allowable range. But, for performance reasons in the in-kernel use
case, there is no check at enqueue time whether a timer which is about to
be armed has already expired.

The rest of the series addresses this by providing a separate interface to
arm user controlled timers. It works the same way as the existing
hrtimer_start_range_ns(), but if the timer ends up as the first timer in
the clock base after enqueue, it performs additional checks:
  - Whether the timer becomes the first expiring timer in the CPU base.
    If not, the timer is considered to expire in the future, as there is
    already an earlier event programmed.

  - Whether the timer has expired already, by comparing the expiry value
    against the current time.
    If it has expired, the timer is removed from the clock base and the
    function returns false, so that the caller can handle it. That's
    required because the function cannot invoke the callback, as the
    callback might need to acquire a lock which is held by the caller.
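
The checks above can be sketched as a toy model. Everything here is
hypothetical: the toy_* types only track the first timer of a single clock
base and the earliest programmed event of the CPU base, locking and the
real timerqueue handling are ignored, and the timer is assumed not to be
queued already. It only illustrates the order of the decisions, not the
implementation in this series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for the hrtimer bases. */
struct toy_timer {
	int64_t expires;
	bool queued;
};

struct toy_clock_base {
	struct toy_timer *first;	/* earliest timer in this clock base */
};

struct toy_cpu_base {
	int64_t expires_next;		/* earliest programmed event */
};

/*
 * Returns false if the timer had already expired and was dequeued again,
 * so the caller must handle the expiry itself. Returns true otherwise.
 */
bool toy_start_user(struct toy_cpu_base *cpu, struct toy_clock_base *base,
		    struct toy_timer *timer, int64_t expires, int64_t now)
{
	struct toy_timer *prev_first = base->first;

	/* Enqueue: the timer becomes first if it expires earliest. */
	timer->expires = expires;
	timer->queued = true;
	if (prev_first == NULL || expires < prev_first->expires)
		base->first = timer;

	/* Not first in the clock base: same as the regular start. */
	if (base->first != timer)
		return true;

	/* Not first in the CPU base: an earlier event is programmed. */
	if (expires >= cpu->expires_next)
		return true;

	/* First everywhere and already expired: dequeue and report. */
	if (expires <= now) {
		timer->queued = false;
		base->first = prev_first;
		return false;
	}

	/* Really in the future: it becomes the next programmed event. */
	cpu->expires_next = expires;
	return true;
}
```

The false return carries "expired, not queued" back to the caller, which
can then handle the expiry in its own locking context.
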
This function is then used for the user controlled timer arming
interfaces, mainly by converting the hrtimer sleeper over to it. That
affects a few in-kernel users too, but the overhead is minimal in that case
and it spares a tedious game of whack-a-mole all over the tree.
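
For a sleeper, a false return means that no wakeup callback will ever run,
so the caller has to treat the sleep as already completed. A toy sketch,
loosely modelled on the convention that a sleeper's task pointer is cleared
on expiry; struct toy_sleeper, toy_sleeper_arm() and toy_sleeper_start()
are hypothetical, and toy_sleeper_arm() reduces the checks described above
to a plain expiry comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sleeper: ->task is non-NULL while the task must wait. */
struct toy_sleeper {
	int64_t expires;
	void *task;
};

/* Simplified arming: false means "already expired, not queued". */
static bool toy_sleeper_arm(struct toy_sleeper *sl, int64_t now)
{
	return sl->expires > now;
}

/*
 * Prepare a sleep until @expires. Returns true if the caller has to
 * block. If arming reports that the timer already expired, the wakeup
 * callback will never run, so do its job here and clear ->task.
 */
bool toy_sleeper_start(struct toy_sleeper *sl, void *self, int64_t expires,
		       int64_t now)
{
	sl->expires = expires;
	sl->task = self;
	if (!toy_sleeper_arm(sl, now))
		sl->task = NULL;
	return sl->task != NULL;
}
```

A wait loop which blocks while sl.task is non-NULL then returns
immediately for an expiry time in the past.
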
The other usage sites in posix-timers, alarmtimers and timerfd are
converted as well, which should cover the vast majority of user space
controllable timers as far as my investigation goes.

The series applies against Linus' tree and is also available from git:

    git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1

There needs to be some discussion about the scope of backporting. The first
patch, which prevents the stall, is obviously a backport candidate. The
rest of the series is open to debate, but in my opinion it should be
backported as well, as it prevents stupid or malicious user space from
generating tons of pointless timer interrupts.

Thanks,

	tglx
---
drivers/power/supply/charger-manager.c | 12 +-
fs/timerfd.c | 115 +++++++++++++++-----------
include/linux/alarmtimer.h | 9 +-
include/linux/clockchips.h | 2
include/linux/hrtimer.h | 20 +++-
include/trace/events/timer.h | 13 +++
kernel/time/alarmtimer.c | 70 +++++++---------
kernel/time/clockevents.c | 23 +++--
kernel/time/hrtimer.c | 142 +++++++++++++++++++++++++++++----
kernel/time/posix-cpu-timers.c | 18 ++--
kernel/time/posix-timers.c | 35 +++++---
kernel/time/posix-timers.h | 4
kernel/time/tick-common.c | 1
kernel/time/tick-sched.c | 1
net/netfilter/xt_IDLETIMER.c | 24 ++++-
15 files changed, 341 insertions(+), 148 deletions(-)