From: Calvin Owens <calvin@wbinvd.org>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
Frederic Weisbecker <frederic@kernel.org>,
Ingo Molnar <mingo@kernel.org>, John Stultz <jstultz@google.com>,
Stephen Boyd <sboyd@kernel.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
linux-fsdevel@vger.kernel.org, Sebastian Reichel <sre@kernel.org>,
linux-pm@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
Date: Tue, 7 Apr 2026 10:38:06 -0700 [thread overview]
Message-ID: <adVA_uv1srA_bsKj@mozart.vkv.me> (raw)
In-Reply-To: <20260407083219.478203185@kernel.org>
On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space:
>
> https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
>
> He provided a reproducer, which sets up a timerfd based timer and then
> rearms it in a loop with an absolute expiry time of 1ns.
The original AMD machines survive the reproducer with this series.
Tested-by: Calvin Owens <calvin@wbinvd.org>
I'm happy to test subsets of it and stable backports too, if that's
helpful, just let me know.
Thanks,
Calvin
> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in a endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
>
> The first patch in the series changes the clockevent set next event
> mechanism to prevent reprogramming of the clockevent device when the
> minimum delta value was programmed unless the new delta is larger than
> that. It's a less convoluted variant of the patch which was posted in the
> above linked thread and was confirmed to prevent the starvation problem.
>
> But that's only to be considered the last resort because it results in an
> insane amount of avoidable hrtimer interrupts.
>
> The problem of user controlled timers is that the input value is only
> sanity checked vs. validity of the provided timespec and clamped to be in
> the maximum allowable range. But for performance reasons for in kernel
> usage there is no check whether a to be armed timer might have been expired
> already at enqueue time.
>
> The rest of the series addresses this by providing a separate interface to
> arm user controlled timers. This works the same way as the existing
> hrtimer_start_range_ns(), but in case that the timer ends up as the first
> timer in the clock base after enqueue it provides additional checks:
>
> - Whether the timer becomes the first expiring timer in the CPU base.
>
> If not the timer is considered to expire in the future as there is
> already an earlier event programmed.
>
> - Whether the timer has expired already by comparing the expiry value
> against current time.
>
> If it is expired, the timer is removed from the clock base and the
> function returns false, so that the caller can handle it. That's
> required because the function cannot invoke the callback as that
> might need to acquire a lock which is held by the caller.
>
> This function is then used for the user controlled timer arming interfaces
> mainly by converting hrtimer sleeper over to it. That affects a few in
> kernel users too, but the overhead is minimal in that case and it spares a
> tedious whack the mole game all over the tree.
>
> The other usage sites in posixtimers, alarmtimers and timerfd are converted
> as well, which should cover the vast majority of user space controllable
> timers as far as my investigation goes.
>
> The series applies against Linux tree and is also available from git:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1
>
> There needs to be some discussion about the scope of backporting. The first
> patch preventing the stall is obviously a backport candidate. The remaining
> series can be obviously argued about, but in my opinion it should be
> backported as well as it prevents stupid or malicious user space from
> generating tons of pointless timer interrupts.
>
> Thanks,
>
> tglx
> ---
> drivers/power/supply/charger-manager.c | 12 +-
> fs/timerfd.c | 115 +++++++++++++++-----------
> include/linux/alarmtimer.h | 9 +-
> include/linux/clockchips.h | 2
> include/linux/hrtimer.h | 20 +++-
> include/trace/events/timer.h | 13 +++
> kernel/time/alarmtimer.c | 70 +++++++---------
> kernel/time/clockevents.c | 23 +++--
> kernel/time/hrtimer.c | 142 +++++++++++++++++++++++++++++----
> kernel/time/posix-cpu-timers.c | 18 ++--
> kernel/time/posix-timers.c | 35 +++++---
> kernel/time/posix-timers.h | 4
> kernel/time/tick-common.c | 1
> kernel/time/tick-sched.c | 1
> net/netfilter/xt_IDLETIMER.c | 24 ++++-
> 15 files changed, 341 insertions(+), 148 deletions(-)
>
>
next prev parent reply other threads:[~2026-04-07 17:38 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-07 8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07 8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
2026-04-07 9:42 ` Peter Zijlstra
2026-04-07 11:30 ` Thomas Gleixner
2026-04-07 11:49 ` Peter Zijlstra
2026-04-07 13:59 ` Thomas Gleixner
2026-04-07 14:00 ` Frederic Weisbecker
2026-04-07 16:08 ` Thomas Gleixner
2026-04-07 18:01 ` Thomas Gleixner
2026-04-07 14:33 ` Thomas Gleixner
2026-04-08 12:41 ` Thomas Weißschuh
2026-04-08 13:55 ` Thomas Weißschuh
2026-04-08 15:18 ` Thomas Gleixner
2026-04-08 14:15 ` Frederic Weisbecker
2026-04-10 20:52 ` Nathan Chancellor
2026-04-10 21:02 ` Thomas Gleixner
2026-04-10 21:13 ` Nathan Chancellor
2026-04-07 8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
2026-04-07 9:54 ` Peter Zijlstra
2026-04-07 11:32 ` Thomas Gleixner
2026-04-07 9:57 ` Peter Zijlstra
2026-04-07 11:34 ` Thomas Gleixner
2026-04-07 8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
2026-04-07 9:59 ` Peter Zijlstra
2026-04-07 8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
2026-04-07 10:00 ` Peter Zijlstra
2026-04-07 20:20 ` John Stultz
2026-04-07 8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
2026-04-07 10:01 ` Peter Zijlstra
2026-04-07 8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
2026-04-07 10:01 ` Peter Zijlstra
2026-04-07 8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
2026-04-07 10:04 ` Peter Zijlstra
2026-04-07 11:34 ` Thomas Gleixner
2026-04-07 20:23 ` John Stultz
2026-04-07 8:54 ` [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start() Thomas Gleixner
2026-04-07 20:19 ` John Stultz
2026-04-07 8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
2026-04-07 10:09 ` Peter Zijlstra
2026-04-07 11:41 ` Thomas Gleixner
2026-04-07 8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
2026-04-07 10:11 ` Peter Zijlstra
2026-04-07 8:55 ` [patch 11/12] netfilter: xt_IDLETIMER: " Thomas Gleixner
2026-04-07 8:55 ` [patch 12/12] alarmtimer: Remove unused interfaces Thomas Gleixner
2026-04-07 20:21 ` John Stultz
2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07 16:17 ` Thomas Gleixner
2026-04-07 17:38 ` Calvin Owens [this message]
2026-04-07 18:03 ` Thomas Gleixner
2026-04-07 18:35 ` Calvin Owens
2026-04-07 20:58 ` Thomas Gleixner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=adVA_uv1srA_bsKj@mozart.vkv.me \
--to=calvin@wbinvd.org \
--cc=anna-maria@linutronix.de \
--cc=brauner@kernel.org \
--cc=coreteam@netfilter.org \
--cc=frederic@kernel.org \
--cc=fw@strlen.de \
--cc=jack@suse.cz \
--cc=jstultz@google.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=peterz@infradead.org \
--cc=phil@nwl.cc \
--cc=sboyd@kernel.org \
--cc=sre@kernel.org \
--cc=tglx@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox