public inbox for linux-pm@vger.kernel.org
From: Calvin Owens <calvin@wbinvd.org>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@kernel.org>, John Stultz <jstultz@google.com>,
	Stephen Boyd <sboyd@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, Sebastian Reichel <sre@kernel.org>,
	linux-pm@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: Re: [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation
Date: Tue, 7 Apr 2026 10:38:06 -0700
Message-ID: <adVA_uv1srA_bsKj@mozart.vkv.me>
In-Reply-To: <20260407083219.478203185@kernel.org>

On Tuesday 04/07 at 10:54 +0200, Thomas Gleixner wrote:
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space:
> 
>   https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/
> 
> He provided a reproducer, which sets up a timerfd based timer and then
> rearms it in a loop with an absolute expiry time of 1ns.

The original AMD machines survive the reproducer with this series.

Tested-by: Calvin Owens <calvin@wbinvd.org>

I'm happy to test subsets of it and stable backports too if that's
helpful; just let me know.

Thanks,
Calvin

> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
> 
> The first patch in the series changes the clockevent set next event
> mechanism to prevent reprogramming of the clockevent device when the
> minimum delta value was programmed unless the new delta is larger than
> that. It's a less convoluted variant of the patch which was posted in the
> above linked thread and was confirmed to prevent the starvation problem.
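
The rule described in the paragraph above can be modelled in user space
roughly as follows. This is only a sketch of the stated behaviour, not the
code from patch 01: struct clkevt_model and both functions are hypothetical
names, and the real clockevents code tracks considerably more state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not the kernel's struct clock_event_device. */
struct clkevt_model {
	uint64_t min_delta_ns;	/* smallest delta the device accepts */
	bool min_armed;		/* min delta programmed, interrupt still pending */
	unsigned int hw_writes;	/* number of times the device was programmed */
};

/* Returns true if the device was programmed, false if the request was skipped. */
bool clkevt_model_program(struct clkevt_model *dev, uint64_t delta_ns)
{
	/* Requests below the device minimum are clamped up to it */
	if (delta_ns < dev->min_delta_ns)
		delta_ns = dev->min_delta_ns;

	/*
	 * The minimum delta is already programmed and has not fired yet:
	 * reprogramming it again would only push the interrupt out, so
	 * skip unless the new delta is larger than the minimum.
	 */
	if (dev->min_armed && delta_ns <= dev->min_delta_ns)
		return false;

	dev->hw_writes++;
	dev->min_armed = (delta_ns == dev->min_delta_ns);
	return true;
}

/* Called when the (modelled) timer interrupt is delivered. */
void clkevt_model_interrupt(struct clkevt_model *dev)
{
	dev->min_armed = false;
}
```

In this model, repeated requests for an already expired timer result in a
single device write until the interrupt is delivered, which is the
property that prevents the starvation.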
> 
> But that's only to be considered the last resort because it results in an
> insane amount of avoidable hrtimer interrupts.
> 
> The problem of user controlled timers is that the input value is only
> sanity checked vs. validity of the provided timespec and clamped to be in
> the maximum allowable range. But for performance reasons in in-kernel
> usage, there is no check whether a to-be-armed timer might already have
> expired at enqueue time.
> 
> The rest of the series addresses this by providing a separate interface to
> arm user controlled timers. This works the same way as the existing
> hrtimer_start_range_ns(), but in case that the timer ends up as the first
> timer in the clock base after enqueue it provides additional checks:
> 
>       - Whether the timer becomes the first expiring timer in the CPU base.
> 
>         If not, the timer is considered to expire in the future as there is
>         already an earlier event programmed.
> 
>       - Whether the timer has expired already by comparing the expiry value
>         against current time.
> 
>         If it is expired, the timer is removed from the clock base and the
>         function returns false, so that the caller can handle it. That's
>         required because the function cannot invoke the callback as that
>         might need to acquire a lock which is held by the caller.
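
The two checks above can be sketched as a small user-space model. This is
an illustration under assumptions, not hrtimer_start_range_ns_user()
itself: struct base_model and arm_user_timer_model() are hypothetical
names, and where the series enqueues the timer and then removes it again,
the model simply does not queue it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per CPU timer base. */
struct base_model {
	int64_t first_expiry_ns;	/* earliest queued expiry, INT64_MAX when empty */
	unsigned int nr_queued;		/* number of queued timers */
};

/*
 * Returns true if the timer was queued, false if it would have been the
 * first expiring timer and has already expired. In the false case the
 * caller has to handle the expiry itself.
 */
bool arm_user_timer_model(struct base_model *base, int64_t expires_ns,
			  int64_t now_ns)
{
	/* Not the first expiring timer: an earlier event is already programmed */
	if (expires_ns >= base->first_expiry_ns) {
		base->nr_queued++;
		return true;
	}

	/* Would be first, but is already expired: leave it to the caller */
	if (expires_ns <= now_ns)
		return false;

	/* New first timer in the future: the event device would be reprogrammed */
	base->first_expiry_ns = expires_ns;
	base->nr_queued++;
	return true;
}
```

A caller would check the return value and run its own expiry handling on
false, which mirrors the reason given above: the arming function cannot
invoke the callback because the caller may hold a lock the callback needs.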
> 
> This function is then used for the user controlled timer arming interfaces
> mainly by converting hrtimer sleeper over to it. That affects a few in
> kernel users too, but the overhead is minimal in that case and it spares a
> tedious whack-a-mole game all over the tree.
> 
> The other usage sites in posixtimers, alarmtimers and timerfd are converted
> as well, which should cover the vast majority of user space controllable
> timers as far as my investigation goes.
> 
> The series applies against Linux tree and is also available from git:
> 
>     git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git hrtimer-exp-v1
> 
> There needs to be some discussion about the scope of backporting. The first
> patch preventing the stall is obviously a backport candidate. The remaining
> series can be obviously argued about, but in my opinion it should be
> backported as well as it prevents stupid or malicious user space from
> generating tons of pointless timer interrupts.
> 
> Thanks,
> 
> 	tglx
> ---
>  drivers/power/supply/charger-manager.c |   12 +-
>  fs/timerfd.c                           |  115 +++++++++++++++-----------
>  include/linux/alarmtimer.h             |    9 +-
>  include/linux/clockchips.h             |    2 
>  include/linux/hrtimer.h                |   20 +++-
>  include/trace/events/timer.h           |   13 +++
>  kernel/time/alarmtimer.c               |   70 +++++++---------
>  kernel/time/clockevents.c              |   23 +++--
>  kernel/time/hrtimer.c                  |  142 +++++++++++++++++++++++++++++----
>  kernel/time/posix-cpu-timers.c         |   18 ++--
>  kernel/time/posix-timers.c             |   35 +++++---
>  kernel/time/posix-timers.h             |    4 
>  kernel/time/tick-common.c              |    1 
>  kernel/time/tick-sched.c               |    1 
>  net/netfilter/xt_IDLETIMER.c           |   24 ++++-
>  15 files changed, 341 insertions(+), 148 deletions(-)
> 
> 
