public inbox for linux-kernel@vger.kernel.org
From: Nathan Chancellor <nathan@kernel.org>
To: Thomas Gleixner <tglx@kernel.org>
Cc: LKML <linux-kernel@vger.kernel.org>,
	Calvin Owens <calvin@wbinvd.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	Frederic Weisbecker <frederic@kernel.org>,
	Ingo Molnar <mingo@kernel.org>, John Stultz <jstultz@google.com>,
	Stephen Boyd <sboyd@kernel.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, Sebastian Reichel <sre@kernel.org>,
	linux-pm@vger.kernel.org, Pablo Neira Ayuso <pablo@netfilter.org>,
	Florian Westphal <fw@strlen.de>, Phil Sutter <phil@nwl.cc>,
	netfilter-devel@vger.kernel.org, coreteam@netfilter.org
Subject: Re: [patch 01/12] clockevents: Prevent timer interrupt starvation
Date: Fri, 10 Apr 2026 13:52:03 -0700
Message-ID: <20260410205203.GA3922321@ax162>
In-Reply-To: <20260407083247.562657657@kernel.org>

Hi Thomas,

On Tue, Apr 07, 2026 at 10:54:17AM +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@kernel.org>
> 
> Calvin reported an odd NMI watchdog lockup which claims that the CPU locked
> up in user space. He provided a reproducer, which sets up a timerfd based
> timer and then rearms it in a loop with an absolute expiry time of 1ns.
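A minimal sketch of the kind of reproducer described above, for readers who want to see the pattern in code. It is not Calvin's actual reproducer: the function name rearm_in_past() is hypothetical, CLOCK_MONOTONIC is an assumption (the report does not name the clock), and the loop is bounded instead of endless. It uses only the documented timerfd_create(2)/timerfd_settime(2) interfaces with TFD_TIMER_ABSTIME.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not the original reproducer: arm a timerfd with an
 * absolute expiry of 1ns (always in the past) and rearm it `iterations`
 * times. Assumption: CLOCK_MONOTONIC. Returns the number of successful
 * rearms, or -1 if timerfd_create() fails.
 */
static long rearm_in_past(long iterations)
{
	struct itimerspec its = {
		.it_interval = { 0, 0 },	/* one-shot timer */
		.it_value    = { 0, 1 },	/* absolute 1ns: already expired */
	};
	long ok = 0;
	int fd;

	assert(iterations >= 0);
	fd = timerfd_create(CLOCK_MONOTONIC, 0);
	if (fd < 0)
		return -1;

	for (long i = 0; i < iterations; i++) {
		/*
		 * Per the report, each rearm makes this the first expiring
		 * hrtimer, so the clock event device gets reprogrammed.
		 */
		if (timerfd_settime(fd, TFD_TIMER_ABSTIME, &its, NULL) == 0)
			ok++;
	}
	close(fd);
	return ok;
}
```

A bounded run such as rearm_in_past(1000) should simply report 1000 successful rearms; the lockup described in the report needs the endless variant on a sufficiently fast machine, so a short run is unlikely to trigger it.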
> 
> As the expiry time is in the past, the timer ends up as the first expiring
> timer in the per CPU hrtimer base and the clockevent device is programmed
> with the minimum delta value. If the machine is fast enough, this ends up
> in an endless loop of programming the delta value to the minimum value
> defined by the clock event device, before the timer interrupt can fire,
> which starves the interrupt and consequently triggers the lockup detector
> because the hrtimer callback of the lockup mechanism is never invoked.
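The starvation loop can be illustrated with a toy model (hypothetical, not kernel code): simulated time advances by a fixed cost per rearm, and because the expiry is always in the past, each rearm unconditionally reprograms the device to fire min_delta_ns after the current time. The function and parameter names are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the starvation: every rearm pushes the pending deadline to
 * now + min_delta_ns, then user space spends loop_cost_ns before rearming
 * again. Returns how many timer interrupts fire during `iterations` rearms.
 */
static unsigned long count_interrupts_unguarded(uint64_t min_delta_ns,
						uint64_t loop_cost_ns,
						unsigned long iterations)
{
	uint64_t now = 0, deadline;
	unsigned long fired = 0;

	assert(min_delta_ns > 0);
	for (unsigned long i = 0; i < iterations; i++) {
		deadline = now + min_delta_ns;	/* unconditional reprogram */
		now += loop_cost_ns;		/* time until the next rearm */
		if (now >= deadline)		/* did the device get to fire? */
			fired++;
	}
	return fired;
}
```

With loop_cost_ns below min_delta_ns the model reports zero interrupts regardless of the iteration count, which corresponds to the "machine is fast enough" condition; once a loop iteration takes at least min_delta_ns, every programmed event fires.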
> 
> As a first step to prevent this, avoid reprogramming the clock event device
> when:
>      - a forced minimum delta event is pending
>      - the new expiry delta is less than or equal to the minimum delta
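The guard described above can be sketched in a toy model of a clock event device. This is an illustration under assumptions, not the patch itself: the struct, its fields and the helper names are hypothetical, and the sketch reads the two bullets as a combined condition (a forced minimum delta event is already pending and the new delta is at most the minimum delta), in which case the pending event is left alone instead of being pushed out again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state; field names are illustrative only. */
struct toy_clockevent {
	uint64_t min_delta_ns;
	uint64_t deadline;	/* absolute time of the pending event */
	bool forced_pending;	/* a forced min-delta event is armed */
};

/* Assumed reading of the guard: skip only when both conditions hold. */
static bool toy_should_skip(const struct toy_clockevent *dev, int64_t delta_ns)
{
	return dev->forced_pending && delta_ns <= (int64_t)dev->min_delta_ns;
}

static void toy_program(struct toy_clockevent *dev, uint64_t now,
			int64_t delta_ns)
{
	if (toy_should_skip(dev, delta_ns))
		return;		/* keep the already pending forced event */
	if (delta_ns <= (int64_t)dev->min_delta_ns) {
		dev->deadline = now + dev->min_delta_ns;	/* forced minimum */
		dev->forced_pending = true;
	} else {
		dev->deadline = now + (uint64_t)delta_ns;
		dev->forced_pending = false;
	}
}

/* Rearm loop where the requested expiry is always in the past. */
static unsigned long count_interrupts_guarded(uint64_t min_delta_ns,
					      uint64_t loop_cost_ns,
					      unsigned long iterations)
{
	struct toy_clockevent dev = { .min_delta_ns = min_delta_ns };
	uint64_t now = 0;
	unsigned long fired = 0;

	assert(min_delta_ns > 0);
	for (unsigned long i = 0; i < iterations; i++) {
		toy_program(&dev, now, -1);	/* expiry already in the past */
		now += loop_cost_ns;
		if (dev.forced_pending && now >= dev.deadline) {
			fired++;
			dev.forced_pending = false;	/* interrupt consumed it */
		}
	}
	return fired;
}
```

With min_delta_ns = 1000 and loop_cost_ns = 500, the guarded loop lets an interrupt through on every second rearm (500 interrupts in 1000 rearms), whereas unconditionally reprogramming would keep moving the deadline out and deliver none.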
> 
> Thanks to Calvin for providing the reproducer and to Borislav for testing
> and providing data from his Zen5 machine.
> 
> The problem is not limited to Zen5, but depending on the underlying
> clock event device (e.g. TSC deadline timer on Intel) and the CPU speed,
> it is not necessarily observable.
> 
> This change serves only as a last resort, and further changes will be made
> to prevent this scenario earlier in the call chain as far as possible.
> 
> Fixes: d316c57ff6bf ("[PATCH] clockevents: add core functionality")
> Reported-by: Calvin Owens <calvin@wbinvd.org>
> Signed-off-by: Thomas Gleixner <tglx@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Anna-Maria Behnsen <anna-maria@linutronix.de>
> Cc: Frederic Weisbecker <frederic@kernel.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Link: https://lore.kernel.org/lkml/acMe-QZUel-bBYUh@mozart.vkv.me/

This change in -next as commit 1c2eabb8805d ("clockevents: Prevent timer
interrupt starvation") appears to make one of my test machines
consistently lock up on boot (at least I never get to userspace). Most
of the time I get stall messages such as

  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:    14-...!: (20 GPs behind) idle=f380/0/0x0 softirq=1272/1273 fqs=4 (false positive?)
  rcu:    (detected by 2, t=60002 jiffies, g=3673, q=12382 ncpus=16)
  Sending NMI from CPU 2 to CPUs 14:
  NMI backtrace for cpu 14 skipped: idling at cpu_idle_poll.isra.0+0x50/0x170
  rcu: rcu_preempt kthread timer wakeup didn't happen for 59984 jiffies! g3673 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
  rcu:    Possible timer handling issue on cpu=4 timer-softirq=170
  rcu: rcu_preempt kthread starved for 59987 jiffies! g3673 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
  rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  rcu: RCU grace-period kthread stack dump:
  task:rcu_preempt     state:I stack:0     pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00000010
  Call trace:
   __switch_to+0x100/0x1c8 (T)
   __schedule+0x2b0/0x710
   schedule+0x3c/0xc0
   schedule_timeout+0x88/0x128
   rcu_gp_fqs_loop+0x12c/0x640
   rcu_gp_kthread+0x308/0x350
   kthread+0x10c/0x128
   ret_from_fork+0x10/0x20
  rcu: Stack dump where RCU GP kthread last ran:
  Sending NMI from CPU 2 to CPUs 4:
  NMI backtrace for cpu 4 skipped: idling at cpu_idle_poll.isra.0+0x50/0x170
  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:    0-...!: (21 GPs behind) idle=a4a0/0/0x0 softirq=1775/1776 fqs=0 (false positive?)
  rcu:    3-...!: (28 GPs behind) idle=5b00/0/0x0 softirq=1437/1438 fqs=0 (false positive?)
  rcu:    7-...!: (21 GPs behind) idle=0c18/0/0x0 softirq=1658/1659 fqs=0 (false positive?)
  rcu:    8-...!: (21 GPs behind) idle=1418/0/0x0 softirq=1231/1231 fqs=0 (false positive?)
  rcu:    9-...!: (18 GPs behind) idle=1288/0/0x0 softirq=1440/1440 fqs=0 (false positive?)
  rcu:    12-...!: (21 GPs behind) idle=ae70/0/0x0 softirq=1339/1339 fqs=0 (false positive?)
  rcu:    13-...!: (28 GPs behind) idle=02c8/0/0x0 softirq=1785/1787 fqs=0 (false positive?)
  rcu:    14-...!: (21 GPs behind) idle=f428/0/0x0 softirq=1272/1273 fqs=0 (false positive?)
  rcu:    15-...!: (21 GPs behind) idle=0fb8/0/0x0 softirq=1562/1562 fqs=0 (false positive?)
  rcu:    (detected by 5, t=60002 jiffies, g=3677, q=12637 ncpus=16)
  Sending NMI from CPU 5 to CPUs 0:
  NMI backtrace for cpu 0 skipped: idling at cpu_idle_poll.isra.0+0x38/0x170
  Sending NMI from CPU 5 to CPUs 3:
  NMI backtrace for cpu 3 skipped: idling at cpu_idle_poll.isra.0+0x38/0x170
  Sending NMI from CPU 5 to CPUs 7:
  NMI backtrace for cpu 7 skipped: idling at cpu_idle_poll.isra.0+0x40/0x170
  Sending NMI from CPU 5 to CPUs 8:
  NMI backtrace for cpu 8 skipped: idling at cpu_idle_poll.isra.0+0x40/0x170
  Sending NMI from CPU 5 to CPUs 9:
  NMI backtrace for cpu 9 skipped: idling at cpu_idle_poll.isra.0+0x40/0x170
  Sending NMI from CPU 5 to CPUs 12:
  NMI backtrace for cpu 12 skipped: idling at cpu_idle_poll.isra.0+0x40/0x170
  Sending NMI from CPU 5 to CPUs 13:
  NMI backtrace for cpu 13 skipped: idling at cpu_idle_poll.isra.0+0x50/0x170
  Sending NMI from CPU 5 to CPUs 14:
  NMI backtrace for cpu 14 skipped: idling at cpu_idle_poll.isra.0+0x50/0x170
  Sending NMI from CPU 5 to CPUs 15:
  NMI backtrace for cpu 15 skipped: idling at cpu_idle_poll.isra.0+0x38/0x170
  rcu: rcu_preempt kthread timer wakeup didn't happen for 60008 jiffies! g3677 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
  rcu:    Possible timer handling issue on cpu=4 timer-softirq=170
  rcu: rcu_preempt kthread starved for 60011 jiffies! g3677 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=4
  rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
  rcu: RCU grace-period kthread stack dump:
  task:rcu_preempt     state:I stack:0     pid:16    tgid:16    ppid:2      task_flags:0x208040 flags:0x00000010
  Call trace:
   __switch_to+0x100/0x1c8 (T)
   __schedule+0x2b0/0x710
   schedule+0x3c/0xc0
   schedule_timeout+0x88/0x128
   rcu_gp_fqs_loop+0x12c/0x640
   rcu_gp_kthread+0x308/0x350
   kthread+0x10c/0x128
   ret_from_fork+0x10/0x20
  rcu: Stack dump where RCU GP kthread last ran:
  Sending NMI from CPU 5 to CPUs 4:
  NMI backtrace for cpu 4
  CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Not tainted 7.0.0-rc7-next-20260409 #1 PREEMPT(lazy)
  Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : tick_check_broadcast_expired+0x4/0x40
  lr : cpu_idle_poll.isra.0+0x54/0x170
  sp : ffff80008017be20
  x29: ffff80008017be20 x28: 0000000000000000 x27: 0000000000000000
  x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
  x23: 00000000000000c0 x22: ffffb3ce21fad000 x21: 0000000000000004
  x20: ffffb3ce21fadd50 x19: ffffb3ce21fad000 x18: 0000000000000004
  x17: 0000000000000000 x16: 0000000000000000 x15: ffffb3ce21fb3b98
  x14: ffffb3ce21788180 x13: 0000000000000000 x12: 000000124d69be59
  x11: 00000000000000c0 x10: 0000000000001c80 x9 : ffffb3ce1f8a6e68
  x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000004
  x5 : ffff00275c3682c8 x4 : 0000000000020a3c x3 : 0000000000000000
  x2 : 0000000000000004 x1 : ffffb3ce223ca0c0 x0 : ffff002020da2140
  Call trace:
   tick_check_broadcast_expired+0x4/0x40 (P)
   do_idle+0x64/0x130
   cpu_startup_entry+0x40/0x50
   secondary_start_kernel+0xe4/0x128
   __secondary_switched+0xc0/0xc8
  rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
  rcu:    0-...!: (22 GPs behind) idle=ae48/0/0x0 softirq=1775/1776 fqs=0 (false positive?)
  rcu:    3-...!: (29 GPs behind) idle=7ce8/0/0x0 softirq=1437/1438 fqs=0 (false positive?)
  rcu:    7-...!: (22 GPs behind) idle=0df8/0/0x0 softirq=1658/1659 fqs=0 (false positive?)
  rcu:    8-...!: (22 GPs behind) idle=1548/0/0x0 softirq=1231/1231 fqs=0 (false positive?)
  rcu:    9-...!: (19 GPs behind) idle=1360/0/0x0 softirq=1440/1440 fqs=0 (false positive?)
  rcu:    12-...!: (22 GPs behind) idle=af40/0/0x0 softirq=1339/1339 fqs=0 (false positive?)
  rcu:    13-...!: (29 GPs behind) idle=04e0/0/0x0 softirq=1785/1787 fqs=0 (false positive?)
  rcu:    14-...!: (22 GPs behind) idle=f528/0/0x0 softirq=1272/1273 fqs=0 (false positive?)
  rcu:    15-...!: (22 GPs behind) idle=0fd8/0/0x0 softirq=1562/1562 fqs=0 (false positive?)
  rcu:    (detected by 5, t=60002 jiffies, g=3681, q=13149 ncpus=16)

but other times, there is no output after it locks up. Is there any
initial information I can provide to help debug this? Reverting the
change on top of next-20260409 avoids the issue.

Cheers,
Nathan

# bad: [3fa7d958829eb9bc3b469ed07f11de3d2804ef71] Add linux-next specific files for 20260409
# good: [7f87a5ea75f011d2c9bc8ac0167e5e2d1adb1594] Merge tag 'hid-for-linus-2026040801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
git bisect start '3fa7d958829eb9bc3b469ed07f11de3d2804ef71' '7f87a5ea75f011d2c9bc8ac0167e5e2d1adb1594'
# bad: [443e04732ac2cdc17e3b90aa2345730a298fab37] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
git bisect bad 443e04732ac2cdc17e3b90aa2345730a298fab37
# bad: [ea33e83d9fa24b34e79c8df57b8927a8d94deb15] Merge branch 'xtensa-for-next' of https://github.com/jcmvbkbc/linux-xtensa.git
git bisect bad ea33e83d9fa24b34e79c8df57b8927a8d94deb15
# bad: [429057750b3d3a7477df48d17aa605dc47bc2344] Merge branch 'for-next/perf' of https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git
git bisect bad 429057750b3d3a7477df48d17aa605dc47bc2344
# bad: [e98894f89da72f392141d9eecf1c7a8f13faa67f] Merge branch 'mm-stable' of https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
git bisect bad e98894f89da72f392141d9eecf1c7a8f13faa67f
# good: [668937b7b2256f4b2a982e8f69b07d9ee8f81d36] mm: allow handling of stacked mmap_prepare hooks in more drivers
git bisect good 668937b7b2256f4b2a982e8f69b07d9ee8f81d36
# good: [a0fbc8dd44a27011537268e2a974b1180b848796] Merge branch 'dma-mapping-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux.git
git bisect good a0fbc8dd44a27011537268e2a974b1180b848796
# good: [8a23051ed8584215b22368e9501f771ef98f0c1d] Merge tag 'pin-init-v7.1' of https://github.com/Rust-for-Linux/linux into rust-next
git bisect good 8a23051ed8584215b22368e9501f771ef98f0c1d
# good: [716b25a9dc20f4fb94d521581331a0565a43f3bb] Merge branch 'urgent' of https://git.kernel.org/pub/scm/linux/kernel/git/efi/efi.git
git bisect good 716b25a9dc20f4fb94d521581331a0565a43f3bb
# bad: [1a49dc272e25dae6cbb506a02bb70e0201a1498e] Merge branch 'tip/urgent' of https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
git bisect bad 1a49dc272e25dae6cbb506a02bb70e0201a1498e
# good: [30023353b2171cd36b10615a788a985f5caa29e3] Merge branch into tip/master: 'sched/urgent'
git bisect good 30023353b2171cd36b10615a788a985f5caa29e3
# good: [34ef164adaf00982d5f45037a7e37689c4555271] Merge branch 'i2c/i2c-host-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux.git
git bisect good 34ef164adaf00982d5f45037a7e37689c4555271
# bad: [4fc7108ff756267ad53ecdeaa1e847d378887511] Merge branch into tip/master: 'timers/urgent'
git bisect bad 4fc7108ff756267ad53ecdeaa1e847d378887511
# bad: [1c2eabb8805d9fd79a19de5c76d4a64c9ad3cdf4] clockevents: Prevent timer interrupt starvation
git bisect bad 1c2eabb8805d9fd79a19de5c76d4a64c9ad3cdf4
# good: [82b915051d32a68ea3bbe261c93f5620699ff047] tick/nohz: Fix inverted return value in check_tick_dependency() fast path
git bisect good 82b915051d32a68ea3bbe261c93f5620699ff047
# first bad commit: [1c2eabb8805d9fd79a19de5c76d4a64c9ad3cdf4] clockevents: Prevent timer interrupt starvation


Thread overview: 54+ messages
2026-04-07  8:54 [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07  8:54 ` [patch 01/12] clockevents: Prevent timer " Thomas Gleixner
2026-04-07  9:42   ` Peter Zijlstra
2026-04-07 11:30     ` Thomas Gleixner
2026-04-07 11:49       ` Peter Zijlstra
2026-04-07 13:59         ` Thomas Gleixner
2026-04-07 14:00   ` Frederic Weisbecker
2026-04-07 16:08     ` Thomas Gleixner
2026-04-07 18:01       ` Thomas Gleixner
2026-04-07 14:33   ` Thomas Gleixner
2026-04-07 20:23   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
2026-04-08 12:41   ` [patch 01/12] " Thomas Weißschuh
2026-04-08 13:55     ` Thomas Weißschuh
2026-04-08 15:18       ` Thomas Gleixner
2026-04-08 14:15   ` Frederic Weisbecker
2026-04-08 15:09   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
2026-04-10 20:52   ` Nathan Chancellor [this message]
2026-04-10 21:02     ` [patch 01/12] " Thomas Gleixner
2026-04-10 21:13       ` Nathan Chancellor
2026-04-10 20:54   ` [tip: timers/urgent] " tip-bot2 for Thomas Gleixner
2026-04-07  8:54 ` [patch 02/12] hrtimer: Provide hrtimer_start_range_ns_user() Thomas Gleixner
2026-04-07  9:54   ` Peter Zijlstra
2026-04-07 11:32     ` Thomas Gleixner
2026-04-07  9:57   ` Peter Zijlstra
2026-04-07 11:34     ` Thomas Gleixner
2026-04-07  8:54 ` [patch 03/12] hrtimer: Use hrtimer_start_expires_user() for hrtimer sleepers Thomas Gleixner
2026-04-07  9:59   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 04/12] posix-timers: Expand timer_[re]arm() callbacks with a boolean return value Thomas Gleixner
2026-04-07 10:00   ` Peter Zijlstra
2026-04-07 20:20   ` John Stultz
2026-04-07  8:54 ` [patch 05/12] posix-timers: Handle the timer_[re]arm() " Thomas Gleixner
2026-04-07 10:01   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 06/12] posix-timers: Switch to hrtimer_start_expires_user() Thomas Gleixner
2026-04-07 10:01   ` Peter Zijlstra
2026-04-07  8:54 ` [patch 07/12] alarmtimer: Provide alarmtimer_start() Thomas Gleixner
2026-04-07 10:04   ` Peter Zijlstra
2026-04-07 11:34     ` Thomas Gleixner
2026-04-07 20:23   ` John Stultz
2026-04-07  8:54 ` [patch 08/12] alarmtimer: Convert posix timer functions to alarmtimer_start() Thomas Gleixner
2026-04-07 20:19   ` John Stultz
2026-04-07  8:54 ` [patch 09/12] fs/timerfd: Use the new alarm/hrtimer functions Thomas Gleixner
2026-04-07 10:09   ` Peter Zijlstra
2026-04-07 11:41     ` Thomas Gleixner
2026-04-07  8:55 ` [patch 10/12] power: supply: charger-manager: Switch to alarmtimer_start() Thomas Gleixner
2026-04-07 10:11   ` Peter Zijlstra
2026-04-07  8:55 ` [patch 11/12] netfilter: xt_IDLETIMER: " Thomas Gleixner
2026-04-07  8:55 ` [patch 12/12] alarmtimer: Remove unused interfaces Thomas Gleixner
2026-04-07 20:21   ` John Stultz
2026-04-07 14:43 ` [patch 00/12] hrtimers: Prevent hrtimer interrupt starvation Thomas Gleixner
2026-04-07 16:17   ` Thomas Gleixner
2026-04-07 17:38 ` Calvin Owens
2026-04-07 18:03   ` Thomas Gleixner
2026-04-07 18:35     ` Calvin Owens
2026-04-07 20:58       ` Thomas Gleixner
