From: Thomas Gleixner <tglx@kernel.org>
To: Calvin Owens <calvin@wbinvd.org>
Cc: Borislav Petkov <bp@alien8.de>, Petr Mladek <pmladek@suse.com>,
linux-kernel@vger.kernel.org, arighi@nvidia.com,
yaozhenguo1@gmail.com, tj@kernel.org,
feng.tang@linux.alibaba.com, lirongqing@baidu.com,
realwujing@gmail.com, hu.shengming@zte.com.cn,
dianders@chromium.org, joel.granados@kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
Valentin Schneider <vschneid@redhat.com>,
Frederic Weisbecker <frederic@kernel.org>,
Anna-Maria Behnsen <anna-maria@linutronix.de>,
x86@kernel.org
Subject: Re: [PATCH] clockevents: Prevent timer interrupt starvation
Date: Wed, 08 Apr 2026 10:52:13 +0200
Message-ID: <875x614ywy.ffs@tglx>
In-Reply-To: <adBYDhMBYsv4RAp2@mozart.vkv.me>
On Fri, Apr 03 2026 at 17:15, Calvin Owens wrote:
> On Friday 04/03 at 21:00 +0200, Thomas Gleixner wrote:
>> Btw, I'm really curious how you deduced the reproducer from systemd
>> code. I assume you figured somehow out which program triggered the
>> behaviour and then inspected the source to find something fishy. Can you
>> provide a pointer to the code in question? If they really do what your
>> reproducer does, then this code needs to be fixed too :)
>
> I pulled the text that was executing when the NMI fired out of the dump:
>
> 00 ba 38 03 00 00 48 8d 35 ce 40 18 00 48 8d 3d 16 41 18 00 e8 11 14
> e8 ff b8 f4 ff ff ff e9 6d ff ff ff 0f 1f 80 00 00 00 00 0f b6 4f 2f
> 48 8d 15 e5 5f 26 00 48 89 c8 83 e0 03 48 c1 e0 05 48
>
> ...and searched for it in systemd-networkd and all its libs. It appears
> in one spot in libsystemd-shared-259.so in path_hash_func(), so that
> must be where the userspace %ip was when the NMI fired.
Amazing.
> Unfortunately that has too many callers: I couldn't narrow it down
> meaningfully from there. Despite staring at a lot of timer code in
> systemd, I haven't yet found anything concrete that might cause buggy
> behavior.
>
> But, it stuck out at me that the detritus on the stack wasn't futex() or
> poll() or read() related. It seemed wildly improbable that the NMI
> would have just happened to catch systemd-networkd running like that, I
> guessed it was probably spinning around timerfd_settime() in userspace
> when the NMI fired (with calls to path_hash_func() somehow in-between).
Right, and there is an explicit timerfd_settime(... { 0, 1 }) in the
event management code.
> My initial guess was that the trigger was something about waiting on the
> timer in a different thread than it was set on. I started to write that
> out as a small reproducer, but almost jokingly thought, "well, I should
> just try setting them blindly first and see if that works", and then my
> head exploded when it actually did :)
:)
> I've tried overloading the machine, and triggering some unrealistically
> large time steps back and forth underneath it. But I can't get systemd
> to stick itself in any sort of loop like that, or even set a single
> timer expiry to an unreasonable value.
>
> I think I will set up a little BPF thing to force systemd-networkd to
> dump core if it makes timerfd_settime() calls too quickly or with
> abstime arguments in the past, hopefully from the core I can work out
> what was going on. But any better suggestions are welcome.
It just occurred to me that with the hrtimer changes, you might be able
to utilize the new hrtimer_start_expires tracepoint and enable user
stack traces to get down to the actual root cause.
Thanks,
tglx