From: Thomas Gleixner <tglx@kernel.org>
To: Calvin Owens <calvin@wbinvd.org>
Cc: Borislav Petkov <bp@alien8.de>, Petr Mladek <pmladek@suse.com>,
	linux-kernel@vger.kernel.org, arighi@nvidia.com,
	yaozhenguo1@gmail.com, tj@kernel.org,
	feng.tang@linux.alibaba.com, lirongqing@baidu.com,
	realwujing@gmail.com, hu.shengming@zte.com.cn,
	dianders@chromium.org, joel.granados@kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	x86@kernel.org
Subject: Re: [PATCH] clockevents: Prevent timer interrupt starvation
Date: Wed, 08 Apr 2026 10:52:13 +0200
Message-ID: <875x614ywy.ffs@tglx>
In-Reply-To: <adBYDhMBYsv4RAp2@mozart.vkv.me>

On Fri, Apr 03 2026 at 17:15, Calvin Owens wrote:
> On Friday 04/03 at 21:00 +0200, Thomas Gleixner wrote:
>> Btw, I'm really curious how you deduced the reproducer from systemd
>> code. I assume you figured somehow out which program triggered the
>> behaviour and then inspected the source to find something fishy. Can you
>> provide a pointer to the code in question? If they really do what your
>> reproducer does, then this code needs to be fixed too :)
>
> I pulled the text that was executing when the NMI fired out of the dump:
>
>     00 ba 38 03 00 00 48 8d 35 ce 40 18 00 48 8d 3d 16 41 18 00 e8 11 14
>     e8 ff b8 f4 ff ff ff e9 6d ff ff ff 0f 1f 80 00 00 00 00 0f b6 4f 2f
>     48 8d 15 e5 5f 26 00 48 89 c8 83 e0 03 48 c1 e0 05 48
>
> ...and searched for it in systemd-networkd and all its libs. It appears
> in one spot in libsystemd-shared-259.so in path_hash_func(), so that
> must be where the userspace %ip was when the NMI fired.

Amazing.
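
For readers who want to repeat that kind of search, here is a minimal
sketch of matching the bytes from an oops "Code:" dump against a tree of
shared objects. The function names and the "*.so*" glob are hypothetical
and not taken from the thread. Since x86-64 code like this embeds
RIP-relative displacements, an exact match is only expected against the
same build of the library.

```python
from pathlib import Path


def parse_hex_dump(text):
    """Turn whitespace-separated hex pairs (as printed in the dump) into bytes."""
    return bytes.fromhex("".join(text.split()))


def find_pattern(path, pattern):
    """Return every file offset at which `pattern` occurs in the file at `path`."""
    data = Path(path).read_bytes()
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets


def scan_tree(root, pattern):
    """Map each shared object below `root` that contains `pattern` to its offsets."""
    hits = {}
    for candidate in sorted(Path(root).rglob("*.so*")):
        if candidate.is_file():
            offsets = find_pattern(candidate, pattern)
            if offsets:
                hits[str(candidate)] = offsets
    return hits
```

A single hit, as in the case described above, pins the bytes to one
location in one library; mapping that file offset to a symbol is a
separate step (for example with a disassembler) and is not shown here.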

> Unfortunately that has too many callers: I couldn't narrow it down
> meaningfully from there. Despite staring at a lot of timer code in
> systemd, I haven't yet found anything concrete that might cause buggy
> behavior.
>
> But, it stuck out at me that the detritus on the stack wasn't futex() or
> poll() or read() related. It seemed wildly improbable that the NMI
> would have just happened to catch systemd-networkd running like that, I
> guessed it was probably spinning around timerfd_settime() in userspace
> when the NMI fired (with calls to path_hash_func() somehow in-between).

Right, and there is an explicit timerfd_settime(... { 0, 1 }) in the
event management code.
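
To make that call concrete, here is a minimal sketch of arming a timerfd
with such a value. It reads "{ 0, 1 }" as an it_value of 0 seconds and
1 nanosecond, which is an assumption, and it uses a relative
CLOCK_MONOTONIC timer because the flags are not shown in this message.
The helper names are hypothetical; the calls go to libc through ctypes
and assume 64-bit Linux, where time_t and long have the same size. This
is a single call, not the reproducer from the thread.

```python
import ctypes
import os
import sys

CLOCK_MONOTONIC = 1  # value of the constant in <time.h> on Linux


class Timespec(ctypes.Structure):
    # Assumes time_t is a C long, which holds on 64-bit Linux.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


class Itimerspec(ctypes.Structure):
    _fields_ = [("it_interval", Timespec), ("it_value", Timespec)]


def make_spec(sec, nsec):
    """Build a one-shot itimerspec: zero interval, expiry at sec/nsec."""
    return Itimerspec(Timespec(0, 0), Timespec(sec, nsec))


def arm_once(sec=0, nsec=1):
    """Arm a relative one-shot timerfd and return its expiration count."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.timerfd_settime.argtypes = [
        ctypes.c_int,
        ctypes.c_int,
        ctypes.POINTER(Itimerspec),
        ctypes.POINTER(Itimerspec),
    ]
    fd = libc.timerfd_create(CLOCK_MONOTONIC, 0)
    if fd < 0:
        raise OSError(ctypes.get_errno(), "timerfd_create failed")
    try:
        spec = make_spec(sec, nsec)
        # flags=0 means a relative timer; the old value is not requested.
        if libc.timerfd_settime(fd, 0, ctypes.byref(spec), None) < 0:
            raise OSError(ctypes.get_errno(), "timerfd_settime failed")
        # read() blocks until the timer has expired at least once and
        # returns the expiration count as a native-endian 64-bit integer.
        return int.from_bytes(os.read(fd, 8), sys.byteorder)
    finally:
        os.close(fd)
```

With a 1 ns expiry the timer is effectively already due when the call
returns, so the read completes right away.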

> My initial guess was that the trigger was something about waiting on the
> timer in a different thread than it was set on. I started to write that
> out as a small reproducer, but almost jokingly thought, "well, I should
> just try setting them blindly first and see if that works", and then my
> head exploded when it actually did :)

:)

> I've tried overloading the machine, and triggering some unrealistically
> large time steps back and forth underneath it. But I can't get systemd
> to stick itself in any sort of loop like that, or even set a single
> timer expiry to an unreasonable value.
>
> I think I will set up a little BPF thing to force systemd-networkd to
> dump core if it makes timerfd_settime() calls too quickly or with
> abstime arguments in the past, hopefully from the core I can work out
> what was going on. But any better suggestions are welcome.
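
The check described in the paragraph above can be modelled in plain
userspace code before it is written as a BPF program. The sketch below
is not BPF and not anything posted in the thread: the class names, the
thresholds and the record layout are all hypothetical. It flags a
timerfd_settime() call when an absolute expiry is not in the future, or
when several calls in a row arrive closer together than a minimum gap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SettimeCall:
    when_ns: int    # time of the call, read from the timer's clock
    abstime: bool   # True if TFD_TIMER_ABSTIME was passed
    value_ns: int   # it_value converted to nanoseconds


class SettimeWatch:
    """Flag suspicious timerfd_settime() calls (thresholds are made up)."""

    def __init__(self, min_gap_ns=1_000, burst=3):
        self.min_gap_ns = min_gap_ns
        self.burst = burst
        self._last = None
        self._streak = 0

    def check(self, call: SettimeCall) -> Optional[str]:
        gap = None if self._last is None else call.when_ns - self._last
        self._last = call.when_ns
        # Count consecutive calls that arrive closer than min_gap_ns.
        if gap is not None and gap < self.min_gap_ns:
            self._streak += 1
        else:
            self._streak = 0
        # An it_value of zero disarms the timer, so it is never "in the past".
        if call.abstime and 0 < call.value_ns <= call.when_ns:
            return "abstime-in-past"
        if self._streak >= self.burst:
            return "too-fast"
        return None
```

In the setup described above, a non-None result would be the point at
which the process is forced to dump core.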

It just occurred to me that with the hrtimer changes, you might be able
to utilize the new hrtimer_start_expires tracepoint and enable user
stack traces to get down to the actual root cause.

Thanks,

        tglx
