public inbox for linux-kernel@vger.kernel.org
From: Thomas Gleixner <tglx@kernel.org>
To: Calvin Owens <calvin@wbinvd.org>
Cc: Petr Mladek <pmladek@suse.com>,
	linux-kernel@vger.kernel.org, arighi@nvidia.com,
	yaozhenguo1@gmail.com, tj@kernel.org,
	feng.tang@linux.alibaba.com, lirongqing@baidu.com,
	realwujing@gmail.com, hu.shengming@zte.com.cn,
	dianders@chromium.org, joel.granados@kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Frederic Weisbecker <frederic@kernel.org>,
	Anna-Maria Behnsen <anna-maria@linutronix.de>,
	x86@kernel.org
Subject: Re: [BUG] Random hard lockup with userspace %ip on 7.0-rc5
Date: Wed, 01 Apr 2026 17:01:00 +0200
Message-ID: <875x6a913n.ffs@tglx>
In-Reply-To: <acx7qSVUWkJOwglp@mozart.vkv.me>

On Tue, Mar 31 2026 at 18:58, Calvin Owens wrote:
> On Wednesday 03/25 at 17:56 +0100, Thomas Gleixner wrote:
> The below userspace reproducer consistently triggers the hard lockup
> on two different machines with an AMD 7950X3D and an AMD 9950X3D CPU.

Is that instantaneous or does it take some time?

> However, it never reproduces at all on a Xeon E-2124. Maybe a clue?

Not really, but there is a difference in how the timer hardware is
programmed. The Xeon uses the TSC deadline timer, while the AMD CPUs use
the good old local APIC timer. You can disable the deadline timer on the
Xeon with 'lapic=notscdeadline' on the kernel command line, so that it
programs the local APIC timer the same way as the AMD machines.

> I wish I had a nice clever story for how I found it, but I just guessed
> based on how systemd uses timerfd_settime().

:)

> #ifndef NR_THREADS
> #define NR_THREADS 32
> #endif
>
> static void set(int fd)
> {
> 	struct itimerspec new = {
> 		.it_value = {
> 			.tv_sec = 0,
> 			.tv_nsec = 1,
> 		},
> 	};
>
> 	if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
> 			    &new, NULL))
> 		err(2, "Can't set timer");

So this [re]starts the timer, which expires immediately, most likely even
before the syscall returns: an absolute expiry of 1ns on CLOCK_MONOTONIC
is already in the past. TFD_TIMER_CANCEL_ON_SET has no effect because the
timer is based on CLOCK_MONOTONIC, which cannot be set.

> static void *fn(void *arg)
> {
> 	int fd = timerfd_create(CLOCK_MONOTONIC, 0);
>
> 	while (1)
> 		set(fd);

and does so in an endless loop in NR_THREADS threads in parallel. That
means all 32 CPUs are hogged by this, but the scheduler still has full
control of the tasks, so there is no good explanation for why the machine
would actually lock up.

Now that you have a reproducer, can you verify that the machine really
locks up hard? Disable the NMI watchdog either via the kernel command
line 'nmi_watchdog=0' or via echo 0 >/proc/sys/kernel/nmi_watchdog.

If the machine then stays usable while the reproducer runs, the watchdog
is hallucinating.

Thanks,

        tglx

Thread overview: 22+ messages
2026-03-24 23:32 [BUG] Random hard lockup with userspace %ip on 7.0-rc5 Calvin Owens
2026-03-25  9:03 ` Petr Mladek
2026-03-25 16:56   ` Thomas Gleixner
2026-04-01  1:58     ` Calvin Owens
2026-04-01 15:01       ` Thomas Gleixner [this message]
2026-04-01 15:12         ` Borislav Petkov
2026-04-01 16:34         ` Borislav Petkov
2026-04-02 17:07           ` [PATCH] clockevents: Prevent timer interrupt starvation Thomas Gleixner
2026-04-03  5:11             ` Calvin Owens
2026-04-03 14:41               ` Thomas Gleixner
2026-04-03 15:58                 ` Calvin Owens
2026-04-03 19:00                   ` Thomas Gleixner
2026-04-04  0:15                     ` Calvin Owens
2026-04-03 12:16             ` Peter Zijlstra
2026-04-03 14:43               ` Thomas Gleixner
2026-04-03 16:17               ` Thomas Gleixner
2026-04-03 21:01                 ` Peter Zijlstra
2026-04-03 21:24                   ` Thomas Gleixner
2026-04-03 22:14                     ` Thomas Gleixner
2026-04-03 22:21                       ` Peter Zijlstra
2026-03-27  1:36   ` [BUG] Random hard lockup with userspace %ip on 7.0-rc5 Feng Tang
2026-03-27 15:36     ` Calvin Owens
