From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Petr Mladek <pmladek@suse.com>
Cc: "John B. Wyatt IV" <jwyatt@redhat.com>,
John Ogness <john.ogness@linutronix.de>,
Clark Williams <williams@redhat.com>,
jlelli@redhat.com, Derek Barbosa <debarbos@redhat.com>,
"John B. Wyatt IV" <sageofredondo@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
linux-rt-users <linux-rt-users@vger.kernel.org>
Subject: Re: NMI Reported with console_blast.sh
Date: Fri, 1 Mar 2024 11:11:05 +0100
Message-ID: <20240301101105.eUPvRzUv@linutronix.de>
In-Reply-To: <ZeBnUCk598gttpds@alley>
On 2024-02-29 12:15:28 [+0100], Petr Mladek wrote:
> > [ T2481] Call Trace:
> > [ T2477] Kernel panic - not syncing: sysrq triggered crash
> > [ C0] NMI backtrace for cpu 0
>
> This message seems to be printed by nmi_cpu_backtrace().
>
> I am surprised. I would expect to see the backtrace printed from panic().
> It calls dump_stack() directly on the panic-CPU. And this panic() should
> be called from sysrq_handle_crash(). IMHO, it should be (normal)
> interrupt context.
>
> Is it different on RT?
No, it is all okay and the same. panic() was triggered from sysrq, which
is a threaded IRQ, and you don't see this in the output. The panic
triggered an NMI backtrace on CPU0, and it shows the NMI stack. After
that the task stack follows, which could show panic() if the backtrace
showed anything. The entries in the NMI stack trace are prefixed with
"?", so they are a guess based on values found on the stack, not an
actual unwind.
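As a rough illustration of reading such a trace, the small helper below
separates guessed frames from unwound ones by looking for the leading
"?". The function name is made up for this sketch and it assumes any
"> " quote markers have already been stripped from the line.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Return true if a backtrace line is marked as a guess, i.e. its first
 * non-blank character is '?'. Hypothetical helper, not kernel code.
 */
static bool frame_is_guess(const char *line)
{
	/* Skip the indentation in front of the symbol name. */
	while (*line && isspace((unsigned char)*line))
		line++;
	return *line == '?';
}
```

Frames for which this returns true should be treated as hints only when
reconstructing the call chain.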
And this part:
> > 15: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
> > 1a: 0f b6 8f c1 00 00 00 movzbl 0xc1(%rdi),%ecx
> > 21: 0f b7 57 08 movzwl 0x8(%rdi),%edx
> > 25: d3 e6 shl %cl,%esi
> > 27: 01 f2 add %esi,%edx
> > 29: ec in (%dx),%al
> > 2a:* 0f b6 c0 movzbl %al,%eax <-- trapping instruction
> > 2d: c3 ret
The marker suggests that an error of some kind occurred here at 0x2a.
But this instruction is innocent; it simply zeros the upper 24 bits of
the eax register. So this isn't a crash, it is just where the CPU was at
the time this was captured.
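For reference, a minimal C sketch of what the quoted instructions
compute. The parameter names (base, shift, offset) are assumptions for
illustration; the dump only shows a 16-bit load from 0x8(%rdi), an 8-bit
load from 0xc1(%rdi) and a value already in %esi. The port read itself
is left out because it cannot be done from plain user space.

```c
#include <stdint.h>

/*
 * Port number as computed by:
 *   movzbl 0xc1(%rdi),%ecx ; movzwl 0x8(%rdi),%edx
 *   shl %cl,%esi           ; add %esi,%edx
 * "in (%dx),%al" then uses only the low 16 bits of the sum.
 */
static uint16_t port_for_register(uint16_t base, uint8_t shift, uint32_t offset)
{
	/* A 32-bit shl masks the count to 5 bits; mirror that to avoid UB in C. */
	return (uint16_t)(base + (offset << (shift & 31)));
}

/* movzbl %al,%eax: keep bits 0-7, clear bits 8-31. */
static uint32_t movzbl_al_eax(uint32_t eax)
{
	return eax & 0xffu;
}
```

Nothing in the second function can fault, which is why the marked
instruction only tells us where the CPU was, not what went wrong.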
> > <NMI>
> > dump_stack_lvl (lib/dump_stack.c:107)
> > panic (kernel/panic.c:344)
> > nmi_panic (kernel/panic.c:203)
> > watchdog_hardlockup_check (kernel/watchdog.c:180)
> > __perf_event_overflow (kernel/events/core.c:9612 (discriminator 2))
> > handle_pmi_common (arch/x86/events/intel/core.c:3052 (discriminator 1))
> > ? set_pte_vaddr_p4d (arch/x86/mm/init_64.c:307 arch/x86/mm/init_64.c:315)
> > ? flush_tlb_one_kernel (./arch/x86/include/asm/paravirt.h:81 arch/x86/mm/tlb.c:1171 arch/x86/mm/tlb.c:1126)
> > ? native_set_fixmap (arch/x86/mm/pgtable.c:679 arch/x86/mm/pgtable.c:688)
> > ? ghes_copy_tofrom_phys (drivers/acpi/apei/ghes.c:330)
> > intel_pmu_handle_irq (./arch/x86/include/asm/msr.h:84 ./arch/x86/include/asm/msr.h:118 arch/x86/events/intel/core.c:2427 arch/x86/events/intel/core.c:3118)
> > perf_event_nmi_handler (arch/x86/events/core.c:1743 arch/x86/events/core.c:1729)
> > ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383)
> > nmi_handle (arch/x86/kernel/nmi.c:150)
> > ? native_queued_spin_lock_slowpath (./arch/x86/include/asm/vdso/processor.h:13 ./arch/x86/include/asm/vdso/processor.h:18 kernel/locking/qspinlock.c:383)
> > default_do_nmi (arch/x86/kernel/nmi.c:351)
> > exc_nmi (arch/x86/kernel/nmi.c:545)
> > end_repeat_nmi (arch/x86/entry/entry_64.S:1394)
>
> This actually seems to be from perf_event() used by the hardlockup
> detector. It triggers NMI.
And the stack is properly unwound, and this is the third backtrace. The
second had <NMI> in it, suggesting it is from NMI, but it also contained
stack traces from tasks due to sysrq-t.
Now this last backtrace seems fine, and it appears that something tried
to acquire a raw_spinlock_t and that took too long, at which point the
hardlockup watchdog triggered and all we got was this backtrace. Sadly we
don't see the other side to get an idea of which lock it is stuck on.
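To make the mechanism concrete, here is a simplified model of how a
perf-NMI based hardlockup check decides that a CPU is stuck. It is a
sketch, not the code in kernel/watchdog.c; the struct and function names
are invented, and it assumes the waiter was spinning with interrupts
disabled, which the trace does not show directly.

```c
#include <stdbool.h>

/* Simplified per-CPU state; the names are illustrative. */
struct cpu_watch {
	unsigned long timer_irqs;	/* bumped by the periodic timer interrupt */
	unsigned long seen_irqs;	/* value seen at the previous NMI check */
};

/* Timer interrupt handler: only runs while interrupts are enabled. */
static void timer_tick(struct cpu_watch *w)
{
	w->timer_irqs++;
}

/* NMI-side check: no timer progress since the last check means a lockup. */
static bool nmi_check_hardlockup(struct cpu_watch *w)
{
	if (w->timer_irqs == w->seen_irqs)
		return true;
	w->seen_irqs = w->timer_irqs;
	return false;
}
```

A CPU spinning on a lock with interrupts off never reaches timer_tick(),
so the next NMI check reports a lockup and dumps the stack of the
waiter, not of the lock holder.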
> Best Regards,
> Petr
Sebastian