From: Thomas Gleixner <tglx@kernel.org>
To: Bert Karwatzki <spasswolf@web.de>, linux-kernel@vger.kernel.org
Cc: "Bert Karwatzki" <spasswolf@web.de>,
linux-next@vger.kernel.org,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Clark Williams" <clrkwllms@kernel.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Christian König" <christian.koenig@amd.com>,
regressions@lists.linux.dev, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
acpica-devel@lists.linux.dev,
"Robert Moore" <robert.moore@intel.com>,
"Saket Dumbre" <saket.dumbre@intel.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Clemens Ladisch" <clemens@ladisch.de>,
"Jinchao Wang" <wangjinchao600@gmail.com>,
"Yury Norov" <yury.norov@gmail.com>,
"Anna Schumaker" <anna.schumaker@oracle.com>,
"Baoquan He" <bhe@redhat.com>,
"Darrick J. Wong" <djwong@kernel.org>,
"Dave Young" <dyoung@redhat.com>,
"Doug Anderson" <dianders@chromium.org>,
"Guilherme G. Piccoli" <gpiccoli@igalia.com>,
"Helge Deller" <deller@gmx.de>, "Ingo Molnar" <mingo@kernel.org>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Joanthan Cameron" <Jonathan.Cameron@huawei.com>,
"Joel Granados" <joel.granados@kernel.org>,
"John Ogness" <john.ogness@linutronix.de>,
"Kees Cook" <kees@kernel.org>, "Li Huafei" <lihuafei1@huawei.com>,
"Luck, Tony" <tony.luck@intel.com>,
"Luo Gengkun" <luogengkun@huaweicloud.com>,
"Max Kellermann" <max.kellermann@ionos.com>,
"Nam Cao" <namcao@linutronix.de>,
oushixiong <oushixiong@kylinos.cn>,
"Petr Mladek" <pmladek@suse.com>,
"Qianqiang Liu" <qianqiang.liu@163.com>,
"Sergey Senozhatsky" <senozhatsky@chromium.org>,
"Sohil Mehta" <sohil.mehta@intel.com>,
"Tejun Heo" <tj@kernel.org>,
"Thomas Zimemrmann" <tzimmermann@suse.de>,
"Thorsten Blum" <thorsten.blum@linux.dev>,
"Ville Syrjala" <ville.syrjala@linux.intel.com>,
"Vivek Goyal" <vgoyal@redhat.com>,
"Yicong Yang" <yangyicong@hisilicon.com>,
"Yunhui Cui" <cuiyunhui@bytedance.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
W_Armin@gmx.de
Subject: Re: NMI stack overflow during resume of PCIe bridge with CONFIG_HARDLOCKUP_DETECTOR=y
Date: Tue, 13 Jan 2026 16:24:46 +0100
Message-ID: <87h5spk01t.ffs@tglx>
In-Reply-To: <20260113094129.3357-1-spasswolf@web.de>
On Tue, Jan 13 2026 at 10:41, Bert Karwatzki wrote:
> Here's the result in case of the crash:
> 2026-01-12T04:24:36.809904+01:00 T1510;acpi_ex_system_memory_space_handler 255: logical_addr_ptr = ffffc066977b3000
> 2026-01-12T04:24:36.846170+01:00 C14;exc_nmi: 0
Here the NMI triggers in non-task context on CPU14.
> 2026-01-12T04:24:36.960760+01:00 C14;exc_nmi: 10.3
> 2026-01-12T04:24:36.960760+01:00 C14;default_do_nmi
> 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: type=0x0
> 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: a=0xffffffffa1612de0
> 2026-01-12T04:24:36.960760+01:00 C14;nmi_handle: a->handler=perf_event_nmi_handler+0x0/0xa6
> 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 0
> 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 1
> 2026-01-12T04:24:36.960760+01:00 C14;perf_event_nmi_handler: 2
> 2026-01-12T04:24:36.960760+01:00 C14;x86_pmu_handle_irq: 2
> 2026-01-12T04:24:36.960760+01:00 C14;x86_pmu_handle_irq: 2.6
> 2026-01-12T04:24:36.960760+01:00 C14;__perf_event_overflow: 0
> 2026-01-12T04:24:36.960760+01:00 C14;__perf_event_overflow: 6.99: overflow_handler=watchdog_overflow_callback+0x0/0x10d
> 2026-01-12T04:24:36.960760+01:00 C14;watchdog_overflow_callback: 0
> 2026-01-12T04:24:36.960760+01:00 C14;__ktime_get_fast_ns_debug: 0.1
> 2026-01-12T04:24:36.960760+01:00 C14;tk_clock_read_debug: read=read_hpet+0x0/0xf0
> 2026-01-12T04:24:36.960760+01:00 C14;read_hpet: 0
> 2026-01-12T04:24:36.960760+01:00 C14;read_hpet: 0.1
> 2026-01-12T04:24:36.960760+01:00 T0;exc_nmi: 0
This one triggers in task context of PID 0, aka the idle task, but it's
not clear on which CPU that happens. It's probably CPU13, as that
continues with the expected 10.3 output, but ~1.71 seconds later.
> 2026-01-12T04:24:38.674625+01:00 C13;exc_nmi: 10.3
> 2026-01-12T04:24:38.674625+01:00 C13;default_do_nmi
> 2026-01-12T04:24:38.674625+01:00 C13;nmi_handle: type=0x0
> 2026-01-12T04:24:38.674625+01:00 C13;nmi_handle: a=0xffffffffa1612de0
> 2026-01-12T04:24:38.674625+01:00 C13;nmi_handle: a->handler=perf_event_nmi_handler+0x0/0xa6
> 2026-01-12T04:24:38.674625+01:00 C13;perf_event_nmi_handler: 0
> 2026-01-12T04:24:38.674625+01:00 C13;perf_event_nmi_handler: 1
> 2026-01-12T04:24:38.674625+01:00 C13;perf_event_nmi_handler: 2
> 2026-01-12T04:24:38.674625+01:00 C13;x86_pmu_handle_irq: 2
> 2026-01-12T04:24:38.674625+01:00 C13;x86_pmu_handle_irq: 2.6
> 2026-01-12T04:24:38.674625+01:00 C13;__perf_event_overflow: 0
> 2026-01-12T04:24:38.674625+01:00 C13;__perf_event_overflow: 6.99: overflow_handler=watchdog_overflow_callback+0x0/0x10d
> 2026-01-12T04:24:38.674625+01:00 C13;watchdog_overflow_callback: 0
> 2026-01-12T04:24:38.674625+01:00 C13;__ktime_get_fast_ns_debug: 0.1
> 2026-01-12T04:24:38.674625+01:00 C13;tk_clock_read_debug: read=read_hpet+0x0/0xf0
> 2026-01-12T04:24:38.674625+01:00 C13;read_hpet: 0
> 2026-01-12T04:24:38.674625+01:00 C13;read_hpet: 0.1
> 2026-01-12T04:24:38.674625+01:00 T0;exc_nmi: 0
Same picture as above, but this time on CPU2 and with a delay of 0.68
seconds.
> 2026-01-12T04:24:39.355101+01:00 C2;exc_nmi: 10.3
> 2026-01-12T04:24:39.355101+01:00 C2;default_do_nmi
> 2026-01-12T04:24:39.355101+01:00 C2;nmi_handle: type=0x0
> 2026-01-12T04:24:39.355101+01:00 C2;nmi_handle: a=0xffffffffa1612de0
> 2026-01-12T04:24:39.355101+01:00 C2;nmi_handle: a->handler=perf_event_nmi_handler+0x0/0xa6
> 2026-01-12T04:24:39.355101+01:00 C2;perf_event_nmi_handler: 0
> 2026-01-12T04:24:39.355101+01:00 C2;perf_event_nmi_handler: 1
> 2026-01-12T04:24:39.355101+01:00 C2;perf_event_nmi_handler: 2
> 2026-01-12T04:24:39.355101+01:00 C2;x86_pmu_handle_irq: 2
> 2026-01-12T04:24:39.355101+01:00 C2;x86_pmu_handle_irq: 2.6
> 2026-01-12T04:24:39.355101+01:00 C2;__perf_event_overflow: 0
> 2026-01-12T04:24:39.355101+01:00 C2;__perf_event_overflow: 6.99: overflow_handler=watchdog_overflow_callback+0x0/0x10d
> 2026-01-12T04:24:39.355101+01:00 C2;watchdog_overflow_callback: 0
> 2026-01-12T04:24:39.355101+01:00 C2;__ktime_get_fast_ns_debug: 0.1
> 2026-01-12T04:24:39.355101+01:00 C2;tk_clock_read_debug: read=read_hpet+0x0/0xf0
> 2026-01-12T04:24:39.355101+01:00 C2;read_hpet: 0
> 2026-01-12T04:24:39.355101+01:00 C2;read_hpet: 0.1
> 2026-01-12T04:24:39.355101+01:00 T0;exc_nmi: 0
Again on CPU0, with a delay of 0.06 seconds.
> 2026-01-12T04:24:39.410207+01:00 C0;exc_nmi: 10.3
> 2026-01-12T04:24:39.410207+01:00 C0;default_do_nmi
> 2026-01-12T04:24:39.410207+01:00 C0;nmi_handle: type=0x0
> 2026-01-12T04:24:39.410207+01:00 C0;nmi_handle: a=0xffffffffa1612de0
> 2026-01-12T04:24:39.410207+01:00 C0;nmi_handle: a->handler=perf_event_nmi_handler+0x0/0xa6
> 2026-01-12T04:24:39.410207+01:00 C0;perf_event_nmi_handler: 0
> 2026-01-12T04:24:39.410207+01:00 C0;perf_event_nmi_handler: 1
> 2026-01-12T04:24:39.410207+01:00 C0;perf_event_nmi_handler: 2
> 2026-01-12T04:24:39.410207+01:00 C0;x86_pmu_handle_irq: 2
> 2026-01-12T04:24:39.410207+01:00 C0;x86_pmu_handle_irq: 2.6
> 2026-01-12T04:24:39.410207+01:00 C0;__perf_event_overflow: 0
> 2026-01-12T04:24:39.410207+01:00 C0;__perf_event_overflow: 6.99: overflow_handler=watchdog_overflow_callback+0x0/0x10d
> 2026-01-12T04:24:39.410207+01:00 C0;watchdog_overflow_callback: 0
> 2026-01-12T04:24:39.410207+01:00 C0;__ktime_get_fast_ns_debug: 0.1
> 2026-01-12T04:24:39.410207+01:00 C0;tk_clock_read_debug: read=read_hpet+0x0/0xf0
> 2026-01-12T04:24:39.410207+01:00 C0;read_hpet: 0
> 2026-01-12T04:24:39.410207+01:00 C0;read_hpet: 0.1
> 2026-01-12T04:24:39.410207+01:00 T0;exc_nmi: 0
....
> In the case of the crash the interrupt handler never returns because when accessing
> the HPET another NMI is triggered. This goes on until a crash happens, probably because
> of stack overflow.
No. NMI nesting is only one level deep; a nested NMI immediately
returns:
    if (this_cpu_read(nmi_state) != NMI_NOT_RUNNING) {
            this_cpu_write(nmi_state, NMI_LATCHED);
            return;
    }
So it's not a stack overflow. What's more likely is that after a while
_ALL_ CPUs are hung up in the NMI handler after they tripped over the
HPET read.
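For reference, the first NMI replays a latched one instead of nesting. A
condensed sketch of that loop in exc_nmi() (arch/x86/kernel/nmi.c; CR2
fixup and instrumentation omitted, details vary by kernel version):

    this_cpu_write(nmi_state, NMI_EXECUTING);
nmi_restart:
    default_do_nmi(regs);
    /*
     * NMI_LATCHED (2) decrements to NMI_EXECUTING (1): replay the
     * latched NMI in the same stack frame. NMI_EXECUTING (1)
     * decrements to NMI_NOT_RUNNING (0): done. Either way the stack
     * depth never grows.
     */
    if (this_cpu_dec_return(nmi_state))
            goto nmi_restart;

So when default_do_nmi() never returns because the HPET read stalls, the
CPU just sits in the handler forever; nothing overflows.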
> The behaviour described here seems to be similar to the bug that commit
> 3d5f4f15b778 ("watchdog: skip checks when panic is in progress") is fixing, but
> this is actually a different bug as kernel 6.18 (which contains 3d5f4f15b778)
> is also affected (I've conducted 5 tests with 6.18 so far and got 4 crashes (crashes occurred
> after 0.5h, 1h, 4.5h and 1.5h of testing)).
> Nevertheless these look similar enough to CC the involved people.
There is nothing similar.
Your problem originates from a screwed up hardware state which in turn
causes the HPET to go haywire for unknown reasons.
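In NMI context the watchdog's time read boils down to a single MMIO
load. A simplified sketch of the clocksource read (per
arch/x86/kernel/hpet.c, SMP contention handling omitted):

    /*
     * Simplified from arch/x86/kernel/hpet.c: in_nmi() readers take
     * the uncontended path, i.e. one readl() of the HPET main
     * counter. If the device behind that mapping stops responding,
     * the load itself can stall the CPU inside the NMI handler.
     */
    static u64 read_hpet(struct clocksource *cs)
    {
            return (u64)hpet_readl(HPET_COUNTER);
    }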
What is the physical address of this ACPI handler access:

    logical_addr_ptr = ffffc066977b3000

Please also provide the full output of /proc/iomem.
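In case it helps, a hypothetical one-liner to capture it at the source:
in acpi_ex_system_memory_space_handler() (drivers/acpi/acpica/exregion.c)
the 'address' argument is the physical address that gets mapped to
logical_addr_ptr, so your debug print could emit both:

    /*
     * Hypothetical extension of the existing debug print; 'address'
     * is the acpi_physical_address parameter of the handler.
     */
    pr_info("%s: address = 0x%llx, logical_addr_ptr = %p\n",
            __func__, (unsigned long long)address, logical_addr_ptr);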
Thanks,
tglx