From: Bert Karwatzki <spasswolf@web.de>
To: Thomas Gleixner <tglx@kernel.org>, linux-kernel@vger.kernel.org
Cc: linux-next@vger.kernel.org, spasswolf@web.de,
"Mario Limonciello" <mario.limonciello@amd.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Clark Williams" <clrkwllms@kernel.org>,
"Steven Rostedt" <rostedt@goodmis.org>,
"Christian König" <christian.koenig@amd.com>,
regressions@lists.linux.dev, linux-pci@vger.kernel.org,
linux-acpi@vger.kernel.org,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
acpica-devel@lists.linux.dev,
"Robert Moore" <robert.moore@intel.com>,
"Saket Dumbre" <saket.dumbre@intel.com>,
"Bjorn Helgaas" <bhelgaas@google.com>,
"Clemens Ladisch" <clemens@ladisch.de>,
"Jinchao Wang" <wangjinchao600@gmail.com>,
"Yury Norov" <yury.norov@gmail.com>,
"Anna Schumaker" <anna.schumaker@oracle.com>,
"Baoquan He" <bhe@redhat.com>,
"Darrick J. Wong" <djwong@kernel.org>,
"Dave Young" <dyoung@redhat.com>,
"Doug Anderson" <dianders@chromium.org>,
"Guilherme G. Piccoli" <gpiccoli@igalia.com>,
"Helge Deller" <deller@gmx.de>, "Ingo Molnar" <mingo@kernel.org>,
"Jason Gunthorpe" <jgg@ziepe.ca>,
"Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
"Joel Granados" <joel.granados@kernel.org>,
"John Ogness" <john.ogness@linutronix.de>,
"Kees Cook" <kees@kernel.org>, "Li Huafei" <lihuafei1@huawei.com>,
"Luck, Tony" <tony.luck@intel.com>,
"Luo Gengkun" <luogengkun@huaweicloud.com>,
"Max Kellermann" <max.kellermann@ionos.com>,
"Nam Cao" <namcao@linutronix.de>,
oushixiong <oushixiong@kylinos.cn>,
"Petr Mladek" <pmladek@suse.com>,
"Qianqiang Liu" <qianqiang.liu@163.com>,
"Sergey Senozhatsky" <senozhatsky@chromium.org>,
"Sohil Mehta" <sohil.mehta@intel.com>,
"Tejun Heo" <tj@kernel.org>,
"Thomas Zimmermann" <tzimmermann@suse.de>,
"Thorsten Blum" <thorsten.blum@linux.dev>,
"Ville Syrjala" <ville.syrjala@linux.intel.com>,
"Vivek Goyal" <vgoyal@redhat.com>,
"Yunhui Cui" <cuiyunhui@bytedance.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
W_Armin@gmx.de
Subject: Re: crash during resume of PCIe bridge in v5.17 (v5.16 works)
Date: Tue, 20 Jan 2026 11:27:00 +0100
Message-ID: <82b4d69a5b943aa5e8aa7cc33fcc00bce02e557c.camel@web.de>
In-Reply-To: <99f1aaba32030d2b9285dbd983fdf8518a181a8d.camel@web.de>
There is exciting news here:
I tested older Linux versions with this script:
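#!/bin/bash
for i in {0..20000}
do
	# print the iteration number
	echo "$i"
	# start evolution in the background, kill it after 5 s, then pause 5 s
	evolution &
	sleep 5
	killall evolution
	sleep 5
done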
with the following results:
Versions v5.17, v6.1 and v6.8 show more or less the same
behaviour as v6.12+, i.e. repeated resumes lead to a crash after a
while (sometimes the discrete GPU is lost without a crash).
Version v5.15 shows a different behaviour:
5.15.0-stable-dirty booted 14:28, 16.1.2026 error 14:52 (25min, 142 resumes)
[ 1453.515962] [ T18093] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 1453.515978] [ T18093] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 1453.516046] [ T18093] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62). (-ETIME!)
5.15.0-stable-dirty booted 17:09, 16.1.2026 error 20:18 (3h, 1102 resumes)
[11337.547257] [ T157373] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[11337.547273] [ T157373] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[11337.547358] [ T157373] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
5.15.0-stable-dirty booted 20:51, 16.1.2026 error 21:20 (30min, 164 resumes)
[ 1698.065653] [ T22129] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 1698.065665] [ T22129] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 1698.065734] [ T22129] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
5.15.0-stable-dirty booted 21:25, 16.1.2026 error 21:41 (10min, 91 resumes)
[ 965.908197] [ T3843] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 965.908212] [ T3843] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[ 965.908284] [ T3843] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
5.15.0-stable-dirty booted 21:46, 16.1.2026 error 1:43 17.1.2026 (4h, 1411 resumes)
[14220.044577] [ T203585] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[14220.044593] [ T203585] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
[14220.044662] [ T203585] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
In all 5 tests the resume failed, but with an error message not seen before, and no crash occurred.
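A check along the following lines could stop the test loop as soon as the error above shows up in the kernel log. This is only a sketch: the helper name resume_failed is hypothetical, and it assumes the message text stays exactly as in the logs quoted above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original test script):
# returns success if the kernel log text on stdin contains the
# amdgpu resume failure message quoted in the logs above.
resume_failed() {
	grep -q 'amdgpu_device_ip_resume failed'
}

# Possible use inside the test loop (reading the kernel log may need root):
#   if dmesg | resume_failed; then
#       echo "resume failed in iteration $i"
#       break
#   fi
```

Matching only on this one string would miss the crashes and the silent loss of the device seen on the later kernels, so it covers just the v5.15 failure mode.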
And most importantly v5.16 did not crash at all, and also did not show any GPU suspend/resume related error
despite being tested for 36h and 11644 resumes:
Testing (dirty because of -Wno-error=use-after-free and -Wno-error=format-truncation compile fix)
5.16.0-stable-dirty booted 2:09, 17.1.2026 14:20, 18.1.2026 no error (neither crash nor loss of device)
(36h, 11644 resumes)
So the whole problem seems to be a pure software issue after all (and not just an issue caused by
probably broken hardware). I'm currently bisecting this between v5.16 and v5.17, but getting a
result can take 2 weeks given the length of the test runs.
The first step of the bisection is already finished and GOOD:
Testing (dirty because of -Wno-error=use-after-free and -Wno-error=format-truncation compile fix)
5.16.0-bisect-07203-g22ef12195e13-dirty booted 14:27, 18.1.2026 no error 1:57 (35.5h, 12508 resumes)
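As a rough cross-check of the time estimate, the number of remaining bisection steps can be approximated by repeatedly halving the commit range. The sketch below is illustrative only: the helper name bisect_steps is hypothetical, the figure of 7203 remaining commits is an assumption (it presumes about as many commits are left as the "-07203-" in the version string above has already ruled out), and 36 hours per step assumes every step runs as long as the v5.16 test.

```shell
#!/bin/bash
# Hypothetical helper: number of halvings needed to narrow N commits
# down to a single one, i.e. ceil(log2(N)). This only approximates the
# number of steps git bisect will actually need.
bisect_steps() {
	local n=$1 steps=0
	while [ "$n" -gt 1 ]; do
		n=$(( (n + 1) / 2 ))
		steps=$(( steps + 1 ))
	done
	echo "$steps"
}

remaining=7203       # assumption: commits left in the range after the first step
hours_per_step=36    # assumption: each step runs as long as the v5.16 test
steps=$(bisect_steps "$remaining")
echo "about $steps steps, $(( steps * hours_per_step )) hours if every step runs the full time"
```

With these assumed numbers the sketch prints 13 steps and 468 hours. Steps that hit the failure end earlier, so the real duration depends on how many steps turn out BAD.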
Bert Karwatzki