public inbox for linux-next@vger.kernel.org
From: Bert Karwatzki <spasswolf@web.de>
To: Rafal Ostrowski <rafal.ostrowski@amd.com>
Cc: Bert Karwatzki <spasswolf@web.de>,
	Dillon Varone <dillon.varone@amd.com>,
	Alex Hung <alex.hung@amd.com>,
	Alex Deucher <alexander.deucher@amd.com>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, linux-next@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, amd-gfx@lists.freedesktop.org
Subject: kernel panic when resuming from hibernate in next-20260406 with PREEMPT_RT
Date: Thu,  9 Apr 2026 15:14:10 +0200
Message-ID: <20260409131411.10598-1-spasswolf@web.de>

I noticed that my Debian stable (trixie) system running linux next-20260406
sometimes hangs when resuming from hibernate. I also saw two similar hangs
under different circumstances, one just after booting and one when starting the
game Stellaris, but hibernate seems to be the best way to provoke the error.
There are usually no error messages, but once I got this (incomplete) error via drm panic "kmsg":

[ 51.556812][ C0]  gpio_amdpt gpio_generic
[ 51.556817][ C0] ---[ end trace 0000000000000000 ]--- (the start tag is not present in the qr_code)
[ 52.616208][ C0] RIP: 0010:__get_vm_area_node+0x140/0x150
[ 52.616214][ C0] Code: 00 00 ff c5 39 c5 0f 4c e8 b8 1e 00 00 00 39 c5 0f 4f e8 c4 e2 d1 f7 ea e9 3e ff ff ff 4c 89 e7 e8 35 48 01 00 45 31 e4 eb b0 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 40 d6 41 50
[ 52.616215][ C0] RSP: 0010:ffffae404dcc3818 EFLAGS: 00010206
[ 52.616217][ C0] RAX: 0000000000ff0000 RBX: 000000000000000c RCX: 0000000000000022
[ 52.616217][ C0] RDX: 0000000000ff0000 RSI: 0000000000000001 RDI: 000000000000f720
[ 52.616218][ C0] RBP: 000000000000000c R08: ffffae4040000000 R09: ffffce403fffffff
[ 52.616218][ C0] R10: ffffce403fffffff R11: 0000000000000006 R12: ffff9f289d400000
[ 52.616219][ C0] R13: ffff9f289d6e9fd0 R14: 0000000000000dc0 R15: 000000000000f720
[ 52.616220][ C0] FS:  00007f814cdc9b40(0000) GS:ffff9f383a215000(0000) knlGS:0000000000000000
[ 52.616220][ C0] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000000050033
[ 52.616221][ C0] CR2: 00007f8134000020 CR3: 0000000201867000 CR4: 0000000000f50ef0
[ 52.616221][ C0] PKRU: 55555554
[ 52.616222][ C0] Kernel panic - not syncing: Fatal exception in interrupt
[ 52.616302][ C0] Kernel Offset: 0xc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
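
A note for reading the dump: in an x86 oops "Code:" line, the byte at RIP is wrapped in angle brackets. The following is a minimal Python sketch (the helper name is hypothetical, and the string is only the tail of the Code: line above) that extracts the bytes at RIP. Here they are 0f 0b, the ud2 opcode, which the kernel's BUG()/WARN() macros emit on x86.

```python
def bytes_at_rip(code_line, count=2):
    """Return `count` opcode bytes starting at the byte marked <xx>."""
    tokens = code_line.split()
    # Drop the leading "Code:" label if it is present
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    # The oops printer wraps the byte at RIP in angle brackets
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return bytes(int(tok.strip("<>"), 16) for tok in tokens[start:start + count])


# Tail of the Code: line from the trace above (excerpt, not the full line)
excerpt = "Code: 45 31 e4 eb b0 <0f> 0b 66 66 2e 0f 1f 84 00"
print(bytes_at_rip(excerpt).hex(" "))  # prints "0f 0b"
```

So the trap inside __get_vm_area_node is a deliberate assertion rather than a stray memory access. Which assertion it is would have to be checked against the source of the affected tree.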

I also got a really nice, huge QR code (via drm panic "qr_code") which showed a little more of the message, but not
the beginning of the trace:

[ 125.266334][ C17] RSP: 002b:00007ffc1fcae230 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 125.266335][ C17] RAX: ffffffffffffffda RBX: 000055845b640060 RCX: 00007fb699e8e91b
[ 125.266335][ C17] RDX: 00007ffc1fcae2d0 RSI: 00000000c05064a7 RDI: 000000000000000f
[ 125.266336][ C17] RBP: 00007ffc1fcae2d0 R08: 0000000000000060 R09: 0000000000000000
[ 125.266336][ C17] R10: 0000000000000003 R11: 0000000000000246 R12: 00000000c05064a7
[ 125.266336][ C17] R13: 000000000000000f R14: 00000000c05064a7 R15: 00007ffc1fcae2d0
[ 125.266337][ C17] </TASK>
[ 125.266337][ C17] Modules linked in: ccm snd_usb_audio joydev snd_usbmidi_lib snd_ump snd_rawmidi snd_seq_dummy snd_hrtimer snd_seq snd_seq_device nls_ascii nls_cp437 vfat fat mt7925e mt7925_common mt792x_lib mt76_connac_lib mt76 intel_rapl_msr snd_hda_codec_atihdmi mac80211 intel_rapl_common snd_hda_codec_hdmi iosf_mbi snd_hda_intel rapl snd_hda_codec wmi_bmof snd_hda_core spd5118 regmap_i2c snd_intel_dspcfg snd_hwdep snd_pcm libarc4 snd_timer cfg80211 snd soundcore pcspkr rfkill ccp k10temp evdev nct6775 nct6775_core hwmon_vid configfs efi_pstore efivarfs autofs4 ext4 mbcache jbd2 hid_generic usbhid hid amdgpu drm_client_lib i2c_algo_bit drm_buddy drm_ttm_helper ttm drm_exec drm_suballoc_helper mfd_core drm_panel_backlight_quirks gpu_sched amdxcp drm_display_helper xhci_pci xhci_hcd drm_kms_helper ahci libahci drm libata usbcore nvme scsi_mod igc nvme_core cec i2c_piix4 scsi_common video usb_common nvme_keyring crc16 i2c_smbus nvme_auth wmi hkdf gpio_amdpt gpio_generic
[ 125.266351][ C17] ---[ end trace 0000000000000000 ]---
[ 126.356624][ C17] RIP: 0010:__get_vm_area_node+0x140/0x150
[ 126.356631][ C17] Code: 00 00 ff c5 39 c5 0f 4c e8 b8 1e 00 00 00 39 c5 0f 4f e8 c4 e2 d1 f7 ea e9 3e ff ff ff 4c 89 e7 e8 f5 49 01 00 45 31 e4 eb b0 <0f> 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 40 d6 41 50
[ 126.356633][ C17] RSP: 0018:ffffb50f075275f8 EFLAGS: 00010206
[ 126.356635][ C17] RAX: 0000000000ff0000 RBX: 000000000000000c RCX: 0000000000000022
[ 126.356636][ C17] RDX: 0000000000ff0000 RSI: 0000000000000001 RDI: 000000000000f720
[ 126.356637][ C17] RBP: 000000000000000c R08: ffffb50f00000000 R09: ffffd50effffffff
[ 126.356637][ C17] R10: ffffd50effffffff R11: 0000000000000006 R12: ffff9090a6000000
[ 126.356638][ C17] R13: ffff9090a62e9fd0 R14: 000000000000f720 R15: 0000000000000dc0
[ 126.356639][ C17] FS: 00007fb699b0eb40(0000) GS:ffff90a00a848000(0000) knlGS:0000000000000000
[ 126.356640][ C17] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 126.356640][ C17] CR2: 00007f617c3fa02f CR3: 00000001a0c4e000 CR4: 0000000000f50ef0
[ 126.356641][ C17] PKRU: 55555554
[ 126.356642][ C17] Kernel panic - not syncing: Fatal exception in interrupt
[ 126.356811][ C17] Kernel Offset: 0x30a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Hardware used:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 25)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 25)
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 44 [RX 9060 XT] (rev c0)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 HDMI/DP Audio Controller
04:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD 9100 PRO [PM9E1]
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Upstream Port (rev 01)
06:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:0c.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
06:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
08:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 06)
09:00.0 Network controller: MEDIATEK Corp. Device 7925
0b:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 800 Series Chipset USB 3.x XHCI Controller (rev 01)
0c:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge PCIe Dummy Function (rev c1)
0d:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
0d:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
0d:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
0e:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 2.0 xHCI

$ cat /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 26
model		: 68
model name	: AMD Ryzen 9 9950X 16-Core Processor
stepping	: 0
microcode	: 0xb404035
cpu MHz		: 624.194
cache size	: 1024 KB
physical id	: 0
siblings	: 32
core id		: 0
cpu cores	: 16
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 16
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid bus_lock_detect movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d amd_lbr_pmc_freeze
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso spectre_v2_user vmscape
bogomips	: 8599.99
TLB size	: 192 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]

As the error does not occur in v7.0-rc7, I bisected the issue, declaring a commit GOOD
when it passes 18 hibernate/resume cycles (the issue is not 100% reproducible).
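
For anyone who wants to reproduce this, the cycle test could be automated roughly as follows. This is a sketch, not the procedure used for the bisection above. It assumes rtcwake(8) from util-linux is installed, that the script runs as root, and that a 60 second wake alarm is long enough for the machine to finish hibernating. The function names and the log path are made up. Because a crash hangs the machine, progress is written and synced to disk before each hibernate, so the last line of the log shows which resume failed.

```python
import os
import subprocess


def rtcwake_hibernate(wake_after_s=60):
    # rtcwake(8): "-m disk" hibernates, "-s" arms the RTC wake alarm in seconds
    subprocess.run(["rtcwake", "-m", "disk", "-s", str(wake_after_s)], check=True)


def run_cycles(cycles, log_path, hibernate=rtcwake_hibernate):
    """Run `cycles` hibernate/resume cycles, logging each one before it starts."""
    for n in range(1, cycles + 1):
        with open(log_path, "a") as log:
            log.write(f"starting cycle {n}\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the line survives a hang
        hibernate()
    return cycles


# Example (needs root, really hibernates the machine):
# run_cycles(18, "/var/tmp/hibernate-cycles.log")
```

The hibernate step is passed in as a parameter so the loop can be tried out with a dummy function before running it for real.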

All these kernels were compiled with PREEMPT_RT:
7.0.0-rc7                            18 hibernate/resume cycles without crash, GOOD
7.0.0-rc7-next-20260406-master       crash on 1st resume, BAD
7.0.0-rc7-bisect-06060-g00f03539e3d9 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc7-bisect-09155-g9a6b64640388 crash on 1st resume, BAD
7.0.0-rc4-bisect-01504-g8e005ef09ba5 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc7-bisect-08350-gb82dff8ab846 crash on 4th resume, BAD
7.0.0-rc7-bisect-06517-ga03c0f5a4d5f 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc6-bisect-01673-gcdd65e8bb954 crash on 6th resume, BAD
7.0.0-rc4-bisect-00705-g02ade2557eba 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc4-bisect-01453-g353f20082505 crash on 8th resume, BAD
7.0.0-rc4-bisect-01428-g4c3aeb11d504 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc4-bisect-01440-g60c741a13fd1 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc4-bisect-01446-g32c1c35b6d8b crash on 12th resume, BAD
7.0.0-rc4-bisect-01443-g02c3060ee303 18 hibernate/resume cycles without crash, GOOD
7.0.0-rc4-bisect-01445-g4bb2f0721ed8 crash on 1st resume, BAD
7.0.0-rc4-bisect-01444-g3539437f354b crash on 3rd resume, BAD

The result of the bisection points to 
commit 3539437f354b ("drm/amd/display: Move FPU Guards From DML To DC - Part 1")
as the first bad commit.

As the offending commit contains preemption-related macros, I tested commit
3539437f354b ("drm/amd/display: Move FPU Guards From DML To DC - Part 1")
without PREEMPT_RT and got no error in 24 cycles:
7.0.0-rc4-nort-01444-g3539437f354b 24 hibernate/resume cycles without crash
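
As a rough sanity check on the 18 and 24 cycle thresholds, one can estimate how likely a bad kernel is to survive that many cycles by chance. The sketch below assumes that every BAD kernel crashes with the same, independent probability on each resume. That is a strong simplification, since the BAD results come from different commits. The resume numbers are the ones from the bisect log above, and the function names are made up.

```python
# Resume on which each BAD kernel crashed, in the order of the bisect log
crash_on_resume = [1, 1, 4, 6, 8, 12, 1, 3]


def per_cycle_crash_probability(crash_cycles):
    # Maximum-likelihood estimate for a geometric distribution:
    # number of crashes divided by the total number of cycles run
    return len(crash_cycles) / sum(crash_cycles)


def survival_probability(p, cycles):
    # Chance that a bad kernel gets through `cycles` resumes without crashing
    return (1.0 - p) ** cycles


p = per_cycle_crash_probability(crash_on_resume)
for n in (18, 24):
    print(f"{n} cycles: P(bad kernel survives) = {survival_probability(p, n):.4f}")
```

Under these assumptions a bad kernel crashes on about 22% of resumes. It would pass 18 cycles roughly 1% of the time and 24 cycles well under 1% of the time, so a mislabelled GOOD step is possible but unlikely.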

Any ideas?

Bert Karwatzki



