linux-kernel.vger.kernel.org archive mirror
* commit 3284e4adca9b causes hang on boot with CONFIG_PREEMPT_RT=y
@ 2025-07-11 23:00 Bert Karwatzki
  2025-07-11 23:06 ` Joel Fernandes
  0 siblings, 1 reply; 3+ messages in thread
From: Bert Karwatzki @ 2025-07-11 23:00 UTC
  To: Joel Fernandes
  Cc: Bert Karwatzki, linux-kernel, linux-next, linux-rt-devel,
	ankur.a.arora, bobo.shaobowang, boqun.feng, frederic, joel,
	neeraj.upadhyay, paulmck, rcu, urezki, wangxiongfeng2, xiexiuqi,
	xiqi2

When booting linux next-20250711 (with CONFIG_PREEMPT_RT=y) on my MSI Alpha 15
laptop running Debian sid amd64, the boot process hangs, with the last
messages displayed on screen being:

fbcon: amdgpudrmfb (fb0) is primary device
Console: switching to colour frame buffer device
amdgpu: 0000:08:00.0: [drm]fb0: amdgpudrmfb frame buffer device

After some time (about 60s) this error message appears (hand-copied
from screen, not entirely accurate):

rcu_preempt self detected stall

with call trace
run_irq_workd
smpboot_thread_fn
kthread
? kthreads_online_cpu
? kthreads_online_cpu
ret_from_fork
? kthreads_online_cpu
ret_from_fork

This only occurs when compiling with CONFIG_PREEMPT_RT=y.
I bisected this and found the first bad commit to be

3284e4adca9b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")

Reverting this commit in next-20250711 fixes the issue for me.
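
For readers unfamiliar with bisection: finding a "first bad commit" amounts to a binary search over the history between a known-good and a known-bad kernel. The sketch below only illustrates that idea. The commit names and the `hangs_on_boot` predicate are hypothetical stand-ins, the history is assumed to be linear, and it is not the procedure that produced the result above.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, that the newest one
    is bad, and that every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid       # first bad commit is at mid or earlier
        else:
            lo = mid + 1   # first bad commit is after mid
    return commits[lo]


# Hypothetical linear history; "c5" stands in for the offending commit.
history = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]


def hangs_on_boot(commit: str) -> bool:
    # Placeholder for "build this commit, boot it, see whether it hangs".
    return history.index(commit) >= history.index("c5")


result = first_bad(history, hangs_on_boot)
print(result)
```

Each call to the predicate corresponds to one build-and-boot test, so about log2(N) kernels have to be tested for N candidate commits.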

Hardware:
$ lspci -nn
00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex [1022:1630]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU [1022:1631]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge [1022:1633]
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
00:02.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
00:02.4 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge [1022:1634]
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge [1022:1632]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus [1022:1635]
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 51)
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0 [1022:166a]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1 [1022:166b]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2 [1022:166c]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3 [1022:166d]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4 [1022:166e]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5 [1022:166f]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6 [1022:1670]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7 [1022:1671]
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c3)
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
03:00.0 Display controller [0380]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c3)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
04:00.0 Network controller [0280]: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz [14c3:0608]
05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
06:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] [2646:5013] (rev 01)
07:00.0 Non-Volatile memory controller [0108]: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] [c0a9:2263] (rev 03)
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev c5)
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller [1002:1637]
08:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor [1022:15df]
08:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
08:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1 [1022:1639]
08:00.5 Multimedia controller [0480]: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor [1022:15e2] (rev 01)
08:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller [1022:15e3]
08:00.7 Signal processing controller [1180]: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub [1022:15e4]

$ head /proc/cpuinfo 
processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 25
model		: 80
model name	: AMD Ryzen 7 5800H with Radeon Graphics
stepping	: 0
microcode	: 0xa50000c
cpu MHz		: 2826.830
cache size	: 512 KB
physical id	: 0


Bert Karwatzki


* Re: commit 3284e4adca9b causes hang on boot with CONFIG_PREEMPT_RT=y
  2025-07-11 23:00 commit 3284e4adca9b causes hang on boot with CONFIG_PREEMPT_RT=y Bert Karwatzki
@ 2025-07-11 23:06 ` Joel Fernandes
  2025-07-11 23:27   ` Bert Karwatzki
  0 siblings, 1 reply; 3+ messages in thread
From: Joel Fernandes @ 2025-07-11 23:06 UTC
  To: Bert Karwatzki
  Cc: linux-kernel, linux-next, linux-rt-devel, ankur.a.arora,
	bobo.shaobowang, boqun.feng, frederic, joel, neeraj.upadhyay,
	paulmck, rcu, urezki, wangxiongfeng2, xiexiuqi, xiqi2



On 7/11/2025 7:00 PM, Bert Karwatzki wrote:
> When booting linux next-20250711 (with CONFIG_PREEMPT_RT=y) on my MSI Alpha 15
> laptop running Debian sid amd64, the boot process hangs, with the last
> messages displayed on screen being:
> 
> fbcon: amdgpudrmfb (fb0) is primary device
> Console: switching to colour frame buffer device
> amdgpu: 0000:08:00.0: [drm]fb0: amdgpudrmfb frame buffer device
> 
> After some time (about 60s) this error message appears (hand-copied
> from screen, not entirely accurate):
> 
> rcu_preempt self detected stall
> 
> with call trace
> run_irq_workd
> smpboot_thread_fn
> kthread
> ? kthreads_online_cpu
> ? kthreads_online_cpu
> ret_from_fork
> ? kthreads_online_cpu
> ret_from_fork
> 
> This only occurs when compiling with CONFIG_PREEMPT_RT=y.
> I bisected this and found the first bad commit to be
> 
> 3284e4adca9b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")

This commit is still using old code which was fixed in the last day.

Here is the new commit:
https://web.git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=next&id=2e154d164418e1eaadbf5dc58cbf19e7be8fdc67

Thanks!

 - Joel



* Re: commit 3284e4adca9b causes hang on boot with CONFIG_PREEMPT_RT=y
  2025-07-11 23:06 ` Joel Fernandes
@ 2025-07-11 23:27   ` Bert Karwatzki
  0 siblings, 0 replies; 3+ messages in thread
From: Bert Karwatzki @ 2025-07-11 23:27 UTC
  To: Joel Fernandes
  Cc: linux-kernel, linux-next, linux-rt-devel, ankur.a.arora,
	bobo.shaobowang, boqun.feng, frederic, joel, neeraj.upadhyay,
	paulmck, rcu, urezki, wangxiongfeng2, xiexiuqi, xiqi2, spasswolf

On Friday, 11.07.2025 at 19:06 -0400, Joel Fernandes wrote:
> 
> On 7/11/2025 7:00 PM, Bert Karwatzki wrote:
> > When booting linux next-20250711 (with CONFIG_PREEMPT_RT=y) on my MSI Alpha 15
> > laptop running Debian sid amd64, the boot process hangs, with the last
> > messages displayed on screen being:
> > 
> > fbcon: amdgpudrmfb (fb0) is primary device
> > Console: switching to colour frame buffer device
> > amdgpu: 0000:08:00.0: [drm]fb0: amdgpudrmfb frame buffer device
> > 
> > After some time (about 60s) this error message appears (hand-copied
> > from screen, not entirely accurate):
> > 
> > rcu_preempt self detected stall
> > 
> > with call trace
> > run_irq_workd
> > smpboot_thread_fn
> > kthread
> > ? kthreads_online_cpu
> > ? kthreads_online_cpu
> > ret_from_fork
> > ? kthreads_online_cpu
> > ret_from_fork
> > 
> > This only occurs when compiling with CONFIG_PREEMPT_RT=y.
> > I bisected this and found the first bad commit to be
> > 
> > 3284e4adca9b ("rcu: Fix rcu_read_unlock() deadloop due to IRQ work")
> 
> This commit is still using old code which was fixed in the last day.
> 
> Here is the new commit:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/rcu/linux.git/commit/?h=next&id=2e154d164418e1eaadbf5dc58cbf19e7be8fdc67
> 
> Thanks!
> 
>  - Joel

I already found the new commit, it works!

Bert Karwatzki
