* [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
@ 2025-12-02 16:40 René Rebe
2025-12-02 16:54 ` John Paul Adrian Glaubitz
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: René Rebe @ 2025-12-02 16:40 UTC (permalink / raw)
To: linux-pci, linux-kernel
Cc: Bjorn Helgaas, John Paul Adrian Glaubitz, Riccardo Mottola
Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
non-x86") was bisected to break various non-x86 RISC Unix systems,
e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
Sun Blade 1000:
ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
ERROR(0):
TPC<MakeIocReady+0xc/0x278 [mptbase]>
ERROR(0): M_SYND(0), E_SYND(0), Privileged
ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
Kernel panic - not syncing: Irrecoverable deferred error trap.
CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
Call Trace:
[<00000000004294b0>] panic+0xf0/0x370
[<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
[<0000000000405e88>] c_deferred+0x18/0x24
[<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
[<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
[<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
[<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
[<0000000000788308>] local_pci_probe+0x24/0x70
[<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
[<000000000082633c>] really_probe+0x13c/0x29c
[<0000000000826590>] __driver_probe_device+0xf4/0x104
[<0000000000826614>] driver_probe_device+0x24/0xa0
[<000000000082683c>] __driver_attach+0xe8/0x104
[<0000000000824da0>] bus_for_each_dev+0x58/0x84
[<0000000000825508>] bus_add_driver+0xdc/0x1f8
[<0000000000827110>] driver_register+0x70/0x120
Niagara T1:
mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible
NON-RESUMABLE ERROR: Reporting on cpu 31
NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]>
NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000
NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000]
NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c]
NON-RESUMABLE ERROR: type [precise nonresumable]
NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv >
NON-RESUMABLE ERROR: raddr [0x000000ea00300000]
Kernel panic - not syncing: Non-resumable error.
CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
Call Trace:
[<00000000004373c4>] dump_stack+0x8/0x18
[<0000000000429540>] panic+0xf4/0x398
[<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
[<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
[<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
[<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
[<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
[<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
[<0000000000b3fab0>] local_pci_probe+0x30/0x80
[<0000000000b405d4>] pci_device_probe+0xb4/0x240
[<0000000000bfd348>] really_probe+0xc8/0x400
[<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
[<0000000000bfd8c8>] driver_probe_device+0x28/0x100
[<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
[<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
[<0000000000bfcafc>] driver_attach+0x1c/0x40
Press Stop-A (L1-A) from sun keyboard or send break
twice on console to return to the boot prom
Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86")
Signed-off-by: René Rebe <rene@exactco.de>
---
Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01
---
drivers/pci/pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b14dd064006c..7619d2cfa66d 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
 
 		/*
 		 * Out of caution, we only allow PCIe ports from 2015 or newer
-		 * into D3 on x86.
+		 * into D3 or other modern ISAs only.
 		 */
-		if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
+		if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
 			return true;
 		break;
 	}
--
2.52.0
--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-02 16:40 [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems René Rebe
@ 2025-12-02 16:54 ` John Paul Adrian Glaubitz
2025-12-02 17:04 ` René Rebe
2025-12-02 17:28 ` Bjorn Helgaas
2025-12-03 5:15 ` Lukas Wunner
2 siblings, 1 reply; 22+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-12-02 16:54 UTC (permalink / raw)
To: René Rebe, linux-pci, linux-kernel
Cc: Bjorn Helgaas, Riccardo Mottola

Hi Rene,

On Tue, 2025-12-02 at 17:40 +0100, René Rebe wrote:
> Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> non-x86") was bisected to break various non-x86 RISC Unix systems,
> e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.

I think "ISA" is a misnomer here as this issue is not a matter of the
instruction set architecture in use but the PCI bus. So, I suggest to
use the term "systems" here as well.

Plus, I suggest the following message for the summary:

"pci: Further restrict the use of D3 power state"

> [...]
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index b14dd064006c..7619d2cfa66d 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>
> 		/*
> 		 * Out of caution, we only allow PCIe ports from 2015 or newer
> -		 * into D3 on x86.
> +		 * into D3 or other modern ISAs only.

Same here, I suggest "systems" instead of "ISAs".

> 		 */
> -		if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> +		if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)

Is there actually a justification to restrict the use of D3 to ARM64,
PPC64 and RISCV? What about MIPS, LoongArch or s390x?

Thanks,
Adrian

> 			return true;
> 		break;
> 	}
> --
> 2.52.0
>
> --
> René Rebe, ExactCODE GmbH, Berlin, Germany
> https://exactco.de • https://t2linux.com • https://patreon.com/renerebe

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-02 16:54 ` John Paul Adrian Glaubitz
@ 2025-12-02 17:04 ` René Rebe
2025-12-02 18:20 ` PCI bridge window issue (Was: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems) Ilpo Järvinen
2025-12-06 1:07 ` [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems Maciej W. Rozycki
0 siblings, 2 replies; 22+ messages in thread
From: René Rebe @ 2025-12-02 17:04 UTC (permalink / raw)
To: glaubitz
Cc: linux-pci, linux-kernel, bhelgaas, riccardo.mottola

Hi,

On Tue, 02 Dec 2025 17:54:33 +0100, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:

> Hi Rene,
>
> On Tue, 2025-12-02 at 17:40 +0100, René Rebe wrote:
> > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> > non-x86") was bisected to break various non-x86 RISC Unix systems,
> > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot
> > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86.
>
> I think "ISA" is a misnomer here as this issue is not a matter of the
> instruction set architecture in use but the PCI bus. So, I suggest to
> use the term "systems" here as well.
>
> Plus, I suggest the following message for the summary:
>
> "pci: Further restrict the use of D3 power state"

I thought ISA was the correct term, and few still remember an "ISA"
bus, but I am happy to rephrase to whatever is preferred.

> > [...]
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index b14dd064006c..7619d2cfa66d 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >
> > 		/*
> > 		 * Out of caution, we only allow PCIe ports from 2015 or newer
> > -		 * into D3 on x86.
> > +		 * into D3 or other modern ISAs only.
>
> Same here, I suggest "systems" instead of "ISAs".
>
> > 		 */
> > -		if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> > +		if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015)
>
> Is there actually a justification to restrict the use of D3 to ARM64,
> PPC64 and RISCV? What about MIPS, LoongArch or s390x?

Because the ones I picked are more modern, and thus more likely to
work. MIPS is very old, and I have no LoongArch nor regular access to
s390x. Maybe users of those want to allow-list them after testing? Now
that I think about it, I was wondering why ALSA RAD1 audio is no longer
working in my Sgi Octane, with the PCI window not being enabled. It
would not surprise me if it was some change like this, too. Should
bisect next ;-)

Before the breaking change it was disabled for all these other arches
anyway with:

static inline int dmi_get_bios_year(void) { return -ENXIO; }

and comparing whether the negative error code is greater than 2014, ...

René

> Thanks,
> Adrian
>
> > 			return true;
> > 		break;
> > 	}
> > --
> > 2.52.0
> >
> > --
> > René Rebe, ExactCODE GmbH, Berlin, Germany
> > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
>
> --
>  .''`.  John Paul Adrian Glaubitz
> : :' :  Debian Developer
> `. `'   Physicist
>   `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
^ permalink raw reply [flat|nested] 22+ messages in thread
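For illustration, the gating logic discussed in the message above can be modelled in userspace C. This is only a sketch under stated assumptions: the enum, the struct and all function names are made up for this example and do not exist in the kernel, the has_dmi flag stands in for whether a build provides real DMI data, and the "before" predicate follows the description given in the message (only the BIOS year was compared) rather than a checked copy of the old source.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical architecture tags, standing in for IS_ENABLED(CONFIG_...). */
enum model_arch {
	ARCH_X86, ARCH_ARM64, ARCH_PPC64, ARCH_RISCV, ARCH_SPARC64, ARCH_MIPS
};

/* Hypothetical description of one machine for this model. */
struct model_system {
	enum model_arch arch;
	bool has_dmi;	/* assumption: build provides real DMI data */
	int bios_year;	/* only meaningful when has_dmi is true */
};

/* Mirrors the stub quoted above: without DMI an error code comes back. */
static int model_dmi_get_bios_year(const struct model_system *s)
{
	assert(s != NULL);
	return s->has_dmi ? s->bios_year : -ENXIO;
}

/* Before a5fb3ff63287, as described above: only the BIOS year decided. */
bool d3_before_a5fb(const struct model_system *s)
{
	return model_dmi_get_bios_year(s) >= 2015;
}

/* Since a5fb3ff63287: every non-x86 system passes unconditionally. */
bool d3_after_a5fb(const struct model_system *s)
{
	return s->arch != ARCH_X86 || model_dmi_get_bios_year(s) >= 2015;
}

/* With the proposed patch: explicit allow list plus the BIOS year check. */
bool d3_with_patch(const struct model_system *s)
{
	return s->arch == ARCH_ARM64 || s->arch == ARCH_PPC64 ||
	       s->arch == ARCH_RISCV || model_dmi_get_bios_year(s) >= 2015;
}
```

In this model a sparc64 machine without DMI yields false, true, false for the three predicates in order, because -ENXIO is negative and never reaches 2015. That matches the thread: not allowed before, allowed since a5fb3ff63287, and not allowed again with the patch.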
* PCI bridge window issue (Was: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems)
2025-12-02 17:04 ` René Rebe
@ 2025-12-02 18:20 ` Ilpo Järvinen
2025-12-02 18:29 ` PCI bridge window issue René Rebe
2025-12-06 1:07 ` [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems Maciej W. Rozycki
1 sibling, 1 reply; 22+ messages in thread
From: Ilpo Järvinen @ 2025-12-02 18:20 UTC (permalink / raw)
To: René Rebe
Cc: glaubitz, linux-pci, LKML, bhelgaas, riccardo.mottola

On Tue, 2 Dec 2025, René Rebe wrote:

> s390x. Maybe users of those want to allow list after testing? Now that
> I think about it I was wondering why ALSA RAD1 audio is no longer
> working in my Sgi Octane with the PCI window not being enabled. Would
> not surprise me it was some change like this, too. Should bisect next

Hi René,

Could you please send me a dmesg and the contents of /proc/iomem (taken
with root rights so it shows the real addresses) so I can look at this
PCI bridge window issue? If you know a working kernel, having logs from
both the working and the broken case would be very helpful to easily
locate the differences.

At this point there is no need to bisect, as I might be able to figure
it out even without pinpointing the commit. To avoid spending time on
issues that are already known and have a fix, please check that you're
not running a somewhat old kernel, as I've already fixed a few things
that got broken by the recent changes to the PCI bridge window fitting
and assignment algorithm.

--
i.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: PCI bridge window issue
2025-12-02 18:20 ` PCI bridge window issue (Was: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems) Ilpo Järvinen
@ 2025-12-02 18:29 ` René Rebe
2025-12-02 19:35 ` Ilpo Järvinen
0 siblings, 1 reply; 22+ messages in thread
From: René Rebe @ 2025-12-02 18:29 UTC (permalink / raw)
To: ilpo.jarvinen
Cc: glaubitz, linux-pci, linux-kernel, bhelgaas, riccardo.mottola

Hi Ilpo,

On Tue, 2 Dec 2025 20:20:09 +0200 (EET), Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:

> On Tue, 2 Dec 2025, René Rebe wrote:
>
> > s390x. Maybe users of those want to allow list after testing? Now that
> > I think about it I was wondering why ALSA RAD1 audio is no longer
> > working in my Sgi Octane with the PCI window not being enabled. Would
> > not surprise me it was some change like this, too. Should bisect next
>
> Hi René,
>
> Could you please send me a dmesg and contents of the /proc/iomem (taken
> with root right so it shows the real addresses) so I can look at this PCI
> bridge window issue. If you know a working kernel, having logs from
> working and broken case would be very helpful to easily locate the
> differences.

Thank you so much for offering help with that different issue.
Sgi/Octane IP30 only went upstream some years ago. I only have the
likewise not-upstream snd-rad1 working with much older out-of-tree
kernels. Thank you for the hints, I'll try to find some time to debug
this further soon, to bring the snd-rad1 ALSA driver upstream, too.

> At this point, no need to bisect as I might be able to figure it out even
> without pinpointing the commit. To avoid spending on issues that are
> already know and have a fix, please check you're not running somewhat old
> kernel as I've already fixed a few things that have gotten broken due to
> recent made PCI bridge window fitting and assignment algorithm changes.

I cannot easily bisect mips64 sgi-ip30 anyway, as it was out of tree
for 20 years and the upstreamed code changed a lot during cleanup for
merging.

Good to have a contact to look into this next.

Thanks!
René

--
René Rebe, ExactCODE GmbH, Berlin, Germany
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: PCI bridge window issue
2025-12-02 18:29 ` PCI bridge window issue René Rebe
@ 2025-12-02 19:35 ` Ilpo Järvinen
0 siblings, 0 replies; 22+ messages in thread
From: Ilpo Järvinen @ 2025-12-02 19:35 UTC (permalink / raw)
To: René Rebe
Cc: glaubitz, linux-pci, LKML, bhelgaas, riccardo.mottola

On Tue, 2 Dec 2025, René Rebe wrote:

> Hi Ilpo,
>
> On Tue, 2 Dec 2025 20:20:09 +0200 (EET), Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
>
> > On Tue, 2 Dec 2025, René Rebe wrote:
> >
> > > s390x. Maybe users of those want to allow list after testing? Now that
> > > I think about it I was wondering why ALSA RAD1 audio is no longer
> > > working in my Sgi Octane with the PCI window not being enabled. Would
> > > not surprise me it was some change like this, too. Should bisect next
> >
> > Hi René,
> >
> > Could you please send me a dmesg and contents of the /proc/iomem (taken
> > with root right so it shows the real addresses) so I can look at this PCI
> > bridge window issue. If you know a working kernel, having logs from
> > working and broken case would be very helpful to easily locate the
> > differences.
>
> Thank you so much for offering help with that different
> issue. Sgi/Octane IP30 only went upstream some years ago. I only have
> the likewise not upstream snd-rad1 working with much older out of tree
> kernels. Thanks you for the hints, I'll try to find some time to to
> further debug this soon to bring the snd-rad1 ALSA driver upstream,
> too.

Okay, if it's an old issue, it's likely not because of the recent PCI
core changes.

If there are "can't assign" or "no compatible bridge window" lines for
PCI resources in the log, those happen before any endpoint driver even
comes into the picture, so it could be a PCI core issue. In that sense
it might not matter whether the endpoint driver is in-tree or
out-of-tree, as long as the kernel you're testing with is otherwise
"new enough" to contain the recent changes and fixes to the PCI
subsystem.

--
i.

> > At this point, no need to bisect as I might be able to figure it out even
> > without pinpointing the commit. To avoid spending on issues that are
> > already know and have a fix, please check you're not running somewhat old
> > kernel as I've already fixed a few things that have gotten broken due to
> > recent made PCI bridge window fitting and assignment algorithm changes.
>
> I can not easily bisect mips64 sgi-ip30 anyway. As it was out of tree
> for 20y and the uptreamed code changed a lot during cleanup for
> merging.
>
> Good to have a contact to look into this next.
>
> Thanks!
> René
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
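As a rough aid for the log check described in the message above, the two message fragments it names can be matched with a small helper. This is a minimal sketch, not a kernel or tooling API: the function name is hypothetical, and only the two substrings quoted in the message are taken from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true if a kernel log line contains one of the two message
 * fragments named in the thread. The function name is made up for this
 * sketch; feed it lines read from a saved dmesg.
 */
bool is_bridge_window_failure(const char *line)
{
	static const char *const fragments[] = {
		"can't assign",
		"no compatible bridge window",
	};

	for (size_t i = 0; i < sizeof(fragments) / sizeof(fragments[0]); i++) {
		if (strstr(line, fragments[i]) != NULL)
			return true;
	}
	return false;
}
```

A plain substring match is enough here because the goal is only to spot whether such lines exist at all before any endpoint driver loads, not to parse the resource details.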
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-02 17:04 ` René Rebe
2025-12-02 18:20 ` PCI bridge window issue (Was: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems) Ilpo Järvinen
@ 2025-12-06 1:07 ` Maciej W. Rozycki
2025-12-06 8:31 ` John Paul Adrian Glaubitz
2025-12-06 10:14 ` René Rebe
1 sibling, 2 replies; 22+ messages in thread
From: Maciej W. Rozycki @ 2025-12-06 1:07 UTC (permalink / raw)
To: René Rebe
Cc: glaubitz, linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola

On Tue, 2 Dec 2025, René Rebe wrote:

> > Is there actually a justification to restrict the use of D3 to ARM64,
> > PPC64 and RISCV? What about MIPS, LoongArch or s390x?
>
> Because the ones I picked are more modern, and thus more likely to
> work. MIPS is very old. [...]

How old is "very old?"

Granted, the newest MIPS CPU/system controller (aka host bridge) I own is
from 2013 and conventional PCI only, but that is just because the core was
synthesised for interfacing a conventional PCI base board I have the core
card plugged into. Is it very old already or just somewhat old?

Chips continue being manufactured to date and I'm not sure as to new core
designs, but those went through to at least 2018 and I'd expect some were
combined with PCIe system controller IP.

So this seems like something that needs to be keyed off perhaps the
capabilities of the system controller/host bridge? If you give me a shell
recipe to trigger the issue you came across, then I can see what happens
with some of my MIPS systems. I've got a bunch of options with PCI-PCIe
reverse bridges and PCIe switches I could try.

Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-06 1:07 ` [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems Maciej W. Rozycki
@ 2025-12-06 8:31 ` John Paul Adrian Glaubitz
2025-12-06 10:02 ` René Rebe
[not found] ` <339B5A39-BC20-489A-9969-BF01B4E6AD63@exactco.de>
2025-12-06 10:14 ` René Rebe
1 sibling, 2 replies; 22+ messages in thread
From: John Paul Adrian Glaubitz @ 2025-12-06 8:31 UTC (permalink / raw)
To: Maciej W. Rozycki, René Rebe
Cc: linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola

On Sat, 2025-12-06 at 01:07 +0000, Maciej W. Rozycki wrote:
> On Tue, 2 Dec 2025, René Rebe wrote:
>
> > > Is there actually a justification to restrict the use of D3 to ARM64,
> > > PPC64 and RISCV? What about MIPS, LoongArch or s390x?
> >
> > Because the ones I picked are more modern, and thus more likely to
> > work. MIPS is very old. [...]
>
> How old is "very old?"

I've got two desktop and one embedded Loongson MIPS systems at home (not LoongArch)
and these are very recent (made in the 2020s). The desktop systems already come with
PCI Express slots.

Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-06 8:31 ` John Paul Adrian Glaubitz
@ 2025-12-06 10:02 ` René Rebe
[not found] ` <339B5A39-BC20-489A-9969-BF01B4E6AD63@exactco.de>
1 sibling, 0 replies; 22+ messages in thread
From: René Rebe @ 2025-12-06 10:02 UTC (permalink / raw)
To: John Paul Adrian Glaubitz
Cc: Maciej W. Rozycki, linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola

(Resent, was accidentally HTML before :-/)

Hey,

On 6. Dec 2025, at 09:31, John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> wrote:
>
> On Sat, 2025-12-06 at 01:07 +0000, Maciej W. Rozycki wrote:
>> On Tue, 2 Dec 2025, René Rebe wrote:
>>
>>>> Is there actually a justification to restrict the use of D3 to ARM64,
>>>> PPC64 and RISCV? What about MIPS, LoongArch or s390x?
>>>
>>> Because the ones I picked are more modern, and thus more likely to
>>> work. MIPS is very old. [...]
>>
>> How old is "very old?"
>
> I've got two desktop and one embedded Loongson MIPS systems at home (not LoongArch)
> and these are very recent (made in the 2020s). The desktop systems already come with
> PCI Express slots.

That's great and all, but did you test a recent kernel since this PCI
change I bisected for sparc64?

I love my quirky Sgi MIPS64 Octane and O2 very much, too, but the fact
is: those systems had not only special proprietary high-speed xbow
interconnects, but also very glitchy PCI bridges that barely work to
start with.

Also, that one modern Loongson system might work does not mean that all
of the historic MIPS(64) systems will be okay.

Just yesterday I found this change also breaking my HP PA-RISC C8000 [1]
with:

BT Port failed to come ready!
BT_TRANSFER_INIT: B_BUSY failed to clear!

There was a reason, given my experience keeping all CPU ISAs supported,
that I had initially chosen to allow only modern ones. And again, none
of them were allowed into D3hot before; they have only been randomly
allow-listed since a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot
on all non-x86"), Mar 20 11:06:04 2025.

So we should probably update this to at least include HPPA until someone
finds time to debug this further and patch it better.

That being said, I have not yet found an issue on old x86 systems with
the 2015 year check removed, which puts more of them into D3hot than
mainline currently does.

Kind regards,
René

[1] https://t2linux.com/hardware/desktop/HP/c8000/

--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe
^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <339B5A39-BC20-489A-9969-BF01B4E6AD63@exactco.de>]
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
[not found] ` <339B5A39-BC20-489A-9969-BF01B4E6AD63@exactco.de>
@ 2025-12-07 14:40 ` Maciej W. Rozycki
0 siblings, 0 replies; 22+ messages in thread
From: Maciej W. Rozycki @ 2025-12-07 14:40 UTC (permalink / raw)
To: René Rebe
Cc: John Paul Adrian Glaubitz, linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola

On Sat, 6 Dec 2025, René Rebe wrote:

> I love my quirky Sgi MIPS64 Octane and O2 also very much, but fact is: those
> systems had not only special proprietary high speed xbow interconnects, but also
> very glitchy PCI bridges that already barely work to start with.
>
> Also that just one modern Loongson system might work, does not mean all the
> history of MIPS(64) system will be okay.

Obviously, but then the individual problematic systems/chips need to be
blacklisted rather than the whole MIPS port.

> That being said I have not yet found an issue on old x86 systems with the 2015
> Year check removed to d3hot those more than mainstream currently does.

Well, x86 is special in that the kernel has to interact with the firmware
(BIOS/ACPI/whatever) that has traditionally had its own quirks even where
the hardware itself is sane.

Maciej
^ permalink raw reply [flat|nested] 22+ messages in thread
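The per-device approach suggested in the message above could look roughly like the following userspace sketch. It is only an assumption about the shape of such a check, not an existing kernel mechanism: the struct, the table and the function name are hypothetical, and the vendor/device values are placeholders rather than IDs of real hardware.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifier of a host bridge or system controller. */
struct model_bridge_id {
	uint16_t vendor;
	uint16_t device;
};

/* Placeholder entries only; these are NOT real hardware IDs. */
static const struct model_bridge_id d3_denylist[] = {
	{ 0xfff0, 0x0001 },
	{ 0xfff0, 0x0002 },
};

/* True if this specific bridge is listed as unable to handle D3. */
bool model_bridge_d3_denied(uint16_t vendor, uint16_t device)
{
	for (size_t i = 0; i < sizeof(d3_denylist) / sizeof(d3_denylist[0]); i++) {
		if (d3_denylist[i].vendor == vendor &&
		    d3_denylist[i].device == device)
			return true;
	}
	return false;
}
```

Compared with an architecture-wide allow list, a table like this would exclude only hardware that has actually been observed to fail, at the cost of needing a report for each broken bridge before it is covered.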
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-06 1:07 ` [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems Maciej W. Rozycki 2025-12-06 8:31 ` John Paul Adrian Glaubitz @ 2025-12-06 10:14 ` René Rebe 2025-12-07 14:31 ` Maciej W. Rozycki 1 sibling, 1 reply; 22+ messages in thread From: René Rebe @ 2025-12-06 10:14 UTC (permalink / raw) To: Maciej W. Rozycki Cc: glaubitz, linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola Hi, > On 6. Dec 2025, at 02:07, Maciej W. Rozycki <macro@orcam.me.uk> wrote: > > On Tue, 2 Dec 2025, René Rebe wrote: > >>> Is there actually a justification to restrict the use of D3 to ARM64, >>> PPC64 and RISCV? What about MIPS, LoongArch or s390x? >> >> Because the ones I picked are more modern, and thus more likely to >> work. MIPS is very old. [...] > > How old is "very old?" > > Granted, the newest MIPS CPU/system controller (aka host bridge) I own is > from 2013 and conventional PCI only, but that is just because the core was > synthesised for interfacing a conventional PCI base board I have the core > card plugged into. Is it very old already or just somewhat old? > > Chips continue being manufactured to date and I'm not sure as to new core > designs, but those went through to at least 2018 and I'd expect some were > combined with PCIe system controller IP. > > So this seems like something that needs to be keyed off perhaps the > capabilities of the system controller/host bridge? If you give me a shell > recipe to trigger the issue you came across, then I can see what happens > with some of my MIPS systems. I've got a bunch of options with PCI-PCIe > reverse bridges and PCIe switches I could try. Just booting a kernel that contains a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86") should be enough. The systems that fail for me do so right away during boot, usually sooner rather than later, e.g. when a storage, network or system controller driver initializes. 
Best, René -- https://exactco.de • https://t2linux.com • https://patreon.com/renerebe ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-06 10:14 ` René Rebe @ 2025-12-07 14:31 ` Maciej W. Rozycki 0 siblings, 0 replies; 22+ messages in thread From: Maciej W. Rozycki @ 2025-12-07 14:31 UTC (permalink / raw) To: René Rebe Cc: glaubitz, linux-pci, linux-kernel, Bjorn Helgaas, riccardo.mottola On Sat, 6 Dec 2025, René Rebe wrote: > > So this seems like something that needs to be keyed off perhaps the > > capabilities of the system controller/host bridge? If you give me a shell > > recipe to trigger the issue you came across, then I can see what happens > > with some of my MIPS systems. I've got a bunch of options with PCI-PCIe > > reverse bridges and PCIe switches I could try. > > Just booting a kernel with or since a5fb3ff63287 ("PCI: Allow PCI bridges to go > to D3Hot on all non-x86”) should be enough. The systems that fail for me do > so instantly booting, usually earlier than later. e.g. when a storage, network or > system controller driver initializes. I booted 6.18 as released last week on my Malta and saw no issues in this area. Maciej ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-02 16:40 [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems René Rebe 2025-12-02 16:54 ` John Paul Adrian Glaubitz @ 2025-12-02 17:28 ` Bjorn Helgaas 2025-12-02 17:41 ` René Rebe 2025-12-02 21:54 ` Brian Norris 2025-12-03 5:15 ` Lukas Wunner 2 siblings, 2 replies; 22+ messages in thread From: Bjorn Helgaas @ 2025-12-02 17:28 UTC (permalink / raw) To: René Rebe Cc: linux-pci, linux-kernel, Bjorn Helgaas, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Brian Norris, Rafael J. Wysocki, Lukas Wunner, Mario Limonciello [+cc Mani, Brian (a5fb3ff63287 authors), Rafael, Lukas, Mario] On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote: > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all > non-x86") was bisected to break various non-x86 RISC Unix systems, > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86. I think we need some kind of analysis of what is happening to the PCI devices here. I don't know why the CPU architecture per se would be related to PCI power management. pci_bridge_d3_possible() is already a barely maintainable hodge podge of random things that work and don't work. Generally speaking most of those cases relate to firmware. 
> Sun Blade 1000: > ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0) > ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603] > ERROR(0): > TPC<MakeIocReady+0xc/0x278 [mptbase]> > ERROR(0): M_SYND(0), E_SYND(0), Privileged > ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus" > ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] > ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000] > ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000] > ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000] > ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000] > ERROR(0): E-cache idx[b08040] tag[000000001e008fa0] > ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff] > Kernel panic - not syncing: Irrecoverable deferred error trap. > CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18 > Call Trace: > [<00000000004294b0>] panic+0xf0/0x370 > [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8 > [<0000000000405e88>] c_deferred+0x18/0x24 > [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase] I assume both of these crashes are related to the CHIPREG_READ32(&ioc->chip->Doorbell) in mpt_GetIocState(), e.g., maybe that PCI read failed because an upstream bridge was not in D0 and therefore treated the read as an unsupported request. 
> [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase] > [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase] > [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas] > [<0000000000788308>] local_pci_probe+0x24/0x70 > [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0 > [<000000000082633c>] really_probe+0x13c/0x29c > [<0000000000826590>] __driver_probe_device+0xf4/0x104 > [<0000000000826614>] driver_probe_device+0x24/0xa0 > [<000000000082683c>] __driver_attach+0xe8/0x104 > [<0000000000824da0>] bus_for_each_dev+0x58/0x84 > [<0000000000825508>] bus_add_driver+0xdc/0x1f8 > [<0000000000827110>] driver_register+0x70/0x120 > > Niagara T1: > mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible > NON-RESUMABLE ERROR: Reporting on cpu 31 > NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]> > NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000 > NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000] > NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c] > NON-RESUMABLE ERROR: type [precise nonresumable] > NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv > > NON-RESUMABLE ERROR: raddr [0x000000ea00300000] > Kernel panic - not syncing: Non-resumable error. 
> CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1 > Call Trace: > [<00000000004373c4>] dump_stack+0x8/0x18 > [<0000000000429540>] panic+0xf4/0x398 > [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240 > [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8 > [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase] > [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase] > [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase] > [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas] > [<0000000000b3fab0>] local_pci_probe+0x30/0x80 > [<0000000000b405d4>] pci_device_probe+0xb4/0x240 > [<0000000000bfd348>] really_probe+0xc8/0x400 > [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160 > [<0000000000bfd8c8>] driver_probe_device+0x28/0x100 > [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0 > [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0 > [<0000000000bfcafc>] driver_attach+0x1c/0x40 > Press Stop-A (L1-A) from sun keyboard or send break > twice on console to return to the boot prom > > Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86") > Signed-off-by: René Rebe <rene@exactco.de> > --- > Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01 > --- > drivers/pci/pci.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index b14dd064006c..7619d2cfa66d 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) > > /* > * Out of caution, we only allow PCIe ports from 2015 or newer > - * into D3 on x86. > + * into D3 or other modern ISAs only. 
> */ > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015) > + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015) > return true; > break; > } > -- > 2.52.0 > > -- > René Rebe, ExactCODE GmbH, Berlin, Germany > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-02 17:28 ` Bjorn Helgaas @ 2025-12-02 17:41 ` René Rebe 2025-12-02 21:54 ` Brian Norris 1 sibling, 0 replies; 22+ messages in thread From: René Rebe @ 2025-12-02 17:41 UTC (permalink / raw) To: helgaas Cc: linux-pci, linux-kernel, bhelgaas, glaubitz, riccardo.mottola, mani, briannorris, rafael, lukas, mario.limonciello Hi, thank you for your review. On Tue, 2 Dec 2025 11:28:37 -0600, Bjorn Helgaas <helgaas@kernel.org> wrote: > [+cc Mani, Brian (a5fb3ff63287 authors), Rafael, Lukas, Mario] > > On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote: > > Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all > > non-x86") was bisected to break various non-x86 RISC Unix systems, > > e.g. sparc64, see two example oopses below. Fix by only allowing D3Hot > > on modern ARM64, PPC64 and RISCV ISAs besides new enough x86. > > I think we need some kind of analysis of what is happening to the PCI > devices here. I don't know why the CPU architecture per se would be > related to PCI power management. That would surely be best, but given that few maintainers work on older architectures, it might take a while. This is also old hardware from before 2015, like the x86 DMI test. The commit enabled D3hot for all systems that previously failed the DMI year check because of this stub: static inline int dmi_get_bios_year(void) { return -ENXIO; } Given that, is it not sensible to first reinstate the old behavior for such architectures, also in the stable trees, while we work on this further? > pci_bridge_d3_possible() is already a barely maintainable hodge podge > of random things that work and don't work. Generally speaking most of > those cases relate to firmware. Fair, but this is a rather simple hotfix to a simple year check, for a commit that only recently broke these systems. I would also expect that these high-performance Unix systems were not designed or tested with dynamic PCI power management in mind, ... 
René > > Sun Blade 1000: > > ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0) > > ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603] > > ERROR(0): > > TPC<MakeIocReady+0xc/0x278 [mptbase]> > > ERROR(0): M_SYND(0), E_SYND(0), Privileged > > ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus" > > ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] > > ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000] > > ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000] > > ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000] > > ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000] > > ERROR(0): E-cache idx[b08040] tag[000000001e008fa0] > > ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff] > > Kernel panic - not syncing: Irrecoverable deferred error trap. > > CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18 > > Call Trace: > > [<00000000004294b0>] panic+0xf0/0x370 > > [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8 > > [<0000000000405e88>] c_deferred+0x18/0x24 > > [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase] > > I assume both of these crashes are related to the > CHIPREG_READ32(&ioc->chip->Doorbell) in mpt_GetIocState(), e.g., maybe > that PCI read failed because an upstream bridge was not in D0 and > therefore treated the read as an unsupported request. 
> > > [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase] > > [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase] > > [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas] > > [<0000000000788308>] local_pci_probe+0x24/0x70 > > [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0 > > [<000000000082633c>] really_probe+0x13c/0x29c > > [<0000000000826590>] __driver_probe_device+0xf4/0x104 > > [<0000000000826614>] driver_probe_device+0x24/0xa0 > > [<000000000082683c>] __driver_attach+0xe8/0x104 > > [<0000000000824da0>] bus_for_each_dev+0x58/0x84 > > [<0000000000825508>] bus_add_driver+0xdc/0x1f8 > > [<0000000000827110>] driver_register+0x70/0x120 > > > > Niagara T1: > > mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible > > NON-RESUMABLE ERROR: Reporting on cpu 31 > > NON-RESUMABLE ERROR: TPC [0x0000000010184034] <MakeIocReady+0x10/0x298 [mptbase]> > > NON-RESUMABLE ERROR: RAW [1f10000000000007:0000000e3179235c:0000000202000004:000000ea00300000 > > NON-RESUMABLE ERROR: 00000000001f0000:0000000000000000:0000000000000000:0000000000000000] > > NON-RESUMABLE ERROR: handle [0x1f10000000000007] stick [0x0000000e3179235c] > > NON-RESUMABLE ERROR: type [precise nonresumable] > > NON-RESUMABLE ERROR: attrs [0x02000004] < PIO sp-faulted priv > > > NON-RESUMABLE ERROR: raddr [0x000000ea00300000] > > Kernel panic - not syncing: Non-resumable error. 
> > CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1 > > Call Trace: > > [<00000000004373c4>] dump_stack+0x8/0x18 > > [<0000000000429540>] panic+0xf4/0x398 > > [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240 > > [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8 > > [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase] > > [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase] > > [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase] > > [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas] > > [<0000000000b3fab0>] local_pci_probe+0x30/0x80 > > [<0000000000b405d4>] pci_device_probe+0xb4/0x240 > > [<0000000000bfd348>] really_probe+0xc8/0x400 > > [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160 > > [<0000000000bfd8c8>] driver_probe_device+0x28/0x100 > > [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0 > > [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0 > > [<0000000000bfcafc>] driver_attach+0x1c/0x40 > > Press Stop-A (L1-A) from sun keyboard or send break > > twice on console to return to the boot prom > > > > Fixes: a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all non-x86") > > Signed-off-by: René Rebe <rene@exactco.de> > > --- > > Tested on Sun Blade 1000, and shipping in all T2/Linux builds since 2025-08-01 > > --- > > drivers/pci/pci.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index b14dd064006c..7619d2cfa66d 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -3033,9 +3033,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) > > > > /* > > * Out of caution, we only allow PCIe ports from 2015 or newer > > - * into D3 on x86. > > + * into D3 or other modern ISAs only. 
> > */ > > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015) > > + if (IS_ENABLED(CONFIG_ARM64) || IS_ENABLED(CONFIG_PPC64) || IS_ENABLED(CONFIG_RISCV) || dmi_get_bios_year() >= 2015) > > return true; > > break; > > } > > -- > > 2.52.0 > > > > -- > > René Rebe, ExactCODE GmbH, Berlin, Germany > > https://exactco.de • https://t2linux.com • https://patreon.com/renerebe -- René Rebe, ExactCODE GmbH, Berlin, Germany https://exactco.de • https://t2linux.com • https://patreon.com/renerebe ^ permalink raw reply [flat|nested] 22+ messages in thread
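[Editorial note] The effect of the -ENXIO stub on the different checks discussed so far can be sketched in user-space C. This is only a model under stated assumptions: struct build_config and model_bios_year() are hypothetical stand-ins for the kernel's compile-time IS_ENABLED() tests and for dmi_get_bios_year(), and the "before" check is assumed to have been the bare year comparison, as René's description suggests.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's compile-time IS_ENABLED() checks. */
struct build_config {
	bool x86;
	bool arm64;
	bool ppc64;
	bool riscv;
	bool dmi;       /* is DMI support built in? */
	int bios_year;  /* only consulted when dmi is true */
};

/* Models dmi_get_bios_year(): without DMI, the stub returns -ENXIO. */
int model_bios_year(const struct build_config *c)
{
	return c->dmi ? c->bios_year : -ENXIO;
}

/* Assumed check before a5fb3ff63287: the bare year comparison. */
bool d3_before(const struct build_config *c)
{
	return model_bios_year(c) >= 2015;
}

/* Check as of a5fb3ff63287 (the line the patch removes). */
bool d3_mainline(const struct build_config *c)
{
	return !c->x86 || model_bios_year(c) >= 2015;
}

/* Check proposed by the patch in this thread. */
bool d3_patched(const struct build_config *c)
{
	return c->arm64 || c->ppc64 || c->riscv ||
	       model_bios_year(c) >= 2015;
}
```

For a sparc64-like configuration (no x86, no DMI), d3_before() and d3_patched() both evaluate to false while d3_mainline() evaluates to true, which is the behavior change being reported.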
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-02 17:28 ` Bjorn Helgaas 2025-12-02 17:41 ` René Rebe @ 2025-12-02 21:54 ` Brian Norris 2025-12-03 4:49 ` Lukas Wunner 1 sibling, 1 reply; 22+ messages in thread From: Brian Norris @ 2025-12-02 21:54 UTC (permalink / raw) To: Bjorn Helgaas Cc: René Rebe, linux-pci, linux-kernel, Bjorn Helgaas, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Rafael J. Wysocki, Lukas Wunner, Mario Limonciello On Tue, Dec 02, 2025 at 11:28:37AM -0600, Bjorn Helgaas wrote: > I think we need some kind of analysis of what is happening to the PCI > devices here. I don't know why the CPU architecture per se would be > related to PCI power management. Agreed, and I think it will be very hard to ever gain any traction on modernizing the PM stack here if we can't get any sort of "why?" answer out of the systems that don't work. The last time this came up, the answer was essentially: https://lore.kernel.org/all/CAJZ5v0j_6jeMAQ7eFkZBe5Yi+USGzysxAgfemYh=-zq4h5W+Qg@mail.gmail.com/ The DMI check at the end of pci_bridge_d3_possible() is really something to the effect of "there is no particular reason to prevent this bridge from going into D3, but try to avoid platforms where it may not work". i.e., no specific reason, but a vague understanding that there is some old HW that doesn't work. That's not very helpful for supporting non-DMI systems that don't have a programmatic notion of "old." OTOH, I sympathize with Rene, that it's hard to dig into what amounts to new development on old platforms, and yet, they do remain broken. > pci_bridge_d3_possible() is already a barely maintainable hodge podge > of random things that work and don't work. Generally speaking most of > those cases relate to firmware. 
I wonder if we could take a different approach that helps straddle the uncertain boundary here a bit: 1) be more aggressive at *permitting* runtime PM / D3 for bridges (i.e., if we think a bridge might be OK to go to D3, then manage its get()/put() properly); and 2) be less aggressive about default-enabling runtime suspend / D3 (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in limited circumstances). For #2, that would actually match the documentation: Documentation/power/pci.rst The driver itself should not call pm_runtime_allow(), though. Instead, it should let user space or some platform-specific code do that (user space can do it via sysfs as stated above), but it must be prepared to handle the runtime PM of the device correctly as soon as pm_runtime_allow() is called (which may happen at any time, even before the driver is loaded). So instead of portdrv.c calling pm_runtime_allow(), we'd leave that decision to user space (i.e., udev or similar). That will help limit the impact of getting #1 "wrong." And it's possible the bad systems didn't really want aggressive PM anyway, so it's not worth much trouble. For #1, that means pci_bridge_d3_possible() would become more like pci_bridge_d3_impossible(). We could leave it as-is, or at least ensure it fails toward the "possible" side. IOW, user space can choose to opt in by way of: echo auto > /sys/bus/pci/devices/[port device]/power/control That might require some new udev rules if existing x86 systems are supposed to retain their old behavior. Personally, I care more about #1 (that the kernel manages pm_runtime_*() refcounts properly, so that my systems *can* opt into aggressive PM), and less about #2 (it's a fact of life that PM policy often requires careful udev / sysfs management, and that the defaults will not necessarily give the best power savings). 
This might leave some old unmaintained systems as "D3 possible", but we don't actually exercise it if user space doesn't poke /sys/bus/pci/devices/[port device]/power/control. Brian ^ permalink raw reply [flat|nested] 22+ messages in thread
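[Editorial note] The user-space opt-in Brian describes amounts to writing "auto" (or "on", to keep the device active) to the port's power/control attribute. A minimal C sketch of that write follows; set_runtime_pm_control() is a hypothetical helper name, and the caller has to supply the real sysfs path of the port device, which the thread leaves as a placeholder.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write "auto" or "on" to a runtime PM control attribute, e.g. the
 * power/control file of a PCIe port in sysfs.
 * Returns 0 on success, -1 on an invalid mode or an I/O error.
 * Hypothetical helper for illustration, not an existing API.
 */
int set_runtime_pm_control(const char *control_path, const char *mode)
{
	FILE *f;
	int ok;

	/* Only the two documented values are accepted here. */
	if (strcmp(mode, "auto") != 0 && strcmp(mode, "on") != 0)
		return -1;

	f = fopen(control_path, "w");
	if (!f)
		return -1;

	ok = fputs(mode, f) >= 0;
	/* fclose() flushes, so a failed write may only show up here. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A udev rule setting the same attribute would achieve the same effect without a helper program; the C version only makes the mechanism explicit.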
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-02 21:54 ` Brian Norris @ 2025-12-03 4:49 ` Lukas Wunner 2025-12-03 14:27 ` Mika Westerberg 0 siblings, 1 reply; 22+ messages in thread From: Lukas Wunner @ 2025-12-03 4:49 UTC (permalink / raw) To: Brian Norris Cc: Bjorn Helgaas, René Rebe, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Rafael J. Wysocki, Mario Limonciello, Mika Westerberg [cc += Mika] On Tue, Dec 02, 2025 at 01:54:00PM -0800, Brian Norris wrote: > I wonder if we could take a different approach that helps straddle the > uncertain boundary here a bit: [...] > 2) be less aggressive about default-enabling runtime suspend / D3 > (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in > limited circumstances). [...] > So instead of portdrv.c calling pm_runtime_allow(), we'd leave that > decision to user space (i.e., udev or similar). That will help limit the > impact of getting #1 "wrong." And it's possible the bad systems didn't > really want aggressive PM anyway, so it's not worth much trouble. I think runtime PM support in the PCIe port driver was primarily motivated by the need to power down Thunderbolt controllers when they're not in use. A Thunderbolt controller exposes a PCIe switch. Daisy-chained Thunderbolt devices are thus visible to the OS as nested switches. If we followed the approach you're suggesting, users would have to manually allow runtime PM on every Switch Upstream and Downstream Port as well as the Root Port and they'd have to do that upon hotplugging a device. Yes, yes, users could add a udev rule to allow runtime PM automatically by default, but that's exactly the policy we have hardcoded in the kernel right now, so why the change? I expect massive power regressions for users (not least Chromebook users) if we made that change. The discrete Thunderbolt controller in my machine consumes 1.5W when nothing is attached. 
Some laptops have multiple of these. Recent CPUs with integrated Thunderbolt/USB4 may fail to transition the package to a low power state unless the Thunderbolt ports go to D3hot. So I don't think this approach is a viable option. Thanks, Lukas ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-03 4:49 ` Lukas Wunner @ 2025-12-03 14:27 ` Mika Westerberg 2025-12-03 14:48 ` René Rebe 0 siblings, 1 reply; 22+ messages in thread From: Mika Westerberg @ 2025-12-03 14:27 UTC (permalink / raw) To: Lukas Wunner Cc: Brian Norris, Bjorn Helgaas, René Rebe, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Rafael J. Wysocki, Mario Limonciello Hi, On Wed, Dec 03, 2025 at 05:49:37AM +0100, Lukas Wunner wrote: > [cc += Mika] > > On Tue, Dec 02, 2025 at 01:54:00PM -0800, Brian Norris wrote: > > I wonder if we could take a different approach that helps straddle the > > uncertain boundary here a bit: > [...] > > 2) be less aggressive about default-enabling runtime suspend / D3 > > (i.e., only call pm_runtime_allow() in drivers/pci/pcie/portdrv.c in > > limited circumstances). > [...] > > So instead of portdrv.c calling pm_runtime_allow(), we'd leave that > > decision to user space (i.e., udev or similar). That will help limit the > > impact of getting #1 "wrong." And it's possible the bad systems didn't > > really want aggressive PM anyway, so it's not worth much trouble. > > I think runtime PM support in the PCIe port driver was primarily > motivated by the need to power down Thunderbolt controllers when > they're not in use. That and also there are discrete GPUs that can runtime suspend when not in use. > A Thunderbolt controller exposes a PCIe switch. Daisy-chained > Thunderbolt devices are thus visible to the OS as nested switches. > If we followed the approach you're suggesting, users would have to > manually allow runtime PM on every Switch Upstream and Downstream Port > as well as the Root Port and they'd have to do that upon hotplugging > a device. Yes, yes, users could add a udev rule to allow runtime PM > automatically by default, but that's exactly the policy we have hardcoded > in the kernel right now, so why the change? 
> > I expect massive power regressions for users (not least Chromebook > users) if we made that change. > > The discrete Thunderbolt controller in my machine consumes 1.5W > when nothing is attached. Some laptops have multiple of these. > Recent CPUs with integrated Thunderbolt/USB4 may fail to transition > the package to a low power state unless the Thunderbolt ports go > to D3hot. > > So I don't think this approach is a viable option. I agree. If this is limited to some older RISC machines (based on the $subject) perhaps this could be solved by adding udev rules to block runtime PM on those certain ports? ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-03 14:27 ` Mika Westerberg @ 2025-12-03 14:48 ` René Rebe 2025-12-03 15:22 ` Rafael J. Wysocki 0 siblings, 1 reply; 22+ messages in thread From: René Rebe @ 2025-12-03 14:48 UTC (permalink / raw) To: Mika Westerberg Cc: Lukas Wunner, Brian Norris, Bjorn Helgaas, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Rafael J. Wysocki, Mario Limonciello Hi, > On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote: … >> A Thunderbolt controller exposes a PCIe switch. Daisy-chained >> Thunderbolt devices are thus visible to the OS as nested switches. >> If we followed the approach you're suggesting, users would have to >> manually allow runtime PM on every Switch Upstream and Downstream Port >> as well as the Root Port and they'd have to do that upon hotplugging >> a device. Yes, yes, users could add a udev rule to allow runtime PM >> automatically by default, but that's exactly the policy we have hardcoded >> in the kernel right now, so why the change? >> >> I expect massive power regressions for users (not least Chromebook >> users) if we made that change. >> >> The discrete Thunderbolt controller in my machine consumes 1.5W >> when nothing is attached. Some laptops have multiple of these. >> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition >> the package to a low power state unless the Thunderbolt ports go >> to D3hot. >> >> So I don't think this approach is a viable option. > > I agree. If this is limited to some older RISC machines (based on the > $subject) perhaps this could be solved by adding udev rules to block > runtime PM on those certain ports? Let’s not overcomplicate it for now. All we have are a couple of old Unix RISC workstations. Let’s see if we can somehow fix them for real first. 
Given the feedback that D3Hot “should” work more often, I went ahead and changed the patch in T2/Linux to drop the 2015 check and allow D3 on all architectures except SPARC, so that our prosumer enthusiast users can find out whether anything else breaks and we gather more data points. I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have some other TODOs first, so it might be January before I get to it. Maybe we should push a patch that only disables this for SPARC64 to stable in the meantime? https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 2b53219fda3b..869d204a70a3 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) return false; /* - * Out of caution, we only allow PCIe ports from 2015 or newer - * into D3 on x86. + * It should be safe to put PCIe ports to D3. */ - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015) + if (!IS_ENABLED(CONFIG_SPARC64)) return true; break; } René -- https://exactco.de • https://t2linux.com • https://patreon.com/renerebe ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems 2025-12-03 14:48 ` René Rebe @ 2025-12-03 15:22 ` Rafael J. Wysocki 2025-12-03 15:26 ` René Rebe 0 siblings, 1 reply; 22+ messages in thread From: Rafael J. Wysocki @ 2025-12-03 15:22 UTC (permalink / raw) To: René Rebe Cc: Mika Westerberg, Lukas Wunner, Brian Norris, Bjorn Helgaas, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Rafael J. Wysocki, Mario Limonciello On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote: > > Hi, > > > On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote: > > … > > >> A Thunderbolt controller exposes a PCIe switch. Daisy-chained > >> Thunderbolt devices are thus visible to the OS as nested switches. > >> If we followed the approach you're suggesting, users would have to > >> manually allow runtime PM on every Switch Upstream and Downstream Port > >> as well as the Root Port and they'd have to do that upon hotplugging > >> a device. Yes, yes, users could add a udev rule to allow runtime PM > >> automatically by default, but that's exactly the policy we have hardcoded > >> in the kernel right now, so why the change? > >> > >> I expect massive power regressions for users (not least Chromebook > >> users) if we made that change. > >> > >> The discrete Thunderbolt controller in my machine consumes 1.5W > >> when nothing is attached. Some laptops have multiple of these. > >> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition > >> the package to a low power state unless the Thunderbolt ports go > >> to D3hot. > >> > >> So I don't think this approach is a viable option. > > > > I agree. If this is limited to some older RISC machines (based on the > > $subject) perhaps this could be solved by adding udev rules to block > > runtime PM on those certain ports? > > Let’s not overcomplicate it for now. All we have are a couple of old Unix > RISC workstations. 
Let’s see if we can somehow fix them for real first. > > Given the feedback that D3Hot “should” more often work I went ahead > and changed the patch in T2/Linux removing the 2015 check and all arch > except SPARC and let our prosumer enthusiast users find out if something > else breaks first to gather more data points. > > I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have > some other TODO first, so it might be January for more work on that. > > Maybe we should push a patch to only disable this for SPARC64 to stable > In the meantime? > > https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index 2b53219fda3b..869d204a70a3 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge) > return false; > > /* > - * Out of caution, we only allow PCIe ports from 2015 or newer > - * into D3 on x86. > + * It should be safe to put PCIe ports to D3. > */ > - if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015) > + if (!IS_ENABLED(CONFIG_SPARC64)) > return true; > break; > } I would prefer if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) || !IS_ENABLED(CONFIG_SPARC64)) return true; ^ permalink raw reply [flat|nested] 22+ messages in thread
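[Editorial note] Evaluating the two conditions above for each kind of build suggests that, as written, they give the same result: CONFIG_X86 and CONFIG_SPARC64 are never enabled together, so the x86 year term cannot change the outcome. The user-space C sketch below models this. struct build_config and model_bios_year() are hypothetical stand-ins for the kernel's IS_ENABLED() tests and dmi_get_bios_year(), and d3_x86_year_kept() is only one possible reading of what a check that retains the x86 cut-off could look like; the thread does not say that this was the intent.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's compile-time IS_ENABLED() checks. */
struct build_config {
	bool x86;
	bool sparc64;
	bool dmi;       /* is DMI support built in? */
	int bios_year;  /* only consulted when dmi is true */
};

/* Models dmi_get_bios_year(): without DMI, the stub returns -ENXIO. */
int model_bios_year(const struct build_config *c)
{
	return c->dmi ? c->bios_year : -ENXIO;
}

/* René's downstream T2 check. */
bool d3_downstream(const struct build_config *c)
{
	return !c->sparc64;
}

/* The condition suggested in the reply above, transcribed as written. */
bool d3_suggested(const struct build_config *c)
{
	return (c->x86 && model_bios_year(c) >= 2015) || !c->sparc64;
}

/* Hypothetical variant that keeps the year cut-off on x86 only. */
bool d3_x86_year_kept(const struct build_config *c)
{
	return c->x86 ? model_bios_year(c) >= 2015 : !c->sparc64;
}
```

In this model, an x86 build with a 2012 BIOS is allowed into D3 by d3_downstream() and d3_suggested() alike, and only d3_x86_year_kept() still rejects it.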
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-03 15:22 ` Rafael J. Wysocki
@ 2025-12-03 15:26 ` René Rebe
2025-12-03 17:16 ` Rafael J. Wysocki
0 siblings, 1 reply; 22+ messages in thread
From: René Rebe @ 2025-12-03 15:26 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Mika Westerberg, Lukas Wunner, Brian Norris, Bjorn Helgaas, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Mario Limonciello

Hi,

> On 3. Dec 2025, at 16:22, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote:
>>
>> Hi,
>>
>>> On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
>>
>> …
>>
>>>> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
>>>> Thunderbolt devices are thus visible to the OS as nested switches.
>>>> If we followed the approach you're suggesting, users would have to
>>>> manually allow runtime PM on every Switch Upstream and Downstream Port
>>>> as well as the Root Port and they'd have to do that upon hotplugging
>>>> a device. Yes, yes, users could add a udev rule to allow runtime PM
>>>> automatically by default, but that's exactly the policy we have hardcoded
>>>> in the kernel right now, so why the change?
>>>>
>>>> I expect massive power regressions for users (not least Chromebook
>>>> users) if we made that change.
>>>>
>>>> The discrete Thunderbolt controller in my machine consumes 1.5W
>>>> when nothing is attached. Some laptops have multiple of these.
>>>> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
>>>> the package to a low power state unless the Thunderbolt ports go
>>>> to D3hot.
>>>>
>>>> So I don't think this approach is a viable option.
>>>
>>> I agree. If this is limited to some older RISC machines (based on the
>>> $subject) perhaps this could be solved by adding udev rules to block
>>> runtime PM on those certain ports?
>>
>> Let’s not overcomplicate it for now. All we have are a couple of old Unix
>> RISC workstations. Let’s see if we can somehow fix them for real first.
>>
>> Given the feedback that D3Hot “should” more often work I went ahead
>> and changed the patch in T2/Linux removing the 2015 check and all arch
>> except SPARC and let our prosumer enthusiast users find out if something
>> else breaks first to gather more data points.
>>
>> I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have
>> some other TODO first, so it might be January for more work on that.
>>
>> Maybe we should push a patch to only disable this for SPARC64 to stable
>> In the meantime?
>>
>> https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch
>>
>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>> index 2b53219fda3b..869d204a70a3 100644
>> --- a/drivers/pci/pci.c
>> +++ b/drivers/pci/pci.c
>> @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
>>  			return false;
>>
>>  		/*
>> -		 * Out of caution, we only allow PCIe ports from 2015 or newer
>> -		 * into D3 on x86.
>> +		 * It should be safe to put PCIe ports to D3.
>>  		 */
>> -		if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
>> +		if (!IS_ENABLED(CONFIG_SPARC64))
>>  			return true;
>>  		break;
>>  	}
>
> I would prefer
>
> 	if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
> 	    !IS_ENABLED(CONFIG_SPARC64))
> 		return true;

Sorry for any confusion, I did not mean the above for upstream, but as I
tried to express for us downstream in T2 to gather more data (if any) from
our users for pre 2015 x86 machines.

Should I send your proposal which matches mine for stable in the
meantime?

René

--
https://exactco.de • https://t2linux.com • https://patreon.com/renerebe

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-03 15:26 ` René Rebe
@ 2025-12-03 17:16 ` Rafael J. Wysocki
0 siblings, 0 replies; 22+ messages in thread
From: Rafael J. Wysocki @ 2025-12-03 17:16 UTC (permalink / raw)
To: René Rebe
Cc: Rafael J. Wysocki, Mika Westerberg, Lukas Wunner, Brian Norris, Bjorn Helgaas, linux-pci, linux-kernel, John Paul Adrian Glaubitz, Riccardo Mottola, Manivannan Sadhasivam, Mario Limonciello

On Wed, Dec 3, 2025 at 4:26 PM René Rebe <rene@exactco.de> wrote:
>
> Hi,
>
> > On 3. Dec 2025, at 16:22, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Wed, Dec 3, 2025 at 3:48 PM René Rebe <rene@exactco.de> wrote:
> >>
> >> Hi,
> >>
> >>> On 3. Dec 2025, at 15:27, Mika Westerberg <mika.westerberg@linux.intel.com> wrote:
> >>
> >> …
> >>
> >>>> A Thunderbolt controller exposes a PCIe switch. Daisy-chained
> >>>> Thunderbolt devices are thus visible to the OS as nested switches.
> >>>> If we followed the approach you're suggesting, users would have to
> >>>> manually allow runtime PM on every Switch Upstream and Downstream Port
> >>>> as well as the Root Port and they'd have to do that upon hotplugging
> >>>> a device. Yes, yes, users could add a udev rule to allow runtime PM
> >>>> automatically by default, but that's exactly the policy we have hardcoded
> >>>> in the kernel right now, so why the change?
> >>>>
> >>>> I expect massive power regressions for users (not least Chromebook
> >>>> users) if we made that change.
> >>>>
> >>>> The discrete Thunderbolt controller in my machine consumes 1.5W
> >>>> when nothing is attached. Some laptops have multiple of these.
> >>>> Recent CPUs with integrated Thunderbolt/USB4 may fail to transition
> >>>> the package to a low power state unless the Thunderbolt ports go
> >>>> to D3hot.
> >>>>
> >>>> So I don't think this approach is a viable option.
> >>>
> >>> I agree. If this is limited to some older RISC machines (based on the
> >>> $subject) perhaps this could be solved by adding udev rules to block
> >>> runtime PM on those certain ports?
> >>
> >> Let’s not overcomplicate it for now. All we have are a couple of old Unix
> >> RISC workstations. Let’s see if we can somehow fix them for real first.
> >>
> >> Given the feedback that D3Hot “should” more often work I went ahead
> >> and changed the patch in T2/Linux removing the 2015 check and all arch
> >> except SPARC and let our prosumer enthusiast users find out if something
> >> else breaks first to gather more data points.
> >>
> >> I’ll try to find time to debug the SPARC64 Sun Blade 1K issue, but I have
> >> some other TODO first, so it might be January for more work on that.
> >>
> >> Maybe we should push a patch to only disable this for SPARC64 to stable
> >> In the meantime?
> >>
> >> https://svn.exactcode.de/t2/trunk/package/kernel/linux/hotfix-legacy-pci-bridge-d3.patch
> >>
> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >> index 2b53219fda3b..869d204a70a3 100644
> >> --- a/drivers/pci/pci.c
> >> +++ b/drivers/pci/pci.c
> >> @@ -3067,10 +3067,9 @@ bool pci_bridge_d3_possible(struct pci_dev *bridge)
> >>  			return false;
> >>
> >>  		/*
> >> -		 * Out of caution, we only allow PCIe ports from 2015 or newer
> >> -		 * into D3 on x86.
> >> +		 * It should be safe to put PCIe ports to D3.
> >>  		 */
> >> -		if (!IS_ENABLED(CONFIG_X86) || dmi_get_bios_year() >= 2015)
> >> +		if (!IS_ENABLED(CONFIG_SPARC64))
> >>  			return true;
> >>  		break;
> >>  	}
> >
> > I would prefer
> >
> > 	if ((IS_ENABLED(CONFIG_X86) && dmi_get_bios_year() >= 2015) ||
> > 	    !IS_ENABLED(CONFIG_SPARC64))
> > 		return true;
>
> Sorry for any confusion, I did not mean the above for upstream, but as I
> tried to express for us downstream in T2 to gather more data (if any) from
> our users for pre 2015 x86 machines.
>
> Should I send your proposal which matches mine for stable in the
> meantime?

Yes, you can do this, as far as I'm concerned.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems
2025-12-02 16:40 [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems René Rebe
2025-12-02 16:54 ` John Paul Adrian Glaubitz
2025-12-02 17:28 ` Bjorn Helgaas
@ 2025-12-03 5:15 ` Lukas Wunner
2 siblings, 0 replies; 22+ messages in thread
From: Lukas Wunner @ 2025-12-03 5:15 UTC (permalink / raw)
To: René Rebe
Cc: linux-pci, linux-kernel, Bjorn Helgaas, John Paul Adrian Glaubitz, Riccardo Mottola

On Tue, Dec 02, 2025 at 05:40:07PM +0100, René Rebe wrote:
> Commit a5fb3ff63287 ("PCI: Allow PCI bridges to go to D3Hot on all
> non-x86") was bisected to break various non-x86 RISC Unix systems,
> e.g. sparc64, see two example oopses below.
[...]
> Sun Blade 1000:
> ERROR(0): Cheetah error trap taken afsr[0010080005000000] afar[000007f900800000] TL1(0)
> ERROR(0): TPC[100a05a4] TNPC[100a05a8] O7[42acc8] TSTATE[4411001603]
> ERROR(0):
> TPC<MakeIocReady+0xc/0x278 [mptbase]>
> ERROR(0): M_SYND(0), E_SYND(0), Privileged
> ERROR(0): Highest priority error (0000080000000000) "Bus error response from system bus"
> ERROR(0): D-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000]
> ERROR(0): D-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[0000000000000000]
> ERROR(0): I-cache idx[0] tag[0000000000000000] utag[0000000000000000] stag[0000000000000000] u[0000000000000000] l[0000000000000000]
> ERROR(0): I-cache INSN0[0000000000000000] INSN1[0000000000000000] INSN2[0000000000000000] INSN3[0000000000000000]
> ERROR(0): I-cache INSN4[0000000000000000] INSN5[0000000000000000] INSN6[0000000000000000] INSN7[0000000000000000]
> ERROR(0): E-cache idx[b08040] tag[000000001e008fa0]
> ERROR(0): E-cache data0[0000000000000000] data1[0000000000000000] data2[0000000000000000] data3[ffffffffffffffff]
> Kernel panic - not syncing: Irrecoverable deferred error trap.

Some ARM PCIe host controllers are known to raise a Data Abort exception
upon a Completion Timeout (pcie-brcmstb.c is a case in point).
It looks like these SPARC CPUs behave similarly.

> CPU: 0 UID: 0 PID: 46 Comm: (udev-worker) Not tainted 6.14.0-rc1-00001-ga5fb3ff63287 #18
> Call Trace:
> [<00000000004294b0>] panic+0xf0/0x370
> [<0000000000435bc4>] cheetah_deferred_handler+0x2c8/0x2d8
> [<0000000000405e88>] c_deferred+0x18/0x24
> [<00000000100a05a4>] MakeIocReady+0xc/0x278 [mptbase]
> [<00000000100a089c>] mpt_do_ioc_recovery+0x8c/0x1054 [mptbase]
> [<000000001009f2d4>] mpt_attach+0x920/0xa68 [mptbase]
> [<000000001012424c>] mptsas_probe+0x8/0x3e8 [mptsas]
> [<0000000000788308>] local_pci_probe+0x24/0x70
> [<0000000000788dac>] pci_device_probe+0x1c0/0x1d0
> [<000000000082633c>] really_probe+0x13c/0x29c
> [<0000000000826590>] __driver_probe_device+0xf4/0x104
> [<0000000000826614>] driver_probe_device+0x24/0xa0
> [<000000000082683c>] __driver_attach+0xe8/0x104
> [<0000000000824da0>] bus_for_each_dev+0x58/0x84
> [<0000000000825508>] bus_add_driver+0xdc/0x1f8
> [<0000000000827110>] driver_register+0x70/0x120

I suspect this is a bug in the mpt3sas driver and/or scsi layer.

A runtime PM ref is held on the PCI Endpoint device when the driver
probes, so that ref must have been dropped. The Endpoint (SCSI host
controller) went into runtime suspend, which allowed the Root Port
to go to D3hot. When the Root Port is in D3hot, MMIO to the attached
Endpoint will cause Completion Timeouts. (Config Space accesses will
still work.)

I'm not seeing any "pm_runtime" or "autopm" occurrences in
drivers/scsi/mpt3sas/, so perhaps the issue is in the scsi layer?

To track this down, you'd have to instrument calls to pm_runtime_put()
and friends with a printk to see where runtime PM refs are acquired
and dropped. Alternatively, enabling tracing may help, there are a few
tracepoints in runtime PM code.

> mptsas 0000:07:00.0: Unable to change power state from D3cold to D0, device inaccessible

Maybe the Root Port or Endpoint need extra delays to resume to D0?

> CPU: 31 UID: 0 PID: 367 Comm: (udev-worker) Not tainted 6.16.12+3-sparc64-smp #1 NONE Debian 6.16.12-2+sparc64.1
> Call Trace:
> [<00000000004373c4>] dump_stack+0x8/0x18
> [<0000000000429540>] panic+0xf4/0x398
> [<000000000043afcc>] sun4v_nonresum_error+0x16c/0x240
> [<0000000000406eb8>] sun4v_nonres_mondo+0xc8/0xd8
> [<0000000010184034>] MakeIocReady+0x10/0x298 [mptbase]
> [<00000000101844b4>] mpt_do_ioc_recovery+0x9c/0x1110 [mptbase]
> [<00000000101836f8>] mpt_attach+0xb58/0xd20 [mptbase]
> [<0000000010287f30>] mptsas_probe+0x10/0x440 [mptsas]
> [<0000000000b3fab0>] local_pci_probe+0x30/0x80
> [<0000000000b405d4>] pci_device_probe+0xb4/0x240
> [<0000000000bfd348>] really_probe+0xc8/0x400
> [<0000000000bfd70c>] __driver_probe_device+0x8c/0x160
> [<0000000000bfd8c8>] driver_probe_device+0x28/0x100
> [<0000000000bfdb7c>] __driver_attach+0xbc/0x1e0
> [<0000000000bfacfc>] bus_for_each_dev+0x5c/0xc0
> [<0000000000bfcafc>] driver_attach+0x1c/0x40

Same stack trace, same bug I guess.

Thanks,

Lukas

^ permalink raw reply [flat|nested] 22+ messages in thread
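The dependency chain described in the message above (the Endpoint drops its last runtime PM reference, suspends, and thereby lets the Root Port suspend, after which MMIO to the Endpoint no longer completes) can be sketched as a toy user-space model. This is only an illustration under simplifying assumptions: `struct toy_dev` and every `toy_*` function are hypothetical names, there is no autosuspend delay, and nothing here is the kernel's actual runtime PM implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, this is not kernel code. */
struct toy_dev {
	struct toy_dev *parent;	/* upstream bridge, NULL for the Root Port */
	int usage_count;	/* runtime PM references currently held */
	int active_children;	/* children that are not suspended */
	bool suspended;		/* true once in a low-power state such as D3hot */
};

/* Suspend a device if nothing keeps it active, then try its parent. */
void toy_try_suspend(struct toy_dev *dev)
{
	if (dev->suspended || dev->usage_count > 0 || dev->active_children > 0)
		return;
	dev->suspended = true;
	if (dev->parent) {
		dev->parent->active_children--;
		toy_try_suspend(dev->parent);
	}
}

/* Resume a device; the parent has to be active before the child. */
void toy_resume(struct toy_dev *dev)
{
	if (!dev->suspended)
		return;
	if (dev->parent) {
		toy_resume(dev->parent);
		dev->parent->active_children++;
	}
	dev->suspended = false;
}

/* Take a reference and make sure the device is active. */
void toy_get(struct toy_dev *dev)
{
	dev->usage_count++;
	toy_resume(dev);
}

/* Drop a reference; the last one allows the device to suspend. */
void toy_put(struct toy_dev *dev)
{
	if (dev->usage_count > 0 && --dev->usage_count == 0)
		toy_try_suspend(dev);
}

/* In this model, MMIO completes only if every upstream bridge is active. */
bool toy_mmio_ok(const struct toy_dev *dev)
{
	const struct toy_dev *p;

	for (p = dev->parent; p; p = p->parent)
		if (p->suspended)
			return false;
	return true;
}
```

With a Root Port initialised as `{ NULL, 0, 1, false }` and an Endpoint as `{ &root, 1, 0, false }`, calling `toy_put()` on the Endpoint suspends both devices and `toy_mmio_ok()` turns false; a following `toy_get()` resumes the Root Port first and MMIO works again. In the real kernel the corresponding reference handling goes through `pm_runtime_get_sync()` and `pm_runtime_put()`, which is why instrumenting those calls is suggested above.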
Thread overview: 22+ messages
2025-12-02 16:40 [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems René Rebe
2025-12-02 16:54 ` John Paul Adrian Glaubitz
2025-12-02 17:04 ` René Rebe
2025-12-02 18:20 ` PCI bridge window issue (Was: Re: [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems) Ilpo Järvinen
2025-12-02 18:29 ` PCI bridge window issue René Rebe
2025-12-02 19:35 ` Ilpo Järvinen
2025-12-06 1:07 ` [PATCH] PCI: Fix PCI bridges not to go to D3Hot on older RISC systems Maciej W. Rozycki
2025-12-06 8:31 ` John Paul Adrian Glaubitz
2025-12-06 10:02 ` René Rebe
[not found] ` <339B5A39-BC20-489A-9969-BF01B4E6AD63@exactco.de>
2025-12-07 14:40 ` Maciej W. Rozycki
2025-12-06 10:14 ` René Rebe
2025-12-07 14:31 ` Maciej W. Rozycki
2025-12-02 17:28 ` Bjorn Helgaas
2025-12-02 17:41 ` René Rebe
2025-12-02 21:54 ` Brian Norris
2025-12-03 4:49 ` Lukas Wunner
2025-12-03 14:27 ` Mika Westerberg
2025-12-03 14:48 ` René Rebe
2025-12-03 15:22 ` Rafael J. Wysocki
2025-12-03 15:26 ` René Rebe
2025-12-03 17:16 ` Rafael J. Wysocki
2025-12-03 5:15 ` Lukas Wunner