public inbox for linux-pci@vger.kernel.org
From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Adam Stylinski <kungfujesus06@gmail.com>
Cc: linux-pci@vger.kernel.org, bhelgaas@google.com
Subject: Re: Kernel regression in 6.13
Date: Thu, 18 Dec 2025 15:54:32 +0200 (EET)
Message-ID: <fcf1a483-8ef8-7450-2e9e-f82be527a49d@linux.intel.com>
In-Reply-To: <aUFreCbbgfb-5Wh1@eggsbenedict>

On Tue, 16 Dec 2025, Adam Stylinski wrote:
> On Tue, Dec 16, 2025 at 04:14:10PM +0200, Ilpo Järvinen wrote:
> > On Tue, 16 Dec 2025, Adam Stylinski wrote:
> > 
> > > On Tue, Dec 16, 2025 at 11:49:45AM +0200, Ilpo Järvinen wrote:
> > > > On Mon, 15 Dec 2025, Adam Stylinski wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I seem to be encountering a regression that prevents my system from 
> > > > > booting.  The regression occurred between 6.12 and 6.13.  I've bisected 
> > > > > it to this commit:
> > > > > 665745f274870c921020f610e2c99a3b1613519b
> > > > > 
> > > > > Some info about this system: it's ancient. It's a Q9650 that I used as a 
> > > > > mythbackend/frontend for over a decade. This booting failure on newer 
> > > > > kernels finally forced my hand to buy a "new" PCI Express based 
> > > > > tuner and upgrade the system into the modern age. It boots via MBR on a 
> > > > > P45 based chipset (A P5Q Plus board, to be precise).  Given the age, I 
> > > > > chalked the issue up to possibly some failing hardware or memory 
> > > > > corruption that happened at compile time. I recently pulled the system 
> > > > > back out again to do some performance testing in zlib-ng only to find 
> > > > > out it hangs on the latest Ubuntu server ISO. I figured at this point it 
> > > > > wasn't something specific to my kernel config / compilation and it's 
> > > > > likely a regression. It's also old enough that I may be in the position 
> > > > > of the only one having this problem, so I took it upon myself to bisect 
> > > > > what was going on. Let me know if there's anything you'd like me to test 
> > > > > or try.
> > > > 
> > > > Hi,
> > > > 
> > > > Thanks for the report.
> > > > 
> > > > In pcie_bwnotif_enable() there's pcie_capability_set_word() that enables
> > > > bandwidth notifications:
> > > > 
> > > > 	        pcie_capability_set_word(port, PCI_EXP_LNKCTL,
> > > >                                  PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
> > > > 
> > > > So as the first step, change those PCI_EXP_LNKCTL_LBMIE | 
> > > > PCI_EXP_LNKCTL_LABIE into 0 to see if not enabling the bandwidth 
> > > > notification allows the system to come up.
> > > > 
> > > > I suggest not trying this directly at the top of 665745f27487 
> > > > ("PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller") 
> > > > but on a kernel that is expected to have fixes since 665745f27487 
> > > > including those made to the other PCIe service drivers that share 
> > > > interrupt handler with bwctrl (so basically some stable version).
> > > > 
> > > > If that works try to enable those bits one at a time.
> > > > 
> > > > Please also send lspci -vvv.
> > > > 
> > > > -- 
> > > >  i.
> > > > 
> > > 
> > > I'll try changing those values on top of the 6.18 tagged commit and let you know how it goes.  Thanks for looking into this.
> > > The privileged lspci -vv output is below:
> > > 
> > > 00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
> > > 	Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> > > 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> > > 	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
> > > 	Latency: 0
> > > 	Capabilities: [e0] Vendor Specific Information: Intel <unknown>
> > > 
> > > 00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
> > > 	Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> > > 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> > > 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> > > 	Latency: 0, Cache Line Size: 32 bytes
> > > 	Interrupt: pin A routed to IRQ 24
> > > 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> > > 	I/O behind bridge: c000-cfff [size=4K] [16-bit]
> > > 	Memory behind bridge: fd000000-fe9fffff [size=26M] [32-bit]
> > > 	Prefetchable memory behind bridge: c0000000-dfffffff [size=512M] [32-bit]
> > > 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> > > 	BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B-
> > > 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> > > 	Capabilities: [88] Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> > > 	Capabilities: [80] Power Management version 3
> > > 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> > > 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> > > 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> > > 		Address: fee02000  Data: 0020
> > > 	Capabilities: [a0] Express (v2) Root Port (Slot+), IntMsgNum 0
> > > 		DevCap:	MaxPayload 128 bytes, PhantFunc 0
> > > 			ExtTag- RBE+ TEE-IO-
> > > 		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
> > > 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> > > 			MaxPayload 128 bytes, MaxReadReq 128 bytes
> > > 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> > > 		LnkCap:	Port #2, Speed 5GT/s, Width x16, ASPM L0s, Exit Latency L0s <256ns
> > > 			ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp-
> > > 		LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
> > > 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
> > > 		LnkSta:	Speed 2.5GT/s, Width x16
> > > 			TrErr- Train- SlotClk+ DLActive- BWMgmt+ ABWMgmt+
> > 
> > At least this Root Port has both BWMgmt and ABWMgmt asserted (not a 
> > problem in itself, necessarily).
> > 
> > If you get the system working by changing that set_word call, it's worth 
> > checking whether these got reasserted (bwctrl tries to clear them right 
> > after the set_word call but it could be they get reasserted).
> > 
> > -- 
> >  i.
> 
> Yes, I was able to boot after forcing those flags to zero.  Here's lspci -vvv after booting into 6.18:
> 
> 00:00.0 Host bridge: Intel Corporation 4 Series Chipset DRAM Controller (rev 03)
> 	Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> 	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
> 	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ >SERR- <PERR- INTx-
> 	Latency: 0
> 	Capabilities: [e0] Vendor Specific Information: Intel <unknown>
> lspci: Unable to load libkmod resources: error -2
> 
> 00:01.0 PCI bridge: Intel Corporation 4 Series Chipset PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
> 	Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> 	Latency: 0, Cache Line Size: 32 bytes
> 	Interrupt: pin A routed to IRQ 24
> 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
> 	I/O behind bridge: c000-cfff [size=4K] [16-bit]
> 	Memory behind bridge: fd000000-fe9fffff [size=26M] [32-bit]
> 	Prefetchable memory behind bridge: c0000000-dfffffff [size=512M] [32-bit]
> 	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
> 	BridgeCtl: Parity- SERR+ NoISA- VGA+ VGA16+ MAbort- >Reset- FastB2B-
> 		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
> 	Capabilities: [88] Subsystem: ASUSTeK Computer Inc. P5Q Deluxe Motherboard
> 	Capabilities: [80] Power Management version 3
> 		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
> 		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
> 	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
> 		Address: fee02000  Data: 0020
> 	Capabilities: [a0] Express (v2) Root Port (Slot+), IntMsgNum 0
> 		DevCap:	MaxPayload 128 bytes, PhantFunc 0
> 			ExtTag- RBE+ TEE-IO-
> 		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
> 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> 			MaxPayload 128 bytes, MaxReadReq 128 bytes
> 		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
> 		LnkCap:	Port #2, Speed 5GT/s, Width x16, ASPM L0s, Exit Latency L0s <256ns
> 			ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp-
> 		LnkCtl:	ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
> 		LnkSta:	Speed 2.5GT/s, Width x16
> 			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> 		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
> 			Slot #0, PowerLimit 75W; Interlock- NoCompl-
> 		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
> 			Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
> 		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> 			Changed: MRL- PresDet+ LinkState-
> 		RootCap: CRSVisible-
> 		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
> 		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
> 		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR-
> 			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
> 			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
> 			 FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd-
> 			 AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
> 		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
> 			 AtomicOpsCtl: ReqEn- EgressBlck-
> 			 IDOReq- IDOCompl- LTR- EmergencyPowerReductionReq-
> 			 10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
> 		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> 			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> 			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
> 		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> 			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
> 			 Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
> 	Capabilities: [100 v1] Virtual Channel
> 		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
> 		Arb:	Fixed- WRR32- WRR64- WRR128-
> 		Ctrl:	ArbSelect=Fixed
> 		Status:	InProgress-
> 		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
> 			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
> 			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=01
> 			Status:	NegoPending- InProgress-
> 	Capabilities: [140 v1] Root Complex Link
> 		Desc:	PortNumber=02 ComponentID=01 EltType=Config
> 		Link0:	Desc:	TargetPort=00 TargetComponent=01 AssocRCRB- LinkType=MemMapped LinkValid+
> 			Addr:	00000000fed19000
> 	Kernel driver in use: pcieport

Hi.

Here's a quirk patch to disable bwctrl on this Root Port. It assumes I 
guessed the PCI device ID for it correctly, so please check that it 
matches 00:01.0 (I should have asked for lspci -n output to see the raw 
number, but you can easily correct the ID yourself before compiling the 
kernel).
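
As an alternative to lspci -n, the ID check can also be done against
sysfs. The following is only a minimal userspace sketch, not part of the
patch: the helper names (parse_pci_id, read_pci_id, quirk_would_match)
are hypothetical, and 0x2e21 is the guessed device ID from this mail that
still needs confirming on the real hardware.

```c
#include <stdio.h>
#include <stdlib.h>

/* IDs the quirk matches on; 0x2e21 is a guess that must be verified. */
#define QUIRK_VENDOR_ID 0x8086L	/* value of PCI_VENDOR_ID_INTEL */
#define QUIRK_DEVICE_ID 0x2e21L

/* Parse a sysfs-style ID string such as "0x8086\n"; returns -1 on error. */
static long parse_pci_id(const char *s)
{
	char *end;
	unsigned long v = strtoul(s, &end, 16);

	if (end == s || v > 0xffff)
		return -1;
	return (long)v;
}

/* Read one ID attribute ("vendor" or "device") from a sysfs device dir. */
static long read_pci_id(const char *dev_dir, const char *attr)
{
	char path[256], buf[16];
	long id = -1;
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dev_dir, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		id = parse_pci_id(buf);
	fclose(f);
	return id;
}

/* Return 1 if the device in dev_dir carries the IDs the quirk expects. */
static int quirk_would_match(const char *dev_dir)
{
	return read_pci_id(dev_dir, "vendor") == QUIRK_VENDOR_ID &&
	       read_pci_id(dev_dir, "device") == QUIRK_DEVICE_ID;
}
```

On the affected machine, quirk_would_match() would be called with the
sysfs directory of the Root Port, presumably
/sys/bus/pci/devices/0000:00:01.0 (the 0000 domain is an assumption). A
return value of 0 would mean the device ID in the quirk needs correcting.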

From 1e13651f8789fb9df060269a7b7c396211d910f8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com>
Date: Thu, 18 Dec 2025 15:45:25 +0200
Subject: [PATCH 1/1] PCI/bwctrl: Disable BW controller on Intel P45 using a quirk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Commit 665745f27487 ("PCI/bwctrl: Re-add BW notification portdrv as
PCIe BW controller") was found to cause a boot hang on an Intel P45
system. Testing without setting Link Bandwidth Management Interrupt
Enable (LBMIE) and Link Autonomous Bandwidth Interrupt Enable (LABIE)
(PCIe r7.0, sec. 7.5.3.7) in bwctrl allowed the system to come up.

Add no_bw_notif to struct pci_dev and set it with a quirk for the Intel
P45 Root Port.

Reported-by: Adam Stylinski <kungfujesus06@gmail.com>
Link: https://lore.kernel.org/linux-pci/aUCt1tHhm_-XIVvi@eggsbenedict/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pcie/bwctrl.c |  3 +++
 drivers/pci/quirks.c      | 10 ++++++++++
 include/linux/pci.h       |  1 +
 3 files changed, 14 insertions(+)

diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 36f939f23d34..4ae92c9f912a 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -250,6 +250,9 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
 	struct pci_dev *port = srv->port;
 	int ret;
 
+	if (port->no_bw_notif)
+		return -ENODEV;
+
 	/* Can happen if we run out of bus numbers during enumeration. */
 	if (!port->subordinate)
 		return -ENODEV;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b9c252aa6fe0..6ef42a2c4831 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1359,6 +1359,16 @@ static void quirk_transparent_bridge(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL,	PCI_DEVICE_ID_INTEL_82380FB,	quirk_transparent_bridge);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_TOSHIBA,	0x605,	quirk_transparent_bridge);
 
+/*
+ * Enabling Link Bandwidth Management Interrupts (BW notifications) can cause
+ * boot hangs on P45.
+ */
+static void quirk_p45_bw_notifications(struct pci_dev *dev)
+{
+	dev->no_bw_notif = 1;
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e21, quirk_p45_bw_notifications);
+
 /*
  * Common misconfiguration of the MediaGX/Geode PCI master that will reduce
  * PCI bandwidth from 70MB/s to 25MB/s.  See the GXM/GXLV/GX1 datasheets
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..3a556cd749e3 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -406,6 +406,7 @@ struct pci_dev {
 						      user sysfs */
 	unsigned int	clear_retrain_link:1;	/* Need to clear Retrain Link
 						   bit manually */
+	unsigned int	no_bw_notif:1;	/* BW notifications may cause issues */
 	unsigned int	d3hot_delay;	/* D3hot->D0 transition time in ms */
 	unsigned int	d3cold_delay;	/* D3cold->D0 transition time in ms */
 

base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
-- 
2.39.5
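
To illustrate how the three hunks above fit together, here is a small
userspace model. It is a sketch under stated assumptions, not kernel
code: every model_* name is hypothetical, the struct is a stripped-down
stand-in for struct pci_dev, and the table lookup only approximates what
DECLARE_PCI_FIXUP_HEADER arranges during enumeration.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for struct pci_dev with only the fields the model needs. */
struct model_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int no_bw_notif:1;	/* set by the quirk */
	int has_subordinate;		/* models port->subordinate != NULL */
};

/* One entry per quirk, matched on vendor and device ID. */
struct model_fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(struct model_pci_dev *dev);
};

static void model_quirk_no_bw_notif(struct model_pci_dev *dev)
{
	dev->no_bw_notif = 1;
}

static const struct model_fixup model_fixups[] = {
	/* 0x2e21 is the guessed P45 Root Port ID from the patch. */
	{ 0x8086, 0x2e21, model_quirk_no_bw_notif },
};

/* Run every fixup whose IDs match the device. */
static void model_apply_fixups(struct model_pci_dev *dev)
{
	size_t i;

	for (i = 0; i < sizeof(model_fixups) / sizeof(model_fixups[0]); i++) {
		if (model_fixups[i].vendor == dev->vendor &&
		    model_fixups[i].device == dev->device)
			model_fixups[i].hook(dev);
	}
}

/* Mirrors the early exits in the patched pcie_bwnotif_probe(). */
static int model_bwnotif_probe(const struct model_pci_dev *dev)
{
	if (dev->no_bw_notif)
		return -ENODEV;
	if (!dev->has_subordinate)
		return -ENODEV;
	return 0;	/* the real driver would go on to enable notifications */
}
```

In this model, a device with IDs 0x8086:0x2e21 gets no_bw_notif set by
model_apply_fixups(), so model_bwnotif_probe() returns -ENODEV before any
interrupt enable bits would be touched. Any other device is unaffected.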
