public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2
@ 2026-01-22 13:09 Håkon Bugge
  2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
  2026-01-22 13:09 ` [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
  0 siblings, 2 replies; 16+ messages in thread
From: Håkon Bugge @ 2026-01-22 13:09 UTC (permalink / raw)
  To: Bjorn Helgaas, Niklas Schnelle
  Cc: Alex Williamson, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi

This series add the initialization of the Link Control register's RCB
to pci_configure_rcb() called from pci_configure_device() and also
cleans up the incorrect program_hpx_type2():

1. It should only be called when we own the PCIe native hotplug and
   not the AER ownership
2. It should only manipulate the AER-bits

In addition, the second commit adds a warning if the _HPX type2
record attempts to modify the Link Control register.

The programming of the device's RCB is constrained to the device types
where it is applicable and also skips VFs. If the Root Port's RCB
cannot be determined, we also skip the programming of the device's
RCB.

Then, we program the device's RCB according to the Root Port's setting.

Håkon Bugge (2):
  PCI: Initialize RCB from pci_configure_device
  PCI/ACPI: Confine program_hpx_type2 to the AER bits

 drivers/pci/pci-acpi.c | 61 +++++++++++++++++++-----------------------
 drivers/pci/pci.h      |  3 +++
 drivers/pci/pcie/aer.c |  3 ---
 drivers/pci/probe.c    | 53 ++++++++++++++++++++++++++++++++++++
 4 files changed, 83 insertions(+), 37 deletions(-)

--
2.43.5


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:09 [PATCH v3 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
@ 2026-01-22 13:09 ` Håkon Bugge
  2026-01-22 13:45   ` Ilpo Järvinen
                     ` (2 more replies)
  2026-01-22 13:09 ` [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
  1 sibling, 3 replies; 16+ messages in thread
From: Håkon Bugge @ 2026-01-22 13:09 UTC (permalink / raw)
  To: Bjorn Helgaas, Niklas Schnelle
  Cc: Alex Williamson, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi, Håkon Bugge

Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
instructed program_hpx_type2() to set the RCB in an endpoint,
although it's RC did not have the RCB bit set.

e42010d8207f fixed that by qualifying the setting of the RCB in the
endpoint with the RC supporting an 128 byte RCB.

In retrospect, the program_hpx_type2() should only modify the AER
bits, and stay away from fiddling with the Link Control Register.

Hence, we explicitly program the RCB from pci_configure_device().

According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
(where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
for Virtual Functions. Hence, for other cases than Bridges and Physical
Endpoints, we bail out early from pci_configure_rcb().

If the Root Port's RCB cannot be determined, we do nothing.

If RCB is set in the Root Port and not in the device, we set it. If it
is set in the device but not in the Root Port, we print an info
message and reset it.

Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

Note, that the current duplication of pcie_root_rcb_set() will be
removed in the next commit.

v2 -> v3:
   * Qualified the device types more strictly
   * s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
   * Do nothing if the RP's RCB cannot be determined
   * Reset the device's RCB if not set in the RP
---
 drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d9..7165ac4065c97 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
 	}
 }
 
+static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+
+	*rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
+	return true;
+}
+
+static void pci_configure_rcb(struct pci_dev *dev)
+{
+	u16 lnkctl;
+	bool rcb;
+
+	/*
+	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
+	 * Ports (where it is read-only), Endpoints, and Bridges.  It
+	 * may only be set for Endpoints and Bridges if it is set in
+	 * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
+	 * Functions. If the Root Port's RCB cannot be determined, we
+	 * bail out.
+	 */
+	if (!pci_is_pcie(dev) ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
+	    dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
+		return;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+	if (rcb) {
+		if (lnkctl & PCI_EXP_LNKCTL_RCB)
+			return;
+
+		lnkctl |= PCI_EXP_LNKCTL_RCB;
+	} else {
+		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
+			return;
+
+		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
+		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
+	}
+
+	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	pci_configure_mps(dev);
@@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
 	pci_configure_aspm_l1ss(dev);
 	pci_configure_eetlp_prefix(dev);
 	pci_configure_serr(dev);
+	pci_configure_rcb(dev);
 
 	pci_acpi_program_hp_params(dev);
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits
  2026-01-22 13:09 [PATCH v3 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
  2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
@ 2026-01-22 13:09 ` Håkon Bugge
  2026-01-27 22:24   ` Bjorn Helgaas
  1 sibling, 1 reply; 16+ messages in thread
From: Håkon Bugge @ 2026-01-22 13:09 UTC (permalink / raw)
  To: Bjorn Helgaas, Niklas Schnelle, Rafael J. Wysocki, Len Brown,
	Mahesh J Salgaonkar, Oliver O'Halloran, Greg Kroah-Hartman,
	Kenji Kaneshige
  Cc: Alex Williamson, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi, Håkon Bugge, linuxppc-dev

program_hpx_type2() is today unconditionally called, despite the fact
that when the _HPX was added to the ACPI spec. v3.0, the description
stated:

 OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if OSPM
 is not controlling the PCI Express Advanced Error Reporting
 capability.

Hence, we only call program_hpx_type2() when the OSPM owns the PCIe
hotplug capability but not the AER.

The Advanced Configuration and Power Interface (ACPI) Specification
version 6.6 has a provision that gives the OSPM the ability to control
the other PCIe Device Control bits any way. In a note in section
6.2.9, it is stated:

"OSPM may override the settings provided by the _HPX object's Type2
record (PCI Express Settings) or Type3 record (PCI Express Descriptor
Settings) when OSPM has assumed native control of the corresponding
feature."

So, in order to preserve the non-AER bits in PCIe Device Control, in
particular the performance sensitive ExtTag and RO, we make sure
program_hpx_type2() if called, doesn't modify any non-AER bits.

Also, when program_hpx_type2() is called, we completely avoid
modifying any bits in the Link Control register. However, if the _HPX
type 2 records contains bits indicating such modifications, we print
an info message.

[1] Operating System-directed configuration and Power Management

Fixes: 40abb96c51bb ("[PATCH] pciehp: Fix programming hotplug parameters")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v2 -> v3:
   * No changes

v1 -> v2:
   * Fixed comment style
   * Simplified the and/or logic when programming the Device Control
     register
   * Fixed the incorrect and brutal warning about Link Control
     register bits set and changed it to an info message about _HPX
     attempting to set/reset bits therein.
   * Removed the RCB programming from program_hpx_type2()
   * Moved the PCI_EXP_AER_FLAGS definition from
     drivers/pci/pcie/aer.c to drivers/pci/pci.h
---
 drivers/pci/pci-acpi.c | 61 +++++++++++++++++++-----------------------
 drivers/pci/pci.h      |  3 +++
 drivers/pci/pcie/aer.c |  3 ---
 3 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 9369377725fa0..34ea22f65a410 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -271,21 +271,6 @@ static acpi_status decode_type1_hpx_record(union acpi_object *record,
 	return AE_OK;
 }
 
-static bool pcie_root_rcb_set(struct pci_dev *dev)
-{
-	struct pci_dev *rp = pcie_find_root_port(dev);
-	u16 lnkctl;
-
-	if (!rp)
-		return false;
-
-	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
-	if (lnkctl & PCI_EXP_LNKCTL_RCB)
-		return true;
-
-	return false;
-}
-
 /* _HPX PCI Express Setting Record (Type 2) */
 struct hpx_type2 {
 	u32 revision;
@@ -311,6 +296,7 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 {
 	int pos;
 	u32 reg32;
+	const struct pci_host_bridge *host;
 
 	if (!hpx)
 		return;
@@ -318,6 +304,15 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 	if (!pci_is_pcie(dev))
 		return;
 
+	host = pci_find_host_bridge(dev->bus);
+
+	/*
+	 * We only do the HP programming if we own the PCIe native
+	 * hotplug and not the AER ownership
+	 */
+	if (!host->native_pcie_hotplug || host->native_aer)
+		return;
+
 	if (hpx->revision > 1) {
 		pci_warn(dev, "PCIe settings rev %d not supported\n",
 			 hpx->revision);
@@ -325,33 +320,31 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 	}
 
 	/*
-	 * Don't allow _HPX to change MPS or MRRS settings.  We manage
-	 * those to make sure they're consistent with the rest of the
+	 * We only allow _HPX to program the AER registers, namely
+	 * PCI_EXP_DEVCTL_CERE, PCI_EXP_DEVCTL_NFERE,
+	 * PCI_EXP_DEVCTL_FERE, and PCI_EXP_DEVCTL_URRE.
+	 *
+	 * The other settings in PCIe DEVCTL are managed by OS in
+	 * order to make sure they're consistent with the rest of the
 	 * platform.
 	 */
-	hpx->pci_exp_devctl_and |= PCI_EXP_DEVCTL_PAYLOAD |
-				    PCI_EXP_DEVCTL_READRQ;
-	hpx->pci_exp_devctl_or &= ~(PCI_EXP_DEVCTL_PAYLOAD |
-				    PCI_EXP_DEVCTL_READRQ);
+	hpx->pci_exp_devctl_and |= ~PCI_EXP_AER_FLAGS;
+	hpx->pci_exp_devctl_or &= PCI_EXP_AER_FLAGS;
 
 	/* Initialize Device Control Register */
 	pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 			~hpx->pci_exp_devctl_and, hpx->pci_exp_devctl_or);
 
-	/* Initialize Link Control Register */
+	/* Log if _HPX attempts to modify PCIe Link Control register */
 	if (pcie_cap_has_lnkctl(dev)) {
-
-		/*
-		 * If the Root Port supports Read Completion Boundary of
-		 * 128, set RCB to 128.  Otherwise, clear it.
-		 */
-		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
-		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
-		if (pcie_root_rcb_set(dev))
-			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
-
-		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
+		if (hpx->pci_exp_lnkctl_and)
+			pci_info(dev,
+				 "_HPX attempts to reset the following bits in PCIe Link Control: 0x%04x\n",
+				 hpx->pci_exp_lnkctl_and);
+		if (hpx->pci_exp_lnkctl_or)
+			pci_info(dev,
+				 "_HPX attempts to set the following bits in PCIe Link Control: 0x%04x\n",
+				 hpx->pci_exp_lnkctl_or);
 	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0e67014aa0013..f388d4414dd3a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -88,6 +88,9 @@ struct pcie_tlp_log;
 #define PCI_BUS_BRIDGE_MEM_WINDOW	1
 #define PCI_BUS_BRIDGE_PREF_MEM_WINDOW	2
 
+#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
+				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
+
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa896803c..9472d86cef552 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -239,9 +239,6 @@ void pcie_ecrc_get_policy(char *str)
 }
 #endif	/* CONFIG_PCIE_ECRC */
 
-#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
-				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
-
 int pcie_aer_is_native(struct pci_dev *dev)
 {
 	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
@ 2026-01-22 13:45   ` Ilpo Järvinen
  2026-01-22 15:53     ` Haakon Bugge
  2026-01-27 21:58     ` Bjorn Helgaas
  2026-01-23 13:05   ` Niklas Schnelle
  2026-01-27 22:11   ` Bjorn Helgaas
  2 siblings, 2 replies; 16+ messages in thread
From: Ilpo Järvinen @ 2026-01-22 13:45 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci, LKML, linux-acpi

[-- Attachment #1: Type: text/plain, Size: 4155 bytes --]

On Thu, 22 Jan 2026, Håkon Bugge wrote:

> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
> 
> e42010d8207f fixed that by qualifying the setting of the RCB in the
> endpoint with the RC supporting an 128 byte RCB.
> 
> In retrospect, the program_hpx_type2() should only modify the AER
> bits, and stay away from fiddling with the Link Control Register.
> 
> Hence, we explicitly program the RCB from pci_configure_device().
> 
> According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
> (where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
> for Virtual Functions. Hence, for other cases than Bridges and Physical
> Endpoints, we bail out early from pci_configure_rcb().
> 
> If the Root Port's RCB cannot be determined, we do nothing.
> 
> If RCB is set in the Root Port and not in the device, we set it. If it
> is set in the device but not in the Root Port, we print an info
> message and reset it.
> 
> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> 
> ---
> 
> Note, that the current duplication of pcie_root_rcb_set() will be
> removed in the next commit.
> 
> v2 -> v3:
>    * Qualified the device types more strictly
>    * s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
>    * Do nothing if the RP's RCB cannot be determined
>    * Reset the device's RCB if not set in the RP
> ---
>  drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d9..7165ac4065c97 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
>  	}
>  }
>  
> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	*rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> +	return true;
> +}
> +
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> +	u16 lnkctl;
> +	bool rcb;
> +
> +	/*
> +	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
> +	 * Ports (where it is read-only), Endpoints, and Bridges.  It
> +	 * may only be set for Endpoints and Bridges if it is set in
> +	 * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
> +	 * Functions. If the Root Port's RCB cannot be determined, we
> +	 * bail out.
> +	 */
> +	if (!pci_is_pcie(dev) ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> +	    dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
> +		return;
> +
> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +	if (rcb) {
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		lnkctl |= PCI_EXP_LNKCTL_RCB;
> +	} else {
> +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> +			return;
> +
> +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> +	}
> +
> +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

So this sequence is effectively implementing this simple statement:

	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
					   PCI_EXP_LNKCTL_RCB,
					   rcb ? PCI_EXP_LNKCTL_RCB : 0);

+ the print.

Is there a good reason why you want to avoid the write by using early 
returns?


I also wonder if those clear & set & clean_and_set interfaces should 
implement the write avoidance if it's an useful thing (callers should be 
checked they're not used for RW1C bits if that's implemented though).

--
 i.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:45   ` Ilpo Järvinen
@ 2026-01-22 15:53     ` Haakon Bugge
  2026-01-27 21:58     ` Bjorn Helgaas
  1 sibling, 0 replies; 16+ messages in thread
From: Haakon Bugge @ 2026-01-22 15:53 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci@vger.kernel.org, LKML,
	linux-acpi@vger.kernel.org



> On 22 Jan 2026, at 14:45, Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:
> 
> On Thu, 22 Jan 2026, Håkon Bugge wrote:
> 
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>> 
>> e42010d8207f fixed that by qualifying the setting of the RCB in the
>> endpoint with the RC supporting an 128 byte RCB.
>> 
>> In retrospect, the program_hpx_type2() should only modify the AER
>> bits, and stay away from fiddling with the Link Control Register.
>> 
>> Hence, we explicitly program the RCB from pci_configure_device().
>> 
>> According to PCIe r7.0, sec 7.5.3.7, RCB is only valid for Root Ports
>> (where it is Read-Only), Bridges, and Endpoints. The bit is 'RsvdP'
>> for Virtual Functions. Hence, for other cases than Bridges and Physical
>> Endpoints, we bail out early from pci_configure_rcb().
>> 
>> If the Root Port's RCB cannot be determined, we do nothing.
>> 
>> If RCB is set in the Root Port and not in the device, we set it. If it
>> is set in the device but not in the Root Port, we print an info
>> message and reset it.
>> 
>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>> 
>> ---
>> 
>> Note, that the current duplication of pcie_root_rcb_set() will be
>> removed in the next commit.
>> 
>> v2 -> v3:
>>   * Qualified the device types more strictly
>>   * s/pcie_root_rcb_set/pcie_read_root_rcb/ and changed signature
>>   * Do nothing if the RP's RCB cannot be determined
>>   * Reset the device's RCB if not set in the RP
>> ---
>> drivers/pci/probe.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 53 insertions(+)
>> 
>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>> index 41183aed8f5d9..7165ac4065c97 100644
>> --- a/drivers/pci/probe.c
>> +++ b/drivers/pci/probe.c
>> @@ -2410,6 +2410,58 @@ static void pci_configure_serr(struct pci_dev *dev)
>> }
>> }
>> 
>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>> +{
>> + struct pci_dev *rp = pcie_find_root_port(dev);
>> + u16 lnkctl;
>> +
>> + if (!rp)
>> + return false;
>> +
>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>> +
>> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>> + return true;
>> +}
>> +
>> +static void pci_configure_rcb(struct pci_dev *dev)
>> +{
>> + u16 lnkctl;
>> + bool rcb;
>> +
>> + /*
>> +  * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
>> +  * Ports (where it is read-only), Endpoints, and Bridges.  It
>> +  * may only be set for Endpoints and Bridges if it is set in
>> +  * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
>> +  * Functions. If the Root Port's RCB cannot be determined, we
>> +  * bail out.
>> +  */
>> + if (!pci_is_pcie(dev) ||
>> +     pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> +     pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>> +     pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> +     pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
>> +     dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
>> + return;
>> +
>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> + if (rcb) {
>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> + return;
>> +
>> + lnkctl |= PCI_EXP_LNKCTL_RCB;
>> + } else {
>> + if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> + return;
>> +
>> + pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> + }
>> +
>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> 
> So this sequence is effectively implementing this simple statement:
> 
> pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>    PCI_EXP_LNKCTL_RCB,
>    rcb ? PCI_EXP_LNKCTL_RCB : 0);
> 
> + the print.

Well, not exactly, as there will be no writes unless required. This was discussed here [1].

> Is there a good reason why you want to avoid the write by using early 
> returns?

No other reasons but performance.

> I also wonder if those clear & set & clean_and_set interfaces should 
> implement the write avoidance if it's an useful thing (callers should be 
> checked they're not used for RW1C bits if that's implemented though).

That may be a good idea, but for sure, outside the scope of this series.


Thxs, Håkon

[1] https://lore.kernel.org/linux-pci/ECE29E32-7925-44C3-BAAA-B16003E9E997@oracle.com/



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
  2026-01-22 13:45   ` Ilpo Järvinen
@ 2026-01-23 13:05   ` Niklas Schnelle
  2026-01-23 17:38     ` Haakon Bugge
  2026-01-27 22:11   ` Bjorn Helgaas
  2 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2026-01-23 13:05 UTC (permalink / raw)
  To: Håkon Bugge, Bjorn Helgaas
  Cc: Alex Williamson, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi

On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
> 
--- snip ---
>  
> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	*rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> +	return true;
> +}

In principle this is one of these things where platforms like s390
where the root port is hidden (only s390?) might want a hook to use
firmware supplied information to determine if the hidden root port
supports the setting. I wonder if this would make sense as a __weak
pcibios_read_rcb_supported() function. Not saying we need this now,
just thinking out loud.

> +
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> +	u16 lnkctl;
> +	bool rcb;
> +
> +	/*
> +	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
> +	 * Ports (where it is read-only), Endpoints, and Bridges.  It
> +	 * may only be set for Endpoints and Bridges if it is set in
> +	 * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
> +	 * Functions. If the Root Port's RCB cannot be determined, we
> +	 * bail out.
> +	 */
> +	if (!pci_is_pcie(dev) ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> +	    dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
> +		return;

This solves the concern Bjorn had for hidden root ports in VMs and I
confirmed that the check correctly bails on s390 even for PFs. Thanks!

> +
> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +	if (rcb) {
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		lnkctl |= PCI_EXP_LNKCTL_RCB;
> +	} else {
> +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> +			return;
> +
> +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");

The FW_INFO in here seems to be a remnant from ACPI code. As far as I
know this isn't usually used in core PCI code and seems conceptually
misleading to me since this doesn't come out of ACPI or other firmware
code.

> +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> +	}
> +
> +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

I do see Ilpo's point about pcie_capability_clear_and_set_word() and
agree that it would look cleaner. For good measure I tested with that
variant too and do prefer it, especially since it also gets rid of the
lnkctl variable. On ther other hand the message might help identify
weird firmware behavior. So no strong preference from my side.

> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	pci_configure_mps(dev);
> @@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  	pci_configure_aspm_l1ss(dev);
>  	pci_configure_eetlp_prefix(dev);
>  	pci_configure_serr(dev);
> +	pci_configure_rcb(dev);
>  
>  	pci_acpi_program_hp_params(dev);
>  }

I tested that this patch series does not create problems on s390 and
would leave the RCB setting as is if our firmware set it. Since the
second patch doesn't touch code that is build on s390 I think the
Tested-by only makes sense for this one.

So feel free to add my:

Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>

Furthermore with either the FW_INFO dropped or Ilpo's suggestion
incorporated feel free to also add:

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-23 13:05   ` Niklas Schnelle
@ 2026-01-23 17:38     ` Haakon Bugge
  2026-01-23 18:54       ` Niklas Schnelle
  0 siblings, 1 reply; 16+ messages in thread
From: Haakon Bugge @ 2026-01-23 17:38 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Alex Williamson, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org

>> On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>> 
>> --- snip ---
>> 
>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>> +{
>> +	struct pci_dev *rp = pcie_find_root_port(dev);
>> +	u16 lnkctl;
>> +
>> +	if (!rp)
>> +		return false;
>> +
>> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>> +
>> +	*rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>> +	return true;
>> +}
> 
> In principle this is one of these things where platforms like s390
> where the root port is hidden (only s390?) might want a hook to use
> firmware supplied information to determine if the hidden root port
> supports the setting. I wonder if this would make sense as a __weak
> pcibios_read_rcb_supported() function. Not saying we need this now,
> just thinking out loud.

That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.

>> +
>> +static void pci_configure_rcb(struct pci_dev *dev)
>> +{
>> +	u16 lnkctl;
>> +	bool rcb;
>> +
>> +	/*
>> +	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root
>> +	 * Ports (where it is read-only), Endpoints, and Bridges.  It
>> +	 * may only be set for Endpoints and Bridges if it is set in
>> +	 * the Root Port. For Endpoints, it is 'RsvdP' for Virtual
>> +	 * Functions. If the Root Port's RCB cannot be determined, we
>> +	 * bail out.
>> +	 */
>> +	if (!pci_is_pcie(dev) ||
>> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
>> +	    dev->is_virtfn || !pcie_read_root_rcb(dev, &rcb))
>> +		return;
> 
> This solves the concern Bjorn had for hidden root ports in VMs and I
> confirmed that the check correctly bails on s390 even for PFs. Thanks!

Thanks for confirming!

>> +
>> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> +	if (rcb) {
>> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> +			return;
>> +
>> +		lnkctl |= PCI_EXP_LNKCTL_RCB;
>> +	} else {
>> +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> +			return;
>> +
>> +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> 
> The FW_INFO in here seems to be a remnant from ACPI code. As far as I
> know this isn't usually used in core PCI code and seems conceptually
> misleading to me since this doesn't come out of ACPI or other firmware
> code.

I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).

>> +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> +	}
>> +
>> +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> 
> I do see Ilpo's point about pcie_capability_clear_and_set_word() and
> agree that it would look cleaner. For good measure I tested with that
> variant too and do prefer it, especially since it also gets rid of the
> lnkctl variable. On ther other hand the message might help identify
> weird firmware behavior. So no strong preference from my side.

OK.

>> +}
>> +
>> static void pci_configure_device(struct pci_dev *dev)
>> {
>> 	pci_configure_mps(dev);
>> @@ -2419,6 +2471,7 @@ static void pci_configure_device(struct pci_dev *dev)
>> 	pci_configure_aspm_l1ss(dev);
>> 	pci_configure_eetlp_prefix(dev);
>> 	pci_configure_serr(dev);
>> +	pci_configure_rcb(dev);
>> 
>> 	pci_acpi_program_hp_params(dev);
>> }
> 
> I tested that this patch series does not create problems on s390 and
> would leave the RCB setting as is if our firmware set it. Since the
> second patch doesn't touch code that is build on s390 I think the
> Tested-by only makes sense for this one.
> 
> So feel free to add my:
> 
> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>

Thanks a lot for the testing!

> Furthermore with either the FW_INFO dropped or Ilpo's suggestion
> incorporated feel free to also add:
> 
> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>

Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).


Thxs, Håkon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-23 17:38     ` Haakon Bugge
@ 2026-01-23 18:54       ` Niklas Schnelle
  2026-01-27 17:28         ` Haakon Bugge
  0 siblings, 1 reply; 16+ messages in thread
From: Niklas Schnelle @ 2026-01-23 18:54 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Bjorn Helgaas, Alex Williamson, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org

On Fri, 2026-01-23 at 17:38 +0000, Haakon Bugge wrote:
> > > On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
> > > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > > instructed program_hpx_type2() to set the RCB in an endpoint,
> > > although it's RC did not have the RCB bit set.
> > > 
> > > --- snip ---
> > > 
> > > +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
> > > +{
> > > +	struct pci_dev *rp = pcie_find_root_port(dev);
> > > +	u16 lnkctl;
> > > +
> > > +	if (!rp)
> > > +		return false;
> > > +
> > > +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> > > +
> > > +	*rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> > > +	return true;
> > > +}
> > 
> > In principle this is one of these things where platforms like s390
> > where the root port is hidden (only s390?) might want a hook to use
> > firmware supplied information to determine if the hidden root port
> > supports the setting. I wonder if this would make sense as a __weak
> > pcibios_read_rcb_supported() function. Not saying we need this now,
> > just thinking out loud.
> 
> That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.

I only meant to have it as a function in a similar manner to e.g.
pcibios_enable_device() that we could override. But I don't think that
needs or even should be part of this patch as there wouldn't be a user
of the override yet.

> 
> > > +
> > > +static void pci_configure_rcb(struct pci_dev *dev)
> 
--- snip ---
> > > +
> > > +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> > > +	if (rcb) {
> > > +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> > > +			return;
> > > +
> > > +		lnkctl |= PCI_EXP_LNKCTL_RCB;
> > > +	} else {
> > > +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> > > +			return;
> > > +
> > > +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> > 
> > The FW_INFO in here seems to be a remnant from ACPI code. As far as I
> > know this isn't usually used in core PCI code and seems conceptually
> > misleading to me since this doesn't come out of ACPI or other firmware
> > code.
> 
> I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).

Ah thanks for the pointer, I wasn't aware of this explicit "for
reporting FW issues" use. Reading commit a0ad05c75aa3 ("Introduce
FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs")
now I agree this makes sense after all.

> 
> > > +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> > > +	}
> > > +
> > > +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> > 
--- snip ---
> > 
> > I tested that this patch series does not create problems on s390 and
> > would leave the RCB setting as is if our firmware set it. Since the
> > second patch doesn't touch code that is build on s390 I think the
> > Tested-by only makes sense for this one.
> > 
> > So feel free to add my:
> > 
> > Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> Thanks a lot for the testing!
> 
> > Furthermore with either the FW_INFO dropped or Ilpo's suggestion
> > incorporated feel free to also add:
> > 
> > Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
> 
> Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).

As stated above with the additional explanation this makes sense to me
now so no need for the conditional anymore either ;)

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-23 18:54       ` Niklas Schnelle
@ 2026-01-27 17:28         ` Haakon Bugge
  0 siblings, 0 replies; 16+ messages in thread
From: Haakon Bugge @ 2026-01-27 17:28 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Alex Williamson, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org



> On 23 Jan 2026, at 19:54, Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> 
> On Fri, 2026-01-23 at 17:38 +0000, Haakon Bugge wrote:
>>>> On Thu, 2026-01-22 at 14:09 +0100, Håkon Bugge wrote:
>>>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>>>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>>>> instructed program_hpx_type2() to set the RCB in an endpoint,
>>>> although it's RC did not have the RCB bit set.
>>>> 
>>>> --- snip ---
>>>> 
>>>> +static bool pcie_read_root_rcb(struct pci_dev *dev, bool *rcb)
>>>> +{
>>>> + struct pci_dev *rp = pcie_find_root_port(dev);
>>>> + u16 lnkctl;
>>>> +
>>>> + if (!rp)
>>>> + return false;
>>>> +
>>>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>>>> +
>>>> + *rcb = !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>>>> + return true;
>>>> +}
>>> 
>>> In principle this is one of these things where platforms like s390
>>> where the root port is hidden (only s390?) might want a hook to use
>>> firmware supplied information to determine if the hidden root port
>>> supports the setting. I wonder if this would make sense as a __weak
>>> pcibios_read_rcb_supported() function. Not saying we need this now,
>>> just thinking out loud.
>> 
>> That may be a good idea. But I am not quite sure how we can find the Root Port from an "orphan" bridge or endpoint through the pci_bios_read set of interfaces.
> 
> I only meant to have it as a function in a similar manner to e.g.
> pcibios_enable_device() that we could override. But I don't think that
> needs or even should be part of this patch as there wouldn't be a user
> of the override yet.

Understood.

[snip]

>>> The FW_INFO in here seems to be a remnant from ACPI code. As far as I
>>> know this isn't usually used in core PCI code and seems conceptually
>>> misleading to me since this doesn't come out of ACPI or other firmware
>>> code.
>> 
>> I humbly disagree. As per PCIe r7.0, sec 7.5.3.7: "Default value of this bit is 0b". So, if we find it set, and it is not set in the Root Port, to me, it must be a firmware bug. And that is exactly what FW_INFO is intended for: "Introduce FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs" (commit a0ad05c75aa3).
> 
> Ah thanks for the pointer, I wasn't aware of this explicit "for
> reporting FW issues" use. Reading commit a0ad05c75aa3 ("Introduce
> FW_BUG, FW_WARN and FW_INFO to consistenly tell users about BIOS bugs")
> now I agree this makes sense after all.

I must admit, I wasn't either, before I read that commit :-)

>> 
>>>> + lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>>>> + }
>>>> +
>>>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>>> 
> --- snip ---
>>> 
>>> I tested that this patch series does not create problems on s390 and
>>> would leave the RCB setting as is if our firmware set it. Since the
>>> second patch doesn't touch code that is build on s390 I think the
>>> Tested-by only makes sense for this one.
>>> 
>>> So feel free to add my:
>>> 
>>> Tested-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> 
>> Thanks a lot for the testing!
>> 
>>> Furthermore with either the FW_INFO dropped or Ilpo's suggestion
>>> incorporated feel free to also add:
>>> 
>>> Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
>> 
>> Thanks for the conditional r-b. I'll like Bjorn to chime in here, as he introduced FW_INFO and did not object to the existing (non Ilpo variant).
> 
> As stated above with the additional explanation this makes sense to me
> now so no need for the conditional anymore either ;)

Thanks for both your t-b and r-b on this commit!


Thxs, Håkon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:45   ` Ilpo Järvinen
  2026-01-22 15:53     ` Haakon Bugge
@ 2026-01-27 21:58     ` Bjorn Helgaas
  1 sibling, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2026-01-27 21:58 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Håkon Bugge, Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci, LKML, linux-acpi

On Thu, Jan 22, 2026 at 03:45:58PM +0200, Ilpo Järvinen wrote:
> On Thu, 22 Jan 2026, Håkon Bugge wrote:
> 
> > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > instructed program_hpx_type2() to set the RCB in an endpoint,
> > although it's RC did not have the RCB bit set.

> > +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> > +	if (rcb) {
> > +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> > +			return;
> > +
> > +		lnkctl |= PCI_EXP_LNKCTL_RCB;
> > +	} else {
> > +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> > +			return;
> > +
> > +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> > +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> > +	}
> > +
> > +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> 
> So this sequence is effectively implementing this simple statement:
> 
> 	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> 					   PCI_EXP_LNKCTL_RCB,
> 					   rcb ? PCI_EXP_LNKCTL_RCB : 0);
> 
> + the print.
> 
> Is there a good reason why you want to avoid the write by using early 
> returns?

Good question, pcie_capability_clear_and_set_word() is much more
readable.

> I also wonder if those clear & set & clean_and_set interfaces should 
> implement the write avoidance if it's an useful thing (callers should be 
> checked they're not used for RW1C bits if that's implemented though).

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
  2026-01-22 13:45   ` Ilpo Järvinen
  2026-01-23 13:05   ` Niklas Schnelle
@ 2026-01-27 22:11   ` Bjorn Helgaas
  2026-01-28 17:08     ` Haakon Bugge
  2 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2026-01-27 22:11 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci, linux-kernel, linux-acpi

On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
> ...

> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +	if (rcb) {
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		lnkctl |= PCI_EXP_LNKCTL_RCB;
> +	} else {
> +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> +			return;
> +
> +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");

I know I suggested all this code and the message, but I'm not sure
it's worth it.  If the device doesn't work, that will be obvious
anyway, so this all feels over-engineered.

> +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> +	}
> +
> +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> +}

What do you think about this?

    PCI: Initialize RCB from pci_configure_device()
    
    Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
    Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
    caused program_hpx_type2() to set the RCB in an endpoint even though the
    Root Port did not have the RCB bit set.
    
    e42010d8207f fixed that by setting the RCB in the endpoint only when it was
    set in the Root Port.
    
    In retrospect, program_hpx_type2() is intended for AER-related settings,
    and the RCB should be configured elsewhere so it doesn't depend on the
    presence or contents of an _HPX record.
    
    Explicitly program the RCB from pci_configure_device() so it matches the
    Root Port's RCB.  The Root Port may not be visible to virtualized guests;
    in that case, leave RCB alone.
    
    Fixes: e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d..8571c7c6e1a0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2410,6 +2410,37 @@ static void pci_configure_serr(struct pci_dev *dev)
 	}
 }
 
+static void pci_configure_rcb(struct pci_dev *dev)
+{
+	struct pci_dev *rp;
+	u16 rp_lnkctl;
+
+	/*
+	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
+	 * (where it is read-only), Endpoints, and Bridges.  It may only be
+	 * set for Endpoints and Bridges if it is set in the Root Port. For
+	 * Endpoints, it is 'RsvdP' for Virtual Functions.
+	 */
+	if (!pci_is_pcie(dev) ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
+	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
+	    dev->is_virtfn)
+		return;
+
+	/* Root Port often not visible to virtualized guests */
+	rp = pcie_find_root_port(dev);
+	if (!rp)
+		return;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &rp_lnkctl);
+	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
+					   PCI_EXP_LNKCTL_RCB,
+					   (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
+					   PCI_EXP_LNKCTL_RCB : 0);
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	pci_configure_mps(dev);
@@ -2419,6 +2450,7 @@ static void pci_configure_device(struct pci_dev *dev)
 	pci_configure_aspm_l1ss(dev);
 	pci_configure_eetlp_prefix(dev);
 	pci_configure_serr(dev);
+	pci_configure_rcb(dev);
 
 	pci_acpi_program_hp_params(dev);
 }

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits
  2026-01-22 13:09 ` [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
@ 2026-01-27 22:24   ` Bjorn Helgaas
  2026-01-28 17:19     ` Haakon Bugge
  0 siblings, 1 reply; 16+ messages in thread
From: Bjorn Helgaas @ 2026-01-27 22:24 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Niklas Schnelle, Rafael J. Wysocki, Len Brown,
	Mahesh J Salgaonkar, Oliver O'Halloran, Greg Kroah-Hartman,
	Kenji Kaneshige, Alex Williamson, Johannes Thumshirn, linux-pci,
	linux-kernel, linux-acpi, linuxppc-dev

On Thu, Jan 22, 2026 at 02:09:54PM +0100, Håkon Bugge wrote:
> program_hpx_type2() is today unconditionally called, despite the fact
> that when the _HPX was added to the ACPI spec. v3.0, the description
> stated:
> 
>  OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if OSPM
>  is not controlling the PCI Express Advanced Error Reporting
>  capability.
> 
> Hence, we only call program_hpx_type2() when the OSPM owns the PCIe
> hotplug capability but not the AER.
> 
> The Advanced Configuration and Power Interface (ACPI) Specification
> version 6.6 has a provision that gives the OSPM the ability to control
> the other PCIe Device Control bits any way. In a note in section
> 6.2.9, it is stated:
> 
> "OSPM may override the settings provided by the _HPX object's Type2
> record (PCI Express Settings) or Type3 record (PCI Express Descriptor
> Settings) when OSPM has assumed native control of the corresponding
> feature."
> 
> So, in order to preserve the non-AER bits in PCIe Device Control, in
> particular the performance sensitive ExtTag and RO, we make sure
> program_hpx_type2() if called, doesn't modify any non-AER bits.
> 
> Also, when program_hpx_type2() is called, we completely avoid
> modifying any bits in the Link Control register. However, if the _HPX
> type 2 records contains bits indicating such modifications, we print
> an info message.
> 
> [1] Operating System-directed configuration and Power Management

I looked at the specs again and pulled out some more details.  Here's
what seemed relevant to me:

    PCI/ACPI: Restrict program_hpx_type2() to AER bits
    
    Previously program_hpx_type2() applied PCIe settings unconditionally, which
    could incorrectly change bits like Extended Tag Field Enable and Enable
    Relaxed Ordering.
    
    When _HPX was added to ACPI r3.0, the intent of the PCIe Setting Record
    (Type 2) in sec 6.2.7.3 was to configure AER registers when the OS does not
    own the AER Capability:
    
      The PCI Express setting record contains ... [the AER] Uncorrectable Error
      Mask, Uncorrectable Error Severity, Correctable Error Mask ... to be used
      when configuring registers in the Advanced Error Reporting Extended
      Capability Structure ...
    
      OSPM will only evaluate _HPX with Setting Record – Type 2 if OSPM is not
      controlling the PCI Express Advanced Error Reporting capability.
    
    ACPI r3.0b, sec 6.2.7.3, added more AER registers, including registers in
    the PCIe Capability with AER-related bits, and the restriction that the OS
    use this only when it owns PCIe native hotplug:
    
      ... when configuring PCI Express registers in the Advanced Error
      Reporting Extended Capability Structure *or PCI Express Capability
      Structure* ...
    
      An OS that has assumed ownership of native hot plug but does not ... have
      ownership of the AER register set must use ... the Type 2 record to
      program the AER registers ...
    
      However, since the Type 2 record also includes register bits that have
      functions other than AER, the OS must ignore values ... that are not
      applicable.
    
    Restrict program_hpx_type2() to only the intended purpose:
    
      - Apply settings only when OS owns PCIe native hotplug but not AER,
    
      - Only touch the AER-related bits (Error Reporting Enables) in Device
        Control
    
      - Don't touch Link Control at all, since nothing there seems AER-related,
        but log _HPX settings for debugging purposes
    
    Note that Read Completion Boundary is now configured elsewhere, since it is
    unrelated to _HPX.

> +	/* Log if _HPX attempts to modify PCIe Link Control register */
>  	if (pcie_cap_has_lnkctl(dev)) {
> -
> -		/*
> -		 * If the Root Port supports Read Completion Boundary of
> -		 * 128, set RCB to 128.  Otherwise, clear it.
> -		 */
> -		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
> -		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
> -		if (pcie_root_rcb_set(dev))
> -			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
> -
> -		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> -			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
> +		if (hpx->pci_exp_lnkctl_and)
> +			pci_info(dev,
> +				 "_HPX attempts to reset the following bits in PCIe Link Control: 0x%04x\n",
> +				 hpx->pci_exp_lnkctl_and);
> +		if (hpx->pci_exp_lnkctl_or)
> +			pci_info(dev,
> +				 "_HPX attempts to set the following bits in PCIe Link Control: 0x%04x\n",
> +				 hpx->pci_exp_lnkctl_or);

Again I'm afraid I suggested some over-engineering, and even worse, I
misinterpreted the pci_exp_lnkctl_and and pci_exp_lnkctl_or usage when
I said "if pci_exp_lnkctl_and or pci_exp_lnkctl_or are *non-zero*".

Per spec, we are supposed to AND the register with pci_exp_lnkctl_and,
so the interesting value would be anything other than 0xffff.  Since
we OR it with pci_exp_lnkctl_or, the interesting values there would be
anything non-zero.  So I think what we would want is something like
this:

+	/* Log if _HPX attempts to modify PCIe Link Control register */
 	if (pcie_cap_has_lnkctl(dev)) {
-
-		/*
-		 * If the Root Port supports Read Completion Boundary of
-		 * 128, set RCB to 128.  Otherwise, clear it.
-		 */
-		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
-		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
-		if (pcie_root_rcb_set(dev))
-			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
-
-		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
+		if (hpx->pci_exp_lnkctl_and != 0xffff ||
+		    hpx->pci_exp_lnkctl_or != 0)
+			pci_info(dev, "_HPX attempts Link Control setting (AND %#06x OR %#06x)\n",
+				 hpx->pci_exp_lnkctl_and,
+				 hpx->pci_exp_lnkctl_or);
 	}

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-27 22:11   ` Bjorn Helgaas
@ 2026-01-28 17:08     ` Haakon Bugge
  2026-01-28 17:20       ` Bjorn Helgaas
  0 siblings, 1 reply; 16+ messages in thread
From: Haakon Bugge @ 2026-01-28 17:08 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org

> On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>> instructed program_hpx_type2() to set the RCB in an endpoint,
>> although it's RC did not have the RCB bit set.
>> ...
>> 
>> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>> +	if (rcb) {
>> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
>> +			return;
>> +
>> +		lnkctl |= PCI_EXP_LNKCTL_RCB;
>> +	} else {
>> +		if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>> +			return;
>> +
>> +		pci_info(dev, FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> 
> I know I suggested all this code and the message, but I'm not sure
> it's worth it.  If the device doesn't work,

I see your point. If the situation my commit is fixing has been there, the system would have failed, and a fix to the firmware must have been applied. Hence, so need to fix it in the OS.

> that will be obvious
> anyway, so this all feels over-engineered.
> 
>> +		lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> +	}
>> +
>> +	pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>> +}
> 
> What do you think about this?
> 
>    PCI: Initialize RCB from pci_configure_device()
> 
>    Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root
>    Port supports it (_HPX)") worked around a bogus _HPX type 2 record, which
>    caused program_hpx_type2() to set the RCB in an endpoint even though the
>    Root Port did not have the RCB bit set.
> 
>    e42010d8207f fixed that by setting the RCB in the endpoint only when it was
>    set in the Root Port.
> 
>    In retrospect, program_hpx_type2() is intended for AER-related settings,
>    and the RCB should be configured elsewhere so it doesn't depend on the
>    presence or contents of an _HPX record.
> 
>    Explicitly program the RCB from pci_configure_device() so it matches the
>    Root Port's RCB.  The Root Port may not be visible to virtualized guests;
>    in that case, leave RCB alone.
> 
>    Fixes: e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")

Crisp and clear. For this and the other commit, is it OK that I add you as a co-developer? Aka:

Co-developed-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

?


> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d..8571c7c6e1a0 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2410,6 +2410,37 @@ static void pci_configure_serr(struct pci_dev *dev)
> 	}
> }
> 
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> +	struct pci_dev *rp;
> +	u16 rp_lnkctl;
> +
> +	/*
> +	 * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
> +	 * (where it is read-only), Endpoints, and Bridges.  It may only be
> +	 * set for Endpoints and Bridges if it is set in the Root Port. For
> +	 * Endpoints, it is 'RsvdP' for Virtual Functions.
> +	 */
> +	if (!pci_is_pcie(dev) ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> +	    pci_pcie_type(dev) == PCI_EXP_TYPE_RC_EC ||
> +	    dev->is_virtfn)
> +		return;
> +
> +	/* Root Port often not visible to virtualized guests */
> +	rp = pcie_find_root_port(dev);
> +	if (!rp)
> +		return;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &rp_lnkctl);
> +	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> +					   PCI_EXP_LNKCTL_RCB,
> +					   (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
> +					   PCI_EXP_LNKCTL_RCB : 0);

Looks good to me! This will enforce the locked flavour of pcie_capability_clear_and_set_word(). Is that an overkill?

Again, thank for the effort you put into this, ,Bjorn!


Thxs, Håkon



> +}
> +
> static void pci_configure_device(struct pci_dev *dev)
> {
> 	pci_configure_mps(dev);
> @@ -2419,6 +2450,7 @@ static void pci_configure_device(struct pci_dev *dev)
> 	pci_configure_aspm_l1ss(dev);
> 	pci_configure_eetlp_prefix(dev);
> 	pci_configure_serr(dev);
> +	pci_configure_rcb(dev);
> 
> 	pci_acpi_program_hp_params(dev);
> }

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits
  2026-01-27 22:24   ` Bjorn Helgaas
@ 2026-01-28 17:19     ` Haakon Bugge
  2026-01-29 16:36       ` Haakon Bugge
  0 siblings, 1 reply; 16+ messages in thread
From: Haakon Bugge @ 2026-01-28 17:19 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Niklas Schnelle, Rafael J. Wysocki, Len Brown,
	Mahesh J Salgaonkar, Oliver O'Halloran, Greg Kroah-Hartman,
	Kenji Kaneshige, Alex Williamson, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

> On Thu, Jan 22, 2026 at 02:09:54PM +0100, Håkon Bugge wrote:
>> program_hpx_type2() is today unconditionally called, despite the fact
>> that when the _HPX was added to the ACPI spec. v3.0, the description
>> stated:
>> 
>> OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if OSPM
>> is not controlling the PCI Express Advanced Error Reporting
>> capability.
>> 
>> Hence, we only call program_hpx_type2() when the OSPM owns the PCIe
>> hotplug capability but not the AER.
>> 
>> The Advanced Configuration and Power Interface (ACPI) Specification
>> version 6.6 has a provision that gives the OSPM the ability to control
>> the other PCIe Device Control bits any way. In a note in section
>> 6.2.9, it is stated:
>> 
>> "OSPM may override the settings provided by the _HPX object's Type2
>> record (PCI Express Settings) or Type3 record (PCI Express Descriptor
>> Settings) when OSPM has assumed native control of the corresponding
>> feature."
>> 
>> So, in order to preserve the non-AER bits in PCIe Device Control, in
>> particular the performance sensitive ExtTag and RO, we make sure
>> program_hpx_type2() if called, doesn't modify any non-AER bits.
>> 
>> Also, when program_hpx_type2() is called, we completely avoid
>> modifying any bits in the Link Control register. However, if the _HPX
>> type 2 records contains bits indicating such modifications, we print
>> an info message.
>> 
>> [1] Operating System-directed configuration and Power Management
> 
> I looked at the specs again and pulled out some more details.  Here's
> what seemed relevant to me:
> 
>    PCI/ACPI: Restrict program_hpx_type2() to AER bits
> 
>    Previously program_hpx_type2() applied PCIe settings unconditionally, which
>    could incorrectly change bits like Extended Tag Field Enable and Enable
>    Relaxed Ordering.
> 
>    When _HPX was added to ACPI r3.0, the intent of the PCIe Setting Record
>    (Type 2) in sec 6.2.7.3 was to configure AER registers when the OS does not
>    own the AER Capability:
> 
>      The PCI Express setting record contains ... [the AER] Uncorrectable Error
>      Mask, Uncorrectable Error Severity, Correctable Error Mask ... to be used
>      when configuring registers in the Advanced Error Reporting Extended
>      Capability Structure ...
> 
>      OSPM will only evaluate _HPX with Setting Record – Type 2 if OSPM is not
>      controlling the PCI Express Advanced Error Reporting capability.
> 
>    ACPI r3.0b, sec 6.2.7.3, added more AER registers, including registers in
>    the PCIe Capability with AER-related bits, and the restriction that the OS
>    use this only when it owns PCIe native hotplug:
> 
>      ... when configuring PCI Express registers in the Advanced Error
>      Reporting Extended Capability Structure *or PCI Express Capability
>      Structure* ...
> 
>      An OS that has assumed ownership of native hot plug but does not ... have
>      ownership of the AER register set must use ... the Type 2 record to
>      program the AER registers ...
> 
>      However, since the Type 2 record also includes register bits that have
>      functions other than AER, the OS must ignore values ... that are not
>      applicable.
> 
>    Restrict program_hpx_type2() to only the intended purpose:
> 
>      - Apply settings only when OS owns PCIe native hotplug but not AER,
> 
>      - Only touch the AER-related bits (Error Reporting Enables) in Device
>        Control
> 
>      - Don't touch Link Control at all, since nothing there seems AER-related,
>        but log _HPX settings for debugging purposes
> 
>    Note that Read Completion Boundary is now configured elsewhere, since it is
>    unrelated to _HPX.

Thanks for the word-smithing and improved accuracy!

>> +	/* Log if _HPX attempts to modify PCIe Link Control register */
>> 	if (pcie_cap_has_lnkctl(dev)) {
>> -
>> -		/*
>> -		 * If the Root Port supports Read Completion Boundary of
>> -		 * 128, set RCB to 128.  Otherwise, clear it.
>> -		 */
>> -		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
>> -		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
>> -		if (pcie_root_rcb_set(dev))
>> -			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
>> -
>> -		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>> -			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);

This was what confused me a lot, the bit-wise NOT above. That must be wrong, as pcie_capability_clear_and_set_word() inverts the "clear" argument.

>> +		if (hpx->pci_exp_lnkctl_and)
>> +			pci_info(dev,
>> +				 "_HPX attempts to reset the following bits in PCIe Link Control: 0x%04x\n",
>> +				 hpx->pci_exp_lnkctl_and);
>> +		if (hpx->pci_exp_lnkctl_or)
>> +			pci_info(dev,
>> +				 "_HPX attempts to set the following bits in PCIe Link Control: 0x%04x\n",
>> +				 hpx->pci_exp_lnkctl_or);
> 
> Again I'm afraid I suggested some over-engineering, and even worse, I
> misinterpreted the pci_exp_lnkctl_and and pci_exp_lnkctl_or usage when
> I said "if pci_exp_lnkctl_and or pci_exp_lnkctl_or are *non-zero*".
> 
> Per spec, we are supposed to AND the register with pci_exp_lnkctl_and,
> so the interesting value would be anything other than 0xffff.  Since
> we OR it with pci_exp_lnkctl_or, the interesting values there would be
> anything non-zero.  So I think what we would want is something like
> this:
> 
> +	/* Log if _HPX attempts to modify PCIe Link Control register */
> 	if (pcie_cap_has_lnkctl(dev)) {
> -
> -		/*
> -		 * If the Root Port supports Read Completion Boundary of
> -		 * 128, set RCB to 128.  Otherwise, clear it.
> -		 */
> -		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
> -		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
> -		if (pcie_root_rcb_set(dev))
> -			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
> -
> -		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> -			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
> +		if (hpx->pci_exp_lnkctl_and != 0xffff ||
> +		    hpx->pci_exp_lnkctl_or != 0)
> +			pci_info(dev, "_HPX attempts Link Control setting (AND %#06x OR %#06x)\n",
> +				 hpx->pci_exp_lnkctl_and,
> +				 hpx->pci_exp_lnkctl_or);
> 	}

Since we do not want to fix incorrect firmware in this respect, the above makes perfect sense to me. A v4 is on its way.


Thxs, Håkon


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-28 17:08     ` Haakon Bugge
@ 2026-01-28 17:20       ` Bjorn Helgaas
  0 siblings, 0 replies; 16+ messages in thread
From: Bjorn Helgaas @ 2026-01-28 17:20 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Bjorn Helgaas, Niklas Schnelle, Alex Williamson,
	Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org

On Wed, Jan 28, 2026 at 05:08:23PM +0000, Haakon Bugge wrote:
> > On Thu, Jan 22, 2026 at 02:09:53PM +0100, Håkon Bugge wrote:
> >> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> >> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> >> instructed program_hpx_type2() to set the RCB in an endpoint,
> >> although it's RC did not have the RCB bit set.
> >> ...

> For this and the other commit, is it OK that I add
> you as a co-developer? Aka:
> 
> Co-developed-by: Bjorn Helgaas <helgaas@kernel.org>
> Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

No need for co-developed-by, IMO this is just part of normal patch
iteration. And I'll add my Signed-off-by when merging the patches.

> > +	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
> > +					   PCI_EXP_LNKCTL_RCB,
> > +					   (rp_lnkctl & PCI_EXP_LNKCTL_RCB) ?
> > +					   PCI_EXP_LNKCTL_RCB : 0);
> 
> Looks good to me! This will enforce the locked flavour of
> pcie_capability_clear_and_set_word(). Is that an overkill?

Probably no need for locking in this instance, but it's a per-device
lock so there won't be any contention anyway.

Bjorn

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits
  2026-01-28 17:19     ` Haakon Bugge
@ 2026-01-29 16:36       ` Haakon Bugge
  0 siblings, 0 replies; 16+ messages in thread
From: Haakon Bugge @ 2026-01-29 16:36 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Niklas Schnelle, Rafael J. Wysocki, Len Brown,
	Mahesh J Salgaonkar, Oliver O'Halloran, Greg Kroah-Hartman,
	Kenji Kaneshige, Alex Williamson, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

>>> Thanks for the word-smithing and improved accuracy!
>>> 
>>> +	/* Log if _HPX attempts to modify PCIe Link Control register */
>>> 	if (pcie_cap_has_lnkctl(dev)) {
>>> -
>>> -		/*
>>> -		 * If the Root Port supports Read Completion Boundary of
>>> -		 * 128, set RCB to 128.  Otherwise, clear it.
>>> -		 */
>>> -		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
>>> -		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
>>> -		if (pcie_root_rcb_set(dev))
>>> -			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
>>> -
>>> -		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
>>> -			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
>>> 
> This was what confused me a lot, the bit-wise NOT above. That must be wrong, as pcie_capability_clear_and_set_word() inverts the "clear" argument.

Have to correct myself here. ACPI states:

When configuring a given register, OSPM uses the following algorithm:
1. Read the register’s current value, which contains the register’s default value.
2. Perform a bit-wise AND operation with the “AND mask” from the table below.
[]

Because pcie_capability_clear_and_set_word() inverts the "clear" argument, the above bitwise NOT is of course correct. Two bitwise NOTs is a no-op and we follow the ACPI outlined algorithm.


Sorry for the noise, Håkon


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2026-01-29 16:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-22 13:09 [PATCH v3 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
2026-01-22 13:09 ` [PATCH v3 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
2026-01-22 13:45   ` Ilpo Järvinen
2026-01-22 15:53     ` Haakon Bugge
2026-01-27 21:58     ` Bjorn Helgaas
2026-01-23 13:05   ` Niklas Schnelle
2026-01-23 17:38     ` Haakon Bugge
2026-01-23 18:54       ` Niklas Schnelle
2026-01-27 17:28         ` Haakon Bugge
2026-01-27 22:11   ` Bjorn Helgaas
2026-01-28 17:08     ` Haakon Bugge
2026-01-28 17:20       ` Bjorn Helgaas
2026-01-22 13:09 ` [PATCH v3 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
2026-01-27 22:24   ` Bjorn Helgaas
2026-01-28 17:19     ` Haakon Bugge
2026-01-29 16:36       ` Haakon Bugge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox