public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2
@ 2026-01-21 11:35 Håkon Bugge
  2026-01-21 11:35 ` [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
  2026-01-21 11:35 ` [PATCH v2 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
  0 siblings, 2 replies; 14+ messages in thread
From: Håkon Bugge @ 2026-01-21 11:35 UTC (permalink / raw)
  To: Bjorn Helgaas; +Cc: Johannes Thumshirn, linux-pci, linux-kernel, linux-acpi

This series add the initialization of the Link Control register's RCB
to pci_configure_rcb() called from pci_configure_device() and also
cleans up the incorrect program_hpx_type2():

1. It should only be called when we own the PCIe native hotplug and
   not the AER ownership
2. It should only manipulate the AER-bits

In addition, the second commit adds a warning if the _HPX type2
record attempts to modify the Link Control register.

Håkon Bugge (2):
  PCI: Initialize RCB from pci_configure_device
  PCI/ACPI: Confine program_hpx_type2 to the AER bits

 drivers/pci/pci-acpi.c | 61 +++++++++++++++++++-----------------------
 drivers/pci/pci.h      |  3 +++
 drivers/pci/pcie/aer.c |  3 ---
 drivers/pci/probe.c    | 27 +++++++++++++++++++
 4 files changed, 57 insertions(+), 37 deletions(-)

--
2.43.5


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 11:35 [PATCH v2 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
@ 2026-01-21 11:35 ` Håkon Bugge
  2026-01-21 17:41   ` Lukas Wunner
  2026-01-21 22:40   ` Bjorn Helgaas
  2026-01-21 11:35 ` [PATCH v2 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge
  1 sibling, 2 replies; 14+ messages in thread
From: Håkon Bugge @ 2026-01-21 11:35 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Johannes Thumshirn, linux-pci, linux-kernel, linux-acpi,
	Håkon Bugge

Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
instructed program_hpx_type2() to set the RCB in an endpoint,
although it's RC did not have the RCB bit set.

e42010d8207f fixed that by qualifying the setting of the RCB in the
endpoint with the RC supporting an 128 byte RCB.

In retrospect, the program_hpx_type2() should only modify the AER
bits, and stay away from fiddling with the Link Control Register.

Hence, we explicitly program the RCB from pci_configure_device(). RCB
is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
cases we skip programming it. Then, if the Root Port has RCB set and
it is not set in the EP, we set it.

Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

Note, that the current duplication of pcie_root_rcb_set() will be
removed in the next commit.
---
 drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d9..347af29868124 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
 	}
 }
 
+static bool pcie_root_rcb_set(struct pci_dev *dev)
+{
+	struct pci_dev *rp = pcie_find_root_port(dev);
+	u16 lnkctl;
+
+	if (!rp)
+		return false;
+
+	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
+
+	return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
+}
+
+static void pci_configure_rcb(struct pci_dev *dev)
+{
+	/*
+	 * Obviously, we need a Link Control register. The RCB is RO
+	 * in Root Ports, so no need to attempt to set it for
+	 * them. For VFs, the RCB is RsvdP, so, no need to set it.
+	 * Then, if the Root Port has RCB set, then we set for the EP
+	 * unless already set.
+	 */
+	if (pcie_cap_has_lnkctl(dev) &&
+	    (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
+	    !dev->is_virtfn && pcie_root_rcb_set(dev)) {
+		u16 lnkctl;
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+		if (lnkctl & PCI_EXP_LNKCTL_RCB)
+			return;
+
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
+	}
+}
+
 static void pci_configure_device(struct pci_dev *dev)
 {
 	pci_configure_mps(dev);
@@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
 	pci_configure_aspm_l1ss(dev);
 	pci_configure_eetlp_prefix(dev);
 	pci_configure_serr(dev);
+	pci_configure_rcb(dev);
 
 	pci_acpi_program_hp_params(dev);
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits
  2026-01-21 11:35 [PATCH v2 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
  2026-01-21 11:35 ` [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
@ 2026-01-21 11:35 ` Håkon Bugge
  1 sibling, 0 replies; 14+ messages in thread
From: Håkon Bugge @ 2026-01-21 11:35 UTC (permalink / raw)
  To: Bjorn Helgaas, Rafael J. Wysocki, Len Brown, Mahesh J Salgaonkar,
	Oliver O'Halloran, Greg Kroah-Hartman, Kenji Kaneshige
  Cc: Johannes Thumshirn, linux-pci, linux-kernel, linux-acpi,
	Håkon Bugge, linuxppc-dev

program_hpx_type2() is today unconditionally called, despite the fact
that when the _HPX was added to the ACPI spec. v3.0, the description
stated:

 OSPM [1] will only evaluate _HPX with Setting Record – Type 2 if OSPM
 is not controlling the PCI Express Advanced Error Reporting
 capability.

Hence, we only call program_hpx_type2() when the OSPM owns the PCIe
hotplug capability but not the AER.

The Advanced Configuration and Power Interface (ACPI) Specification
version 6.6 has a provision that gives the OSPM the ability to control
the other PCIe Device Control bits any way. In a note in section
6.2.9, it is stated:

"OSPM may override the settings provided by the _HPX object's Type2
record (PCI Express Settings) or Type3 record (PCI Express Descriptor
Settings) when OSPM has assumed native control of the corresponding
feature."

So, in order to preserve the non-AER bits in PCIe Device Control, in
particular the performance sensitive ExtTag and RO, we make sure
program_hpx_type2() if called, doesn't modify any non-AER bits.

Also, when program_hpx_type2() is called, we completely avoid
modifying any bits in the Link Control register. However, if the _HPX
type 2 records contains bits indicating such modifications, we print
an info message.

[1] Operating System-directed configuration and Power Management

Fixes: 40abb96c51bb ("[PATCH] pciehp: Fix programming hotplug parameters")
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>

---

v1 -> v2:
   * Fixed comment style
   * Simplified the and/or logic when programming the Device Control
     register
   * Fixed the incorrect and brutal warning about Link Control
     register bits set and changed it to an info message about _HPX
     attempting to set/reset bits therein.
   * Removed the RCB programming from program_hpx_type2()
   * Moved the PCI_EXP_AER_FLAGS definition from
     drivers/pci/pcie/aer.c to drivers/pci/pci.h
---
 drivers/pci/pci-acpi.c | 61 +++++++++++++++++++-----------------------
 drivers/pci/pci.h      |  3 +++
 drivers/pci/pcie/aer.c |  3 ---
 3 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 9369377725fa0..34ea22f65a410 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -271,21 +271,6 @@ static acpi_status decode_type1_hpx_record(union acpi_object *record,
 	return AE_OK;
 }
 
-static bool pcie_root_rcb_set(struct pci_dev *dev)
-{
-	struct pci_dev *rp = pcie_find_root_port(dev);
-	u16 lnkctl;
-
-	if (!rp)
-		return false;
-
-	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
-	if (lnkctl & PCI_EXP_LNKCTL_RCB)
-		return true;
-
-	return false;
-}
-
 /* _HPX PCI Express Setting Record (Type 2) */
 struct hpx_type2 {
 	u32 revision;
@@ -311,6 +296,7 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 {
 	int pos;
 	u32 reg32;
+	const struct pci_host_bridge *host;
 
 	if (!hpx)
 		return;
@@ -318,6 +304,15 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 	if (!pci_is_pcie(dev))
 		return;
 
+	host = pci_find_host_bridge(dev->bus);
+
+	/*
+	 * We only do the HP programming if we own the PCIe native
+	 * hotplug and not the AER ownership
+	 */
+	if (!host->native_pcie_hotplug || host->native_aer)
+		return;
+
 	if (hpx->revision > 1) {
 		pci_warn(dev, "PCIe settings rev %d not supported\n",
 			 hpx->revision);
@@ -325,33 +320,31 @@ static void program_hpx_type2(struct pci_dev *dev, struct hpx_type2 *hpx)
 	}
 
 	/*
-	 * Don't allow _HPX to change MPS or MRRS settings.  We manage
-	 * those to make sure they're consistent with the rest of the
+	 * We only allow _HPX to program the AER registers, namely
+	 * PCI_EXP_DEVCTL_CERE, PCI_EXP_DEVCTL_NFERE,
+	 * PCI_EXP_DEVCTL_FERE, and PCI_EXP_DEVCTL_URRE.
+	 *
+	 * The other settings in PCIe DEVCTL are managed by OS in
+	 * order to make sure they're consistent with the rest of the
 	 * platform.
 	 */
-	hpx->pci_exp_devctl_and |= PCI_EXP_DEVCTL_PAYLOAD |
-				    PCI_EXP_DEVCTL_READRQ;
-	hpx->pci_exp_devctl_or &= ~(PCI_EXP_DEVCTL_PAYLOAD |
-				    PCI_EXP_DEVCTL_READRQ);
+	hpx->pci_exp_devctl_and |= ~PCI_EXP_AER_FLAGS;
+	hpx->pci_exp_devctl_or &= PCI_EXP_AER_FLAGS;
 
 	/* Initialize Device Control Register */
 	pcie_capability_clear_and_set_word(dev, PCI_EXP_DEVCTL,
 			~hpx->pci_exp_devctl_and, hpx->pci_exp_devctl_or);
 
-	/* Initialize Link Control Register */
+	/* Log if _HPX attempts to modify PCIe Link Control register */
 	if (pcie_cap_has_lnkctl(dev)) {
-
-		/*
-		 * If the Root Port supports Read Completion Boundary of
-		 * 128, set RCB to 128.  Otherwise, clear it.
-		 */
-		hpx->pci_exp_lnkctl_and |= PCI_EXP_LNKCTL_RCB;
-		hpx->pci_exp_lnkctl_or &= ~PCI_EXP_LNKCTL_RCB;
-		if (pcie_root_rcb_set(dev))
-			hpx->pci_exp_lnkctl_or |= PCI_EXP_LNKCTL_RCB;
-
-		pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL,
-			~hpx->pci_exp_lnkctl_and, hpx->pci_exp_lnkctl_or);
+		if (hpx->pci_exp_lnkctl_and)
+			pci_info(dev,
+				 "_HPX attempts to reset the following bits in PCIe Link Control: 0x%04x\n",
+				 hpx->pci_exp_lnkctl_and);
+		if (hpx->pci_exp_lnkctl_or)
+			pci_info(dev,
+				 "_HPX attempts to set the following bits in PCIe Link Control: 0x%04x\n",
+				 hpx->pci_exp_lnkctl_or);
 	}
 
 	/* Find Advanced Error Reporting Enhanced Capability */
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0e67014aa0013..f388d4414dd3a 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -88,6 +88,9 @@ struct pcie_tlp_log;
 #define PCI_BUS_BRIDGE_MEM_WINDOW	1
 #define PCI_BUS_BRIDGE_PREF_MEM_WINDOW	2
 
+#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
+				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
+
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
 
diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index e0bcaa896803c..9472d86cef552 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -239,9 +239,6 @@ void pcie_ecrc_get_policy(char *str)
 }
 #endif	/* CONFIG_PCIE_ECRC */
 
-#define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
-				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
-
 int pcie_aer_is_native(struct pci_dev *dev)
 {
 	struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 11:35 ` [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
@ 2026-01-21 17:41   ` Lukas Wunner
  2026-01-21 19:13     ` Haakon Bugge
  2026-01-21 22:40   ` Bjorn Helgaas
  1 sibling, 1 reply; 14+ messages in thread
From: Lukas Wunner @ 2026-01-21 17:41 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi

On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> +	if (pcie_cap_has_lnkctl(dev) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> +	    !dev->is_virtfn && pcie_root_rcb_set(dev)) {
> +		u16 lnkctl;
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);

You may want to use pcie_capability_set_word() for brevity.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 17:41   ` Lukas Wunner
@ 2026-01-21 19:13     ` Haakon Bugge
  0 siblings, 0 replies; 14+ messages in thread
From: Haakon Bugge @ 2026-01-21 19:13 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org

> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> +	if (pcie_cap_has_lnkctl(dev) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> +	    !dev->is_virtfn && pcie_root_rcb_set(dev)) {
> +		u16 lnkctl;
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
> 
> You may want to use pcie_capability_set_word() for brevity.

Well, that would incur an additional pcie_capability_read_word(), so between brevity and performance, I chose performance. Another, probably better reason to use pcie_capability_set_word() is that it calls (for PCI_EXP_LNKCTL) pcie_capability_clear_and_set_word_locked(). However, I assume that during device probing, the locking is not needed.


Thxs, Håkon


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 11:35 ` [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
  2026-01-21 17:41   ` Lukas Wunner
@ 2026-01-21 22:40   ` Bjorn Helgaas
  2026-01-22  9:42     ` Haakon Bugge
  2026-01-22 10:36     ` Bjorn Helgaas
  1 sibling, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2026-01-21 22:40 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi

On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> instructed program_hpx_type2() to set the RCB in an endpoint,
> although it's RC did not have the RCB bit set.
> 
> e42010d8207f fixed that by qualifying the setting of the RCB in the
> endpoint with the RC supporting an 128 byte RCB.
> 
> In retrospect, the program_hpx_type2() should only modify the AER
> bits, and stay away from fiddling with the Link Control Register.
> 
> Hence, we explicitly program the RCB from pci_configure_device(). RCB
> is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
> cases we skip programming it. Then, if the Root Port has RCB set and
> it is not set in the EP, we set it.
> 
> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> 
> ---
> 
> Note, that the current duplication of pcie_root_rcb_set() will be
> removed in the next commit.
> ---
>  drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 36 insertions(+)
> 
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d9..347af29868124 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
>  	}
>  }
>  
> +static bool pcie_root_rcb_set(struct pci_dev *dev)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(dev);
> +	u16 lnkctl;
> +
> +	if (!rp)
> +		return false;
> +
> +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> +
> +	return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> +}
> +
> +static void pci_configure_rcb(struct pci_dev *dev)
> +{
> +	/*
> +	 * Obviously, we need a Link Control register. The RCB is RO
> +	 * in Root Ports, so no need to attempt to set it for
> +	 * them. For VFs, the RCB is RsvdP, so, no need to set it.
> +	 * Then, if the Root Port has RCB set, then we set for the EP
> +	 * unless already set.
> +	 */
> +	if (pcie_cap_has_lnkctl(dev) &&
> +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> +	    !dev->is_virtfn && pcie_root_rcb_set(dev)) {
> +		u16 lnkctl;
> +
> +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> +			return;
> +
> +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
> +	}

RCB isn't meaningful for switches, so we'll read their LNKCTL
unnecessarily.  I propose something like this, which also clears RCB
if it's set when it shouldn't be (I think this would indicate a
firmware defect):

        /*
         * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
         * (where it is read-only), Endpoints, and Bridges.  It may only be
         * set for Endpoints and Bridges if it is set in the Root Port.
         */
        if (!pci_is_pcie(dev) ||
            pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
            pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
            pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
            dev->is_virtfn)
                return;

        rcb = pcie_root_rcb_set(dev);

        pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
        if (rcb) {
                if (lnkctl & PCI_EXP_LNKCTL_RCB)
                        return;

                lnkctl |= PCI_EXP_LNKCTL_RCB;
        } else {
                if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
                        return;

                pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
                lnkctl &= ~PCI_EXP_LNKCTL_RCB;
        }

        pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);

> +}
> +
>  static void pci_configure_device(struct pci_dev *dev)
>  {
>  	pci_configure_mps(dev);
> @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
>  	pci_configure_aspm_l1ss(dev);
>  	pci_configure_eetlp_prefix(dev);
>  	pci_configure_serr(dev);
> +	pci_configure_rcb(dev);
>  
>  	pci_acpi_program_hp_params(dev);
>  }
> -- 
> 2.43.5
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 22:40   ` Bjorn Helgaas
@ 2026-01-22  9:42     ` Haakon Bugge
  2026-01-22 10:22       ` Bjorn Helgaas
  2026-01-22 10:36     ` Bjorn Helgaas
  1 sibling, 1 reply; 14+ messages in thread
From: Haakon Bugge @ 2026-01-22  9:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org

> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> [snip]

> RCB isn't meaningful for switches, so we'll read their LNKCTL
> unnecessarily.  I propose something like this, which also clears RCB
> if it's set when it shouldn't be (I think this would indicate a
> firmware defect):
> 
>        /*
>         * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
>         * (where it is read-only), Endpoints, and Bridges.  It may only be
>         * set for Endpoints and Bridges if it is set in the Root Port.
>         */
>        if (!pci_is_pcie(dev) ||
>            pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>            pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>            pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>            dev->is_virtfn)

I see that sec 1.3.2 defines "Endpoints are classified as either legacy, PCI Express, or Root Complex Integrated Endpoints (RCiEPs)." But, shouldn't we also exclude Root Complex Event Collectors (PCI_EXP_TYPE_RC_EC)?

>                return;
> 
>        rcb = pcie_root_rcb_set(dev);
> 
>        pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>        if (rcb) {
>                if (lnkctl & PCI_EXP_LNKCTL_RCB)
>                        return;
> 
>                lnkctl |= PCI_EXP_LNKCTL_RCB;
>        } else {

I was thinking that since sec 7.5.3.7 states (for Endpoints and Bridges): "Default value of this bit is 0b.", that this case didn't happen. But of course, a defect BIOS should be considered.

A v3 is on its way.

I really appreciate your diligent review, Bjorn!


Thxs, Håkon


>                if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>                        return;
> 
>                pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>                lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>        }
> 
>        pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> 
>> +}
>> +
>> static void pci_configure_device(struct pci_dev *dev)
>> {
>> 	pci_configure_mps(dev);
>> @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
>> 	pci_configure_aspm_l1ss(dev);
>> 	pci_configure_eetlp_prefix(dev);
>> 	pci_configure_serr(dev);
>> +	pci_configure_rcb(dev);
>> 
>> 	pci_acpi_program_hp_params(dev);
>> }
>> --
>> 2.43.5

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22  9:42     ` Haakon Bugge
@ 2026-01-22 10:22       ` Bjorn Helgaas
  0 siblings, 0 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 10:22 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org

On Thu, Jan 22, 2026 at 09:42:46AM +0000, Haakon Bugge wrote:
> > On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> > [snip]
> 
> > RCB isn't meaningful for switches, so we'll read their LNKCTL
> > unnecessarily.  I propose something like this, which also clears RCB
> > if it's set when it shouldn't be (I think this would indicate a
> > firmware defect):
> > 
> >        /*
> >         * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
> >         * (where it is read-only), Endpoints, and Bridges.  It may only be
> >         * set for Endpoints and Bridges if it is set in the Root Port.
> >         */
> >        if (!pci_is_pcie(dev) ||
> >            pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >            pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> >            pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >            dev->is_virtfn)
> 
> I see that sec 1.3.2 defines "Endpoints are classified as either
> legacy, PCI Express, or Root Complex Integrated Endpoints (RCiEPs)."
> But, shouldn't we also exclude Root Complex Event Collectors
> (PCI_EXP_TYPE_RC_EC)?

Yes, probably so.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-21 22:40   ` Bjorn Helgaas
  2026-01-22  9:42     ` Haakon Bugge
@ 2026-01-22 10:36     ` Bjorn Helgaas
  2026-01-22 10:49       ` Haakon Bugge
  2026-01-22 11:35       ` Niklas Schnelle
  1 sibling, 2 replies; 14+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 10:36 UTC (permalink / raw)
  To: Håkon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi, Alex Williamson, Niklas Schnelle

[+cc Alex, Niklas]

On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > instructed program_hpx_type2() to set the RCB in an endpoint,
> > although it's RC did not have the RCB bit set.
> > 
> > e42010d8207f fixed that by qualifying the setting of the RCB in the
> > endpoint with the RC supporting an 128 byte RCB.
> > 
> > In retrospect, the program_hpx_type2() should only modify the AER
> > bits, and stay away from fiddling with the Link Control Register.
> > 
> > Hence, we explicitly program the RCB from pci_configure_device(). RCB
> > is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
> > cases we skip programming it. Then, if the Root Port has RCB set and
> > it is not set in the EP, we set it.
> > 
> > Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> > Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> > 
> > ---
> > 
> > Note, that the current duplication of pcie_root_rcb_set() will be
> > removed in the next commit.
> > ---
> >  drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 36 insertions(+)
> > 
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 41183aed8f5d9..347af29868124 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
> >  	}
> >  }
> >  
> > +static bool pcie_root_rcb_set(struct pci_dev *dev)
> > +{
> > +	struct pci_dev *rp = pcie_find_root_port(dev);
> > +	u16 lnkctl;
> > +
> > +	if (!rp)
> > +		return false;
> > +
> > +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> > +
> > +	return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> > +}
> > +
> > +static void pci_configure_rcb(struct pci_dev *dev)
> > +{
> > +	/*
> > +	 * Obviously, we need a Link Control register. The RCB is RO
> > +	 * in Root Ports, so no need to attempt to set it for
> > +	 * them. For VFs, the RCB is RsvdP, so, no need to set it.
> > +	 * Then, if the Root Port has RCB set, then we set for the EP
> > +	 * unless already set.
> > +	 */
> > +	if (pcie_cap_has_lnkctl(dev) &&
> > +	    (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> > +	    !dev->is_virtfn && pcie_root_rcb_set(dev)) {
> > +		u16 lnkctl;
> > +
> > +		pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> > +		if (lnkctl & PCI_EXP_LNKCTL_RCB)
> > +			return;
> > +
> > +		pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
> > +	}
> 
> RCB isn't meaningful for switches, so we'll read their LNKCTL
> unnecessarily.  I propose something like this, which also clears RCB
> if it's set when it shouldn't be (I think this would indicate a
> firmware defect):
> 
>         /*
>          * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
>          * (where it is read-only), Endpoints, and Bridges.  It may only be
>          * set for Endpoints and Bridges if it is set in the Root Port.
>          */
>         if (!pci_is_pcie(dev) ||
>             pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>             pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>             pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>             dev->is_virtfn)
>                 return;
> 
>         rcb = pcie_root_rcb_set(dev);
> 
>         pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>         if (rcb) {
>                 if (lnkctl & PCI_EXP_LNKCTL_RCB)
>                         return;
> 
>                 lnkctl |= PCI_EXP_LNKCTL_RCB;
>         } else {
>                 if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>                         return;
> 
>                 pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>                 lnkctl &= ~PCI_EXP_LNKCTL_RCB;

On second thought, I think this is too aggressive.  I think VM guests
will often see endpoints but not the Root Port.  In that case,
pcie_root_rcb_set() would return false because it couldn't find the
RP, but the RP might actually have RCB set.  Then we would clear the
endpoint RCB unnecessarily, which should be safe but would reduce
performance and will result in annoying misleading warnings.

Could either ignore this case (as in your original patch) or bring
pcie_root_rcb_set() inline here and return early if we can't find the
RP.

>         }
> 
>         pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> 
> > +}
> > +
> >  static void pci_configure_device(struct pci_dev *dev)
> >  {
> >  	pci_configure_mps(dev);
> > @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
> >  	pci_configure_aspm_l1ss(dev);
> >  	pci_configure_eetlp_prefix(dev);
> >  	pci_configure_serr(dev);
> > +	pci_configure_rcb(dev);
> >  
> >  	pci_acpi_program_hp_params(dev);
> >  }
> > -- 
> > 2.43.5
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 10:36     ` Bjorn Helgaas
@ 2026-01-22 10:49       ` Haakon Bugge
  2026-01-22 11:04         ` Bjorn Helgaas
  2026-01-22 11:35       ` Niklas Schnelle
  1 sibling, 1 reply; 14+ messages in thread
From: Haakon Bugge @ 2026-01-22 10:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	Alex Williamson, Niklas Schnelle



> On 22 Jan 2026, at 11:36, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> [+cc Alex, Niklas]
> 
> On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
>> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
>>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>>> instructed program_hpx_type2() to set the RCB in an endpoint,
>>> although it's RC did not have the RCB bit set.
>>> 
>>> e42010d8207f fixed that by qualifying the setting of the RCB in the
>>> endpoint with the RC supporting an 128 byte RCB.
>>> 
>>> In retrospect, the program_hpx_type2() should only modify the AER
>>> bits, and stay away from fiddling with the Link Control Register.
>>> 
>>> Hence, we explicitly program the RCB from pci_configure_device(). RCB
>>> is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
>>> cases we skip programming it. Then, if the Root Port has RCB set and
>>> it is not set in the EP, we set it.
>>> 
>>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>> 
>>> ---
>>> 
>>> Note, that the current duplication of pcie_root_rcb_set() will be
>>> removed in the next commit.
>>> ---
>>> drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
>>> 1 file changed, 36 insertions(+)
>>> 
>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>> index 41183aed8f5d9..347af29868124 100644
>>> --- a/drivers/pci/probe.c
>>> +++ b/drivers/pci/probe.c
>>> @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
>>> }
>>> }
>>> 
>>> +static bool pcie_root_rcb_set(struct pci_dev *dev)
>>> +{
>>> + struct pci_dev *rp = pcie_find_root_port(dev);
>>> + u16 lnkctl;
>>> +
>>> + if (!rp)
>>> + return false;
>>> +
>>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>>> +
>>> + return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>>> +}
>>> +
>>> +static void pci_configure_rcb(struct pci_dev *dev)
>>> +{
>>> + /*
>>> +  * Obviously, we need a Link Control register. The RCB is RO
>>> +  * in Root Ports, so no need to attempt to set it for
>>> +  * them. For VFs, the RCB is RsvdP, so, no need to set it.
>>> +  * Then, if the Root Port has RCB set, then we set for the EP
>>> +  * unless already set.
>>> +  */
>>> + if (pcie_cap_has_lnkctl(dev) &&
>>> +     (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>>> +     !dev->is_virtfn && pcie_root_rcb_set(dev)) {
>>> + u16 lnkctl;
>>> +
>>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>>> + return;
>>> +
>>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
>>> + }
>> 
>> RCB isn't meaningful for switches, so we'll read their LNKCTL
>> unnecessarily.  I propose something like this, which also clears RCB
>> if it's set when it shouldn't be (I think this would indicate a
>> firmware defect):
>> 
>>        /*
>>         * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
>>         * (where it is read-only), Endpoints, and Bridges.  It may only be
>>         * set for Endpoints and Bridges if it is set in the Root Port.
>>         */
>>        if (!pci_is_pcie(dev) ||
>>            pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>            pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>>            pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>>            dev->is_virtfn)
>>                return;
>> 
>>        rcb = pcie_root_rcb_set(dev);
>> 
>>        pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>>        if (rcb) {
>>                if (lnkctl & PCI_EXP_LNKCTL_RCB)
>>                        return;
>> 
>>                lnkctl |= PCI_EXP_LNKCTL_RCB;
>>        } else {
>>                if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>>                        return;
>> 
>>                pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>>                lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> 
> On second thought, I think this is too aggressive.  I think VM guests
> will often see endpoints but not the Root Port.  In that case,
> pcie_root_rcb_set() would return false because it couldn't find the
> RP, but the RP might actually have RCB set.  Then we would clear the
> endpoint RCB unnecessarily, which should be safe but would reduce
> performance and will result in annoying misleading warnings.

If the VM has a PF in pass-through and the RP is not there, you're right. If it is a VF, we do not program it anyway.

> Could either ignore this case (as in your original patch) or bring
> pcie_root_rcb_set() inline here and return early if we can't find the
> RP.

I think pcie_root_rcb_set() should return true when it was able to retrieve the RP's RCB value, and if not, we bail out.


Thxs, Håkon

> 
>>        }
>> 
>>        pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>> 
>>> +}
>>> +
>>> static void pci_configure_device(struct pci_dev *dev)
>>> {
>>> pci_configure_mps(dev);
>>> @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
>>> pci_configure_aspm_l1ss(dev);
>>> pci_configure_eetlp_prefix(dev);
>>> pci_configure_serr(dev);
>>> + pci_configure_rcb(dev);
>>> 
>>> pci_acpi_program_hp_params(dev);
>>> }
>>> -- 
>>> 2.43.5



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 10:49       ` Haakon Bugge
@ 2026-01-22 11:04         ` Bjorn Helgaas
  2026-01-22 11:12           ` Haakon Bugge
  0 siblings, 1 reply; 14+ messages in thread
From: Bjorn Helgaas @ 2026-01-22 11:04 UTC (permalink / raw)
  To: Haakon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	Alex Williamson, Niklas Schnelle

On Thu, Jan 22, 2026 at 10:49:45AM +0000, Haakon Bugge wrote:
> > On 22 Jan 2026, at 11:36, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
> >> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> >>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> >>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> >>> instructed program_hpx_type2() to set the RCB in an endpoint,
> >>> although it's RC did not have the RCB bit set.
> >>> 
> >>> e42010d8207f fixed that by qualifying the setting of the RCB in the
> >>> endpoint with the RC supporting an 128 byte RCB.
> >>> 
> >>> In retrospect, the program_hpx_type2() should only modify the AER
> >>> bits, and stay away from fiddling with the Link Control Register.
> >>> 
> >>> Hence, we explicitly program the RCB from pci_configure_device(). RCB
> >>> is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
> >>> cases we skip programming it. Then, if the Root Port has RCB set and
> >>> it is not set in the EP, we set it.
> >>> 
> >>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> >>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> >>> 
> >>> ---
> >>> 
> >>> Note, that the current duplication of pcie_root_rcb_set() will be
> >>> removed in the next commit.
> >>> ---
> >>> drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
> >>> 1 file changed, 36 insertions(+)
> >>> 
> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> >>> index 41183aed8f5d9..347af29868124 100644
> >>> --- a/drivers/pci/probe.c
> >>> +++ b/drivers/pci/probe.c
> >>> @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
> >>> }
> >>> }
> >>> 
> >>> +static bool pcie_root_rcb_set(struct pci_dev *dev)
> >>> +{
> >>> + struct pci_dev *rp = pcie_find_root_port(dev);
> >>> + u16 lnkctl;
> >>> +
> >>> + if (!rp)
> >>> + return false;
> >>> +
> >>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> >>> +
> >>> + return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> >>> +}
> >>> +
> >>> +static void pci_configure_rcb(struct pci_dev *dev)
> >>> +{
> >>> + /*
> >>> +  * Obviously, we need a Link Control register. The RCB is RO
> >>> +  * in Root Ports, so no need to attempt to set it for
> >>> +  * them. For VFs, the RCB is RsvdP, so, no need to set it.
> >>> +  * Then, if the Root Port has RCB set, then we set for the EP
> >>> +  * unless already set.
> >>> +  */
> >>> + if (pcie_cap_has_lnkctl(dev) &&
> >>> +     (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
> >>> +     !dev->is_virtfn && pcie_root_rcb_set(dev)) {
> >>> + u16 lnkctl;
> >>> +
> >>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> >>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
> >>> + return;
> >>> +
> >>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
> >>> + }
> >> 
> >> RCB isn't meaningful for switches, so we'll read their LNKCTL
> >> unnecessarily.  I propose something like this, which also clears RCB
> >> if it's set when it shouldn't be (I think this would indicate a
> >> firmware defect):
> >> 
> >>        /*
> >>         * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
> >>         * (where it is read-only), Endpoints, and Bridges.  It may only be
> >>         * set for Endpoints and Bridges if it is set in the Root Port.
> >>         */
> >>        if (!pci_is_pcie(dev) ||
> >>            pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
> >>            pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
> >>            pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
> >>            dev->is_virtfn)
> >>                return;
> >> 
> >>        rcb = pcie_root_rcb_set(dev);
> >> 
> >>        pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> >>        if (rcb) {
> >>                if (lnkctl & PCI_EXP_LNKCTL_RCB)
> >>                        return;
> >> 
> >>                lnkctl |= PCI_EXP_LNKCTL_RCB;
> >>        } else {
> >>                if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> >>                        return;
> >> 
> >>                pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> >>                lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> > 
> > On second thought, I think this is too aggressive.  I think VM
> > guests will often see endpoints but not the Root Port.  In that
> > case, pcie_root_rcb_set() would return false because it couldn't
> > find the RP, but the RP might actually have RCB set.  Then we
> > would clear the endpoint RCB unnecessarily, which should be safe
> > but would reduce performance and will result in annoying
> > misleading warnings.
> 
> If the VM has a PF in pass-through and the RP is not there, you're
> right. If it is a VF, we do not program it anyway.
> 
> > Could either ignore this case (as in your original patch) or bring
> > pcie_root_rcb_set() inline here and return early if we can't find
> > the RP.
> 
> I think pcie_root_rcb_set() should return true when it was able to
> retrieve the RP's RCB value, and if not, we bail out.

Currently it returns:

  - true if we found the RP and it had RCB set

  - false if (a) we couldn't find the RP or (b) we found the RP and it
    had RCB cleared

I don't think it's worth complicating the signature to make (a) and
(b) distinguishable.

Inlining pcie_root_rcb_set() would also remove any ambiguity about
whether pcie_root_rcb_set() actually *sets* RCB or just tests it.  I
think any readability advantage of using a separate function is
minimal.

> >>        }
> >> 
> >>        pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
> >> 
> >>> +}
> >>> +
> >>> static void pci_configure_device(struct pci_dev *dev)
> >>> {
> >>> pci_configure_mps(dev);
> >>> @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
> >>> pci_configure_aspm_l1ss(dev);
> >>> pci_configure_eetlp_prefix(dev);
> >>> pci_configure_serr(dev);
> >>> + pci_configure_rcb(dev);
> >>> 
> >>> pci_acpi_program_hp_params(dev);
> >>> }
> >>> -- 
> >>> 2.43.5
> 
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 11:04         ` Bjorn Helgaas
@ 2026-01-22 11:12           ` Haakon Bugge
  0 siblings, 0 replies; 14+ messages in thread
From: Haakon Bugge @ 2026-01-22 11:12 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	Alex Williamson, Niklas Schnelle



> On 22 Jan 2026, at 12:04, Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> On Thu, Jan 22, 2026 at 10:49:45AM +0000, Haakon Bugge wrote:
>>> On 22 Jan 2026, at 11:36, Bjorn Helgaas <helgaas@kernel.org> wrote:
>>> On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
>>>> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
>>>>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>>>>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>>>>> instructed program_hpx_type2() to set the RCB in an endpoint,
>>>>> although it's RC did not have the RCB bit set.
>>>>> 
>>>>> e42010d8207f fixed that by qualifying the setting of the RCB in the
>>>>> endpoint with the RC supporting an 128 byte RCB.
>>>>> 
>>>>> In retrospect, the program_hpx_type2() should only modify the AER
>>>>> bits, and stay away from fiddling with the Link Control Register.
>>>>> 
>>>>> Hence, we explicitly program the RCB from pci_configure_device(). RCB
>>>>> is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
>>>>> cases we skip programming it. Then, if the Root Port has RCB set and
>>>>> it is not set in the EP, we set it.
>>>>> 
>>>>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>>> 
>>>>> ---
>>>>> 
>>>>> Note, that the current duplication of pcie_root_rcb_set() will be
>>>>> removed in the next commit.
>>>>> ---
>>>>> drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>> 1 file changed, 36 insertions(+)
>>>>> 
>>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>>> index 41183aed8f5d9..347af29868124 100644
>>>>> --- a/drivers/pci/probe.c
>>>>> +++ b/drivers/pci/probe.c
>>>>> @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
>>>>> }
>>>>> }
>>>>> 
>>>>> +static bool pcie_root_rcb_set(struct pci_dev *dev)
>>>>> +{
>>>>> + struct pci_dev *rp = pcie_find_root_port(dev);
>>>>> + u16 lnkctl;
>>>>> +
>>>>> + if (!rp)
>>>>> + return false;
>>>>> +
>>>>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>>>>> +
>>>>> + return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>>>>> +}
>>>>> +
>>>>> +static void pci_configure_rcb(struct pci_dev *dev)
>>>>> +{
>>>>> + /*
>>>>> +  * Obviously, we need a Link Control register. The RCB is RO
>>>>> +  * in Root Ports, so no need to attempt to set it for
>>>>> +  * them. For VFs, the RCB is RsvdP, so, no need to set it.
>>>>> +  * Then, if the Root Port has RCB set, then we set for the EP
>>>>> +  * unless already set.
>>>>> +  */
>>>>> + if (pcie_cap_has_lnkctl(dev) &&
>>>>> +     (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT) &&
>>>>> +     !dev->is_virtfn && pcie_root_rcb_set(dev)) {
>>>>> + u16 lnkctl;
>>>>> +
>>>>> + pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>>>>> + if (lnkctl & PCI_EXP_LNKCTL_RCB)
>>>>> + return;
>>>>> +
>>>>> + pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl | PCI_EXP_LNKCTL_RCB);
>>>>> + }
>>>> 
>>>> RCB isn't meaningful for switches, so we'll read their LNKCTL
>>>> unnecessarily.  I propose something like this, which also clears RCB
>>>> if it's set when it shouldn't be (I think this would indicate a
>>>> firmware defect):
>>>> 
>>>>       /*
>>>>        * Per PCIe r7.0, sec 7.5.3.7, RCB is only meaningful in Root Ports
>>>>        * (where it is read-only), Endpoints, and Bridges.  It may only be
>>>>        * set for Endpoints and Bridges if it is set in the Root Port.
>>>>        */
>>>>       if (!pci_is_pcie(dev) ||
>>>>           pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT ||
>>>>           pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM ||
>>>>           pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
>>>>           dev->is_virtfn)
>>>>               return;
>>>> 
>>>>       rcb = pcie_root_rcb_set(dev);
>>>> 
>>>>       pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>>>>       if (rcb) {
>>>>               if (lnkctl & PCI_EXP_LNKCTL_RCB)
>>>>                       return;
>>>> 
>>>>               lnkctl |= PCI_EXP_LNKCTL_RCB;
>>>>       } else {
>>>>               if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>>>>                       return;
>>>> 
>>>>               pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>>>>               lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>>> 
>>> On second thought, I think this is too aggressive.  I think VM
>>> guests will often see endpoints but not the Root Port.  In that
>>> case, pcie_root_rcb_set() would return false because it couldn't
>>> find the RP, but the RP might actually have RCB set.  Then we
>>> would clear the endpoint RCB unnecessarily, which should be safe
>>> but would reduce performance and will result in annoying
>>> misleading warnings.
>> 
>> If the VM has a PF in pass-through and the RP is not there, you're
>> right. If it is a VF, we do not program it anyway.
>> 
>>> Could either ignore this case (as in your original patch) or bring
>>> pcie_root_rcb_set() inline here and return early if we can't find
>>> the RP.
>> 
>> I think pcie_root_rcb_set() should return true when it was able to
>> retrieve the RP's RCB value, and if not, we bail out.
> 
> Currently it returns:
> 
>  - true if we found the RP and it had RCB set
> 
>  - false if (a) we couldn't find the RP or (b) we found the RP and it
>    had RCB cleared
> 
> I don't think it's worth complicating the signature to make (a) and
> (b) distinguishable.

Case (b) will not trustfully detect the case where the EP's RCB is set and the RP is (b.1) not found or (b.2) RP is found but it's RCB is not set. In case (b.2), we _shall_ reset the EP's RCB, IIUC.


Thxs, Håkon



> 
> Inlining pcie_root_rcb_set() would also remove any ambiguity about
> whether pcie_root_rcb_set() actually *sets* RCB or just tests it.  I
> think any readability advantage of using a separate function is
> minimal.
> 
>>>>       }
>>>> 
>>>>       pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
>>>> 
>>>>> +}
>>>>> +
>>>>> static void pci_configure_device(struct pci_dev *dev)
>>>>> {
>>>>> pci_configure_mps(dev);
>>>>> @@ -2419,6 +2454,7 @@ static void pci_configure_device(struct pci_dev *dev)
>>>>> pci_configure_aspm_l1ss(dev);
>>>>> pci_configure_eetlp_prefix(dev);
>>>>> pci_configure_serr(dev);
>>>>> + pci_configure_rcb(dev);
>>>>> 
>>>>> pci_acpi_program_hp_params(dev);
>>>>> }
>>>>> -- 
>>>>> 2.43.5
>> 
>> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 10:36     ` Bjorn Helgaas
  2026-01-22 10:49       ` Haakon Bugge
@ 2026-01-22 11:35       ` Niklas Schnelle
  2026-01-22 13:25         ` Haakon Bugge
  1 sibling, 1 reply; 14+ messages in thread
From: Niklas Schnelle @ 2026-01-22 11:35 UTC (permalink / raw)
  To: Bjorn Helgaas, Håkon Bugge
  Cc: Bjorn Helgaas, Johannes Thumshirn, linux-pci, linux-kernel,
	linux-acpi, Alex Williamson

On Thu, 2026-01-22 at 04:36 -0600, Bjorn Helgaas wrote:
> [+cc Alex, Niklas]
> 
> On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
> > On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
> > > Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
> > > Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
> > > instructed program_hpx_type2() to set the RCB in an endpoint,
> > > although it's RC did not have the RCB bit set.
> > > 
> > > e42010d8207f fixed that by qualifying the setting of the RCB in the
> > > endpoint with the RC supporting an 128 byte RCB.
> > > 
> > > In retrospect, the program_hpx_type2() should only modify the AER
> > > bits, and stay away from fiddling with the Link Control Register.
> > > 
> > > Hence, we explicitly program the RCB from pci_configure_device(). RCB
> > > is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
> > > cases we skip programming it. Then, if the Root Port has RCB set and
> > > it is not set in the EP, we set it.
> > > 
> > > Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
> > > Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> > > 
> > > ---
> > > 
> > > Note, that the current duplication of pcie_root_rcb_set() will be
> > > removed in the next commit.
> > > ---
> > >  drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 36 insertions(+)
> > > 
> > > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > > index 41183aed8f5d9..347af29868124 100644
> > > --- a/drivers/pci/probe.c
> > > +++ b/drivers/pci/probe.c
> > > @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
> > >  	}
> > >  }
> > >  
> > > +static bool pcie_root_rcb_set(struct pci_dev *dev)
> > > +{
> > > +	struct pci_dev *rp = pcie_find_root_port(dev);
> > > +	u16 lnkctl;
> > > +
> > > +	if (!rp)
> > > +		return false;
> > > +
> > > +	pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
> > > +
> > > +	return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
> > > +}
> > > +
> > > +static void pci_configure_rcb(struct pci_dev *dev)
> > > +{
> > > 
--- snip ---
> > 
> >         pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
> >         if (rcb) {
> >                 if (lnkctl & PCI_EXP_LNKCTL_RCB)
> >                         return;
> > 
> >                 lnkctl |= PCI_EXP_LNKCTL_RCB;
> >         } else {
> >                 if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
> >                         return;
> > 
> >                 pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
> >                 lnkctl &= ~PCI_EXP_LNKCTL_RCB;
> 
> On second thought, I think this is too aggressive.  I think VM guests
> will often see endpoints but not the Root Port.  In that case,
> pcie_root_rcb_set() would return false because it couldn't find the
> RP, but the RP might actually have RCB set.  Then we would clear the
> endpoint RCB unnecessarily, which should be safe but would reduce
> performance and will result in annoying misleading warnings.
> 
> Could either ignore this case (as in your original patch) or bring
> pcie_root_rcb_set() inline here and return early if we can't find the
> RP.
> > 

Thanks Bjorn for looping me in. If I'm reading later comments correctly
we're already in agreement that if the root port isn't found the
function should bail and leave the setting as is which sounds good to
me. Still, feel free to directly add me in To on the next version and
I'll be happy to test it and take a look at the code.

Nevertheless I'd like to confirm that yes on s390 we definitely have
the case where PFs are passed-through to guests without the guest
having access to / seeing the root port as a PCIe device. This is true
on both our machine hypervisor guests (LPARs) and in KVM guests. And
yes I think this would potentially incorrectly clear the RCB which
could have been set by firmware or platform PCI code based on its
knowledge of the actual root port. That said from a quick look we
currently seem to keep the RCB at 64 bytes in endpoints.

Thanks,
Niklas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device
  2026-01-22 11:35       ` Niklas Schnelle
@ 2026-01-22 13:25         ` Haakon Bugge
  0 siblings, 0 replies; 14+ messages in thread
From: Haakon Bugge @ 2026-01-22 13:25 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Bjorn Helgaas, Bjorn Helgaas, Johannes Thumshirn,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org, Alex Williamson



> On 22 Jan 2026, at 12:35, Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> 
> On Thu, 2026-01-22 at 04:36 -0600, Bjorn Helgaas wrote:
>> [+cc Alex, Niklas]
>> 
>> On Wed, Jan 21, 2026 at 04:40:10PM -0600, Bjorn Helgaas wrote:
>>> On Wed, Jan 21, 2026 at 12:35:40PM +0100, Håkon Bugge wrote:
>>>> Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff
>>>> Root Port supports it (_HPX)") fixed a bogus _HPX type 2 record, which
>>>> instructed program_hpx_type2() to set the RCB in an endpoint,
>>>> although it's RC did not have the RCB bit set.
>>>> 
>>>> e42010d8207f fixed that by qualifying the setting of the RCB in the
>>>> endpoint with the RC supporting an 128 byte RCB.
>>>> 
>>>> In retrospect, the program_hpx_type2() should only modify the AER
>>>> bits, and stay away from fiddling with the Link Control Register.
>>>> 
>>>> Hence, we explicitly program the RCB from pci_configure_device(). RCB
>>>> is RO in Root Ports, and in VFs, the bit is RsvdP, so for these two
>>>> cases we skip programming it. Then, if the Root Port has RCB set and
>>>> it is not set in the EP, we set it.
>>>> 
>>>> Fixes: Commit e42010d8207f ("PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)")
>>>> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
>>>> 
>>>> ---
>>>> 
>>>> Note, that the current duplication of pcie_root_rcb_set() will be
>>>> removed in the next commit.
>>>> ---
>>>> drivers/pci/probe.c | 36 ++++++++++++++++++++++++++++++++++++
>>>> 1 file changed, 36 insertions(+)
>>>> 
>>>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
>>>> index 41183aed8f5d9..347af29868124 100644
>>>> --- a/drivers/pci/probe.c
>>>> +++ b/drivers/pci/probe.c
>>>> @@ -2410,6 +2410,41 @@ static void pci_configure_serr(struct pci_dev *dev)
>>>> }
>>>> }
>>>> 
>>>> +static bool pcie_root_rcb_set(struct pci_dev *dev)
>>>> +{
>>>> + struct pci_dev *rp = pcie_find_root_port(dev);
>>>> + u16 lnkctl;
>>>> +
>>>> + if (!rp)
>>>> + return false;
>>>> +
>>>> + pcie_capability_read_word(rp, PCI_EXP_LNKCTL, &lnkctl);
>>>> +
>>>> + return !!(lnkctl & PCI_EXP_LNKCTL_RCB);
>>>> +}
>>>> +
>>>> +static void pci_configure_rcb(struct pci_dev *dev)
>>>> +{
>>>> 
> --- snip ---
>>> 
>>>        pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
>>>        if (rcb) {
>>>                if (lnkctl & PCI_EXP_LNKCTL_RCB)
>>>                        return;
>>> 
>>>                lnkctl |= PCI_EXP_LNKCTL_RCB;
>>>        } else {
>>>                if (!(lnkctl & PCI_EXP_LNKCTL_RCB))
>>>                        return;
>>> 
>>>                pci_info(FW_INFO "clearing RCB (RCB not set in Root Port)\n");
>>>                lnkctl &= ~PCI_EXP_LNKCTL_RCB;
>> 
>> On second thought, I think this is too aggressive.  I think VM guests
>> will often see endpoints but not the Root Port.  In that case,
>> pcie_root_rcb_set() would return false because it couldn't find the
>> RP, but the RP might actually have RCB set.  Then we would clear the
>> endpoint RCB unnecessarily, which should be safe but would reduce
>> performance and will result in annoying misleading warnings.
>> 
>> Could either ignore this case (as in your original patch) or bring
>> pcie_root_rcb_set() inline here and return early if we can't find the
>> RP.
>>> 
> 
> Thanks Bjorn for looping me in. If I'm reading later comments correctly
> we're already in agreement that if the root port isn't found the
> function should bail and leave the setting as is which sounds good to
> me. Still, feel free to directly add me in To on the next version and
> I'll be happy to test it and take a look at the code.
> 
> Nevertheless I'd like to confirm that yes on s390 we definitely have
> the case where PFs are passed-through to guests without the guest
> having access to / seeing the root port as a PCIe device. This is true
> on both our machine hypervisor guests (LPARs) and in KVM guests. And
> yes I think this would potentially incorrectly clear the RCB which
> could have been set by firmware or platform PCI code based on its
> knowledge of the actual root port. That said from a quick look we
> currently seem to keep the RCB at 64 bytes in endpoints.

Hi Niklas,

Thanks for chiming in. Just out a v3 [1].


Thxs, Håkon

[1] https://lore.kernel.org/linux-pci/20260122130957.68757-1-haakon.bugge@oracle.com/

> 
> Thanks,
> Niklas



^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2026-01-22 13:25 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-21 11:35 [PATCH v2 0/2] PCI: Init RCB from pci_configure_device and fix program_hpx_type2 Håkon Bugge
2026-01-21 11:35 ` [PATCH v2 1/2] PCI: Initialize RCB from pci_configure_device Håkon Bugge
2026-01-21 17:41   ` Lukas Wunner
2026-01-21 19:13     ` Haakon Bugge
2026-01-21 22:40   ` Bjorn Helgaas
2026-01-22  9:42     ` Haakon Bugge
2026-01-22 10:22       ` Bjorn Helgaas
2026-01-22 10:36     ` Bjorn Helgaas
2026-01-22 10:49       ` Haakon Bugge
2026-01-22 11:04         ` Bjorn Helgaas
2026-01-22 11:12           ` Haakon Bugge
2026-01-22 11:35       ` Niklas Schnelle
2026-01-22 13:25         ` Haakon Bugge
2026-01-21 11:35 ` [PATCH v2 2/2] PCI/ACPI: Confine program_hpx_type2 to the AER bits Håkon Bugge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox