linux-pci.vger.kernel.org archive mirror
* [PATCH v2 0/2] PCI: Support to workaround bus level HW issues
@ 2018-04-30 17:25 James Puthukattukaran
  2018-04-30 17:27 ` [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint James Puthukattukaran
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: James Puthukattukaran @ 2018-04-30 17:25 UTC
  To: Alex Williamson, Sinan Kaya; +Cc: linux-pci@vger.kernel.org

There are bugs in certain PCIe switches that cause access violations when an
endpoint device is hotplugged. In particular, certain IDT switches trigger an
ACS source validation violation when a newly plugged PCIe endpoint device is
brought up. This is a major issue for platforms designed to issue a fatal
reset when this event occurs.

The first patch provides a framework for intercepting and working around
issues with the parent device of the endpoint being brought up.

The second patch implements the workaround for the IDT switch issue using
that framework. ACS source validation is disabled in the IDT switch prior to
endpoint device detection and re-enabled afterwards.

James

-v2: move workaround to pci_bus_read_dev_vendor_id() from pci_bus_check_dev()
     and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
-v3: add bus->self check for root bus and virtual bus for sriov vfs.
-v4: only do workaround for IDT switches
-v5: tweak pci_std_enable_acs_sv to deal with unimplemented SV
-v6: Added errata verbiage verbatim and resolved patch format issues
-v7: changed int to bool for found and idt_workaround declarations. Also
     added bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=196979
-v8: Rewrote the patch by adding a new acs quirk to keep the workaround
     out of the main code path
-v9: changed function name from pci_dev_specific_fixup_acs_quirk to
     pci_bus_acs_quirk. Also, tested for FLR and hot reset scenarios and did
     not see issues where workaround was required. The issue seems to be
     related only to cold reset/power on situation.
-v10: Moved the contents of pci_bus_read_vendor_id into an internal function
      __pci_bus_read_vendor_id
-v11: Split the patch into two patches. The first a general quirk framework.
-v12: Add pci_bus_generic_read_dev_vendor_id() to carry out default behavior
      when detecting endpoint and pci_bus_specific_read_dev_vendor_id for
      bus quirk behavior
-v13: Fixed build errors found for non-x86 platforms via cross compiles when
      CONFIG_PCI_QUIRKS is not defined


James Puthukattukaran (2):
  PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI
    switch specific issues prior to accessing newly added endpoint
  PCI: Implement workaround for the ACS bug in the IDT switch

 drivers/pci/pci.h    |  13 +++++
 drivers/pci/probe.c  |  20 +++++++-
 drivers/pci/quirks.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 174 insertions(+), 1 deletion(-)

-- 
1.8.3.1


* [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint
  2018-04-30 17:25 [PATCH v2 0/2] PCI: Support to workaround bus level HW issues James Puthukattukaran
@ 2018-04-30 17:27 ` James Puthukattukaran
  2018-06-29 22:00   ` Bjorn Helgaas
  2018-04-30 17:28 ` [PATCH v2 2/2] PCI: Implement workaround for the ACS bug in the IDT, switch James Puthukattukaran
  2018-06-29 21:47 ` [PATCH v2 0/2] PCI: Support to workaround bus level HW issues Bjorn Helgaas
  2 siblings, 1 reply; 6+ messages in thread
From: James Puthukattukaran @ 2018-04-30 17:27 UTC
  To: Alex Williamson, Sinan Kaya; +Cc: linux-pci@vger.kernel.org

This patch provides a framework for implementing bus-specific quirks prior to
accessing an endpoint device beneath that bus. The routine
pci_bus_specific_read_dev_vendor_id() is called before the endpoint device
itself is accessed in order to work around potential issues with the parent
device (switch). If there is nothing specific to be done for a particular
switch, it falls through to the generic endpoint check, i.e.,
pci_bus_generic_read_dev_vendor_id().

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/pci.h    | 11 +++++++++++
 drivers/pci/probe.c  | 20 +++++++++++++++++++-
 drivers/pci/quirks.c | 41 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 023f7cf..2132a60 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -225,6 +225,17 @@ enum pci_bar_type {
 int pci_configure_extended_tags(struct pci_dev *dev, void *ign);
 bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 				int crs_timeout);
+#ifdef CONFIG_PCI_QUIRKS
+int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
+				u32 *pl, int crs_timeout);
+#else
+static int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
+				u32 *pl, int crs_timeout)
+{
+	return -ENOTTY;
+}
+#endif
+
 int pci_setup_device(struct pci_dev *dev);
 int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
 		    struct resource *res, unsigned int reg);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ac91b6f..31eba02 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2097,7 +2097,7 @@ static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 *l,
 	return true;
 }
 
-bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
+bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
 				int timeout)
 {
 	if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
@@ -2113,6 +2113,24 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
 
 	return true;
 }
+
+
+bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
+					int timeout)
+{
+	int ret;
+
+	/*
+ 	 * An opportunity to implement something specific for this device.
+	 * For ex, implement a quirk prior to even accessing the device
+	 */
+	ret = pci_bus_specific_read_dev_vendor_id(bus, devfn, l, timeout);
+	if (ret >= 0)
+		return (ret >= 0);
+
+	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
+}
+
 EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
 
 /*
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2990ad1..2b28584 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4741,3 +4741,44 @@ static void quirk_gpu_hda(struct pci_dev *hda)
 			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
+
+static const struct pci_bus_specific_quirk{
+	u16 vendor;
+	u16 device;
+	int (*bus_quirk)(struct pci_bus *bus, int devfn, u32 *l, int timeout);
+} pci_bus_specific_quirks[] = {
+	{0}
+};
+
+/*
+ * This routine provides the ability to implement a bus specific quirk
+ * prior to doing config accesses to the endpoint device itself. For ex, there
+ * could be HW problems with the switch above the endpoint that causes issues
+ * when accessing the endpoint device. Such workarounds "specific" to the
+ * parent could be implemented prior or subsequent to accesses to the
+ * endpoint itself
+ *
+ */
+int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
+					int timeout)
+{
+	const struct pci_bus_specific_quirk *i;
+	struct pci_dev *dev;
+
+	if (!bus || !bus->self)
+		return -ENOTTY;
+
+	dev = bus->self;
+
+	/*
+ 	 * Implement any quirks in the "bus" (switch, for ex) that causes
+	 * issues in accessing the endpoint
+	 */
+	for (i = pci_bus_specific_quirks; i->bus_quirk; i++) {
+		if ((i->vendor == dev->vendor ||
+		     i->vendor == (u16)PCI_ANY_ID) &&
+		    (i->device == dev->device || i->device == (u16)PCI_ANY_ID))
+			return(i->bus_quirk(bus, devfn, l, timeout));
+	}
+	return -ENOTTY;
+}
-- 
1.8.3.1
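
The dispatch introduced by this patch can be modeled outside the kernel. The
following is a minimal sketch, not the kernel code: the struct bridge and
struct bus types, ANY_ID, demo_quirk() and generic_read() are simplified
stand-ins invented for illustration, and 0x111d is used on the assumption
that it matches PCI_VENDOR_ID_IDT. It shows the three-way return convention
(negative: no quirk applies, 0: quirk ran but no device, positive: quirk ran
and found a device). Note that in the probe.c hunk as posted, the
"ret >= 0" branch returns "(ret >= 0)", which always evaluates to true; the
sketch returns "ret > 0" instead, on the assumption that a return of 0 is
meant to report "not found".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
#define ANY_ID 0xffffu		/* plays the role of (u16)PCI_ANY_ID */

struct bridge { uint16_t vendor, device; };
struct bus    { struct bridge *self; };	/* self == NULL models a root bus */

typedef int (*bus_quirk_fn)(struct bus *bus, int devfn, uint32_t *l);

struct bus_quirk { uint16_t vendor, device; bus_quirk_fn fn; };

/* Hypothetical quirk: reports "found" only for devfn 0. */
static int demo_quirk(struct bus *bus, int devfn, uint32_t *l)
{
	(void)bus;
	*l = (devfn == 0) ? 0x12345678u : 0xffffffffu;
	return devfn == 0 ? 1 : 0;
}

static const struct bus_quirk quirks[] = {
	{ 0x111d, 0x80b5, demo_quirk },
	{ 0, 0, NULL },		/* terminator: fn == NULL ends the table */
};

/* Generic path: pretend every device answers. */
static bool generic_read(struct bus *bus, int devfn, uint32_t *l)
{
	(void)bus;
	(void)devfn;
	*l = 0xabcd0001u;
	return true;
}

/* <0: no quirk applies; 0: quirk ran, not found; >0: quirk ran, found */
static int specific_read(struct bus *bus, int devfn, uint32_t *l)
{
	const struct bus_quirk *q;

	if (!bus || !bus->self)
		return -ENOTTY;

	for (q = quirks; q->fn; q++) {
		if ((q->vendor == bus->self->vendor || q->vendor == ANY_ID) &&
		    (q->device == bus->self->device || q->device == ANY_ID))
			return q->fn(bus, devfn, l);
	}
	return -ENOTTY;
}

static bool read_dev_vendor_id(struct bus *bus, int devfn, uint32_t *l)
{
	int ret;

	assert(l != NULL);

	ret = specific_read(bus, devfn, l);
	if (ret >= 0)
		return ret > 0;	/* propagate the quirk's verdict */

	return generic_read(bus, devfn, l);
}
```

A bus whose bridge matches the table is handled entirely by the quirk; a root
bus or a non-matching bridge falls through to the generic read.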


* [PATCH v2 2/2] PCI: Implement workaround for the ACS bug in the IDT, switch
  2018-04-30 17:25 [PATCH v2 0/2] PCI: Support to workaround bus level HW issues James Puthukattukaran
  2018-04-30 17:27 ` [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint James Puthukattukaran
@ 2018-04-30 17:28 ` James Puthukattukaran
  2018-06-29 23:12   ` Bjorn Helgaas
  2018-06-29 21:47 ` [PATCH v2 0/2] PCI: Support to workaround bus level HW issues Bjorn Helgaas
  2 siblings, 1 reply; 6+ messages in thread
From: James Puthukattukaran @ 2018-04-30 17:28 UTC
  To: Alex Williamson, Sinan Kaya; +Cc: linux-pci@vger.kernel.org

The IDT switch incorrectly flags an ACS source validation violation on the
completion of a config read request to an endpoint device (IDT 89H32H8G3-YC,
errata #36), even though the PCI Express spec states that completions are
never affected by ACS source validation (PCI Spec 3.1, Section 6.12.1.1).
Here is the relevant errata text:

"Item #36 - Downstream port applies ACS Source Validation to Completions
Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
completions are never affected by ACS Source Validation. However,
completions received by a downstream port of the PCIe switch from a device
that has not yet captured a PCIe bus number are incorrectly dropped by ACS
source validation by the switch downstream port.

Workaround: Issue a CfgWr1 to the downstream device before issuing the
first CfgRd1 to the device. This allows the downstream device to capture
its bus number; ACS source validation no longer stops completions from
being forwarded by the downstream port. It has been observed that Microsoft
Windows implements this workaround already; however, some versions of Linux
and other operating systems may not."

The workaround suggested by IDT is to issue a configuration write to the
downstream device before issuing the first config read. This allows the
downstream device to capture its bus number, thus avoiding the ACS
violation on the completion. To make sure that the device is ready for
config accesses, we keep the existing behavior of retrying config reads
until one succeeds, and then do the config write specified by the errata.
To avoid hitting the erratum during those config reads, we disable ACS
source validation (SV) around this process.

The patch does the following:

1. Disable ACS source validation if enabled.
2. Wait for config space access to become available by reading the vendor ID
3. Do a config write to the endpoint (errata workaround)
4. Re-enable ACS source validation (if it was enabled to begin with)

Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/pci/pci.h    |   2 +
 drivers/pci/quirks.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 2132a60..586874d 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -228,6 +228,8 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 #ifdef CONFIG_PCI_QUIRKS
 int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
 				u32 *pl, int crs_timeout);
+bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn,
+				u32 *pl, int crs_timeout);
 #else
 static int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
 				u32 *pl, int crs_timeout)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2b28584..2f5c024 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4742,11 +4742,112 @@ static void quirk_gpu_hda(struct pci_dev *hda)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
 
+/*
+ * The IDT switch incorrectly flags an ACS source violation on a read config
+ * request to an end point device on the completion (IDT 89H32H8G3-YC,
+ * errata #36) even though the PCI Express spec states that completions are
+ * never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).
+ * Here's the specific copy of the errata text --
+ *
+ * "Item #36 - Downstream port applies ACS Source Validation to Completions
+ * Section 6.12.1.1 of the PCI Express Base Specification 3.1 states
+ * that completions are never affected
+ * by ACS Source Validation. However, completions received by a
+ * downstream port of the PCIe switch from a device that has not yet
+ * captured a PCIe bus number are incorrectly dropped by ACS source
+ * validation by the switch downstream port."
+ *
+ * The suggested workaround by IDT is to issue a configuration write to the
+ * downstream device before issuing the first config read. This allows the
+ * downstream device to capture its bus number, thus avoiding the ACS
+ * violation on the completion. In order to make sure that the device is ready
+ * for config accesses, we do what is currently done in making config reads
+ * till it succeeds and then do the config write as specified by the errata.
+ * However, to avoid hitting the errata issue when doing config reads, we
+ * disable ACS SV around this process.
+ */
+static int pci_idt_acs_quirk(struct pci_bus *bus, int devfn, int enable,
+				bool found)
+{
+	int pos;
+	u16 cap;
+	u16 ctrl;
+	int retval;
+	struct pci_dev *dev = bus->self;
+
+
+	/*
+	 * Write 0 to the devfn device under the PCIE switch (bus->self)
+	 * as part of forcing the devfn number to latch with the device
+	 * below
+	 */
+	if (found)
+		pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
+
+
+	/* Enable/disable ACS SV feature (based on enable flag) */
+	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	if (!pos)
+		return -ENODEV;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
+
+	if (!(cap & PCI_ACS_SV))
+		return -ENODEV;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
+
+	retval = !!(ctrl & cap & PCI_ACS_SV);
+	if (enable)
+		ctrl |= (cap & PCI_ACS_SV);
+	else
+		ctrl &= ~(cap & PCI_ACS_SV);
+
+	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
+
+	/*
+	 * return the previous state of the ACS SV state i.e was SV enabled
+	 * or disabled?
+	 */
+	return retval;
+}
+
+static int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *l,
+				int timeout)
+{
+	int enable;
+	bool found;
+
+	/*
+	 * Disable acs for the IDT switch before attempting the initial
+	 * config accesses to the endpoint device.
+	 */
+	enable = pci_idt_acs_quirk(bus, devfn, 0, false);
+
+	/*
+	 * found indicates whether the endpoint device was identified
+	 * as present or not
+	 */
+
+	found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
+
+	/*
+	 * re-enable acs feature for the switch again if it was enabled to
+	 * start with
+	 */
+	if (enable > 0)
+		pci_idt_acs_quirk(bus, devfn, enable, found);
+
+	return found ? 1 : 0;
+}
+
+
 static const struct pci_bus_specific_quirk{
 	u16 vendor;
 	u16 device;
 	int (*bus_quirk)(struct pci_bus *bus, int devfn, u32 *l, int timeout);
 } pci_bus_specific_quirks[] = {
+	{ PCI_VENDOR_ID_IDT, 0x80b5, pci_idt_bus_quirk},
 	{0}
 };
 
-- 
1.8.3.1
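
The enable/disable helper in this patch returns the previous Source
Validation state so the caller can restore it afterwards. The bit handling
can be sketched on its own as follows. This is an illustrative model, not
the kernel function: struct acs_regs and acs_sv_set() are hypothetical names,
config space is replaced by two plain fields, and ACS_SV is assumed to match
the kernel's PCI_ACS_SV (bit 0 of both the ACS Capability and ACS Control
registers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACS_SV 0x0001	/* Source Validation bit (assumed == PCI_ACS_SV) */

/* Hypothetical model of a bridge's ACS capability and control registers. */
struct acs_regs { uint16_t cap, ctrl; };

/*
 * Set or clear Source Validation and return the previous state:
 * 1 if SV was enabled, 0 if it was disabled, -1 if SV is not implemented.
 */
static int acs_sv_set(struct acs_regs *r, bool enable)
{
	int prev;

	assert(r);

	if (!(r->cap & ACS_SV))
		return -1;	/* nothing to toggle, leave ctrl untouched */

	prev = !!(r->ctrl & ACS_SV);
	if (enable)
		r->ctrl |= ACS_SV;
	else
		r->ctrl &= (uint16_t)~ACS_SV;

	return prev;
}
```

A caller would disable SV with acs_sv_set(&regs, false), keep the return
value, and call acs_sv_set(&regs, true) afterwards only if that value was
positive, so a switch that had SV disabled (or lacks it) is left unchanged.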


* Re: [PATCH v2 0/2] PCI: Support to workaround bus level HW issues
  2018-04-30 17:25 [PATCH v2 0/2] PCI: Support to workaround bus level HW issues James Puthukattukaran
  2018-04-30 17:27 ` [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint James Puthukattukaran
  2018-04-30 17:28 ` [PATCH v2 2/2] PCI: Implement workaround for the ACS bug in the IDT, switch James Puthukattukaran
@ 2018-06-29 21:47 ` Bjorn Helgaas
  2 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2018-06-29 21:47 UTC
  To: James Puthukattukaran
  Cc: Alex Williamson, Sinan Kaya, linux-pci@vger.kernel.org

On Mon, Apr 30, 2018 at 01:25:22PM -0400, James Puthukattukaran wrote:
> There are bugs in certain PCIe switches that cause access violations when an
> endpoint device is hotplugged. In particular, there's an issue with
> certain IDT switches that trigger a ACS violation when bringing up a newly
> plugged PCIe endpoint device. This is a major issue for platforms
> designed to issue a fatal reset in the case of this event.
> 
> The first patch provides a framework for intercepting and working around
> issues with parent devices to the endpoint being brought up.
> 
> The second patch provides the actual patch for the IDT switch issue using
> that framework. The ACS feature is disabled in the IDT switch prior to endpoint
> device detection and then re-enabled subsequent to that.
> 
> James
> 
> -v2: move workaround to pci_bus_read_dev_vendor_id() from pci_bus_check_dev()
>      and move enable_acs_sv to drivers/pci/pci.c -- by Yinghai
> -v3: add bus->self check for root bus and virtual bus for sriov vfs.
> -v4: only do workaround for IDT switches
> -v5: tweak pci_std_enable_acs_sv to deal with unimplemented SV
> -v6: Added errata verbiage verbatim and resolved patch format issues
> -v7: changed int to bool for found and idt_workaround declarations. Also
>      added bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=196979
> -v8: Rewrote the patch by adding a new acs quirk to keep the workaround
>      out of the main code path
> -v9: changed function name from pci_dev_specific_fixup_acs_quirk to
>      pci_bus_acs_quirk. Also, tested for FLR and hot reset scenarios and did
>      not see issues where workaround was required. The issue seems to be
>      related only to cold reset/power on situation.
> -v10: Moved the contents of pci_bus_read_vendor_id into an internal function
>       __pci_bus_read_vendor_id
> -v11: Split the patch into two patches. The first a general quirk framework.
> -v12: Add pci_bus_generic_read_dev_vendor_id() to carry out default behavior
>       when detecting endpoint and pci_bus_specific_read_dev_vendor_id for
>       bus quirk behavior
> -v13: Fixed build errors found for non-x86 platforms via cross compiles when
>       CONFIG_QUIRKS is not defined

This email is labeled v2, but you go up to v13 here.  Obviously this must
be at least v14.

If this post is actually v13, please label the next version v14.  If this
post is different from v13, that means *this* one is really v14 and the
next one should be v15.

> James Puthukattukaran (2):
>   PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI
>     switch     specific issues prior to accessing newly added endpoint
>   PCI: Implement workaround for the ACS bug in the IDT switch
> 
>  drivers/pci/pci.h    |  13 +++++
>  drivers/pci/probe.c  |  20 +++++++-
>  drivers/pci/quirks.c | 142 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 174 insertions(+), 1 deletion(-)
> 
> -- 
> 1.8.3.1
> 


* Re: [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint
  2018-04-30 17:27 ` [PATCH v2 1/2] PCI: Add pci_bus_specific_read_dev_vendor_id() to workaround PCI switch specific issues prior to accessing newly added endpoint James Puthukattukaran
@ 2018-06-29 22:00   ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2018-06-29 22:00 UTC
  To: James Puthukattukaran
  Cc: Alex Williamson, Sinan Kaya, linux-pci@vger.kernel.org

On Mon, Apr 30, 2018 at 01:27:31PM -0400, James Puthukattukaran wrote:
> This patch provides a framework in which it would be possible to implement
> bus specific quirks prior to accessing an endpoint device beneath that bus.
> The routine, pci_bus_specific_read_dev_vendor_id, can be called prior to
> accessing the end point device itself in order to workaround potential issues
> with the parent device (switch). If there is nothing specific to be done for
> a particular switch device, it falls through to check for the endpoint device
> i.e pci_bus_generic_read_dev_vendor_id().
> 
> Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.h    | 11 +++++++++++
>  drivers/pci/probe.c  | 20 +++++++++++++++++++-
>  drivers/pci/quirks.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 023f7cf..2132a60 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -225,6 +225,17 @@ enum pci_bar_type {
>  int pci_configure_extended_tags(struct pci_dev *dev, void *ign);
>  bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
>  				int crs_timeout);
> +#ifdef CONFIG_PCI_QUIRKS
> +int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
> +				u32 *pl, int crs_timeout);
> +#else
> +static int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
> +				u32 *pl, int crs_timeout)
> +{
> +	return -ENOTTY;
> +}
> +#endif
> +
>  int pci_setup_device(struct pci_dev *dev);
>  int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
>  		    struct resource *res, unsigned int reg);
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index ac91b6f..31eba02 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2097,7 +2097,7 @@ static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 *l,
>  	return true;
>  }
>  
> -bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> +bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
>  				int timeout)
>  {
>  	if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
> @@ -2113,6 +2113,24 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
>  
>  	return true;
>  }
> +
> +
> +bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> +					int timeout)
> +{
> +	int ret;
> +
> +	/*
> + 	 * An opportunity to implement something specific for this device.
> +	 * For ex, implement a quirk prior to even accessing the device
> +	 */
> +	ret = pci_bus_specific_read_dev_vendor_id(bus, devfn, l, timeout);
> +	if (ret >= 0)
> +		return (ret >= 0);
> +
> +	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
> +}
> +
>  EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
>  
>  /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 2990ad1..2b28584 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4741,3 +4741,44 @@ static void quirk_gpu_hda(struct pci_dev *hda)
>  			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>  			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
> +
> +static const struct pci_bus_specific_quirk{
> +	u16 vendor;
> +	u16 device;
> +	int (*bus_quirk)(struct pci_bus *bus, int devfn, u32 *l, int timeout);
> +} pci_bus_specific_quirks[] = {
> +	{0}
> +};
> +
> +/*
> + * This routine provides the ability to implement a bus specific quirk
> + * prior to doing config accesses to the endpoint device itself. For ex, there
> + * could be HW problems with the switch above the endpoint that causes issues
> + * when accessing the endpoint device. Such workarounds "specific" to the
> + * parent could be implemented prior or subsequent to accesses to the
> + * endpoint itself
> + *
> + */
> +int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> +					int timeout)
> +{
> +	const struct pci_bus_specific_quirk *i;
> +	struct pci_dev *dev;
> +
> +	if (!bus || !bus->self)
> +		return -ENOTTY;
> +
> +	dev = bus->self;
> +
> +	/*
> + 	 * Implement any quirks in the "bus" (switch, for ex) that causes
> +	 * issues in accessing the endpoint
> +	 */
> +	for (i = pci_bus_specific_quirks; i->bus_quirk; i++) {
> +		if ((i->vendor == dev->vendor ||
> +		     i->vendor == (u16)PCI_ANY_ID) &&
> +		    (i->device == dev->device || i->device == (u16)PCI_ANY_ID))
> +			return(i->bus_quirk(bus, devfn, l, timeout));
> +	}
> +	return -ENOTTY;
> +}

I think all this quirk infrastructure (the pci_bus_specific_quirks[]
table, the loop to iterate through it, etc) is excessive.  In 15 years
of PCIe, we only have a single known device that's broken this way.

Just check for that device, e.g.,

  bool pci_bus_read_dev_vendor_id(...)
  {
    struct pci_dev *bridge = bus->self;

    if (bridge &&
        bridge->vendor == PCI_VENDOR_ID_IDT &&
        bridge->device == 0x80b5)
      return pci_broken_idt_read_dev_vendor_id(bridge, ...);

    ...
  }

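For comparison with the table-driven version, the direct check suggested
above can be exercised as a small standalone model. This is only a sketch
under stated assumptions: struct bridge, struct bus, the two read helpers and
the idt_path_calls counter are hypothetical stand-ins, and VENDOR_IDT is
assumed to equal PCI_VENDOR_ID_IDT (0x111d).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VENDOR_IDT    0x111d	/* assumed value of PCI_VENDOR_ID_IDT */
#define DEVICE_IDT_SW 0x80b5	/* device ID used in the posted patch */

struct bridge { uint16_t vendor, device; };
struct bus    { struct bridge *self; };	/* self == NULL: no upstream bridge */

static int idt_path_calls;	/* counts how often the workaround path ran */

/* Stand-in for the generic read: every device answers. */
static bool generic_read_vendor_id(struct bus *bus, int devfn, uint32_t *l)
{
	(void)bus;
	(void)devfn;
	*l = 0x00010001u;
	return true;
}

/* Stand-in for the IDT-specific path; the real one would toggle ACS SV. */
static bool broken_idt_read_vendor_id(struct bus *bus, int devfn, uint32_t *l)
{
	idt_path_calls++;
	return generic_read_vendor_id(bus, devfn, l);
}

static bool read_dev_vendor_id(struct bus *bus, int devfn, uint32_t *l)
{
	struct bridge *bridge;

	assert(bus && l);
	bridge = bus->self;

	if (bridge &&
	    bridge->vendor == VENDOR_IDT &&
	    bridge->device == DEVICE_IDT_SW)
		return broken_idt_read_vendor_id(bus, devfn, l);

	return generic_read_vendor_id(bus, devfn, l);
}
```

With a single affected device, the check costs two comparisons per probe and
needs neither a quirk table nor a separate return-code convention.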

* Re: [PATCH v2 2/2] PCI: Implement workaround for the ACS bug in the IDT, switch
  2018-04-30 17:28 ` [PATCH v2 2/2] PCI: Implement workaround for the ACS bug in the IDT, switch James Puthukattukaran
@ 2018-06-29 23:12   ` Bjorn Helgaas
  0 siblings, 0 replies; 6+ messages in thread
From: Bjorn Helgaas @ 2018-06-29 23:12 UTC
  To: James Puthukattukaran
  Cc: Alex Williamson, Sinan Kaya, linux-pci@vger.kernel.org

On Mon, Apr 30, 2018 at 01:28:46PM -0400, James Puthukattukaran wrote:
> The IDT switch incorrectly flags an ACS source violation on a read config
> request to an end point device on the completion (IDT 89H32H8G3-YC,
> errata #36) even though the PCI Express spec states that completions are
> never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1). Here's

s/PCI Spec 3.1, Section 6.12.1.1/PCIe r4.0, sec 6.12.1.1/

> the specific copy of the errata text
> 
> "Item #36 - Downstream port applies ACS Source Validation to Completions
> Section 6.12.1.1 of the PCI Express Base Specification 3.1 states
> that completions are never affected
> by ACS Source Validation. However, completions received by a
> downstream port of the PCIe switch from a device that has not yet
> captured a PCIe bus number are incorrectly dropped by ACS source
> validation by the switch downstream port.
> 
> Workaround: Issue a CfgWr1 to the downstream device before issuing
> the first CfgRd1 to the device.
> This allows the downstream device to capture its bus number; ACS
> source validation no longer stops
> completions from being forwarded by the downstream port. It has been
> observed that Microsoft Windows implements this workaround already;
> however, some versions of Linux and other operating systems may not. "
>
> The suggested workaround by IDT is to issue a configuration write to the
> downstream device before issuing the first config read. This allows the
> downstream device to capture its bus number, thus avoiding the ACS
> violation on the completion. In order to make sure that the device is ready
> for config accesses, we do what is currently done in making config reads
> till it succeeds and then do the config write as specified by the errata.
> However, to avoid hitting the errata issue when doing config reads, we
> disable ACS SV around this process.
> 
> The patch does the following -
> 
> 1. Disable ACS source violation if enabled.
> 2. Wait for config space access to become available by reading vendor id
> 3. Do a config write to the end point (errata workaround)
> 4. Enable ACS source validation (if it was enabled to begin with)
> 
> Signed-off-by: James Puthukattukaran <james.puthukattukaran@oracle.com>
> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  drivers/pci/pci.h    |   2 +
>  drivers/pci/quirks.c | 101 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 103 insertions(+)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 2132a60..586874d 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -228,6 +228,8 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
>  #ifdef CONFIG_PCI_QUIRKS
>  int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
>  				u32 *pl, int crs_timeout);
> +bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn,
> +				u32 *pl, int crs_timeout);
>  #else
>  static int pci_bus_specific_read_dev_vendor_id(struct pci_bus *bus, int devfn,
>  				u32 *pl, int crs_timeout)
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 2b28584..2f5c024 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4742,11 +4742,112 @@ static void quirk_gpu_hda(struct pci_dev *hda)
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>  			      PCI_CLASS_MULTIMEDIA_HD_AUDIO, 8, quirk_gpu_hda);
>  
> +/*
> + * The IDT switch incorrectly flags an ACS source violation on a read config
> + * request to an end point device on the completion (IDT 89H32H8G3-YC,
> + * errata #36) even though the PCI Express spec states that completions are
> + * never affected by ACS source violation (PCI Spec 3.1, Section 6.12.1.1).
> + * Here's * the specific copy of the errata text --
> + *
> + * "Item #36 - Downstream port applies ACS Source Validation to Completions
> + * Section 6.12.1.1 of the PCI Express Base Specification 3.1 states
> + * that completions are never affected
> + * by ACS Source Validation. However, completions received by a
> + * downstream port of the PCIe switch from a device that has not yet
> + * captured a PCIe bus number are incorrectly dropped by ACS source
> + * validation by the switch downstream port."
> + *
> + * The suggested workaround by IDT is to issue a configuration write to the
> + * downstream device before issuing the first config read. This allows the
> + * downstream device to capture its bus number, thus avoiding the ACS
> + * violation on the completion. In order to make sure that the device is ready
> + * for config accesses, we do what is currently done in making config reads
> + * till it succeeds and then do the config write as specified by the errata.
> + * However, to avoid hitting the errata issue when doing config reads, we
> + * disable ACS SV around this process.
> + */
> +static int pci_idt_acs_quirk(struct pci_bus *bus, int devfn, int enable,
> +				bool found)
> +{
> +	int pos;
> +	u16 cap;
> +	u16 ctrl;
> +	int retval;
> +	struct pci_dev *dev = bus->self;
> +
> +
> +	/*
> +	 * Write 0 to the devfn device under the PCIE switch (bus->self)
> +	 * as part of forcing the devfn number to latch with the device
> +	 * below
> +	 */
> +	if (found)
> +		pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
> +
> +
> +	/* Enable/disable ACS SV feature (based on enable flag) */
> +	pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> +	if (!pos)
> +		return -ENODEV;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CAP, &cap);
> +
> +	if (!(cap & PCI_ACS_SV))
> +		return -ENODEV;
> +
> +	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &ctrl);
> +
> +	retval = !!(ctrl & cap & PCI_ACS_SV);
> +	if (enable)
> +		ctrl |= (cap & PCI_ACS_SV);
> +	else
> +		ctrl &= ~(cap & PCI_ACS_SV);
> +
> +	pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
> +
> +	/*
> +	 * return the previous state of the ACS SV state i.e was SV enabled
> +	 * or disabled?
> +	 */
> +	return retval;
> +}
> +
> +static int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *l,
> +				int timeout)
> +{
> +	int enable;
> +	bool found;
> +
> +	/*
> +	 * Disable acs for the IDT switch before attempting the initial
> +	 * config accesses to the endpoint device.
> +	 */
> +	enable = pci_idt_acs_quirk(bus, devfn, 0, false);
> +
> +	/*
> +	 * found indicates whether the endpoint device was identified
> +	 * as present or not
> +	 */
> +
> +	found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
> +
> +	/*
> +	 * re-enable acs feature for the switch again if it was enabled to
> +	 * start with
> +	 */
> +	if (enable > 0)
> +		pci_idt_acs_quirk(bus, devfn, enable, found);
> +
> +	return found ? 1 : 0;
> +}

The different meanings of the return values are confusing, it's ugly
to have to look up the ACS cap twice, and I don't think we need to
check ACS_SV in both the capability and control registers.  Can you do
something along these lines in a single function?

  u16 ctrl = 0;

  pos = pci_find_ext_capability(bridge, PCI_EXT_CAP_ID_ACS);
  if (pos) {
    pci_read_config_word(bridge, pos + PCI_ACS_CTRL, &ctrl);
    if (ctrl & PCI_ACS_SV)
      pci_write_config_word(bridge, pos + PCI_ACS_CTRL, ctrl & ~PCI_ACS_SV);
  } 

  found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
  if (found)
    pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);

  if (ctrl & PCI_ACS_SV)
    pci_write_config_word(bridge, pos + PCI_ACS_CTRL, ctrl);
    
  return found;
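
The ordering in the single-function outline above (clear SV, read the vendor
ID, write to the device if found, restore SV) can be checked with a small
simulation. This is a sketch, not kernel code: struct sim, the op log and
idt_read_vendor_id() are hypothetical, config accesses are replaced by log
entries, and ACS_SV is assumed to match PCI_ACS_SV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ACS_SV 0x0001	/* Source Validation bit (assumed == PCI_ACS_SV) */

enum op { OP_SV_OFF, OP_CFG_READ, OP_CFG_WRITE, OP_SV_RESTORE };

/* Hypothetical model of the bridge, the endpoint, and an access log. */
struct sim {
	bool     has_acs;	/* bridge exposes an ACS capability */
	uint16_t acs_ctrl;	/* bridge ACS Control register */
	bool     dev_present;	/* an endpoint answers config reads */
	enum op  log[8];
	int      nlog;
};

static void record(struct sim *s, enum op op)
{
	if (s->nlog < 8)
		s->log[s->nlog++] = op;
}

static bool idt_read_vendor_id(struct sim *s, uint32_t *l)
{
	uint16_t ctrl = 0;
	bool found;

	assert(s && l);

	/* Clear SV only if the capability exists and SV is currently set. */
	if (s->has_acs) {
		ctrl = s->acs_ctrl;
		if (ctrl & ACS_SV) {
			s->acs_ctrl = ctrl & (uint16_t)~ACS_SV;
			record(s, OP_SV_OFF);
		}
	}

	/* Stand-in for the generic vendor ID read. */
	record(s, OP_CFG_READ);
	found = s->dev_present;
	*l = found ? 0x12345678u : 0xffffffffu;

	/* The config write lets the device capture its bus number. */
	if (found)
		record(s, OP_CFG_WRITE);

	/* ctrl stays 0 without ACS, so this restores only what was cleared. */
	if (ctrl & ACS_SV) {
		s->acs_ctrl = ctrl;
		record(s, OP_SV_RESTORE);
	}

	return found;
}
```

Because ctrl is initialized to 0 and only loaded when the capability is
present, the final test doubles as the "was it enabled to begin with" check,
which is what removes the need for a second capability lookup.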

> +
> +
>  static const struct pci_bus_specific_quirk{
>  	u16 vendor;
>  	u16 device;
>  	int (*bus_quirk)(struct pci_bus *bus, int devfn, u32 *l, int timeout);
>  } pci_bus_specific_quirks[] = {
> +	{ PCI_VENDOR_ID_IDT, 0x80b5, pci_idt_bus_quirk},
>  	{0}
>  };
>  
> -- 
> 1.8.3.1
> 

