public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
@ 2026-01-02 15:34 Manivannan Sadhasivam via B4 Relay
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-01-02 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Manivannan Sadhasivam

Hi,

This series fixes the long standing issue with ACS in OF platforms. There are
two fixes in this series, both fixing independent issues on their own, but both
are needed to properly enable ACS on OF platforms.

Issue(s) background
===================

Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
ACS not being enabled for the QEMU Root Port device and he proposed a patch to
fix it [2].

Once the patch got applied, people reported PCIe issues with linux-next on the
ARM Juno Development boards, where they saw failure in enumerating the endpoint
devices [3][4]. So soon, the patch got dropped, but the actual issue with the
ARM Juno boards was left behind.

Fast forward to 2024, Pavan resubmitted the same fix [5] for his own usecase,
hoping that someone in the community would fix the issue with ARM Juno boards.
But the patch was rightly rejected, as a patch that was known to cause issues
should not be merged to the kernel. But again, no one investigated the Juno
issue and it was left behind again.

Now it ended up in my plate and I managed to track down the issue with the help
of Naresh who got access to the Juno boards in LKFT. The Juno issue was with the
PCIe switch from Microsemi/IDT, which triggers ACS Source Validation error on
Completions received for the Configuration Read Request from a device connected
to the downstream port that has not yet captured the PCIe bus number. As per the
PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
supplied with all Type 0 Configuration Write Requests completed by the Function
and supply these numbers in the Bus and Device Number fields of the Requester ID
for all Requests". So during the first Configuration Read Request issued by the
switch downstream port during enumeration (for reading Vendor ID), Bus and
Device numbers will be unknown to the device. So it responds to the Read Request
with Completion having Bus and Device number as 0. The switch interprets the
Completion as an ACS Source Validation error and drops the completion, leading
to the failure in detecting the endpoint device. Though the PCIe spec r6.0, sec
6.12.1.1, states that "Completions are never affected by ACS Source Validation".
This behavior is in violation of the spec.

Solution
========

In September, I submitted a series [6] to fix both issues. For the IDT issue,
I reused the existing quirk in the PCI core which does a dummy config write
before issuing the first config read to the device. And for the ACS enablement
issue, I just resubmitted the original patch from Xingang which called
pci_request_acs() from devm_of_pci_bridge_init().

But during the review of the series, several comments were received and they
required the series to be reworked completely. Hence, in this version, I've
incorported the comments as below:

1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
pci_acs_init() to pci_dma_configure().

2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
and disabled the broken capability for the IDT switches in the cache. This also
allowed to get rid of the earlier workaround for the switch.

[1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
[2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
[3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
[4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
[5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
[6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com

Changes in v3:
- Dropped the 'acs_broken_cap' field and directly called the quirk from
  pci_acs_init()
- Collected tags. Since the delta between v2 and v3 is minimal, I've kept them.
- Rebased on top of v6.19-rc1 
- Link to v2: https://lore.kernel.org/r/20251202-pci_acs-v2-0-5d2759a71489@oss.qualcomm.com

Changes in v2:

* Reworked the patches completely as mentioned above.
* Rebased on top of v6.18-rc7

Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
Manivannan Sadhasivam (4):
      PCI: Enable ACS only after configuring IOMMU for OF platforms
      PCI: Cache ACS capabilities
      PCI: Disable ACS SV capability for the broken IDT switches
      PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch

 drivers/pci/pci-driver.c |  8 +++++++
 drivers/pci/pci.c        | 33 ++++++++++++--------------
 drivers/pci/pci.h        |  4 +++-
 drivers/pci/probe.c      | 12 ----------
 drivers/pci/quirks.c     | 62 ++++++++++++------------------------------------
 include/linux/pci.h      |  1 +
 6 files changed, 42 insertions(+), 78 deletions(-)
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20251201-pci_acs-b15aa3947289

Best regards,
-- 
Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
@ 2026-01-02 15:34 ` Manivannan Sadhasivam via B4 Relay
  2026-02-04 23:33   ` Bjorn Helgaas
                     ` (2 more replies)
  2026-01-02 15:34 ` [PATCH v3 2/4] PCI: Cache ACS capabilities Manivannan Sadhasivam via B4 Relay
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 22+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-01-02 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

For enabling ACS without the cmdline params, the platform drivers are
expected to call pci_request_acs() API which sets a static flag,
'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
this call stack:

-> pci_device_add()
	-> pci_init_capabilities()
		-> pci_acs_init()
			/* check for pci_acs_enable */
			-> pci_enable_acs()

For the OF platforms, pci_request_acs() is called during
of_iommu_configure() during device_add(), as per this call stack:

-> device_add()
	-> iommu_bus_notifier()
		-> iommu_probe_device()
			-> pci_dma_configure()
				-> of_dma_configure()
					-> of_iommu_configure()
						/* set pci_acs_enable */
						-> pci_request_acs()

As seen from both call stacks, pci_enable_acs() is called way before the
invocation of pci_request_acs() for the OF platforms. This means,
pci_enable_acs() will not enable ACS for the first device that gets
enumerated, which is usally the Root Port device. But since the static
flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
ACS capable devices enumerated later.

To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
only from pci_dma_configure() after calling of_dma_configure(). This makes
sure that pci_enable_acs() only gets called after the IOMMU framework has
called pci_request_acs(). The ACS enablement flow now looks like:

-> pci_device_add()
	-> pci_init_capabilities()
		/* Just store the ACS cap */
		-> pci_acs_init()
	-> device_add()
		...
		-> pci_dma_configure()
			-> of_dma_configure()
				-> pci_request_acs()
			-> pci_enable_acs()

For the ACPI platforms, pci_request_acs() is called during ACPI
initialization time itself, independent of the IOMMU framework.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/pci/pci-driver.c | 8 ++++++++
 drivers/pci/pci.c        | 8 --------
 drivers/pci/pci.h        | 1 +
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 7c2d9d596258..301a9418e38e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1650,6 +1650,14 @@ static int pci_dma_configure(struct device *dev)
 		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
 	}
 
+	/*
+	 * Attempt to enable ACS regardless of capability because some Root
+	 * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
+	 * the standard ACS capability but still support ACS via those
+	 * quirks.
+	 */
+	pci_enable_acs(to_pci_dev(dev));
+
 	pci_put_host_bridge_device(bridge);
 
 	/* @drv may not be valid when we're called from the IOMMU layer */
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 13dbb405dc31..2c3d0a2d6973 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3648,14 +3648,6 @@ bool pci_acs_path_enabled(struct pci_dev *start,
 void pci_acs_init(struct pci_dev *dev)
 {
 	dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
-
-	/*
-	 * Attempt to enable ACS regardless of capability because some Root
-	 * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
-	 * the standard ACS capability but still support ACS via those
-	 * quirks.
-	 */
-	pci_enable_acs(dev);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 0e67014aa001..4592ede0ebcc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -939,6 +939,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
 }
 
 void pci_acs_init(struct pci_dev *dev);
+void pci_enable_acs(struct pci_dev *dev);
 #ifdef CONFIG_PCI_QUIRKS
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 2/4] PCI: Cache ACS capabilities
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
@ 2026-01-02 15:34 ` Manivannan Sadhasivam via B4 Relay
  2026-01-02 15:34 ` [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches Manivannan Sadhasivam via B4 Relay
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-01-02 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

ACS capabilities are the RO values set by the hardware. Cache them to avoid
reading it all the time when required and also to override any capability
in quirks.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/pci/pci.c   | 26 +++++++++++++++-----------
 include/linux/pci.h |  1 +
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c3d0a2d6973..d89b04451aea 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -892,7 +892,6 @@ static const char *disable_acs_redir_param;
 static const char *config_acs_param;
 
 struct pci_acs {
-	u16 cap;
 	u16 ctrl;
 	u16 fw_ctrl;
 };
@@ -995,27 +994,27 @@ static void __pci_config_acs(struct pci_dev *dev, struct pci_acs *caps,
 static void pci_std_enable_acs(struct pci_dev *dev, struct pci_acs *caps)
 {
 	/* Source Validation */
-	caps->ctrl |= (caps->cap & PCI_ACS_SV);
+	caps->ctrl |= (dev->acs_capabilities & PCI_ACS_SV);
 
 	/* P2P Request Redirect */
-	caps->ctrl |= (caps->cap & PCI_ACS_RR);
+	caps->ctrl |= (dev->acs_capabilities & PCI_ACS_RR);
 
 	/* P2P Completion Redirect */
-	caps->ctrl |= (caps->cap & PCI_ACS_CR);
+	caps->ctrl |= (dev->acs_capabilities & PCI_ACS_CR);
 
 	/* Upstream Forwarding */
-	caps->ctrl |= (caps->cap & PCI_ACS_UF);
+	caps->ctrl |= (dev->acs_capabilities & PCI_ACS_UF);
 
 	/* Enable Translation Blocking for external devices and noats */
 	if (pci_ats_disabled() || dev->external_facing || dev->untrusted)
-		caps->ctrl |= (caps->cap & PCI_ACS_TB);
+		caps->ctrl |= (dev->acs_capabilities & PCI_ACS_TB);
 }
 
 /**
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-static void pci_enable_acs(struct pci_dev *dev)
+void pci_enable_acs(struct pci_dev *dev)
 {
 	struct pci_acs caps;
 	bool enable_acs = false;
@@ -1031,7 +1030,6 @@ static void pci_enable_acs(struct pci_dev *dev)
 	if (!pos)
 		return;
 
-	pci_read_config_word(dev, pos + PCI_ACS_CAP, &caps.cap);
 	pci_read_config_word(dev, pos + PCI_ACS_CTRL, &caps.ctrl);
 	caps.fw_ctrl = caps.ctrl;
 
@@ -3514,7 +3512,7 @@ void pci_configure_ari(struct pci_dev *dev)
 static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
 {
 	int pos;
-	u16 cap, ctrl;
+	u16 ctrl;
 
 	pos = pdev->acs_cap;
 	if (!pos)
@@ -3525,8 +3523,7 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
 	 * or only required if controllable.  Features missing from the
 	 * capability field can therefore be assumed as hard-wired enabled.
 	 */
-	pci_read_config_word(pdev, pos + PCI_ACS_CAP, &cap);
-	acs_flags &= (cap | PCI_ACS_EC);
+	acs_flags &= (pdev->acs_capabilities | PCI_ACS_EC);
 
 	pci_read_config_word(pdev, pos + PCI_ACS_CTRL, &ctrl);
 	return (ctrl & acs_flags) == acs_flags;
@@ -3647,7 +3644,14 @@ bool pci_acs_path_enabled(struct pci_dev *start,
  */
 void pci_acs_init(struct pci_dev *dev)
 {
+	int pos;
+
 	dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+	pos = dev->acs_cap;
+	if (!pos)
+		return;
+
+	pci_read_config_word(dev, pos + PCI_ACS_CAP, &dev->acs_capabilities);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 864775651c6f..6195e040b29c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -558,6 +558,7 @@ struct pci_dev {
 	struct pci_tsm *tsm;		/* TSM operation state */
 #endif
 	u16		acs_cap;	/* ACS Capability offset */
+	u16		acs_capabilities; /* ACS Capabilities */
 	u8		supported_speeds; /* Supported Link Speeds Vector */
 	phys_addr_t	rom;		/* Physical address if not from BAR */
 	size_t		romlen;		/* Length if not from BAR */

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
  2026-01-02 15:34 ` [PATCH v3 2/4] PCI: Cache ACS capabilities Manivannan Sadhasivam via B4 Relay
@ 2026-01-02 15:34 ` Manivannan Sadhasivam via B4 Relay
  2026-02-05 23:39   ` Bjorn Helgaas
  2026-01-02 15:34 ` [PATCH v3 4/4] PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch Manivannan Sadhasivam via B4 Relay
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-01-02 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

Some IDT switches behave erratically when ACS Source Validation is enabled.
For example, they incorrectly flag an ACS Source Validation error on
completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
says that completions are never affected by ACS Source Validation.

Even though IDT suggests working around this issue by issuing a config
write before the first config read, so that the device caches the bus and
device number. But it would still be fragile since the device could loose
the IDs after the reset and any further access may trigger ACS SV
violation.

Hence, to properly fix the issue, the respective capability needs to be
disabled. Since the ACS Capabilities are RO values, and are cached in the
'pci_dev::acs_capabilities' field, remove the cached broken capabilities by
calling pci_disable_broken_acs_cap() from pci_acs_init(). This will allow
pci_enable_acs() helper to disable the relevant ACS ctrls.

With this, the previous workaround can now be safely removed.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/pci/pci.c    |  1 +
 drivers/pci/pci.h    |  3 ++-
 drivers/pci/probe.c  | 12 -----------
 drivers/pci/quirks.c | 61 ++++++++++++----------------------------------------
 4 files changed, 17 insertions(+), 60 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d89b04451aea..e16229e7ff6f 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3652,6 +3652,7 @@ void pci_acs_init(struct pci_dev *dev)
 		return;
 
 	pci_read_config_word(dev, pos + PCI_ACS_CAP, &dev->acs_capabilities);
+	pci_disable_broken_acs_cap(dev);
 }
 
 /**
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 4592ede0ebcc..5fe5d6e84c95 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -432,7 +432,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 				int rrs_timeout);
 bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
 					int rrs_timeout);
-int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int rrs_timeout);
 
 int pci_setup_device(struct pci_dev *dev);
 void __pci_size_stdbars(struct pci_dev *dev, int count,
@@ -944,6 +943,7 @@ void pci_enable_acs(struct pci_dev *dev);
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
 int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
+void pci_disable_broken_acs_cap(struct pci_dev *pdev);
 int pcie_failed_link_retrain(struct pci_dev *dev);
 #else
 static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
@@ -959,6 +959,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
 {
 	return -ENOTTY;
 }
+static inline void pci_disable_broken_acs_cap(struct pci_dev *dev) { }
 static inline int pcie_failed_link_retrain(struct pci_dev *dev)
 {
 	return -ENOTTY;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 41183aed8f5d..c7304ac5afc2 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2547,18 +2547,6 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
 bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
 				int timeout)
 {
-#ifdef CONFIG_PCI_QUIRKS
-	struct pci_dev *bridge = bus->self;
-
-	/*
-	 * Certain IDT switches have an issue where they improperly trigger
-	 * ACS Source Validation errors on completions for config reads.
-	 */
-	if (bridge && bridge->vendor == PCI_VENDOR_ID_IDT &&
-	    bridge->device == 0x80b5)
-		return pci_idt_bus_quirk(bus, devfn, l, timeout);
-#endif
-
 	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
 }
 EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b9c252aa6fe0..1571a2ef331e 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5778,58 +5778,25 @@ DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
 			       PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
 
 /*
- * Some IDT switches incorrectly flag an ACS Source Validation error on
- * completions for config read requests even though PCIe r4.0, sec
- * 6.12.1.1, says that completions are never affected by ACS Source
- * Validation.  Here's the text of IDT 89H32H8G3-YC, erratum #36:
+ * Some IDT switches behave erratically when ACS Source Validation is enabled.
+ * For example, they incorrectly flag an ACS Source Validation error on
+ * completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
+ * says that completions are never affected by ACS Source Validation.
  *
- *   Item #36 - Downstream port applies ACS Source Validation to Completions
- *   Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
- *   completions are never affected by ACS Source Validation.  However,
- *   completions received by a downstream port of the PCIe switch from a
- *   device that has not yet captured a PCIe bus number are incorrectly
- *   dropped by ACS Source Validation by the switch downstream port.
+ * Even though IDT suggests working around this issue by issuing a config write
+ * before the first config read, so that the switch caches the bus and device
+ * number, it would still be fragile since the device could loose the IDs after
+ * the reset.
  *
- * The workaround suggested by IDT is to issue a config write to the
- * downstream device before issuing the first config read.  This allows the
- * downstream device to capture its bus and device numbers (see PCIe r4.0,
- * sec 2.2.9), thus avoiding the ACS error on the completion.
- *
- * However, we don't know when the device is ready to accept the config
- * write, so we do config reads until we receive a non-Config Request Retry
- * Status, then do the config write.
- *
- * To avoid hitting the erratum when doing the config reads, we disable ACS
- * SV around this process.
+ * Hence, a reliable fix would be to assume that these switches don't support
+ * ACS SV.
  */
-int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *l, int timeout)
+void pci_disable_broken_acs_cap(struct pci_dev *pdev)
 {
-	int pos;
-	u16 ctrl = 0;
-	bool found;
-	struct pci_dev *bridge = bus->self;
-
-	pos = bridge->acs_cap;
-
-	/* Disable ACS SV before initial config reads */
-	if (pos) {
-		pci_read_config_word(bridge, pos + PCI_ACS_CTRL, &ctrl);
-		if (ctrl & PCI_ACS_SV)
-			pci_write_config_word(bridge, pos + PCI_ACS_CTRL,
-					      ctrl & ~PCI_ACS_SV);
+	if (pdev->vendor == PCI_VENDOR_ID_IDT && pdev->device == 0x80b5) {
+		pci_info(pdev, "Disabling broken ACS SV\n");
+		pdev->acs_capabilities &= ~PCI_ACS_SV;
 	}
-
-	found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
-
-	/* Write Vendor ID (read-only) so the endpoint latches its bus/dev */
-	if (found)
-		pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
-
-	/* Re-enable ACS_SV if it was previously enabled */
-	if (ctrl & PCI_ACS_SV)
-		pci_write_config_word(bridge, pos + PCI_ACS_CTRL, ctrl);
-
-	return found;
 }
 
 /*

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v3 4/4] PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
                   ` (2 preceding siblings ...)
  2026-01-02 15:34 ` [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches Manivannan Sadhasivam via B4 Relay
@ 2026-01-02 15:34 ` Manivannan Sadhasivam via B4 Relay
  2026-01-29 17:42 ` [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam
  2026-02-06 18:32 ` Bjorn Helgaas
  5 siblings, 0 replies; 22+ messages in thread
From: Manivannan Sadhasivam via B4 Relay @ 2026-01-02 15:34 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Manivannan Sadhasivam

From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>

The IDT switch with Device ID 0x8090 used in the ARM Juno R2 development
board incorrectly raises an ACS Source Validation error on Completions for
Config Read Requests, even though PCIe r6.0, sec 6.12.1.1, says that
Completions are never affected by ACS Source Validation.

This is already handled by the pci_disable_broken_acs_cap() quirk for one
of the IDT switch 0x80b5. Hence, extend the quirk for this device too.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
---
 drivers/pci/quirks.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 1571a2ef331e..11ecb9ba1594 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5793,7 +5793,8 @@ DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
  */
 void pci_disable_broken_acs_cap(struct pci_dev *pdev)
 {
-	if (pdev->vendor == PCI_VENDOR_ID_IDT && pdev->device == 0x80b5) {
+	if (pdev->vendor == PCI_VENDOR_ID_IDT && (pdev->device == 0x80b5 ||
+	    pdev->device == 0x8090)) {
 		pci_info(pdev, "Disabling broken ACS SV\n");
 		pdev->acs_capabilities &= ~PCI_ACS_SV;
 	}

-- 
2.48.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
                   ` (3 preceding siblings ...)
  2026-01-02 15:34 ` [PATCH v3 4/4] PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch Manivannan Sadhasivam via B4 Relay
@ 2026-01-29 17:42 ` Manivannan Sadhasivam
  2026-01-29 18:06   ` Jason Gunthorpe
  2026-02-06 18:32 ` Bjorn Helgaas
  5 siblings, 1 reply; 22+ messages in thread
From: Manivannan Sadhasivam @ 2026-01-29 17:42 UTC (permalink / raw)
  To: manivannan.sadhasivam, Jason Gunthorpe, Robin Murphy
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski

On Fri, Jan 02, 2026 at 09:04:46PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> Hi,
> 
> This series fixes the long standing issue with ACS in OF platforms. There are
> two fixes in this series, both fixing independent issues on their own, but both
> are needed to properly enable ACS on OF platforms.
> 
> Issue(s) background
> ===================
> 
> Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
> device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
> ACS not being enabled for the QEMU Root Port device and he proposed a patch to
> fix it [2].
> 
> Once the patch got applied, people reported PCIe issues with linux-next on the
> ARM Juno Development boards, where they saw failure in enumerating the endpoint
> devices [3][4]. So soon, the patch got dropped, but the actual issue with the
> ARM Juno boards was left behind.
> 
> Fast forward to 2024, Pavan resubmitted the same fix [5] for his own usecase,
> hoping that someone in the community would fix the issue with ARM Juno boards.
> But the patch was rightly rejected, as a patch that was known to cause issues
> should not be merged to the kernel. But again, no one investigated the Juno
> issue and it was left behind again.
> 
> Now it ended up in my plate and I managed to track down the issue with the help
> of Naresh who got access to the Juno boards in LKFT. The Juno issue was with the
> PCIe switch from Microsemi/IDT, which triggers ACS Source Validation error on
> Completions received for the Configuration Read Request from a device connected
> to the downstream port that has not yet captured the PCIe bus number. As per the
> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
> supplied with all Type 0 Configuration Write Requests completed by the Function
> and supply these numbers in the Bus and Device Number fields of the Requester ID
> for all Requests". So during the first Configuration Read Request issued by the
> switch downstream port during enumeration (for reading Vendor ID), Bus and
> Device numbers will be unknown to the device. So it responds to the Read Request
> with Completion having Bus and Device number as 0. The switch interprets the
> Completion as an ACS Source Validation error and drops the completion, leading
> to the failure in detecting the endpoint device. Though the PCIe spec r6.0, sec
> 6.12.1.1, states that "Completions are never affected by ACS Source Validation".
> This behavior is in violation of the spec.
> 
> Solution
> ========
> 
> In September, I submitted a series [6] to fix both issues. For the IDT issue,
> I reused the existing quirk in the PCI core which does a dummy config write
> before issuing the first config read to the device. And for the ACS enablement
> issue, I just resubmitted the original patch from Xingang which called
> pci_request_acs() from devm_of_pci_bridge_init().
> 
> But during the review of the series, several comments were received and they
> required the series to be reworked completely. Hence, in this version, I've
> incorported the comments as below:
> 
> 1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
> pci_acs_init() to pci_dma_configure().
> 
> 2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
> and disabled the broken capability for the IDT switches in the cache. This also
> allowed to get rid of the earlier workaround for the switch.
> 

Jason, Robin, any thoughts on this series?

- Mani

> [1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
> [2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
> [3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
> [4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
> [5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
> [6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
> 
> Changes in v3:
> - Dropped the 'acs_broken_cap' field and directly called the quirk from
>   pci_acs_init()
> - Collected tags. Since the delta between v2 and v3 is minimal, I've kept them.
> - Rebased on top of v6.19-rc1 
> - Link to v2: https://lore.kernel.org/r/20251202-pci_acs-v2-0-5d2759a71489@oss.qualcomm.com
> 
> Changes in v2:
> 
> * Reworked the patches completely as mentioned above.
> * Rebased on top of v6.18-rc7
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (4):
>       PCI: Enable ACS only after configuring IOMMU for OF platforms
>       PCI: Cache ACS capabilities
>       PCI: Disable ACS SV capability for the broken IDT switches
>       PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch
> 
>  drivers/pci/pci-driver.c |  8 +++++++
>  drivers/pci/pci.c        | 33 ++++++++++++--------------
>  drivers/pci/pci.h        |  4 +++-
>  drivers/pci/probe.c      | 12 ----------
>  drivers/pci/quirks.c     | 62 ++++++++++++------------------------------------
>  include/linux/pci.h      |  1 +
>  6 files changed, 42 insertions(+), 78 deletions(-)
> ---
> base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
> change-id: 20251201-pci_acs-b15aa3947289
> 
> Best regards,
> -- 
> Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
> 

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
  2026-01-29 17:42 ` [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam
@ 2026-01-29 18:06   ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2026-01-29 18:06 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: manivannan.sadhasivam, Robin Murphy, Bjorn Helgaas, linux-pci,
	linux-kernel, iommu, Naresh Kamboju, Pavankumar Kondeti,
	Xingang Wang, Marek Szyprowski

On Thu, Jan 29, 2026 at 11:12:07PM +0530, Manivannan Sadhasivam wrote:
> > Manivannan Sadhasivam (4):
> >       PCI: Cache ACS capabilities
> >       PCI: Disable ACS SV capability for the broken IDT switches
> >       PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch

I'm OK with these patches

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
@ 2026-02-04 23:33   ` Bjorn Helgaas
  2026-02-06  9:18     ` Manivannan Sadhasivam
  2026-04-03 16:45   ` Regression " Jon Kohler
  2026-04-03 16:47   ` Jon Kohler
  2 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2026-02-04 23:33 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam

On Fri, Jan 02, 2026 at 09:04:47PM +0530, Manivannan Sadhasivam wrote:
> For enabling ACS without the cmdline params, the platform drivers are
> expected to call pci_request_acs() API which sets a static flag,
> 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
> in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
> this call stack:
> 
> -> pci_device_add()
> 	-> pci_init_capabilities()
> 		-> pci_acs_init()
> 			/* check for pci_acs_enable */
> 			-> pci_enable_acs()
> 
> For the OF platforms, pci_request_acs() is called during
> of_iommu_configure() during device_add(), as per this call stack:
> 
> -> device_add()
> 	-> iommu_bus_notifier()
> 		-> iommu_probe_device()
> 			-> pci_dma_configure()
> 				-> of_dma_configure()
> 					-> of_iommu_configure()
> 						/* set pci_acs_enable */
> 						-> pci_request_acs()
> 
> As seen from both call stacks, pci_enable_acs() is called way before the
> invocation of pci_request_acs() for the OF platforms. This means,
> pci_enable_acs() will not enable ACS for the first device that gets
> enumerated, which is usally the Root Port device. But since the static
> flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
> ACS capable devices enumerated later.
> 
> To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
> only from pci_dma_configure() after calling of_dma_configure(). This makes
> sure that pci_enable_acs() only gets called after the IOMMU framework has
> called pci_request_acs(). The ACS enablement flow now looks like:
> 
> -> pci_device_add()
> 	-> pci_init_capabilities()
> 		/* Just store the ACS cap */
> 		-> pci_acs_init()
> 	-> device_add()
> 		...
> 		-> pci_dma_configure()
> 			-> of_dma_configure()
> 				-> pci_request_acs()
> 			-> pci_enable_acs()
> 
> For the ACPI platforms, pci_request_acs() is called during ACPI
> initialization time itself, independent of the IOMMU framework.

I don't think cmdline params are relevant, since I don't see any that
set pci_acs_enable or call pci_request_acs().

I propose a longer but combined call chain in the commit log to show
the old sequence vs the new one.  I do notice the double call of
pci_enable_acs() (along with dev->bus->dma_configure()), which all
looks odd, but maybe it will be rationalized someday:

    PCI: Enable ACS after configuring IOMMU for OF platforms
    
    Platform, ACPI, or IOMMU drivers call pci_request_acs(), which sets
    'pci_acs_enable' to request that ACS be enabled for any devices enumerated
    in the future.
    
    OF platforms called pci_enable_acs() for the first device before
    of_iommu_configure() called pci_request_acs(), so ACS was never enabled for
    that device (typically a Root Port).
    
    Call pci_enable_acs() later, from pci_dma_configure(), after
    of_dma_configure() has had a chance to call pci_request_acs().
    
    Here's the call path, showing the move of pci_enable_acs() from
    pci_acs_init() to pci_dma_configure(), where it always happens after
    pci_request_acs():
    
        pci_device_add
          pci_init_capabilities
            pci_acs_init
     -        pci_enable_acs
     -          if (pci_acs_enable)                <-- previous test
     -            ...
          device_add
            bus_notify(BUS_NOTIFY_ADD_DEVICE)
              iommu_bus_notifier
                iommu_probe_device
                  iommu_init_device
                    dev->bus->dma_configure
                      pci_dma_configure            # pci_bus_type.dma_configure
                        of_dma_configure
                          of_iommu_configure
                            pci_request_acs
                              pci_acs_enable = 1   <-- set
     +                  pci_enable_acs
     +                    if (pci_acs_enable)      <-- new test
     +                      ...
            bus_probe_device
              device_initial_probe
                ...
                  really_probe
                    dev->bus->dma_configure
                      pci_dma_configure            # pci_bus_type.dma_configure
                        ...
                          pci_enable_acs
    
    Note that we will now call pci_enable_acs() twice for every device, first
    from the iommu_probe_device() path and again from the really_probe() path.
    Presumably that's not an issue since we also call dev->bus->dma_configure()
    twice.
    
    For the ACPI platforms, pci_request_acs() is called during ACPI
    initialization time itself, independent of the IOMMU framework.

> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
>  drivers/pci/pci-driver.c | 8 ++++++++
>  drivers/pci/pci.c        | 8 --------
>  drivers/pci/pci.h        | 1 +
>  3 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 7c2d9d596258..301a9418e38e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1650,6 +1650,14 @@ static int pci_dma_configure(struct device *dev)
>  		ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
>  	}
>  
> +	/*
> +	 * Attempt to enable ACS regardless of capability because some Root
> +	 * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> +	 * the standard ACS capability but still support ACS via those
> +	 * quirks.
> +	 */
> +	pci_enable_acs(to_pci_dev(dev));
> +
>  	pci_put_host_bridge_device(bridge);
>  
>  	/* @drv may not be valid when we're called from the IOMMU layer */
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..2c3d0a2d6973 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3648,14 +3648,6 @@ bool pci_acs_path_enabled(struct pci_dev *start,
>  void pci_acs_init(struct pci_dev *dev)
>  {
>  	dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> -
> -	/*
> -	 * Attempt to enable ACS regardless of capability because some Root
> -	 * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> -	 * the standard ACS capability but still support ACS via those
> -	 * quirks.
> -	 */
> -	pci_enable_acs(dev);
>  }
>  
>  /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..4592ede0ebcc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -939,6 +939,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
>  }
>  
>  void pci_acs_init(struct pci_dev *dev);
> +void pci_enable_acs(struct pci_dev *dev);
>  #ifdef CONFIG_PCI_QUIRKS
>  int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
>  int pci_dev_specific_enable_acs(struct pci_dev *dev);
> 
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-01-02 15:34 ` [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches Manivannan Sadhasivam via B4 Relay
@ 2026-02-05 23:39   ` Bjorn Helgaas
  2026-02-06  9:11     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2026-02-05 23:39 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, Alex Williamson,
	James Puthukattukaran

[+cc Alex, James (author of aa667c6408d2)]

On Fri, Jan 02, 2026 at 09:04:49PM +0530, Manivannan Sadhasivam wrote:
> Some IDT switches behave erratically when ACS Source Validation is enabled.
> For example, they incorrectly flag an ACS Source Validation error on
> completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
> says that completions are never affected by ACS Source Validation.
> 
> Even though IDT suggests working around this issue by issuing a config
> write before the first config read, so that the device caches the bus and
> device number. But it would still be fragile since the device could loose
> the IDs after the reset and any further access may trigger ACS SV
> violation.
> 
> Hence, to properly fix the issue, the respective capability needs to be
> disabled. Since the ACS Capabilities are RO values, and are cached in the
> 'pci_dev::acs_capabilities' field, remove the cached broken capabilities by
> calling pci_disable_broken_acs_cap() from pci_acs_init(). This will allow
> pci_enable_acs() helper to disable the relevant ACS ctrls.
> 
> With this, the previous workaround can now be safely removed.

James added the existing IDT workaround for the IDT 0x80b5 switch,
which was merged as aa667c6408d2 ("PCI: Workaround IDT switch ACS
Source Validation erratum").

I guess these last three patches:

  PCI: Cache ACS capabilities
  PCI: Disable ACS SV capability for the broken IDT switches
  PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch

are basically to fix the same issue on the IDT 0x8090 switch?

The obvious approach would be to extend the test in
pci_bus_read_dev_vendor_id() to check for 0x8090 in addition to
0x80b5, which might be worth doing first, before changing the whole
approach.

I guess your point here is that the existing workaround isn't enough
even for 0x80b5 because it doesn't handle the reset case?  The patch
changelogs after v9 of the original patch (see below) do mention that
FLR and hot reset were tested and didn't seem to require the
workaround.  But you've probably seen problems after reset?

I think it's worth a note about why the reset case can't be fixed
similarly to the enumeration case.  

This patch gives up on ACS SV completely for this switch instead of
the current approach of:

  - disabling ACS SV
  - doing a config write
  - reading dev/vendor ID
  - re-enabling ACS SV

It'd be worth expanding on this and what the effect of avoiding ACS SV
is.  Does this change which devices can be safely passed through to
virtual guests?  Does it give up isolation that users expect?

I also couldn't remember why we can't just do a config write before
reading the vendor/dev ID of a device below the switch, which is
addressed in the discussion of v3 (below).  Basically if the device
isn't yet ready to accept config accesses, it may not latch the bus
number, and we can't really tell when it *is* ready.

I collected up all the history I could find about the original change:

v3  2017-06-09 https://lore.kernel.org/all/20170609231617.20243-1-yinghai@kernel.org/
    doing config write is hard because we don't know when the device
    is Configuration-Ready and can capture the bus number:
    https://lore.kernel.org/all/20170613221451.GC7012@bhelgaas-glaptop.roam.corp.google.com/
v4  2017-06-27 https://lore.kernel.org/all/5952D144.8060609@oracle.com/
    only do workaround for IDT switches
v5  2017-07-06 https://lore.kernel.org/all/595E610A.5000900@oracle.com/
    skip devices that don't advertise ACS SV support
v7  2017-09-18 https://lore.kernel.org/all/59C02719.5050103@oracle.com/
    add https://bugzilla.kernel.org/show_bug.cgi?id=196979 with
    details about how Windows deals with this
v8  2018-03-06 https://lore.kernel.org/all/ce8e9a3e-474d-961c-eb0a-c021077f2dca@oracle.com/
v9  ? (couldn't find this)
    tested FLR and hot reset scenarios; didn't see issues where
    workaround was required
v11 2018-04-11 https://lore.kernel.org/all/e6a55432-8c56-9c99-ce99-4ae80a0ed3ba@oracle.com/ (no version label)
    2018-04-19 https://lore.kernel.org/all/b6439e86-2284-5b3c-3656-a0c3c2568317@oracle.com/ (no version label)
v12 2018-04-24 https://lore.kernel.org/all/7e9f5905-34c5-f24c-8b13-df36b8bf3621@oracle.com/
v13 2018-04-30 https://lore.kernel.org/all/4bd28931-8e8d-6bfe-a98c-49f8e2778c6e@oracle.com/ (labeled v2)
v14 2018-07-09 https://lore.kernel.org/all/4117c119-c754-bf4e-544c-19cb6e239f38@oracle.com/

> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
>  drivers/pci/pci.c    |  1 +
>  drivers/pci/pci.h    |  3 ++-
>  drivers/pci/probe.c  | 12 -----------
>  drivers/pci/quirks.c | 61 ++++++++++++----------------------------------------
>  4 files changed, 17 insertions(+), 60 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index d89b04451aea..e16229e7ff6f 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3652,6 +3652,7 @@ void pci_acs_init(struct pci_dev *dev)
>  		return;
>  
>  	pci_read_config_word(dev, pos + PCI_ACS_CAP, &dev->acs_capabilities);
> +	pci_disable_broken_acs_cap(dev);
>  }
>  
>  /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 4592ede0ebcc..5fe5d6e84c95 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -432,7 +432,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
>  				int rrs_timeout);
>  bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
>  					int rrs_timeout);
> -int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int rrs_timeout);
>  
>  int pci_setup_device(struct pci_dev *dev);
>  void __pci_size_stdbars(struct pci_dev *dev, int count,
> @@ -944,6 +943,7 @@ void pci_enable_acs(struct pci_dev *dev);
>  int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
>  int pci_dev_specific_enable_acs(struct pci_dev *dev);
>  int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
> +void pci_disable_broken_acs_cap(struct pci_dev *pdev);
>  int pcie_failed_link_retrain(struct pci_dev *dev);
>  #else
>  static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
> @@ -959,6 +959,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
>  {
>  	return -ENOTTY;
>  }
> +static inline void pci_disable_broken_acs_cap(struct pci_dev *dev) { }
>  static inline int pcie_failed_link_retrain(struct pci_dev *dev)
>  {
>  	return -ENOTTY;
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 41183aed8f5d..c7304ac5afc2 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2547,18 +2547,6 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
>  bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
>  				int timeout)
>  {
> -#ifdef CONFIG_PCI_QUIRKS
> -	struct pci_dev *bridge = bus->self;
> -
> -	/*
> -	 * Certain IDT switches have an issue where they improperly trigger
> -	 * ACS Source Validation errors on completions for config reads.
> -	 */
> -	if (bridge && bridge->vendor == PCI_VENDOR_ID_IDT &&
> -	    bridge->device == 0x80b5)
> -		return pci_idt_bus_quirk(bus, devfn, l, timeout);
> -#endif
> -
>  	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
>  }
>  EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b9c252aa6fe0..1571a2ef331e 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5778,58 +5778,25 @@ DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
>  			       PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
>  
>  /*
> - * Some IDT switches incorrectly flag an ACS Source Validation error on
> - * completions for config read requests even though PCIe r4.0, sec
> - * 6.12.1.1, says that completions are never affected by ACS Source
> - * Validation.  Here's the text of IDT 89H32H8G3-YC, erratum #36:
> + * Some IDT switches behave erratically when ACS Source Validation is enabled.
> + * For example, they incorrectly flag an ACS Source Validation error on
> + * completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
> + * says that completions are never affected by ACS Source Validation.
>   *
> - *   Item #36 - Downstream port applies ACS Source Validation to Completions
> - *   Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
> - *   completions are never affected by ACS Source Validation.  However,
> - *   completions received by a downstream port of the PCIe switch from a
> - *   device that has not yet captured a PCIe bus number are incorrectly
> - *   dropped by ACS Source Validation by the switch downstream port.
> + * Even though IDT suggests working around this issue by issuing a config write
> + * before the first config read, so that the switch caches the bus and device
> + * number, it would still be fragile since the device could loose the IDs after
> + * the reset.
>   *
> - * The workaround suggested by IDT is to issue a config write to the
> - * downstream device before issuing the first config read.  This allows the
> - * downstream device to capture its bus and device numbers (see PCIe r4.0,
> - * sec 2.2.9), thus avoiding the ACS error on the completion.
> - *
> - * However, we don't know when the device is ready to accept the config
> - * write, so we do config reads until we receive a non-Config Request Retry
> - * Status, then do the config write.
> - *
> - * To avoid hitting the erratum when doing the config reads, we disable ACS
> - * SV around this process.
> + * Hence, a reliable fix would be to assume that these switches don't support
> + * ACS SV.
>   */
> -int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *l, int timeout)
> +void pci_disable_broken_acs_cap(struct pci_dev *pdev)
>  {
> -	int pos;
> -	u16 ctrl = 0;
> -	bool found;
> -	struct pci_dev *bridge = bus->self;
> -
> -	pos = bridge->acs_cap;
> -
> -	/* Disable ACS SV before initial config reads */
> -	if (pos) {
> -		pci_read_config_word(bridge, pos + PCI_ACS_CTRL, &ctrl);
> -		if (ctrl & PCI_ACS_SV)
> -			pci_write_config_word(bridge, pos + PCI_ACS_CTRL,
> -					      ctrl & ~PCI_ACS_SV);
> +	if (pdev->vendor == PCI_VENDOR_ID_IDT && pdev->device == 0x80b5) {
> +		pci_info(pdev, "Disabling broken ACS SV\n");
> +		pdev->acs_capabilities &= ~PCI_ACS_SV;
>  	}
> -
> -	found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
> -
> -	/* Write Vendor ID (read-only) so the endpoint latches its bus/dev */
> -	if (found)
> -		pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
> -
> -	/* Re-enable ACS_SV if it was previously enabled */
> -	if (ctrl & PCI_ACS_SV)
> -		pci_write_config_word(bridge, pos + PCI_ACS_CTRL, ctrl);
> -
> -	return found;
>  }
>  
>  /*
> 
> -- 
> 2.48.1
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-05 23:39   ` Bjorn Helgaas
@ 2026-02-06  9:11     ` Manivannan Sadhasivam
  2026-02-06 14:30       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-06  9:11 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Bjorn Helgaas, linux-pci, linux-kernel,
	iommu, Naresh Kamboju, Pavankumar Kondeti, Xingang Wang,
	Marek Szyprowski, Robin Murphy, Jason Gunthorpe, Alex Williamson,
	James Puthukattukaran

On Thu, Feb 05, 2026 at 05:39:20PM -0600, Bjorn Helgaas wrote:
> [+cc Alex, James (author of aa667c6408d2)]
> 
> On Fri, Jan 02, 2026 at 09:04:49PM +0530, Manivannan Sadhasivam wrote:
> > Some IDT switches behave erratically when ACS Source Validation is enabled.
> > For example, they incorrectly flag an ACS Source Validation error on
> > completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
> > says that completions are never affected by ACS Source Validation.
> > 
> > Even though IDT suggests working around this issue by issuing a config
> > write before the first config read, so that the device caches the bus and
> > device number. But it would still be fragile since the device could loose
> > the IDs after the reset and any further access may trigger ACS SV
> > violation.
> > 
> > Hence, to properly fix the issue, the respective capability needs to be
> > disabled. Since the ACS Capabilities are RO values, and are cached in the
> > 'pci_dev::acs_capabilities' field, remove the cached broken capabilities by
> > calling pci_disable_broken_acs_cap() from pci_acs_init(). This will allow
> > pci_enable_acs() helper to disable the relevant ACS ctrls.
> > 
> > With this, the previous workaround can now be safely removed.
> 
> James added the existing IDT workaround for the IDT 0x80b5 switch,
> which was merged as aa667c6408d2 ("PCI: Workaround IDT switch ACS
> Source Validation erratum").
> 
> I guess these last three patches:
> 
>   PCI: Cache ACS capabilities
>   PCI: Disable ACS SV capability for the broken IDT switches
>   PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch
> 
> are basically to fix the same issue on the IDT 0x8090 switch?
> 

Yes!

> The obvious approach would be to extend the test in
> pci_bus_read_dev_vendor_id() to check for 0x8090 in addition to
> 0x80b5, which might be worth doing first, before changing the whole
> approach.
> 

I've mentioned the rationale for not using this approach in the cover letter.
But I'll mention it here also. I did try extending the existing quirk in the
first version of this series [1], but Jason proposed to disable ACS completely
for these switches as doing a dummy config write before reading the vendor ID
would not help if the switch gets reset during runtime. The cached bus number in
the device would get lost, triggering ACS SV again.

> I guess your point here is that the existing workaround isn't enough
> even for 0x80b5 because it doesn't handle the reset case?  The patch
> changelogs after v9 of the original patch (see below) do mention that
> FLR and hot reset were tested and didn't seem to require the
> workaround.  But you've probably seen problems after reset?
> 

I haven't seen the problem myself, but Jason's point was that there is no
spec mandate for the switch to retain its bus number after FLR/hot reset [2]

> I think it's worth a note about why the reset case can't be fixed
> similarly to the enumeration case.  
> 

If the device looses its cached bus number, then further config reads from the
host would trigger ACS SV. And we do not want to do a dummy write from all
possible locations in PCI core.

> This patch gives up on ACS SV completely for this switch instead of
> the current approach of:
> 
>   - disabling ACS SV
>   - doing a config write
>   - reading dev/vendor ID
>   - re-enabling ACS SV
> 
> It'd be worth expanding on this and what the effect of avoiding ACS SV
> is.  Does this change which devices can be safely passed through to
> virtual guests?  Does it give up isolation that users expect?
> 

IMO, ACS SV is somewhat broken on this switch. But we can still passthrough the
downstream devices to the guests. There won't be ACS SV apparently, but that's
what users will get with broken hw.

> I also couldn't remember why we can't just do a config write before
> reading the vendor/dev ID of a device below the switch, which is
> addressed in the discussion of v3 (below).  Basically if the device
> isn't yet ready to accept config accesses, it may not latch the bus
> number, and we can't really tell when it *is* ready.
> 

I think that is a separate discussion (detecting device readiness). But doing a
dummy config write to woraround the hardware bug looks fragile to me too.

- Mani

[1] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com/
[2] https://lore.kernel.org/linux-pci/20250924125555.GJ2547959@ziepe.ca/

> I collected up all the history I could find about the original change:
> 
> v3  2017-06-09 https://lore.kernel.org/all/20170609231617.20243-1-yinghai@kernel.org/
>     doing config write is hard because we don't know when the device
>     is Configuration-Ready and can capture the bus number:
>     https://lore.kernel.org/all/20170613221451.GC7012@bhelgaas-glaptop.roam.corp.google.com/
> v4  2017-06-27 https://lore.kernel.org/all/5952D144.8060609@oracle.com/
>     only do workaround for IDT switches
> v5  2017-07-06 https://lore.kernel.org/all/595E610A.5000900@oracle.com/
>     skip devices that don't advertise ACS SV support
> v7  2017-09-18 https://lore.kernel.org/all/59C02719.5050103@oracle.com/
>     add https://bugzilla.kernel.org/show_bug.cgi?id=196979 with
>     details about how Windows deals with this
> v8  2018-03-06 https://lore.kernel.org/all/ce8e9a3e-474d-961c-eb0a-c021077f2dca@oracle.com/
> v9  ? (couldn't find this)
>     tested FLR and hot reset scenarios; didn't see issues where
>     workaround was required
> v11 2018-04-11 https://lore.kernel.org/all/e6a55432-8c56-9c99-ce99-4ae80a0ed3ba@oracle.com/ (no version label)
>     2018-04-19 https://lore.kernel.org/all/b6439e86-2284-5b3c-3656-a0c3c2568317@oracle.com/ (no version label)
> v12 2018-04-24 https://lore.kernel.org/all/7e9f5905-34c5-f24c-8b13-df36b8bf3621@oracle.com/
> v13 2018-04-30 https://lore.kernel.org/all/4bd28931-8e8d-6bfe-a98c-49f8e2778c6e@oracle.com/ (labeled v2)
> v14 2018-07-09 https://lore.kernel.org/all/4117c119-c754-bf4e-544c-19cb6e239f38@oracle.com/
> 
> > Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> > Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> > ---
> >  drivers/pci/pci.c    |  1 +
> >  drivers/pci/pci.h    |  3 ++-
> >  drivers/pci/probe.c  | 12 -----------
> >  drivers/pci/quirks.c | 61 ++++++++++++----------------------------------------
> >  4 files changed, 17 insertions(+), 60 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index d89b04451aea..e16229e7ff6f 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3652,6 +3652,7 @@ void pci_acs_init(struct pci_dev *dev)
> >  		return;
> >  
> >  	pci_read_config_word(dev, pos + PCI_ACS_CAP, &dev->acs_capabilities);
> > +	pci_disable_broken_acs_cap(dev);
> >  }
> >  
> >  /**
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 4592ede0ebcc..5fe5d6e84c95 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -432,7 +432,6 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
> >  				int rrs_timeout);
> >  bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *pl,
> >  					int rrs_timeout);
> > -int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *pl, int rrs_timeout);
> >  
> >  int pci_setup_device(struct pci_dev *dev);
> >  void __pci_size_stdbars(struct pci_dev *dev, int count,
> > @@ -944,6 +943,7 @@ void pci_enable_acs(struct pci_dev *dev);
> >  int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
> >  int pci_dev_specific_enable_acs(struct pci_dev *dev);
> >  int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
> > +void pci_disable_broken_acs_cap(struct pci_dev *pdev);
> >  int pcie_failed_link_retrain(struct pci_dev *dev);
> >  #else
> >  static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
> > @@ -959,6 +959,7 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
> >  {
> >  	return -ENOTTY;
> >  }
> > +static inline void pci_disable_broken_acs_cap(struct pci_dev *dev) { }
> >  static inline int pcie_failed_link_retrain(struct pci_dev *dev)
> >  {
> >  	return -ENOTTY;
> > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> > index 41183aed8f5d..c7304ac5afc2 100644
> > --- a/drivers/pci/probe.c
> > +++ b/drivers/pci/probe.c
> > @@ -2547,18 +2547,6 @@ bool pci_bus_generic_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> >  bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> >  				int timeout)
> >  {
> > -#ifdef CONFIG_PCI_QUIRKS
> > -	struct pci_dev *bridge = bus->self;
> > -
> > -	/*
> > -	 * Certain IDT switches have an issue where they improperly trigger
> > -	 * ACS Source Validation errors on completions for config reads.
> > -	 */
> > -	if (bridge && bridge->vendor == PCI_VENDOR_ID_IDT &&
> > -	    bridge->device == 0x80b5)
> > -		return pci_idt_bus_quirk(bus, devfn, l, timeout);
> > -#endif
> > -
> >  	return pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
> >  }
> >  EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index b9c252aa6fe0..1571a2ef331e 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5778,58 +5778,25 @@ DECLARE_PCI_FIXUP_CLASS_RESUME_EARLY(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
> >  			       PCI_BASE_CLASS_DISPLAY, 16, quirk_nvidia_hda);
> >  
> >  /*
> > - * Some IDT switches incorrectly flag an ACS Source Validation error on
> > - * completions for config read requests even though PCIe r4.0, sec
> > - * 6.12.1.1, says that completions are never affected by ACS Source
> > - * Validation.  Here's the text of IDT 89H32H8G3-YC, erratum #36:
> > + * Some IDT switches behave erratically when ACS Source Validation is enabled.
> > + * For example, they incorrectly flag an ACS Source Validation error on
> > + * completions for config read requests even though PCIe r4.0, sec 6.12.1.1,
> > + * says that completions are never affected by ACS Source Validation.
> >   *
> > - *   Item #36 - Downstream port applies ACS Source Validation to Completions
> > - *   Section 6.12.1.1 of the PCI Express Base Specification 3.1 states that
> > - *   completions are never affected by ACS Source Validation.  However,
> > - *   completions received by a downstream port of the PCIe switch from a
> > - *   device that has not yet captured a PCIe bus number are incorrectly
> > - *   dropped by ACS Source Validation by the switch downstream port.
> > + * Even though IDT suggests working around this issue by issuing a config write
> > + * before the first config read, so that the switch caches the bus and device
> > + * number, it would still be fragile since the device could loose the IDs after
> > + * the reset.
> >   *
> > - * The workaround suggested by IDT is to issue a config write to the
> > - * downstream device before issuing the first config read.  This allows the
> > - * downstream device to capture its bus and device numbers (see PCIe r4.0,
> > - * sec 2.2.9), thus avoiding the ACS error on the completion.
> > - *
> > - * However, we don't know when the device is ready to accept the config
> > - * write, so we do config reads until we receive a non-Config Request Retry
> > - * Status, then do the config write.
> > - *
> > - * To avoid hitting the erratum when doing the config reads, we disable ACS
> > - * SV around this process.
> > + * Hence, a reliable fix would be to assume that these switches don't support
> > + * ACS SV.
> >   */
> > -int pci_idt_bus_quirk(struct pci_bus *bus, int devfn, u32 *l, int timeout)
> > +void pci_disable_broken_acs_cap(struct pci_dev *pdev)
> >  {
> > -	int pos;
> > -	u16 ctrl = 0;
> > -	bool found;
> > -	struct pci_dev *bridge = bus->self;
> > -
> > -	pos = bridge->acs_cap;
> > -
> > -	/* Disable ACS SV before initial config reads */
> > -	if (pos) {
> > -		pci_read_config_word(bridge, pos + PCI_ACS_CTRL, &ctrl);
> > -		if (ctrl & PCI_ACS_SV)
> > -			pci_write_config_word(bridge, pos + PCI_ACS_CTRL,
> > -					      ctrl & ~PCI_ACS_SV);
> > +	if (pdev->vendor == PCI_VENDOR_ID_IDT && pdev->device == 0x80b5) {
> > +		pci_info(pdev, "Disabling broken ACS SV\n");
> > +		pdev->acs_capabilities &= ~PCI_ACS_SV;
> >  	}
> > -
> > -	found = pci_bus_generic_read_dev_vendor_id(bus, devfn, l, timeout);
> > -
> > -	/* Write Vendor ID (read-only) so the endpoint latches its bus/dev */
> > -	if (found)
> > -		pci_bus_write_config_word(bus, devfn, PCI_VENDOR_ID, 0);
> > -
> > -	/* Re-enable ACS_SV if it was previously enabled */
> > -	if (ctrl & PCI_ACS_SV)
> > -		pci_write_config_word(bridge, pos + PCI_ACS_CTRL, ctrl);
> > -
> > -	return found;
> >  }
> >  
> >  /*
> > 
> > -- 
> > 2.48.1
> > 

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-02-04 23:33   ` Bjorn Helgaas
@ 2026-02-06  9:18     ` Manivannan Sadhasivam
  2026-02-06 15:37       ` Robin Murphy
  0 siblings, 1 reply; 22+ messages in thread
From: Manivannan Sadhasivam @ 2026-02-06  9:18 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Bjorn Helgaas, linux-pci, linux-kernel,
	iommu, Naresh Kamboju, Pavankumar Kondeti, Xingang Wang,
	Marek Szyprowski, Robin Murphy, Jason Gunthorpe

On Wed, Feb 04, 2026 at 05:33:30PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 02, 2026 at 09:04:47PM +0530, Manivannan Sadhasivam wrote:
> > For enabling ACS without the cmdline params, the platform drivers are
> > expected to call pci_request_acs() API which sets a static flag,
> > 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
> > in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
> > this call stack:
> > 
> > -> pci_device_add()
> > 	-> pci_init_capabilities()
> > 		-> pci_acs_init()
> > 			/* check for pci_acs_enable */
> > 			-> pci_enable_acs()
> > 
> > For the OF platforms, pci_request_acs() is called during
> > of_iommu_configure() during device_add(), as per this call stack:
> > 
> > -> device_add()
> > 	-> iommu_bus_notifier()
> > 		-> iommu_probe_device()
> > 			-> pci_dma_configure()
> > 				-> of_dma_configure()
> > 					-> of_iommu_configure()
> > 						/* set pci_acs_enable */
> > 						-> pci_request_acs()
> > 
> > As seen from both call stacks, pci_enable_acs() is called way before the
> > invocation of pci_request_acs() for the OF platforms. This means,
> > pci_enable_acs() will not enable ACS for the first device that gets
> > enumerated, which is usally the Root Port device. But since the static
> > flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
> > ACS capable devices enumerated later.
> > 
> > To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
> > only from pci_dma_configure() after calling of_dma_configure(). This makes
> > sure that pci_enable_acs() only gets called after the IOMMU framework has
> > called pci_request_acs(). The ACS enablement flow now looks like:
> > 
> > -> pci_device_add()
> > 	-> pci_init_capabilities()
> > 		/* Just store the ACS cap */
> > 		-> pci_acs_init()
> > 	-> device_add()
> > 		...
> > 		-> pci_dma_configure()
> > 			-> of_dma_configure()
> > 				-> pci_request_acs()
> > 			-> pci_enable_acs()
> > 
> > For the ACPI platforms, pci_request_acs() is called during ACPI
> > initialization time itself, independent of the IOMMU framework.
> 
> I don't think cmdline params are relevant, since I don't see any that
> set pci_acs_enable or call pci_request_acs().
> 
> I propose a longer but combined call chain in the commit log to show
> the old sequence vs the new one.  I do notice the double call of
> pci_enable_acs() (along with dev->bus->dma_configure()), which all
> looks odd, but maybe it will be rationalized someday:
> 

Yeah, there is no issue with calling pci_enable_acs() twice as of now, but I
don't have a better solution either.

>     PCI: Enable ACS after configuring IOMMU for OF platforms
>     
>     Platform, ACPI, or IOMMU drivers call pci_request_acs(), which sets
>     'pci_acs_enable' to request that ACS be enabled for any devices enumerated
>     in the future.
>     
>     OF platforms called pci_enable_acs() for the first device before
>     of_iommu_configure() called pci_request_acs(), so ACS was never enabled for
>     that device (typically a Root Port).
>     
>     Call pci_enable_acs() later, from pci_dma_configure(), after
>     of_dma_configure() has had a chance to call pci_request_acs().
>     
>     Here's the call path, showing the move of pci_enable_acs() from
>     pci_acs_init() to pci_dma_configure(), where it always happens after
>     pci_request_acs():
>     
>         pci_device_add
>           pci_init_capabilities
>             pci_acs_init
>      -        pci_enable_acs
>      -          if (pci_acs_enable)                <-- previous test
>      -            ...
>           device_add
>             bus_notify(BUS_NOTIFY_ADD_DEVICE)
>               iommu_bus_notifier
>                 iommu_probe_device
>                   iommu_init_device
>                     dev->bus->dma_configure
>                       pci_dma_configure            # pci_bus_type.dma_configure
>                         of_dma_configure
>                           of_iommu_configure
>                             pci_request_acs
>                               pci_acs_enable = 1   <-- set
>      +                  pci_enable_acs
>      +                    if (pci_acs_enable)      <-- new test
>      +                      ...
>             bus_probe_device
>               device_initial_probe
>                 ...
>                   really_probe
>                     dev->bus->dma_configure
>                       pci_dma_configure            # pci_bus_type.dma_configure
>                         ...
>                           pci_enable_acs
>     
>     Note that we will now call pci_enable_acs() twice for every device, first
>     from the iommu_probe_device() path and again from the really_probe() path.
>     Presumably that's not an issue since we also call dev->bus->dma_configure()
>     twice.
>     
>     For the ACPI platforms, pci_request_acs() is called during ACPI
>     initialization time itself, independent of the IOMMU framework.
> 

Thanks! Let me know if I need to respin or you want to ammend it while applying.
(It is not clear to me what is your plan here).

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06  9:11     ` Manivannan Sadhasivam
@ 2026-02-06 14:30       ` Jason Gunthorpe
  2026-02-06 14:43         ` Robin Murphy
  2026-02-06 14:46         ` Bjorn Helgaas
  0 siblings, 2 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2026-02-06 14:30 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Manivannan Sadhasivam, Bjorn Helgaas, linux-pci,
	linux-kernel, iommu, Naresh Kamboju, Pavankumar Kondeti,
	Xingang Wang, Marek Szyprowski, Robin Murphy, Alex Williamson,
	James Puthukattukaran

On Fri, Feb 06, 2026 at 02:41:36PM +0530, Manivannan Sadhasivam wrote:
> > It'd be worth expanding on this and what the effect of avoiding ACS SV
> > is.  Does this change which devices can be safely passed through to
> > virtual guests?  Does it give up isolation that users expect?
> > 
> 
> IMO, ACS SV is somewhat broken on this switch. But we can still passthrough the
> downstream devices to the guests. There won't be ACS SV apparently, but that's
> what users will get with broken hw.

I agree with this, the HW is very broken, let's have it at least work
properly in Linux on bare metal out of the box.

If someone really insists they need virtualization with narrow groups
on this HW then they need to come with a more complete fix. Using VFIO
is going to open up the reset flows that are problematic with the
current solution, so it isn't like that is already working fully.

Somehow I suspect nobody would use this switch for virtualization :)

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06 14:30       ` Jason Gunthorpe
@ 2026-02-06 14:43         ` Robin Murphy
  2026-02-06 14:46         ` Bjorn Helgaas
  1 sibling, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2026-02-06 14:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Manivannan Sadhasivam
  Cc: Bjorn Helgaas, Manivannan Sadhasivam, Bjorn Helgaas, linux-pci,
	linux-kernel, iommu, Naresh Kamboju, Pavankumar Kondeti,
	Xingang Wang, Marek Szyprowski, Alex Williamson,
	James Puthukattukaran

On 2026-02-06 2:30 pm, Jason Gunthorpe wrote:
> On Fri, Feb 06, 2026 at 02:41:36PM +0530, Manivannan Sadhasivam wrote:
>>> It'd be worth expanding on this and what the effect of avoiding ACS SV
>>> is.  Does this change which devices can be safely passed through to
>>> virtual guests?  Does it give up isolation that users expect?
>>>
>>
>> IMO, ACS SV is somewhat broken on this switch. But we can still passthrough the
>> downstream devices to the guests. There won't be ACS SV apparently, but that's
>> what users will get with broken hw.
> 
> I agree with this, the HW is very broken, let's have it at least work
> properly in Linux on bare metal out of the box.
> 
> If someone really insists they need virtualization with narrow groups
> on this HW then they need to come with a more complete fix. Using VFIO
> is going to open up the reset flows that are problematic with the
> current solution, so it isn't like that is already working fully.
> 
> Somehow I suspect nobody would use this switch for virtualization :)

And it's fine on Juno, since due to the (lack of) SMMU StreamIDs you can 
only assign the entire root complex as a single iommu_group anyway :D

Cheers,
Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06 14:30       ` Jason Gunthorpe
  2026-02-06 14:43         ` Robin Murphy
@ 2026-02-06 14:46         ` Bjorn Helgaas
  2026-02-06 14:52           ` Jason Gunthorpe
  1 sibling, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2026-02-06 14:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Manivannan Sadhasivam, Manivannan Sadhasivam, Bjorn Helgaas,
	linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Alex Williamson, James Puthukattukaran

On Fri, Feb 06, 2026 at 10:30:14AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 06, 2026 at 02:41:36PM +0530, Manivannan Sadhasivam wrote:
> > > It'd be worth expanding on this and what the effect of avoiding
> > > ACS SV is.  Does this change which devices can be safely passed
> > > through to virtual guests?  Does it give up isolation that users
> > > expect?
> > 
> > IMO, ACS SV is somewhat broken on this switch. But we can still
> > passthrough the downstream devices to the guests. There won't be
> > ACS SV apparently, but that's what users will get with broken hw.
> 
> I agree with this, the HW is very broken, let's have it at least
> work properly in Linux on bare metal out of the box.

I'm assuming the bare metal part could be done by something like this:

  @@ -2555,7 +2555,7 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
	   * ACS Source Validation errors on completions for config reads.
	   */
	  if (bridge && bridge->vendor == PCI_VENDOR_ID_IDT &&
  -	    bridge->device == 0x80b5)
  +	    (bridge->device == 0x80b5 || bridge->device == 0x8090)
		  return pci_idt_bus_quirk(bus, devfn, l, timeout);

> If someone really insists they need virtualization with narrow
> groups on this HW then they need to come with a more complete fix.
> Using VFIO is going to open up the reset flows that are problematic
> with the current solution, so it isn't like that is already working
> fully.

IIUC the current situation is that for these IDT switches, ACS SV is
enabled when downstream devices are passed through to guests, but
after these patches, it will no longer be enabled.

So my question is whether users are giving up some isolation.  If so,
should we even allow devices to be passed through to guests?  If we do
allow that, do users have any indication that they're not getting what
they expect?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06 14:46         ` Bjorn Helgaas
@ 2026-02-06 14:52           ` Jason Gunthorpe
  2026-02-06 15:05             ` Bjorn Helgaas
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2026-02-06 14:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Manivannan Sadhasivam, Bjorn Helgaas,
	linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Alex Williamson, James Puthukattukaran

On Fri, Feb 06, 2026 at 08:46:51AM -0600, Bjorn Helgaas wrote:

> IIUC the current situation is that for these IDT switches, ACS SV is
> enabled when downstream devices are passed through to guests, but
> after these patches, it will no longer be enabled.

ACS SV is enabled at boot time if an IOMMU driver is present
regardless if guests or virtualization is in use.

Linux doesn't change ACS flags dynamically.

> So my question is whether users are giving up some isolation.  If so,
> should we even allow devices to be passed through to guests?  If we do
> allow that, do users have any indication that they're not getting what
> they expect?

iommu_groups will correctly describe the system limitations with the
ACS quirk path and so all of the above concerns are taken care
of. Robin is saying the Juno SMMU forces a large iommu_group covering
the switch anyhow today, so at least that platform is not affected.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06 14:52           ` Jason Gunthorpe
@ 2026-02-06 15:05             ` Bjorn Helgaas
  2026-02-06 15:07               ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Bjorn Helgaas @ 2026-02-06 15:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Manivannan Sadhasivam, Manivannan Sadhasivam, Bjorn Helgaas,
	linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Alex Williamson, James Puthukattukaran

On Fri, Feb 06, 2026 at 10:52:54AM -0400, Jason Gunthorpe wrote:
> On Fri, Feb 06, 2026 at 08:46:51AM -0600, Bjorn Helgaas wrote:
> 
> > IIUC the current situation is that for these IDT switches, ACS SV is
> > enabled when downstream devices are passed through to guests, but
> > after these patches, it will no longer be enabled.
> 
> ACS SV is enabled at boot time if an IOMMU driver is present
> regardless if guests or virtualization is in use.
> 
> Linux doesn't change ACS flags dynamically.

Right, it's just that this series effectively un-advertises ACS SV for
the IDE switches so it will never be enabled for them, whereas today,
I think we *do* enable ACS SV for them (but temporarily disable it
during enumeration).

> > So my question is whether users are giving up some isolation.  If so,
> > should we even allow devices to be passed through to guests?  If we do
> > allow that, do users have any indication that they're not getting what
> > they expect?
> 
> iommu_groups will correctly describe the system limitations with the
> ACS quirk path and so all of the above concerns are taken care
> of. Robin is saying the Juno SMMU forces a large iommu_group covering
> the switch anyhow today, so at least that platform is not affected.

I guess REQ_ACS_FLAGS is what iommu_groups looks for?  I looked for
such a thing earlier but must have missed it.  Thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches
  2026-02-06 15:05             ` Bjorn Helgaas
@ 2026-02-06 15:07               ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2026-02-06 15:07 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Manivannan Sadhasivam, Bjorn Helgaas,
	linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Alex Williamson, James Puthukattukaran

On Fri, Feb 06, 2026 at 09:05:05AM -0600, Bjorn Helgaas wrote:
> On Fri, Feb 06, 2026 at 10:52:54AM -0400, Jason Gunthorpe wrote:
> > On Fri, Feb 06, 2026 at 08:46:51AM -0600, Bjorn Helgaas wrote:
> > 
> > > IIUC the current situation is that for these IDT switches, ACS SV is
> > > enabled when downstream devices are passed through to guests, but
> > > after these patches, it will no longer be enabled.
> > 
> > ACS SV is enabled at boot time if an IOMMU driver is present
> > regardless if guests or virtualization is in use.
> > 
> > Linux doesn't change ACS flags dynamically.
> 
> Right, it's just that this series effectively un-advertises ACS SV for
> the IDE switches so it will never be enabled for them

Yes

> > iommu_groups will correctly describe the system limitations with the
> > ACS quirk path and so all of the above concerns are taken care
> > of. Robin is saying the Juno SMMU forces a large iommu_group covering
> > the switch anyhow today, so at least that platform is not affected.
> 
> I guess REQ_ACS_FLAGS is what iommu_groups looks for?  I looked for
> such a thing earlier but must have missed it.  Thanks!

Yes

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-02-06  9:18     ` Manivannan Sadhasivam
@ 2026-02-06 15:37       ` Robin Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2026-02-06 15:37 UTC (permalink / raw)
  To: Manivannan Sadhasivam, Bjorn Helgaas
  Cc: Manivannan Sadhasivam, Bjorn Helgaas, linux-pci, linux-kernel,
	iommu, Naresh Kamboju, Pavankumar Kondeti, Xingang Wang,
	Marek Szyprowski, Jason Gunthorpe

On 2026-02-06 9:18 am, Manivannan Sadhasivam wrote:
> On Wed, Feb 04, 2026 at 05:33:30PM -0600, Bjorn Helgaas wrote:
>> On Fri, Jan 02, 2026 at 09:04:47PM +0530, Manivannan Sadhasivam wrote:
>>> For enabling ACS without the cmdline params, the platform drivers are
>>> expected to call pci_request_acs() API which sets a static flag,
>>> 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
>>> in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
>>> this call stack:
>>>
>>> -> pci_device_add()
>>> 	-> pci_init_capabilities()
>>> 		-> pci_acs_init()
>>> 			/* check for pci_acs_enable */
>>> 			-> pci_enable_acs()
>>>
>>> For the OF platforms, pci_request_acs() is called during
>>> of_iommu_configure() during device_add(), as per this call stack:
>>>
>>> -> device_add()
>>> 	-> iommu_bus_notifier()
>>> 		-> iommu_probe_device()
>>> 			-> pci_dma_configure()
>>> 				-> of_dma_configure()
>>> 					-> of_iommu_configure()
>>> 						/* set pci_acs_enable */
>>> 						-> pci_request_acs()
>>>
>>> As seen from both call stacks, pci_enable_acs() is called way before the
>>> invocation of pci_request_acs() for the OF platforms. This means,
>>> pci_enable_acs() will not enable ACS for the first device that gets
>>> enumerated, which is usally the Root Port device. But since the static
>>> flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
>>> ACS capable devices enumerated later.
>>>
>>> To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
>>> only from pci_dma_configure() after calling of_dma_configure(). This makes
>>> sure that pci_enable_acs() only gets called after the IOMMU framework has
>>> called pci_request_acs(). The ACS enablement flow now looks like:
>>>
>>> -> pci_device_add()
>>> 	-> pci_init_capabilities()
>>> 		/* Just store the ACS cap */
>>> 		-> pci_acs_init()
>>> 	-> device_add()
>>> 		...
>>> 		-> pci_dma_configure()
>>> 			-> of_dma_configure()
>>> 				-> pci_request_acs()
>>> 			-> pci_enable_acs()
>>>
>>> For the ACPI platforms, pci_request_acs() is called during ACPI
>>> initialization time itself, independent of the IOMMU framework.
>>
>> I don't think cmdline params are relevant, since I don't see any that
>> set pci_acs_enable or call pci_request_acs().
>>
>> I propose a longer but combined call chain in the commit log to show
>> the old sequence vs the new one.  I do notice the double call of
>> pci_enable_acs() (along with dev->bus->dma_configure()), which all
>> looks odd, but maybe it will be rationalized someday:
>>
> 
> Yeah, there is no issue with calling pci_enable_acs() twice as of now, but I
> don't have a better solution either.

Indeed the second one is due to go away - the current state of running 
all of bus->dma_configure twice is the precursor to splitting it up so 
the iommu_configure() bits run at device_add() time (or the equivalent 
if the IOMMU registers later), while the rest of the dma_configure() 
bits (including deferral if we're still waiting for an IOMMU driver) 
stay at driver bind. The snag is that there are some dodgy platform 
drivers abusing the previous behaviour that I'm not sure what to do with.

However I suppose with some careful refactoring it shouldn't strictly be 
necessary to change everything all at once, so if and when I get the 
chance to pick that project up again I could potentially finish the new 
design for PCI and other buses while kicking platform down the road...

Thanks,
Robin.

>>      PCI: Enable ACS after configuring IOMMU for OF platforms
>>      
>>      Platform, ACPI, or IOMMU drivers call pci_request_acs(), which sets
>>      'pci_acs_enable' to request that ACS be enabled for any devices enumerated
>>      in the future.
>>      
>>      OF platforms called pci_enable_acs() for the first device before
>>      of_iommu_configure() called pci_request_acs(), so ACS was never enabled for
>>      that device (typically a Root Port).
>>      
>>      Call pci_enable_acs() later, from pci_dma_configure(), after
>>      of_dma_configure() has had a chance to call pci_request_acs().
>>      
>>      Here's the call path, showing the move of pci_enable_acs() from
>>      pci_acs_init() to pci_dma_configure(), where it always happens after
>>      pci_request_acs():
>>      
>>          pci_device_add
>>            pci_init_capabilities
>>              pci_acs_init
>>       -        pci_enable_acs
>>       -          if (pci_acs_enable)                <-- previous test
>>       -            ...
>>            device_add
>>              bus_notify(BUS_NOTIFY_ADD_DEVICE)
>>                iommu_bus_notifier
>>                  iommu_probe_device
>>                    iommu_init_device
>>                      dev->bus->dma_configure
>>                        pci_dma_configure            # pci_bus_type.dma_configure
>>                          of_dma_configure
>>                            of_iommu_configure
>>                              pci_request_acs
>>                                pci_acs_enable = 1   <-- set
>>       +                  pci_enable_acs
>>       +                    if (pci_acs_enable)      <-- new test
>>       +                      ...
>>              bus_probe_device
>>                device_initial_probe
>>                  ...
>>                    really_probe
>>                      dev->bus->dma_configure
>>                        pci_dma_configure            # pci_bus_type.dma_configure
>>                          ...
>>                            pci_enable_acs
>>      
>>      Note that we will now call pci_enable_acs() twice for every device, first
>>      from the iommu_probe_device() path and again from the really_probe() path.
>>      Presumably that's not an issue since we also call dev->bus->dma_configure()
>>      twice.
>>      
>>      For the ACPI platforms, pci_request_acs() is called during ACPI
>>      initialization time itself, independent of the IOMMU framework.
>>
> 
> Thanks! Let me know if I need to respin or you want to ammend it while applying.
> (It is not clear to me what is your plan here).
> 
> - Mani
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms
  2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
                   ` (4 preceding siblings ...)
  2026-01-29 17:42 ` [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam
@ 2026-02-06 18:32 ` Bjorn Helgaas
  5 siblings, 0 replies; 22+ messages in thread
From: Bjorn Helgaas @ 2026-02-06 18:32 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam

On Fri, Jan 02, 2026 at 09:04:46PM +0530, Manivannan Sadhasivam wrote:
> Hi,
> 
> This series fixes the long standing issue with ACS in OF platforms. There are
> two fixes in this series, both fixing independent issues on their own, but both
> are needed to properly enable ACS on OF platforms.
> 
> Issue(s) background
> ===================
> 
> Back in 2021, Xingang Wang first noted a failure in attaching the HiSilicon SEC
> device to QEMU ARM64 pci-root-port device [1]. He then tracked down the issue to
> ACS not being enabled for the QEMU Root Port device and he proposed a patch to
> fix it [2].
> 
> Once the patch got applied, people reported PCIe issues with linux-next on the
> ARM Juno Development boards, where they saw failure in enumerating the endpoint
> devices [3][4]. So soon, the patch got dropped, but the actual issue with the
> ARM Juno boards was left behind.
> 
> Fast forward to 2024, Pavan resubmitted the same fix [5] for his own usecase,
> hoping that someone in the community would fix the issue with ARM Juno boards.
> But the patch was rightly rejected, as a patch that was known to cause issues
> should not be merged to the kernel. But again, no one investigated the Juno
> issue and it was left behind again.
> 
> Now it ended up in my plate and I managed to track down the issue with the help
> of Naresh who got access to the Juno boards in LKFT. The Juno issue was with the
> PCIe switch from Microsemi/IDT, which triggers ACS Source Validation error on
> Completions received for the Configuration Read Request from a device connected
> to the downstream port that has not yet captured the PCIe bus number. As per the
> PCIe spec r6.0 sec 2.2.6.2, "Functions must capture the Bus and Device Numbers
> supplied with all Type 0 Configuration Write Requests completed by the Function
> and supply these numbers in the Bus and Device Number fields of the Requester ID
> for all Requests". So during the first Configuration Read Request issued by the
> switch downstream port during enumeration (for reading Vendor ID), Bus and
> Device numbers will be unknown to the device. So it responds to the Read Request
> with Completion having Bus and Device number as 0. The switch interprets the
> Completion as an ACS Source Validation error and drops the completion, leading
> to the failure in detecting the endpoint device. Though the PCIe spec r6.0, sec
> 6.12.1.1, states that "Completions are never affected by ACS Source Validation".
> This behavior is in violation of the spec.
> 
> Solution
> ========
> 
> In September, I submitted a series [6] to fix both issues. For the IDT issue,
> I reused the existing quirk in the PCI core which does a dummy config write
> before issuing the first config read to the device. And for the ACS enablement
> issue, I just resubmitted the original patch from Xingang which called
> pci_request_acs() from devm_of_pci_bridge_init().
> 
> But during the review of the series, several comments were received and they
> required the series to be reworked completely. Hence, in this version, I've
> incorported the comments as below:
> 
> 1. For the ACS enablement issue, I've moved the pci_enable_acs() call from
> pci_acs_init() to pci_dma_configure().
> 
> 2. For the IDT issue, I've cached the ACS capabilities (RO) in 'pci_dev',
> and disabled the broken capability for the IDT switches in the cache. This also
> allowed to get rid of the earlier workaround for the switch.
> 
> [1] https://lore.kernel.org/all/038397a6-57e2-b6fc-6e1c-7c03b7be9d96@huawei.com
> [2] https://lore.kernel.org/all/1621566204-37456-1-git-send-email-wangxingang5@huawei.com
> [3] https://lore.kernel.org/all/01314d70-41e6-70f9-e496-84091948701a@samsung.com
> [4] https://lore.kernel.org/all/CADYN=9JWU3CMLzMEcD5MSQGnaLyDRSKc5SofBFHUax6YuTRaJA@mail.gmail.com
> [5] https://lore.kernel.org/linux-pci/20241107-pci_acs_fix-v1-1-185a2462a571@quicinc.com
> [6] https://lore.kernel.org/linux-pci/20250910-pci-acs-v1-0-fe9adb65ad7d@oss.qualcomm.com
> 
> Changes in v3:
> - Dropped the 'acs_broken_cap' field and directly called the quirk from
>   pci_acs_init()
> - Collected tags. Since the delta between v2 and v3 is minimal, I've kept them.
> - Rebased on top of v6.19-rc1 
> - Link to v2: https://lore.kernel.org/r/20251202-pci_acs-v2-0-5d2759a71489@oss.qualcomm.com
> 
> Changes in v2:
> 
> * Reworked the patches completely as mentioned above.
> * Rebased on top of v6.18-rc7
> 
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> Manivannan Sadhasivam (4):
>       PCI: Enable ACS only after configuring IOMMU for OF platforms
>       PCI: Cache ACS capabilities
>       PCI: Disable ACS SV capability for the broken IDT switches
>       PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch
> 
>  drivers/pci/pci-driver.c |  8 +++++++
>  drivers/pci/pci.c        | 33 ++++++++++++--------------
>  drivers/pci/pci.h        |  4 +++-
>  drivers/pci/probe.c      | 12 ----------
>  drivers/pci/quirks.c     | 62 ++++++++++++------------------------------------
>  include/linux/pci.h      |  1 +
>  6 files changed, 42 insertions(+), 78 deletions(-)
> ---
> base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
> change-id: 20251201-pci_acs-b15aa3947289

Applied to pci/virtualization for v6.20, thanks!

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Regression Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
  2026-02-04 23:33   ` Bjorn Helgaas
@ 2026-04-03 16:45   ` Jon Kohler
  2026-04-03 16:52     ` Manivannan Sadhasivam
  2026-04-03 16:47   ` Jon Kohler
  2 siblings, 1 reply; 22+ messages in thread
From: Jon Kohler @ 2026-04-03 16:45 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: Bjorn Helgaas, linux-pci, linux-kernel, iommu, Naresh Kamboju,
	Pavankumar Kondeti, Xingang Wang, Marek Szyprowski, Robin Murphy,
	Jason Gunthorpe, Manivannan Sadhasivam, sashal



> On Jan 2, 2026, at 10:34 AM, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.oss.qualcomm.com@kernel.org> wrote:
> 
> From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
> For enabling ACS without the cmdline params, the platform drivers are
> expected to call pci_request_acs() API which sets a static flag,
> 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
> in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
> this call stack:
> 
> -> pci_device_add()
> -> pci_init_capabilities()
> -> pci_acs_init()
> /* check for pci_acs_enable */
> -> pci_enable_acs()
> 
> For the OF platforms, pci_request_acs() is called during
> of_iommu_configure() during device_add(), as per this call stack:
> 
> -> device_add()
> -> iommu_bus_notifier()
> -> iommu_probe_device()
> -> pci_dma_configure()
> -> of_dma_configure()
> -> of_iommu_configure()
> /* set pci_acs_enable */
> -> pci_request_acs()
> 
> As seen from both call stacks, pci_enable_acs() is called way before the
> invocation of pci_request_acs() for the OF platforms. This means,
> pci_enable_acs() will not enable ACS for the first device that gets
> enumerated, which is usally the Root Port device. But since the static
> flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
> ACS capable devices enumerated later.
> 
> To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
> only from pci_dma_configure() after calling of_dma_configure(). This makes
> sure that pci_enable_acs() only gets called after the IOMMU framework has
> called pci_request_acs(). The ACS enablement flow now looks like:
> 
> -> pci_device_add()
> -> pci_init_capabilities()
> /* Just store the ACS cap */
> -> pci_acs_init()
> -> device_add()
> ...
> -> pci_dma_configure()
> -> of_dma_configure()
> -> pci_request_acs()
> -> pci_enable_acs()
> 
> For the ACPI platforms, pci_request_acs() is called during ACPI
> initialization time itself, independent of the IOMMU framework.
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> drivers/pci/pci-driver.c | 8 ++++++++
> drivers/pci/pci.c        | 8 --------
> drivers/pci/pci.h        | 1 +
> 3 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 7c2d9d596258..301a9418e38e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1650,6 +1650,14 @@ static int pci_dma_configure(struct device *dev)
> ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> }
> 
> + /*
> + * Attempt to enable ACS regardless of capability because some Root
> + * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> + * the standard ACS capability but still support ACS via those
> + * quirks.
> + */
> + pci_enable_acs(to_pci_dev(dev));
> +
> pci_put_host_bridge_device(bridge);
> 
> /* @drv may not be valid when we're called from the IOMMU layer */
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..2c3d0a2d6973 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3648,14 +3648,6 @@ bool pci_acs_path_enabled(struct pci_dev *start,
> void pci_acs_init(struct pci_dev *dev)
> {
> dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> -
> - /*
> - * Attempt to enable ACS regardless of capability because some Root
> - * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> - * the standard ACS capability but still support ACS via those
> - * quirks.
> - */
> - pci_enable_acs(dev);
> }
> 
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..4592ede0ebcc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -939,6 +939,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
> }
> 
> void pci_acs_init(struct pci_dev *dev);
> +void pci_enable_acs(struct pci_dev *dev);
> #ifdef CONFIG_PCI_QUIRKS
> int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
> int pci_dev_specific_enable_acs(struct pci_dev *dev);
> 
> -- 
> 2.48.1
> 

Howdy folks,
Writing to report a regression from this patch. While attempting to
rebase our 6.12.y-based internal tree past 6.12.75, we encountered a
behavior change in our internal systems, wherein we have a platform
that uses vfio-pci passthrough for a series of PCIe devices (SAS HBAs,
NVMe SSDs, NIC passthrough, etc) to a given service VM.

On some, but not all, of these systems, we noticed that the iommu
group topology completely changes with this commit. Before this commit,
a given system may have ~90-ish iommu groups. After this commit, we see
~60-ish iommu groups.

The net result is that some of the devices marked for passthrough get
clubbed together with devices that are not marked for passthrough. As
such, libvirt refuses to start the VM, like so: 

[root@demo-system-here ~]# virsh start demo-service-vm
error: Failed to start domain 'demo-service-vm'
2026-04-02T03:52:17.098998Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:41:00.0","id":"ua-cb173399-0c46-5b7a-b3e6-fe2fb5f9509c","bus":"pci.0","addr":"0x7","rombar":0}: vfio 0000:41:00.0: group 8 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

[root@demo-system-here ~]# lspci |grep 41:00
...
41:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
Subsystem: Super Micro Computer Inc AOC-S3816L-L16iT (NI22) Storage Adapter
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas 

[root@demo-system-here ~]# ls -l /sys/kernel/iommu_groups/8/devices
total 0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.0 -> ../../../../devices/pci0000:40/0000:40:01.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.1 -> ../../../../devices/pci0000:40/0000:40:01.1
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.2 -> ../../../../devices/pci0000:40/0000:40:01.2
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:42:00.0 -> ../../../../devices/pci0000:40/0000:40:01.2/0000:42:00.0

The tricky bit here is 41 is the SAS controller we're trying to pass
through, and 42 is the local NVMe m.2 boot disk that the server itself
is booting off of.

Before this patch, they are put in separate groups, and there is not
a problem.

I’ve only tried this on our 6.12.y tree, but not yet our 6.18 and 6.6
trees, so I’m not sure if this problem exists there yet, but this
commit is in those trees in general in 6.18.16 and 6.6.128 respectively

Happy to provide any other details you might like, as this is 100%
reproducible on a variety of systems here.

Thanks,
Jon


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Regression Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
  2026-02-04 23:33   ` Bjorn Helgaas
  2026-04-03 16:45   ` Regression " Jon Kohler
@ 2026-04-03 16:47   ` Jon Kohler
  2 siblings, 0 replies; 22+ messages in thread
From: Jon Kohler @ 2026-04-03 16:47 UTC (permalink / raw)
  To: manivannan.sadhasivam@oss.qualcomm.com
  Cc: Bjorn Helgaas, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	Naresh Kamboju, Pavankumar Kondeti, Xingang Wang,
	Marek Szyprowski, Robin Murphy, Jason Gunthorpe,
	Manivannan Sadhasivam, sashal@kernel.org



> On Jan 2, 2026, at 10:34 AM, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.oss.qualcomm.com@kernel.org> wrote:
> 
> From: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> 
> For enabling ACS without the cmdline params, the platform drivers are
> expected to call pci_request_acs() API which sets a static flag,
> 'pci_acs_enable' in drivers/pci/pci.c. And this flag is used to enable ACS
> in pci_enable_acs() helper, which gets called during pci_acs_init(), as per
> this call stack:
> 
> -> pci_device_add()
> -> pci_init_capabilities()
> -> pci_acs_init()
> /* check for pci_acs_enable */
> -> pci_enable_acs()
> 
> For the OF platforms, pci_request_acs() is called during
> of_iommu_configure() during device_add(), as per this call stack:
> 
> -> device_add()
> -> iommu_bus_notifier()
> -> iommu_probe_device()
> -> pci_dma_configure()
> -> of_dma_configure()
> -> of_iommu_configure()
> /* set pci_acs_enable */
> -> pci_request_acs()
> 
> As seen from both call stacks, pci_enable_acs() is called way before the
> invocation of pci_request_acs() for the OF platforms. This means,
> pci_enable_acs() will not enable ACS for the first device that gets
> enumerated, which is usally the Root Port device. But since the static
> flag, 'pci_acs_enable' is set *afterwards*, ACS will be enabled for the
> ACS capable devices enumerated later.
> 
> To fix this issue, do not call pci_enable_acs() from pci_acs_init(), but
> only from pci_dma_configure() after calling of_dma_configure(). This makes
> sure that pci_enable_acs() only gets called after the IOMMU framework has
> called pci_request_acs(). The ACS enablement flow now looks like:
> 
> -> pci_device_add()
> -> pci_init_capabilities()
> /* Just store the ACS cap */
> -> pci_acs_init()
> -> device_add()
> ...
> -> pci_dma_configure()
> -> of_dma_configure()
> -> pci_request_acs()
> -> pci_enable_acs()
> 
> For the ACPI platforms, pci_request_acs() is called during ACPI
> initialization time itself, independent of the IOMMU framework.
> 
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
> ---
> drivers/pci/pci-driver.c | 8 ++++++++
> drivers/pci/pci.c        | 8 --------
> drivers/pci/pci.h        | 1 +
> 3 files changed, 9 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 7c2d9d596258..301a9418e38e 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1650,6 +1650,14 @@ static int pci_dma_configure(struct device *dev)
> ret = acpi_dma_configure(dev, acpi_get_dma_attr(adev));
> }
> 
> + /*
> + * Attempt to enable ACS regardless of capability because some Root
> + * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> + * the standard ACS capability but still support ACS via those
> + * quirks.
> + */
> + pci_enable_acs(to_pci_dev(dev));
> +
> pci_put_host_bridge_device(bridge);
> 
> /* @drv may not be valid when we're called from the IOMMU layer */
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 13dbb405dc31..2c3d0a2d6973 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3648,14 +3648,6 @@ bool pci_acs_path_enabled(struct pci_dev *start,
> void pci_acs_init(struct pci_dev *dev)
> {
> dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
> -
> - /*
> - * Attempt to enable ACS regardless of capability because some Root
> - * Ports (e.g. those quirked with *_intel_pch_acs_*) do not have
> - * the standard ACS capability but still support ACS via those
> - * quirks.
> - */
> - pci_enable_acs(dev);
> }
> 
> /**
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 0e67014aa001..4592ede0ebcc 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -939,6 +939,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
> }
> 
> void pci_acs_init(struct pci_dev *dev);
> +void pci_enable_acs(struct pci_dev *dev);
> #ifdef CONFIG_PCI_QUIRKS
> int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
> int pci_dev_specific_enable_acs(struct pci_dev *dev);
> 
> -- 
> 2.48.1
> 

Resending, as I sent the last copy from my iCloud account, not work account

Howdy folks,
Writing to report a regression from this patch. While attempting to
rebase our 6.12.y-based internal tree past 6.12.75, we encountered a
behavior change in our internal systems, wherein we have a platform
that uses vfio-pci passthrough for a series of PCIe devices (SAS HBAs,
NVMe SSDs, NIC passthrough, etc) to a given service VM.

On some, but not all, of these systems, we noticed that the iommu
group topology completely changes with this commit. Before this commit,
a given system may have ~90-ish iommu groups. After this commit, we see
~60-ish iommu groups.

The net result is that some of the devices marked for passthrough get
clubbed together with devices that are not marked for passthrough. As
such, libvirt refuses to start the VM, like so: 

[root@demo-system-here ~]# virsh start demo-service-vm
error: Failed to start domain 'demo-service-vm'
2026-04-02T03:52:17.098998Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:41:00.0","id":"ua-cb173399-0c46-5b7a-b3e6-fe2fb5f9509c","bus":"pci.0","addr":"0x7","rombar":0}: vfio 0000:41:00.0: group 8 is not viable
Please ensure all devices within the iommu_group are bound to their vfio bus driver.

[root@demo-system-here ~]# lspci |grep 41:00
...
41:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
Subsystem: Super Micro Computer Inc AOC-S3816L-L16iT (NI22) Storage Adapter
Kernel driver in use: vfio-pci
Kernel modules: mpt3sas 

[root@demo-system-here ~]# ls -l /sys/kernel/iommu_groups/8/devices
total 0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.0 -> ../../../../devices/pci0000:40/0000:40:01.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.1 -> ../../../../devices/pci0000:40/0000:40:01.1
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.2 -> ../../../../devices/pci0000:40/0000:40:01.2
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:42:00.0 -> ../../../../devices/pci0000:40/0000:40:01.2/0000:42:00.0

The tricky bit here is 41 is the SAS controller we're trying to pass
through, and 42 is the local NVMe m.2 boot disk that the server itself
is booting off of.

Before this patch, they are put in separate groups, and there is not
a problem.

I’ve only tried this on our 6.12.y tree, but not yet our 6.18 and 6.6
trees, so I’m not sure if this problem exists there yet, but this
commit is in those trees in general in 6.18.16 and 6.6.128 respectively

Happy to provide any other details you might like, as this is 100%
reproducible on a variety of systems here.

Thanks,
Jon



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Regression Re: [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for OF platforms
  2026-04-03 16:45   ` Regression " Jon Kohler
@ 2026-04-03 16:52     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 22+ messages in thread
From: Manivannan Sadhasivam @ 2026-04-03 16:52 UTC (permalink / raw)
  To: Jon Kohler
  Cc: manivannan.sadhasivam, Bjorn Helgaas, linux-pci, linux-kernel,
	iommu, Naresh Kamboju, Pavankumar Kondeti, Xingang Wang,
	Marek Szyprowski, Robin Murphy, Jason Gunthorpe, sashal

On Fri, Apr 03, 2026 at 12:45:29PM -0400, Jon Kohler wrote:

[...]

> Howdy folks,
> Writing to report a regression from this patch. While attempting to
> rebase our 6.12.y-based internal tree past 6.12.75, we encountered a
> behavior change in our internal systems, wherein we have a platform
> that uses vfio-pci passthrough for a series of PCIe devices (SAS HBAs,
> NVMe SSDs, NIC passthrough, etc) to a given service VM.
> 
> On some, but not all, of these systems, we noticed that the iommu
> group topology completely changes with this commit. Before this commit,
> a given system may have ~90-ish iommu groups. After this commit, we see
> ~60-ish iommu groups.
> 
> The net result is that some of the devices marked for passthrough get
> clubbed together with devices that are not marked for passthrough. As
> such, libvirt refuses to start the VM, like so: 
> 
> [root@demo-system-here ~]# virsh start demo-service-vm
> error: Failed to start domain 'demo-service-vm'
> 2026-04-02T03:52:17.098998Z qemu-kvm: -device {"driver":"vfio-pci","host":"0000:41:00.0","id":"ua-cb173399-0c46-5b7a-b3e6-fe2fb5f9509c","bus":"pci.0","addr":"0x7","rombar":0}: vfio 0000:41:00.0: group 8 is not viable
> Please ensure all devices within the iommu_group are bound to their vfio bus driver.
> 
> [root@demo-system-here ~]# lspci |grep 41:00
> ...
> 41:00.0 Serial Attached SCSI controller: Broadcom / LSI Fusion-MPT 12GSAS/PCIe Secure SAS38xx
> Subsystem: Super Micro Computer Inc AOC-S3816L-L16iT (NI22) Storage Adapter
> Kernel driver in use: vfio-pci
> Kernel modules: mpt3sas 
> 
> [root@demo-system-here ~]# ls -l /sys/kernel/iommu_groups/8/devices
> total 0
> lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.0 -> ../../../../devices/pci0000:40/0000:40:01.0
> lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.1 -> ../../../../devices/pci0000:40/0000:40:01.1
> lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:40:01.2 -> ../../../../devices/pci0000:40/0000:40:01.2
> lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:41:00.0 -> ../../../../devices/pci0000:40/0000:40:01.1/0000:41:00.0
> lrwxrwxrwx. 1 root root 0 Apr 2 07:33 0000:42:00.0 -> ../../../../devices/pci0000:40/0000:40:01.2/0000:42:00.0
> 
> The tricky bit here is 41 is the SAS controller we're trying to pass
> through, and 42 is the local NVMe m.2 boot disk that the server itself
> is booting off of.
> 
> Before this patch, they are put in separate groups, and there is not
> a problem.
> 
> I’ve only tried this on our 6.12.y tree, but not yet our 6.18 and 6.6
> trees, so I’m not sure if this problem exists there yet, but this
> commit is in those trees in general in 6.18.16 and 6.6.128 respectively
> 
> Happy to provide any other details you might like, as this is 100%
> reproducible on a variety of systems here.
> 

Sorry for the breakage. This issue was reported and a revert has been submitted:
https://lore.kernel.org/linux-pci/20260320172335.29778-1-john@kernel.doghat.io/

The stable maintainers are yet to pick it. And I've submitted similar reverts
for all pre v6.15 kernels where this patch is not applicable.

https://lore.kernel.org/linux-pci/20260331091009.19536-1-manivannan.sadhasivam@oss.qualcomm.com/
https://lore.kernel.org/linux-pci/20260331091306.25210-1-manivannan.sadhasivam@oss.qualcomm.com/
https://lore.kernel.org/linux-pci/20260331091455.30394-1-manivannan.sadhasivam@oss.qualcomm.com/
https://lore.kernel.org/linux-pci/20260331091727.33552-1-manivannan.sadhasivam@oss.qualcomm.com/

- Mani

-- 
மணிவண்ணன் சதாசிவம்

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2026-04-03 16:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-02 15:34 [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam via B4 Relay
2026-01-02 15:34 ` [PATCH v3 1/4] PCI: Enable ACS only after configuring IOMMU for " Manivannan Sadhasivam via B4 Relay
2026-02-04 23:33   ` Bjorn Helgaas
2026-02-06  9:18     ` Manivannan Sadhasivam
2026-02-06 15:37       ` Robin Murphy
2026-04-03 16:45   ` Regression " Jon Kohler
2026-04-03 16:52     ` Manivannan Sadhasivam
2026-04-03 16:47   ` Jon Kohler
2026-01-02 15:34 ` [PATCH v3 2/4] PCI: Cache ACS capabilities Manivannan Sadhasivam via B4 Relay
2026-01-02 15:34 ` [PATCH v3 3/4] PCI: Disable ACS SV capability for the broken IDT switches Manivannan Sadhasivam via B4 Relay
2026-02-05 23:39   ` Bjorn Helgaas
2026-02-06  9:11     ` Manivannan Sadhasivam
2026-02-06 14:30       ` Jason Gunthorpe
2026-02-06 14:43         ` Robin Murphy
2026-02-06 14:46         ` Bjorn Helgaas
2026-02-06 14:52           ` Jason Gunthorpe
2026-02-06 15:05             ` Bjorn Helgaas
2026-02-06 15:07               ` Jason Gunthorpe
2026-01-02 15:34 ` [PATCH v3 4/4] PCI: Extend the pci_disable_broken_acs_cap() quirk for one more IDT switch Manivannan Sadhasivam via B4 Relay
2026-01-29 17:42 ` [PATCH v3 0/4] PCI: Fix ACS enablement for Root Ports in OF platforms Manivannan Sadhasivam
2026-01-29 18:06   ` Jason Gunthorpe
2026-02-06 18:32 ` Bjorn Helgaas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox