* [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking
@ 2026-05-29 11:12 Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 1/6] PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs Pranjal Shrivastava
` (5 more replies)
0 siblings, 6 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava
The primary motivation for this series is an ATS state mismatch observed
under heavy load (via iova_stress). A failure in pci_enable_ats() leaves
IOMMU drivers like arm-smmu-v3 with inconsistent state leading to PCI core
warnings during device detach.
While David's recent work [1] addressed a discovery race for specific
quirked devices by moving them to the HEADER phase, gaps remained
regarding how Virtual Functions (VFs) inherit state from their Physical
Functions (PFs). Specifically, pci_ats_supported() did not account for
PF-level quirked status, and pci_prepare_ats() lacked STU validation for
VFs.
Based on discussion with Jason and Baolu in v3/v5, it was decided that the
IOMMU drivers should explicitly check pci_ats_supported() before calling
pci_prepare_ats(). To enforce this, pci_prepare_ats() now noisily checks
for support via WARN_ON(). Furthermore, the device probe should fail if
pci_prepare_ats() fails. Since these early gates preclude software
configuration errors, any remaining failure during pci_enable_ats() is
treated as a kernel bug.
This series standardizes this pattern across ARM SMMUv3, Intel VT-d, &
AMD IOMMU drivers.
[1] https://lore.kernel.org/linux-pci/20260403222750.1215002-1-dmatlack@google.com/
[v6]
- Reverted the decoupling of pci_ats_supported() from pci_prepare_ats().
- Added a WARN_ON() to the internal support check in pci_prepare_ats().
- Dropped the standalone Intel bugfixes (RB-tree and UAF) to be sent as a
separate standalone series per maintainer request.
- Kept the folded UAF fix in the AMD IOMMU patch to ensure the new error
path is immediately safe.
- Collected Reviewed-by tags from Lu Baolu for PCI core patches.
[v5]
- https://lore.kernel.org/all/20260528202353.3422206-1-praan@google.com/
- Decoupled pci_ats_supported() from pci_prepare_ats() in the PCI core.
- Rebased SMMUv3 support on top of Nicolin Chen's "Always-On ATS" series.
- Fixed pre-existing RB-tree corruption in VT-d probe (Baolu/Sashiko).
- Addressed the pre-existing UAF in AMD IOMMU probe suggested by Sashiko.
[v4]
- https://lore.kernel.org/all/20260525184347.4059549-1-praan@google.com/
- Standardized the pattern across Intel VT-d and AMD IOMMU drivers.
- Replaced the SMMUv3 ats_prepared gate with a fatal probe-fail logic.
- Utilized WARN() macros for runtime enablement failures in all drivers.
- Collected R-b tags from Jason and Sami.
Pranjal Shrivastava (6):
PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs
PCI/ATS: Validate STU for VFs in pci_prepare_ats()
PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
iommu/arm-smmu-v3: Standardize ATS enablement failure reporting
iommu/vt-d: Fail probe on ATS configuration failure
iommu/amd: Fail probe on ATS configuration failure
drivers/iommu/amd/iommu.c | 30 ++++++++++++++++-----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 +++++--
drivers/iommu/intel/iommu.c | 15 ++++++++---
drivers/pci/ats.c | 19 +++++++++----
4 files changed, 58 insertions(+), 16 deletions(-)
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v6 1/6] PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 2/6] PCI/ATS: Validate STU for VFs in pci_prepare_ats() Pranjal Shrivastava
` (4 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava,
Jason Gunthorpe
Update pci_ats_supported() to additionally check the associated PF's
status when called on a VF. This ensures that PF-level quirks and
untrusted status are correctly propagated to VFs, providing a robust
support check that aligns with the kernel's PF-centric ATS configuration
model and is immune to the timing of VF-specific fixups.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/pci/ats.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 96efa00d9743..679a3c3c1d54 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -40,10 +40,13 @@ void pci_ats_init(struct pci_dev *dev)
*/
bool pci_ats_supported(struct pci_dev *dev)
{
- if (!dev->ats_cap)
+ if (!dev->ats_cap || dev->untrusted)
return false;
- return (dev->untrusted == 0);
+ if (dev->is_virtfn)
+ return pci_ats_supported(pci_physfn(dev));
+
+ return true;
}
EXPORT_SYMBOL_GPL(pci_ats_supported);
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 2/6] PCI/ATS: Validate STU for VFs in pci_prepare_ats()
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 1/6] PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
` (3 subsequent siblings)
5 siblings, 0 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava,
Jason Gunthorpe
While every PCI Function that implements ATS has an independent ATS
Extended Capability structure with a Read/Write Smallest Translation
Unit (STU) field, the kernel manages SR-IOV ATS by requiring the IOMMU
driver to configure the STU on the Physical Function (PF) before any
any Virtual Functions (VFs) are created.
Currently, pci_prepare_ats() bails out early for VFs, assuming that the
PF has already been correctly prepared. However, this creates a potential
mismatch if a VF is subsequently prepared with a different page shift.
Update pci_prepare_ats() to validate that the requested page shift (ps)
matches the STU already configured in the associated PF. This ensures
early detection of incompatible configurations and maintains the kernel's
policy of consistent STU sizing across all functions associated with a
given SMMU.
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/pci/ats.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 679a3c3c1d54..9cb23780093d 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -73,8 +73,12 @@ int pci_prepare_ats(struct pci_dev *dev, int ps)
if (ps < PCI_ATS_MIN_STU)
return -EINVAL;
- if (dev->is_virtfn)
+ if (dev->is_virtfn) {
+ if (pci_physfn(dev)->ats_stu != ps)
+ return -EINVAL;
+
return 0;
+ }
dev->ats_stu = ps;
ctrl = PCI_ATS_CTRL_STU(dev->ats_stu - PCI_ATS_MIN_STU);
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 1/6] PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 2/6] PCI/ATS: Validate STU for VFs in pci_prepare_ats() Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-05-29 21:56 ` Nicolin Chen
2026-05-29 11:12 ` [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting Pranjal Shrivastava
` (2 subsequent siblings)
5 siblings, 1 reply; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava
Currently, pci_prepare_ats() internally calls pci_ats_supported() and
returns -EINVAL if the device does not support ATS. While this provides
a silent safety check, it conflates support detection with configuration.
Update pci_prepare_ats() to wrap the internal pci_ats_supported check in
a WARN_ON(). This mandates all callers to call pci_prepare_ats() only if
the function supports ATS.
Update the function documentation to mention that callers must verify
ATS support (via pci_ats_supported()) before calling pci_prepare_ats().
Suggested-by: Baolu Lu <baolu.lu@linux.intel.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/pci/ats.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index 9cb23780093d..f1434f86ac40 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -56,7 +56,9 @@ EXPORT_SYMBOL_GPL(pci_ats_supported);
* @ps: the IOMMU page shift
*
* This must be done by the IOMMU driver on the PF before any VFs are created to
- * ensure that the VF can have ATS enabled.
+ * ensure that the VF can have ATS enabled. Callers must verify that ATS is
+ * supported by the device (e.g. via pci_ats_supported()) before calling this
+ * function.
*
* Returns 0 on success, or negative on failure.
*/
@@ -64,7 +66,7 @@ int pci_prepare_ats(struct pci_dev *dev, int ps)
{
u16 ctrl;
- if (!pci_ats_supported(dev))
+ if (WARN_ON(!pci_ats_supported(dev)))
return -EINVAL;
if (WARN_ON(dev->ats_enabled))
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
` (2 preceding siblings ...)
2026-05-29 11:12 ` [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-05-29 21:51 ` Nicolin Chen
2026-05-29 11:12 ` [PATCH v6 5/6] iommu/vt-d: Fail probe on ATS configuration failure Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 6/6] iommu/amd: " Pranjal Shrivastava
5 siblings, 1 reply; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava
The SMMUv3 driver currently has a two-phase commit in its ATS enablement
flow. During arm_smmu_attach_prepare(), it predicts whether ATS will be
enabled using arm_smmu_ats_supported() and accordingly increments
nr_ats_masters and merges ATS invalidations into the domain's invs array.
However, the actual hardware enablement via pci_enable_ats() happens
later in arm_smmu_attach_commit(). If this call to pci_enable_ats fails,
the SMMU driver's ATS state tracking remains polluted, i.e., the driver
tracks ATS as enabled on a master that is not actually using it. This
leads to an incorrect nr_ats_masters and triggers a warning in the PCI
core during detach:
1 [ 127.925080] ------------[ cut here ]------------
2 [ 127.925084] WARNING: drivers/pci/ats.c:132 at pci_disable_ats+0x94/0xa8
3 ...
4 [ 128.068169] Call trace:
5 [ 128.070603] pci_disable_ats+0x94/0xa8 (P)
6 [ 128.074688] arm_smmu_attach_prepare+0x104/0x310
7 [ 128.079292] arm_smmu_attach_dev_ste+0x128/0x1e0
The issue was exposed under heavy load when running a VFIO-based DMA
map stress test (iova_stress).
Following the addition of the arm_smmu_master_prepare_ats() [1] helper during
device probe, failable ATS configuration (STU setup) is now handled early
during probe. This ensures that any master reaching the attach phase is
guaranteed to have a valid ATS configuration.
Update arm_smmu_enable_ats() to use the WARN() macro for any
subsequent enablement failures during the commit phase. Since probe
checks now preclude software configuration errors, any failure here is
considered a kernel bug.
[1] https://lore.kernel.org/all/cover.1779392420.git.nicolinc@nvidia.com/
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a10affb483a4..aaebd72bc48d 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2956,8 +2956,14 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
* ATC invalidation of PASID 0 causes the entire ATC to be flushed.
*/
arm_smmu_atc_inv_master(master, IOMMU_NO_PASID);
- if (pci_enable_ats(pdev, stu))
- dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
+
+ /*
+ * Any failure at this point is a kernel bug. pci_ats_supported()
+ * and pci_prepare_ats() have already verified the hardware capability
+ * and programmed the STU. Thus, pci_enable_ats() should not fail here.
+ */
+ WARN(pci_enable_ats(pdev, stu),
+ "Failed to enable ATS (STU %zu)\n", stu);
}
static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 5/6] iommu/vt-d: Fail probe on ATS configuration failure
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
` (3 preceding siblings ...)
2026-05-29 11:12 ` [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 6/6] iommu/amd: " Pranjal Shrivastava
5 siblings, 0 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava
Update the Intel VT-d driver to handle ATS configuration and enablement
more strictly. Specifically, update the device probe to fail if
pci_prepare_ats() returns an error. This ensures that any ATS-capable
master reaching the attach phase is guaranteed to have a valid config.
Additionally, update iommu_enable_pci_ats() to WARN() if pci_enable_ats
fails. Since earlier checks in the probe phase preclude config-related
failures, any failure during hardware enablement is considered a kernel
bug.
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/intel/iommu.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4d0e65bc131d..22308e4911e1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -873,8 +873,14 @@ static void iommu_enable_pci_ats(struct device_domain_info *info)
if (!pci_ats_page_aligned(pdev))
return;
- if (!pci_enable_ats(pdev, VTD_PAGE_SHIFT))
- info->ats_enabled = 1;
+ /*
+ * pci_enable_ats() should not fail here because earlier checks
+ * have already verified support and configuration.
+ */
+ if (WARN_ON(pci_enable_ats(pdev, VTD_PAGE_SHIFT)))
+ return;
+
+ info->ats_enabled = 1;
}
static void iommu_disable_pci_ats(struct device_domain_info *info)
@@ -3288,7 +3294,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
dev_iommu_priv_set(dev, info);
if (pdev && pci_ats_supported(pdev)) {
- pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
+ ret = pci_prepare_ats(pdev, VTD_PAGE_SHIFT);
+ if (ret)
+ goto free;
+
ret = device_rbtree_insert(iommu, info);
if (ret)
goto free;
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
` (4 preceding siblings ...)
2026-05-29 11:12 ` [PATCH v6 5/6] iommu/vt-d: Fail probe on ATS configuration failure Pranjal Shrivastava
@ 2026-05-29 11:12 ` Pranjal Shrivastava
2026-06-01 6:00 ` Ankit Soni
5 siblings, 1 reply; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-29 11:12 UTC (permalink / raw)
To: iommu, linux-pci, linux-arm-kernel, linux-kernel
Cc: Joerg Roedel, Will Deacon, Bjorn Helgaas, David Woodhouse,
Lu Baolu, Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh, Pranjal Shrivastava
Update the AMD IOMMU driver to handle ATS configuration and enablement
more strictly. Specifically, update the device probe to fail if
pci_prepare_ats() returns an error. This ensures that any ATS-capable
master reaching the attach phase is guaranteed to have a valid config.
Additionally, update pdev_enable_cap_ats() to WARN_ON() if pci_enable_ats
fails. Since earlier checks in the probe phase preclude config-related
failures, any failure during hardware enablement is considered a kernel
bug.
Fix a pre-existing Use-After-Free risk by ensuring iommu_ignore_device()
is called on all probe failures after iommu_init_device().
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
drivers/iommu/amd/iommu.c | 30 ++++++++++++++++++++++++------
1 file changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 84cad43dc188..b74c4504bee3 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -570,10 +570,16 @@ static inline int pdev_enable_cap_ats(struct pci_dev *pdev)
if (amd_iommu_iotlb_sup &&
(dev_data->flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP)) {
ret = pci_enable_ats(pdev, PAGE_SHIFT);
- if (!ret) {
- dev_data->ats_enabled = 1;
- dev_data->ats_qdep = pci_ats_queue_depth(pdev);
- }
+ /*
+ * pci_enable_ats() should not fail here because earlier checks
+ * have already verified support and configuration.
+ */
+ if (WARN_ON(ret))
+ return ret;
+
+ dev_data->ats_enabled = 1;
+ dev_data->ats_qdep = pci_ats_queue_depth(pdev);
+ ret = 0;
}
return ret;
@@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
else
dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
- if (dev_is_pci(dev))
- pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
+ if (dev_is_pci(dev)) {
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ if (pci_ats_supported(pdev)) {
+ ret = pci_prepare_ats(pdev, PAGE_SHIFT);
+ if (ret) {
+ iommu_dev = ERR_PTR(ret);
+ goto out_err;
+ }
+ }
+ }
out_err:
+ if (IS_ERR(iommu_dev))
+ iommu_ignore_device(iommu, dev);
+
return iommu_dev;
}
--
2.54.0.823.g6e5bcc1fc9-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting
2026-05-29 11:12 ` [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting Pranjal Shrivastava
@ 2026-05-29 21:51 ` Nicolin Chen
2026-05-31 17:13 ` Pranjal Shrivastava
0 siblings, 1 reply; 15+ messages in thread
From: Nicolin Chen @ 2026-05-29 21:51 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
David Matlack, Samiullah Khawaja, Daniel Mentz, Pasha Tatashin,
Mostafa Saleh
On Fri, May 29, 2026 at 11:12:06AM +0000, Pranjal Shrivastava wrote:
> The SMMUv3 driver currently has a two-phase commit in its ATS enablement
> flow. During arm_smmu_attach_prepare(), it predicts whether ATS will be
> enabled using arm_smmu_ats_supported() and accordingly increments
> nr_ats_masters and merges ATS invalidations into the domain's invs array.
>
> However, the actual hardware enablement via pci_enable_ats() happens
> later in arm_smmu_attach_commit(). If this call to pci_enable_ats fails,
> the SMMU driver's ATS state tracking remains polluted, i.e., the driver
> tracks ATS as enabled on a master that is not actually using it. This
> leads to an incorrect nr_ats_masters and triggers a warning in the PCI
> core during detach:
>
> 1 [ 127.925080] ------------[ cut here ]------------
> 2 [ 127.925084] WARNING: drivers/pci/ats.c:132 at pci_disable_ats+0x94/0xa8
> 3 ...
> 4 [ 128.068169] Call trace:
> 5 [ 128.070603] pci_disable_ats+0x94/0xa8 (P)
> 6 [ 128.074688] arm_smmu_attach_prepare+0x104/0x310
> 7 [ 128.079292] arm_smmu_attach_dev_ste+0x128/0x1e0
>
> The issue was exposed under heavy load when running a VFIO-based DMA
> map stress test (iova_stress).
>
> Following the addition of the arm_smmu_master_prepare_ats() [1] helper during
> device probe, failable ATS configuration (STU setup) is now handled early
> during probe. This ensures that any master reaching the attach phase is
> guaranteed to have a valid ATS configuration.
>
> Update arm_smmu_enable_ats() to use the WARN() macro for any
> subsequent enablement failures during the commit phase. Since probe
> checks now preclude software configuration errors, any failure here is
> considered a kernel bug.
The commit message feels like mixing a stale background and the
real requirement (based on the latest code line). Could that DMA
map stress test still trigger the WARN_ON in pci_disable_ats(),
after having arm_smmu_master_prepare_ats()?
It'd be nicer if the writing can be simplified a bit.
> arm_smmu_atc_inv_master(master, IOMMU_NO_PASID);
> - if (pci_enable_ats(pdev, stu))
> - dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
> +
> + /*
> + * Any failure at this point is a kernel bug. pci_ats_supported()
> + * and pci_prepare_ats() have already verified the hardware capability
> + * and programmed the STU. Thus, pci_enable_ats() should not fail here.
> + */
The patch that removes pci_ats_supported() from pci_prepare_ats()
is dropped in this v6. So, my previous comments may stay true and
the two lines can be enough?
/*
* As pci_prepare_ats() have already verified the hardware capability
* and programmed the STE, pci_enable_ats() should not fail here.
*/
> + WARN(pci_enable_ats(pdev, stu),
> + "Failed to enable ATS (STU %zu)\n", stu);
https://sashiko.dev/#/patchset/20260529111208.387412-1-praan%40google.com
Please check Sashiko review (for other patches in this series too).
I think it'd be cleaner to just have:
- if (pci_enable_ats(pdev, stu))
+ if (WARN_ON(pci_enable_ats(pdev, stu)))
Nicolin
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
2026-05-29 11:12 ` [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
@ 2026-05-29 21:56 ` Nicolin Chen
2026-05-31 17:06 ` Pranjal Shrivastava
0 siblings, 1 reply; 15+ messages in thread
From: Nicolin Chen @ 2026-05-29 21:56 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
David Matlack, Samiullah Khawaja, Daniel Mentz, Pasha Tatashin,
Mostafa Saleh
On Fri, May 29, 2026 at 11:12:05AM +0000, Pranjal Shrivastava wrote:
> Currently, pci_prepare_ats() internally calls pci_ats_supported() and
> returns -EINVAL if the device does not support ATS. While this provides
> a silent safety check, it conflates support detection with configuration.
>
> Update pci_prepare_ats() to wrap the internal pci_ats_supported check in
> a WARN_ON(). This mandates all callers to call pci_prepare_ats() only if
> the function supports ATS.
>
> Update the function documentation to mention that callers must verify
> ATS support (via pci_ats_supported()) before calling pci_prepare_ats().
>
> Suggested-by: Baolu Lu <baolu.lu@linux.intel.com>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
Sashiko pointed out a bisect issue. So, you might want to reorder
a bit. The patch itself looks good to me.
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
2026-05-29 21:56 ` Nicolin Chen
@ 2026-05-31 17:06 ` Pranjal Shrivastava
0 siblings, 0 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-31 17:06 UTC (permalink / raw)
To: Nicolin Chen
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
David Matlack, Samiullah Khawaja, Daniel Mentz, Pasha Tatashin,
Mostafa Saleh
On Fri, May 29, 2026 at 02:56:01PM -0700, Nicolin Chen wrote:
> On Fri, May 29, 2026 at 11:12:05AM +0000, Pranjal Shrivastava wrote:
> > Currently, pci_prepare_ats() internally calls pci_ats_supported() and
> > returns -EINVAL if the device does not support ATS. While this provides
> > a silent safety check, it conflates support detection with configuration.
> >
> > Update pci_prepare_ats() to wrap the internal pci_ats_supported check in
> > a WARN_ON(). This mandates all callers to call pci_prepare_ats() only if
> > the function supports ATS.
> >
> > Update the function documentation to mention that callers must verify
> > ATS support (via pci_ats_supported()) before calling pci_prepare_ats().
> >
> > Suggested-by: Baolu Lu <baolu.lu@linux.intel.com>
> > Signed-off-by: Pranjal Shrivastava <praan@google.com>
>
> Sashiko pointed out a bisect issue. So, you might want to reorder
> a bit. The patch itself looks good to me.
>
Ack. I'll re-order for bisectibility.
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Thanks!
Praan
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting
2026-05-29 21:51 ` Nicolin Chen
@ 2026-05-31 17:13 ` Pranjal Shrivastava
0 siblings, 0 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-05-31 17:13 UTC (permalink / raw)
To: Nicolin Chen
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
David Matlack, Samiullah Khawaja, Daniel Mentz, Pasha Tatashin,
Mostafa Saleh
On Fri, May 29, 2026 at 02:51:52PM -0700, Nicolin Chen wrote:
> On Fri, May 29, 2026 at 11:12:06AM +0000, Pranjal Shrivastava wrote:
> > The SMMUv3 driver currently has a two-phase commit in its ATS enablement
> > flow. During arm_smmu_attach_prepare(), it predicts whether ATS will be
> > enabled using arm_smmu_ats_supported() and accordingly increments
> > nr_ats_masters and merges ATS invalidations into the domain's invs array.
> >
> > However, the actual hardware enablement via pci_enable_ats() happens
> > later in arm_smmu_attach_commit(). If this call to pci_enable_ats fails,
> > the SMMU driver's ATS state tracking remains polluted, i.e., the driver
> > tracks ATS as enabled on a master that is not actually using it. This
> > leads to an incorrect nr_ats_masters and triggers a warning in the PCI
> > core during detach:
> >
> > 1 [ 127.925080] ------------[ cut here ]------------
> > 2 [ 127.925084] WARNING: drivers/pci/ats.c:132 at pci_disable_ats+0x94/0xa8
> > 3 ...
> > 4 [ 128.068169] Call trace:
> > 5 [ 128.070603] pci_disable_ats+0x94/0xa8 (P)
> > 6 [ 128.074688] arm_smmu_attach_prepare+0x104/0x310
> > 7 [ 128.079292] arm_smmu_attach_dev_ste+0x128/0x1e0
> >
> > The issue was exposed under heavy load when running a VFIO-based DMA
> > map stress test (iova_stress).
> >
> > Following the addition of the arm_smmu_master_prepare_ats() [1] helper during
> > device probe, failable ATS configuration (STU setup) is now handled early
> > during probe. This ensures that any master reaching the attach phase is
> > guaranteed to have a valid ATS configuration.
> >
> > Update arm_smmu_enable_ats() to use the WARN() macro for any
> > subsequent enablement failures during the commit phase. Since probe
> > checks now preclude software configuration errors, any failure here is
> > considered a kernel bug.
>
> The commit message feels like mixing a stale background and the
> real requirement (based on the latest code line). Could that DMA
> map stress test still trigger the WARN_ON in pci_disable_ats(),
> after having arm_smmu_master_prepare_ats()?
>
> It'd be nicer if the writing can be simplified a bit.
Ack. I'll re-word and remove stale context.
>
> > arm_smmu_atc_inv_master(master, IOMMU_NO_PASID);
> > - if (pci_enable_ats(pdev, stu))
> > - dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
> > +
> > + /*
> > + * Any failure at this point is a kernel bug. pci_ats_supported()
> > + * and pci_prepare_ats() have already verified the hardware capability
> > + * and programmed the STU. Thus, pci_enable_ats() should not fail here.
> > + */
>
> The patch that removes pci_ats_supported() from pci_prepare_ats()
> is dropped in this v6. So, my previous comments may stay true and
> the two lines can be enough?
>
> /*
> * As pci_prepare_ats() have already verified the hardware capability
> * and programmed the STE, pci_enable_ats() should not fail here.
> */
>
> > + WARN(pci_enable_ats(pdev, stu),
> > + "Failed to enable ATS (STU %zu)\n", stu);
Ack. I'll update this.
>
> https://sashiko.dev/#/patchset/20260529111208.387412-1-praan%40google.com
> Please check Sashiko review (for other patches in this series too).
Yup, already sent out a series [1] to address Sashiko findings
separately.
>
> I think it'd be cleaner to just have:
>
> - if (pci_enable_ats(pdev, stu))
> + if (WARN_ON(pci_enable_ats(pdev, stu)))
Sure.. I'll also maybe keep the dev_err log that we have, knowing STU
mismatch is slightly helpful.
Thanks,
Praan
[1] https://lore.kernel.org/all/20260531170254.60493-1-praan@google.com/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
2026-05-29 11:12 ` [PATCH v6 6/6] iommu/amd: " Pranjal Shrivastava
@ 2026-06-01 6:00 ` Ankit Soni
2026-06-01 6:20 ` Pranjal Shrivastava
0 siblings, 1 reply; 15+ messages in thread
From: Ankit Soni @ 2026-06-01 6:00 UTC (permalink / raw)
To: Pranjal Shrivastava
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh
On Fri, May 29, 2026 at 11:12:08AM +0000, Pranjal Shrivastava wrote:
> Update the AMD IOMMU driver to handle ATS configuration and enablement
> more strictly. Specifically, update the device probe to fail if
> pci_prepare_ats() returns an error. This ensures that any ATS-capable
> master reaching the attach phase is guaranteed to have a valid config.
>
> Additionally, update pdev_enable_cap_ats() to WARN_ON() if pci_enable_ats
> fails. Since earlier checks in the probe phase preclude config-related
> failures, any failure during hardware enablement is considered a kernel
> bug.
>
> Fix a pre-existing Use-After-Free risk by ensuring iommu_ignore_device()
> is called on all probe failures after iommu_init_device().
>
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
> drivers/iommu/amd/iommu.c | 30 ++++++++++++++++++++++++------
> 1 file changed, 24 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 84cad43dc188..b74c4504bee3 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -570,10 +570,16 @@ static inline int pdev_enable_cap_ats(struct pci_dev *pdev)
> if (amd_iommu_iotlb_sup &&
> (dev_data->flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP)) {
> ret = pci_enable_ats(pdev, PAGE_SHIFT);
> - if (!ret) {
> - dev_data->ats_enabled = 1;
> - dev_data->ats_qdep = pci_ats_queue_depth(pdev);
> - }
> + /*
> + * pci_enable_ats() should not fail here because earlier checks
> + * have already verified support and configuration.
> + */
> + if (WARN_ON(ret))
> + return ret;
> +
> + dev_data->ats_enabled = 1;
> + dev_data->ats_qdep = pci_ats_queue_depth(pdev);
> + ret = 0;
> }
>
> return ret;
> @@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
> else
> dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
>
> - if (dev_is_pci(dev))
> - pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> + if (dev_is_pci(dev)) {
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + if (pci_ats_supported(pdev)) {
> + ret = pci_prepare_ats(pdev, PAGE_SHIFT);
> + if (ret) {
> + iommu_dev = ERR_PTR(ret);
> + goto out_err;
> + }
> + }
> + }
>
> out_err:
> + if (IS_ERR(iommu_dev))
> + iommu_ignore_device(iommu, dev);
> +
> return iommu_dev;
> }
>
Hi,
This regresses IRQ remapping in the PD_MODE_NONE branch. By design
rlookup_table[devid] must stay valid for IR - init.c:2257 documents
this: "Do not return an error to enable IRQ remapping ...". Pre-patch
the PD_MODE_NONE branch returned ERR_PTR(-ENODEV) without nulling
rlookup, precisely so irq_remapping_alloc() / __rlookup_amd_iommu()
keep working; this unconditional cleanup violates that.
The new pci_prepare_ats() failure path has the same shape:
amd_iommu_set_pci_msi_domain() ran earlier and parented dev->msi_domain
on iommu->ir_domain, but on this new out_err that's not unwound. So
nulling rlookup_table[devid] makes irq_remapping_alloc() return -EINVAL
on the first MSI alloc for the device. Sashiko also flagged this in [1];
Also if iommu_init_device() branch fails, iommu_ignore_device() will be
called twice.
[1] https://lore.kernel.org/r/20260529153216.2AD1E1F00899@smtp.kernel.org
-Ankit
> --
> 2.54.0.823.g6e5bcc1fc9-goog
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
2026-06-01 6:00 ` Ankit Soni
@ 2026-06-01 6:20 ` Pranjal Shrivastava
2026-06-01 8:17 ` Ankit Soni
2026-06-01 9:28 ` Vasant Hegde
0 siblings, 2 replies; 15+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 6:20 UTC (permalink / raw)
To: Ankit Soni
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh
On Mon, Jun 01, 2026 at 06:00:15AM +0000, Ankit Soni wrote:
> > @@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
> > else
> > dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
> >
> > - if (dev_is_pci(dev))
> > - pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> > + if (dev_is_pci(dev)) {
> > + struct pci_dev *pdev = to_pci_dev(dev);
> > +
> > + if (pci_ats_supported(pdev)) {
> > + ret = pci_prepare_ats(pdev, PAGE_SHIFT);
> > + if (ret) {
> > + iommu_dev = ERR_PTR(ret);
> > + goto out_err;
> > + }
> > + }
> > + }
> >
> > out_err:
> > + if (IS_ERR(iommu_dev))
> > + iommu_ignore_device(iommu, dev);
> > +
> > return iommu_dev;
> > }
> >
>
> Hi,
> This regresses IRQ remapping in the PD_MODE_NONE branch. By design
> rlookup_table[devid] must stay valid for IR - init.c:2257 documents
> this: "Do not return an error to enable IRQ remapping ...". Pre-patch
> the PD_MODE_NONE branch returned ERR_PTR(-ENODEV) without nulling
> rlookup, precisely so irq_remapping_alloc() / __rlookup_amd_iommu()
> keep working; this unconditional cleanup violates that.
> The new pci_prepare_ats() failure path has the same shape:
> amd_iommu_set_pci_msi_domain() ran earlier and parented dev->msi_domain
> on iommu->ir_domain, but on this new out_err that's not unwound. So
> nulling rlookup_table[devid] makes irq_remapping_alloc() return -EINVAL
> on the first MSI alloc for the device. Sashiko also flagged this in [1];
>
> Also if iommu_init_device() branch fails, iommu_ignore_device() will be
> called twice.
>
Hi Ankit,
Ack. Sashiko made me realize that this regresses IRQ mapping for AMD,
and I agree that the call to iommu_ignore_device() is a bit too
aggressive as it wipes the rlookup_table entry required for IRQ
remapping, particularly in PD_MODE_NONE.
I was thinkig to address this in the next version as follows:
1. Split the probe error paths:
- Proper init failures (like iommu_init_device) will continue to call
iommu_ignore_device(). I will fix the double invocation here.
- Config failures (like ATS mismatch or PD_MODE_NONE) will return an
an error but skip caling iommu_ignore_device(), preserving the
rlookup_table entry for IRQ remapping.
2. Resolve the Use-After-Free (UAF):
To prevent the UAF on the "DMA-only" failure path, I will ensure that
the hardware Device Table Entry (DTE) is set to a safe state (like
blocked or bypass) and the dev_data->dev pointer is cleared, as the
IOMMU core does not invoke release_device() after a probe failure.
3. Fix iommu_ignore_device() infrastructure:
I will address the pre-existing bugs identified by Sashiko:
- Fix clearing order (calling setup_aliases before clearing the
rlookup_table).
- Replace the non-atomic memset() on the hardware dev_table with an
atomic DTE update.
That said, I'm investigating the safest way to revert the MSI domain
assignment on probe failure to avoid the dangling domain issue pointed
out by Sashiko. Maybe we can add an amd_iommu_restore_msi_domain() helper
to revert the assignment made in amd_iommu_set_pci_msi_domain() on probe
failure?
Please, let me know if that sounds okay?
Also, I'm wondering if I should send this as a separate series specific
to AMD which is unrelated to this one? Or maybe handle AMD IOMMU in a
separate series altogether. Let me know if you (and Vasanth / Suravee)
would prefer that?
Thanks,
Pranjal
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
2026-06-01 6:20 ` Pranjal Shrivastava
@ 2026-06-01 8:17 ` Ankit Soni
2026-06-01 9:28 ` Vasant Hegde
1 sibling, 0 replies; 15+ messages in thread
From: Ankit Soni @ 2026-06-01 8:17 UTC (permalink / raw)
To: Pranjal Shrivastava, Vasant Hegde
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh
On Mon, Jun 01, 2026 at 06:20:58AM +0000, Pranjal Shrivastava wrote:
> On Mon, Jun 01, 2026 at 06:00:15AM +0000, Ankit Soni wrote:
> > > @@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
> > > else
> > > dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
> > >
> > > - if (dev_is_pci(dev))
> > > - pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
> > > + if (dev_is_pci(dev)) {
> > > + struct pci_dev *pdev = to_pci_dev(dev);
> > > +
> > > + if (pci_ats_supported(pdev)) {
> > > + ret = pci_prepare_ats(pdev, PAGE_SHIFT);
> > > + if (ret) {
> > > + iommu_dev = ERR_PTR(ret);
> > > + goto out_err;
> > > + }
> > > + }
> > > + }
> > >
> > > out_err:
> > > + if (IS_ERR(iommu_dev))
> > > + iommu_ignore_device(iommu, dev);
> > > +
> > > return iommu_dev;
> > > }
> > >
> >
> > Hi,
> > This regresses IRQ remapping in the PD_MODE_NONE branch. By design
> > rlookup_table[devid] must stay valid for IR - init.c:2257 documents
> > this: "Do not return an error to enable IRQ remapping ...". Pre-patch
> > the PD_MODE_NONE branch returned ERR_PTR(-ENODEV) without nulling
> > rlookup, precisely so irq_remapping_alloc() / __rlookup_amd_iommu()
> > keep working; this unconditional cleanup violates that.
> > The new pci_prepare_ats() failure path has the same shape:
> > amd_iommu_set_pci_msi_domain() ran earlier and parented dev->msi_domain
> > on iommu->ir_domain, but on this new out_err that's not unwound. So
> > nulling rlookup_table[devid] makes irq_remapping_alloc() return -EINVAL
> > on the first MSI alloc for the device. Sashiko also flagged this in [1];
> >
> > Also if iommu_init_device() branch fails, iommu_ignore_device() will be
> > called twice.
> >
>
> Hi Ankit,
>
> Ack. Sashiko made me realize that this regresses IRQ mapping for AMD,
> and I agree that the call to iommu_ignore_device() is a bit too
> aggressive as it wipes the rlookup_table entry required for IRQ
> remapping, particularly in PD_MODE_NONE.
>
> I was thinkig to address this in the next version as follows:
>
> 1. Split the probe error paths:
> - Proper init failures (like iommu_init_device) will continue to call
> iommu_ignore_device(). I will fix the double invocation here.
>
> - Config failures (like ATS mismatch or PD_MODE_NONE) will return an
> an error but skip caling iommu_ignore_device(), preserving the
> rlookup_table entry for IRQ remapping.
>
> 2. Resolve the Use-After-Free (UAF):
> To prevent the UAF on the "DMA-only" failure path, I will ensure that
> the hardware Device Table Entry (DTE) is set to a safe state (like
> blocked or bypass) and the dev_data->dev pointer is cleared, as the
> IOMMU core does not invoke release_device() after a probe failure.
>
> 3. Fix iommu_ignore_device() infrastructure:
> I will address the pre-existing bugs identified by Sashiko:
> - Fix clearing order (calling setup_aliases before clearing the
> rlookup_table).
> - Replace the non-atomic memset() on the hardware dev_table with an
> atomic DTE update.
>
> That said, I'm investigating the safest way to revert the MSI domain
> assignment on probe failure to avoid the dangling domain issue pointed
> out by Sashiko. Maybe we can add an amd_iommu_restore_msi_domain() helper
> to revert the assignment made in amd_iommu_set_pci_msi_domain() on probe
> failure?
>
> Please, let me know if that sounds okay?
>
> Also, I'm wondering if I should send this as a separate series specific
> to AMD which is unrelated to this one? Or maybe handle AMD IOMMU in a
> separate series altogether. Let me know if you (and Vasanth / Suravee)
> would prefer that?
>
> Thanks,
> Pranjal
Hi Pranjal,
Plan looks good. One pushback: I don't think you need the
amd_iommu_restore_msi_domain() helper.
If point 1 preserves rlookup_table on the PD_MODE_NONE and
pci_prepare_ats() failure paths, dev->msi_domain pointing at
iommu->ir_domain stays functional - irq_remapping_alloc() /
__rlookup_amd_iommu() find the iommu and the chain works.
So fixing rlookup makes the MSI assignment correct,
not dangling - no restore needed.
On splitting: While patches 1-5 are essentially settled. I'd lean
toward pulling AMD into a separate follow-up so the rest doesn't wait,
but defer to Vasant/Suravee on that.
+Vasant
Thanks,
Ankit
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v6 6/6] iommu/amd: Fail probe on ATS configuration failure
2026-06-01 6:20 ` Pranjal Shrivastava
2026-06-01 8:17 ` Ankit Soni
@ 2026-06-01 9:28 ` Vasant Hegde
1 sibling, 0 replies; 15+ messages in thread
From: Vasant Hegde @ 2026-06-01 9:28 UTC (permalink / raw)
To: Pranjal Shrivastava, Ankit Soni
Cc: iommu, linux-pci, linux-arm-kernel, linux-kernel, Joerg Roedel,
Will Deacon, Bjorn Helgaas, David Woodhouse, Lu Baolu,
Robin Murphy, Suravee Suthikulpanit, Jason Gunthorpe,
Nicolin Chen, David Matlack, Samiullah Khawaja, Daniel Mentz,
Pasha Tatashin, Mostafa Saleh
Hi Pranjal,
On 6/1/2026 11:50 AM, Pranjal Shrivastava wrote:
> On Mon, Jun 01, 2026 at 06:00:15AM +0000, Ankit Soni wrote:
>>> @@ -2502,10 +2508,22 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
>>> else
>>> dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
>>>
>>> - if (dev_is_pci(dev))
>>> - pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
>>> + if (dev_is_pci(dev)) {
>>> + struct pci_dev *pdev = to_pci_dev(dev);
>>> +
>>> + if (pci_ats_supported(pdev)) {
>>> + ret = pci_prepare_ats(pdev, PAGE_SHIFT);
>>> + if (ret) {
>>> + iommu_dev = ERR_PTR(ret);
>>> + goto out_err;
>>> + }
>>> + }
>>> + }
>>>
>>> out_err:
>>> + if (IS_ERR(iommu_dev))
>>> + iommu_ignore_device(iommu, dev);
>>> +
>>> return iommu_dev;
>>> }
>>>
>>
>> Hi,
>> This regresses IRQ remapping in the PD_MODE_NONE branch. By design
>> rlookup_table[devid] must stay valid for IR - init.c:2257 documents
>> this: "Do not return an error to enable IRQ remapping ...". Pre-patch
>> the PD_MODE_NONE branch returned ERR_PTR(-ENODEV) without nulling
>> rlookup, precisely so irq_remapping_alloc() / __rlookup_amd_iommu()
>> keep working; this unconditional cleanup violates that.
>> The new pci_prepare_ats() failure path has the same shape:
>> amd_iommu_set_pci_msi_domain() ran earlier and parented dev->msi_domain
>> on iommu->ir_domain, but on this new out_err that's not unwound. So
>> nulling rlookup_table[devid] makes irq_remapping_alloc() return -EINVAL
>> on the first MSI alloc for the device. Sashiko also flagged this in [1];
>>
>> Also if iommu_init_device() branch fails, iommu_ignore_device() will be
>> called twice.
>>
>
> Hi Ankit,
>
> Ack. Sashiko made me realize that this regresses IRQ mapping for AMD,
> and I agree that the call to iommu_ignore_device() is a bit too
> aggressive as it wipes the rlookup_table entry required for IRQ
> remapping, particularly in PD_MODE_NONE.
>
> I was thinkig to address this in the next version as follows:
>
> 1. Split the probe error paths:
> - Proper init failures (like iommu_init_device) will continue to call
> iommu_ignore_device(). I will fix the double invocation here.
>
> - Config failures (like ATS mismatch or PD_MODE_NONE) will return an
> an error but skip caling iommu_ignore_device(), preserving the
> rlookup_table entry for IRQ remapping.
I was reviewing v5 last Friday and decided to fix probe() code as its been long
time I wanted to cleanup this code. I have a patch series which pretty much
does this.
I haven't fixed iommu_ignore_device() code, but should be simple to fix it.
we can use amd_iommu_make_clear_dte / amd_iommu_update_dte. It will set the
dte.tv=0 and essentially blocks all DMAs.
If you already have the patches then please go ahead, else I can post the series
this week.
>
> 2. Resolve the Use-After-Free (UAF):
> To prevent the UAF on the "DMA-only" failure path, I will ensure that
> the hardware Device Table Entry (DTE) is set to a safe state (like
> blocked or bypass) and the dev_data->dev pointer is cleared, as the
> IOMMU core does not invoke release_device() after a probe failure.
>
> 3. Fix iommu_ignore_device() infrastructure:
> I will address the pre-existing bugs identified by Sashiko:
> - Fix clearing order (calling setup_aliases before clearing the
> rlookup_table).
> - Replace the non-atomic memset() on the hardware dev_table with an
> atomic DTE update.
>
> That said, I'm investigating the safest way to revert the MSI domain
> assignment on probe failure to avoid the dangling domain issue pointed
> out by Sashiko. Maybe we can add an amd_iommu_restore_msi_domain() helper
> to revert the assignment made in amd_iommu_set_pci_msi_domain() on probe
> failure?
>
> Please, let me know if that sounds okay?
>
> Also, I'm wondering if I should send this as a separate series specific
> to AMD which is unrelated to this one? Or maybe handle AMD IOMMU in a
> separate series altogether. Let me know if you (and Vasanth / Suravee)
> would prefer that?
Lets separate out AMD fixes part as its not related to this series.
If you want to keep this patch then just the "iommu_ignore_device" part that
should be OK -OR- if you want to drop entirely and pick it up with AMD specific
series that's also works for me.
-Vasant
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2026-06-01 9:28 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-05-29 11:12 [PATCH v6 0/6] iommu: Standardize ATS robustness and state tracking Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 1/6] PCI/ATS: Ensure pci_ats_supported() is PF-aware for VFs Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 2/6] PCI/ATS: Validate STU for VFs in pci_prepare_ats() Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 3/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
2026-05-29 21:56 ` Nicolin Chen
2026-05-31 17:06 ` Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 4/6] iommu/arm-smmu-v3: Standardize ATS enablement failure reporting Pranjal Shrivastava
2026-05-29 21:51 ` Nicolin Chen
2026-05-31 17:13 ` Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 5/6] iommu/vt-d: Fail probe on ATS configuration failure Pranjal Shrivastava
2026-05-29 11:12 ` [PATCH v6 6/6] iommu/amd: " Pranjal Shrivastava
2026-06-01 6:00 ` Ankit Soni
2026-06-01 6:20 ` Pranjal Shrivastava
2026-06-01 8:17 ` Ankit Soni
2026-06-01 9:28 ` Vasant Hegde
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox