Linux IOMMU Development
 help / color / mirror / Atom feed
* [PATCH 0/6] iommu/amd: Refactors for ATS updates
@ 2026-06-01 13:41 Pranjal Shrivastava
  2026-06-01 13:41 ` [PATCH 1/6] iommu/amd: Clear aliases before setting the rlookup_table to NULL Pranjal Shrivastava
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:41 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava

This series addresses some issues identified by Sashiko in the AMD IOMMU
driver during the review of the subsystem-wide ATS update work [1].

Patches 1-2 address the  pre-existing bugs in iommu_ignore_device()
regarding the order of alias clearing and cleaning up the DTEs. 

Patches 3-4 refine the probe error paths. As noted[2] by Ankit & Sashiko,
unconditionally calling iommu_ignore_device() on probe failure breaks
IRQ remapping for devices in PD_MODE_NONE. We now split the error paths
in probe_device to preserve interrupt mapping for non-fatal failures
while ensuring that dangling device pointers are cleared to prevent 
potential UAFs.

Finally, patch 5 implements the Fail Hard pattern being standardized
for ATS, ensuring configuration errors are caught during probe_device and
ATS enablement failures are reported with a WARN_ON().

Patch 6 is carried forward as is from the original ATS work [3] to
maintain bisectibility.

[1] https://lore.kernel.org/all/20260529111208.387412-1-praan@google.com/
[2] https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
[3] https://lore.kernel.org/all/20260529111208.387412-4-praan@google.com/

Thanks,
Praan

Pranjal Shrivastava (6):
  iommu/amd: Clear aliases before setting the rlookup_table to NULL
  iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
  iommu/amd: Split probe error paths to preserve IRQ remapping
  iommu/amd: Fix Use-After-Free in non-fatal probe error path
  iommu/amd: Fail probe on ATS configuration failure
  PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()

 drivers/iommu/amd/iommu.c | 61 ++++++++++++++++++++++++++++++++-------
 drivers/pci/ats.c         |  6 ++--
 2 files changed, 55 insertions(+), 12 deletions(-)

base-commit: 283d245468a2b61c41aa8b582f25ed5615d1c304
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/6] iommu/amd: Clear aliases before setting the rlookup_table to NULL
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
@ 2026-06-01 13:41 ` Pranjal Shrivastava
  2026-06-01 13:42 ` [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device() Pranjal Shrivastava
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:41 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava, sashiko-bot

In iommu_ignore_device(), clearing the primary devid in the lookup table
before calling setup_aliases() causes rlookup_amd_iommu() to fail for
aliases. This prevents clearing the Device Table Entries (DTEs) for
DMA aliases.

Fix this by moving the setup_aliases() call before clearing the
rlookup_table entry.

Fixes: 99fc4ac3d297 ("iommu/amd: Introduce per PCI segment alias_table")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/iommu/amd/iommu.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f1333071da10..a94de66a885e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -745,10 +745,12 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 		return;
 
 	devid = PCI_SBDF_TO_DEVID(sbdf);
-	pci_seg->rlookup_table[devid] = NULL;
-	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
 
+	/* Clear aliases while rlookup is still valid */
 	setup_aliases(iommu, dev);
+
+	pci_seg->rlookup_table[devid] = NULL;
+	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
 }
 
 
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
  2026-06-01 13:41 ` [PATCH 1/6] iommu/amd: Clear aliases before setting the rlookup_table to NULL Pranjal Shrivastava
@ 2026-06-01 13:42 ` Pranjal Shrivastava
  2026-06-01 14:19   ` Jason Gunthorpe
  2026-06-02  8:55   ` Vasant Hegde
  2026-06-01 13:42 ` [PATCH 3/6] iommu/amd: Split probe error paths to preserve IRQ remapping Pranjal Shrivastava
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:42 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava, sashiko-bot

The iommu_ignore_device() function currently uses memset() to clear
the Device Table Entry (DTE), which risks torn writes because the
hardware reads DTEs as atomic 256-bit qwords. Fix this by using
update_dte256() to perform a hardware-safe atomic clear when a live
dev_data entry is available.

Fixes: 99fc4ac3d297 ("iommu/amd: Introduce per PCI segment alias_table")
Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/iommu/amd/iommu.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a94de66a885e..9b5861e241d7 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -750,7 +750,16 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 	setup_aliases(iommu, dev);
 
 	pci_seg->rlookup_table[devid] = NULL;
-	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
+
+	/* Clear DTE if we have a live entry */
+	if (dev_data) {
+		struct dev_table_entry new = {};
+
+		amd_iommu_make_clear_dte(dev_data, &new);
+		update_dte256(iommu, dev_data, &new);
+	} else {
+		memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
+	}
 }
 
 
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/6] iommu/amd: Split probe error paths to preserve IRQ remapping
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
  2026-06-01 13:41 ` [PATCH 1/6] iommu/amd: Clear aliases before setting the rlookup_table to NULL Pranjal Shrivastava
  2026-06-01 13:42 ` [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device() Pranjal Shrivastava
@ 2026-06-01 13:42 ` Pranjal Shrivastava
  2026-06-01 13:42 ` [PATCH 4/6] iommu/amd: Fix Use-After-Free in non-fatal probe error path Pranjal Shrivastava
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:42 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava, sashiko-bot

Split the amd_iommu_probe_device() error paths into err_deinit and
out_err. Proper init failures continue to call iommu_ignore_device()
while configuration failures (like PD_MODE_NONE or ATS mismatches) skip
it to preserve the rlookup_table entry required for IRQ remapping.

Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
Suggested-by: Ankit Soni <ankit.soni@amd.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/iommu/amd/iommu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 9b5861e241d7..c3b3750d4a22 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -737,6 +737,7 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 {
 	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
+	struct iommu_dev_data *dev_data = dev_iommu_priv_get(dev);
 	struct dev_table_entry *dev_table = get_dev_table(iommu);
 	int devid, sbdf;
 
@@ -2477,8 +2478,7 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 	if (ret) {
 		dev_err(dev, "Failed to initialize - trying to proceed anyway\n");
 		iommu_dev = ERR_PTR(ret);
-		iommu_ignore_device(iommu, dev);
-		goto out_err;
+		goto err_deinit;
 	}
 
 	amd_iommu_set_pci_msi_domain(dev, iommu);
@@ -2512,6 +2512,10 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 	if (dev_is_pci(dev))
 		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
 
+	return iommu_dev;
+
+err_deinit:
+	iommu_ignore_device(iommu, dev);
 out_err:
 	return iommu_dev;
 }
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 4/6] iommu/amd: Fix Use-After-Free in non-fatal probe error path
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
                   ` (2 preceding siblings ...)
  2026-06-01 13:42 ` [PATCH 3/6] iommu/amd: Split probe error paths to preserve IRQ remapping Pranjal Shrivastava
@ 2026-06-01 13:42 ` Pranjal Shrivastava
  2026-06-01 13:42 ` [PATCH 5/6] iommu/amd: Fail probe on ATS configuration failure Pranjal Shrivastava
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:42 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava, sashiko-bot

When amd_iommu_probe_device() fails for optional features (e.g.
PD_MODE_NONE), the dev_data remains globally linked with a pointer
to the struct device. Since the IOMMU core does not invoke
release_device() after a probe failure, this could lead to a UAF once
the physical device is removed.

Fix this by introducing amd_iommu_clear_device_dte() to clear the HW
state and setting dev_data->dev = NULL on the non-fatal error path.

Reported-by: sashiko-bot@kernel.org
Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/iommu/amd/iommu.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index c3b3750d4a22..4ef6024c5a4e 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -734,6 +734,15 @@ static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
 	return 0;
 }
 
+static void amd_iommu_clear_device_dte(struct amd_iommu *iommu,
+				       struct iommu_dev_data *dev_data)
+{
+	struct dev_table_entry new = {};
+
+	amd_iommu_make_clear_dte(dev_data, &new);
+	update_dte256(iommu, dev_data, &new);
+}
+
 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 {
 	struct amd_iommu_pci_seg *pci_seg = iommu->pci_seg;
@@ -753,14 +762,10 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
 	pci_seg->rlookup_table[devid] = NULL;
 
 	/* Clear DTE if we have a live entry */
-	if (dev_data) {
-		struct dev_table_entry new = {};
-
-		amd_iommu_make_clear_dte(dev_data, &new);
-		update_dte256(iommu, dev_data, &new);
-	} else {
+	if (dev_data)
+		amd_iommu_clear_device_dte(iommu, dev_data);
+	else
 		memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
-	}
 }
 
 
@@ -2517,6 +2522,11 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 err_deinit:
 	iommu_ignore_device(iommu, dev);
 out_err:
+	if (dev_data) {
+		amd_iommu_clear_device_dte(iommu, dev_data);
+		dev_data->dev = NULL;
+	}
+
 	return iommu_dev;
 }
 
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 5/6] iommu/amd: Fail probe on ATS configuration failure
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
                   ` (3 preceding siblings ...)
  2026-06-01 13:42 ` [PATCH 4/6] iommu/amd: Fix Use-After-Free in non-fatal probe error path Pranjal Shrivastava
@ 2026-06-01 13:42 ` Pranjal Shrivastava
  2026-06-01 13:42 ` [PATCH 6/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
  2026-06-02  8:49 ` [PATCH 0/6] iommu/amd: Refactors for ATS updates Vasant Hegde
  6 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:42 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava

Update the driver to call pci_prepare_ats() after checking if
pci_ats_supported() and fail the probe_device if pci_prepare_ats()
returns an error. Additionally, update pdev_enable_cap_ats() to WARN_ON()
a failure in pci_enable_ats().

Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/iommu/amd/iommu.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 4ef6024c5a4e..783f53cb8599 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -566,10 +566,17 @@ static inline int pdev_enable_cap_ats(struct pci_dev *pdev)
 	if (amd_iommu_iotlb_sup &&
 	    (dev_data->flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP)) {
 		ret = pci_enable_ats(pdev, PAGE_SHIFT);
-		if (!ret) {
-			dev_data->ats_enabled = 1;
-			dev_data->ats_qdep    = pci_ats_queue_depth(pdev);
-		}
+
+		/*
+		 * pci_enable_ats() should not fail here because earlier
+		 * checks have already verified support & config.
+		 */
+		if (WARN_ON(ret))
+			return ret;
+
+		dev_data->ats_enabled = 1;
+		dev_data->ats_qdep    = pci_ats_queue_depth(pdev);
+		ret = 0;
 	}
 
 	return ret;
@@ -2514,8 +2521,17 @@ static struct iommu_device *amd_iommu_probe_device(struct device *dev)
 	else
 		dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
 
-	if (dev_is_pci(dev))
-		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
+	if (dev_is_pci(dev)) {
+		struct pci_dev *pdev = to_pci_dev(dev);
+
+		if (pci_ats_supported(pdev)) {
+			ret = pci_prepare_ats(pdev, PAGE_SHIFT);
+			if (ret) {
+				iommu_dev = ERR_PTR(ret);
+				goto out_err;
+			}
+		}
+	}
 
 	return iommu_dev;
 
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 6/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
                   ` (4 preceding siblings ...)
  2026-06-01 13:42 ` [PATCH 5/6] iommu/amd: Fail probe on ATS configuration failure Pranjal Shrivastava
@ 2026-06-01 13:42 ` Pranjal Shrivastava
  2026-06-02  8:49 ` [PATCH 0/6] iommu/amd: Refactors for ATS updates Vasant Hegde
  6 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-01 13:42 UTC (permalink / raw)
  To: iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Vasant Hegde, Ankit Soni,
	Jason Gunthorpe, Bjorn Helgaas, Samiullah Khawaja,
	Pranjal Shrivastava, Baolu Lu, Nicolin Chen

Currently, pci_prepare_ats() internally calls pci_ats_supported() and
returns -EINVAL if the device does not support ATS. While this provides
a silent safety check, it conflates support detection with configuration.

Update pci_prepare_ats() to wrap the internal pci_ats_supported check in
a WARN_ON(). This mandates all callers to call pci_prepare_ats() only if
the function supports ATS.

Update the function documentation to mention that callers must verify
ATS support (via pci_ats_supported()) before calling pci_prepare_ats().

Suggested-by: Baolu Lu <baolu.lu@linux.intel.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Pranjal Shrivastava <praan@google.com>
---
 drivers/pci/ats.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index ec6c8dbdc5e9..e10a0ceb2645 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -53,7 +53,9 @@ EXPORT_SYMBOL_GPL(pci_ats_supported);
  * @ps: the IOMMU page shift
  *
  * This must be done by the IOMMU driver on the PF before any VFs are created to
- * ensure that the VF can have ATS enabled.
+ * ensure that the VF can have ATS enabled. Callers must verify that ATS is
+ * supported by the device (e.g. via pci_ats_supported()) before calling this
+ * function.
  *
  * Returns 0 on success, or negative on failure.
  */
@@ -61,7 +63,7 @@ int pci_prepare_ats(struct pci_dev *dev, int ps)
 {
 	u16 ctrl;
 
-	if (!pci_ats_supported(dev))
+	if (WARN_ON(!pci_ats_supported(dev)))
 		return -EINVAL;
 
 	if (WARN_ON(dev->ats_enabled))
-- 
2.54.0.823.g6e5bcc1fc9-goog


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
  2026-06-01 13:42 ` [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device() Pranjal Shrivastava
@ 2026-06-01 14:19   ` Jason Gunthorpe
  2026-06-02  4:01     ` Pranjal Shrivastava
  2026-06-02  8:55   ` Vasant Hegde
  1 sibling, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2026-06-01 14:19 UTC (permalink / raw)
  To: Pranjal Shrivastava
  Cc: iommu, linux-pci, linux-kernel, Joerg Roedel,
	Suravee Suthikulpanit, Vasant Hegde, Ankit Soni, Bjorn Helgaas,
	Samiullah Khawaja, sashiko-bot

On Mon, Jun 01, 2026 at 01:42:00PM +0000, Pranjal Shrivastava wrote:
> The iommu_ignore_device() function currently uses memset() to clear
> the Device Table Entry (DTE), which risks torn writes because the
> hardware reads DTEs as atomic 256-bit qwords. Fix this by using
> update_dte256() to perform a hardware-safe atomic clear when a live
> dev_data entry is available.
> 
> Fixes: 99fc4ac3d297 ("iommu/amd: Introduce per PCI segment alias_table")
> Reported-by: sashiko-bot@kernel.org
> Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
>  drivers/iommu/amd/iommu.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a94de66a885e..9b5861e241d7 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -750,7 +750,16 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
>  	setup_aliases(iommu, dev);
>  
>  	pci_seg->rlookup_table[devid] = NULL;
> -	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> +
> +	/* Clear DTE if we have a live entry */
> +	if (dev_data) {
> +		struct dev_table_entry new = {};
> +
> +		amd_iommu_make_clear_dte(dev_data, &new);
> +		update_dte256(iommu, dev_data, &new);
> +	} else {
> +		memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> +	}

This seems a little weird, an ignored device shouldn't have a dev_data
really, or it will soon be freed.

I think you are better to replace the memset with a dedicated function

/* Cannot not be used on a probe'd device with a live dev_data */
disable_dte(..)
{
	struct dev_table_entry new = {};

	write_dte_lower128(ptr, new);
	write_dte_upper128(ptr, new);
}

And then this new ordering breaks the clone_aliases flow, it was
supposed to copy the 0 from the current DTE to the aliases..

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
  2026-06-01 14:19   ` Jason Gunthorpe
@ 2026-06-02  4:01     ` Pranjal Shrivastava
  0 siblings, 0 replies; 13+ messages in thread
From: Pranjal Shrivastava @ 2026-06-02  4:01 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, linux-pci, linux-kernel, Joerg Roedel,
	Suravee Suthikulpanit, Vasant Hegde, Ankit Soni, Bjorn Helgaas,
	Samiullah Khawaja, sashiko-bot

On Mon, Jun 01, 2026 at 11:19:56AM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2026 at 01:42:00PM +0000, Pranjal Shrivastava wrote:
> > The iommu_ignore_device() function currently uses memset() to clear
> > the Device Table Entry (DTE), which risks torn writes because the
> > hardware reads DTEs as atomic 256-bit qwords. Fix this by using
> > update_dte256() to perform a hardware-safe atomic clear when a live
> > dev_data entry is available.
> > 
> > Fixes: 99fc4ac3d297 ("iommu/amd: Introduce per PCI segment alias_table")
> > Reported-by: sashiko-bot@kernel.org
> > Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
> > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > ---
> >  drivers/iommu/amd/iommu.c | 11 ++++++++++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> > index a94de66a885e..9b5861e241d7 100644
> > --- a/drivers/iommu/amd/iommu.c
> > +++ b/drivers/iommu/amd/iommu.c
> > @@ -750,7 +750,16 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
> >  	setup_aliases(iommu, dev);
> >  
> >  	pci_seg->rlookup_table[devid] = NULL;
> > -	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> > +
> > +	/* Clear DTE if we have a live entry */
> > +	if (dev_data) {
> > +		struct dev_table_entry new = {};
> > +
> > +		amd_iommu_make_clear_dte(dev_data, &new);
> > +		update_dte256(iommu, dev_data, &new);
> > +	} else {
> > +		memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> > +	}
> 
> This seems a little weird, an ignored device shouldn't have a dev_data
> really, or it will soon be freed.
> 
> I think you are better to replace the memset with a dedicated function
> 
> /* Cannot not be used on a probe'd device with a live dev_data */
> disable_dte(..)
> {
> 	struct dev_table_entry new = {};
> 
> 	write_dte_lower128(ptr, new);
> 	write_dte_upper128(ptr, new);
> }
> 
> And then this new ordering breaks the clone_aliases flow, it was
> supposed to copy the 0 from the current DTE to the aliases..

Ack.. I'll fix it.

Thanks,
Praan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] iommu/amd: Refactors for ATS updates
  2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
                   ` (5 preceding siblings ...)
  2026-06-01 13:42 ` [PATCH 6/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
@ 2026-06-02  8:49 ` Vasant Hegde
  2026-06-02 12:08   ` Jason Gunthorpe
  6 siblings, 1 reply; 13+ messages in thread
From: Vasant Hegde @ 2026-06-02  8:49 UTC (permalink / raw)
  To: Pranjal Shrivastava, iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Ankit Soni, Jason Gunthorpe,
	Bjorn Helgaas, Samiullah Khawaja

Pranjal,


On 6/1/2026 7:11 PM, Pranjal Shrivastava wrote:
> This series addresses some issues identified by Sashiko in the AMD IOMMU
> driver during the review of the subsystem-wide ATS update work [1].
> 
> Patches 1-2 address the  pre-existing bugs in iommu_ignore_device()
> regarding the order of alias clearing and cleaning up the DTEs. 
> 

Thanks for the series. While this series fixes few important bugs, at the end
certain checks are still scatters. I really want to move around the code bit so
that it makes easy to read.

ex: even after this series, in probe path its calling dev_is_pci() check 3
times! Device capability is still scattered between probe() and
iommu_init_device() function.

Does something like below makes sense (untested code)? This doesn't address
iommu_ignore_device() issues. May be we should rename iommu_ignore_device() to
something liek disable_device_dma that clears TV bit and other things.


-Vasant

---<---

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 57dc8fabc7d9..c628a2e7e3a8 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -672,11 +672,12 @@ static void pdev_disable_caps(struct pci_dev *pdev)
  * This function checks if the driver got a valid device from the caller to
  * avoid dereferencing invalid pointers.
  */
-static bool check_device(struct device *dev)
+static bool iommu_lookup_device(struct device *dev,
+				struct amd_iommu **iommu_out, u16 *devid_out)
 {
 	struct amd_iommu_pci_seg *pci_seg;
-	struct amd_iommu *iommu;
-	int devid, sbdf;
+	int sbdf;
+	u16 devid;

 	if (!dev)
 		return false;
@@ -687,7 +688,8 @@ static bool check_device(struct device *dev)
 	devid = PCI_SBDF_TO_DEVID(sbdf);

 	iommu = rlookup_amd_iommu(dev);
-	if (!iommu)
+	/* Not registered yet? */
+	if (!iommu || !iommu->iommu.ops)
 		return false;

 	/* Out of our scope? */
@@ -695,25 +697,19 @@ static bool check_device(struct device *dev)
 	if (devid > pci_seg->last_bdf)
 		return false;

+	*iommu_out = iommu;
+	*devid_out = devid;
 	return true;
 }

-static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
+static struct iommu_dev_data *iommu_init_device(struct amd_iommu *iommu,
+						struct device *dev, u16 devid)
 {
 	struct iommu_dev_data *dev_data;
-	int devid, sbdf;
-
-	if (dev_iommu_priv_get(dev))
-		return 0;

-	sbdf = get_device_sbdf_id(dev);
-	if (sbdf < 0)
-		return sbdf;
-
-	devid = PCI_SBDF_TO_DEVID(sbdf);
 	dev_data = find_dev_data(iommu, devid);
 	if (!dev_data)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);

 	dev_data->dev = dev;

@@ -724,6 +720,25 @@ static int iommu_init_device(struct amd_iommu *iommu,
struct device *dev)
 	dev_iommu_priv_set(dev, dev_data);
 	setup_aliases(iommu, dev);

+	iommu_completion_wait(iommu);
+
+	return dev_data;
+}
+
+static void iommu_init_device_caps(struct iommu_dev_data *dev_data,
+				   struct device *dev,
+				   struct amd_iommu *iommu)
+{
+	if (FEATURE_NUM_INT_REMAP_SUP_2K(amd_iommu_efr2))
+		dev_data->max_irqs = MAX_IRQS_PER_TABLE_2K;
+	else
+		dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
+
+	amd_iommu_set_pci_msi_domain(dev, iommu);
+
+	if (!dev_is_pci(dev))
+		return;
+
 	/*
 	 * By default we use passthrough mode for IOMMUv2 capable device.
 	 * But if amd_iommu=force_isolation is set (e.g. to debug DMA to
@@ -731,11 +746,21 @@ static int iommu_init_device(struct amd_iommu *iommu,
struct device *dev)
 	 * it'll be forced to go into translation mode.
 	 */
 	if ((iommu_default_passthrough() || !amd_iommu_force_isolation) &&
-	    dev_is_pci(dev) && amd_iommu_gt_ppr_supported()) {
+	    amd_iommu_gt_ppr_supported()) {
 		dev_data->flags = pdev_get_caps(to_pci_dev(dev));
 	}

-	return 0;
+	/*
+	 * If IOMMU and device supports PASID then it will contain max
+	 * supported PASIDs, else it will be zero.
+	 */
+	if (amd_iommu_pasid_supported() &&
+	    pdev_pasid_supported(dev_data)) {
+		dev_data->max_pasids = min_t(u32, iommu->iommu.max_pasids,
+					     pci_max_pasids(to_pci_dev(dev)));
+	}
+
+	pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
 }

 static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
@@ -2451,43 +2476,29 @@ static struct iommu_device
*amd_iommu_probe_device(struct device *dev)
 	struct amd_iommu *iommu;
 	struct iommu_dev_data *dev_data;
 	int ret;
+	u16 devid;

-	if (!check_device(dev))
-		return ERR_PTR(-ENODEV);
-
-	iommu = rlookup_amd_iommu(dev);
-	if (!iommu)
-		return ERR_PTR(-ENODEV);
-
-	/* Not registered yet? */
-	if (!iommu->iommu.ops)
+	if (!iommu_lookup_device(dev, &iommu, &devid))
 		return ERR_PTR(-ENODEV);

 	if (dev_iommu_priv_get(dev))
 		return &iommu->iommu;

-	ret = iommu_init_device(iommu, dev);
-	if (ret) {
-		dev_err(dev, "Failed to initialize - trying to proceed anyway\n");
-		iommu_dev = ERR_PTR(ret);
+	dev_data = iommu_init_device(iommu, dev, devid);
+	if (IS_ERR(dev_data)) {
+		dev_err(dev, "Failed to initialize \n", devid);
 		iommu_ignore_device(iommu, dev);
-		goto out_err;
+		return ERR_CAST(dev_data);
 	}

-	amd_iommu_set_pci_msi_domain(dev, iommu);
+	iommu_init_device_caps(dev_data, dev, iommu);
 	iommu_dev = &iommu->iommu;

 	/*
-	 * If IOMMU and device supports PASID then it will contain max
-	 * supported PASIDs, else it will be zero.
+	 * When DMA translation is unavailable return error so the iommu core
+	 * won't attempt domain attach for this device. But interrupt-remap
+	 * is still supported. Hence do not ignore the device.
 	 */
-	dev_data = dev_iommu_priv_get(dev);
-	if (amd_iommu_pasid_supported() && dev_is_pci(dev) &&
-	    pdev_pasid_supported(dev_data)) {
-		dev_data->max_pasids = min_t(u32, iommu->iommu.max_pasids,
-					     pci_max_pasids(to_pci_dev(dev)));
-	}
-
 	if (amd_iommu_pgtable == PD_MODE_NONE) {
 		pr_warn_once("%s: DMA translation not supported by iommu.\n",
 			     __func__);
@@ -2495,16 +2506,6 @@ static struct iommu_device *amd_iommu_probe_device(struct
device *dev)
 		goto out_err;
 	}

-	iommu_completion_wait(iommu);
-
-	if (FEATURE_NUM_INT_REMAP_SUP_2K(amd_iommu_efr2))
-		dev_data->max_irqs = MAX_IRQS_PER_TABLE_2K;
-	else
-		dev_data->max_irqs = MAX_IRQS_PER_TABLE_512;
-
-	if (dev_is_pci(dev))
-		pci_prepare_ats(to_pci_dev(dev), PAGE_SHIFT);
-
 out_err:
 	return iommu_dev;
 }









> Patches 3-4 refine the probe error paths. As noted[2] by Ankit & Sashiko,
> unconditionally calling iommu_ignore_device() on probe failure breaks
> IRQ remapping for devices in PD_MODE_NONE. We now split the error paths
> in probe_device to preserve interrupt mapping for non-fatal failures
> while ensuring that dangling device pointers are cleared to prevent 
> potential UAFs.
> 
> Finally, patch 5 implements the Fail Hard pattern being standardized
> for ATS, ensuring configuration errors are caught during probe_device and
> ATS enablement failures are reported with a WARN_ON().
> 
> Patch 6 is carried forward as is from the original ATS work [3] to
> maintain bisectibility.
> 
> [1] https://lore.kernel.org/all/20260529111208.387412-1-praan@google.com/
> [2] https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
> [3] https://lore.kernel.org/all/20260529111208.387412-4-praan@google.com/
> 
> Thanks,
> Praan
> 
> Pranjal Shrivastava (6):
>   iommu/amd: Clear aliases before setting the rlookup_table to NULL
>   iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
>   iommu/amd: Split probe error paths to preserve IRQ remapping
>   iommu/amd: Fix Use-After-Free in non-fatal probe error path
>   iommu/amd: Fail probe on ATS configuration failure
>   PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats()
> 
>  drivers/iommu/amd/iommu.c | 61 ++++++++++++++++++++++++++++++++-------
>  drivers/pci/ats.c         |  6 ++--
>  2 files changed, 55 insertions(+), 12 deletions(-)
> 
> base-commit: 283d245468a2b61c41aa8b582f25ed5615d1c304


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device()
  2026-06-01 13:42 ` [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device() Pranjal Shrivastava
  2026-06-01 14:19   ` Jason Gunthorpe
@ 2026-06-02  8:55   ` Vasant Hegde
  1 sibling, 0 replies; 13+ messages in thread
From: Vasant Hegde @ 2026-06-02  8:55 UTC (permalink / raw)
  To: Pranjal Shrivastava, iommu, linux-pci, linux-kernel
  Cc: Joerg Roedel, Suravee Suthikulpanit, Ankit Soni, Jason Gunthorpe,
	Bjorn Helgaas, Samiullah Khawaja, sashiko-bot



On 6/1/2026 7:12 PM, Pranjal Shrivastava wrote:
> The iommu_ignore_device() function currently uses memset() to clear
> the Device Table Entry (DTE), which risks torn writes because the
> hardware reads DTEs as atomic 256-bit qwords. Fix this by using
> update_dte256() to perform a hardware-safe atomic clear when a live
> dev_data entry is available.
> 
> Fixes: 99fc4ac3d297 ("iommu/amd: Introduce per PCI segment alias_table")
> Reported-by: sashiko-bot@kernel.org
> Closes: https://lore.kernel.org/all/20260529153216.2AD1E1F00899@smtp.kernel.org/
> Signed-off-by: Pranjal Shrivastava <praan@google.com>
> ---
>  drivers/iommu/amd/iommu.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index a94de66a885e..9b5861e241d7 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -750,7 +750,16 @@ static void iommu_ignore_device(struct amd_iommu *iommu, struct device *dev)
>  	setup_aliases(iommu, dev);
>  
>  	pci_seg->rlookup_table[devid] = NULL;
> -	memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> +
> +	/* Clear DTE if we have a live entry */
> +	if (dev_data) {
> +		struct dev_table_entry new = {};
> +
> +		amd_iommu_make_clear_dte(dev_data, &new);
> +		update_dte256(iommu, dev_data, &new);
> +	} else {
> +		memset(&dev_table[devid], 0, sizeof(struct dev_table_entry));
> +	}
>  }
> 

Why not clear_dte_entry() that will disable the DMA for device and alias
devices? May be we should rename fucntion name as well iommu_ignore_device ->
iommu_disable_device_dma or something.


-Vasant




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] iommu/amd: Refactors for ATS updates
  2026-06-02  8:49 ` [PATCH 0/6] iommu/amd: Refactors for ATS updates Vasant Hegde
@ 2026-06-02 12:08   ` Jason Gunthorpe
  2026-06-04  8:23     ` Vasant Hegde
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2026-06-02 12:08 UTC (permalink / raw)
  To: Vasant Hegde
  Cc: Pranjal Shrivastava, iommu, linux-pci, linux-kernel, Joerg Roedel,
	Suravee Suthikulpanit, Ankit Soni, Bjorn Helgaas,
	Samiullah Khawaja

On Tue, Jun 02, 2026 at 02:19:54PM +0530, Vasant Hegde wrote:
> -static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
> +static struct iommu_dev_data *iommu_init_device(struct amd_iommu *iommu,
> +						struct device *dev, u16 devid)
>  {
>  	struct iommu_dev_data *dev_data;
> -	int devid, sbdf;
> -
> -	if (dev_iommu_priv_get(dev))
> -		return 0;
> 
> -	sbdf = get_device_sbdf_id(dev);
> -	if (sbdf < 0)
> -		return sbdf;
> -
> -	devid = PCI_SBDF_TO_DEVID(sbdf);
>  	dev_data = find_dev_data(iommu, devid);
>  	if (!dev_data)
> -		return -ENOMEM;
> +		return ERR_PTR(-ENOMEM);

Why does the AMD driver do this? The dev data is supposed to be
allocated during probe and freed during remove, it doesn't make sense
to search for it in probe - it should not exist if the kernel is
working right.

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 0/6] iommu/amd: Refactors for ATS updates
  2026-06-02 12:08   ` Jason Gunthorpe
@ 2026-06-04  8:23     ` Vasant Hegde
  0 siblings, 0 replies; 13+ messages in thread
From: Vasant Hegde @ 2026-06-04  8:23 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Pranjal Shrivastava, iommu, linux-pci, linux-kernel, Joerg Roedel,
	Suravee Suthikulpanit, Ankit Soni, Bjorn Helgaas,
	Samiullah Khawaja

Jason,


On 6/2/2026 5:38 PM, Jason Gunthorpe wrote:
> On Tue, Jun 02, 2026 at 02:19:54PM +0530, Vasant Hegde wrote:
>> -static int iommu_init_device(struct amd_iommu *iommu, struct device *dev)
>> +static struct iommu_dev_data *iommu_init_device(struct amd_iommu *iommu,
>> +						struct device *dev, u16 devid)
>>  {
>>  	struct iommu_dev_data *dev_data;
>> -	int devid, sbdf;
>> -
>> -	if (dev_iommu_priv_get(dev))
>> -		return 0;
>>
>> -	sbdf = get_device_sbdf_id(dev);
>> -	if (sbdf < 0)
>> -		return sbdf;
>> -
>> -	devid = PCI_SBDF_TO_DEVID(sbdf);
>>  	dev_data = find_dev_data(iommu, devid);
>>  	if (!dev_data)
>> -		return -ENOMEM;
>> +		return ERR_PTR(-ENOMEM);
> 
> Why does the AMD driver do this? The dev data is supposed to be
> allocated during probe and freed during remove, it doesn't make sense
> to search for it in probe - it should not exist if the kernel is
> working right.

Digging into git history and how its all evolved, I agree. We should fix up
find_dev_data() stuff. we should separate out search from allocation. I will add
separate fix once this series settles.

-Vasant


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2026-06-04  8:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-01 13:41 [PATCH 0/6] iommu/amd: Refactors for ATS updates Pranjal Shrivastava
2026-06-01 13:41 ` [PATCH 1/6] iommu/amd: Clear aliases before setting the rlookup_table to NULL Pranjal Shrivastava
2026-06-01 13:42 ` [PATCH 2/6] iommu/amd: Clear DTE with update_dte256 in iommu_ignore_device() Pranjal Shrivastava
2026-06-01 14:19   ` Jason Gunthorpe
2026-06-02  4:01     ` Pranjal Shrivastava
2026-06-02  8:55   ` Vasant Hegde
2026-06-01 13:42 ` [PATCH 3/6] iommu/amd: Split probe error paths to preserve IRQ remapping Pranjal Shrivastava
2026-06-01 13:42 ` [PATCH 4/6] iommu/amd: Fix Use-After-Free in non-fatal probe error path Pranjal Shrivastava
2026-06-01 13:42 ` [PATCH 5/6] iommu/amd: Fail probe on ATS configuration failure Pranjal Shrivastava
2026-06-01 13:42 ` [PATCH 6/6] PCI/ATS: Mandate checking pci_ats_supported() before pci_prepare_ats() Pranjal Shrivastava
2026-06-02  8:49 ` [PATCH 0/6] iommu/amd: Refactors for ATS updates Vasant Hegde
2026-06-02 12:08   ` Jason Gunthorpe
2026-06-04  8:23     ` Vasant Hegde

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox