* [PATCH rc v8 1/8] iommu: Fix NULL group->domain dereference in pci_dev_reset_iommu_done()
2026-04-25 1:15 [PATCH rc v8 0/8] iommu: Fix pci_dev_reset_iommu_prepare/done() Nicolin Chen
@ 2026-04-25 1:15 ` Nicolin Chen
2026-04-25 1:15 ` [PATCH rc v8 2/8] iommu: Fix kdocs of pci_dev_reset_iommu_done() Nicolin Chen
` (6 subsequent siblings)
7 siblings, 0 replies; 9+ messages in thread
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
A local sashiko review pointed out that group->domain can be NULL when the
default domain fails to allocate during the first probe, which then crashes
on the domain->ops->attach_dev dereference in __iommu_attach_device(),
invoked by pci_dev_reset_iommu_done().
pci_dev_reset_iommu_prepare() is fine, as its old_domain pointer is allowed
to be NULL. Skip the re-attach in pci_dev_reset_iommu_done() to fix the bug.
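[Editor's note] The fix reduces to a single guard. A minimal userspace sketch (hypothetical, trimmed-down structs and a helper name that do not exist in the kernel) shows the condition: re-attach only when a non-NULL RID domain is cached that differs from the blocking domain.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's iommu structures */
struct domain { int id; };

struct group {
	struct domain *domain;          /* cached RID domain, may be NULL  */
	struct domain *blocking_domain; /* parking domain during the reset */
};

/* Returns 1 if pci_dev_reset_iommu_done() should re-attach, 0 to skip */
static int should_reattach(const struct group *group)
{
	/*
	 * Leave the device parked in the blocking_domain if group->domain
	 * was never initialized (e.g. default domain allocation failed).
	 */
	return group->domain && group->domain != group->blocking_domain;
}
```

Checking both conditions in one expression relies on C's short-circuit evaluation, so the NULL case never reaches the pointer comparison.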
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db517809540..00b6a33515398 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -4025,8 +4025,13 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
if (WARN_ON(!group->blocking_domain))
return;
- /* Re-attach RID domain back to group->domain */
- if (group->domain != group->blocking_domain) {
+ /*
+ * Re-attach RID domain back to group->domain
+ *
+ * Leave the device parked in the blocking_domain if group->domain isn't
+ * initialized yet
+ */
+ if (group->domain && group->domain != group->blocking_domain) {
WARN_ON(__iommu_attach_device(group->domain, &pdev->dev,
group->blocking_domain));
}
--
2.43.0
* [PATCH rc v8 2/8] iommu: Fix kdocs of pci_dev_reset_iommu_done()
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
Remove the duplicated word. No functional change.
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 00b6a33515398..82dd806e5e6a6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3997,9 +3997,9 @@ EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
* @pdev: PCI device that has finished a reset routine
*
* After a PCIe device finishes a reset routine, it wants to restore its IOMMU
- * IOMMU activity, including new translation as well as cache invalidation, by
- * re-attaching all RID/PASID of the device's back to the domains retained in
- * the core-level structure.
+ * activity, including new translation and cache invalidation, by re-attaching
+ * all RID/PASID of the device back to the domains retained in the core-level
+ * structure.
*
* Caller must pair it with a successful pci_dev_reset_iommu_prepare().
*
--
2.43.0
* [PATCH rc v8 3/8] iommu: Replace per-group resetting_domain with per-gdev blocked flag
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
The core tracks device resetting state with a per-group resetting_domain,
while a reset is actually per group-device. Such a mismatch can cause
confusion and makes the per-gdev handling requirements hard to untangle.
Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally, while both call pci_dev_reset_iommu_prepare/done(). The
solution requires the core to track state at the group_device level too.
Introduce a 'blocked' flag to struct group_device, to allow a multi-device
group to isolate concurrent device resets independently.
As the reset routine is per gdev, it cannot clear group->resetting_domain
without iterating over the device list to ensure no other device is being
reset. Simplify it by replacing the resetting_domain with a 'recovery_cnt'
in the struct iommu_group.
No functional change intended, but this is essential for applying the
following bug fixes.
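[Editor's note] The counting scheme can be modeled in plain C (hypothetical names, locking omitted): each device flips its own blocked flag, while the group-wide counter fences concurrent attaches without walking the device list.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct group_device / struct iommu_group */
struct gdev { bool blocked; };
struct group { unsigned int recovery_cnt; };

static int reset_prepare(struct group *g, struct gdev *d)
{
	if (d->blocked)		/* re-entry on the same device */
		return -EBUSY;
	d->blocked = true;
	g->recovery_cnt++;	/* no device-list walk needed */
	return 0;
}

static void reset_done(struct group *g, struct gdev *d)
{
	if (!d->blocked)	/* prepare() was bypassed or failed */
		return;
	d->blocked = false;
	if (g->recovery_cnt)	/* guard against underflow */
		g->recovery_cnt--;
}

/* Concurrent domain attaches are rejected while any device recovers */
static bool attach_allowed(const struct group *g)
{
	return g->recovery_cnt == 0;
}
```

Because done() only decrements when its own gdev was blocked, one device finishing its reset cannot clear the fence while a sibling is still mid-reset.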
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 102 ++++++++++++++++++++++++++++++++----------
1 file changed, 78 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 82dd806e5e6a6..5b784e43ca592 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -61,14 +61,14 @@ struct iommu_group {
int id;
struct iommu_domain *default_domain;
struct iommu_domain *blocking_domain;
- /*
- * During a group device reset, @resetting_domain points to the physical
- * domain, while @domain points to the attached domain before the reset.
- */
- struct iommu_domain *resetting_domain;
struct iommu_domain *domain;
struct list_head entry;
unsigned int owner_cnt;
+ /*
+ * Number of devices in the group undergoing or awaiting recovery.
+ * If non-zero, concurrent domain attachments are rejected.
+ */
+ unsigned int recovery_cnt;
void *owner;
};
@@ -76,12 +76,32 @@ struct group_device {
struct list_head list;
struct device *dev;
char *name;
+ /*
+ * Device is blocked for a pending recovery while its group->domain is
+ * retained. This can happen when:
+ * - Device is undergoing a reset
+ */
+ bool blocked;
};
/* Iterate over each struct group_device in a struct iommu_group */
#define for_each_group_device(group, pos) \
list_for_each_entry(pos, &(group)->devices, list)
+static struct group_device *__dev_to_gdev(struct device *dev)
+{
+ struct iommu_group *group = dev->iommu_group;
+ struct group_device *gdev;
+
+ lockdep_assert_held(&group->mutex);
+
+ for_each_group_device(group, gdev) {
+ if (gdev->dev == dev)
+ return gdev;
+ }
+ return NULL;
+}
+
struct iommu_group_attribute {
struct attribute attr;
ssize_t (*show)(struct iommu_group *group, char *buf);
@@ -2191,6 +2211,8 @@ EXPORT_SYMBOL_GPL(iommu_attach_device);
int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
{
+ struct group_device *gdev;
+
/*
* This is called on the dma mapping fast path so avoid locking. This is
* racy, but we have an expectation that the driver will setup its DMAs
@@ -2201,14 +2223,18 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
guard(mutex)(&dev->iommu_group->mutex);
+ gdev = __dev_to_gdev(dev);
+ if (WARN_ON(!gdev))
+ return -ENODEV;
+
/*
- * This is a concurrent attach during a device reset. Reject it until
+ * This is a concurrent attach during device recovery. Reject it until
* pci_dev_reset_iommu_done() attaches the device to group->domain.
*
* Note that this might fail the iommu_dma_map(). But there's nothing
* more we can do here.
*/
- if (dev->iommu_group->resetting_domain)
+ if (gdev->blocked)
return -EBUSY;
return __iommu_attach_device(domain, dev, NULL);
}
@@ -2265,19 +2291,24 @@ EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
struct iommu_domain *iommu_driver_get_domain_for_dev(struct device *dev)
{
struct iommu_group *group = dev->iommu_group;
+ struct group_device *gdev;
lockdep_assert_held(&group->mutex);
+ gdev = __dev_to_gdev(dev);
+ if (WARN_ON(!gdev))
+ return NULL;
+
/*
* Driver handles the low-level __iommu_attach_device(), including the
* one invoked by pci_dev_reset_iommu_done() re-attaching the device to
* the cached group->domain. In this case, the driver must get the old
- * domain from group->resetting_domain rather than group->domain. This
+ * domain from group->blocking_domain rather than group->domain. This
* prevents it from re-attaching the device from group->domain (old) to
* group->domain (new).
*/
- if (group->resetting_domain)
- return group->resetting_domain;
+ if (gdev->blocked)
+ return group->blocking_domain;
return group->domain;
}
@@ -2436,10 +2467,10 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
return -EINVAL;
/*
- * This is a concurrent attach during a device reset. Reject it until
+ * This is a concurrent attach during device recovery. Reject it until
* pci_dev_reset_iommu_done() attaches the device to group->domain.
*/
- if (group->resetting_domain)
+ if (group->recovery_cnt)
return -EBUSY;
/*
@@ -3567,10 +3598,10 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
mutex_lock(&group->mutex);
/*
- * This is a concurrent attach during a device reset. Reject it until
+ * This is a concurrent attach during device recovery. Reject it until
* pci_dev_reset_iommu_done() attaches the device to group->domain.
*/
- if (group->resetting_domain) {
+ if (group->recovery_cnt) {
ret = -EBUSY;
goto out_unlock;
}
@@ -3660,10 +3691,10 @@ int iommu_replace_device_pasid(struct iommu_domain *domain,
mutex_lock(&group->mutex);
/*
- * This is a concurrent attach during a device reset. Reject it until
+ * This is a concurrent attach during device recovery. Reject it until
* pci_dev_reset_iommu_done() attaches the device to group->domain.
*/
- if (group->resetting_domain) {
+ if (group->recovery_cnt) {
ret = -EBUSY;
goto out_unlock;
}
@@ -3934,12 +3965,12 @@ EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
* routine wants to block any IOMMU activity: translation and ATS invalidation.
*
* This function attaches the device's RID/PASID(s) the group->blocking_domain,
- * setting the group->resetting_domain. This allows the IOMMU driver pausing any
+ * incrementing the group->recovery_cnt, to allow the IOMMU driver pausing any
* IOMMU activity while leaving the group->domain pointer intact. Later when the
* reset is finished, pci_dev_reset_iommu_done() can restore everything.
*
* Caller must use pci_dev_reset_iommu_prepare() with pci_dev_reset_iommu_done()
- * before/after the core-level reset routine, to unset the resetting_domain.
+ * before/after the core-level reset routine, to decrement the recovery_cnt.
*
* Return: 0 on success or negative error code if the preparation failed.
*
@@ -3952,6 +3983,7 @@ EXPORT_SYMBOL_NS_GPL(iommu_replace_group_handle, "IOMMUFD_INTERNAL");
int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
{
struct iommu_group *group = pdev->dev.iommu_group;
+ struct group_device *gdev;
unsigned long pasid;
void *entry;
int ret;
@@ -3961,8 +3993,12 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
guard(mutex)(&group->mutex);
+ gdev = __dev_to_gdev(&pdev->dev);
+ if (WARN_ON(!gdev))
+ return -ENODEV;
+
/* Re-entry is not allowed */
- if (WARN_ON(group->resetting_domain))
+ if (WARN_ON(gdev->blocked))
return -EBUSY;
ret = __iommu_group_alloc_blocking_domain(group);
@@ -3977,6 +4013,13 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
return ret;
}
+ /*
+ * Update gdev->blocked upon the domain change, as it is used to return
+ * the correct domain in iommu_driver_get_domain_for_dev() that might be
+ * called in a set_dev_pasid callback function.
+ */
+ gdev->blocked = true;
+
/*
* Stage PASID domains at blocking_domain while retaining pasid_array.
*
@@ -3987,7 +4030,7 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
iommu_remove_dev_pasid(&pdev->dev, pasid,
pasid_array_entry_to_domain(entry));
- group->resetting_domain = group->blocking_domain;
+ group->recovery_cnt++;
return ret;
}
EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
@@ -4009,6 +4052,7 @@ EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
void pci_dev_reset_iommu_done(struct pci_dev *pdev)
{
struct iommu_group *group = pdev->dev.iommu_group;
+ struct group_device *gdev;
unsigned long pasid;
void *entry;
@@ -4017,11 +4061,13 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
guard(mutex)(&group->mutex);
- /* pci_dev_reset_iommu_prepare() was bypassed for the device */
- if (!group->resetting_domain)
+ gdev = __dev_to_gdev(&pdev->dev);
+ if (WARN_ON(!gdev))
+ return;
+
+ if (!gdev->blocked)
return;
- /* pci_dev_reset_iommu_prepare() was not successfully called */
if (WARN_ON(!group->blocking_domain))
return;
@@ -4036,6 +4082,13 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
group->blocking_domain));
}
+ /*
+ * Update gdev->blocked upon the domain change, as it is used to return
+ * the correct domain in iommu_driver_get_domain_for_dev() that might be
+ * called in a set_dev_pasid callback function.
+ */
+ gdev->blocked = false;
+
/*
* Re-attach PASID domains back to the domains retained in pasid_array.
*
@@ -4047,7 +4100,8 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
pasid_array_entry_to_domain(entry), group, pasid,
group->blocking_domain));
- group->resetting_domain = NULL;
+ if (!WARN_ON(group->recovery_cnt == 0))
+ group->recovery_cnt--;
}
EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_done);
--
2.43.0
* [PATCH rc v8 4/8] iommu: Fix pasid attach in pci_dev_reset_iommu_prepare/done()
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
Now that the helpers handle per-gdev resets, replace the group-wide
__iommu_set_group_pasid() with a per-device set_dev_pasid() call in
pci_dev_reset_iommu_done().
Also add a max_pasids check, as other callers do.
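[Editor's note] A reduced sketch of the guard (hypothetical names; the real code iterates an xarray of PASID domains): a device's PASID entries are only touched when the device supports PASIDs at all.

```c
#include <assert.h>

/* Hypothetical trimmed-down model of the max_pasids gate */
struct dev { unsigned int max_pasids; };

static int restored;	/* counts simulated set_dev_pasid() calls */

static void restore_pasid(struct dev *d, unsigned long pasid)
{
	(void)d; (void)pasid;
	restored++;
}

static void restore_all_pasids(struct dev *d, const unsigned long *pasids,
			       int n)
{
	if (d->max_pasids == 0)
		return;	/* device has no PASID table to restore */
	for (int i = 0; i < n; i++)
		restore_pasid(d, pasids[i]);
}
```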
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/ad858513-09fc-455e-bbc5-fe38a225cc78@linux.alibaba.com/
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 25 ++++++++++++++++++-------
1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5b784e43ca592..2907d76c39c68 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -4026,9 +4026,14 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
* The pasid_array is mostly fenced by group->mutex, except one reader
* in iommu_attach_handle_get(), so it's safe to read without xa_lock.
*/
- xa_for_each_start(&group->pasid_array, pasid, entry, 1)
- iommu_remove_dev_pasid(&pdev->dev, pasid,
- pasid_array_entry_to_domain(entry));
+ if (pdev->dev.iommu->max_pasids > 0) {
+ xa_for_each_start(&group->pasid_array, pasid, entry, 1) {
+ struct iommu_domain *pasid_dom =
+ pasid_array_entry_to_domain(entry);
+
+ iommu_remove_dev_pasid(&pdev->dev, pasid, pasid_dom);
+ }
+ }
group->recovery_cnt++;
return ret;
@@ -4095,10 +4100,16 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
* The pasid_array is mostly fenced by group->mutex, except one reader
* in iommu_attach_handle_get(), so it's safe to read without xa_lock.
*/
- xa_for_each_start(&group->pasid_array, pasid, entry, 1)
- WARN_ON(__iommu_set_group_pasid(
- pasid_array_entry_to_domain(entry), group, pasid,
- group->blocking_domain));
+ if (pdev->dev.iommu->max_pasids > 0) {
+ xa_for_each_start(&group->pasid_array, pasid, entry, 1) {
+ struct iommu_domain *pasid_dom =
+ pasid_array_entry_to_domain(entry);
+
+ WARN_ON(pasid_dom->ops->set_dev_pasid(
+ pasid_dom, &pdev->dev, pasid,
+ group->blocking_domain));
+ }
+ }
if (!WARN_ON(group->recovery_cnt == 0))
group->recovery_cnt--;
--
2.43.0
* [PATCH rc v8 5/8] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done().
As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
will trigger a WARN_ON and return -EBUSY, failing the entire device reset.
On the other hand, removing the outer calls in the PCI callers is unsafe.
As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
execute custom firmware waits after their inner pcie_flr() completes. If
the IOMMU protection relies solely on the inner reset, the IOMMU will be
unblocked prematurely while the device is still resetting.
Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.
Introduce gdev->reset_depth to handle the re-entries on the same device.
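[Editor's note] The re-entrancy can be modeled with a depth counter (hypothetical trimmed-down code): only the outermost prepare() blocks the device, and only the matching outermost done() unblocks it.

```c
#include <assert.h>
#include <stdbool.h>

struct gdev {
	bool blocked;
	unsigned int reset_depth;
};

static int reset_prepare(struct gdev *d)
{
	if (d->reset_depth++)	/* nested call: already prepared */
		return 0;
	d->blocked = true;	/* real code attaches blocking_domain here */
	return 0;
}

static void reset_done(struct gdev *d)
{
	if (d->reset_depth == 0)	/* unbalanced done() */
		return;
	if (--d->reset_depth)		/* inner done(): stay blocked */
		return;
	d->blocked = false;	/* outermost done(): restore the device */
}
```

This keeps the IOMMU blocked across the whole outer reset, including any device-specific firmware waits executed after an inner pcie_flr() completes.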
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2907d76c39c68..7a5a5d3aabb65 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -82,6 +82,7 @@ struct group_device {
* - Device is undergoing a reset
*/
bool blocked;
+ unsigned int reset_depth;
};
/* Iterate over each struct group_device in a struct iommu_group */
@@ -3997,20 +3998,23 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
if (WARN_ON(!gdev))
return -ENODEV;
- /* Re-entry is not allowed */
- if (WARN_ON(gdev->blocked))
- return -EBUSY;
+ if (gdev->reset_depth++)
+ return 0;
ret = __iommu_group_alloc_blocking_domain(group);
- if (ret)
+ if (ret) {
+ gdev->reset_depth--;
return ret;
+ }
/* Stage RID domain at blocking_domain while retaining group->domain */
if (group->domain != group->blocking_domain) {
ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
group->domain);
- if (ret)
+ if (ret) {
+ gdev->reset_depth--;
return ret;
+ }
}
/*
@@ -4070,7 +4074,10 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
if (WARN_ON(!gdev))
return;
- if (!gdev->blocked)
+ /* Unbalanced done() calls would underflow the counter */
+ if (WARN_ON(gdev->reset_depth == 0))
+ return;
+ if (--gdev->reset_depth)
return;
if (WARN_ON(!group->blocking_domain))
--
2.43.0
* [PATCH rc v8 6/8] iommu: Fix ATS invalidation timeouts during __iommu_remove_group_pasid()
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
If a device is blocked, its PASID domains are already detached. Repeating
iommu_remove_dev_pasid() is unnecessary and might trigger ATS invalidation
timeouts.
Skip the iommu_remove_dev_pasid() call when gdev->blocked is set.
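[Editor's note] A minimal model of the adjusted loop (hypothetical names): blocked devices are skipped because the reset-prepare path already detached their PASIDs.

```c
#include <assert.h>
#include <stdbool.h>

struct gdev {
	bool blocked;
	unsigned int max_pasids;
};

static int removed;	/* counts simulated iommu_remove_dev_pasid() calls */

/* Hypothetical reduced model of __iommu_remove_group_pasid()'s loop */
static void remove_group_pasid(struct gdev *devs, int n)
{
	for (int i = 0; i < n; i++) {
		/*
		 * A blocked device had its PASIDs detached already;
		 * detaching again could stall on ATS invalidations to a
		 * device that is mid-reset.
		 */
		if (!devs[i].blocked && devs[i].max_pasids > 0)
			removed++;
	}
}
```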
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7a5a5d3aabb65..d0f32bd954a72 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3554,7 +3554,12 @@ static void __iommu_remove_group_pasid(struct iommu_group *group,
struct group_device *device;
for_each_group_device(group, device) {
- if (device->dev->iommu->max_pasids > 0)
+ /*
+ * A group-level detach cannot fail, even if there is a blocked
+ * device. In fact, blocked devices must be already detached for
+ * a pending device recovery.
+ */
+ if (!device->blocked && device->dev->iommu->max_pasids > 0)
iommu_remove_dev_pasid(device->dev, pasid, domain);
}
}
--
2.43.0
* [PATCH rc v8 7/8] iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
In __iommu_group_set_domain_internal(), concurrent domain attachments are
rejected when any device in the group is recovering. This is necessary to
fence concurrent attachments to a multi-device group where devices might
share the same RID due to PCI DMA alias quirks, but triggers the WARN_ON in
__iommu_group_set_domain_nofail().
Other IOMMU_SET_DOMAIN_MUST_SUCCEED callers in detach/teardown paths, such
as __iommu_group_set_core_domain and __iommu_release_dma_ownership, should
not be rejected, as the domain would be freed anyway in these nofail paths
while group->domain is still pointing to it. So pci_dev_reset_iommu_done()
could trigger a UAF when re-attaching group->domain.
Honor the IOMMU_SET_DOMAIN_MUST_SUCCEED flag by letting those callers
through the group->recovery_cnt fence, so that the group->domain pointer
gets updated. Instead, add a gdev->blocked check in the device iteration
loop to prevent any concurrent per-device detachment.
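[Editor's note] The resulting two-level fence can be sketched as follows (hypothetical trimmed-down code; the flag name models IOMMU_SET_DOMAIN_MUST_SUCCEED): failable attaches are rejected at the group level, while nofail callers proceed but skip devices parked in the blocking domain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MUST_SUCCEED 0x1	/* models IOMMU_SET_DOMAIN_MUST_SUCCEED */

struct gdev { bool blocked; bool attached_new; };

/* Hypothetical reduced model of __iommu_group_set_domain_internal() */
static int group_set_domain(unsigned int recovery_cnt, struct gdev *devs,
			    int n, unsigned int flags)
{
	/* Failable attaches are fenced while any device is recovering */
	if (recovery_cnt && !(flags & MUST_SUCCEED))
		return -EBUSY;

	for (int i = 0; i < n; i++) {
		/*
		 * A device under recovery stays in blocking_domain; the
		 * done() path re-attaches it to the updated group->domain.
		 */
		if (devs[i].blocked)
			continue;
		devs[i].attached_new = true;
	}
	return 0;
}
```

Letting the nofail path through ensures group->domain never dangles at a freed domain, which is what made the UAF possible.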
Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Closes: https://sashiko.dev/#/patchset/20260407194644.171304-1-nicolinc%40nvidia.com
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d0f32bd954a72..f21d352a67f70 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2469,9 +2469,10 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
/*
* This is a concurrent attach during device recovery. Reject it until
- * pci_dev_reset_iommu_done() attaches the device to group->domain.
+ * pci_dev_reset_iommu_done() attaches the device to group->domain, if
+ * IOMMU_SET_DOMAIN_MUST_SUCCEED is not set.
*/
- if (group->recovery_cnt)
+ if (group->recovery_cnt && !(flags & IOMMU_SET_DOMAIN_MUST_SUCCEED))
return -EBUSY;
/*
@@ -2482,6 +2483,13 @@ static int __iommu_group_set_domain_internal(struct iommu_group *group,
*/
result = 0;
for_each_group_device(group, gdev) {
+ /*
+ * Device under recovery is attached to group->blocking_domain.
+ * Don't change that. pci_dev_reset_iommu_done() will re-attach
+ * its domain to the updated group->domain, after the recovery.
+ */
+ if (gdev->blocked)
+ continue;
ret = __iommu_device_set_domain(group, gdev->dev, new_domain,
group->domain, flags);
if (ret) {
--
2.43.0
* [PATCH rc v8 8/8] iommu: Warn on premature unblock during DMA aliased sibling reset
From: Nicolin Chen @ 2026-04-25 1:15 UTC (permalink / raw)
To: joro, kevin.tian, jgg
Cc: will, robin.murphy, baolu.lu, iommu, linux-kernel, xueshuai
When two aliased siblings are in the same iommu_group, they might share the
same RID. The reset functions don't support this case, though it is unclear
whether there is a real case of having an ATS capable device on a PCI/PCI-X
bus.
Theoretically, however, if two aliased devices reset concurrently, one
might be unblocked prematurely in the middle of its reset by the other
sibling, which completes its reset first.
This isn't a regression from this series, but it's better to emit a
warning, so we can learn whether such a use case is common enough to
warrant subsequent patches covering it.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
drivers/iommu/iommu.c | 49 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f21d352a67f70..dd53cce12087c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -4057,6 +4057,41 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
}
EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
+static int __group_device_cmp_dma_alias(struct pci_dev *dev, u16 alias,
+ void *data)
+{
+ return alias == *(u16 *)data;
+}
+
+static int group_device_cmp_dma_alias(struct pci_dev *dev, u16 alias,
+ void *data)
+{
+ return pci_for_each_dma_alias(data, __group_device_cmp_dma_alias,
+ &alias);
+}
+
+static bool group_device_dma_alias_is_blocked(struct iommu_group *group,
+ struct group_device *gdev)
+{
+ struct group_device *sibling;
+
+ lockdep_assert_held(&group->mutex);
+
+ if (!dev_is_pci(gdev->dev))
+ return false;
+
+ for_each_group_device(group, sibling) {
+ if (sibling == gdev || !sibling->blocked ||
+ !dev_is_pci(sibling->dev))
+ continue;
+ if (pci_for_each_dma_alias(to_pci_dev(gdev->dev),
+ group_device_cmp_dma_alias,
+ to_pci_dev(sibling->dev)))
+ return true;
+ }
+ return false;
+}
+
/**
* pci_dev_reset_iommu_done() - Restore IOMMU after a PCI device reset is done
* @pdev: PCI device that has finished a reset routine
@@ -4096,6 +4131,20 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
if (WARN_ON(!group->blocking_domain))
return;
+ if (group_device_dma_alias_is_blocked(group, gdev)) {
+ /*
+ * FIXME: DMA aliased devices share the same RID, which would be
+ * convoluted to handle, as "gdev->blocked" is not sufficient:
+ * - "blocked" state is effectively shared across these devices
+ * - if the core skipped the blocking on the second device, the
+ * IOMMU driver's attachment state would diverge from the HW
+ * state
+ * For now, just warn and see whether real ATS use cases hit it.
+ */
+ pci_warn(pdev,
+ "DMA-aliased sibling may be prematurely unblocked\n");
+ }
+
/*
* Re-attach RID domain back to group->domain
*
--
2.43.0