public inbox for linux-kernel@vger.kernel.org
* [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
@ 2026-03-19  4:31 Nicolin Chen
  2026-03-19 11:14 ` Shuai Xue
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolin Chen @ 2026-03-19  4:31 UTC (permalink / raw)
  To: joro, kevin.tian
  Cc: will, robin.murphy, baolu.lu, jgg, iommu, linux-kernel, xueshuai

Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
internally while both are calling pci_dev_reset_iommu_prepare/done().

As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
will trigger a WARN_ON and return -EBUSY, causing the entire
device reset to fail.

On the other hand, removing the outer calls in the PCI callers is unsafe.
As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
execute custom firmware waits after their inner pcie_flr() completes. If
the IOMMU protection relies solely on the inner reset, the IOMMU will be
unblocked prematurely while the device is still resetting.

Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.

Introduce a 'reset_cnt' in struct iommu_group. Safely increment/decrement
the reference counter in the nested calls, ensuring the IOMMU domains are
only restored when the outermost reset finally completes.

Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
Cc: stable@vger.kernel.org
Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
Suggested-by: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
Changelog
 v2:
 * Fix in the helpers by allowing re-entry
 v1:
 https://lore.kernel.org/all/20260318220028.1146905-1-nicolinc@nvidia.com/

 drivers/iommu/iommu.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 35db51780954..16155097b27c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -68,6 +68,7 @@ struct iommu_group {
 	struct iommu_domain *resetting_domain;
 	struct iommu_domain *domain;
 	struct list_head entry;
+	unsigned int reset_cnt;
 	unsigned int owner_cnt;
 	void *owner;
 };
@@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
 
 	guard(mutex)(&group->mutex);
 
-	/* Re-entry is not allowed */
-	if (WARN_ON(group->resetting_domain))
-		return -EBUSY;
+	if (group->resetting_domain) {
+		group->reset_cnt++;
+		return 0;
+	}
 
 	ret = __iommu_group_alloc_blocking_domain(group);
 	if (ret)
@@ -3988,6 +3990,7 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
 				       pasid_array_entry_to_domain(entry));
 
 	group->resetting_domain = group->blocking_domain;
+	group->reset_cnt = 1;
 	return ret;
 }
 EXPORT_SYMBOL_GPL(pci_dev_reset_iommu_prepare);
@@ -4021,6 +4024,12 @@ void pci_dev_reset_iommu_done(struct pci_dev *pdev)
 	if (!group->resetting_domain)
 		return;
 
+	/* Unbalanced done() calls that would underflow the counter */
+	if (WARN_ON(group->reset_cnt == 0))
+		return;
+	if (--group->reset_cnt > 0)
+		return;
+
 	/* pci_dev_reset_iommu_prepare() was not successfully called */
 	if (WARN_ON(!group->blocking_domain))
 		return;
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-19  4:31 [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done() Nicolin Chen
@ 2026-03-19 11:14 ` Shuai Xue
  2026-03-19 21:34   ` Nicolin Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Shuai Xue @ 2026-03-19 11:14 UTC (permalink / raw)
  To: Nicolin Chen, joro, kevin.tian
  Cc: will, robin.murphy, baolu.lu, jgg, iommu, linux-kernel



On 3/19/26 12:31 PM, Nicolin Chen wrote:
> Shuai found that cxl_reset_bus_function() calls pci_reset_bus_function()
> internally while both are calling pci_dev_reset_iommu_prepare/done().
> 
> As pci_dev_reset_iommu_prepare() doesn't support re-entry, the inner call
> will trigger a WARN_ON and return -EBUSY, resulting in failing the entire
> device reset.
> 
> On the other hand, removing the outer calls in the PCI callers is unsafe.
> As pointed out by Kevin, device-specific quirks like reset_hinic_vf_dev()
> execute custom firmware waits after their inner pcie_flr() completes. If
> the IOMMU protection relies solely on the inner reset, the IOMMU will be
> unblocked prematurely while the device is still resetting.
> 
> Instead, fix this by making pci_dev_reset_iommu_prepare/done() reentrant.
> 
> Introduce a 'reset_cnt' in struct iommu_group. Safely increment/decrement
> the reference counter in the nested calls, ensuring the IOMMU domains are
> only restored when the outermost reset finally completes.
> 
> Fixes: c279e83953d9 ("iommu: Introduce pci_dev_reset_iommu_prepare/done()")
> Cc: stable@vger.kernel.org
> Reported-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Closes: https://lore.kernel.org/all/absKsk7qQOwzhpzv@Asurada-Nvidia/
> Suggested-by: Kevin Tian <kevin.tian@intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> ---
> Changelog
>   v2:
>   * Fix in the helpers by allowing re-entry
>   v1:
>   https://lore.kernel.org/all/20260318220028.1146905-1-nicolinc@nvidia.com/
> 
>   drivers/iommu/iommu.c | 15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 35db51780954..16155097b27c 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -68,6 +68,7 @@ struct iommu_group {
>   	struct iommu_domain *resetting_domain;
>   	struct iommu_domain *domain;
>   	struct list_head entry;
> +	unsigned int reset_cnt;


Nit: consider renaming reset_cnt to reset_depth or
reset_nesting to better convey that this tracks nesting level,
not a count of total resets.

>   	unsigned int owner_cnt;
>   	void *owner;
>   };
> @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
>   
>   	guard(mutex)(&group->mutex);
>   
> -	/* Re-entry is not allowed */
> -	if (WARN_ON(group->resetting_domain))
> -		return -EBUSY;
> +	if (group->resetting_domain) {
> +		group->reset_cnt++;
> +		return 0;
> +	}
>   
>   	ret = __iommu_group_alloc_blocking_domain(group);

pci_dev_reset_iommu_prepare/done() have NO singleton group check.
They operate on the specific pdev passed in, but use group-wide
state (resetting_domain, reset_cnt) to track the reset lifecycle.

Interestingly, the broken_worker in patch 3 of the ATC timeout
series DOES have an explicit singleton check:

   if (list_is_singular(&group->devices)) {
       /* Note: only support group with a single device */

This reveals an implicit assumption: the entire prepare/done
mechanism works correctly only for singleton groups. For
multi-device groups:

   - prepare() only detaches the specific pdev, leaving other
     devices in the group still attached to the original domain
   - The group-wide resetting_domain/reset_cnt state can be
     corrupted by concurrent resets on different devices (as
     discussed above)

If prepare/done is truly meant only for singleton groups, it
should enforce this explicitly:

   if (!list_is_singular(&group->devices))
       return -EOPNOTSUPP;

If it's meant to support multi-device groups, then the per-device
vs group-wide state mismatch needs to be resolved — either by
making the state per-device, or by detaching/restoring all
devices in the group together.


Thanks.
Shuai



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-19 11:14 ` Shuai Xue
@ 2026-03-19 21:34   ` Nicolin Chen
  2026-03-27  8:27     ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolin Chen @ 2026-03-19 21:34 UTC (permalink / raw)
  To: Shuai Xue
  Cc: joro, kevin.tian, will, robin.murphy, baolu.lu, jgg, iommu,
	linux-kernel

On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct pci_dev *pdev)
> >   	guard(mutex)(&group->mutex);
> > -	/* Re-entry is not allowed */
> > -	if (WARN_ON(group->resetting_domain))
> > -		return -EBUSY;
> > +	if (group->resetting_domain) {
> > +		group->reset_cnt++;
> > +		return 0;
> > +	}
> >   	ret = __iommu_group_alloc_blocking_domain(group);
> 
> pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> They operate on the specific pdev passed in, but use group-wide
> state (resetting_domain, reset_cnt) to track the reset lifecycle.
>
> Interestingly, the broken_worker in patch 3 of the ATC timeout
> series DOES have an explicit singleton check:
> 
>   if (list_is_singular(&group->devices)) {
>       /* Note: only support group with a single device */
> 
> This reveals an implicit assumption: the entire prepare/done
> mechanism works correctly only for singleton groups. For
> multi-device groups:
> 
>   - prepare() only detaches the specific pdev, leaving other
>     devices in the group still attached to the original domain
>   - The group-wide resetting_domain/reset_cnt state can be
>     corrupted by concurrent resets on different devices (as
>     discussed above)

That's a phenomenal catch!

I think we shall have a reset_ndevs in the group and a reset_depth
in the gdev.

> If prepare/done is truly meant only for singleton groups, it
> should enforce this explicitly:
> 
>   if (!list_is_singular(&group->devices))
>       return -EOPNOTSUPP;

-EOPNOTSUPP will fail all the PCI reset caller functions, though we
could change them to ignore -EOPNOTSUPP, or we could probably do:
	if (!list_is_singular(&group->devices))
		return 0;

But I feel this would be too heavy for a bug fix.

> If it's meant to support multi-device groups, then the per-device
> vs group-wide state mismatch needs to be resolved — either by
> making the state per-device, or by detaching/restoring all
> devices in the group together.

Per-gdev should work, IMHO, at least for this patch.

For iommu_report_device_broken(), I am thinking we could still try
a per-gdev WQ, by changing the mutex-protected gdev list to an RCU
one.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-19 21:34   ` Nicolin Chen
@ 2026-03-27  8:27     ` Tian, Kevin
  2026-03-27 21:08       ` Nicolin Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Tian, Kevin @ 2026-03-27  8:27 UTC (permalink / raw)
  To: Nicolin Chen, Shuai Xue
  Cc: joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	baolu.lu@linux.intel.com, jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Friday, March 20, 2026 5:35 AM
> 
> On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> > On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct
> pci_dev *pdev)
> > >   	guard(mutex)(&group->mutex);
> > > -	/* Re-entry is not allowed */
> > > -	if (WARN_ON(group->resetting_domain))
> > > -		return -EBUSY;
> > > +	if (group->resetting_domain) {
> > > +		group->reset_cnt++;
> > > +		return 0;
> > > +	}
> > >   	ret = __iommu_group_alloc_blocking_domain(group);
> >
> > pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> > They operate on the specific pdev passed in, but use group-wide
> > state (resetting_domain, reset_cnt) to track the reset lifecycle.
> >
> > Interestingly, the broken_worker in patch 3 of the ATC timeout
> > series DOES have an explicit singleton check:
> >
> >   if (list_is_singular(&group->devices)) {
> >       /* Note: only support group with a single device */
> >
> > This reveals an implicit assumption: the entire prepare/done
> > mechanism works correctly only for singleton groups. For
> > multi-device groups:
> >
> >   - prepare() only detaches the specific pdev, leaving other
> >     devices in the group still attached to the original domain

no, attach is per-group. all devices are changed together.

> >   - The group-wide resetting_domain/reset_cnt state can be
> >     corrupted by concurrent resets on different devices (as
> >     discussed above)
> 
> That's a phenomenal catch!
> 
> I think we shall have a reset_ndevs in the group and a reset_depth
> in the gdev.
> 

I didn't get how the state might be corrupted.

If there are two devices in a group both being reset, you only
want to switch back to the original domain after both devices
have finished reset.

so a reset_cnt is sufficient to cover both scenarios:

- nested prepare/done in a single device resetting path
- concurrent prepare/done in multiple devices resetting paths

v4 is way too complex. Probably it's required in the following
ATS timeout series (which I haven't thought about carefully),
but for the fix here a simple logic like v2's seems ok?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-27  8:27     ` Tian, Kevin
@ 2026-03-27 21:08       ` Nicolin Chen
  2026-03-31  7:12         ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolin Chen @ 2026-03-27 21:08 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shuai Xue, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	baolu.lu@linux.intel.com, jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org

On Fri, Mar 27, 2026 at 08:27:12AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Friday, March 20, 2026 5:35 AM
> > 
> > On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> > > On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > > > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct
> > pci_dev *pdev)
> > > >   	guard(mutex)(&group->mutex);
> > > > -	/* Re-entry is not allowed */
> > > > -	if (WARN_ON(group->resetting_domain))
> > > > -		return -EBUSY;
> > > > +	if (group->resetting_domain) {
> > > > +		group->reset_cnt++;
> > > > +		return 0;
> > > > +	}
> > > >   	ret = __iommu_group_alloc_blocking_domain(group);
> > >
> > > pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> > > They operate on the specific pdev passed in, but use group-wide
> > > state (resetting_domain, reset_cnt) to track the reset lifecycle.
> > >
> > > Interestingly, the broken_worker in patch 3 of the ATC timeout
> > > series DOES have an explicit singleton check:
> > >
> > >   if (list_is_singular(&group->devices)) {
> > >       /* Note: only support group with a single device */
> > >
> > > This reveals an implicit assumption: the entire prepare/done
> > > mechanism works correctly only for singleton groups. For
> > > multi-device groups:
> > >
> > >   - prepare() only detaches the specific pdev, leaving other
> > >     devices in the group still attached to the original domain
> 
> no, attach is per-group. all devices are changed together.

I think prepare() should use the per-device ops:
    ops->attach_dev (via __iommu_attach_device)
    ops->set_dev_pasid
right?

And it is intended to be limited to the resetting device only. Other
devices in the group can still stay in the attached domain, but
they cannot do new attachment (-EBUSY) because that's per-group.

> > >   - The group-wide resetting_domain/reset_cnt state can be
> > >     corrupted by concurrent resets on different devices (as
> > >     discussed above)
> > 
> > That's a phenomenal catch!
> > 
> > I think we shall have a reset_ndevs in the group and a reset_depth
> > in the gdev.
> > 
> 
> I didn't get how the state might be corrupted.
> 
> If there are two devices in a group both being reset, you only
> want to switch back to the original domain after both devices
> have finished reset.
> 
> so a reset_cnt is sufficient to cover both scenarios:
> 
> - nested prepare/done in a single device resetting path
> - concurrent prepare/done in multiple devices resetting paths

Since prepare() is per-device, the per-group reset_cnt wouldn't
be sufficient if one device is resetting while the other isn't.

iommu_driver_get_domain_for_dev() for example would return the
blocking_domain for both devices, but physically only the first
device is attached to blocking_domain.

> v4 is way too complex. Probably it's required in the following
> ATS timeout series (which I haven't thought about carefully).
> but for the fix here seems a simple logic like v2 does is ok?

I think v4 is the way to go, unless we fundamentally limit this
API (and the followup ATS timeout series) to singleton groups.

FWIW, I have the ATS timeout series updated (waiting for some
finalization of this base patch):
https://github.com/nicolinc/iommufd/commits/smmuv3_atc_timeout-v3/
The new iommu_report_device_broken() is changed to per-gdev.

Thanks
Nicolin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-27 21:08       ` Nicolin Chen
@ 2026-03-31  7:12         ` Tian, Kevin
  2026-03-31 12:23           ` Nicolin Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Tian, Kevin @ 2026-03-31  7:12 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shuai Xue, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	baolu.lu@linux.intel.com, jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, March 28, 2026 5:09 AM
> 
> On Fri, Mar 27, 2026 at 08:27:12AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Friday, March 20, 2026 5:35 AM
> > >
> > > On Thu, Mar 19, 2026 at 07:14:21PM +0800, Shuai Xue wrote:
> > > > On 3/19/26 12:31 PM, Nicolin Chen wrote:
> > > > > @@ -3961,9 +3962,10 @@ int pci_dev_reset_iommu_prepare(struct
> > > pci_dev *pdev)
> > > > >   	guard(mutex)(&group->mutex);
> > > > > -	/* Re-entry is not allowed */
> > > > > -	if (WARN_ON(group->resetting_domain))
> > > > > -		return -EBUSY;
> > > > > +	if (group->resetting_domain) {
> > > > > +		group->reset_cnt++;
> > > > > +		return 0;
> > > > > +	}
> > > > >   	ret = __iommu_group_alloc_blocking_domain(group);
> > > >
> > > > pci_dev_reset_iommu_prepare/done() have NO singleton group check.
> > > > They operate on the specific pdev passed in, but use group-wide
> > > > state (resetting_domain, reset_cnt) to track the reset lifecycle.
> > > >
> > > > Interestingly, the broken_worker in patch 3 of the ATC timeout
> > > > series DOES have an explicit singleton check:
> > > >
> > > >   if (list_is_singular(&group->devices)) {
> > > >       /* Note: only support group with a single device */
> > > >
> > > > This reveals an implicit assumption: the entire prepare/done
> > > > mechanism works correctly only for singleton groups. For
> > > > multi-device groups:
> > > >
> > > >   - prepare() only detaches the specific pdev, leaving other
> > > >     devices in the group still attached to the original domain
> >
> > no, attach is per-group. all devices are changed together.
> 
> I think prepare() should use the per-device ops:
>     ops->attach_dev (via __iommu_attach_device)
>     ops->set_dev_pasid
> right?

it's per-device op but the condition of calling it is per-group:

        if (group->domain != group->blocking_domain) {
                ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
                                            group->domain);
                if (ret)
                        return ret;
        }

> 
> And it is intended to limit to the resetting device only. Other
> devices in the group can still stay in the attached domain, but
> they cannot do new attachment (-EBUSY) because that's per-group.

how could other devices stay in the attached domain?

> 
> > > >   - The group-wide resetting_domain/reset_cnt state can be
> > > >     corrupted by concurrent resets on different devices (as
> > > >     discussed above)
> > >
> > > That's a phenomenal catch!
> > >
> > > I think we shall have a reset_ndevs in the group and a reset_depth
> > > in the gdev.
> > >
> >
> > I didn't get how the state might be corrupted.
> >
> > If there are two devices in a group both being reset, you only
> > want to switch back to the original domain after both devices
> > have finished reset.
> >
> > so a reset_cnt is sufficient to cover both scenarios:
> >
> > - nested prepare/done in a single device resetting path
> > - concurrent prepare/done in multiple devices resetting paths
> 
> Since prepare() is per-device, the per-group reset_cnt wouldn't
> be sufficient if one device is resetting while the other isn't.
> 
> iommu_driver_get_domain_for_dev() for example would return the
> blocking_domain for both devices, but physically only the first
> device is attached to blocking_domain.

what do you mean by 'physically'?

> 
> > v4 is way too complex. Probably it's required in the following
> > ATS timeout series (which I haven't thought about carefully).
> > but for the fix here seems a simple logic like v2 does is ok?
> 
> I think v4 is the way to go, unless we fundamentally limit this
> API (and the followup ATS timeout series) to singleton groups.

currently we don't have any per-device resetting logic. As an rc
fix I didn't see the point of making it complex.

> 
> FWIW, I have the ATS timeout series updated (waiting for some
> finalization of this base patch):
> https://github.com/nicolinc/iommufd/commits/smmuv3_atc_timeout-v3/
> The new iommu_report_device_broken() is changed to per-gdev.
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-31  7:12         ` Tian, Kevin
@ 2026-03-31 12:23           ` Nicolin Chen
  2026-04-01  8:14             ` Tian, Kevin
  0 siblings, 1 reply; 8+ messages in thread
From: Nicolin Chen @ 2026-03-31 12:23 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Shuai Xue, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	baolu.lu@linux.intel.com, jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org

On Tue, Mar 31, 2026 at 07:12:19AM +0000, Tian, Kevin wrote:
> > > > >   - prepare() only detaches the specific pdev, leaving other
> > > > >     devices in the group still attached to the original domain
> > >
> > > no, attach is per-group. all devices are changed together.
> > 
> > I think prepare() should use the per-device ops:
> >     ops->attach_dev (via __iommu_attach_device)
> >     ops->set_dev_pasid
> > right?
> 
> it's per-device op but the condition of calling it is per-group:
> 
>         if (group->domain != group->blocking_domain) {
>                 ret = __iommu_attach_device(group->blocking_domain, &pdev->dev,
>                                             group->domain);
>                 if (ret)
>                         return ret;
>         }

If the group->domain is not blocking_domain, this device will
move away from group->domain to blocking_domain, while the other
devices (in the same group) stay in group->domain. No?

> > And it is intended to limit to the resetting device only. Other
> > devices in the group can still stay in the attached domain, but
> > they cannot do new attachment (-EBUSY) because that's per-group.
> 
> how could other devices stay in the attached domain?

We call __iommu_attach_device only on one gdev, not looping
over the entire group, right?

> > Since prepare() is per-device, the per-group reset_cnt wouldn't
> > be sufficient if one device is resetting while the other isn't.
> > 
> > iommu_driver_get_domain_for_dev() for example would return the
> > blocking_domain for both devices, but physically only the first
> > device is attached to blocking_domain.
> 
> what do you mean by 'physically'?

I mean the SMMU driver will program the STE to ask the SMMU HW to
reflect the change "physically". But it won't program the other STEs.

Nicolin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done()
  2026-03-31 12:23           ` Nicolin Chen
@ 2026-04-01  8:14             ` Tian, Kevin
  0 siblings, 0 replies; 8+ messages in thread
From: Tian, Kevin @ 2026-04-01  8:14 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: Shuai Xue, joro@8bytes.org, will@kernel.org, robin.murphy@arm.com,
	baolu.lu@linux.intel.com, jgg@nvidia.com, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org

> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, March 31, 2026 8:23 PM
> 
> On Tue, Mar 31, 2026 at 07:12:19AM +0000, Tian, Kevin wrote:
> > > > > >   - prepare() only detaches the specific pdev, leaving other
> > > > > >     devices in the group still attached to the original domain
> > > >
> > > > no, attach is per-group. all devices are changed together.
> > >
> > > I think prepare() should use the per-device ops:
> > >     ops->attach_dev (via __iommu_attach_device)
> > >     ops->set_dev_pasid
> > > right?
> >
> > it's per-device op but the condition of calling it is per-group:
> >
> >         if (group->domain != group->blocking_domain) {
> >                 ret = __iommu_attach_device(group->blocking_domain, &pdev-
> >dev,
> >                                             group->domain);
> >                 if (ret)
> >                         return ret;
> >         }
> 
> If the group->domain is not blocking_domain, this device will
> move away from group->domain to blocking_domain, while other
> device (in the same group) are staying in group->domain. No?

yeah, obviously I was wrong here. will review v4.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2026-04-01  8:14 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-03-19  4:31 [PATCH rc v2] iommu: Fix nested pci_dev_reset_iommu_prepare/done() Nicolin Chen
2026-03-19 11:14 ` Shuai Xue
2026-03-19 21:34   ` Nicolin Chen
2026-03-27  8:27     ` Tian, Kevin
2026-03-27 21:08       ` Nicolin Chen
2026-03-31  7:12         ` Tian, Kevin
2026-03-31 12:23           ` Nicolin Chen
2026-04-01  8:14             ` Tian, Kevin
