Linux PCI subsystem development
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <joro@8bytes.org>, <will@kernel.org>, <robin.murphy@arm.com>,
	<rafael@kernel.org>, <lenb@kernel.org>, <bhelgaas@google.com>,
	<iommu@lists.linux.dev>, <linux-kernel@vger.kernel.org>,
	<linux-acpi@vger.kernel.org>, <linux-pci@vger.kernel.org>,
	<patches@lists.linux.dev>, <pjaroszynski@nvidia.com>,
	<vsethi@nvidia.com>, <helgaas@kernel.org>,
	<baolu.lu@linux.intel.com>
Subject: Re: [PATCH RFC v2 3/4] iommu: Introduce iommu_dev_reset_prepare() and iommu_dev_reset_done()
Date: Tue, 22 Jul 2025 14:58:21 -0700
Message-ID: <aIAJfYMKYKyZZRqx@Asurada-Nvidia>
In-Reply-To: <20250704154342.GN1410929@nvidia.com>

Sorry for the long delay. I've addressed all of your remarks.

Some feedback inline.

On Fri, Jul 04, 2025 at 12:43:42PM -0300, Jason Gunthorpe wrote:
> On Sat, Jun 28, 2025 at 12:42:41AM -0700, Nicolin Chen wrote:
> 
> >  - This only works for IOMMU drivers that implemented ops->blocked_domain
> >    correctly with pci_disable_ats().
> 
> As was in the thread, it works for everyone. Even if we install an
> empty paging domain for blocking that still will stop the ATS
> invalidations from being issued. ATS remains on but this is not a
> problem.

OK. And I am dropping this validation in the PCI patch:

	/* Something wrong with the iommu driver that failed to disable ATS */
	if (dev->ats_enabled)
		pci_err(dev, "failed to stop ATS. ATS invalidation may time out\n");

> > @@ -2155,8 +2172,17 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
> >  	int ret = 0;
> >  
> >  	mutex_lock(&group->mutex);
> > +
> > +	/*
> > +	 * There is a racy attach while the device is resetting. Defer it until
> > +	 * the iommu_dev_reset_done() that attaches the device to group->domain.
> > +	 */
> > +	if (device_to_group_device(dev)->pending_reset)
> > +		goto unlock;
> > +
> >  	if (dev->iommu && dev->iommu->attach_deferred)
> >  		ret = __iommu_attach_device(domain, dev);
> > +unlock:
> >  	mutex_unlock(&group->mutex);
> 
> Actually looking at this some more maybe write it like:
> 
> /*
>  * This is called on the dma mapping fast path so avoid locking. This
>  * is racy, but we have an expectation that the driver will setup its
>  * DMAs inside probe while still single threaded to avoid racing.
>  */
> if (dev->iommu && !READ_ONCE(dev->iommu->attach_deferred))

This triggers a build error as attach_deferred is a bit-field. So I
am changing it from "u32 attach_deferred:1" to "bool" for this.

And, to keep the original logic, I think it should be:
	if (!dev->iommu || !READ_ONCE(dev->iommu->attach_deferred))
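
For illustration, a minimal userspace sketch of why the bit-field breaks
the build. It assumes a simplified READ_ONCE() that only mirrors the
volatile-cast shape of the kernel macro; the struct and helper names are
hypothetical models, not the real struct dev_iommu:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's READ_ONCE(): a volatile load via a pointer cast. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical model of the old layout: a one-bit flag. */
struct dev_iommu_bitfield {
	unsigned int attach_deferred:1;
};

/* Hypothetical model of the new layout: an addressable bool. */
struct dev_iommu_bool {
	bool attach_deferred;
};

/*
 * READ_ONCE() on dev_iommu_bitfield's attach_deferred would not compile:
 * C forbids taking the address of a bit-field, and GCC also rejects
 * typeof on one. The bool member has an address, so it works.
 */

/* Mirrors the fast-path test: true when there is nothing left to attach. */
static bool nothing_deferred(const struct dev_iommu_bool *iommu)
{
	return !iommu || !READ_ONCE(iommu->attach_deferred);
}
```

The "||" form returns early both when there is no iommu data at all and
when nothing is deferred, which matches the original condition.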

>    return 0;
> 
> guard(mutex)(&group->mutex);

I recall Baolu mentioned that Joerg might not like the guard style
so I am keeping mutex_lock/unlock().

> if (device_to_group_device(dev)->pending_reset)
>     return 0;
> 
> if (!dev->iommu->attach_deferred)
>    return 0;

I think this is redundant since the fast path already checked it.

> return __iommu_attach_device(domain, dev);
> 
> And of course it is already quite crazy to be doing FLR during a
> device probe so this is not a realistic scenario.

Hmm, I am not sure about that, as I see iommu_deferred_attach() is
mostly invoked from a dma_alloc() or even a dma_map(). So, this might
not be confined to a device probe?
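
Putting the points above together, the reworked flow can be sketched as a
userspace model. This is only an assumption-laden illustration, not the
patch itself: the model_* names are hypothetical, a pthread mutex stands
in for group->mutex, and the attach stub is assumed to clear the deferred
flag on success:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct iommu_group. */
struct model_group {
	pthread_mutex_t mutex;
};

/* Hypothetical stand-in for the device plus its iommu/group state. */
struct model_dev {
	struct model_group *group;
	bool has_iommu;        /* models dev->iommu != NULL */
	bool attach_deferred;  /* models dev->iommu->attach_deferred */
	bool pending_reset;    /* models the group device's pending_reset */
	int attach_calls;      /* how often the attach stub ran */
};

/* Stub for the real attach; assumed to clear the deferred flag on success. */
static int model_attach(struct model_dev *dev)
{
	dev->attach_calls++;
	dev->attach_deferred = false;
	return 0;
}

static int model_deferred_attach(struct model_dev *dev)
{
	int ret = 0;

	/* Lockless fast path: nothing deferred, so nothing to do. */
	if (!dev->has_iommu || !READ_ONCE(dev->attach_deferred))
		return 0;

	pthread_mutex_lock(&dev->group->mutex);
	/* While a reset is pending, leave the attach to the reset-done step. */
	if (!dev->pending_reset)
		ret = model_attach(dev);
	pthread_mutex_unlock(&dev->group->mutex);
	return ret;
}
```

In this model a call made while pending_reset is set returns 0 and leaves
attach_deferred untouched, so a later call (or the reset-done step) still
performs the attach.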

> > +	if (dev->iommu->require_direct) {
> > +		dev_warn(
> > +			dev,
> > +			"Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> > +		return -EINVAL;
> > +	}
> 
> I don't think we can do this. eg on ARM all devices have RMRs inside
> VMs so this will completely break FLR inside a vm???
> 
> Either ignore this condition with the rationale that we are about to
> reset it so it doesn't matter, or we need to establish a new paging
> domain for isolation purposes that has the RMR setup.

Ah, you are right. ARM MSI in a VM uses RMR and sets this.

But doesn't that also raise the question of whether a VM with RMRs
can use the blocked_domain at all, as __iommu_device_set_domain() has
the exact same check rejecting blocked_domain? Not sure whether there
would be some unintended consequence though...

> > +	if (ret)
> > +		goto unlock;
> > +
> > +	/* Dock PASID domains to blocked_domain while retaining pasid_array */
> > +	xa_lock(&group->pasid_array);
> 
> Not sure we need this lock? The group mutex already prevents mutation
> of the xa list and I don't think it is allowed to call
> iommu_remove_dev_pasid() in an atomic context.

As far as I can see, only iommu_attach_handle_get() doesn't take
group->mutex, and it's a reader. So I think it's safe to drop the xa_lock.

I added this:

	/*
	 * Dock PASID domains to blocking_domain while retaining pasid_array.
	 *
	 * The pasid_array is mostly fenced by group->mutex, except one reader
	 * in iommu_attach_handle_get(), so it's safe to read without xa_lock.
	 */

Thanks!
Nicolin
