From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <joro@8bytes.org>, <will@kernel.org>, <robin.murphy@arm.com>,
<rafael@kernel.org>, <lenb@kernel.org>, <bhelgaas@google.com>,
<iommu@lists.linux.dev>, <linux-kernel@vger.kernel.org>,
<linux-acpi@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<patches@lists.linux.dev>, <pjaroszynski@nvidia.com>,
<vsethi@nvidia.com>, <helgaas@kernel.org>,
<baolu.lu@linux.intel.com>
Subject: Re: [PATCH RFC v2 3/4] iommu: Introduce iommu_dev_reset_prepare() and iommu_dev_reset_done()
Date: Tue, 22 Jul 2025 14:58:21 -0700
Message-ID: <aIAJfYMKYKyZZRqx@Asurada-Nvidia>
In-Reply-To: <20250704154342.GN1410929@nvidia.com>
Sorry for the long delay. I've addressed all of your remarks.
Some feedback inline.
On Fri, Jul 04, 2025 at 12:43:42PM -0300, Jason Gunthorpe wrote:
> On Sat, Jun 28, 2025 at 12:42:41AM -0700, Nicolin Chen wrote:
>
> > - This only works for IOMMU drivers that implemented ops->blocked_domain
> > correctly with pci_disable_ats().
>
> As was in the thread, it works for everyone. Even if we install an
> empty paging domain for blocking that still will stop the ATS
> invalidations from being issued. ATS remains on but this is not a
> problem.
OK. And I am dropping this validation in the PCI patch:
/* Something wrong with the iommu driver that failed to disable ATS */
if (dev->ats_enabled)
pci_err(dev, "failed to stop ATS. ATS invalidation may time out\n");
> > @@ -2155,8 +2172,17 @@ int iommu_deferred_attach(struct device *dev, struct iommu_domain *domain)
> > int ret = 0;
> >
> > mutex_lock(&group->mutex);
> > +
> > + /*
> > + * There is a racy attach while the device is resetting. Defer it until
> > + * the iommu_dev_reset_done() that attaches the device to group->domain.
> > + */
> > + if (device_to_group_device(dev)->pending_reset)
> > + goto unlock;
> > +
> > if (dev->iommu && dev->iommu->attach_deferred)
> > ret = __iommu_attach_device(domain, dev);
> > +unlock:
> > mutex_unlock(&group->mutex);
>
> Actually looking at this some more maybe write it like:
>
> /*
> * This is called on the dma mapping fast path so avoid locking. This
> * is racy, but we have an expectation that the driver will setup its
> * DMAs inside probe while still single threaded to avoid racing.
> */
> if (dev->iommu && !READ_ONCE(dev->iommu->attach_deferred))
This triggers a build error because attach_deferred is a bit-field,
and READ_ONCE() needs to take the address of its argument. So I am
changing it from "u32 attach_deferred:1" to "bool".
And, to keep the original logic (return 0 when dev->iommu is NULL),
I think it should be:
if (!dev->iommu || !READ_ONCE(dev->iommu->attach_deferred))
> return 0;
>
> guard(mutex)(&group->mutex);
I recall Baolu mentioned that Joerg might not like the guard style
so I am keeping mutex_lock/unlock().
> if (device_to_group_device(dev)->pending_reset)
> return 0;
>
> if (!dev->iommu->attach_deferred)
> return 0;
I think this is redundant, since the fast path already checked it.
> return __iommu_attach_device(domain, dev);
>
> And of course it is already quite crazy to be doing FLR during a
> device probe so this is not a realistic scenario.
Hmm, I am not sure about that. As far as I can see,
iommu_deferred_attach() is mostly invoked from dma_alloc() or even
dma_map(), so this might not be confined to device probe?
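For illustration, a userspace sketch of the iommu_deferred_attach() flow
discussed above. It is not kernel code: a pthread mutex stands in for
group->mutex, pending_reset lives directly in the device, and
fake_attach() is a hypothetical replacement for __iommu_attach_device().
The simplified READ_ONCE() below, like the kernel macro, takes the
address of its argument, which is why it works on a bool but cannot be
applied to a bit-field. The re-check of attach_deferred under the mutex
is left out, as proposed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified READ_ONCE(): a volatile load through &(x). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for the kernel structures. */
struct dev_iommu {
	bool attach_deferred;	/* bool instead of "u32 attach_deferred:1" */
};

struct group {
	pthread_mutex_t mutex;	/* stands in for group->mutex */
};

struct device {
	struct dev_iommu *iommu;
	struct group *group;
	bool pending_reset;	/* stands in for group_device->pending_reset */
	int attach_calls;	/* counts calls to fake_attach() */
};

/* Hypothetical replacement for __iommu_attach_device(). */
static int fake_attach(struct device *dev)
{
	assert(dev->iommu != NULL);	/* only reached via the slow path */
	dev->attach_calls++;
	dev->iommu->attach_deferred = false;
	return 0;
}

static int deferred_attach(struct device *dev)
{
	int ret = 0;

	/* Lockless fast path: nothing to do without a deferred attach. */
	if (!dev->iommu || !READ_ONCE(dev->iommu->attach_deferred))
		return 0;

	pthread_mutex_lock(&dev->group->mutex);
	/* A reset is in flight: the reset-done step will attach instead. */
	if (!dev->pending_reset)
		ret = fake_attach(dev);
	pthread_mutex_unlock(&dev->group->mutex);
	return ret;
}
```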
> > + if (dev->iommu->require_direct) {
> > + dev_warn(
> > + dev,
> > + "Firmware has requested this device have a 1:1 IOMMU mapping, rejecting configuring the device without a 1:1 mapping. Contact your platform vendor.\n");
> > + return -EINVAL;
> > + }
>
> I don't think we can do this. eg on ARM all devices have RMRs inside
> VMs so this will completely break FLR inside a vm???
>
> Either ignore this condition with the rationale that we are about to
> reset it so it doesn't matter, or we need to establish a new paging
> domain for isolation purposes that has the RMR setup.
Ah, you are right. ARM MSI in a VM uses an RMR and sets this.
But doesn't that also raise the question of whether a VM with an RMR
can use the blocked_domain at all, since __iommu_device_set_domain()
has exactly the same check rejecting the blocked_domain? I am not
sure whether that has any unintended consequences, though...
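A minimal model of the first option above (skip the require_direct
check because the device is about to be reset anyway). All names are
hypothetical: set_domain() stands in for __iommu_device_set_domain(),
and pending_reset is the flag from this patch. Matching the existing
check, only the blocked domain is rejected; a paging domain can still
carry the 1:1 RMR mappings.

```c
#include <errno.h>
#include <stdbool.h>

enum domain_type { DOMAIN_BLOCKED, DOMAIN_IDENTITY, DOMAIN_PAGING };

/* Hypothetical, heavily simplified device state. */
struct device {
	bool require_direct;	/* firmware requires a 1:1 (RMR) mapping */
	bool pending_reset;	/* set between reset prepare and reset done */
	enum domain_type cur;	/* type of the currently attached domain */
};

static int set_domain(struct device *dev, enum domain_type type)
{
	/*
	 * Normally a require_direct device must not be blocked. The reset
	 * path skips the check, since the device is being reset anyway.
	 */
	if (dev->require_direct && type == DOMAIN_BLOCKED &&
	    !dev->pending_reset)
		return -EINVAL;
	dev->cur = type;
	return 0;
}
```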
> > + if (ret)
> > + goto unlock;
> > +
> > + /* Dock PASID domains to blocked_domain while retaining pasid_array */
> > + xa_lock(&group->pasid_array);
>
> Not sure we need this lock? The group mutex already prevents mutation
> of the xa list and I don't think it is allowed to call
> iommu_remove_dev_pasid() in an atomic context.
As far as I can see, only iommu_attach_handle_get() doesn't take
group->mutex, and it is a reader. So I think it's safe to drop the xa_lock.
I added this:
/*
* Dock PASID domains to blocking_domain while retaining pasid_array.
*
* The pasid_array is mostly fenced by group->mutex, except one reader
* in iommu_attach_handle_get(), so it's safe to read without xa_lock.
*/
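A small model of the docking step described in that comment, assuming a
fixed-size array in place of the pasid_array xarray and a hypothetical
hw table recording what the driver has programmed per PASID. Docking
only changes the hardware view; pasid_array itself is left intact, so
the original domains can be restored once the reset is done. The caller
is assumed to hold group->mutex (not modeled here), which is what makes
walking pasid_array without xa_lock safe.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PASID 8	/* hypothetical, tiny PASID space for the sketch */

struct domain {
	const char *name;
};

struct group {
	/* Stand-in for the pasid_array xarray: PASID -> attached domain. */
	struct domain *pasid_array[MAX_PASID];
	struct domain *blocking_domain;
};

/* Hypothetical record of what the driver has programmed per PASID. */
struct hw {
	struct domain *pasid_hw[MAX_PASID];
};

/* Caller holds group->mutex, so no writer can change pasid_array. */
static void dock_pasids(const struct group *grp, struct hw *hw)
{
	assert(grp->blocking_domain != NULL);
	for (size_t pasid = 0; pasid < MAX_PASID; pasid++)
		if (grp->pasid_array[pasid])
			hw->pasid_hw[pasid] = grp->blocking_domain;
}

/* After the reset, reprogram every PASID from the untouched array. */
static void restore_pasids(const struct group *grp, struct hw *hw)
{
	for (size_t pasid = 0; pasid < MAX_PASID; pasid++)
		if (grp->pasid_array[pasid])
			hw->pasid_hw[pasid] = grp->pasid_array[pasid];
}
```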
Thanks!
Nicolin