From: Jason Gunthorpe <jgg@nvidia.com>
To: Will Deacon <will@kernel.org>,
Lu Baolu <baolu.lu@linux.intel.com>,
Kevin Tian <kevin.tian@intel.com>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
robin.murphy@arm.com, joro@8bytes.org, praan@google.com,
mmarrid@nvidia.com, kees@kernel.org,
Alexander.Grest@microsoft.com, baolu.lu@linux.intel.com,
smostafa@google.com, linux-arm-kernel@lists.infradead.org,
iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
bbiber@nvidia.com, skaestle@nvidia.com
Subject: Re: [PATCH rc] iommu/arm-smmu-v3: Drain in-flight fault handlers
Date: Thu, 12 Mar 2026 11:25:09 -0300 [thread overview]
Message-ID: <20260312142509.GA1586734@nvidia.com> (raw)
In-Reply-To: <abLE3rdM9YAu81tq@willie-the-truck>
On Thu, Mar 12, 2026 at 01:51:26PM +0000, Will Deacon wrote:
> On Fri, Mar 06, 2026 at 04:17:23PM -0800, Nicolin Chen wrote:
> > From: Malak Marrid <mmarrid@nvidia.com>
> >
> > When a device is switching away from a domain, either through a detach or a
> > replace operation, it must drain its IOPF queue that only contains the page
> > requests for the old domain.
> >
> > Currently, the IOPF infrastructure is used by master->stall_enabled. So the
> > stalled transaction for the old domain should be resumed/terminated. Fix it
> > properly.
> >
> > Fixes: cfea71aea921 ("iommu/arm-smmu-v3: Put iopf enablement in the domain attach path")
> > Cc: stable@vger.kernel.org
> > Co-developed-by: Barak Biber <bbiber@nvidia.com>
> > Signed-off-by: Barak Biber <bbiber@nvidia.com>
> > Co-developed-by: Stefan Kaestle <skaestle@nvidia.com>
> > Signed-off-by: Stefan Kaestle <skaestle@nvidia.com>
> > Signed-off-by: Malak Marrid <mmarrid@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 11 ++++++++++-
> > 1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index 4d00d796f0783..2176ee8bec767 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -2843,6 +2843,12 @@ static int arm_smmu_enable_iopf(struct arm_smmu_master *master,
> > if (master->iopf_refcount) {
> > master->iopf_refcount++;
> > master_domain->using_iopf = true;
> > + /*
> > + * If the device is already on the IOPF queue (domain replace),
> > + * drain in-flight fault handlers so nothing will hold the old
> > + * domain when the core switches the attach handle.
> > + */
> > + iopf_queue_flush_dev(master->dev);
>
> So this drains the iopf workqueue, but don't you still have a race with
> the hardware generating a fault on the old domain and then that only
> showing up once you've switched to the new one? What is the actual
> problem you're trying to solve with this patch?
HW doesn't generate faults on domains, it calls
iommu_report_device_fault() which calls find_fault_handler() that
uses iommu_attach_handle_get() to find the domain. It then shoves the
domain pointer onto a WQ.
The ordering is supposed to be
1) IOMMU HW starts using the new domain
2) iommu_attach_handle_get() returns the new domain
3) IOMMU driver flushes its own IRQs/queues that may be concurrently
calling iommu_attach_handle_get()
4) iopf_queue_flush_dev() to clear the iopf work queue
5) domain is freed, no pointers in WQs or other threads
So the naked iopf_queue_flush_dev() doesn't seem right, I'd expect a
synchronize_irq() (is that right for threaded IRQs?) too as the
threaded IRQ is concurrently calling iommu_attach_handle_get().
Next, something has gone wrong with the ordering of the xarray stores
in iommu_replace_device_pasid(), it doesn't follow the above since the
store for replace is after attach and flushing. Not sure how that
happened, I remember pointing this ordering out at various times..
Maybe we need to add a dedicated driver callback that does
#3 and have the core do #4 internally. The driver shouldn't disable
its iopf during attach, it should happen in the flush handler.
> > @@ -2866,8 +2872,11 @@ static void arm_smmu_disable_iopf(struct arm_smmu_master *master,
> > return;
> >
> > master->iopf_refcount--;
> > - if (master->iopf_refcount == 0)
> > + if (master->iopf_refcount == 0) {
> > + /* Drain in-flight fault handlers before removing device */
> > + iopf_queue_flush_dev(master->dev);
> > iopf_queue_remove_device(master->smmu->evtq.iopf, master->dev);
>
> Why doesn't iopf_queue_remove_device() handle the draining? Is there a
> case where you _don't_ want to drain the faults on the disable path?
Because it isn't needed, this hunk is redundant.
We never disable iopf on a master that currently has an attached iopf
capable domain. When the domain was changed all the required flushing
should have been done - there should be exactly one
iopf_queue_flush_dev() inside a driver and it must be inside the
attach flow.
Jason
next prev parent reply other threads:[~2026-03-12 14:25 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-07 0:17 [PATCH rc] iommu/arm-smmu-v3: Drain in-flight fault handlers Nicolin Chen
2026-03-12 13:51 ` Will Deacon
2026-03-12 14:25 ` Jason Gunthorpe [this message]
2026-03-24 14:04 ` Will Deacon
2026-03-24 14:17 ` Jason Gunthorpe
2026-03-24 14:35 ` Will Deacon
2026-03-24 18:53 ` Jason Gunthorpe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260312142509.GA1586734@nvidia.com \
--to=jgg@nvidia.com \
--cc=Alexander.Grest@microsoft.com \
--cc=baolu.lu@linux.intel.com \
--cc=bbiber@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=kees@kernel.org \
--cc=kevin.tian@intel.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mmarrid@nvidia.com \
--cc=nicolinc@nvidia.com \
--cc=praan@google.com \
--cc=robin.murphy@arm.com \
--cc=skaestle@nvidia.com \
--cc=smostafa@google.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.