From: Jason Gunthorpe <jgg@nvidia.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Joerg Roedel <joro@8bytes.org>,
Bjorn Helgaas <bhelgaas@google.com>,
"Rafael J . Wysocki" <rafael@kernel.org>,
Len Brown <lenb@kernel.org>,
Pranjal Shrivastava <praan@google.com>,
Mostafa Saleh <smostafa@google.com>,
Lu Baolu <baolu.lu@linux.intel.com>,
Kevin Tian <kevin.tian@intel.com>,
linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
linux-pci@vger.kernel.org, vsethi@nvidia.com,
Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Date: Fri, 5 Jun 2026 16:42:59 -0300 [thread overview]
Message-ID: <20260605194259.GE1962447@nvidia.com> (raw)
In-Reply-To: <ah5txKCvspL8zMeG@Asurada-Nvidia>
On Mon, Jun 01, 2026 at 10:44:36PM -0700, Nicolin Chen wrote:
> On Mon, Jun 01, 2026 at 09:15:47PM -0300, Jason Gunthorpe wrote:
> > On Mon, Jun 01, 2026 at 01:41:26PM -0700, Nicolin Chen wrote:
> > > On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> > > > On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > > > > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > > > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > > > > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > > > > each batch is built from domain->invs so it can carry the "invs";
> > > > > if the batch times out, we can immediately mutate its ATS entries.
> > > > >
> > > > > But I realized a limitation. E.g., if a device attaches to two SVA
> > > > > domains on two SSIDs. An invalidation timing out on one of the SVA
> > > > > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > > > > the other SVA domain's invs?
> > > >
> > > > You'd have to mark all the S1's sharing the STE.
> > >
> > > That would be a bit convoluted as we would have to go through all
> > > other domains' invs arrays.
> >
> > Ok, that is certainly an annoying problem.
> >
> > I don't have a better idea than storing the master unfortunately
> >
> > But I think the locking for that is going to be tricky, I'm not sure it does
> > actually fully work..
>
> Yes, there can be a race that sets STE.EATS back while per-master
> flag is set, which would skip the ATC_INV in commit(), so no more
> ATC_INV timeout that resets STE.EATS=0. To close it, we can force
> STE.EATS=0 at the end of commit() when state->ats_enabled and the
> per-master flag are both set, which is only possible in a race.
I don't see any of these options as appealing. We have to maintain a
few key invariants, and I think it cannot be done without a way to
find all the domains that are using the STE.
One way or another you have to be using the invs list rw locks to
synchronize the EATS state changes.
It is okayish to be sloppy when turning EATS off, but when turning it
back on we do need to cycle through every invs list and toggle its
lock to ensure that the invalidations are synchronized before
EATS=enable happens.
Given you must have a way to go from STE -> master -> all invs lists
I'm not sure either option really makes such a large difference.
If so then adjusting the invs to disable the ATS is pretty simple, run
over the xarray and set them all off. Yes you could find the master
through a SID lookup with some locking adjustment.
>
> (1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
> disable_ats() in the timeout path walks master->master_domains
> and flips matching ATS invs entries to the BROKEN type.
>
> + invs walker is free (one case label in the existing type switch).
> + No lock or pointer deref in the invs walker.
> + No master pointer stored in invs; no lifetime concern.
>
> - disable_ats() walks every (master, domain) and marks each invs
> set; the list needs locking usable from atomic.
This doesn't seem so bad
> (2) Per-master flag + streams_lock
> invs walker resolves SID -> master via streams_lock and reads
> master->ats_broken.
>
> + Single source of truth on the master.
> + disable_ats() is one WRITE_ONCE.
> + atc_inv_master early-skips via one READ_ONCE.
> + attach gates ats_enabled on the flag; a concurrent quarantine
> race can be closed by a short post-attach re-check in commit()
> + No master pointer in invs; no lifetime concern.
>
> - invs walker pays streams_lock + rb_find(SID) per ATS entry on
> every invalidation. Measurable on ATS-heavy workloads.
Doesn't consider how to enable
> (3) Per-master flag + inv->master pointer (v4)
> invs entry carries a master pointer; the invs walker reads
> cur->master->ats_broken directly.
>
> + invs walker is one READ_ONCE through a cached pointer.
> + disable_ats is one WRITE_ONCE.
> + atc_inv_master early-skip via one READ_ONCE.
> + attach gate + post-attach re-check, same as (2).
>
> - invs holds a master ptr, so release_device must synchronize_rcu()
> before freeing the master to drain walkers under rcu_read_lock().
> We dropped this from v4 for that reason.
synchronize_rcu is not right because you have to have gone through the
rwlock so there can be no readers.
Jason
next prev parent reply other threads:[~2026-06-05 19:43 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-19 3:38 [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 01/24] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 02/24] PCI: Propagate FLR return values to callers Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 03/24] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 04/24] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 05/24] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu() Nicolin Chen
2026-05-19 11:39 ` Jason Gunthorpe
2026-05-19 18:54 ` Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 07/24] iommu: Defer __iommu_group_free_device() to be outside group->mutex Nicolin Chen
2026-05-19 11:47 ` Jason Gunthorpe
2026-05-19 3:38 ` [PATCH v4 08/24] iommu: Change group->devices to RCU-protected list Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 09/24] iommu: Add group pointer to struct group_device Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 10/24] iommu: Add __iommu_group_block_device helper Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-05-19 12:07 ` Jason Gunthorpe
2026-05-19 18:29 ` Nicolin Chen
2026-05-19 19:16 ` Jason Gunthorpe
2026-05-19 22:30 ` Nicolin Chen
2026-05-19 23:02 ` Jason Gunthorpe
2026-05-20 0:21 ` Nicolin Chen
2026-05-20 0:30 ` Jason Gunthorpe
2026-05-20 7:20 ` Nicolin Chen
2026-05-20 17:51 ` Jason Gunthorpe
2026-05-20 18:13 ` Nicolin Chen
2026-05-21 13:12 ` Jason Gunthorpe
2026-05-19 3:38 ` [PATCH v4 12/24] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 13/24] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 14/24] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 15/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-05-19 3:38 ` [PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails Nicolin Chen
2026-05-19 3:39 ` [PATCH v4 17/24] iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries Nicolin Chen
2026-05-19 12:01 ` Jason Gunthorpe
2026-05-19 3:39 ` [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-05-19 12:06 ` Jason Gunthorpe
2026-05-30 1:27 ` Nicolin Chen
2026-06-01 12:32 ` Jason Gunthorpe
2026-06-01 20:41 ` Nicolin Chen
2026-06-02 0:15 ` Jason Gunthorpe
2026-06-02 5:44 ` Nicolin Chen
2026-06-05 19:42 ` Jason Gunthorpe [this message]
2026-05-19 3:39 ` [PATCH v4 19/24] iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-05-19 12:09 ` Jason Gunthorpe
2026-05-19 3:39 ` [PATCH v4 20/24] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-05-19 3:39 ` [PATCH v4 21/24] iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header Nicolin Chen
2026-05-19 3:39 ` [PATCH v4 22/24] iommu/arm-smmu-v3: Introduce master->ats_invs Nicolin Chen
2026-05-19 12:12 ` Jason Gunthorpe
2026-05-19 3:39 ` [PATCH v4 23/24] iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates Nicolin Chen
2026-05-19 3:39 ` [PATCH v4 24/24] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-05-20 3:59 ` [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-05-20 6:38 ` Nicolin Chen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260605194259.GE1962447@nvidia.com \
--to=jgg@nvidia.com \
--cc=baolu.lu@linux.intel.com \
--cc=bhelgaas@google.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=lenb@kernel.org \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=nicolinc@nvidia.com \
--cc=praan@google.com \
--cc=rafael@kernel.org \
--cc=robin.murphy@arm.com \
--cc=smostafa@google.com \
--cc=vsethi@nvidia.com \
--cc=will@kernel.org \
--cc=xueshuai@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox