All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	Mostafa Saleh <smostafa@google.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Date: Fri, 5 Jun 2026 21:04:09 -0700	[thread overview]
Message-ID: <aiOcOehTZn50hfIK@nvidia.com> (raw)
In-Reply-To: <20260605230315.GF1962447@nvidia.com>

On Fri, Jun 05, 2026 at 08:03:15PM -0300, Jason Gunthorpe wrote:
> On Fri, Jun 05, 2026 at 02:56:06PM -0700, Nicolin Chen wrote:
> > On Fri, Jun 05, 2026 at 04:42:59PM -0300, Jason Gunthorpe wrote:
> > > I don't see any of these options as appealing. We have to maintain a
> > > few key invariants, and I think it cannot be done without a way to
> > > find all the domains that are using the STE.
> > > 
> > > One way or another you have to be using the invs list rw locks to
> > > synchronize the EATS state changes.
> > > 
> > > It is okayish to be sloppy when turning EATS off, but when turning it
> > > back on we do need to cycle through every invs list and toggle its
> > > lock to ensure that the invalidations are synchronized before
> > > EATS=enable happens.
> > 
> > I think the core guarantees that "cycle through every invs list"
> > happens: a PCI reset calls reset_prepare() blocking all the RID
> > and PASID domains and removing ATS entries from every invs list,
> > and then calls reset_done() that re-attach RID/PASID domains so
> > freshly new ATS entries will be installed before EATS=enable.
> 
> I think this whole thing is so async and racy this is not something we
> can truely rely on. The driver is going to have to make sure it
> doesn't get turned on accidentally while the CD is still populated.

You mean the driver needs to ensure that the ATS entires in the
invs list would not be skipped when STE.EATS is enabled, right?

Otherwise, INV_TYPE_ATS_BROKEN + per-master domain list cannot
prevent a concurrent attach() from turning on STE.EATS either.

> > > Given you must have a way to go from STE -> master -> all invs lists
> > > I'm not sure either option really makes such a large difference.
> > > 
> > > If so then adjusting the invs to disable the ATS is pretty simple, run
> > > over the xarray and set them all off. Yes you could find the master
> > > through a SID lookup with some locking adjustment.
> > > > 
> > > > (1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
> > > >     disable_ats() in the timeout path walks master->master_domains
> > > >     and flips matching ATS invs entries to the BROKEN type.
> > > > 
> > > >   + invs walker is free (one case label in the existing type switch).
> > > >   + No lock or pointer deref in the invs walker.
> > > >   + No master pointer stored in invs; no lifetime concern.
> > > > 
> > > >   - disable_ats() walks every (master, domain) and marks each invs
> > > >     set; the list needs locking usable from atomic.
> > > 
> > > This doesn't seem so bad
> > 
> > Yea, the only thing is that the disable path has to deal with a
> > complexity from going through a per-device domain list. Maybe it
> > can reuse iommu_group->pasid_array by taking xa_lock?
> 
> Maybe the locking seems tricky as the locks might end up nesting in
> weird ways.

Oh, that's thoughtful.

> The streams rb tree and existing master domains linked list seems
> appealing if the locking can nest acceptably.

Yea, limiting the locking at the driver-level is more manageable. 

> > > > (3) Per-master flag + inv->master pointer (v4)
> > > >     invs entry carries a master pointer; the invs walker reads
> > > >     cur->master->ats_broken directly.
> > > > 
> > > >   + invs walker is one READ_ONCE through a cached pointer.
> > > >   + disable_ats is one WRITE_ONCE.
> > > >   + atc_inv_master early-skip via one READ_ONCE.
> > > >   + attach gate + post-attach re-check, same as (2).
> > > > 
> > > >   - invs holds a master ptr, so release_device must synchronize_rcu()
> > > >     before freeing the master to drain walkers under rcu_read_lock().
> > > >     We dropped this from v4 for that reason.
> > > 
> > > synchronize_rcu is not right because you have to have gone through the
> > > rwlock so there can be no readers.
> > 
> > Ah, I think you are right! When release_device() is invoked, the
> > device must be already in the release (blocked) domain. So there
> > should be no domain->invs in the system holding its ATS entries.
> > And the enable part would work as (2).
> > 
> > In this case, (3) seems the best? It's fast on every aspect.
> 
> I don't like it mainly because of the sketch enable side, and if we
> tighten that then you can just do 1 which doesn't have a perf impact..

I agree that per-master flag alone would be racy. So, for it to
work, it would need an extra spinlock. This v4 actually added a
pairing ats_broken_lock to fence arm_smmu_write_entry() in the
attach path, while the quarantine path is disabling STE.EATS and
setting the per-master flag. With that, the driver can ensure:
 - no race in STE.EATS update
 - consistency between STE.EATS and ATS entries in every invs

> But still, I'm not sure how all the asyncess and races will resolve in
> any of these cases.

Maybe we can try "INV_TYPE_ATS_BROKEN + per-master domain list"
first and see if Sashiko might identify some corner case.

Thanks
Nicolin

  reply	other threads:[~2026-06-06  4:04 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  3:38 [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 01/24] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 02/24] PCI: Propagate FLR return values to callers Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 03/24] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 04/24] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 05/24] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu() Nicolin Chen
2026-05-19 11:39   ` Jason Gunthorpe
2026-05-19 18:54     ` Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 07/24] iommu: Defer __iommu_group_free_device() to be outside group->mutex Nicolin Chen
2026-05-19 11:47   ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 08/24] iommu: Change group->devices to RCU-protected list Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 09/24] iommu: Add group pointer to struct group_device Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 10/24] iommu: Add __iommu_group_block_device helper Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-05-19 12:07   ` Jason Gunthorpe
2026-05-19 18:29     ` Nicolin Chen
2026-05-19 19:16       ` Jason Gunthorpe
2026-05-19 22:30         ` Nicolin Chen
2026-05-19 23:02           ` Jason Gunthorpe
2026-05-20  0:21             ` Nicolin Chen
2026-05-20  0:30               ` Jason Gunthorpe
2026-05-20  7:20                 ` Nicolin Chen
2026-05-20 17:51                   ` Jason Gunthorpe
2026-05-20 18:13                     ` Nicolin Chen
2026-05-21 13:12                       ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 12/24] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 13/24] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 14/24] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 15/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 17/24] iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries Nicolin Chen
2026-05-19 12:01   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-05-19 12:06   ` Jason Gunthorpe
2026-05-30  1:27     ` Nicolin Chen
2026-06-01 12:32       ` Jason Gunthorpe
2026-06-01 20:41         ` Nicolin Chen
2026-06-02  0:15           ` Jason Gunthorpe
2026-06-02  5:44             ` Nicolin Chen
2026-06-05 19:42               ` Jason Gunthorpe
2026-06-05 21:56                 ` Nicolin Chen
2026-06-05 23:03                   ` Jason Gunthorpe
2026-06-06  4:04                     ` Nicolin Chen [this message]
2026-05-19  3:39 ` [PATCH v4 19/24] iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-05-19 12:09   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 20/24] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 21/24] iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 22/24] iommu/arm-smmu-v3: Introduce master->ats_invs Nicolin Chen
2026-05-19 12:12   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 23/24] iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 24/24] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-05-20  3:59 ` [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-05-20  6:38   ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiOcOehTZn50hfIK@nvidia.com \
    --to=nicolinc@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.