All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	Mostafa Saleh <smostafa@google.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Date: Fri, 5 Jun 2026 14:56:06 -0700	[thread overview]
Message-ID: <aiNF9kzjzDY/u1NG@nvidia.com> (raw)
In-Reply-To: <20260605194259.GE1962447@nvidia.com>

Thanks for the reply.

This is indeed a very complex and sophisticated topic..

On Fri, Jun 05, 2026 at 04:42:59PM -0300, Jason Gunthorpe wrote:
> I don't see any of these options as appealing. We have to maintain a
> few key invariants, and I think it cannot be done without a way to
> find all the domains that are using the STE.
> 
> One way or another you have to be using the invs list rw locks to
> synchronize the EATS state changes.
> 
> It is okayish to be sloppy when turning EATS off, but when turning it
> back on we do need to cycle through every invs list and toggle its
> lock to ensure that the invalidations are synchronized before
> EATS=enable happens.

I think the core guarantees that "cycle through every invs list"
happens: a PCI reset calls reset_prepare() blocking all the RID
and PASID domains and removing ATS entries from every invs list,
and then calls reset_done() that re-attach RID/PASID domains so
freshly new ATS entries will be installed before EATS=enable.

So, I think the enable path is not an issue, though the disable
path or the invalidation path would need "a way to find all the
domains that are using the STE".

> Given you must have a way to go from STE -> master -> all invs lists
> I'm not sure either option really makes such a large difference.
> 
> If so then adjusting the invs to disable the ATS is pretty simple, run
> over the xarray and set them all off. Yes you could find the master
> through a SID lookup with some locking adjustment.
> > 
> > (1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
> >     disable_ats() in the timeout path walks master->master_domains
> >     and flips matching ATS invs entries to the BROKEN type.
> > 
> >   + invs walker is free (one case label in the existing type switch).
> >   + No lock or pointer deref in the invs walker.
> >   + No master pointer stored in invs; no lifetime concern.
> > 
> >   - disable_ats() walks every (master, domain) and marks each invs
> >     set; the list needs locking usable from atomic.
> 
> This doesn't seem so bad

Yea, the only thing is that the disable path has to deal with a
complexity from going through a per-device domain list. Maybe it
can reuse iommu_group->pasid_array by taking xa_lock?

> > (2) Per-master flag + streams_lock
> >     invs walker resolves SID -> master via streams_lock and reads
> >     master->ats_broken.
> > 
> >   + Single source of truth on the master.
> >   + disable_ats() is one WRITE_ONCE.
> >   + atc_inv_master early-skips via one READ_ONCE.
> >   + attach gates ats_enabled on the flag; a concurrent quarantine
> >     race can be closed by a short post-attach re-check in commit()
> >   + No master pointer in invs; no lifetime concern.
> > 
> >   - invs walker pays streams_lock + rb_find(SID) per ATS entry on
> >     every invalidation. Measurable on ATS-heavy workloads.
> 
> Doesn't consider how to enable

The enable side is core-driven: when reset_done() re-attaches
the device from blocked_domain back to its RID/PASID domains,
the new attach_dev callback (old_domain == blocked_domain) can
clear the per-master flag. If the device is still broken, then
arm_smmu_atc_inv_master() at the end of attach_commit() times
out and re-triggers quarantine.

The flaw lives in the invalidation path as it must translate
every SID to master using streams_lock + rb_find(SID) per ATS
entry, which make it very less attractive.

> > (3) Per-master flag + inv->master pointer (v4)
> >     invs entry carries a master pointer; the invs walker reads
> >     cur->master->ats_broken directly.
> > 
> >   + invs walker is one READ_ONCE through a cached pointer.
> >   + disable_ats is one WRITE_ONCE.
> >   + atc_inv_master early-skip via one READ_ONCE.
> >   + attach gate + post-attach re-check, same as (2).
> > 
> >   - invs holds a master ptr, so release_device must synchronize_rcu()
> >     before freeing the master to drain walkers under rcu_read_lock().
> >     We dropped this from v4 for that reason.
> 
> synchronize_rcu is not right because you have to have gone through the
> rwlock so there can be no readers.

Ah, I think you are right! When release_device() is invoked, the
device must be already in the release (blocked) domain. So there
should be no domain->invs in the system holding its ATS entries.
And the enable part would work as (2).

In this case, (3) seems the best? It's fast on every aspect.

And I think it would fit we plan to generalize the invs design:

struct inv {
	struct arm_smmu_device *smmu; // => struct iommu_device *iommu;
	struct arm_smmu_master *master; // => void *priv;
					//    (dev->iommu->priv)

Thanks
Nicolin

  reply	other threads:[~2026-06-05 21:56 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  3:38 [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 01/24] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 02/24] PCI: Propagate FLR return values to callers Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 03/24] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 04/24] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 05/24] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu() Nicolin Chen
2026-05-19 11:39   ` Jason Gunthorpe
2026-05-19 18:54     ` Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 07/24] iommu: Defer __iommu_group_free_device() to be outside group->mutex Nicolin Chen
2026-05-19 11:47   ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 08/24] iommu: Change group->devices to RCU-protected list Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 09/24] iommu: Add group pointer to struct group_device Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 10/24] iommu: Add __iommu_group_block_device helper Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-05-19 12:07   ` Jason Gunthorpe
2026-05-19 18:29     ` Nicolin Chen
2026-05-19 19:16       ` Jason Gunthorpe
2026-05-19 22:30         ` Nicolin Chen
2026-05-19 23:02           ` Jason Gunthorpe
2026-05-20  0:21             ` Nicolin Chen
2026-05-20  0:30               ` Jason Gunthorpe
2026-05-20  7:20                 ` Nicolin Chen
2026-05-20 17:51                   ` Jason Gunthorpe
2026-05-20 18:13                     ` Nicolin Chen
2026-05-21 13:12                       ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 12/24] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 13/24] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 14/24] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 15/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 17/24] iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries Nicolin Chen
2026-05-19 12:01   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-05-19 12:06   ` Jason Gunthorpe
2026-05-30  1:27     ` Nicolin Chen
2026-06-01 12:32       ` Jason Gunthorpe
2026-06-01 20:41         ` Nicolin Chen
2026-06-02  0:15           ` Jason Gunthorpe
2026-06-02  5:44             ` Nicolin Chen
2026-06-05 19:42               ` Jason Gunthorpe
2026-06-05 21:56                 ` Nicolin Chen [this message]
2026-06-05 23:03                   ` Jason Gunthorpe
2026-06-06  4:04                     ` Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 19/24] iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-05-19 12:09   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 20/24] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 21/24] iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 22/24] iommu/arm-smmu-v3: Introduce master->ats_invs Nicolin Chen
2026-05-19 12:12   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 23/24] iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 24/24] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-05-20  3:59 ` [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-05-20  6:38   ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiNF9kzjzDY/u1NG@nvidia.com \
    --to=nicolinc@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.