Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	Mostafa Saleh <smostafa@google.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Date: Mon, 1 Jun 2026 22:44:36 -0700	[thread overview]
Message-ID: <ah5txKCvspL8zMeG@Asurada-Nvidia> (raw)
In-Reply-To: <20260602001547.GR3195266@nvidia.com>

On Mon, Jun 01, 2026 at 09:15:47PM -0300, Jason Gunthorpe wrote:
> On Mon, Jun 01, 2026 at 01:41:26PM -0700, Nicolin Chen wrote:
> > On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> > > On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > > > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > > > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > > > each batch is built from domain->invs so it can carry the "invs";
> > > > if the batch times out, we can immediately mutate its ATS entries.
> > > > 
> > > > But I realized a limitation. E.g., if a device attaches to two SVA
> > > > domains on two SSIDs. An invalidation timing out on one of the SVA
> > > > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > > > the other SVA domain's invs?
> > > 
> > > You'd have to mark all the S1's sharing the STE.
> > 
> > That would be a bit convoluted as we would have to go through all
> > other domains' invs arrays.
> 
> Ok, that is certainly an annoying problem.
> 
> I don't have a better idea than storing the master unfortunately
> 
> But I think the locking for that is going to be tricky, I'm not sure it does
> actually fully work..

Yes, there can be a race that sets STE.EATS back while per-master
flag is set, which would skip the ATC_INV in commit(), so no more
ATC_INV timeout that resets STE.EATS=0. To close it, we can force 
STE.EATS=0 at the end of commit() when state->ats_enabled and the
per-master flag are both set, which is only possible in a race.

However, after more thought, I found that there's a big trade-off
to pay in the invalidation path, if we go on this path. The invs
array only has SID, so it needs to convert it to master using the
rb_tree for every entry (holding streams_lock).

I asked AI to list and compare the three solutions that we have:

(1) Per-invs marker: INV_TYPE_ATS_BROKEN + master_domains
    disable_ats() in the timeout path walks master->master_domains
    and flips matching ATS invs entries to the BROKEN type.

  + invs walker is free (one case label in the existing type switch).
  + No lock or pointer deref in the invs walker.
  + No master pointer stored in invs; no lifetime concern.

  - disable_ats() walks every (master, domain) and marks each invs
    set; the list needs locking usable from atomic.
  - No master-level flag, so atc_inv_master loses the cheap early
    skip and attach has nothing to gate ats_enabled on. Concurrent
    quarantine self-heals only after the next ATC_INV times out (~1s),
    unless we also keep a master flag (i.e. a hybrid of (1) and (2)).

(2) Per-master flag + streams_lock
    invs walker resolves SID -> master via streams_lock and reads
    master->ats_broken.

  + Single source of truth on the master.
  + disable_ats() is one WRITE_ONCE.
  + atc_inv_master early-skips via one READ_ONCE.
  + attach gates ats_enabled on the flag; a concurrent quarantine
    race can be closed by a short post-attach re-check in commit()
  + No master pointer in invs; no lifetime concern.

  - invs walker pays streams_lock + rb_find(SID) per ATS entry on
    every invalidation. Measurable on ATS-heavy workloads.

(3) Per-master flag + inv->master pointer (v4)
    invs entry carries a master pointer; the invs walker reads
    cur->master->ats_broken directly.

  + invs walker is one READ_ONCE through a cached pointer.
  + disable_ats is one WRITE_ONCE.
  + atc_inv_master early-skip via one READ_ONCE.
  + attach gate + post-attach re-check, same as (2).

  - invs holds a master ptr, so release_device must synchronize_rcu()
    before freeing the master to drain walkers under rcu_read_lock().
    We dropped this from v4 for that reason.

Side-by-side:
                (1)      (2)      (3)
  invs walker   fastest  slow     fast
  disable_ats   slow     fast     fast
  atc_inv skip  slow     fast     fast
  attach race   slow     fast     fast
  master free   fast     fast     slow

FWIW, AI picks (3), but this would reintroduce synchronize_rcu()
that you dislike. My personal feeling is that (1) is the cleanest
shirt in the dirty laundry?

I'd like to hear your thought before finalizing the design.

Thanks
Nicolin


  reply	other threads:[~2026-06-02  5:45 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  3:38 [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 01/24] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 02/24] PCI: Propagate FLR return values to callers Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 03/24] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 04/24] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 05/24] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu() Nicolin Chen
2026-05-19 11:39   ` Jason Gunthorpe
2026-05-19 18:54     ` Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 07/24] iommu: Defer __iommu_group_free_device() to be outside group->mutex Nicolin Chen
2026-05-19 11:47   ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 08/24] iommu: Change group->devices to RCU-protected list Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 09/24] iommu: Add group pointer to struct group_device Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 10/24] iommu: Add __iommu_group_block_device helper Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-05-19 12:07   ` Jason Gunthorpe
2026-05-19 18:29     ` Nicolin Chen
2026-05-19 19:16       ` Jason Gunthorpe
2026-05-19 22:30         ` Nicolin Chen
2026-05-19 23:02           ` Jason Gunthorpe
2026-05-20  0:21             ` Nicolin Chen
2026-05-20  0:30               ` Jason Gunthorpe
2026-05-20  7:20                 ` Nicolin Chen
2026-05-20 17:51                   ` Jason Gunthorpe
2026-05-20 18:13                     ` Nicolin Chen
2026-05-21 13:12                       ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 12/24] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 13/24] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 14/24] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 15/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 17/24] iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries Nicolin Chen
2026-05-19 12:01   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-05-19 12:06   ` Jason Gunthorpe
2026-05-30  1:27     ` Nicolin Chen
2026-06-01 12:32       ` Jason Gunthorpe
2026-06-01 20:41         ` Nicolin Chen
2026-06-02  0:15           ` Jason Gunthorpe
2026-06-02  5:44             ` Nicolin Chen [this message]
2026-05-19  3:39 ` [PATCH v4 19/24] iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-05-19 12:09   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 20/24] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 21/24] iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 22/24] iommu/arm-smmu-v3: Introduce master->ats_invs Nicolin Chen
2026-05-19 12:12   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 23/24] iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 24/24] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-05-20  3:59 ` [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-05-20  6:38   ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ah5txKCvspL8zMeG@Asurada-Nvidia \
    --to=nicolinc@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox