Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Rafael J . Wysocki" <rafael@kernel.org>,
	Len Brown <lenb@kernel.org>,
	"Pranjal Shrivastava" <praan@google.com>,
	Mostafa Saleh <smostafa@google.com>,
	Lu Baolu <baolu.lu@linux.intel.com>,
	Kevin Tian <kevin.tian@intel.com>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>
Subject: Re: [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag
Date: Mon, 1 Jun 2026 13:41:26 -0700	[thread overview]
Message-ID: <ah3udmX5SIO9lAXR@Asurada-Nvidia> (raw)
In-Reply-To: <20260601123231.GG3195266@nvidia.com>

On Mon, Jun 01, 2026 at 09:32:31AM -0300, Jason Gunthorpe wrote:
> On Fri, May 29, 2026 at 06:27:40PM -0700, Nicolin Chen wrote:
> > On Tue, May 19, 2026 at 09:06:58AM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 18, 2026 at 08:39:01PM -0700, Nicolin Chen wrote:
> > So I've tried INV_TYPE_ATS_BROKEN: during per-domain invalidation,
> > each batch is built from domain->invs so it can carry the "invs";
> > if the batch times out, we can immediately mutate its ATS entries.
> > 
> > But I realized a limitation. E.g., if a device attaches to two SVA
> > domains on two SSIDs. An invalidation timing out on one of the SVA
> > domains could mark INV_TYPE_ATS_BROKEN in its own invs, but not in
> > the other SVA domain's invs?
> 
> You'd have to mark all the S1's sharing the STE.

That would be a bit convoluted as we would have to go through all
other domains' invs arrays.

A master (that timed out an ATC_INV) might be attached to multiple
domains (RID, SVA1, SVA2, ...). Also, we currently don't have any
per-master reverse-tracking to its attached domains (master_domain
is added to smmu_domain->devices list only for now).

So, two things would be needed on top of what we currently have.

Firstly, we would need another per-master list tracking all the
attached smmu_domains. Maybe reuse master_domain? Let's call this
master->master_domains for now.

Secondly, locking. We have two paths that can trigger an ATC_INV
timeout: __arm_smmu_domain_inv_range() that takes the rwlock read
on the current smmu_domain->invs; arm_smmu_atc_inv_master() that
doesn't take any rwlock. When these two paths walk through the
master->master_domains, we would need to take different rwlocks
for those domains. Also, the __arm_smmu_domain_inv_range() path
should skip the invs on the current master_domain, as the rwlock
is already held.

I wonder what's your opinion about these?

Given all this complexity, I started to wonder if we could have
implemented the invs as an RCU-list than an RCU-array: all IOTLB
tag nodes would be still allocated to add/delete/read locklessly;
all ATS nodes would be fixed in the master structure to add/del/
read with the rwlock. Then, a timeout occurring to either path
can simply mutate the ATS entries on the master directly without
going through the list of domains.

> > So, it seems that master->ats_broken is still a cleaner solution?
> 
> I don't want the invs code touching master, that is against the entire
> design.

I think I can understand the idea here: we want the invs design to
be in the common code, so anything that's driver-specific (smmu or
master) shouldn't be touched.

> Maybe a flag in the invs list itself is sufficient.

I think we would have to use INV_TYPE_ATS_BROKEN than a per-invs
flag: e.g., a nesting parent domain will have multiple ATS devices
so it cannot use one flag on its big invs to separate the broken
devices from all other healthy devices.

Nicolin


  reply	other threads:[~2026-06-01 20:42 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-19  3:38 [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 01/24] PCI: Don't suspend IOMMU when probing reset capability Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 02/24] PCI: Propagate FLR return values to callers Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 03/24] iommu: Convert gdev->blocked from bool to enum gdev_blocked Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 04/24] iommu: Pass in reset result to pci_dev_reset_iommu_done() Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 05/24] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 06/24] iommu: Defer iommu_group free via kfree_rcu() Nicolin Chen
2026-05-19 11:39   ` Jason Gunthorpe
2026-05-19 18:54     ` Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 07/24] iommu: Defer __iommu_group_free_device() to be outside group->mutex Nicolin Chen
2026-05-19 11:47   ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 08/24] iommu: Change group->devices to RCU-protected list Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 09/24] iommu: Add group pointer to struct group_device Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 10/24] iommu: Add __iommu_group_block_device helper Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 11/24] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-05-19 12:07   ` Jason Gunthorpe
2026-05-19 18:29     ` Nicolin Chen
2026-05-19 19:16       ` Jason Gunthorpe
2026-05-19 22:30         ` Nicolin Chen
2026-05-19 23:02           ` Jason Gunthorpe
2026-05-20  0:21             ` Nicolin Chen
2026-05-20  0:30               ` Jason Gunthorpe
2026-05-20  7:20                 ` Nicolin Chen
2026-05-20 17:51                   ` Jason Gunthorpe
2026-05-20 18:13                     ` Nicolin Chen
2026-05-21 13:12                       ` Jason Gunthorpe
2026-05-19  3:38 ` [PATCH v4 12/24] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 13/24] iommu/arm-smmu-v3: Skip remaining GERROR causes on SFM Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 14/24] iommu/arm-smmu-v3: Introduce per-cmdq cmdq_err_handler callback Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 15/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when CMD_SYNC times out Nicolin Chen
2026-05-19  3:38 ` [PATCH v4 16/24] iommu/arm-smmu-v3: Co-clear pending CMDQ_ERR when queue_has_space() fails Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 17/24] iommu/arm-smmu-v3: Add master in arm_smmu_inv for ATS entries Nicolin Chen
2026-05-19 12:01   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 18/24] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-05-19 12:06   ` Jason Gunthorpe
2026-05-30  1:27     ` Nicolin Chen
2026-06-01 12:32       ` Jason Gunthorpe
2026-06-01 20:41         ` Nicolin Chen [this message]
2026-05-19  3:39 ` [PATCH v4 19/24] iommu/arm-smmu-v3: Add invs and has_ats to struct arm_smmu_cmdq_batch Nicolin Chen
2026-05-19 12:09   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 20/24] iommu/arm-smmu-v3: Introduce arm_smmu_cmdq_batch_issue() wrapper Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 21/24] iommu/arm-smmu-v3: Move arm_smmu_invs_for_each_entry to header Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 22/24] iommu/arm-smmu-v3: Introduce master->ats_invs Nicolin Chen
2026-05-19 12:12   ` Jason Gunthorpe
2026-05-19  3:39 ` [PATCH v4 23/24] iommu/arm-smmu-v3: Serialize STE.EATS and ats_broken updates Nicolin Chen
2026-05-19  3:39 ` [PATCH v4 24/24] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-05-20  3:59 ` [PATCH v4 00/24] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-05-20  6:38   ` Nicolin Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ah3udmX5SIO9lAXR@Asurada-Nvidia \
    --to=nicolinc@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox