From: Jason Gunthorpe <jgg@nvidia.com>
To: Will Deacon <will@kernel.org>
Cc: Nicolin Chen <nicolinc@nvidia.com>,
jean-philippe@linaro.org, robin.murphy@arm.com, joro@8bytes.org,
balbirs@nvidia.com, miko.lenczewski@arm.com,
peterz@infradead.org, kevin.tian@intel.com, praan@google.com,
linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v9 7/7] iommu/arm-smmu-v3: Perform per-domain invalidations using arm_smmu_invs
Date: Fri, 23 Jan 2026 15:59:39 -0400
Message-ID: <20260123195939.GE1134360@nvidia.com>
In-Reply-To: <aXOqwymxe-nzYpwU@willie-the-truck>
On Fri, Jan 23, 2026 at 05:07:15PM +0000, Will Deacon wrote:
> On Fri, Dec 19, 2025 at 12:11:29PM -0800, Nicolin Chen wrote:
> > Replace the old invalidation functions with arm_smmu_domain_inv_range() in
> > all the existing invalidation routines. And deprecate the old functions.
> >
> > The new arm_smmu_domain_inv_range() handles the CMDQ_MAX_TLBI_OPS as well,
> > so drop it in the SVA function.
> >
> > Since arm_smmu_cmdq_batch_add_range() has only one caller now, and it must
> > be given a valid size, add a WARN_ON_ONCE to catch any missed case.
> >
> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 7 -
> > .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 29 +--
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 165 +-----------------
> > 3 files changed, 11 insertions(+), 190 deletions(-)
>
> It's one thing replacing the invalidation implementation but I think you
> need to update some of the old ordering comments, too. In particular,
> the old code relies on the dma_wmb() during cmdq insertion to order
> updates to in-memory structures, which includes the pgtable in non-strict
> mode.
>
> I don't think any of that is true now?
You are talking about this comment?
/*
* NOTE: when io-pgtable is in non-strict mode, we may get here with
* PTEs previously cleared by unmaps on the current CPU not yet visible
* to the SMMU. We are relying on the dma_wmb() implicit during cmd
* insertion to guarantee those are observed before the TLBI. Do be
* careful, 007.
*/
Maybe we can restate that a little bit:
/*
* If the DMA API is running in non-strict mode then another CPU could
* have changed the page table and not invoked any flush op. Instead that
* CPU will have done an atomic write and this CPU will do an atomic
* read. That handshake is enough to acquire the page table writes from
* the other CPU.
*
* All command execution has a dma_wmb() to release all the in-memory
* structures written by this CPU; that barrier must also release the
* writes acquired from all the other CPUs.
*
* There are other barriers and atomics on this path, but the above is
* the essential mechanism for ensuring that HW sees the page table
* writes from another CPU before it executes the IOTLB invalidation.
*/
I'm sure this series adds more barriers that might move things earlier
in the sequence, but that isn't why those barriers exist.
I think the original documentation was on to something important so
I'd like to keep the information, though perhaps the comment belongs
in dma-iommu.c as it really applies to all iommu drivers on all
architectures.
It is also why things were done as smp_XX not dma_XX. The intention
was not to release things all the way to DMA, but just to release them
enough that the thread writing the STE can acquire them; that thread
then ensures they are released to DMA.
For instance, on UP it is fine to have no barriers at all in the
invalidation code; the dma_wmb() in the STE store command is
sufficient.
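The release/acquire handshake the proposed comment describes can be
sketched in user space with C11 atomics standing in for the kernel's
smp_* helpers (a hypothetical illustration only; run_handshake(),
unmap_cpu() and invalidate_cpu() are made-up names, not anything in
the driver):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int pte;              /* stands in for a PTE cleared by an unmap */
static atomic_int published; /* the shared location both CPUs touch */
static int observed;         /* what the invalidating CPU saw */

/* "Another CPU": clears the PTE, then releases the write. */
static void *unmap_cpu(void *arg)
{
	pte = 1; /* plain write to the in-memory structure */
	/* release: orders the PTE write before the flag write */
	atomic_store_explicit(&published, 1, memory_order_release);
	return NULL;
}

/* "This CPU": the one that will eventually issue the TLBI. */
static void *invalidate_cpu(void *arg)
{
	/* acquire: once the flag is seen, the PTE write is visible here */
	while (!atomic_load_explicit(&published, memory_order_acquire))
		;
	observed = pte;
	/* the kernel's dma_wmb() on command insertion would now release
	 * everything this CPU has acquired out to the device as well */
	atomic_thread_fence(memory_order_seq_cst);
	return NULL;
}

int run_handshake(void)
{
	pthread_t a, b;

	pte = 0;
	atomic_store(&published, 0);
	observed = -1;
	pthread_create(&b, NULL, invalidate_cpu, NULL);
	pthread_create(&a, NULL, unmap_cpu, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return observed; /* 1: the acquire pairing made the write visible */
}
```

Note the sketch only gets the writes to the *other thread*; it is the
final fence (the dma_wmb() analogue) that would push them out to the
device, which is the point about UP above.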
Jason