Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Ashish Mhetre <amhetre@nvidia.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions
Date: Tue, 9 Jun 2026 18:58:05 +0000	[thread overview]
Message-ID: <aihiPQ2mdqMm13Cd@google.com> (raw)
In-Reply-To: <CAE2F3rDgXxAhJFUX6=xars9H92t8NBDqcFRxU7d5wJQ7buUbNA@mail.gmail.com>

On Tue, Jun 09, 2026 at 11:20:52AM -0700, Daniel Mentz wrote:
> On Tue, Jun 9, 2026 at 3:05 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > > Even if the worker CPU reorders the PTE write after the STOP_FLAG check,
> > > > it is benign because the SMMU is incapable of fetching that (or any) PTE
> > > > while the gate is closed. Because GATE_CLOSED == SMMUEN = 0, implying no
> > > > access to any HW structures whatsoever.
> > > >
> > > > The real synchronization happens in the Resume Path:
> > > >
> > > > 1. arm_smmu_device_reset() clears all caches / TLBs.
> > > >    (None of these can have entries before SMMUEN=1)
> > > >
> > > > 2. We execute a full smp_mb() before setting SMMUEN=1. (that's why we
> > > >    need smp_mb before SMMUEN=1). This barrier ensures that any PTE
> > > >    writes made by any thread—including those that were elided while the
> > > >    gate was closed, are globally visible before the SMMU hardware starts
> > > >    fetching into TLBs again. (This is why Jason suggested this in v6 [1])
> > >
> > > A barrier on one CPU has no bearing on whether writes by any other CPU
> > > can be observed by any particular agent in the system.
> > >
> > > Let's compare this with the long comment in
> > > arm_smmu_domain_inv_range() which is what I believe Jason was
> > > referring to. In that example, you see smp_mb() in the code path on
> > > CPU0 and dma_wmb() in the code path on CPU1. Hence, barriers exist on
> > > both sides. If you compare the runtime PM design with
> > > arm_smmu_domain_inv_range(), then smp_mb() belongs in the CPU thread
> > > that performs the translation table updates not the one that performs
> > > the suspend/resume operation.
> > >
> >
> > I might be missing something here, so please bear with me. My
> > understanding it that's needed because the IOMMU is live & actively
> > caching, which is not true for our case.
> 
> I think the "invs" design (Per-domain invalidation array) is more
> similar than you think! An SMMU being absent from invs is equivalent
> to the STOP flag, and the STE pointing to TTB0 is roughly the
> equivalent of SMMEN=1 i.e. the IOMMU is not actively caching a
> particular translation domain until an STE (or CD) points to it.
> 
> > [Assuming we use non-relaxed semantics & ordering for the STOP flag,
> > i.e. set STOP_FLAG + barrier & clear STOP_FLAG (implicit dma_wmb())]
> >
> > In our case, during the resume op, we first clear the STOP_FLAG before
> > setting SMMUEN=1 in program order. Thus, any PTE invalidations occurring
> > before SMMUEN=1 are executed, i.e. EVEN when the SMMU is guaranteed not
> > to access any structures, we've resumed invalidations.
> 
> "[...] we first clear the STOP_FLAG before setting SMMUEN=1 in program
> order." I think this should be modified to "we first clear the
> STOP_FLAG and ensure that the cleared STOP_FLAG is observable by all
> other CPUs before setting SMMUEN=1"
> 

Ack. The goal was to explain the algorithm for this thread, I won't be
commenting it in code. Are you suggesting I should convert my 
explaination of the algorithm above into in-line comments and make sure
to include the STOP_FLAG observability part?

> Re "Thus, any PTE invalidations occurring before SMMUEN=1 are
> executed,": I think that "a PTE invalidation occurring" is not clearly
> defined. Also, it's not clear to me what this statement implies. It's
> paramount that invalidations are performed when SMMUEN=1. The fact
> that we perform invalidations before SMMUEN=1 is more of a side effect
> of our methodology.

By "invalidations occurring before SMMUEN=1", I mean that once the STOP_FLAG
is cleared, any concurrent unmaps (PTE updates) will successfully submit 
invalidation cmds to the cmdq. Since SMMU translation (SMMUEN) is still 
disabled at this point, these invalidations execute safely without racing
against SMMU caching (which was the concern in the previous email).

> 
> I would define a set of invariants:
> 
>  * If an agent observes the STOP flag, it is guaranteed that SMMUEN=0
> (with ABORT set) at the time of observation.
>  * Any transition from a set STOP flag to SMMUEN=1 involves an
> invalidate-all operation prior to setting SMMUEN=1
> 
> Hence, if a CPU observes the STOP flag, it is assured that (a)
> transactions are blocked and (b) if the SMMU is ever re-enabled, an
> invalidate-all is performed prior to it being enabled.
> 
> I would then argue that all operations support these invariants. For
> example, we need proper barriers in the iommu_unmap path to ensure
> that the STOP flag is only checked *after* the translation table
> update is made. Hence, we need a memory barrier.
> 
> I look at it this way: Every elided invalidation creates an
> "invalidation deficit", and this deficit is tolerable for two reasons:
> (a) SMMU blocks all transactions while there is a deficit. (b) An
> invalidate-all eliminates any deficit accrued while the STOP flag was
> set.

Ack, which means you agree with the design proposed in my last reply.
I'll document these invariants in line if that's what you're suggesting
here?

> 
> > Let's consider a few examples:
> >
> > 1. SUSPEND (say CPU0 is suspending)
> >
> > [CPU0] SMMUEN = 0 ==> SMMU stops accessing HW structures (ABORT NOT set)
> 
> I thought we never disable the SMMU unless ABORT is set.

Yes, that is evident from the code [1]. I made a mistake in ordering
while trying to explain why & how this works.

> 
> 
> >                       HW structures not accessed means no TLB / CFG
> >                       cache accesses as well according to the spec.
> >
> > [CPU1] ==> PTE update => Invalidate => Succeeds (although SMMUEN = 0)
> >
> > [CPU0] GBPA.Abort set ==> Txns are blocked
> >
> > [CPU2] => PTE update => Invalidate => Succeeds [Txns blocked + SMMUEN=0]
> >
> > [CPU0] ==> SET STOP_FLAG ==> Elision begins
> >
> > [CPU3] ==> PTE update ==> Invalidation ==> Elided [Txns blocked + SMMUEN=0]
> >
> > Hence, the races in the suspend sequence are handled correctly.
> 
> I'm not sure if this description demonstrates that every possible race
> is handled correctly. If I compare this with Nicolin's presentation in
> arm_smmu_domain_inv_range, I like that presentation, as it explicitly
> mentions loads and barriers. For example, it has an smp_mb() followed
> by "// load the updated invs". I think you should make have something
> like "smp_mb() ; CHECK STOP_FLAG" in your presentation. Currently, the
> STOP_FLAG checking is somehow implicit in "Invalidation".

Ack. The goal of this diagram was to explain the working of the design,
this is NOT the comment/document I plan to include in code. 

I'll add this as an in-line comment if that's what you're suggesting? OR
are you also suggesting I should have this in my cover letter?

> 
> >
> > 2. RESUME (say CPU0 is resuming)
> >
> > [CPU1] ==> Update PTE ==> Invalidate ==> Elided [Txns blocked + SMMUEN=0]
> >
> > [CPU0] ==> Clear STOP_FLAGs [Txns still blocked + SMMUEN=0]
> >
> > [CPU2] ==> Update PTE ==> Invalidate ==> Succeeds [Txns blocked + SMMUEN=0]
> >
> > [CPU0] ==> Invalidate all TLB ==> Succeeds [Txns still blocked + SMMUEN=0]
> > [CPU0] ==> Invalidate all CFG ==> Succeeds [Txns still blocked + SMMUEN=0]
> >
> > [CPU2] ==> Update PTE ==> Invalidate ==> Succeeds [Txns still blocked + SMMUEN=0]
> >
> > [CPU0] ==> Set SMMUEN = 1 [SMMU can now access in memory structures]
> >            However, the TLBs and CFG caches are clean because everything
> >            until this point couldn't have cached anything anyway.
> 
> My concern with this diagram is that it appears sequential, suggesting
> operations happen in a specific order across CPUs when they, in fact,
> occur in parallel. I find these diagrams more useful for describing
> failure cases than for proving that every race is handled correctly.
> 

Ack. Again, the goal was to explain it to you and not propose how the
comments are gonna look like. I'll use your preferred way to explain in
the future.

> >
> > Hence, right after clearing the STOP_FLAG, we're taking in invalidations
> > as normal in the resume, much before the real caching can begin.
> >
> > Thus, by resuming invalidations before SMMUEN=1, we guarantee a
> > consistent state before the very first translation is performed.
> >
> > Apart from this, I guess I'll drop the can_elide check from all
> > invalidation paths.
> >
> > Does that sound fine?
> 
> Dropping can_elide sounds fine. However, if you still use this
> function, for example in the gerror handler, then you might consider
> renaming it.

Ack, I'll name it back to arm_smmu_is_suspended().

Thanks,
Praan


  reply	other threads:[~2026-06-09 18:58 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 21:58 [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 01/12] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 02/12] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-06-02  0:12   ` Nicolin Chen
2026-06-02  3:28     ` Pranjal Shrivastava
2026-06-02  5:21   ` Daniel Mentz
2026-06-01 21:59 ` [PATCH v8 03/12] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 04/12] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 05/12] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 06/12] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-06-02  0:15   ` Nicolin Chen
2026-06-02  3:31     ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions Pranjal Shrivastava
2026-06-07 21:42   ` Daniel Mentz
2026-06-08  6:19     ` Pranjal Shrivastava
2026-06-09  4:20       ` Daniel Mentz
2026-06-09 10:05         ` Pranjal Shrivastava
2026-06-09 18:20           ` Daniel Mentz
2026-06-09 18:58             ` Pranjal Shrivastava [this message]
2026-06-01 21:59 ` [PATCH v8 08/12] iommu/tegra241-cmdqv: Add a helper to quiesce VCMDQs Pranjal Shrivastava
2026-06-02  0:14   ` Nicolin Chen
2026-06-02  3:37     ` Pranjal Shrivastava
2026-06-02  5:59       ` Nicolin Chen
2026-06-02  6:21         ` Pranjal Shrivastava
2026-06-02  6:29           ` Nicolin Chen
2026-06-01 21:59 ` [PATCH v8 09/12] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-06-02  5:25   ` Daniel Mentz
2026-06-02 12:12     ` Pranjal Shrivastava
2026-06-07 22:36       ` Daniel Mentz
2026-06-02 15:27   ` Daniel Mentz
2026-06-09 10:13     ` Pranjal Shrivastava
2026-06-07 21:53   ` Daniel Mentz
2026-06-09 10:12     ` Pranjal Shrivastava
2026-06-07 22:30   ` Daniel Mentz
2026-06-09 10:09     ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 10/12] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 11/12] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-06-02  0:24   ` Nicolin Chen
2026-06-02  3:59     ` Pranjal Shrivastava
2026-06-02  5:51       ` Nicolin Chen
2026-06-02  6:24         ` Pranjal Shrivastava
2026-06-03 20:28   ` Daniel Mentz
2026-06-04  6:27     ` Pranjal Shrivastava
2026-06-07 22:22       ` Daniel Mentz
2026-06-09 10:34         ` Pranjal Shrivastava
2026-06-09 18:34           ` Daniel Mentz
2026-06-09 20:11             ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 12/12] iommu/arm-smmu-v3: Add KUnit unit tests for Runtime PM Pranjal Shrivastava
2026-06-02  6:03 ` [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Nicolin Chen
2026-06-02 12:04   ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aihiPQ2mdqMm13Cd@google.com \
    --to=praan@google.com \
    --cc=amhetre@nvidia.com \
    --cc=danielmentz@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox