Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Ashish Mhetre <amhetre@nvidia.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions
Date: Tue, 9 Jun 2026 10:05:28 +0000	[thread overview]
Message-ID: <aiflaI4svEJvZbsC@google.com> (raw)
In-Reply-To: <CAE2F3rDZupDgq5tQrvpuixGi-kVs+oi-Tacqa=Rxtri03LBPHQ@mail.gmail.com>

On Mon, Jun 08, 2026 at 09:20:42PM -0700, Daniel Mentz wrote:
> On Sun, Jun 7, 2026 at 11:19 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > >
> > > >  static void queue_poll_init(struct arm_smmu_device *smmu,
> > > > @@ -718,8 +719,25 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >         do {
> > > >                 u64 old;
> > > >
> > > > +               /*
> > > > +                * If the SMMU is suspended/suspending, any new CMDs are elided.
> > > > +                * This loop is the Point of Commitment. If we haven't cmpxchg'd
> > > > +                * our new indices yet, we can safely bail. Once the indices are
> > > > +                * committed, we MUST write valid commands to those slots to
> > > > +                * avoid indefinite polling in the drain function.
> > > > +                */
> > > > +               if (Q_STOP(llq.prod)) {
> > > > +                       local_irq_restore(flags);
> > > > +                       return 0;
> > >
> > > On second thought, I no longer believe that this is safe. I understand
> > > that READ_ONCE(cmdq->q.llq.val) implies no ordering guarantees with
> > > respect to any writes to translation tables. In the non-stopped case,
> > > we can rely on the call to dma_wmb() further down in this function.
> > > However, for the stopped case, I can't identify any barriers that
> > > would ensure that the STOP flag is checked only after the writes are
> > > visible to SMMU. Here is an example: Let's assume the following
> > > program order:
> > >
> > >  * Write to invalidate PTE
> > >  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
> > >
> > > What prevents the CPU from reordering these operations as follows?
> > >
> > >  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
> > >  * Write to invalidate PTE
> 
> Now, I believe that the smp_mb() call in arm_smmu_domain_inv_range()
> actually prevents this (if we remove arm_smmu_cmdq_can_elide or move
> it to after the smp_mb() call)
> 
> > >
> > > Can the following situation occur?
> >
> > Not really because:
> >
> > >
> > >  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
> > >  * (Different CPU resumes SMMU)
> >
> > We do a full smp_mb() here to ensure all writes are seen before SMMUEN=1
> > (note that the STOP_FLAG is already unset at that point, so we stop
> > eliding commands much before the SMMU is physically able to access any
> > config/xlation structures, I've explained everything below).
> >
> > >  * SMMU loads old PTE value into TLB
> > >  * Write to invalidate PTE
> > >  * (stale PTE remains in TLB)
> > >
> > > I propose the following: If you find the STOP flag set, run dma_mb()
> > > and check again. I'm afraid that running dma_mb() unconditionally
> > > might incur too much of a performance penalty.
> >
> > IMHO, I think this might be overcomplicating things here..
> > Here's why the current version works according to me:
> >
> > Till SMMUEN=0, the SMMU is spec-guaranteed to never access translation
> > or configuration structures (Section 6.3.9.6).
> >
> > Our runtime_suspend callback sets SMMUEN=0 before setting the STOP_FLAG.
> 
> Does "before" refer to program order or the order in which other
> agents can observe the STOP_FLAG relative to when SMMUEN=0 is set?
> Your program might read "Set SMMUEN=0 then set STOP_FLAG". However, my
> question is whether other observers might see the STOP_FLAG before
> SMMUEN is cleared. I couldn't identify any barriers in
> arm_smmu_runtime_suspend() that would prevent this type of
> re-ordering.

Hmm.. that's right. I guess I was wrongly relying on the fact that 
_relaxed() variants are guaranteed to be ordered amongst themselves on
the same CPU but reading the barriers documentation [1] I see:

These are similar to readX() and writeX(), 
[...] but they are still guaranteed to be ordered with respect to other
accesses from the same CPU thread to the same peripheral when operating
on __iomem pointers mapped with the default I/O attributes.

Hence, it seems it is possible for RAM writes to be re-ordered 
before the writeX_relaxed(MMIO).

In that case, I guess we could just use the non-relaxed version to set
the STOP_FLAG...

> 
> > Even if the worker CPU reorders the PTE write after the STOP_FLAG check,
> > it is benign because the SMMU is incapable of fetching that (or any) PTE
> > while the gate is closed. Because GATE_CLOSED == SMMUEN = 0, implying no
> > access to any HW structures whatsoever.
> >
> > The real synchronization happens in the Resume Path:
> >
> > 1. arm_smmu_device_reset() clears all caches / TLBs.
> >    (None of these can have entries before SMMUEN=1)
> >
> > 2. We execute a full smp_mb() before setting SMMUEN=1. (that's why we
> >    need smp_mb before SMMUEN=1). This barrier ensures that any PTE
> >    writes made by any thread—including those that were elided while the
> >    gate was closed, are globally visible before the SMMU hardware starts
> >    fetching into TLBs again. (This is why Jason suggested this in v6 [1])
> 
> A barrier on one CPU has no bearing on whether writes by any other CPU
> can be observed by any particular agent in the system.
> 
> Let's compare this with the long comment in
> arm_smmu_domain_inv_range() which is what I believe Jason was
> referring to. In that example, you see smp_mb() in the code path on
> CPU0 and dma_wmb() in the code path on CPU1. Hence, barriers exist on
> both sides. If you compare the runtime PM design with
> arm_smmu_domain_inv_range(), then smp_mb() belongs in the CPU thread
> that performs the translation table updates not the one that performs
> the suspend/resume operation.
> 

I might be missing something here, so please bear with me. My 
understanding it that's needed because the IOMMU is live & actively
caching, which is not true for our case.

[Assuming we use non-relaxed semantics & ordering for the STOP flag,
i.e. set STOP_FLAG + barrier & clear STOP_FLAG (implicit dma_wmb())]

In our case, during the resume op, we first clear the STOP_FLAG before
setting SMMUEN=1 in program order. Thus, any PTE invalidations occurring
before SMMUEN=1 are executed, i.e. EVEN when the SMMU is guaranteed not 
to access any structures, we've resumed invalidations. 

Let's consider a few examples:

1. SUSPEND (say CPU0 is suspending)

[CPU0] SMMUEN = 0 ==> SMMU stops accessing HW structures (ABORT NOT set)
		      HW structures not accessed means no TLB / CFG
		      cache accesses as well according to the spec.

[CPU1] ==> PTE update => Invalidate => Succeeds (although SMMUEN = 0)

[CPU0] GBPA.Abort set ==> Txns are blocked

[CPU2] => PTE update => Invalidate => Succeeds [Txns blocked + SMMUEN=0]

[CPU0] ==> SET STOP_FLAG ==> Elision begins

[CPU3] ==> PTE update ==> Invalidation ==> Elided [Txns blocked + SMMUEN=0]

Hence, the races in the suspend sequence are handled correctly.

2. RESUME (say CPU0 is resuming)

[CPU1] ==> Update PTE ==> Invalidate ==> Elided [Txns blocked + SMMUEN=0]

[CPU0] ==> Clear STOP_FLAGs [Txns still blocked + SMMUEN=0]

[CPU2] ==> Update PTE ==> Invalidate ==> Succeeds [Txns blocked + SMMUEN=0]

[CPU0] ==> Invalidate all TLB ==> Succeeds [Txns still blocked + SMMUEN=0]
[CPU0] ==> Invalidate all CFG ==> Succeeds [Txns still blocked + SMMUEN=0]

[CPU2] ==> Update PTE ==> Invalidate ==> Succeeds [Txns still blocked + SMMUEN=0]

[CPU0] ==> Set SMMUEN = 1 [SMMU can now access in memory structures]
	   However, the TLBs and CFG caches are clean because everything
	   until this point couldn't have cached anything anyway.

Hence, right after clearing the STOP_FLAG, we're taking in invalidations
as normal in the resume, much before the real caching can begin.

Thus, by resuming invalidations before SMMUEN=1, we guarantee a 
consistent state before the very first translation is performed.

Apart from this, I guess I'll drop the can_elide check from all 
invalidation paths.

Does that sound fine?

> Actually, now that I think of it: We could just piggy pack on the
> existing smp_mb() in arm_smmu_domain_inv_range(). For your
> arm_smmu_cmdq_can_elide() to work correctly, though, you would need to
> move it to *after* the smp_mb() call. This being said, I would prefer
> to remove arm_smmu_cmdq_can_elide() as I wrote on the other thread.
> If we do take advantage of the existing smp_mb() in
> arm_smmu_domain_inv_range(), then we should probably update that
> comment to state that the runtime PM implementation now also depends
> on it.
> 
> >
> > Adding a dma_mb() to the elision path would be redundant given the
> > SMMU can't access any structures and the resume barrier. We'd be
> > needlessly injecting dma_mbs when no entity is actually accessing those
> > structures while the STOP_FLAG is set.
> > >
> > > I have the same concerns with arm_smmu_cmdq_can_elide(): That
> > > READ_ONCE in arm_smmu_cmdq_can_elide() provides no guarantees in
> > > regards to ordering relative to PTE writes.
> > >
> >
> > The same as above, No structures are accessed during SMMUEN=0 (spec) and
> > during resume before setting SMMUEN=1, we do a full smp_mb() to ensure
> > all concurrent PTE writes are acquired in our resume thread before we
> > enable SMMUEN=1
> 
> I think the definition of "all concurrent PTE writes" is a bit vague.
> I don't think the architecture allows you to wait for writes performed
> by other CPUs to be observable by all other agents. The only thing you
> can do is to ensure that other agents can observe that the STOP_FLAG
> has been cleared *before* SMMUEN is set. However, this is already
> achieved by the existing dma_wmb() in arm_smmu_cmdq_issue_cmdlist().
> Hence, the extra smp_mb() is not required.
> 

Right, I meant that the smp_mb was needed for the STOP_FLAG is visible to
other CPUs performing PTE writes.. but I agree that the first CFGI
invalidation's dma_wmb() would ensure the STOP_FLAG is visible to all 
other CPUs, I'll drop this and add a comment explaining this.

Thanks,
Praan


  reply	other threads:[~2026-06-09 10:05 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 21:58 [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 01/12] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 02/12] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-06-02  0:12   ` Nicolin Chen
2026-06-02  3:28     ` Pranjal Shrivastava
2026-06-02  5:21   ` Daniel Mentz
2026-06-01 21:59 ` [PATCH v8 03/12] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 04/12] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 05/12] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 06/12] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-06-02  0:15   ` Nicolin Chen
2026-06-02  3:31     ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions Pranjal Shrivastava
2026-06-07 21:42   ` Daniel Mentz
2026-06-08  6:19     ` Pranjal Shrivastava
2026-06-09  4:20       ` Daniel Mentz
2026-06-09 10:05         ` Pranjal Shrivastava [this message]
2026-06-01 21:59 ` [PATCH v8 08/12] iommu/tegra241-cmdqv: Add a helper to quiesce VCMDQs Pranjal Shrivastava
2026-06-02  0:14   ` Nicolin Chen
2026-06-02  3:37     ` Pranjal Shrivastava
2026-06-02  5:59       ` Nicolin Chen
2026-06-02  6:21         ` Pranjal Shrivastava
2026-06-02  6:29           ` Nicolin Chen
2026-06-01 21:59 ` [PATCH v8 09/12] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-06-02  5:25   ` Daniel Mentz
2026-06-02 12:12     ` Pranjal Shrivastava
2026-06-07 22:36       ` Daniel Mentz
2026-06-02 15:27   ` Daniel Mentz
2026-06-09 10:13     ` Pranjal Shrivastava
2026-06-07 21:53   ` Daniel Mentz
2026-06-09 10:12     ` Pranjal Shrivastava
2026-06-07 22:30   ` Daniel Mentz
2026-06-09 10:09     ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 10/12] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 11/12] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-06-02  0:24   ` Nicolin Chen
2026-06-02  3:59     ` Pranjal Shrivastava
2026-06-02  5:51       ` Nicolin Chen
2026-06-02  6:24         ` Pranjal Shrivastava
2026-06-03 20:28   ` Daniel Mentz
2026-06-04  6:27     ` Pranjal Shrivastava
2026-06-07 22:22       ` Daniel Mentz
2026-06-09 10:34         ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 12/12] iommu/arm-smmu-v3: Add KUnit unit tests for Runtime PM Pranjal Shrivastava
2026-06-02  6:03 ` [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Nicolin Chen
2026-06-02 12:04   ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiflaI4svEJvZbsC@google.com \
    --to=praan@google.com \
    --cc=amhetre@nvidia.com \
    --cc=danielmentz@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox