Re: [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Ashish Mhetre <amhetre@nvidia.com>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions
Date: Mon, 8 Jun 2026 06:19:35 +0000	[thread overview]
Message-ID: <aiZe97OkmvcUwCzS@google.com> (raw)
In-Reply-To: <CAE2F3rDF5dVJip2gc+vaxtM-FRttnUEnAetUYz6qDCXRM=fNYg@mail.gmail.com>

On Sun, Jun 07, 2026 at 02:42:37PM -0700, Daniel Mentz wrote:
> On Mon, Jun 1, 2026 at 2:59 PM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > Introduce a new bit flag, CMDQ_PROD_STOP_FLAG (bit 30), in the command
> > queue's producer index to safely gate command submissions during device
> > suspension.
> >
> > The flag embeds the suspend state directly into the existing global state
> > The flag checked in the cmpxchg loop in arm_smmu_cmdq_issue_cmdlist(),
> > which acts as a Point of Commitment, ensuring that no indices are
> > reserved or committed once the SMMU begins suspending.
> >
> > This prevents a situation of "abandoned batches" where indices are
> > incremented but commands are never written, which would otherwise
> > lead to timeout during the drain poll.
> >
> > Update queue_inc_prod_n() to preserve this flag during index
> > calculations, ensuring that any in-flight commands that successfully
> > passed the point of commitment can proceed to completion while the
> > flag remains set.

[...]

> >
> >  static void queue_poll_init(struct arm_smmu_device *smmu,
> > @@ -718,8 +719,25 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> >         do {
> >                 u64 old;
> >
> > +               /*
> > +                * If the SMMU is suspended/suspending, any new CMDs are elided.
> > +                * This loop is the Point of Commitment. If we haven't cmpxchg'd
> > +                * our new indices yet, we can safely bail. Once the indices are
> > +                * committed, we MUST write valid commands to those slots to
> > +                * avoid indefinite polling in the drain function.
> > +                */
> > +               if (Q_STOP(llq.prod)) {
> > +                       local_irq_restore(flags);
> > +                       return 0;
> 
> On second thought, I no longer believe that this is safe. I understand
> that READ_ONCE(cmdq->q.llq.val) implies no ordering guarantees with
> respect to any writes to translation tables. In the non-stopped case,
> we can rely on the call to dma_wmb() further down in this function.
> However, for the stopped case, I can't identify any barriers that
> would ensure that the STOP flag is checked only after the writes are
> visible to SMMU. Here is an example: Let's assume the following
> program order:
> 
>  * Write to invalidate PTE
>  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
> 
> What prevents the CPU from reordering these operations as follows?
> 
>  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
>  * Write to invalidate PTE
> 
> Can the following situation occur?

Not really because:

> 
>  * Read from cmdq->q.llq.val, determine STOP flag is set, elide TLBI
>  * (Different CPU resumes SMMU)

We do a full smp_mb() here to ensure all writes are seen before SMMUEN=1
(note that the STOP_FLAG is already unset at that point, so we stop
eliding commands much before the SMMU is physically able to access any
config/xlation structures, I've explained everything below).

>  * SMMU loads old PTE value into TLB
>  * Write to invalidate PTE
>  * (stale PTE remains in TLB)
> 
> I propose the following: If you find the STOP flag set, run dma_mb()
> and check again. I'm afraid that running dma_mb() unconditionally
> might incur too much of a performance penalty.

IMHO, I think this might be overcomplicating things here.. 
Here's why the current version works according to me:

Till SMMUEN=0, the SMMU is spec-guaranteed to never access translation
or configuration structures (Section 6.3.9.6). 

Our runtime_suspend callback sets SMMUEN=0 before setting the STOP_FLAG.

Even if the worker CPU reorders the PTE write after the STOP_FLAG check,
it is benign because the SMMU is incapable of fetching that (or any) PTE
while the gate is closed. Because GATE_CLOSED == SMMUEN = 0, implying no
access to any HW structures whatsoever.

The real synchronization happens in the Resume Path:

1. arm_smmu_device_reset() clears all caches / TLBs.
   (None of these can have entries before SMMUEN=1)

2. We execute a full smp_mb() before setting SMMUEN=1. (that's why we
   need smp_mb before SMMUEN=1). This barrier ensures that any PTE 
   writes made by any thread—including those that were elided while the
   gate was closed, are globally visible before the SMMU hardware starts 
   fetching into TLBs again. (This is why Jason suggested this in v6 [1])

Adding a dma_mb() to the elision path would be redundant given the
SMMU can't access any structures and the resume barrier. We'd be
needlessly injecting dma_mbs when no entity is actually accessing those
structures while the STOP_FLAG is set. 
> 
> I have the same concerns with arm_smmu_cmdq_can_elide(): That
> READ_ONCE in arm_smmu_cmdq_can_elide() provides no guarantees in
> regards to ordering relative to PTE writes.
> 

The same as above, No structures are accessed during SMMUEN=0 (spec) and
during resume before setting SMMUEN=1, we do a full smp_mb() to ensure
all concurrent PTE writes are acquired in our resume thread before we
enable SMMUEN=1

Thanks,
Praan

[1] https://lore.kernel.org/all/20260424151639.GE3611611@ziepe.ca/

next prev parent reply	other threads:[~2026-06-08  6:19 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 21:58 [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 01/12] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-06-01 21:58 ` [PATCH v8 02/12] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-06-02  0:12   ` Nicolin Chen
2026-06-02  3:28     ` Pranjal Shrivastava
2026-06-02  5:21   ` Daniel Mentz
2026-06-01 21:59 ` [PATCH v8 03/12] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 04/12] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 05/12] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 06/12] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-06-02  0:15   ` Nicolin Chen
2026-06-02  3:31     ` Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 07/12] iommu/arm-smmu-v3: Add CMDQ_PROD_STOP_FLAG to gate CMDQ submissions Pranjal Shrivastava
2026-06-07 21:42   ` Daniel Mentz
2026-06-08  6:19     ` Pranjal Shrivastava [this message]
2026-06-01 21:59 ` [PATCH v8 08/12] iommu/tegra241-cmdqv: Add a helper to quiesce VCMDQs Pranjal Shrivastava
2026-06-02  0:14   ` Nicolin Chen
2026-06-02  3:37     ` Pranjal Shrivastava
2026-06-02  5:59       ` Nicolin Chen
2026-06-02  6:21         ` Pranjal Shrivastava
2026-06-02  6:29           ` Nicolin Chen
2026-06-01 21:59 ` [PATCH v8 09/12] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-06-02  5:25   ` Daniel Mentz
2026-06-02 12:12     ` Pranjal Shrivastava
2026-06-07 22:36       ` Daniel Mentz
2026-06-02 15:27   ` Daniel Mentz
2026-06-07 21:53   ` Daniel Mentz
2026-06-07 22:30   ` Daniel Mentz
2026-06-01 21:59 ` [PATCH v8 10/12] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-06-01 21:59 ` [PATCH v8 11/12] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-06-02  0:24   ` Nicolin Chen
2026-06-02  3:59     ` Pranjal Shrivastava
2026-06-02  5:51       ` Nicolin Chen
2026-06-02  6:24         ` Pranjal Shrivastava
2026-06-03 20:28   ` Daniel Mentz
2026-06-04  6:27     ` Pranjal Shrivastava
2026-06-07 22:22       ` Daniel Mentz
2026-06-01 21:59 ` [PATCH v8 12/12] iommu/arm-smmu-v3: Add KUnit unit tests for Runtime PM Pranjal Shrivastava
2026-06-02  6:03 ` [PATCH v8 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Nicolin Chen
2026-06-02 12:04   ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aiZe97OkmvcUwCzS@google.com \
    --to=praan@google.com \
    --cc=amhetre@nvidia.com \
    --cc=danielmentz@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@ziepe.ca \
    --cc=joro@8bytes.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox