From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Mostafa Saleh <smostafa@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Ashish Mhetre <amhetre@nvidia.com>,
Sairaj Kodilkar <sarunkod@amd.com>
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Date: Mon, 9 Mar 2026 21:13:26 +0000
Message-ID: <aa839kz29GIgg_W3@google.com>
In-Reply-To: <aa8jkLD0l2WqMnyG@google.com>

On Mon, Mar 09, 2026 at 07:46:24PM +0000, Pranjal Shrivastava wrote:
> On Sun, Mar 08, 2026 at 02:23:03PM -0700, Daniel Mentz wrote:
> > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > Introduce a biased counter to track the number of active cmdq owners as
> > > a preparatory step for the runtime PM implementation.
> >
> > Non-owners waiting for their CMD_SYNC to be consumed also appear to
> > increase this counter.
> >
>
> Ack, I'll clarify that in the commit message too: any owners or
> non-owners waiting on sync would increment this counter.
>
> > >
> > > The counter will be used to gate command submission, preventing the
> > > submission of new commands while the device is suspended and deferring
> > > suspend while the command submissions are in-flight.
> >
> > What does command submission mean in this context? I understand that
> > commands are still added to the queue (until the queue is full). It's
> > only the producer index that is not advanced.
> >
>
> Exactly, incrementing the "producer index" is how the command is
> actually submitted to the HW. In case the SMMU is suspended, we'd simply
> add it to the queue but not touch the PROD register in the HW. I can
> clarify this further by saying "preventing the flushing of new commands"
>
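
As an aside, the gating this boils down to can be modeled in a few lines
of userspace C11 (illustrative only: inc_not_zero(), submit() and
prod_reg are stand-ins for the kernel atomic_t and MMIO APIs, not the
driver code):

```c
#include <assert.h>
#include <stdatomic.h>

#define QSZ 8
static unsigned long q[QSZ];     /* command queue memory */
static unsigned int prod_reg;    /* stand-in for the PROD MMIO register */
static atomic_int nr_cmdq_users; /* the biased usage counter */

/* Models atomic_inc_not_zero(): take a reference unless already zero. */
static int inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0)
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return 1;
	return 0;
}

/* Stage @cmd at @idx, then try to publish it by advancing PROD. */
static int submit(unsigned int idx, unsigned long cmd)
{
	q[idx % QSZ] = cmd;	/* staged in memory; "HW" can't see it yet */

	if (!inc_not_zero(&nr_cmdq_users))
		return 0;	/* suspended: staged, but PROD untouched */

	prod_reg = idx + 1;	/* the writel_relaxed(prod, ...) step */
	atomic_fetch_sub(&nr_cmdq_users, 1); /* drop the reference */
	return 1;
}
```

With the counter biased to 1 (active), submit() advances the fake PROD;
once a suspend has dropped it to 0, the command is staged in queue
memory but never flushed to "HW".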
> > >
> > > The counter is biased to a value of 1 during device reset. A cmdq owner,
> > > or a thread issuing cmds with sync, increments it before accessing HW
> > > registers and decrements it with release semantics afterwards.
> > >
> > > A value of 1 represents an idle (but active) state. A suspend operation
> > > will move it from 1 -> 0, representing the suspended state.
> > >
> > > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > > ---
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> > > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 3 +
> > > 2 files changed, 57 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 91edcd8922a7..3a8d3c2a8d69 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > u64 cmd_sync[CMDQ_ENT_DWORDS];
> > > u32 prod;
> > > unsigned long flags;
> > > - bool owner;
> > > + bool owner, has_ref = false;
> > > struct arm_smmu_ll_queue llq, head;
> > > int ret = 0;
> > >
> > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > while (!queue_has_space(&llq, n + sync)) {
> > > local_irq_restore(flags);
> > > +
> > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > + /* Device is suspended, don't wait for space */
> > > + return 0;
> > > +
> >
> > Does this mean that we still accumulate commands until the queue is
> > full while in the suspended state and then have the SMMU execute all of
> > these (now irrelevant) commands when the command queue is re-enabled?
> > One might argue that this is wasteful, but I guess it might be
> > technically correct, because it doesn't hurt either.
> >
>
> No, we drop any commands that are issued if the command queue doesn't
> have space while it is suspended. There are two reasons for this:
>
> 1. The arm_smmu_cmdq_poll_until_not_full() below would try to perform an
> MMIO access (readl_relaxed(cmdq->q.cons_reg)), which shouldn't be allowed
> when the SMMU is suspended.
>
> 2. We usually have 3 types of commands in the SMMUv3:
>
> a) Invalidation commands: We clear the entire TLB during resume, so we
> don't really need to batch these.
>
> b) Prefetch commands: Prefetches are effectively no-ops while the SMMU
> is suspended, since the resume path re-reads the latest config anyway.
>
> c) Page responses: These are NOT expected to occur while the SMMU is
> suspended, and we shouldn't be batching them anyway. (With this series,
> the page resp handlers account for such situations and avoid queueing
> commands.)
>
> Thus, any dropped invalidation / prefetch commands don't harm us if the
> SMMU was suspended.
>
>
> > > if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> > > dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > > +
> > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > local_irq_save(flags);
> > > }
> > >
> > > @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
> > >
> > > /*
> > > - * d. Advance the hardware prod pointer
> > > + * d. Advance the hardware prod pointer (if smmu is still active)
> >
> > Consider "(if command queue is still enabled)". Otherwise, what does
> > "active" mean? SMMU and/or command queue enabled?
> >
>
> The suspend callback is implemented in a way where we can't have a
> "partial" suspend, i.e. there is no code path where the CMDQ is disabled
> but the SMMU is not. Thus, "active" was supposed to mean the SMMU is
> enabled entirely, since there isn't a partial suspend/resume implemented.
>
> Although, I can update the comment to say (if smmu is still enabled) OR
> (if smmu is NOT suspended).
>
> > > * Control dependency ordering from the entries becoming valid.
> > > */
> > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > +
> > > + if (sync) {
> > > + has_ref = true;
> > > + } else {
> > > + /*
> > > + * Use release semantics to enforce ordering without a full barrier.
> > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > + * before the refcount decrement, avoiding the heavy pipeline
> > > + * stall of a full wmb().
> > > + *
> > > + * We need the atomic_dec_return_release() below and the
> > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > + *
> > > + * Specifically, without release semantics on the decrement,
> > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > + * before the writel_relaxed().
> > > + *
> > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > + * suspend path (running on another CPU) to disable the SMMU before
> > > + * the register write completes, resulting in a bus fault.
> > > + */
> > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > + }
> > > + }
> > >
> > > /*
> > > * e. Tell the next owner we're done
> > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > if (sync) {
> > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > - if (ret) {
> > > - dev_err_ratelimited(smmu->dev,
> > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > - llq.prod,
> > > - readl_relaxed(cmdq->q.prod_reg),
> > > - readl_relaxed(cmdq->q.cons_reg));
> > > +
> > > + /* If we are not the owner, check if we're suspended */
> > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> >
> > Can there be a situation, where we execute this branch and the
> > hardware prod pointer has not been updated, in which case
> > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > it times out). Imagine the following series of events:
> > * nr_cmdq_users==0
> > * hardware prod pointer update is skipped
> > * nr_cmdq_users becomes 1
> > * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> > by the SMMU because the hardware prod pointer wasn't updated
> > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > stuck because the queue is full.
> > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > queue_has_space() returns false, and
> > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> >
>
> Thanks for catching this! IIUC, you mean a race between the resume and
> this code block can cause a deadlock? Something like the following:
>
> CPU 0                                 CPU 1
> nr_cmdq_users == 0
> skip hw_prod++
>                                       ->resume cb
>                                       nr_cmdq_users == 1
> nr_cmdq_users == 1
> in the if (sync) part
> arm_smmu_cmdq_poll_until_sync()
>                                       CMDQ_OP_CFGI_ALL stuck for space
>
> I guess we could add a check here if (has_ref == false) but
> nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> need to factor out a helper function here.. I'll think more about this..
>
> I guess the simplest thing would be to bail out much before, something
> like if (nr_cmdq_users == 0 && sync) return;
>
> > > + has_ref = true;
> > > + llq.prod = queue_inc_prod_n(&llq, n);
> > > + ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> >
> > If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users==0
> > case, I'm thinking that llq.cons is not updated. If this is the case,
> > is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
> > incorrect?
> >
>
> The goal here was to "simulate" a successful command submission while
> the smmu is suspended (as mentioned by Will earlier), we do everything
> except for touching MMIO here.
>
> Since the commands can be one of the three types mentioned above, we
> don't really miss out on anything with the faux-cmd submission here.
>
> Thus, what if we always update llq.prod, and set llq.cons = llq.prod
> conditionally if nr_cmdq_users == 0.
>
> Ideally, this should also prevent the `queue_full` scenario as all
> commands are faux-issued instantly. Maybe something like:
>
> if (sync) {
> 	llq.prod = queue_inc_prod_n(&llq, n);
>
> 	if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> 		if (!has_ref) {
> 			/*
> 			 * Repair: We were suspended earlier but now we're
> 			 * active. Kick the hardware to publish our commands.
> 			 */
> 			has_ref = true;
> 			... owner logic again ...
> 			...
> 			writel_relaxed(llq.prod, cmdq->q.prod_reg);
> 		}
>
> 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> 		if (ret) {
> 			// ...
> 		}
> 	} else {
> 		/*
> 		 * Device is suspended. Fast-forward the consumer pointer
> 		 * in memory to "drop" these commands and avoid a full
> 		 * queue deadlock. (n + 1 to include the SYNC command)
> 		 */
> 		llq.cons = queue_inc_prod_n(&llq, n + 1);
> 	}
>
> 	if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) {
> 		WRITE_ONCE(cmdq->q.llq.cons, llq.cons);
> 		arm_smmu_cmdq_shared_unlock(cmdq);
> 	}
> }
>
> This could still be racy.. I'll give it another thought..
>
Still thinking out loud, but maybe we can have something like:
if (owner) {
	....
	if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
		writel_relaxed(llq.prod, cmdq->q.prod_reg);
		has_ref = true;
	} else {
		/* SMMU is suspended, fake a consumption */
		WRITE_ONCE(cmdq->q.llq.cons, prod);
		/* Avoid racing with resume */
		sync = false;
	}
	atomic_set_release(&cmdq->owner_prod, prod);
}

if (sync) {
	... same stuff ...
}
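
To sanity-check the idea, here's a toy userspace C11 model of that owner
path (illustrative names only: C11 atomics stand in for the kernel
atomic_t API, and globals for the cmdq fields):

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int nr_cmdq_users; /* the biased usage counter */
static unsigned int prod_reg;    /* stand-in for the PROD MMIO register */
static unsigned int llq_cons;    /* stand-in for cmdq->q.llq.cons */

/* Models atomic_inc_not_zero(). */
static int inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0)
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return 1;
	return 0;
}

/*
 * Owner path: publish @prod to "HW", or fake a consumption when
 * suspended. Returns the resulting value of `sync`, i.e. whether the
 * caller should still wait for the CMD_SYNC.
 */
static int owner_publish(unsigned int prod, int sync)
{
	if (inc_not_zero(&nr_cmdq_users)) {
		prod_reg = prod;	/* writel_relaxed(prod, ...) */
		if (!sync)		/* sync path drops the ref later */
			atomic_fetch_sub(&nr_cmdq_users, 1);
		return sync;
	}

	/* SMMU is suspended: fake a consumption, skip the sync wait */
	llq_cons = prod;
	return 0;
}
```

The point being: in the suspended branch, cons catches up with prod in
memory, so the queue can't fill up and nobody is left polling for a sync
that was never published.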
LMK what you think?
[ snip ]
Thanks,
Praan
Thread overview: 33+ messages
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
2026-03-08 21:23 ` Daniel Mentz
2026-03-09 19:46 ` Pranjal Shrivastava
2026-03-09 21:13 ` Pranjal Shrivastava [this message]
2026-03-09 22:56 ` Daniel Mentz
2026-03-10 16:46 ` Pranjal Shrivastava
2026-03-09 22:41 ` Daniel Mentz
2026-03-10 16:54 ` Pranjal Shrivastava
2026-03-11 4:43 ` Daniel Mentz
2026-03-11 13:53 ` Pranjal Shrivastava
2026-03-11 22:56 ` Daniel Mentz
2026-03-16 15:36 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-03-10 1:45 ` Daniel Mentz
2026-03-10 16:58 ` Pranjal Shrivastava
2026-03-10 21:16 ` Daniel Mentz
2026-03-10 21:40 ` Pranjal Shrivastava
2026-03-11 0:12 ` Daniel Mentz
2026-03-11 5:31 ` Pranjal Shrivastava
2026-03-11 17:26 ` Daniel Mentz
2026-03-16 15:05 ` Pranjal Shrivastava
2026-03-12 19:20 ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
2026-02-11 13:33 ` Pranjal Shrivastava