public inbox for iommu@lists.linux-foundation.org
From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Mostafa Saleh <smostafa@google.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Ashish Mhetre <amhetre@nvidia.com>,
	Sairaj Kodilkar <sarunkod@amd.com>
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Date: Mon, 9 Mar 2026 21:13:26 +0000	[thread overview]
Message-ID: <aa839kz29GIgg_W3@google.com> (raw)
In-Reply-To: <aa8jkLD0l2WqMnyG@google.com>

On Mon, Mar 09, 2026 at 07:46:24PM +0000, Pranjal Shrivastava wrote:
> On Sun, Mar 08, 2026 at 02:23:03PM -0700, Daniel Mentz wrote:
> > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava <praan@google.com> wrote:
> > >
> > > Introduce a biased counter to track the number of active cmdq owners as
> > > a preparatory step for the runtime PM implementation.
> > 
> > Non-owners waiting for their CMD_SYNC to be consumed also appear to
> > increase this counter.
> > 
> 
> Ack, I'll clarify that in the commit message too: any owners or
> non-owners waiting on sync would increment this counter.
> 
> > >
> > > The counter will be used to gate command submission, preventing the
> > > submission of new commands while the device is suspended and deferring
> > > suspend while the command submissions are in-flight.
> > 
> > What does command submission mean in this context? I understand that
> > commands are still added to the queue (until the queue is full). It's
> > only the producer index that is not advanced.
> > 
> 
> Exactly, incrementing the "producer index" is how the command is
> actually submitted to the HW. In case the SMMU is suspended, we'd simply
> add it to the queue but not touch the PROD register in the HW. I can
> clarify this further by saying "preventing the flushing of new commands"
> 
> > >
> > > The counter is biased to a value of 1 during device reset. A cmdq owner
> > > or a thread issuing cmds with sync increments it before accessing HW
> > > registers and decrements it with release semantics afterwards.
> > >
> > > A value of 1 represents an idle (but active) state. A suspend operation
> > > will move it from 1 -> 0, representing the suspended state.
> > >
> > > Signed-off-by: Pranjal Shrivastava <praan@google.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +
> > >  2 files changed, 57 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 91edcd8922a7..3a8d3c2a8d69 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >         u64 cmd_sync[CMDQ_ENT_DWORDS];
> > >         u32 prod;
> > >         unsigned long flags;
> > > -       bool owner;
> > > +       bool owner, has_ref = false;
> > >         struct arm_smmu_ll_queue llq, head;
> > >         int ret = 0;
> > >
> > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > >                 while (!queue_has_space(&llq, n + sync)) {
> > >                         local_irq_restore(flags);
> > > +
> > > +                       if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > +                               /* Device is suspended, don't wait for space */
> > > +                               return 0;
> > > +
> > 
> > Does this mean that we still accumulate commands until the queue is
> > full while in the suspended state and then have SMMU execute all of
> > these (now irrelevant) commands when the command queue is re-enabled?
> > One might argue that this is wasteful, but I guess it might be
> > technically correct, because it doesn't hurt either.
> > 
> 
> No, we drop any commands that are issued if the command queue doesn't
> have space while it is suspended. There are two reasons for this:
> 
> 1. The arm_smmu_cmdq_poll_until_not_full below would try to perform an
> MMIO access (readl_relaxed(cmdq->q.cons_reg)), which shouldn't be allowed
> when the SMMU is suspended.
> 
> 2. We usually have 3 types of commands in the SMMUv3:
> 
>  a) Invalidation commands: We clear the entire TLB during resume, so we
>  don't really need to batch these.
> 
>  b) Prefetch commands: Any prefetches issued while the SMMU is suspended
>  are effectively no-ops, since the resume path re-reads the latest config
>  anyway.
> 
>  c) Page responses: These are NOT expected to occur while the SMMU is
>  suspended and we shouldn't be batching them anyway. (With this series,
>  the page response handlers deal with such situations and avoid queueing
>  commands.)
> 
> Thus, any dropped invalidation / prefetch commands don't harm us if the
> SMMU was suspended.
>  
> 
> > >                         if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> > >                                 dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > > +
> > > +                       atomic_dec_return_release(&smmu->nr_cmdq_users);
> > >                         local_irq_save(flags);
> > >                 }
> > >
> > > @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >                 arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
> > >
> > >                 /*
> > > -                * d. Advance the hardware prod pointer
> > > +                * d. Advance the hardware prod pointer (if smmu is still active)
> > 
> > Consider "(if command queue is still enabled)". Otherwise, what does
> > "active" mean? SMMU and/or command queue enabled?
> > 
> 
> The suspend callback is implemented in a way where we can't have a
> "partial" suspend, i.e. there is no code path where the CMDQ is disabled
> but the SMMU is not. Thus, "active" was supposed to mean the SMMU is
> enabled entirely, since there isn't a partial suspend/resume implemented.
> 
> Although, I can update the comment to say (if smmu is still enabled) OR
> (if smmu is NOT suspended).
> 
> > >                  * Control dependency ordering from the entries becoming valid.
> > >                  */
> > > -               writel_relaxed(prod, cmdq->q.prod_reg);
> > > +               if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > +                       writel_relaxed(prod, cmdq->q.prod_reg);
> > > +
> > > +                       if (sync) {
> > > +                               has_ref = true;
> > > +                       } else {
> > > +                               /*
> > > +                                * Use release semantics to enforce ordering without a full barrier.
> > > +                                * This ensures the prior writel_relaxed() is ordered/visible
> > > +                                * before the refcount decrement, avoiding the heavy pipeline
> > > +                                * stall of a full wmb().
> > > +                                *
> > > +                                * We need the atomic_dec_return_release() below and the
> > > +                                * atomic_set_release() in step (e) below doesn't suffice.
> > > +                                *
> > > +                                * Specifically, without release semantics on the decrement,
> > > +                                * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > +                                * before the writel_relaxed().
> > > +                                *
> > > +                                * If this happens, the refcount could drop to zero, allowing the PM
> > > +                                * suspend path (running on another CPU) to disable the SMMU before
> > > +                                * the register write completes, resulting in a bus fault.
> > > +                                */
> > > +                               atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > +                       }
> > > +               }
> > >
> > >                 /*
> > >                  * e. Tell the next owner we're done
> > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > >         /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > >         if (sync) {
> > > -               llq.prod = queue_inc_prod_n(&llq, n);
> > > -               ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > -               if (ret) {
> > > -                       dev_err_ratelimited(smmu->dev,
> > > -                                           "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > -                                           llq.prod,
> > > -                                           readl_relaxed(cmdq->q.prod_reg),
> > > -                                           readl_relaxed(cmdq->q.cons_reg));
> > > +
> > > +               /* If we are not the owner, check if we're suspended */
> > > +               if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > 
> > Can there be a situation, where we execute this branch and the
> > hardware prod pointer has not been updated, in which case
> > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > it times out). Imagine the following series of events:
> >  * nr_cmdq_users==0
> >  * hardware prod pointer update is skipped
> >  * nr_cmdq_users becomes 1
> >  * arm_smmu_cmdq_poll_until_sync waits but commands are not executed
> > by the SMMU because the hardware prod pointer wasn't updated
> >  * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > stuck because the queue is full.
> >  * cmdq->q.llq.cons is never updated because the shared lock is held.
> > queue_has_space() returns false, and
> > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> > 
> 
> Thanks for catching this! IIUC, you mean a race between the resume and
> this code block can cause a deadlock? Something like the following:
> 
> CPU 0					CPU 1
> nr_cmdq_users == 0
> skip hw_prod++
> 					->resume cb
> 					nr_cmdq_user == 1
> nr_cmdq_users == 1
> in the if (sync) part
> arm_smmu_cmdq_poll_until_sync()
> 					CMDQ_OP_CFGI_ALL stuck for space
> 
> I guess we could add a check here if (has_ref == false) but
> nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> need to factor out a helper function here.. I'll think more about this..
> 
> I guess the simplest thing would be to bail out much before, something
> like if (nr_cmdq_users == 0 && sync) return;
> 
> > > +                       has_ref = true;
> > > +                       llq.prod = queue_inc_prod_n(&llq, n);
> > > +                       ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > 
> > If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users==0
> > case, I'm thinking that llq.cons is not updated. If this is the case,
> > is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
> > incorrect?
> > 
> 
> The goal here was to "simulate" a successful command submission while 
> the smmu is suspended (as mentioned by Will earlier), we do everything
> except for touching MMIO here. 
> 
> Since the commands can be one of the three types mentioned above, we
> don't really miss out on anything with the faux-cmd submission here.
> 
> Thus, what if we always update llq.prod and conditionally set
> llq.cons = llq.prod if nr_cmdq_users == 0?
> 
> Ideally, this should also prevent the `queue_full` scenario as all
> commands are faux-issued instantly. Maybe something like:
> 
> if (sync) {                                                               
>       llq.prod = queue_inc_prod_n(&llq, n);                               
>                                                                           
>       if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
>           if (!has_ref) {  
>               /* Repair: We were suspended earlier but now we're active.
>                * Kick the hardware to publish our commands. */
>               has_ref = true;
> 	      ... owner logic again ...
> 	          ...
>               writel_relaxed(llq.prod, cmdq->q.prod_reg);                 
>           }
>           ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> 
>            if (ret) {
>                // ...                                      
>            }                                                                   
>        } else {
>            /*
>             * Device is suspended. Fast-forward the consumer pointer in memory 
>             * to "drop" these commands and avoid a full queue deadlock.
> 	    * (n + 1 to include the SYNC command)
>             */        
>            llq.cons = queue_inc_prod_n(&llq, n + 1);
>        }                                               
>                                                                        
>        if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) {
>            WRITE_ONCE(cmdq->q.llq.cons, llq.cons);
>            arm_smmu_cmdq_shared_unlock(cmdq);
>        }
>    }
> 
> This could still be racy.. I'll give it another thought..
> 

Still thinking out loud, but maybe we can have something like:

if (owner) {
	....

	if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
		writel_relaxed(llq.prod, cmdq->q.prod_reg);
		has_ref = true;
	} else {

		/* SMMU is suspended, fake a consumption */
		WRITE_ONCE(cmdq->q.llq.cons, prod);

		/* Avoid racing with resume */
		sync = false;
	}

	atomic_set_release(&cmdq->owner_prod, prod);
}

if (sync) {

  ... same stuff ..

}

LMK what you think?

[ snip ]

Thanks,
Praan

Thread overview: 33+ messages
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
2026-03-08 21:23   ` Daniel Mentz
2026-03-09 19:46     ` Pranjal Shrivastava
2026-03-09 21:13       ` Pranjal Shrivastava [this message]
2026-03-09 22:56         ` Daniel Mentz
2026-03-10 16:46           ` Pranjal Shrivastava
2026-03-09 22:41       ` Daniel Mentz
2026-03-10 16:54         ` Pranjal Shrivastava
2026-03-11  4:43           ` Daniel Mentz
2026-03-11 13:53             ` Pranjal Shrivastava
2026-03-11 22:56               ` Daniel Mentz
2026-03-16 15:36                 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-03-10  1:45   ` Daniel Mentz
2026-03-10 16:58     ` Pranjal Shrivastava
2026-03-10 21:16       ` Daniel Mentz
2026-03-10 21:40         ` Pranjal Shrivastava
2026-03-11  0:12           ` Daniel Mentz
2026-03-11  5:31             ` Pranjal Shrivastava
2026-03-11 17:26               ` Daniel Mentz
2026-03-16 15:05                 ` Pranjal Shrivastava
2026-03-12 19:20   ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
2026-02-11 13:33   ` Pranjal Shrivastava
