From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Mostafa Saleh <smostafa@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Ashish Mhetre <amhetre@nvidia.com>,
Sairaj Kodilkar <sarunkod@amd.com>
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Date: Tue, 10 Mar 2026 16:54:35 +0000
Message-ID: <abBMy5MgrrOlofDF@google.com>
In-Reply-To: <CAE2F3rBFRSTC7bFCYTCit1-OKrT3jB-fwYGu_UJbwX+7aE65eA@mail.gmail.com>
On Mon, Mar 09, 2026 at 03:41:16PM -0700, Daniel Mentz wrote:
> On Mon, Mar 9, 2026 at 12:46 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > while (!queue_has_space(&llq, n + sync)) {
> > > > local_irq_restore(flags);
> > > > +
> > > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > > + /* Device is suspended, don't wait for space */
> > > > + return 0;
> > > > +
> > >
> > > Does this mean that we still accumulate commands until the queue is
> > > full while in the suspended state and then have the SMMU execute all of
> > > these (now irrelevant) commands when the command queue is re-enabled?
> > > One might argue that this is wasteful, but I guess it might be
> > > technically correct, because it doesn't hurt either.
> > >
> >
> > No, we drop any commands that are issued if the command queue doesn't
> > have space while it is suspended. There are two reasons for this:
> >
> > 1. The arm_smmu_cmdq_poll_until_not_full() below would try to perform an
> > MMIO access (readl_relaxed(cmdq->q.cons_reg)), which shouldn't be allowed
> > when the SMMU is suspended.
> >
> > 2. We usually have 3 types of commands in the SMMUv3:
> >
> > a) Invalidation commands: We clear the entire TLB during resume, so we
> > don't really need to batch these.
> >
> > b) Prefetch commands: Prefetches are effectively no-ops while the SMMU is
> > suspended, as the resume path will read the latest config anyway.
> >
> > c) Page responses: These are NOT expected to occur while the SMMU is
> > suspended and we shouldn't be batching them anyway. (The page resp
> > handlers handle such situations and avoid queueing-in commands anyway
> > with this series)
> >
> > Thus, any dropped invalidation / prefetch commands don't harm us if the
> > SMMU was suspended.
>
> Understood. My point is that we start dropping commands only when the
> command queue fills up, even though we could potentially drop commands
> even earlier.
>
Hmm.. but how early? We still need the part where we let the next owner
know that the current owner is done..
> > > > * Control dependency ordering from the entries becoming valid.
> > > > */
> > > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > > +
> > > > + if (sync) {
> > > > + has_ref = true;
> > > > + } else {
> > > > + /*
> > > > + * Use release semantics to enforce ordering without a full barrier.
> > > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > > + * before the refcount decrement, avoiding the heavy pipeline
> > > > + * stall of a full wmb().
> > > > + *
> > > > + * We need the atomic_dec_return_release() below and the
> > > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > > + *
> > > > + * Specifically, without release semantics on the decrement,
> > > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > > + * before the writel_relaxed().
> > > > + *
> > > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > > + * suspend path (running on another CPU) to disable the SMMU before
> > > > + * the register write completes, resulting in a bus fault.
> > > > + */
> > > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > > + }
> > > > + }
> > > >
> > > > /*
> > > > * e. Tell the next owner we're done
> > > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > > if (sync) {
> > > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > > - if (ret) {
> > > > - dev_err_ratelimited(smmu->dev,
> > > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > > - llq.prod,
> > > > - readl_relaxed(cmdq->q.prod_reg),
> > > > - readl_relaxed(cmdq->q.cons_reg));
> > > > +
> > > > + /* If we are not the owner, check if we're suspended */
> > > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > >
> > > Can there be a situation, where we execute this branch and the
> > > hardware prod pointer has not been updated, in which case
> > > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > > it times out). Imagine the following series of events:
> > > * nr_cmdq_users==0
> > > * hardware prod pointer update is skipped
> > > * nr_cmdq_users becomes 1
> > > * arm_smmu_cmdq_poll_until_sync waits, but commands are not executed
> > > by the SMMU because the hardware prod pointer wasn't updated
> > > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > > stuck because the queue is full.
> > > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > > queue_has_space() returns false, and
> > > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> > >
> >
> > Thanks for catching this! IIUC, you mean a race between the resume and
> > this code block can cause a deadlock? Something like the following:
> >
> >     CPU 0                               CPU 1
> >     nr_cmdq_users == 0
> >     skip hw_prod++
> >                                         ->resume cb
> >                                         nr_cmdq_users = 1
> >     nr_cmdq_users == 1
> >     in the if (sync) part
> >     arm_smmu_cmdq_poll_until_sync()
> >                                         CMDQ_OP_CFGI_ALL stuck for space
> >
> > I guess we could add a check here: if has_ref is false but
> > nr_cmdq_users > 0, we could write the prod_reg again.. maybe we'd need
> > to factor out a helper function for that.. I'll think more about this..
> >
> > I guess the simplest thing would be to bail out much before, something
> > like if (nr_cmdq_users == 0 && sync) return;
>
> On a related note: Can you guarantee that with your current approach,
> nr_cmdq_users drops to 1 even with a high rate of calls to
> arm_smmu_cmdq_issue_cmdlist() from multiple CPUs or will your "/* Try
> to suspend the device, wait for in-flight submissions */" eventually
> time out?
>
> Have you considered an alternative approach where you put
>
> if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> return 0;
>
> at the beginning of arm_smmu_cmdq_issue_cmdlist() and an unconditional
> atomic_dec_return_release(&smmu->nr_cmdq_users); at the end? I think
> that would avoid this potential deadlock situation.
>
Right, that's what I meant by "bail out much before". I'll think about
whether that would always work, or whether there's a situation where it
may fail.
> >
> > > I'm still trying to get my head around this algorithm, but according
> > > to my current understanding, the following rule applies:
> > >
> > > Only the cmdq owner may update the hardware prod pointer, and only
> > > after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
> > > Does the following instruction in arm_smmu_device_reset() violate this
> > > rule?
> > >
> > > writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> > >
> >
> > Not really, as we haven't enabled the command queue by that point
> > (though it's still racy, as you've mentioned in the queue-full
> > scenario above).
>
> You're enabling the command queue right after writing to ARM_SMMU_CMDQ_PROD
>
> /* Command queue */
> writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
> writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
>
> enables = CR0_CMDQEN;
> ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> ARM_SMMU_CR0ACK);
>
> My point is that writing smmu->cmdq.q.llq.prod to ARM_SMMU_CMDQ_PROD
> is premature. This might lead to the SMMU reading from command slots
> that haven't been fully populated.
>
Well.. if smmu->cmdq.q.llq.cons was also updated by the "fake"
consumption logic, then it wouldn't, right?
I agree the current implementation might have missed a case, but
otherwise this should be fine as long as both llq.prod & llq.cons are
updated in a balanced manner during suspend.
> Daniel
Thanks,
Praan
Thread overview: 33+ messages
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
2026-03-08 21:23 ` Daniel Mentz
2026-03-09 19:46 ` Pranjal Shrivastava
2026-03-09 21:13 ` Pranjal Shrivastava
2026-03-09 22:56 ` Daniel Mentz
2026-03-10 16:46 ` Pranjal Shrivastava
2026-03-09 22:41 ` Daniel Mentz
2026-03-10 16:54 ` Pranjal Shrivastava [this message]
2026-03-11 4:43 ` Daniel Mentz
2026-03-11 13:53 ` Pranjal Shrivastava
2026-03-11 22:56 ` Daniel Mentz
2026-03-16 15:36 ` Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-03-10 1:45 ` Daniel Mentz
2026-03-10 16:58 ` Pranjal Shrivastava
2026-03-10 21:16 ` Daniel Mentz
2026-03-10 21:40 ` Pranjal Shrivastava
2026-03-11 0:12 ` Daniel Mentz
2026-03-11 5:31 ` Pranjal Shrivastava
2026-03-11 17:26 ` Daniel Mentz
2026-03-16 15:05 ` Pranjal Shrivastava
2026-03-12 19:20 ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
2026-02-11 13:33 ` Pranjal Shrivastava