From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Mostafa Saleh <smostafa@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Ashish Mhetre <amhetre@nvidia.com>,
Sairaj Kodilkar <sarunkod@amd.com>
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Date: Tue, 10 Mar 2026 16:54:35 +0000
Message-ID: <abBMy5MgrrOlofDF@google.com>
In-Reply-To: <CAE2F3rBFRSTC7bFCYTCit1-OKrT3jB-fwYGu_UJbwX+7aE65eA@mail.gmail.com>
On Mon, Mar 09, 2026 at 03:41:16PM -0700, Daniel Mentz wrote:
> On Mon, Mar 9, 2026 at 12:46 PM Pranjal Shrivastava <praan@google.com> wrote:
> > > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > while (!queue_has_space(&llq, n + sync)) {
> > > > local_irq_restore(flags);
> > > > +
> > > > + if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > > + /* Device is suspended, don't wait for space */
> > > > + return 0;
> > > > +
> > >
> > > Does this mean that we still accumulate commands until the queue is
> > > full while in the suspended state, and then have the SMMU execute all
> > > of these (now irrelevant) commands when the command queue is re-enabled?
> > > One might argue that this is wasteful, but I guess it might be
> > > technically correct, because it doesn't hurt either.
> > >
> >
> > No, we drop any commands issued while the SMMU is suspended if the
> > command queue doesn't have space. There are two reasons for this:
> >
> > 1. The arm_smmu_cmdq_poll_until_not_full() call below would try to
> > perform an MMIO access (readl_relaxed(cmdq->q.cons_reg)), which
> > shouldn't be allowed when the SMMU is suspended.
> >
> > 2. We usually have 3 types of commands in the SMMUv3:
> >
> > a) Invalidation commands: We clear the entire TLB during resume, so we
> > don't really need to batch these.
> >
> > b) Prefetch commands: Any prefetches issued while the SMMU is
> > suspended are effectively no-ops, since the resume path reads the
> > latest config anyway.
> >
> > c) Page responses: These are NOT expected to occur while the SMMU is
> > suspended, and we shouldn't be batching them anyway. (With this
> > series, the page response handlers detect this situation and avoid
> > queueing commands.)
> >
> > Thus, any dropped invalidation / prefetch commands don't harm us if the
> > SMMU was suspended.
>
> Understood. My point is that we start dropping commands only when the
> command queue fills up, even though we could potentially drop commands
> even earlier.
>
Hmm.. but how early? We still need the step where the current owner
tells the next owner that it's done..
> > > > * Control dependency ordering from the entries becoming valid.
> > > > */
> > > > - writel_relaxed(prod, cmdq->q.prod_reg);
> > > > + if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > > + writel_relaxed(prod, cmdq->q.prod_reg);
> > > > +
> > > > + if (sync) {
> > > > + has_ref = true;
> > > > + } else {
> > > > + /*
> > > > + * Use release semantics to enforce ordering without a full barrier.
> > > > + * This ensures the prior writel_relaxed() is ordered/visible
> > > > + * before the refcount decrement, avoiding the heavy pipeline
> > > > + * stall of a full wmb().
> > > > + *
> > > > + * We need the atomic_dec_return_release() below; the
> > > > + * atomic_set_release() in step (e) below doesn't suffice.
> > > > + *
> > > > + * Specifically, without release semantics on the decrement,
> > > > + * the CPU is free to reorder the independent atomic_dec_relaxed()
> > > > + * before the writel_relaxed().
> > > > + *
> > > > + * If this happens, the refcount could drop to zero, allowing the PM
> > > > + * suspend path (running on another CPU) to disable the SMMU before
> > > > + * the register write completes, resulting in a bus fault.
> > > > + */
> > > > + atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > > + }
> > > > + }
> > > >
> > > > /*
> > > > * e. Tell the next owner we're done
> > > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > > >
> > > > /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > > > if (sync) {
> > > > - llq.prod = queue_inc_prod_n(&llq, n);
> > > > - ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > > - if (ret) {
> > > > - dev_err_ratelimited(smmu->dev,
> > > > - "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > > - llq.prod,
> > > > - readl_relaxed(cmdq->q.prod_reg),
> > > > - readl_relaxed(cmdq->q.cons_reg));
> > > > +
> > > > + /* If we are not the owner, check if we're suspended */
> > > > + if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > >
> > > Can there be a situation, where we execute this branch and the
> > > hardware prod pointer has not been updated, in which case
> > > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > > it times out). Imagine the following series of events:
> > > * nr_cmdq_users==0
> > > * hardware prod pointer update is skipped
> > > * nr_cmdq_users becomes 1
> > > * arm_smmu_cmdq_poll_until_sync waits, but commands are not executed
> > > by the SMMU because the hardware prod pointer was never updated
> > > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> > > stuck because the queue is full.
> > > * cmdq->q.llq.cons is never updated because the shared lock is held.
> > > queue_has_space() returns false, and
> > > arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock
> > >
> >
> > Thanks for catching this! IIUC, you mean a race between the resume and
> > this code block can cause a deadlock? Something like the following:
> >
> > CPU 0                                CPU 1
> > nr_cmdq_users == 0
> > skip hw_prod++
> >                                      ->resume cb
> >                                      nr_cmdq_users == 1
> > nr_cmdq_users == 1
> > in the if (sync) part
> > arm_smmu_cmdq_poll_until_sync()
> >                                      CMDQ_OP_CFGI_ALL stuck for space
> >
> > I guess we could add a check here: if has_ref == false but
> > nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> > need to factor out a helper function for that.. I'll think more about this..
> >
> > I guess the simplest thing would be to bail out much earlier,
> > something like: if (nr_cmdq_users == 0 && sync) return;
>
> On a related note: can you guarantee that, with your current approach,
> nr_cmdq_users drops to 1 even with a high rate of calls to
> arm_smmu_cmdq_issue_cmdlist() from multiple CPUs, or will your "/* Try
> to suspend the device, wait for in-flight submissions */" eventually
> time out?
>
> Have you considered an alternative approach where you put
>
> if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> return 0;
>
> at the beginning of arm_smmu_cmdq_issue_cmdlist() and an unconditional
> atomic_dec_return_release(&smmu->nr_cmdq_users); at the end? I think
> that would avoid this potential deadlock situation.
>
Right, that's what I meant by "bail out much earlier". I'll try to think
through whether that always works, or whether there's a situation where
it might fail.
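
For reference, that would look something like this (untested sketch;
__arm_smmu_cmdq_issue_cmdlist() is a hypothetical helper holding the
current function body, not something from this series):

	static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
					       struct arm_smmu_cmdq *cmdq,
					       u64 *cmds, int n, bool sync)
	{
		int ret;

		/* SMMU is suspended: drop the commands, don't touch MMIO */
		if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
			return 0;

		/* Existing function body, moved into a helper */
		ret = __arm_smmu_cmdq_issue_cmdlist(smmu, cmdq, cmds, n, sync);

		/*
		 * Release semantics: all queue and MMIO accesses above must
		 * be visible before the suspend path can observe the count
		 * dropping to zero and power down the SMMU.
		 */
		atomic_dec_return_release(&smmu->nr_cmdq_users);
		return ret;
	}

That way every path through the function holds a reference for its whole
duration, so we'd never skip the prod update and then wait for a sync on
it.. I still need to convince myself the suspend side drains in-flight
submissions correctly with this.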
> >
> > > I'm still trying to get my head around this algorithm, but according
> > > to my current understanding, the following rule applies:
> > >
> > > Only the cmdq owner may update the hardware prod pointer, and only
> > > after arm_smmu_cmdq_poll_valid_map() returned for the relevant range.
> > > Does the following instruction in arm_smmu_device_reset() violate this
> > > rule?
> > >
> > > writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> > >
> >
> > Not really, as we haven't enabled the command queue at that point.
> > (But it's still racy, as you've mentioned in the queue-full scenario
> > above.)
>
> You're enabling the command queue right after writing to ARM_SMMU_CMDQ_PROD
>
> /* Command queue */
> writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE);
> writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD);
> writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS);
>
> enables = CR0_CMDQEN;
> ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0,
> ARM_SMMU_CR0ACK);
>
> My point is that writing smmu->cmdq.q.llq.prod to ARM_SMMU_CMDQ_PROD
> is premature. This might lead to the SMMU reading from command slots
> that haven't been fully populated.
>
Well.. if smmu->cmdq.q.llq.cons were also updated by the "fake"
consumption logic, then it wouldn't, right?

I agree that the current implementation might have missed a case, but
otherwise this should be fine as long as both llq.prod and llq.cons are
updated in a balanced manner during the suspend.
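
Something along these lines in the suspend path, i.e. (just a sketch of
the idea, not the actual patch):

	/*
	 * "Fake" consumption: pretend the HW consumed everything that was
	 * produced, so the PROD/CONS pair restored by
	 * arm_smmu_device_reset() is balanced and the SMMU never fetches
	 * command slots that were skipped while suspended.
	 */
	smmu->cmdq.q.llq.cons = smmu->cmdq.q.llq.prod;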
> Daniel
Thanks,
Praan