From: Pranjal Shrivastava <praan@google.com>
To: Daniel Mentz <danielmentz@google.com>
Cc: iommu@lists.linux.dev, Will Deacon <will@kernel.org>,
Joerg Roedel <joro@8bytes.org>,
Robin Murphy <robin.murphy@arm.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Mostafa Saleh <smostafa@google.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Ashish Mhetre <amhetre@nvidia.com>,
Sairaj Kodilkar <sarunkod@amd.com>
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Date: Mon, 16 Mar 2026 15:36:10 +0000 [thread overview]
Message-ID: <abgjagJWs19zqL3J@google.com> (raw)
In-Reply-To: <CAE2F3rDh4c4KzEqapu-qQw9MWZ3gqPkpawpnToEmnwMXZn1Qpw@mail.gmail.com>
On Wed, Mar 11, 2026 at 03:56:55PM -0700, Daniel Mentz wrote:
> On Wed, Mar 11, 2026 at 6:53 AM Pranjal Shrivastava <praan@google.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 09:43:05PM -0700, Daniel Mentz wrote:
> > > Hi Pranjal,
> > >
> > > I have thought about this further and would like to propose an
> > > alternative design.
> > >
> > > A key difference in this design is that no commands are written to the
> > > command queue, and the producer index is not updated while the SMMU is
> > > suspended.
> > >
> > > Similar to the existing CMDQ_PROD_OWNED_FLAG, we could define a new
> > > flag, CMDQ_PROD_STOP_FLAG, stored in cmdq->q.llq.prod.
> > >
> > > In arm_smmu_cmdq_issue_cmdlist():
> > > Perform "old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val);".
> > > If CMDQ_PROD_STOP_FLAG is set in the variable named old, return from
> > > arm_smmu_cmdq_issue_cmdlist() immediately.
> > >
> > > In the Suspend handler:
> > >
> > > 1. Clear SMMU_CR0.SMMUEN.
> > > 2. Set CMDQ_PROD_STOP_FLAG in cmdq->q.llq.prod.
> > > 3. Wait until CMDQ_PROD_OWNED_FLAG is cleared from cmdq->q.llq.prod.
> > > 4. Wait until cmdq->owner_prod matches cmdq->q.llq.prod (excluding the
> > > CMDQ_PROD_STOP_FLAG).
> > > 5. Wait until cmdq->lock is 0.
> > > 6. Clear SMMU_CR0.
> > >
> > > In the Resume handler:
> > > Follow the same logic as your patch, but clear CMDQ_PROD_STOP_FLAG in
> > > cmdq->q.llq.prod instead of setting nr_cmdq_users to 1.
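
To make sure I'm reading the suspend sequence right, here's a quick
user-space model of steps 1-6 (purely illustrative: the flag bit
positions, the struct, and model_suspend() are made up here, and steps
1/6 touch hardware so they're only comments):

```c
/* User-space model of the proposed suspend sequence (steps 1-6 above).
 * Flag bit positions and names are illustrative, not the driver's. */
#include <stdatomic.h>
#include <stdint.h>

#define CMDQ_PROD_OWNED_FLAG (1u << 31)
#define CMDQ_PROD_STOP_FLAG  (1u << 30)
#define PROD_FLAGS           (CMDQ_PROD_OWNED_FLAG | CMDQ_PROD_STOP_FLAG)

struct cmdq_model {
	_Atomic uint32_t prod;       /* stands in for cmdq->q.llq.prod */
	_Atomic uint32_t owner_prod; /* stands in for cmdq->owner_prod */
	_Atomic int lock;            /* stands in for cmdq->lock */
};

static void model_suspend(struct cmdq_model *q)
{
	/* 1. SMMU_CR0.SMMUEN would be cleared here (not modeled). */

	/* 2. Publish the stop flag so no new batch commits. */
	atomic_fetch_or(&q->prod, CMDQ_PROD_STOP_FLAG);

	/* 3. Wait for the current owner to finish its batch. */
	while (atomic_load(&q->prod) & CMDQ_PROD_OWNED_FLAG)
		;

	/* 4. Wait until every committed command has been handed to HW. */
	while (atomic_load(&q->owner_prod) !=
	       (atomic_load(&q->prod) & ~PROD_FLAGS))
		;

	/* 5. Wait for in-flight shared-lock holders to drop cmdq->lock. */
	while (atomic_load(&q->lock) != 0)
		;

	/* 6. SMMU_CR0 would be cleared here (not modeled). */
}
```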
> > >
> > > One benefit of this proposal is that it avoids adding a new atomic
> > > variable that multiple CPU threads might compete for. Instead, we
> > > reuse existing atomic operations, which should reduce the performance
> > > impact for server use cases where many CPU threads compete for command
> > > queue access.
> > >
> >
> > I like the idea/direction! That said, there could be a few (solvable)
> > challenges, for example:
> >
> > 1. Abandoned Command Batch
> > In the SMMUv3 driver, cmdq->q.llq.val is the shared global state in
> > RAM. The cmpxchg is the "Point of Commitment." Consider the following
> > sequence of events:
> >
> > a. Assume STOP_FLAG is already set. Global prod is 10 | STOP_FLAG.
> > b. CPU-A calls issue_cmdlist. It reads the global state:
> > llq.val = 10 | STOP_FLAG.
> > c. CPU-A calculates its new target:
> > head.prod = 14 | STOP_FLAG | OWNED_FLAG.
>
> head.prod is calculated like so, and queue_inc_prod_n() will clear the
> CMDQ_PROD_STOP_FLAG.
>
> head.prod = queue_inc_prod_n(&llq, n + sync) |
> CMDQ_PROD_OWNED_FLAG;
>
> You're bringing up a good point, though. This would clear the
> CMDQ_PROD_STOP_FLAG which we don't want.
>
> > d. CPU-A executes: old = cmpxchg(&cmdq->q.llq.val, llq.val, head.val)
> > The swap succeeds because the global state was exactly llq.val.
> > Global prod is now 14. We have just told the entire system that
> > there are 14 commands.
> > e. CPU-A now checks the old variable. It sees STOP_FLAG is set and
> > returns immediately.
> >
> > CPU-A has "reserved" the indices 10, 11, 12, and 13 in the global
> > producer pointer, but it never wrote the commands to those memory slots.
> > Indices 10–13 now contain garbage/stale data. When the SMMU resumes and
> > processes up to index 14, it will execute those garbage commands,
> > leading to a fault (CERROR_ILL).
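
To illustrate the hole concretely, here's a tiny user-space model of the
check-after-cmpxchg ordering (flag bits, the queue depth, and the
function name are all invented for the demo):

```c
/* Model of the "abandoned batch" hole: the STOP check happens only
 * *after* the cmpxchg has already published a new prod index, so the
 * reserved slots are never written. All values/bits are illustrative. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CMDQ_PROD_STOP_FLAG (1u << 30)
#define PROD_IDX_MASK       0x0000ffffu
#define QUEUE_DEPTH         64

static _Atomic uint32_t llq_prod;      /* stands in for cmdq->q.llq.prod */
static bool slot_written[QUEUE_DEPTH]; /* did anyone fill this slot? */

/* Check-after-commit ordering from the original proposal. */
static bool issue_check_after(uint32_t n)
{
	uint32_t old = atomic_load(&llq_prod);
	uint32_t head;

	do {
		head = (old & ~PROD_IDX_MASK) |
		       (((old & PROD_IDX_MASK) + n) & PROD_IDX_MASK);
	} while (!atomic_compare_exchange_weak(&llq_prod, &old, head));

	if (old & CMDQ_PROD_STOP_FLAG)
		return false; /* bails out, but prod already moved! */

	for (uint32_t i = 0; i < n; i++)
		slot_written[((old & PROD_IDX_MASK) + i) % QUEUE_DEPTH] = true;
	return true;
}
```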
> >
> > 2. The OWNED_FLAG Deadlock
> >
> > a. The CPU that finds OWNED_FLAG clear in the cmpxchg becomes the owner.
> >    Its job is to gather all concurrent work and update the hardware.
> > b. Any subsequent CPU that finds OWNED_FLAG set in the cmpxchg writes
> >    its commands to RAM and then just waits, relying on the owner to
> >    "kick" the HW for it and to clear the flag for the next batch.
> >
> > The Deadlock Scenario:
> > a. Suspend Handler: Sets the STOP_FLAG in the global cmdq->q.llq.val.
> > b. CPU-A (The Owner): Executes cmpxchg. It moves the global prod from
> > 10 to 14. Because it was the first in this batch, it does not set
> > the OWNED_FLAG.
>
> Everybody sets the OWNED_FLAG in
>
> head.prod = queue_inc_prod_n(&llq, n + sync) |
> CMDQ_PROD_OWNED_FLAG;
>
> The OWNED_FLAG is always set when any CPU breaks out of that first
> do/while loop in arm_smmu_cmdq_issue_cmdlist.
>
> > c. CPU-B (The Follower): Executes cmpxchg. It moves the global prod
> > from 14 to 18. Because it is joining CPU-A's batch, its cmpxchg
> > sets the OWNED_FLAG in the global state.
> > d. Now, CPU-A checks the old value, sees STOP_FLAG, and returns.
> > CPU-B checks the old value, sees STOP_FLAG, and returns.
> > e. The Consequence: The global state cmdq->q.llq.prod now has the
> > OWNED_FLAG set.
> > f. Suspend Handler: waits until CMDQ_PROD_OWNED_FLAG is cleared.
> > g. Since only the owner (CPU-A) would clear that flag (it does so at
> >    the very end of the function), and CPU-A has already returned, the
> >    suspend handler will spin on this flag forever.
> >
> > This can be solved by checking the flag before setting the OWNED_FLAG.
>
> Good point. We should check llq.prod for CMDQ_PROD_STOP_FLAG at the
> beginning of the do/while loop in arm_smmu_cmdq_issue_cmdlist and
> return early if it's set. The function arm_smmu_cmdq_issue_cmdlist
> should never clear CMDQ_PROD_STOP_FLAG.
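
Right, something along these lines (again a user-space model, not driver
code; the flag bits and the helper name are made up, and the real code
would still need the queue-wrap handling that queue_inc_prod_n() does):

```c
/* Model of the fixed gate: STOP is checked *before* the cmpxchg
 * commits, so nothing is ever reserved once a suspend is pending,
 * and the flag is never cleared by the issue path. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CMDQ_PROD_OWNED_FLAG (1u << 31)
#define CMDQ_PROD_STOP_FLAG  (1u << 30)
#define PROD_IDX_MASK        0x0000ffffu

static _Atomic uint32_t llq_prod; /* stands in for cmdq->q.llq.prod */

static bool issue_check_before(uint32_t n)
{
	uint32_t old = atomic_load(&llq_prod);
	uint32_t head;

	do {
		if (old & CMDQ_PROD_STOP_FLAG)
			return false; /* suspend pending: nothing committed */

		/* Advance the index; every committer sets OWNED_FLAG. */
		head = (((old & PROD_IDX_MASK) + n) & PROD_IDX_MASK) |
		       CMDQ_PROD_OWNED_FLAG;
	} while (!atomic_compare_exchange_weak(&llq_prod, &old, head));

	return true;
}
```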
>
> > While the atomic counter (nr_cmdq_users) used in v5 adds an atomic
> > operation, I believe it provides a cleaner "gate" that prevents CPUs
> > from ever entering the "commitment" phase once a suspend is imminent
> > (with fixes).
> >
> > However, I agree that the atomic counter can cause perf drops on servers.
>
> I think another benefit of my approach is that it avoids the problem
> where a high rate of calls to arm_smmu_cmdq_issue_cmdlist can starve
> the suspend handler. I'm afraid that with the nr_cmdq_users approach,
> we might end up in situations where the suspend handler times out
> because nr_cmdq_users never drops to 1.
>
I agree.
> > I'm okay to adopt Daniel's direction (it only impacts 2 patches) & post
> > a v6. However, given we've been working through this for a year now, I'd
> > want to make sure we have a solid consensus on the design from everyone.
> > I felt with v4 that we have consensus on the design but it still seems
> > to be evolving. I’m ready to post v6 once we settle on the path forward.
>
> I think the fastest way forward is to post an updated patch right
> away. We can then use that as a baseline for discussions.
Agreed. I'll think through this CMDQ_STOP_FLAG approach (addressing the
issues I've discussed above) and post a new version soon.
Thanks,
Praan
2026-01-26 15:11 [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 01/10] iommu/arm-smmu-v3: Refactor arm_smmu_setup_irqs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 02/10] iommu/arm-smmu-v3: Add a helper to drain cmd queues Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 03/10] iommu/tegra241-cmdqv: Add a helper to drain VCMDQs Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 04/10] iommu/tegra241-cmdqv: Restore PROD and CONS after resume Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 05/10] iommu/arm-smmu-v3: Cache and restore MSI config Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq Pranjal Shrivastava
2026-03-08 21:23 ` Daniel Mentz
2026-03-09 19:46 ` Pranjal Shrivastava
2026-03-09 21:13 ` Pranjal Shrivastava
2026-03-09 22:56 ` Daniel Mentz
2026-03-10 16:46 ` Pranjal Shrivastava
2026-03-09 22:41 ` Daniel Mentz
2026-03-10 16:54 ` Pranjal Shrivastava
2026-03-11 4:43 ` Daniel Mentz
2026-03-11 13:53 ` Pranjal Shrivastava
2026-03-11 22:56 ` Daniel Mentz
2026-03-16 15:36 ` Pranjal Shrivastava [this message]
2026-01-26 15:11 ` [PATCH v5 07/10] iommu/arm-smmu-v3: Implement pm_runtime & system sleep ops Pranjal Shrivastava
2026-03-10 1:45 ` Daniel Mentz
2026-03-10 16:58 ` Pranjal Shrivastava
2026-03-10 21:16 ` Daniel Mentz
2026-03-10 21:40 ` Pranjal Shrivastava
2026-03-11 0:12 ` Daniel Mentz
2026-03-11 5:31 ` Pranjal Shrivastava
2026-03-11 17:26 ` Daniel Mentz
2026-03-16 15:05 ` Pranjal Shrivastava
2026-03-12 19:20 ` Daniel Mentz
2026-01-26 15:11 ` [PATCH v5 08/10] iommu/arm-smmu-v3: Handle gerror during suspend Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 09/10] iommu/arm-smmu-v3: Enable pm_runtime and setup devlinks Pranjal Shrivastava
2026-01-26 15:11 ` [PATCH v5 10/10] iommu/arm-smmu-v3: Invoke pm_runtime before hw access Pranjal Shrivastava
2026-02-06 15:48 ` [PATCH v5 00/10] iommu/arm-smmu-v3: Implement Runtime/System Sleep ops Ashish Mhetre
2026-02-11 13:33 ` Pranjal Shrivastava