From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>,
linux-kernel@vger.kernel.org,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Joerg Roedel <joro@8bytes.org>,
Mostafa Saleh <smostafa@google.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Nicolin Chen <nicolinc@nvidia.com>,
Zhang Yu <zhangyu1@linux.microsoft.com>,
Jean Philippe-Brucker <jean-philippe@linaro.org>,
Alexander Grest <Alexander.Grest@microsoft.com>
Subject: Re: [PATCH v4 1/2] iommu/arm-smmu-v3: Fix CMDQ timeout warning
Date: Mon, 1 Dec 2025 13:49:14 -0800 [thread overview]
Message-ID: <20251201134914.00003f41@linux.microsoft.com> (raw)
In-Reply-To: <4c347cb4-bb82-4209-82fc-59088cc19bc2@arm.com>
Hi Robin,
On Mon, 1 Dec 2025 19:57:31 +0000
Robin Murphy <robin.murphy@arm.com> wrote:
> On 2025-11-30 11:06 pm, Jacob Pan wrote:
> > Hi Will,
> >
> > On Tue, 25 Nov 2025 17:19:16 +0000
> > Will Deacon <will@kernel.org> wrote:
> >
> >> On Fri, Nov 14, 2025 at 09:17:17AM -0800, Jacob Pan wrote:
> >>> While polling for n spaces in the cmdq, the current code instead
> >>> checks if the queue is full. If the queue is almost full but not
> >>> enough space (<n), then the CMDQ timeout warning is never
> >>> triggered even if the polling has exceeded timeout limit.
> >>>
> >>> The existing arm_smmu_cmdq_poll_until_not_full() doesn't fit
> >>> efficiently nor ideally to the only caller
> >>> arm_smmu_cmdq_issue_cmdlist():
> >>> - It uses a new timer at every single call, which fails to limit
> >>> to the preset ARM_SMMU_POLL_TIMEOUT_US per issue.
> >>> - It has a redundant internal queue_full(), which doesn't detect
> >>> whether there is a enough space for number of n commands.
> >>>
> >>> This patch polls for the availability of exact space instead of
> >>> full and emit timeout warning accordingly.
> >>>
> >>> Fixes: 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention during
> >>> command-queue insertion") Co-developed-by: Yu Zhang
> >>> <zhangyu1@linux.microsoft.com> Signed-off-by: Yu Zhang
> >>> <zhangyu1@linux.microsoft.com> Signed-off-by: Jacob Pan
> >>> <jacob.pan@linux.microsoft.com>
> >>
> >> I'm assuming you're seeing problems with an emulated command queue?
> >> Any chance you could make that bigger?
> >>
> > This is not related to queue size, but rather a logic issue when
> > anytime queue is nearly full.
>
> It is absolutely related to queue size, because there are only two
> reasons why this warning should ever be seen:
>
> 1: The SMMU is in some unexpected error state and has stopped
> consuming commands altogether.
> 2: The queue is far too small to do its job of buffering commands for
> the number of present CPUs.
I agree that smaller queue size or slow emulation makes this problem
more visible, in this sense they are related.
> >>> @@ -804,12 +794,13 @@ int arm_smmu_cmdq_issue_cmdlist(struct
> >>> arm_smmu_device *smmu, local_irq_save(flags);
> >>> llq.val = READ_ONCE(cmdq->q.llq.val);
> >>> do {
> >>> + struct arm_smmu_queue_poll qp;
> >>> u64 old;
> >>>
> >>> + queue_poll_init(smmu, &qp);
> >>> while (!queue_has_space(&llq, n + sync)) {
> >>> local_irq_restore(flags);
> >>> - if
> >>> (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> >>> - dev_err_ratelimited(smmu->dev,
> >>> "CMDQ timeout\n");
> >>> + arm_smmu_cmdq_poll(smmu, cmdq, &llq,
> >>> &qp);
> >>
> >> Isn't this broken for wfe-based polling? The SMMU only generates
> >> the wake-up event when the queue becomes non-full.
> > I don't see this is a problem since any interrupts such as scheduler
> > tick can be a break evnt for WFE, no?
>
> No, we cannot assume that any other WFE wakeup event is *guaranteed*.
> It's certainly possible to get stuck in WFE on a NOHZ_FULL kernel with
> the arch timer event stream disabled - I recall proving that back
> when I hoped I might not have to bother upstreaming a workaround for
> MMU-600 erratum #1076982.
>
Make sense, thanks for sharing this history.
> Yes, a large part of the reason we enable the event stream by default
> is to help mitigate errata and software bugs which lead to missed
> events, but that still doesn't mean we should consciously abuse it. I
> guess something like the diff below (completely untested) would be a
> bit closer to correct (basically, allow WFE when the queue is
> completely full, but suppress it if we're then still waiting for more
> space in a non-full queue).
>
The diff below looks good to me, I can give it a try.
But at the same time, since queue full is supposed to be an exceptional
scenario, I wonder if we can just disable WFE all together in this case.
Power saving may not matter here?
> Thanks,
> Robin.
>
> ----->8-----
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index
> a0c6d87f85a1..206c6c6860dd 100644 ---
> a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -123,7 +123,7 @@
> static void parse_driver_options(struct arm_smmu_device *smmu) }
>
> /* Low-level queue manipulation functions */
> -static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
> +static int queue_space(struct arm_smmu_ll_queue *q)
> {
> u32 space, prod, cons;
>
> @@ -135,7 +135,7 @@ static bool queue_has_space(struct
> arm_smmu_ll_queue *q, u32 n) else
> space = cons - prod;
>
> - return space >= n;
> + return space;
> }
>
> static bool queue_empty(struct arm_smmu_ll_queue *q)
> @@ -796,9 +796,11 @@ int arm_smmu_cmdq_issue_cmdlist(struct
> arm_smmu_device *smmu, do {
> struct arm_smmu_queue_poll qp;
> u64 old;
> + int space;
>
> queue_poll_init(smmu, &qp);
> - while (!queue_has_space(&llq, n + sync)) {
> + while (space = queue_space(&llq), space < n + sync) {
> + qp.wfe &= !space;
> local_irq_restore(flags);
> arm_smmu_cmdq_poll(smmu, cmdq, &llq, &qp);
> local_irq_save(flags);
next prev parent reply other threads:[~2025-12-01 21:49 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-14 17:17 [PATCH v4 0/2] SMMU v3 CMDQ fix and improvement Jacob Pan
2025-11-14 17:17 ` [PATCH v4 1/2] iommu/arm-smmu-v3: Fix CMDQ timeout warning Jacob Pan
2025-11-14 18:29 ` Nicolin Chen
2025-11-25 17:19 ` Will Deacon
2025-11-30 23:06 ` Jacob Pan
2025-12-01 19:57 ` Robin Murphy
2025-12-01 21:49 ` Jacob Pan [this message]
2025-12-01 17:42 ` Jacob Pan
2025-11-14 17:17 ` [PATCH v4 2/2] iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency Jacob Pan
2025-11-25 17:18 ` Will Deacon
2025-11-30 22:52 ` Jacob Pan
2025-12-10 3:11 ` Will Deacon
2025-11-20 17:10 ` [PATCH v4 0/2] SMMU v3 CMDQ fix and improvement Jacob Pan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251201134914.00003f41@linux.microsoft.com \
--to=jacob.pan@linux.microsoft.com \
--cc=Alexander.Grest@microsoft.com \
--cc=iommu@lists.linux.dev \
--cc=jean-philippe@linaro.org \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=nicolinc@nvidia.com \
--cc=robin.murphy@arm.com \
--cc=smostafa@google.com \
--cc=will@kernel.org \
--cc=zhangyu1@linux.microsoft.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox