public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jacob Pan <jacob.pan@linux.microsoft.com>
To: Robin Murphy <robin.murphy@arm.com>
Cc: Will Deacon <will@kernel.org>,
	linux-kernel@vger.kernel.org,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	Joerg Roedel <joro@8bytes.org>,
	Mostafa Saleh <smostafa@google.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Nicolin Chen <nicolinc@nvidia.com>,
	Zhang Yu <zhangyu1@linux.microsoft.com>,
	Jean Philippe-Brucker <jean-philippe@linaro.org>,
	Alexander Grest <Alexander.Grest@microsoft.com>
Subject: Re: [PATCH v4 1/2] iommu/arm-smmu-v3: Fix CMDQ timeout warning
Date: Mon, 1 Dec 2025 13:49:14 -0800	[thread overview]
Message-ID: <20251201134914.00003f41@linux.microsoft.com> (raw)
In-Reply-To: <4c347cb4-bb82-4209-82fc-59088cc19bc2@arm.com>

Hi Robin,

On Mon, 1 Dec 2025 19:57:31 +0000
Robin Murphy <robin.murphy@arm.com> wrote:

> On 2025-11-30 11:06 pm, Jacob Pan wrote:
> > Hi Will,
> > 
> > On Tue, 25 Nov 2025 17:19:16 +0000
> > Will Deacon <will@kernel.org> wrote:
> >   
> >> On Fri, Nov 14, 2025 at 09:17:17AM -0800, Jacob Pan wrote:  
> >>> While polling for n spaces in the cmdq, the current code instead
> >>> checks if the queue is full. If the queue is almost full but not
> >>> enough space (<n), then the CMDQ timeout warning is never
> >>> triggered even if the polling has exceeded timeout limit.
> >>>
> >>> The existing arm_smmu_cmdq_poll_until_not_full() doesn't fit
> >>> efficiently nor ideally to the only caller
> >>> arm_smmu_cmdq_issue_cmdlist():
> >>>   - It uses a new timer at every single call, which fails to limit
> >>> to the preset ARM_SMMU_POLL_TIMEOUT_US per issue.
> >>> - It has a redundant internal queue_full(), which doesn't detect
> >>> whether there is a enough space for number of n commands.
> >>>
> >>> This patch polls for the availability of exact space instead of
> >>> full and emit timeout warning accordingly.
> >>>
> >>> Fixes: 587e6c10a7ce ("iommu/arm-smmu-v3: Reduce contention during
> >>> command-queue insertion") Co-developed-by: Yu Zhang
> >>> <zhangyu1@linux.microsoft.com> Signed-off-by: Yu Zhang
> >>> <zhangyu1@linux.microsoft.com> Signed-off-by: Jacob Pan
> >>> <jacob.pan@linux.microsoft.com>  
> >>
> >> I'm assuming you're seeing problems with an emulated command queue?
> >> Any chance you could make that bigger?
> >>  
> > This is not related to queue size, but rather a logic issue when
> > anytime queue is nearly full.  
> 
> It is absolutely related to queue size, because there are only two
> reasons why this warning should ever be seen:
> 
> 1: The SMMU is in some unexpected error state and has stopped
> consuming commands altogether.
> 2: The queue is far too small to do its job of buffering commands for
> the number of present CPUs.
I agree that smaller queue size or slow emulation makes this problem
more visible, in this sense they are related.

> >>> @@ -804,12 +794,13 @@ int arm_smmu_cmdq_issue_cmdlist(struct
> >>> arm_smmu_device *smmu, local_irq_save(flags);
> >>>   	llq.val = READ_ONCE(cmdq->q.llq.val);
> >>>   	do {
> >>> +		struct arm_smmu_queue_poll qp;
> >>>   		u64 old;
> >>>   
> >>> +		queue_poll_init(smmu, &qp);
> >>>   		while (!queue_has_space(&llq, n + sync)) {
> >>>   			local_irq_restore(flags);
> >>> -			if
> >>> (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> >>> -				dev_err_ratelimited(smmu->dev,
> >>> "CMDQ timeout\n");
> >>> +			arm_smmu_cmdq_poll(smmu, cmdq, &llq,
> >>> &qp); 
> >>
> >> Isn't this broken for wfe-based polling? The SMMU only generates
> >> the wake-up event when the queue becomes non-full.  
> > I don't see this is a problem since any interrupts such as scheduler
> > tick can be a break evnt for WFE, no?  
> 
> No, we cannot assume that any other WFE wakeup event is *guaranteed*.
> It's certainly possible to get stuck in WFE on a NOHZ_FULL kernel with
> the arch timer event stream disabled - I recall proving that back
> when I hoped I might not have to bother upstreaming a workaround for
> MMU-600 erratum #1076982.
> 
Make sense, thanks for sharing this history.

> Yes, a large part of the reason we enable the event stream by default
> is to help mitigate errata and software bugs which lead to missed
> events, but that still doesn't mean we should consciously abuse it. I
> guess something like the diff below (completely untested) would be a
> bit closer to correct (basically, allow WFE when the queue is
> completely full, but suppress it if we're then still waiting for more
> space in a non-full queue).
> 
The diff below looks good to me, I can give it a try.

But at the same time, since queue full is supposed to be an exceptional
scenario, I wonder if we can just disable WFE all together in this case.
Power saving may not matter here?

> Thanks,
> Robin.
> 
> ----->8-----  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index
> a0c6d87f85a1..206c6c6860dd 100644 ---
> a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -123,7 +123,7 @@
> static void parse_driver_options(struct arm_smmu_device *smmu) }
>   
>   /* Low-level queue manipulation functions */
> -static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n)
> +static int queue_space(struct arm_smmu_ll_queue *q)
>   {
>   	u32 space, prod, cons;
>   
> @@ -135,7 +135,7 @@ static bool queue_has_space(struct
> arm_smmu_ll_queue *q, u32 n) else
>   		space = cons - prod;
>   
> -	return space >= n;
> +	return space;
>   }
>   
>   static bool queue_empty(struct arm_smmu_ll_queue *q)
> @@ -796,9 +796,11 @@ int arm_smmu_cmdq_issue_cmdlist(struct
> arm_smmu_device *smmu, do {
>   		struct arm_smmu_queue_poll qp;
>   		u64 old;
> +		int space;
>   
>   		queue_poll_init(smmu, &qp);
> -		while (!queue_has_space(&llq, n + sync)) {
> +		while (space = queue_space(&llq), space < n + sync) {
> +			qp.wfe &= !space;
>   			local_irq_restore(flags);
>   			arm_smmu_cmdq_poll(smmu, cmdq, &llq, &qp);
>   			local_irq_save(flags);


  reply	other threads:[~2025-12-01 21:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-14 17:17 [PATCH v4 0/2] SMMU v3 CMDQ fix and improvement Jacob Pan
2025-11-14 17:17 ` [PATCH v4 1/2] iommu/arm-smmu-v3: Fix CMDQ timeout warning Jacob Pan
2025-11-14 18:29   ` Nicolin Chen
2025-11-25 17:19   ` Will Deacon
2025-11-30 23:06     ` Jacob Pan
2025-12-01 19:57       ` Robin Murphy
2025-12-01 21:49         ` Jacob Pan [this message]
2025-12-01 17:42     ` Jacob Pan
2025-11-14 17:17 ` [PATCH v4 2/2] iommu/arm-smmu-v3: Improve CMDQ lock fairness and efficiency Jacob Pan
2025-11-25 17:18   ` Will Deacon
2025-11-30 22:52     ` Jacob Pan
2025-12-10  3:11       ` Will Deacon
2025-11-20 17:10 ` [PATCH v4 0/2] SMMU v3 CMDQ fix and improvement Jacob Pan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251201134914.00003f41@linux.microsoft.com \
    --to=jacob.pan@linux.microsoft.com \
    --cc=Alexander.Grest@microsoft.com \
    --cc=iommu@lists.linux.dev \
    --cc=jean-philippe@linaro.org \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nicolinc@nvidia.com \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=will@kernel.org \
    --cc=zhangyu1@linux.microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox