Re: [PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
To: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	sunil.kovvuri-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: [PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively
Date: Thu, 30 Mar 2017 15:42:19 +0100	[thread overview]
Message-ID: <20170330144219.GH22160@arm.com> (raw)
In-Reply-To: <67da197b56bb8763623fc215176b86ab04b1799e.1490291854.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>

On Thu, Mar 23, 2017 at 05:59:40PM +0000, Robin Murphy wrote:
> On relatively slow development platforms and software models, the
> inefficiency of our TLB sync loop tends not to show up - for instance on
> a Juno r1 board I typically see the TLBI has completed of its own accord
> by the time we get to the sync, such that the latter finishes instantly.
> 
> However, on larger systems doing real I/O, it's less realistic for the
> TLBs to go idle immediately, and at that point falling into the 1MHz
> polling loop turns out to throw away performance drastically. Let's
> strike a balance by polling more than once between pauses, such that we
> have much more chance of catching normal operations completing before
> committing to the fixed delay, but also backing off exponentially, since
> if a sync really hasn't completed within one or two "reasonable time"
> periods, it becomes increasingly unlikely that it ever will.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

Thanks, I like this patch.

> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index f7411109670f..aa17f3d937a0 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -162,6 +162,7 @@
>  #define ARM_SMMU_GR0_sTLBGSTATUS	0x74
>  #define sTLBGSTATUS_GSACTIVE		(1 << 0)
>  #define TLB_LOOP_TIMEOUT		1000000	/* 1s! */
> +#define TLB_SPIN_COUNT			10
>  
>  /* Stream mapping registers */
>  #define ARM_SMMU_GR0_SMR(n)		(0x800 + ((n) << 2))
> @@ -574,18 +575,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>  				void __iomem *sync, void __iomem *status)
>  {
> -	int count = 0;
> +	unsigned int spin_count, delay;
>  
>  	writel_relaxed(0, sync);
> -	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
> -		cpu_relax();
> -		if (++count == TLB_LOOP_TIMEOUT) {
> -			dev_err_ratelimited(smmu->dev,
> -			"TLB sync timed out -- SMMU may be deadlocked\n");
> -			return;
> -		}
> -		udelay(1);
> +	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> +		for (spin_count = TLB_SPIN_COUNT; spin_count > 0; spin_count--)
> +			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> +				return;

Can you keep the cpu_relax in the inner loop please?

> +		udelay(delay);
>  	}
> +	dev_err_ratelimited(smmu->dev,
> +			    "TLB sync timed out -- SMMU may be deadlocked\n");

Whilst we can have WFE-based spinning with SMMUv3, I suppose we should
do something similar in queue_poll_cons... Fancy taking a look?

Will

WARNING: multiple messages have this Message-ID (diff)

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively
Date: Thu, 30 Mar 2017 15:42:19 +0100	[thread overview]
Message-ID: <20170330144219.GH22160@arm.com> (raw)
In-Reply-To: <67da197b56bb8763623fc215176b86ab04b1799e.1490291854.git.robin.murphy@arm.com>

On Thu, Mar 23, 2017 at 05:59:40PM +0000, Robin Murphy wrote:
> On relatively slow development platforms and software models, the
> inefficiency of our TLB sync loop tends not to show up - for instance on
> a Juno r1 board I typically see the TLBI has completed of its own accord
> by the time we get to the sync, such that the latter finishes instantly.
> 
> However, on larger systems doing real I/O, it's less realistic for the
> TLBs to go idle immediately, and at that point falling into the 1MHz
> polling loop turns out to throw away performance drastically. Let's
> strike a balance by polling more than once between pauses, such that we
> have much more chance of catching normal operations completing before
> committing to the fixed delay, but also backing off exponentially, since
> if a sync really hasn't completed within one or two "reasonable time"
> periods, it becomes increasingly unlikely that it ever will.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)

Thanks, I like this patch.

> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index f7411109670f..aa17f3d937a0 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -162,6 +162,7 @@
>  #define ARM_SMMU_GR0_sTLBGSTATUS	0x74
>  #define sTLBGSTATUS_GSACTIVE		(1 << 0)
>  #define TLB_LOOP_TIMEOUT		1000000	/* 1s! */
> +#define TLB_SPIN_COUNT			10
>  
>  /* Stream mapping registers */
>  #define ARM_SMMU_GR0_SMR(n)		(0x800 + ((n) << 2))
> @@ -574,18 +575,17 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
>  				void __iomem *sync, void __iomem *status)
>  {
> -	int count = 0;
> +	unsigned int spin_count, delay;
>  
>  	writel_relaxed(0, sync);
> -	while (readl_relaxed(status) & sTLBGSTATUS_GSACTIVE) {
> -		cpu_relax();
> -		if (++count == TLB_LOOP_TIMEOUT) {
> -			dev_err_ratelimited(smmu->dev,
> -			"TLB sync timed out -- SMMU may be deadlocked\n");
> -			return;
> -		}
> -		udelay(1);
> +	for (delay = 1; delay < TLB_LOOP_TIMEOUT; delay *= 2) {
> +		for (spin_count = TLB_SPIN_COUNT; spin_count > 0; spin_count--)
> +			if (!(readl_relaxed(status) & sTLBGSTATUS_GSACTIVE))
> +				return;

Can you keep the cpu_relax in the inner loop please?

> +		udelay(delay);
>  	}
> +	dev_err_ratelimited(smmu->dev,
> +			    "TLB sync timed out -- SMMU may be deadlocked\n");

Whilst we can have WFE-based spinning with SMMUv3, I suppose we should
do something similar in queue_poll_cons... Fancy taking a look?

Will

next prev parent reply	other threads:[~2017-03-30 14:42 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-07 18:09 [PATCH 0/4] ARM SMMU per-context TLB sync Robin Murphy
2017-03-07 18:09 ` Robin Murphy
     [not found] ` <cover.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-07 18:09   ` [PATCH 1/4] iommu/arm-smmu: Handle size mismatches better Robin Murphy
2017-03-07 18:09     ` Robin Murphy
     [not found]     ` <6e61306e6823943b3eeb0c405d4505dd565e723e.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-30 14:09       ` Will Deacon
2017-03-30 14:09         ` Will Deacon
2017-03-07 18:09   ` [PATCH 2/4] iommu/arm-smmu: Simplify ASID/VMID handling Robin Murphy
2017-03-07 18:09     ` Robin Murphy
2017-03-07 18:09   ` [PATCH 3/4] iommu/arm-smmu: Tidy up context bank indexing Robin Murphy
2017-03-07 18:09     ` Robin Murphy
2017-03-07 18:09   ` [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate Robin Murphy
2017-03-07 18:09     ` Robin Murphy
     [not found]     ` <03ef837586cd991f2d8431908230fd3a3dd8c9cf.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-30 14:37       ` Will Deacon
2017-03-30 14:37         ` Will Deacon
     [not found]         ` <20170330143744.GG22160-5wv7dgnIgG8@public.gmane.org>
2017-03-30 15:48           ` Robin Murphy
2017-03-30 15:48             ` Robin Murphy
2017-03-23 17:59   ` [PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively Robin Murphy
2017-03-23 17:59     ` Robin Murphy
     [not found]     ` <67da197b56bb8763623fc215176b86ab04b1799e.1490291854.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-27 10:38       ` Sunil Kovvuri
2017-03-27 10:38         ` Sunil Kovvuri
2017-03-30 14:42       ` Will Deacon [this message]
2017-03-30 14:42         ` Will Deacon
2017-03-30 14:43   ` [PATCH 0/4] ARM SMMU per-context TLB sync Will Deacon
2017-03-30 14:43     ` Will Deacon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170330144219.GH22160@arm.com \
    --to=will.deacon-5wv7dgnigg8@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=robin.murphy-5wv7dgnIgG8@public.gmane.org \
    --cc=sunil.kovvuri-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.