Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
To: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
Subject: Re: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
Date: Thu, 30 Mar 2017 15:37:44 +0100	[thread overview]
Message-ID: <20170330143744.GG22160@arm.com> (raw)
In-Reply-To: <03ef837586cd991f2d8431908230fd3a3dd8c9cf.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>

Hi Robin,

This mostly looks great, but I have a couple of minor comments below.

On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
> TLB synchronisation typically involves the SMMU blocking all incoming
> transactions until the TLBs report completion of all outstanding
> operations. In the common SMMUv2 configuration of a single distributed
> SMMU serving multiple peripherals, that means that a single unmap
> request has the potential to bring the hammer down on the entire system
> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
> under SMMUv2, offer local sync operations, let's make use of those
> wherever we can in the hope of minimising global disruption.
> 
> To that end, rather than add any more branches to the already unwieldy
> monolithic TLB maintenance ops, break them up into smaller, neater,
> functions which we can then mix and match as appropriate.
> 
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 100 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8aafe304171..f7411109670f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> +				void __iomem *sync, void __iomem *status)

Given that you take the arm_smmu_device anyway, I think I'd prefer just
passing the offsets for sync and status and avoiding the additions
in the caller (a bit like your other patch in this series ;).

>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> -	void __iomem *reg;
> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
> +	size_t step;
>  
> -	if (stage1) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
> -
> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> -			iova &= ~12UL;
> -			iova |= cfg->asid;
> -			do {
> -				writel_relaxed(iova, reg);
> -				iova += granule;
> -			} while (size -= granule);
> -		} else {
> -			iova >>= 12;
> -			iova |= (u64)cfg->asid << 48;
> -			do {
> -				writeq_relaxed(iova, reg);
> -				iova += granule >> 12;
> -			} while (size -= granule);
> -		}
> -	} else if (smmu->version == ARM_SMMU_V2) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> +	if (stage1)
> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
> +			      ARM_SMMU_CB_S1_TLBIVA;
> +	else
>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
> -		iova >>= 12;
> -		do {
> -			smmu_write_atomic_lq(iova, reg);
> -			iova += granule >> 12;
> -		} while (size -= granule);
> +
> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> +		iova &= ~12UL;
> +		iova |= cfg->asid;
> +		step = granule;
>  	} else {
> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
> -		writel_relaxed(cfg->vmid, reg);
> +		iova >>= 12;
> +		step = granule >> 12;
> +		if (stage1)
> +			iova |= (u64)cfg->asid << 48;
>  	}
> +
> +	do {
> +		smmu_write_atomic_lq(iova, reg);
> +		iova += step;
> +	} while (size -= granule);

There seems to be a lot of refactoring going on here, but I'm not entirely
comfortable with the unconditional move to smmu_write_atomic_lq. Given the
way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
else clause in the current code and you're done.

Will

WARNING: multiple messages have this Message-ID (diff)

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate
Date: Thu, 30 Mar 2017 15:37:44 +0100	[thread overview]
Message-ID: <20170330143744.GG22160@arm.com> (raw)
In-Reply-To: <03ef837586cd991f2d8431908230fd3a3dd8c9cf.1488907474.git.robin.murphy@arm.com>

Hi Robin,

This mostly looks great, but I have a couple of minor comments below.

On Tue, Mar 07, 2017 at 06:09:07PM +0000, Robin Murphy wrote:
> TLB synchronisation typically involves the SMMU blocking all incoming
> transactions until the TLBs report completion of all outstanding
> operations. In the common SMMUv2 configuration of a single distributed
> SMMU serving multiple peripherals, that means that a single unmap
> request has the potential to bring the hammer down on the entire system
> if synchronised globally. Since stage 1 contexts, and stage 2 contexts
> under SMMUv2, offer local sync operations, let's make use of those
> wherever we can in the hope of minimising global disruption.
> 
> To that end, rather than add any more branches to the already unwieldy
> monolithic TLB maintenance ops, break them up into smaller, neater,
> functions which we can then mix and match as appropriate.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu.c | 156 ++++++++++++++++++++++++++++++-----------------
>  1 file changed, 100 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index c8aafe304171..f7411109670f 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -237,6 +237,8 @@ enum arm_smmu_s2cr_privcfg {
>  #define ARM_SMMU_CB_S1_TLBIVAL		0x620
>  #define ARM_SMMU_CB_S2_TLBIIPAS2	0x630
>  #define ARM_SMMU_CB_S2_TLBIIPAS2L	0x638
> +#define ARM_SMMU_CB_TLBSYNC		0x7f0
> +#define ARM_SMMU_CB_TLBSTATUS		0x7f4
>  #define ARM_SMMU_CB_ATS1PR		0x800
>  #define ARM_SMMU_CB_ATSR		0x8f0
>  
> @@ -569,14 +571,13 @@ static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
>  }
>  
>  /* Wait for any pending TLB invalidations to complete */
> -static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +static void __arm_smmu_tlb_sync(struct arm_smmu_device *smmu,
> +				void __iomem *sync, void __iomem *status)

Given that you take the arm_smmu_device anyway, I think I'd prefer just
passing the offsets for sync and status and avoiding the additions
in the caller (a bit like your other patch in this series ;).

>  static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
> @@ -617,48 +638,66 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  {
>  	struct arm_smmu_domain *smmu_domain = cookie;
>  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> -	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	bool stage1 = cfg->cbar != CBAR_TYPE_S2_TRANS;
> -	void __iomem *reg;
> +	void __iomem *reg = ARM_SMMU_CB(smmu_domain->smmu, cfg->cbndx);
> +	size_t step;
>  
> -	if (stage1) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> -		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
> -
> -		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> -			iova &= ~12UL;
> -			iova |= cfg->asid;
> -			do {
> -				writel_relaxed(iova, reg);
> -				iova += granule;
> -			} while (size -= granule);
> -		} else {
> -			iova >>= 12;
> -			iova |= (u64)cfg->asid << 48;
> -			do {
> -				writeq_relaxed(iova, reg);
> -				iova += granule >> 12;
> -			} while (size -= granule);
> -		}
> -	} else if (smmu->version == ARM_SMMU_V2) {
> -		reg = ARM_SMMU_CB(smmu, cfg->cbndx);
> +	if (stage1)
> +		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL :
> +			      ARM_SMMU_CB_S1_TLBIVA;
> +	else
>  		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
>  			      ARM_SMMU_CB_S2_TLBIIPAS2;
> -		iova >>= 12;
> -		do {
> -			smmu_write_atomic_lq(iova, reg);
> -			iova += granule >> 12;
> -		} while (size -= granule);
> +
> +	if (stage1 && cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
> +		iova &= ~12UL;
> +		iova |= cfg->asid;
> +		step = granule;
>  	} else {
> -		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
> -		writel_relaxed(cfg->vmid, reg);
> +		iova >>= 12;
> +		step = granule >> 12;
> +		if (stage1)
> +			iova |= (u64)cfg->asid << 48;
>  	}
> +
> +	do {
> +		smmu_write_atomic_lq(iova, reg);
> +		iova += step;
> +	} while (size -= granule);

There seems to be a lot of refactoring going on here, but I'm not entirely
comfortable with the unconditional move to smmu_write_atomic_lq. Given the
way in which arm_smmu_tlb_inv_range_nosync is now called (i.e. only for
stage-1 or SMMUv2 stage-2), then I think you just need to delete the final
else clause in the current code and you're done.

Will

next prev parent reply	other threads:[~2017-03-30 14:37 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-07 18:09 [PATCH 0/4] ARM SMMU per-context TLB sync Robin Murphy
2017-03-07 18:09 ` Robin Murphy
     [not found] ` <cover.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-07 18:09   ` [PATCH 1/4] iommu/arm-smmu: Handle size mismatches better Robin Murphy
2017-03-07 18:09     ` Robin Murphy
     [not found]     ` <6e61306e6823943b3eeb0c405d4505dd565e723e.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-30 14:09       ` Will Deacon
2017-03-30 14:09         ` Will Deacon
2017-03-07 18:09   ` [PATCH 2/4] iommu/arm-smmu: Simplify ASID/VMID handling Robin Murphy
2017-03-07 18:09     ` Robin Murphy
2017-03-07 18:09   ` [PATCH 3/4] iommu/arm-smmu: Tidy up context bank indexing Robin Murphy
2017-03-07 18:09     ` Robin Murphy
2017-03-07 18:09   ` [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate Robin Murphy
2017-03-07 18:09     ` Robin Murphy
     [not found]     ` <03ef837586cd991f2d8431908230fd3a3dd8c9cf.1488907474.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-30 14:37       ` Will Deacon [this message]
2017-03-30 14:37         ` Will Deacon
     [not found]         ` <20170330143744.GG22160-5wv7dgnIgG8@public.gmane.org>
2017-03-30 15:48           ` Robin Murphy
2017-03-30 15:48             ` Robin Murphy
2017-03-23 17:59   ` [PATCH 5/4] iommu/arm-smmu: Poll for TLB sync completion more effectively Robin Murphy
2017-03-23 17:59     ` Robin Murphy
     [not found]     ` <67da197b56bb8763623fc215176b86ab04b1799e.1490291854.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-03-27 10:38       ` Sunil Kovvuri
2017-03-27 10:38         ` Sunil Kovvuri
2017-03-30 14:42       ` Will Deacon
2017-03-30 14:42         ` Will Deacon
2017-03-30 14:43   ` [PATCH 0/4] ARM SMMU per-context TLB sync Will Deacon
2017-03-30 14:43     ` Will Deacon
  -- strict thread matches above, loose matches on Subject: below --
2016-01-26 18:06 [PATCH 0/4] Miscellaneous ARM SMMU patches Robin Murphy
     [not found] ` <cover.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-01-26 18:06   ` [PATCH 4/4] iommu/arm-smmu: Use per-context TLB sync as appropriate Robin Murphy
2016-01-26 18:06     ` Robin Murphy
     [not found]     ` <fc2ede950aea2e84a0b7cf93e4cd605f0cbdf3c3.1453830752.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2016-02-09 14:15       ` Will Deacon
2016-02-09 14:15         ` Will Deacon
     [not found]         ` <20160209141508.GO22874-5wv7dgnIgG8@public.gmane.org>
2016-02-10 11:58           ` Robin Murphy
2016-02-10 11:58             ` Robin Murphy

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170330143744.GG22160@arm.com \
    --to=will.deacon-5wv7dgnigg8@public.gmane.org \
    --cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
    --cc=linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org \
    --cc=robin.murphy-5wv7dgnIgG8@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.