[PATCH 1/2] iommu: Optimize IOMMU UnMap

* [PATCH 1/2] iommu: Optimize IOMMU UnMap
@ 2024-07-17 10:06 Ashish Mhetre
  2024-07-17 10:06 ` [PATCH 2/2] include: linux: Update gather only if it's not NULL Ashish Mhetre
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ashish Mhetre @ 2024-07-17 10:06 UTC
  To: will, robin.murphy, joro
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

The current __arm_lpae_unmap() function calls dma_sync() on individual
PTEs after clearing them. Overall unmap performance can be improved by
around 25% for large buffer sizes by combining the syncs for adjacent
leaf entries.
This patch optimizes the unmap time by clearing all the leaf entries and
issuing a single dma_sync() for them.
Below is a detailed analysis of the average unmap latency (in us), with
and without this optimization, obtained by running dma_map_benchmark for
different buffer sizes.

		UnMap Latency(us)
Size	Without		With		% gain with
	optimization	optimization	optimization

4KB	3		3		0
8KB	4		3.8		5
16KB	6.1		5.4		11.48
32KB	10.2		8.5		16.67
64KB	18.5		14.9		19.46
128KB	35		27.5		21.43
256KB	67.5		52.2		22.67
512KB	127.9		97.2		24.00
1MB	248.6		187.4		24.62
2MB	65.5		65.5		0
4MB	119.2		119		0.17

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 drivers/iommu/io-pgtable-arm.c | 34 +++++++++++++++++++++-------------
 1 file changed, 21 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5d9fd1f45bf..1787615eec24 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,15 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
 				   sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
 }
 
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
 {
+	int i;
 
-	*ptep = 0;
+	for (i = 0; i < num_entries; i++)
+		ptep[i] = 0;
 
 	if (!cfg->coherent_walk)
-		__arm_lpae_sync_pte(ptep, 1, cfg);
+		__arm_lpae_sync_pte(ptep, num_entries, cfg);
 }
 
 static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -635,9 +637,10 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 			       unsigned long iova, size_t size, size_t pgcount,
 			       int lvl, arm_lpae_iopte *ptep)
 {
+	bool gather_queued;
 	arm_lpae_iopte pte;
 	struct io_pgtable *iop = &data->iop;
-	int i = 0, num_entries, max_entries, unmap_idx_start;
+	int i = 0, j = 0, num_entries, max_entries, unmap_idx_start;
 
 	/* Something went horribly wrong and we ran out of page table */
 	if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
@@ -652,28 +655,33 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
 	/* If the size matches this level, we're in the right place */
 	if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
 		max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
+		gather_queued = iommu_iotlb_gather_queued(gather);
 		num_entries = min_t(int, pgcount, max_entries);
 
-		while (i < num_entries) {
-			pte = READ_ONCE(*ptep);
+		/* Find and handle non-leaf entries */
+		for (i = 0; i < num_entries; i++) {
+			pte = READ_ONCE(ptep[i]);
 			if (WARN_ON(!pte))
 				break;
 
-			__arm_lpae_clear_pte(ptep, &iop->cfg);
-
 			if (!iopte_leaf(pte, lvl, iop->fmt)) {
+				__arm_lpae_clear_pte(&ptep[i], &iop->cfg, 1);
+
 				/* Also flush any partial walks */
 				io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
 							  ARM_LPAE_GRANULE(data));
 				__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
-			} else if (!iommu_iotlb_gather_queued(gather)) {
-				io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
 			}
-
-			ptep++;
-			i++;
 		}
 
+		/* Clear the remaining entries */
+		if (i)
+			__arm_lpae_clear_pte(ptep, &iop->cfg, i);
+
+		if (!gather_queued)
+			for (j = 0; j < i; j++)
+				io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+
 		return i * size;
 	} else if (iopte_leaf(pte, lvl, iop->fmt)) {
 		/*
-- 
2.25.1
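
The batching idea in the commit message can be modelled outside the kernel. The following is a minimal user-space sketch, not the driver code: pte_t, sync_range(), clear_one_by_one() and clear_batched() are hypothetical stand-ins, and sync_range() only counts calls where the kernel would perform cache maintenance for a non-coherent walker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pte_t;          /* hypothetical stand-in for arm_lpae_iopte */

static unsigned int sync_calls;  /* number of simulated sync operations */

/* Stand-in for __arm_lpae_sync_pte(): one call covers a contiguous range. */
static void sync_range(pte_t *ptep, size_t num_entries)
{
	(void)ptep;
	(void)num_entries;
	sync_calls++;
}

/* Behaviour before the patch: clear and sync every entry separately. */
static void clear_one_by_one(pte_t *ptep, size_t num_entries, int coherent_walk)
{
	size_t i;

	for (i = 0; i < num_entries; i++) {
		ptep[i] = 0;
		if (!coherent_walk)
			sync_range(&ptep[i], 1);
	}
}

/* Behaviour after the patch: clear the whole run, then sync it once. */
static void clear_batched(pte_t *ptep, size_t num_entries, int coherent_walk)
{
	size_t i;

	assert(ptep != NULL);

	for (i = 0; i < num_entries; i++)
		ptep[i] = 0;

	if (num_entries && !coherent_walk)
		sync_range(ptep, num_entries);
}
```

Assuming a 4KB granule, a 1MB buffer spans 256 adjacent leaf entries, so in this model the per-entry variant issues 256 syncs and the batched variant issues one. With a coherent walker neither variant syncs at all.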



* [PATCH 2/2] include: linux: Update gather only if it's not NULL
  2024-07-17 10:06 [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
@ 2024-07-17 10:06 ` Ashish Mhetre
  2024-07-29  8:21 ` [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
  2024-07-29  8:43 ` Markus Elfring
  2 siblings, 0 replies; 5+ messages in thread
From: Ashish Mhetre @ 2024-07-17 10:06 UTC
  To: will, robin.murphy, joro
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra, Ashish Mhetre

Gather can be NULL when unmap is called to free an old table while
mapping. If it is NULL, there is no need to add the page for syncing
the TLB.

Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
 include/linux/iommu.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4d47f2c33311..2a28c1ef8517 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -928,6 +928,9 @@ static inline void iommu_iotlb_gather_add_page(struct iommu_domain *domain,
 					       struct iommu_iotlb_gather *gather,
 					       unsigned long iova, size_t size)
 {
+	if (!gather)
+		return;
+
 	/*
 	 * If the new page is disjoint from the current range or is mapped at
 	 * a different granularity, then sync the TLB so that the gather
-- 
2.25.1
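
The effect of the early return can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel helper: struct gather_model, gather_init() and gather_add_page() are hypothetical names, and the model only widens a range. It leaves out the sync that the real function performs when the new page is disjoint or of a different granularity.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iommu_iotlb_gather. */
struct gather_model {
	unsigned long start;  /* lowest IOVA queued so far */
	unsigned long end;    /* highest IOVA queued so far, inclusive */
	size_t pgsize;        /* size of the last page added */
};

/* Start with an empty range: any added page will widen it. */
static void gather_init(struct gather_model *g)
{
	g->start = ULONG_MAX;
	g->end = 0;
	g->pgsize = 0;
}

/* Queue one page for a later TLB sync; a NULL gather means nothing to do. */
static void gather_add_page(struct gather_model *g, unsigned long iova,
			    size_t size)
{
	unsigned long end;

	if (!g)
		return;  /* the guard added by this patch, in model form */

	end = iova + size - 1;
	if (g->start > iova)
		g->start = iova;
	if (g->end < end)
		g->end = end;
	g->pgsize = size;
}
```

In this model, calling gather_add_page(NULL, iova, size) returns without touching memory, while two adjacent 4KB pages added to an initialised gather merge into a single range.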



* Re: [PATCH 1/2] iommu: Optimize IOMMU UnMap
  2024-07-17 10:06 [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
  2024-07-17 10:06 ` [PATCH 2/2] include: linux: Update gather only if it's not NULL Ashish Mhetre
@ 2024-07-29  8:21 ` Ashish Mhetre
  2024-07-29  8:43 ` Markus Elfring
  2 siblings, 0 replies; 5+ messages in thread
From: Ashish Mhetre @ 2024-07-29  8:21 UTC
  To: will, robin.murphy, joro
  Cc: linux-arm-kernel, iommu, linux-kernel, linux-tegra


On 7/17/2024 3:36 PM, Ashish Mhetre wrote:
> [...]
Hi all,

Can you please review the patches and provide feedback?
Thanks,
Ashish Mhetre



* Re: [PATCH 1/2] iommu: Optimize IOMMU UnMap
  2024-07-17 10:06 [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
  2024-07-17 10:06 ` [PATCH 2/2] include: linux: Update gather only if it's not NULL Ashish Mhetre
  2024-07-29  8:21 ` [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
@ 2024-07-29  8:43 ` Markus Elfring
  2024-07-30  3:58   ` Ashish Mhetre
  2 siblings, 1 reply; 5+ messages in thread
From: Markus Elfring @ 2024-07-29  8:43 UTC
  To: Ashish Mhetre, iommu, linux-tegra, linux-arm-kernel
  Cc: LKML, Jörg Rödel, Robin Murphy, Will Deacon

…
> This patch optimizes …

See also:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.10#n94

Regards,
Markus


* Re: [PATCH 1/2] iommu: Optimize IOMMU UnMap
  2024-07-29  8:43 ` Markus Elfring
@ 2024-07-30  3:58   ` Ashish Mhetre
  0 siblings, 0 replies; 5+ messages in thread
From: Ashish Mhetre @ 2024-07-30  3:58 UTC
  To: Markus Elfring, iommu, linux-tegra, linux-arm-kernel
  Cc: LKML, Jörg Rödel, Robin Murphy, Will Deacon


On 7/29/2024 2:13 PM, Markus Elfring wrote:
> …
>> This patch optimizes …
> See also:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.10#n94
>
> Regards,
> Markus

Thanks Markus, I'll update the commit message in the new version.
I'll wait for any other comments and address them all in the next version.

Thanks,
Ashish Mhetre

