From: Ashish Mhetre <amhetre@nvidia.com>
To: <will@kernel.org>, <robin.murphy@arm.com>, <joro@8bytes.org>
Cc: <linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
<linux-kernel@vger.kernel.org>, <linux-tegra@vger.kernel.org>,
Ashish Mhetre <amhetre@nvidia.com>
Subject: [PATCH 1/2] iommu: Optimize IOMMU UnMap
Date: Wed, 17 Jul 2024 10:06:18 +0000 [thread overview]
Message-ID: <20240717100619.108250-1-amhetre@nvidia.com> (raw)
The current __arm_lpae_unmap() function calls dma_sync() on individual
PTEs after clearing them. Overall unmap performance can be improved by
around 25% for large buffer sizes by combining the syncs for adjacent
leaf entries.
This patch optimizes the unmap time by clearing all the leaf entries and
issuing a single dma_sync() for them.
Below is detailed analysis of average unmap latency(in us) with and
without this optimization obtained by running dma_map_benchmark for
different buffer sizes.
UnMap Latency(us)
Size Without With % gain with
optimiztion optimization optimization
4KB 3 3 0
8KB 4 3.8 5
16KB 6.1 5.4 11.48
32KB 10.2 8.5 16.67
64KB 18.5 14.9 19.46
128KB 35 27.5 21.43
256KB 67.5 52.2 22.67
512KB 127.9 97.2 24.00
1MB 248.6 187.4 24.62
2MB 65.5 65.5 0
4MB 119.2 119 0.17
Signed-off-by: Ashish Mhetre <amhetre@nvidia.com>
---
drivers/iommu/io-pgtable-arm.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index f5d9fd1f45bf..1787615eec24 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -274,13 +274,15 @@ static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries,
sizeof(*ptep) * num_entries, DMA_TO_DEVICE);
}
-static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg)
+static void __arm_lpae_clear_pte(arm_lpae_iopte *ptep, struct io_pgtable_cfg *cfg, int num_entries)
{
+ int i;
- *ptep = 0;
+ for (i = 0; i < num_entries; i++)
+ ptep[i] = 0;
if (!cfg->coherent_walk)
- __arm_lpae_sync_pte(ptep, 1, cfg);
+ __arm_lpae_sync_pte(ptep, num_entries, cfg);
}
static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
@@ -635,9 +637,10 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
unsigned long iova, size_t size, size_t pgcount,
int lvl, arm_lpae_iopte *ptep)
{
+ bool gather_queued;
arm_lpae_iopte pte;
struct io_pgtable *iop = &data->iop;
- int i = 0, num_entries, max_entries, unmap_idx_start;
+ int i = 0, j = 0, num_entries, max_entries, unmap_idx_start;
/* Something went horribly wrong and we ran out of page table */
if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS))
@@ -652,28 +655,33 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
/* If the size matches this level, we're in the right place */
if (size == ARM_LPAE_BLOCK_SIZE(lvl, data)) {
max_entries = ARM_LPAE_PTES_PER_TABLE(data) - unmap_idx_start;
+ gather_queued = iommu_iotlb_gather_queued(gather);
num_entries = min_t(int, pgcount, max_entries);
- while (i < num_entries) {
- pte = READ_ONCE(*ptep);
+ /* Find and handle non-leaf entries */
+ for (i = 0; i < num_entries; i++) {
+ pte = READ_ONCE(ptep[i]);
if (WARN_ON(!pte))
break;
- __arm_lpae_clear_pte(ptep, &iop->cfg);
-
if (!iopte_leaf(pte, lvl, iop->fmt)) {
+ __arm_lpae_clear_pte(ptep, &iop->cfg, 1);
+
/* Also flush any partial walks */
io_pgtable_tlb_flush_walk(iop, iova + i * size, size,
ARM_LPAE_GRANULE(data));
__arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data));
- } else if (!iommu_iotlb_gather_queued(gather)) {
- io_pgtable_tlb_add_page(iop, gather, iova + i * size, size);
}
-
- ptep++;
- i++;
}
+ /* Clear the remaining entries */
+ if (i)
+ __arm_lpae_clear_pte(ptep, &iop->cfg, i);
+
+ if (!gather_queued)
+ for (j = 0; j < i; j++)
+ io_pgtable_tlb_add_page(iop, gather, iova + j * size, size);
+
return i * size;
} else if (iopte_leaf(pte, lvl, iop->fmt)) {
/*
--
2.25.1
next reply other threads:[~2024-07-17 10:06 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-17 10:06 Ashish Mhetre [this message]
2024-07-17 10:06 ` [PATCH 2/2] include: linux: Update gather only if it's not NULL Ashish Mhetre
2024-07-29 8:21 ` [PATCH 1/2] iommu: Optimize IOMMU UnMap Ashish Mhetre
2024-07-29 8:43 ` Markus Elfring
2024-07-30 3:58 ` Ashish Mhetre
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240717100619.108250-1-amhetre@nvidia.com \
--to=amhetre@nvidia.com \
--cc=iommu@lists.linux.dev \
--cc=joro@8bytes.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-tegra@vger.kernel.org \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox