* [RFC PATCH 0/3] iommu: Add range flush operation
From: Tomasz Figa @ 2015-09-29 5:25 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Tomasz Figa, Joerg Roedel, Hiroshi Doyu, Stephen Warren,
Thierry Reding, Alexandre Courbot, Vince Hsu, Russell King,
Paul Walmsley, Tomeu Vizoso, Mikko Perttunen, Will Deacon,
Alex Williamson, Arnd Bergmann, Marek Szyprowski,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
page by page; however, it involves flushing the caches (CPU and IOMMU) for
every mapped page separately, which is unsuitable for use cases that
require low mapping latency. Similarly, iommu_unmap(), even though it
takes a full IOVA range as its argument, performs the unmapping in a
page-by-page manner.

To make the mapping operation more suitable for such use cases, iommu_map_sg()
and the .map_sg() callback in the iommu_ops struct were introduced, which
allowed particular IOMMU drivers to directly iterate over SG entries, create
the necessary mappings and flush everything in one go.

This approach, however, has two drawbacks:
1) it does not do anything about unmap performance,
2) it requires each driver that wants a fast map path to implement its
own SG iteration code, even though this is a mostly generic operation.

This series tries to mitigate the two issues above, while acknowledging
the fact that the .map_sg() callback might still be necessary for some
specific platforms, which could need to iterate over SG elements
inside driver code. The proposed solution introduces a new .flush() callback,
which takes an IOVA range as its argument and is expected to flush all
respective caches (be it CPU, IOMMU TLB or whatever) to make the given
IOVA area mapping change visible to IOMMU clients. All 3 basic
map/unmap operations are then modified to call the .flush() callback at the
end of the operation.

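For reference, the new callback (added to struct iommu_ops by patch 1)
has the following prototype:

    void (*flush)(struct iommu_domain *domain, unsigned long iova,
                  size_t size);
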
Advantages of the proposed approach include:
1) the ability to use the default_iommu_map_sg() helper if all the driver
needs for performance optimization is batching the flush,
2) no effect at all on existing code - the .flush() callback is made
optional and if it isn't implemented, drivers are expected to do the
necessary flushes on a page-by-page basis in the respective (un)mapping
callbacks,
3) the possibility of exporting the iommu_flush() operation and providing
unsynchronized map/unmap operations for subsystems with even higher
performance requirements (e.g. drivers/gpu/drm).

The series includes a generic patch implementing the necessary changes in
the IOMMU API and two Tegra-specific patches that demonstrate the
implementation on the driver side and can be used for further testing.

Last, but not least, some performance numbers on Tegra210:
+-----------+--------------+-------------+------------+
| Operation | Size [bytes] | Before [us] | After [us] |
+-----------+--------------+-------------+------------+
| Map | 128K | 139 | 40 |
| | | 136 | 34 |
| | | 137 | 38 |
| | | 136 | 36 |
| | 4M | 3939 | 1163 |
| | | 3730 | 2389 |
| | | 3613 | 997 |
| | | 3622 | 1620 |
| | ~18M | 18635 | 4741 |
| | | 19261 | 6550 |
| | | 18473 | 9304 |
| | | 18125 | 5120 |
| Unmap | 128K | 128 | 7 |
| | | 122 | 8 |
| | | 119 | 10 |
| | | 123 | 12 |
| | 4M | 3829 | 151 |
| | | 3964 | 150 |
| | | 3908 | 145 |
| | | 3875 | 155 |
| | ~18M | 18570 | 683 |
| | | 18473 | 806 |
| | | 21020 | 643 |
| | | 21764 | 652 |
+-----------+--------------+-------------+------------+
The values were obtained by surrounding the calls to iommu_map_sg()
(with the default_iommu_map_sg() helper used as the .map_sg() callback) and
iommu_unmap() with ktime-based time measurement code. Four samples were
taken for every buffer size. ~18M means around 17-19M due to the variance
in requested buffer sizes.
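
For reference, a minimal sketch of the measurement described above (the
exact instrumentation is not part of this series, and domain, iova and
sgt are assumed to have been set up by the surrounding test code):

    #include <linux/iommu.h>
    #include <linux/ktime.h>
    #include <linux/printk.h>

    ktime_t start = ktime_get();
    size_t mapped = iommu_map_sg(domain, iova, sgt->sgl, sgt->nents,
                                 IOMMU_READ | IOMMU_WRITE);

    pr_info("iommu_map_sg: %zu bytes in %lld us\n",
            mapped, ktime_us_delta(ktime_get(), start));
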
Tomasz Figa (2):
iommu: Add support for out of band flushing
iommu/tegra-smmu: Make the driver use out of band flushing
Vince Hsu (1):
memory: tegra: add TLB cache line size
drivers/iommu/iommu.c | 33 +++++++++++++--
drivers/iommu/tegra-smmu.c | 91 +++++++++++++++++++++++++++++++++++++----
drivers/memory/tegra/tegra114.c | 1 +
drivers/memory/tegra/tegra124.c | 3 ++
drivers/memory/tegra/tegra210.c | 1 +
drivers/memory/tegra/tegra30.c | 1 +
include/linux/iommu.h | 2 +
include/soc/tegra/mc.h | 1 +
8 files changed, 122 insertions(+), 11 deletions(-)
--
2.6.0.rc2.230.g3dd15c0
* [RFC PATCH 1/3] iommu: Add support for out of band flushing
From: Tomasz Figa @ 2015-09-29 5:25 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tomasz Figa,
linux-tegra-u79uwXL29TY76Z2rM5mHXA, Thierry Reding, Nicolas Iooss,
Russell King, Vince Hsu, Mikko Perttunen
This patch adds a new "flush" callback to iommu_ops, which is supposed
to perform any necessary flushes within the given IOMMU domain to make
any changes to the mappings of the given area [iova; iova + size)
visible to IOMMU clients.

The purpose is to let IOMMU drivers skip page-by-page flushes and
replace them with one flush of the full address range on devices which
support it.

Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
include/linux/iommu.h | 2 ++
2 files changed, 32 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 049df49..d6bb8fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1286,8 +1286,15 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
return pgsize;
}
-int iommu_map(struct iommu_domain *domain, unsigned long iova,
- phys_addr_t paddr, size_t size, int prot)
+static void iommu_flush(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+ if (domain->ops->flush)
+ domain->ops->flush(domain, iova, size);
+}
+
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
{
unsigned long orig_iova = iova;
unsigned int min_pagesz;
@@ -1340,6 +1347,20 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
return ret;
}
+
+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot)
+{
+ int ret;
+
+ ret = __iommu_map(domain, iova, paddr, size, prot);
+ if (ret)
+ return ret;
+
+ iommu_flush(domain, iova, size);
+
+ return 0;
+}
EXPORT_SYMBOL_GPL(iommu_map);
size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
@@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
unmapped += unmapped_page;
}
+ iommu_flush(domain, orig_iova, unmapped);
trace_unmap(orig_iova, size, unmapped);
return unmapped;
}
@@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
if (!IS_ALIGNED(s->offset, min_pagesz))
goto out_err;
- ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
+ ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
if (ret)
goto out_err;
mapped += s->length;
}
+ iommu_flush(domain, iova, mapped);
+
return mapped;
out_err:
/* undo mappings already done */
iommu_unmap(domain, iova, mapped);
+ /* flush in case part of our mapping already got cached */
+ iommu_flush(domain, iova, mapped);
+
return 0;
}
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f9c1b6d..18e59a9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -164,6 +164,8 @@ struct iommu_ops {
size_t size);
size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
struct scatterlist *sg, unsigned int nents, int prot);
+ void (*flush)(struct iommu_domain *domain, unsigned long iova,
+ size_t size);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
int (*add_device)(struct device *dev);
void (*remove_device)(struct device *dev);
--
2.6.0.rc2.230.g3dd15c0
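
To illustrate how a driver opts in (a hypothetical sketch; all foo_* names
are made up and stand for a driver whose hardware supports range-based TLB
invalidation - the real example is the tegra-smmu conversion in patch 3/3):

    static void foo_iommu_flush(struct iommu_domain *domain,
                                unsigned long iova, size_t size)
    {
        struct foo_domain *foo = to_foo_domain(domain);

        /* One hypothetical range invalidation instead of per-page flushes. */
        foo_hw_invalidate_range(foo, iova, size);
    }

    static const struct iommu_ops foo_iommu_ops = {
        /* ... other callbacks as before ... */
        .map_sg = default_iommu_map_sg, /* generic SG batching now suffices */
        .flush  = foo_iommu_flush,
    };

With .flush() implemented, the driver's .map and .unmap callbacks no longer
need to flush on their own, since the core calls .flush() once at the end
of each operation.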
* [RFC PATCH 2/3] memory: tegra: add TLB cache line size
From: Tomasz Figa @ 2015-09-29 5:25 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Vince Hsu, Tomasz Figa, Joerg Roedel, Hiroshi Doyu,
Stephen Warren, Thierry Reding, Alexandre Courbot, Russell King,
Paul Walmsley, Mikko Perttunen, Tomeu Vizoso, Will Deacon,
Alex Williamson, Arnd Bergmann, Antonios Motakis, Nicolas Iooss,
Olav Haugan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
This patch adds the SMMU TLB line size to the Tegra SoC data struct to
enable the SMMU driver to use this knowledge in code added by a further
patch.

Also add the missing number of TLB lines for Tegra124.

Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
[tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
drivers/memory/tegra/tegra114.c | 1 +
drivers/memory/tegra/tegra124.c | 3 +++
drivers/memory/tegra/tegra210.c | 1 +
drivers/memory/tegra/tegra30.c | 1 +
include/soc/tegra/mc.h | 1 +
5 files changed, 7 insertions(+)
diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c
index ba8fff3..173d2dd 100644
--- a/drivers/memory/tegra/tegra114.c
+++ b/drivers/memory/tegra/tegra114.c
@@ -920,6 +920,7 @@ static const struct tegra_smmu_soc tegra114_smmu_soc = {
.supports_round_robin_arbitration = false,
.supports_request_limit = false,
.num_tlb_lines = 32,
+ .tlb_line_size = 16,
.num_asids = 4,
};
diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
index 21e7255..ff12487 100644
--- a/drivers/memory/tegra/tegra124.c
+++ b/drivers/memory/tegra/tegra124.c
@@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
.num_swgroups = ARRAY_SIZE(tegra124_swgroups),
.supports_round_robin_arbitration = true,
.supports_request_limit = true,
+ .num_tlb_lines = 32,
+ .tlb_line_size = 32,
.num_asids = 128,
};
@@ -1031,6 +1033,7 @@ static const struct tegra_smmu_soc tegra132_smmu_soc = {
.supports_round_robin_arbitration = true,
.supports_request_limit = true,
.num_tlb_lines = 32,
+ .tlb_line_size = 32,
.num_asids = 128,
};
diff --git a/drivers/memory/tegra/tegra210.c b/drivers/memory/tegra/tegra210.c
index 5e144ab..bf6a9c1 100644
--- a/drivers/memory/tegra/tegra210.c
+++ b/drivers/memory/tegra/tegra210.c
@@ -1067,6 +1067,7 @@ static const struct tegra_smmu_soc tegra210_smmu_soc = {
.supports_round_robin_arbitration = true,
.supports_request_limit = true,
.num_tlb_lines = 32,
+ .tlb_line_size = 32,
.num_asids = 128,
};
diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
index b447378..08951db 100644
--- a/drivers/memory/tegra/tegra30.c
+++ b/drivers/memory/tegra/tegra30.c
@@ -942,6 +942,7 @@ static const struct tegra_smmu_soc tegra30_smmu_soc = {
.supports_round_robin_arbitration = false,
.supports_request_limit = false,
.num_tlb_lines = 16,
+ .tlb_line_size = 16,
.num_asids = 4,
};
diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
index 44202ff..a6dbb35 100644
--- a/include/soc/tegra/mc.h
+++ b/include/soc/tegra/mc.h
@@ -62,6 +62,7 @@ struct tegra_smmu_soc {
bool supports_request_limit;
unsigned int num_tlb_lines;
+ unsigned int tlb_line_size;
unsigned int num_asids;
};
--
2.6.0.rc2.230.g3dd15c0
* [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing
From: Tomasz Figa @ 2015-09-29 5:25 UTC
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Tomasz Figa, Vince Hsu, Joerg Roedel, Hiroshi Doyu,
Stephen Warren, Thierry Reding, Alexandre Courbot, Vince Hsu,
Paul Walmsley, Russell King, Mikko Perttunen, Tomeu Vizoso,
Will Deacon, Alex Williamson, Marek Szyprowski, Arnd Bergmann,
Antonios Motakis, Nicolas Iooss, Olav Haugan,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
This patch modifies the tegra-smmu driver to perform PTC and TLB flushes
inside the iommu_ops .flush() callback instead of in the map and unmap
operations, so that large maps and unmaps become significantly faster
thanks to the elimination of page-by-page flushing.

Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Signed-off-by: Vince Hsu <vinceh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
drivers/iommu/tegra-smmu.c | 91 ++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 83 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 9305964..92b46d2 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -614,18 +614,54 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
}
}
+static void tegra_smmu_pte_put_use_range(struct tegra_smmu_as *as,
+ unsigned long iova, unsigned int len)
+{
+ unsigned int i;
+
+ for (i = 0; i < len; i++)
+ tegra_smmu_pte_put_use(as, iova + i * PAGE_SIZE);
+}
+
static void tegra_smmu_set_pte(struct tegra_smmu_as *as, unsigned long iova,
u32 *pte, dma_addr_t pte_dma, u32 val)
{
- struct tegra_smmu *smmu = as->smmu;
- unsigned long offset = offset_in_page(pte);
-
*pte = val;
+}
+
+static void tegra_smmu_flush_pte_range(struct tegra_smmu_as *as,
+ unsigned long iova, unsigned int num_ptes, u32 *pte,
+ dma_addr_t pt_dma)
+{
+ struct tegra_smmu *smmu = as->smmu;
+ unsigned int tlb_lines_per_atom;
+ unsigned int ptes_per_tlb_line;
+ unsigned int ptes_per_atom;
+ unsigned long offset;
+ unsigned long iova_end;
+ int i;
+
+ ptes_per_atom = smmu->mc->soc->atom_size / sizeof(*pte);
+ ptes_per_tlb_line = smmu->soc->tlb_line_size / sizeof(*pte);
+ tlb_lines_per_atom = smmu->mc->soc->atom_size
+ / smmu->soc->tlb_line_size;
+
+ offset = round_down(offset_in_page(pte), smmu->mc->soc->atom_size);
+
+ iova_end = iova + num_ptes * PAGE_SIZE;
+ iova = round_down(iova, ptes_per_atom * PAGE_SIZE);
+ iova_end = round_up(iova_end, ptes_per_atom * PAGE_SIZE);
+ num_ptes = (iova_end - iova) / PAGE_SIZE;
+ while (num_ptes) {
+ smmu_flush_ptc(smmu, pt_dma, offset);
+ for (i = 0; i < tlb_lines_per_atom; i++) {
+ smmu_flush_tlb_group(smmu, as->id, iova);
+ iova += ptes_per_tlb_line * PAGE_SIZE;
+ }
+ offset += smmu->mc->soc->atom_size;
+ num_ptes -= ptes_per_atom;
+ }
- dma_sync_single_range_for_device(smmu->dev, pte_dma, offset,
- 4, DMA_TO_DEVICE);
- smmu_flush_ptc(smmu, pte_dma, offset);
- smmu_flush_tlb_group(smmu, as->id, iova);
smmu_flush(smmu);
}
@@ -662,11 +698,49 @@ static size_t tegra_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
return 0;
tegra_smmu_set_pte(as, iova, pte, pte_dma, 0);
- tegra_smmu_pte_put_use(as, iova);
return size;
}
+static void tegra_smmu_flush(struct iommu_domain *domain, unsigned long iova,
+ size_t size)
+{
+ struct tegra_smmu_as *as = to_smmu_as(domain);
+ struct tegra_smmu *smmu = as->smmu;
+ u32 num = size >> PAGE_SHIFT;
+
+ might_sleep();
+
+ while (num) {
+ unsigned int pt_index = iova_pt_index(iova);
+ unsigned int len, end;
+ unsigned long offset;
+ dma_addr_t pte_dma;
+ u32 *pte;
+
+ end = pt_index + num;
+ if (end > SMMU_NUM_PTE)
+ end = SMMU_NUM_PTE;
+ len = end - pt_index;
+
+ pte = tegra_smmu_pte_lookup(as, iova, &pte_dma);
+ if (!pte)
+ goto next_pde;
+
+ offset = offset_in_page(pte);
+ dma_sync_single_range_for_device(smmu->dev, pte_dma, offset,
+ sizeof(*pte) * len, DMA_TO_DEVICE);
+
+ tegra_smmu_flush_pte_range(as, iova, len, pte, pte_dma);
+ if (*pte == 0)
+ tegra_smmu_pte_put_use_range(as, iova, len);
+
+next_pde:
+ num -= len;
+ iova += len << PAGE_SHIFT;
+ }
+}
+
static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
dma_addr_t iova)
{
@@ -743,6 +817,7 @@ static const struct iommu_ops tegra_smmu_ops = {
.map = tegra_smmu_map,
.unmap = tegra_smmu_unmap,
.map_sg = default_iommu_map_sg,
+ .flush = tegra_smmu_flush,
.iova_to_phys = tegra_smmu_iova_to_phys,
.pgsize_bitmap = SZ_4K,
--
2.6.0.rc2.230.g3dd15c0
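
To make the granularity of tegra_smmu_flush_pte_range() above concrete, a
worked example (the 32-byte atom size is assumed purely for illustration;
the real value comes from mc->soc->atom_size):

    atom_size     = 32 bytes  ->  ptes_per_atom     = 32 / sizeof(u32) = 8
    tlb_line_size = 32 bytes  ->  ptes_per_tlb_line = 32 / sizeof(u32) = 8
    tlb_lines_per_atom = atom_size / tlb_line_size  = 1

Flushing even a single 4K page thus rounds the IOVA range out to an 8-page
(32K) aligned window and issues one PTC flush per 32-byte page table atom
plus one TLB group flush per TLB line inside it, instead of one PTC and
one TLB flush per page as before.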
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Thierry Reding @ 2015-09-29 9:27 UTC
To: Tomasz Figa
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu
On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page; however, it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly, iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs the unmapping in a
> page-by-page manner.
>
> To make the mapping operation more suitable for such use cases, iommu_map_sg()
> and the .map_sg() callback in the iommu_ops struct were introduced, which
> allowed particular IOMMU drivers to directly iterate over SG entries, create
> the necessary mappings and flush everything in one go.
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver that wants a fast map path to implement its
> own SG iteration code, even though this is a mostly generic operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might still be necessary for some
> specific platforms, which could need to iterate over SG elements
> inside driver code. The proposed solution introduces a new .flush() callback,
> which takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> IOVA area mapping change visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at the
> end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the driver
> needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
> optional and if it isn't implemented, drivers are expected to do the
> necessary flushes on a page-by-page basis in the respective (un)mapping
> callbacks,
> 3) the possibility of exporting the iommu_flush() operation and providing
> unsynchronized map/unmap operations for subsystems with even higher
> performance requirements (e.g. drivers/gpu/drm).
That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.
>
> The series includes a generic patch implementing the necessary changes in
> the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and can be used for further testing.
>
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map | 128K | 139 | 40 |
> | | | 136 | 34 |
> | | | 137 | 38 |
> | | | 136 | 36 |
> | | 4M | 3939 | 1163 |
> | | | 3730 | 2389 |
> | | | 3613 | 997 |
> | | | 3622 | 1620 |
> | | ~18M | 18635 | 4741 |
> | | | 19261 | 6550 |
> | | | 18473 | 9304 |
> | | | 18125 | 5120 |
> | Unmap | 128K | 128 | 7 |
> | | | 122 | 8 |
> | | | 119 | 10 |
> | | | 123 | 12 |
> | | 4M | 3829 | 151 |
> | | | 3964 | 150 |
> | | | 3908 | 145 |
> | | | 3875 | 155 |
> | | ~18M | 18570 | 683 |
> | | | 18473 | 806 |
> | | | 21020 | 643 |
> | | | 21764 | 652 |
> +-----------+--------------+-------------+------------+
> The values were obtained by surrounding the calls to iommu_map_sg()
> (with the default_iommu_map_sg() helper used as the .map_sg() callback) and
> iommu_unmap() with ktime-based time measurement code. Four samples were
> taken for every buffer size. ~18M means around 17-19M due to the variance
> in requested buffer sizes.
Those are pretty impressive numbers.
Thierry
* Re: [RFC PATCH 1/3] iommu: Add support for out of band flushing
From: Thierry Reding @ 2015-09-29 9:32 UTC
To: Tomasz Figa
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu
On Tue, Sep 29, 2015 at 02:25:24PM +0900, Tomasz Figa wrote:
> This patch adds a new "flush" callback to iommu_ops, which is supposed
> to perform any necessary flushes within the given IOMMU domain to make
> any changes to the mappings of the given area [iova; iova + size)
> visible to IOMMU clients.
>
> The purpose is to let IOMMU drivers skip page-by-page flushes and
> replace them with one flush of the full address range on devices which
> support it.
>
> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> ---
> drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
> include/linux/iommu.h | 2 ++
> 2 files changed, 32 insertions(+), 3 deletions(-)
I seem to remember that Rob Clark had proposed something like this back
before it was decided to introduce the ->map_sg() callback instead. I
can't find a reference to the discussion, so perhaps I'm misremembering,
but I'm adding Rob Clark in case he has any recollection of why the
outcome was what it was.
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
[...]
> @@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
> unmapped += unmapped_page;
> }
>
> + iommu_flush(domain, orig_iova, unmapped);
> trace_unmap(orig_iova, size, unmapped);
> return unmapped;
> }
> @@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
> if (!IS_ALIGNED(s->offset, min_pagesz))
> goto out_err;
>
> - ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
> + ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
> if (ret)
> goto out_err;
>
> mapped += s->length;
> }
>
> + iommu_flush(domain, iova, mapped);
> +
> return mapped;
>
> out_err:
> /* undo mappings already done */
> iommu_unmap(domain, iova, mapped);
>
> + /* flush in case part of our mapping already got cached */
> + iommu_flush(domain, iova, mapped);
> +
> return 0;
>
> }
iommu_unmap() already does an iommu_flush(), so why flush again after
iommu_unmap()?
Thierry
* Re: [RFC PATCH 2/3] memory: tegra: add TLB cache line size
From: Thierry Reding @ 2015-09-29 9:43 UTC
To: Tomasz Figa
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis,
linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu
On Tue, Sep 29, 2015 at 02:25:25PM +0900, Tomasz Figa wrote:
> From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>
> This patch adds the SMMU TLB line size to the Tegra SoC data struct to
> enable the SMMU driver to use this knowledge in code added by a further
> patch.
I think the line size should either be added in the same patch that adds
the feature which uses it, or the commit message should describe what
purpose it will be used for. As it is, this commit message leaves too
many questions unanswered.
> Also add the missing number of TLB lines for Tegra124.
>
> Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> [tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
> index 21e7255..ff12487 100644
> --- a/drivers/memory/tegra/tegra124.c
> +++ b/drivers/memory/tegra/tegra124.c
> @@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
> .num_swgroups = ARRAY_SIZE(tegra124_swgroups),
> .supports_round_robin_arbitration = true,
> .supports_request_limit = true,
> + .num_tlb_lines = 32,
> + .tlb_line_size = 32,
> .num_asids = 128,
> };
Oh my... try to fix one platform and break another. Fortunately it seems
like Tegra124 copes much better without TLB because I'm not seeing any
buffer underruns or similar on Tegra124 without this fix.
Anyway, this change is completely unrelated and fixes a regression (even
though it might not be noticeable in many use-cases), so can you please
split it out into a separate patch and add a
Fixes: 11cec15bf3fb ("iommu/tegra-smmu: Parameterize number of TLB lines")
line to it? That patch went into v4.3-rc1 and it'd be nice to get this
fix in before the final v4.3. Feel free to add my Acked-by/Reviewed-by
as well.
Thierry
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Tomasz Figa @ 2015-09-29 11:54 UTC
To: Thierry Reding
Cc: open list:IOMMU DRIVERS, Joerg Roedel, Hiroshi Doyu,
Stephen Warren, Alexandre Courbot, Vince Hsu, Russell King,
Paul Walmsley, Tomeu Vizoso, Mikko Perttunen, Will Deacon,
Alex Williamson, Arnd Bergmann, Marek Szyprowski,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
On Tue, Sep 29, 2015 at 6:27 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> > Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> > iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> > page by page; however, it involves flushing the caches (CPU and IOMMU) for
> > every mapped page separately, which is unsuitable for use cases that
> > require low mapping latency. Similarly, iommu_unmap(), even though it
> > takes a full IOVA range as its argument, performs the unmapping in a
> > page-by-page manner.
> >
> > To make the mapping operation more suitable for such use cases, iommu_map_sg()
> > and the .map_sg() callback in the iommu_ops struct were introduced, which
> > allowed particular IOMMU drivers to directly iterate over SG entries, create
> > the necessary mappings and flush everything in one go.
> >
> > This approach, however, has two drawbacks:
> > 1) it does not do anything about unmap performance,
> > 2) it requires each driver that wants a fast map path to implement its
> > own SG iteration code, even though this is a mostly generic operation.
> >
> > This series tries to mitigate the two issues above, while acknowledging
> > the fact that the .map_sg() callback might still be necessary for some
> > specific platforms, which could need to iterate over SG elements
> > inside driver code. The proposed solution introduces a new .flush() callback,
> > which takes an IOVA range as its argument and is expected to flush all
> > respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> > IOVA area mapping change visible to IOMMU clients. All 3 basic
> > map/unmap operations are then modified to call the .flush() callback at the
> > end of the operation.
> >
> > Advantages of the proposed approach include:
> > 1) the ability to use the default_iommu_map_sg() helper if all the driver
> > needs for performance optimization is batching the flush,
> > 2) no effect at all on existing code - the .flush() callback is made
> > optional and if it isn't implemented, drivers are expected to do the
> > necessary flushes on a page-by-page basis in the respective (un)mapping
> > callbacks,
> > 3) the possibility of exporting the iommu_flush() operation and providing
> > unsynchronized map/unmap operations for subsystems with even higher
> > performance requirements (e.g. drivers/gpu/drm).
>
> That would require passing in some sort of flag that the core shouldn't
> be flushing itself, right? Currently it would flush on every map/unmap.
>
Are you asking about 3) in particular? If so, I was thinking about
iommu_map_noflush() and iommu_unmap_noflush(), which could then be
wrapped by iommu_map() with a call to iommu_flush() added at the end.
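
A rough sketch of what I have in mind (nothing in this series implements
it yet, so all of these names are tentative):

    int iommu_map_noflush(struct iommu_domain *domain, unsigned long iova,
                          phys_addr_t paddr, size_t size, int prot)
    {
        return __iommu_map(domain, iova, paddr, size, prot);
    }

    int iommu_map(struct iommu_domain *domain, unsigned long iova,
                  phys_addr_t paddr, size_t size, int prot)
    {
        int ret = iommu_map_noflush(domain, iova, paddr, size, prot);

        if (!ret)
            iommu_flush(domain, iova, size);
        return ret;
    }

Subsystems like drivers/gpu/drm could then batch many _noflush() calls
and issue a single exported iommu_flush() at the end.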
>
> >
> > The series includes a generic patch implementing the necessary changes in
> > the IOMMU API and two Tegra-specific patches that demonstrate the
> > implementation on the driver side and can be used for further testing.
> >
> > Last, but not least, some performance numbers on Tegra210:
> > +-----------+--------------+-------------+------------+
> > | Operation | Size [bytes] | Before [us] | After [us] |
> > +-----------+--------------+-------------+------------+
> > | Map | 128K | 139 | 40 |
> > | | | 136 | 34 |
> > | | | 137 | 38 |
> > | | | 136 | 36 |
> > | | 4M | 3939 | 1163 |
> > | | | 3730 | 2389 |
> > | | | 3613 | 997 |
> > | | | 3622 | 1620 |
> > | | ~18M | 18635 | 4741 |
> > | | | 19261 | 6550 |
> > | | | 18473 | 9304 |
> > | | | 18125 | 5120 |
> > | Unmap | 128K | 128 | 7 |
> > | | | 122 | 8 |
> > | | | 119 | 10 |
> > | | | 123 | 12 |
> > | | 4M | 3829 | 151 |
> > | | | 3964 | 150 |
> > | | | 3908 | 145 |
> > | | | 3875 | 155 |
> > | | ~18M | 18570 | 683 |
> > | | | 18473 | 806 |
> > | | | 21020 | 643 |
> > | | | 21764 | 652 |
> > +-----------+--------------+-------------+------------+
> > The values were obtained by surrounding the calls to iommu_map_sg()
> > (with the default_iommu_map_sg() helper used as the .map_sg() callback) and
> > iommu_unmap() with ktime-based time measurement code. Four samples were
> > taken for every buffer size. ~18M means around 17-19M due to the variance
> > in requested buffer sizes.
>
> Those are pretty impressive numbers.
I was surprised myself that there is so much difference on this
platform, but it seems to converge with the downstream kernel. Moreover,
we can supposedly get even better results by simply invalidating the
full TLB above some length threshold.
Best regards,
Tomasz
* Re: [RFC PATCH 1/3] iommu: Add support for out of band flushing
From: Tomasz Figa @ 2015-09-29 11:56 UTC
To: Thierry Reding
Cc: open list:IOMMU DRIVERS, Joerg Roedel, Hiroshi Doyu,
Stephen Warren, Alexandre Courbot, Vince Hsu, Russell King,
Paul Walmsley, Mikko Perttunen, Tomeu Vizoso, Will Deacon,
Alex Williamson, Marek Szyprowski, Arnd Bergmann,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA, Rob Clark
On Tue, Sep 29, 2015 at 6:32 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 29, 2015 at 02:25:24PM +0900, Tomasz Figa wrote:
>> This patch adds a new "flush" callback to iommu_ops, which is supposed
>> to perform any necessary flushes within the given IOMMU domain to make
>> any changes to the mappings of the given area [iova; iova + size)
>> visible to IOMMU clients.
>>
>> The purpose is to let IOMMU drivers skip page-by-page flushes and
>> replace them with one flush of the full address range on devices which
>> support it.
>>
>> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> ---
>> drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
>> include/linux/iommu.h | 2 ++
>> 2 files changed, 32 insertions(+), 3 deletions(-)
>
> I seem to remember that Rob Clark had proposed something like this back
> before it was decided to introduce the ->map_sg() callback instead. I
> can't find a reference to the discussion, so perhaps I'm misremembering
> but adding Rob Clark in case he has any recollection of why the outcome
> was what it was.
Oops, I was supposed to add Rob, but forgot in the end. Thanks for remembering.
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> [...]
>> @@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>> unmapped += unmapped_page;
>> }
>>
>> + iommu_flush(domain, orig_iova, unmapped);
>> trace_unmap(orig_iova, size, unmapped);
>> return unmapped;
>> }
>> @@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>> if (!IS_ALIGNED(s->offset, min_pagesz))
>> goto out_err;
>>
>> - ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
>> + ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
>> if (ret)
>> goto out_err;
>>
>> mapped += s->length;
>> }
>>
>> + iommu_flush(domain, iova, mapped);
>> +
>> return mapped;
>>
>> out_err:
>> /* undo mappings already done */
>> iommu_unmap(domain, iova, mapped);
>>
>> + /* flush in case part of our mapping already got cached */
>> + iommu_flush(domain, iova, mapped);
>> +
>> return 0;
>>
>> }
>
> iommu_unmap() already does an iommu_flush(), so why flush again after
> iommu_unmap()?
Right, my mistake. This should be removed. Thanks for catching.
Best regards,
Tomasz
* Re: [RFC PATCH 2/3] memory: tegra: add TLB cache line size
From: Tomasz Figa @ 2015-09-29 12:11 UTC
To: Thierry Reding
Cc: open list:IOMMU DRIVERS, Vince Hsu, Joerg Roedel, Hiroshi Doyu,
Stephen Warren, Alexandre Courbot, Russell King, Paul Walmsley,
Mikko Perttunen, Tomeu Vizoso, Will Deacon, Alex Williamson,
Arnd Bergmann, Antonios Motakis, Nicolas Iooss, Olav Haugan,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
On Tue, Sep 29, 2015 at 6:43 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 29, 2015 at 02:25:25PM +0900, Tomasz Figa wrote:
>> From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>
>> This patch adds the SMMU TLB line size to the Tegra SoC data struct to
>> enable the SMMU driver to use this knowledge in code added by a further
>> patch.
>
> I think the line size should either be added in the same patch that adds
> the feature which uses it, or the commit message should describe what
> purpose it will be used for. As it is this commit message leaves too
> many questions unanswered.
100% agreed. Anyway, I just put this patch in this RFC quickly so that
the whole series can be applied and tested if somebody wants to do so.
I'll update the commit message in the next version.
>
>> Also add the missing number of TLB lines for Tegra124.
>>
>> Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>> [tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
>> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>
>> diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
>> index 21e7255..ff12487 100644
>> --- a/drivers/memory/tegra/tegra124.c
>> +++ b/drivers/memory/tegra/tegra124.c
>> @@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
>> .num_swgroups = ARRAY_SIZE(tegra124_swgroups),
>> .supports_round_robin_arbitration = true,
>> .supports_request_limit = true,
>> + .num_tlb_lines = 32,
>> + .tlb_line_size = 32,
>> .num_asids = 128,
>> };
>
> Oh my... try to fix one platform and break another. Fortunately it seems
> like Tegra124 copes much better without TLB because I'm not seeing any
> buffer underruns or similar on Tegra124 without this fix.
>
> Anyway, this change is completely unrelated and fixes a regression (even
> though it might not be noticeable in many use-cases), so can you please
> split it out into a separate patch and add a
>
> Fixes: 11cec15bf3fb ("iommu/tegra-smmu: Parameterize number of TLB lines")
>
> line to it? That patch went into v4.3-rc1 and it'd be nice to get this
> fix in before the final v4.3. Feel free to add my Acked-by/Reviewed-by
> as well.
Sounds good to me. We also have a patch adding locking around page
table get/put in map/unmap, which fixes a race between concurrent maps
and unmaps within the same page table. I'll send both soon.
Best regards,
Tomasz
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Joerg Roedel @ 2015-09-29 12:20 UTC
To: Tomasz Figa
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Hiroshi Doyu,
Stephen Warren, Thierry Reding, Alexandre Courbot, Vince Hsu,
Russell King, Paul Walmsley, Tomeu Vizoso, Mikko Perttunen,
Will Deacon, Alex Williamson, Arnd Bergmann, Marek Szyprowski,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
Hi Tomasz,
On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might still be necessary for some
> specific platforms, which could need to iterate over SG elements
> inside driver code. The proposed solution introduces a new .flush() callback,
> which takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> IOVA area mapping change visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at the
> end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the driver
> needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
> optional and if it isn't implemented, drivers are expected to do the
> necessary flushes on a page-by-page basis in the respective (un)mapping
> callbacks,
> 3) the possibility of exporting the iommu_flush() operation and providing
> unsynchronized map/unmap operations for subsystems with even higher
> performance requirements (e.g. drivers/gpu/drm).
Thanks for the patches, I really like the idea. The VT-d driver probably
also benefits a lot from this.
In the past I also proposed something like this: a new iommu_commit()
API function for making changes from map/unmap visible to the hardware.
But that requires updating all callers first, so your approach of doing
an implicit flush at the end of map/unmap is better.
Joerg
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Joerg Roedel @ 2015-09-29 12:22 UTC
To: Thierry Reding
Cc: Tomasz Figa, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Hiroshi Doyu, Stephen Warren, Alexandre Courbot, Vince Hsu,
Russell King, Paul Walmsley, Tomeu Vizoso, Mikko Perttunen,
Will Deacon, Alex Williamson, Arnd Bergmann, Marek Szyprowski,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
On Tue, Sep 29, 2015 at 11:27:14AM +0200, Thierry Reding wrote:
> On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> > 3) the possibility of exporting the iommu_flush() operation and providing
> > unsynchronized map/unmap operations for subsystems with even higher
> > performance requirements (e.g. drivers/gpu/drm).
>
> That would require passing in some sort of flag that the core shouldn't
> be flushing itself, right? Currently it would flush on every map/unmap.
Not necessarily. Introducing a flag would require updating all callers,
so while at it we could also just change the semantics of map/unmap to
require an explicit flush afterwards (only if it turns out that such an
iommu_flush() function is really needed).
Joerg
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Robin Murphy @ 2015-09-29 14:20 UTC
To: Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thierry Reding, Nicolas Iooss, Russell King, Vince Hsu,
Mikko Perttunen
Hi Tomasz,
On 29/09/15 06:25, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page; however, it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly, iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs the unmapping in a
> page-by-page manner.
This isn't necessarily the general case, though. If the IOMMU has a
coherent page table walk interface and its architecture prohibits
caching invalid PTEs, then the overhead on an unmap is only a TLB
invalidation and the overhead on a map is nothing.
> To make the mapping operation more suitable for such use cases, iommu_map_sg()
> and the .map_sg() callback in the iommu_ops struct were introduced, which
> allowed particular IOMMU drivers to directly iterate over SG entries, create
> the necessary mappings and flush everything in one go.
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver that wants a fast map path to implement its
> own SG iteration code, even though this is a mostly generic operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> the fact that the .map_sg() callback might still be necessary for some
> specific platforms, which could need to iterate over SG elements
> inside driver code. The proposed solution introduces a new .flush() callback,
> which takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make the given
> IOVA area mapping change visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at the
> end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the driver
> needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
> optional and if it isn't implemented, drivers are expected to do the
> necessary flushes on a page-by-page basis in the respective (un)mapping
> callbacks,
> 3) the possibility of exporting the iommu_flush() operation and providing
> unsynchronized map/unmap operations for subsystems with even higher
> performance requirements (e.g. drivers/gpu/drm).
A single callback doesn't really generalise well enough: If we wanted to
implement this in the ARM SMMU drivers to optimise the unmap() case [ask
Will how long he spends waiting for a software model to tear down an
entire VFIO domain invalidating one page at a time ;)], then we'd either
regress performance in the map() case with an unnecessary TLB flush, or
have to do a table walk in every flush() call to infer what actually
needs doing.
Personally I think it would be nicest to have two separate callbacks,
e.g. .map_sync/.unmap_sync, but at the very least some kind of
additional 'direction' kind of parameter would be necessary.
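
Roughly one of the following, that is (just a sketch of the idea, not
existing code):

    /* Two dedicated callbacks... */
    void (*map_sync)(struct iommu_domain *domain, unsigned long iova,
                     size_t size);
    void (*unmap_sync)(struct iommu_domain *domain, unsigned long iova,
                       size_t size);

    /* ...or a single one taking a direction-style argument. */
    void (*flush)(struct iommu_domain *domain, unsigned long iova,
                  size_t size, bool unmap);

so that the driver knows whether it is synchronising a map or an unmap
without having to walk the page tables to find out.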
> The series includes a generic patch implementing the necessary changes in
> the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and can be used for further testing.
>
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map | 128K | 139 | 40 |
> | | | 136 | 34 |
> | | | 137 | 38 |
> | | | 136 | 36 |
> | | 4M | 3939 | 1163 |
> | | | 3730 | 2389 |
> | | | 3613 | 997 |
> | | | 3622 | 1620 |
> | | ~18M | 18635 | 4741 |
> | | | 19261 | 6550 |
> | | | 18473 | 9304 |
> | | | 18125 | 5120 |
> | Unmap | 128K | 128 | 7 |
> | | | 122 | 8 |
> | | | 119 | 10 |
> | | | 123 | 12 |
> | | 4M | 3829 | 151 |
> | | | 3964 | 150 |
> | | | 3908 | 145 |
> | | | 3875 | 155 |
> | | ~18M | 18570 | 683 |
> | | | 18473 | 806 |
> | | | 21020 | 643 |
> | | | 21764 | 652 |
> +-----------+--------------+-------------+------------+
> The values were obtained by surrounding the calls to iommu_map_sg()
> (with the default_iommu_map_sg() helper used as the .map_sg() callback) and
> iommu_unmap() with ktime-based time measurement code. Four samples were
> taken for every buffer size. ~18M means around 17-19M due to the variance
> in requested buffer sizes.
It would be interesting to know how much of the gain here is due to
batching up the TLB maintenance vs. doing the DMA sync in fewer total
pieces (with correspondingly fewer barriers). For the drivers which use
the io-pgtable framework, trying to do anything about the latter (where
it's relevant) looks essentially impossible without rewriting the whole
thing, but the former is definitely something we should be able to
handle and benefit from.
It certainly seems like a reasonable way to get closer to the kind of
iommu_map_range()/iommu_unmap_range() operations proposed before, but
with less trampling on the external API. Plus it's nicer than the
alternative workaround of having the driver claim anything larger than
your basic page size is valid so you can batch up most of your pagetable
updates behind the API's back.
Robin.
> Tomasz Figa (2):
> iommu: Add support for out of band flushing
> iommu/tegra-smmu: Make the driver use out of band flushing
>
> Vince Hsu (1):
> memory: tegra: add TLB cache line size
>
> drivers/iommu/iommu.c | 33 +++++++++++++--
> drivers/iommu/tegra-smmu.c | 91 +++++++++++++++++++++++++++++++++++++----
> drivers/memory/tegra/tegra114.c | 1 +
> drivers/memory/tegra/tegra124.c | 3 ++
> drivers/memory/tegra/tegra210.c | 1 +
> drivers/memory/tegra/tegra30.c | 1 +
> include/linux/iommu.h | 2 +
> include/soc/tegra/mc.h | 1 +
> 8 files changed, 122 insertions(+), 11 deletions(-)
>
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Russell King - ARM Linux @ 2015-09-29 14:32 UTC
To: Robin Murphy
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Tomeu Vizoso,
Arnd Bergmann, Stephen Warren, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Thierry Reding, Vince Hsu, Nicolas Iooss,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Antonios Motakis, Mikko Perttunen
On Tue, Sep 29, 2015 at 03:20:38PM +0100, Robin Murphy wrote:
> A single callback doesn't really generalise well enough: If we wanted to
> implement this in the ARM SMMU drivers to optimise the unmap() case [ask
> Will how long he spends waiting for a software model to tear down an entire
> VFIO domain invalidating one page at a time ;)], then we'd either regress
> performance in the map() case with an unnecessary TLB flush, or have to do a
> table walk in every flush() call to infer what actually needs doing.
And this is the problem of frameworks. They get in the way of doing
things efficiently.
Fine, we have the DMA ops, and that calls a map_sg() method. What we
then need is to have a series of standardised library functions which
can be called to perform various actions.
Consider this: an IOMMU driver gets the raw scatterlist which the
driver passed. The IOMMU driver walks the scatterlist, creating the
IOMMU side mapping, and writing the device DMA addresses and DMA lengths
to the scatterlist, possibly coalescing some of the entries. It
remembers the number of scatterlist entries that the DMA operation now
requires. The IOMMU code can setup whatever mappings it wants using
whatever sizes it wants to satisfy the requested scatterlist.
It then goes on to call the arch backend with the original scatterlist,
asking it to _only_ deal with the CPU coherency for the mapping. The
arch code walks the scatterlist again, this time dealing with the CPU
coherency part.
Finally, the IOMMU code returns the number of DMA scatterlist entries.
When it comes to tearing it down, it's a similar operation to the above,
except reversing those actions.
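
As a rough sketch of the shape of that (every helper name below is made
up for illustration, nothing more):

    #include <linux/scatterlist.h>

    static int foo_map_sg(struct device *dev, struct scatterlist *sg,
                          int nents)
    {
        struct scatterlist *s;
        int i, dma_nents;

        /*
         * Pass 1: the IOMMU code walks the scatterlist, sets up whatever
         * mappings it wants, writes the DMA address/length back into the
         * entries (possibly coalescing them) and remembers the new count.
         */
        dma_nents = foo_iommu_create_mappings(dev, sg, nents);
        if (dma_nents <= 0)
            return 0;

        /*
         * Pass 2: the arch backend walks the original scatterlist and
         * deals only with CPU cache coherency for the same memory.
         */
        for_each_sg(sg, s, nents, i)
            arch_sync_sg_entry_for_device(dev, s);

        return dma_nents;
    }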
The only issue with this approach is that it opens up some of the cache
handling to the entire kernel, and that will be _too_ big a target for
idiotic driver writers to think they have permission to directly use
those interfaces. To solve this, I'd love to be able to have the linker
link together certain objects in the kernel build, and then convert some
global symbols to be local symbols, thus denying access to functions that
driver authors have no business what so ever touching.
> Personally I think it would be nicest to have two separate callbacks, e.g.
> .map_sync/.unmap_sync, but at the very least some kind of additional
> 'direction' kind of parameter would be necessary.
No, not more callbacks - that's the framework thinking, not the library
thinking.
--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
From: Robin Murphy @ 2015-09-29 16:27 UTC
To: Russell King - ARM Linux
Cc: Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thierry Reding, Nicolas Iooss, Vince Hsu, Mikko Perttunen
On 29/09/15 15:32, Russell King - ARM Linux wrote:
> On Tue, Sep 29, 2015 at 03:20:38PM +0100, Robin Murphy wrote:
>> A single callback doesn't really generalise well enough: If we wanted to
>> implement this in the ARM SMMU drivers to optimise the unmap() case [ask
>> Will how long he spends waiting for a software model to tear down an entire
>> VFIO domain invalidating one page at a time ;)], then we'd either regress
>> performance in the map() case with an unnecessary TLB flush, or have to do a
>> table walk in every flush() call to infer what actually needs doing.
>
> And this is the problem of frameworks. They get in the way of doing
> things efficiently.
>
> Fine, we have the DMA ops, and that calls a map_sg() method. What we
> then need is to have a series of standardised library functions which
> can be called to perform various actions.
> Consider this: an IOMMU driver gets the raw scatterlist which the
> driver passed. The IOMMU driver walks the scatterlist, creating the
> IOMMU side mapping, and writing the device DMA addresses and DMA lengths
> to the scatterlist, possibly coalescing some of the entries. It
> remembers the number of scatterlist entries that the DMA operation now
> requires. The IOMMU code can setup whatever mappings it wants using
... and making that elided "setup whatever mappings it wants" step more
efficient is the sole thing that this patch set is trying to address. I
apologise for not really following what you're getting at here.
> whatever sizes it wants to satisfy the requested scatterlist.
>
> It then goes on to call the arch backend with the original scatterlist,
> asking it to _only_ deal with the CPU coherency for the mapping. The
> arch code walks the scatterlist again, this time dealing with the CPU
> coherency part.
>
> Finally, the IOMMU code returns the number of DMA scatterlist entries.
>
> When it comes to tearing it down, it's a similar operation to the above,
> except reversing those actions.
>
> The only issue with this approach is that it opens up some of the cache
> handling to the entire kernel, and that will be _too_ tempting a target
> for idiotic driver writers who will think they have permission to use
> those interfaces directly. To solve this, I'd love to be able to have the
> linker link certain objects together in the kernel build, and then convert
> some global symbols into local symbols, thus denying access to functions
> that driver authors have no business whatsoever touching.
>
>> Personally I think it would be nicest to have two separate callbacks, e.g.
>> .map_sync/.unmap_sync, but at the very least some additional
>> 'direction' kind of parameter would be necessary.
>
> No, not more callbacks - that's the framework thinking, not the library
> thinking.
Eh, swings and roundabouts. An argument denoting whether the flush is
being called on the map or unmap path would be fine; it just means some
implementations will be doing an extra no-op function call half the
time. On closer inspection, the code in patch 3 _is_ using a table walk
to figure out whether the IOVA has been mapped or unmapped; it just
happens that this particular implementation needs to do that walk anyway
to sync the PTE updates, so it gets away with it.
Robin.
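Pulling Russell's description together, a minimal sketch of what that
library-style map_sg() flow could look like; arch_sync_sg_cpu_coherency()
is a hypothetical helper name, and IOVA allocation, entry coalescing and
unwind on failure are all omitted:

#include <linux/iommu.h>
#include <linux/scatterlist.h>

/* Hypothetical arch backend that deals only with CPU cache coherency. */
void arch_sync_sg_cpu_coherency(struct device *dev, struct scatterlist *sgl,
				int nents);

static int library_style_map_sg(struct iommu_domain *domain,
				struct device *dev, struct scatterlist *sgl,
				int nents, unsigned long iova, int prot)
{
	struct scatterlist *sg;
	int i, ret;

	/* Walk the raw scatterlist, create the IOMMU-side mappings, and
	 * write the device DMA address and length back into each entry. */
	for_each_sg(sgl, sg, nents, i) {
		ret = iommu_map(domain, iova, sg_phys(sg), sg->length, prot);
		if (ret)
			return ret;
		sg_dma_address(sg) = iova;
		sg_dma_len(sg) = sg->length;
		iova += sg->length;
	}

	/* Walk the same list again, handling only the CPU caches. */
	arch_sync_sg_cpu_coherency(dev, sgl, nents);

	/* Number of DMA segments the operation now requires. */
	return nents;
}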
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
[not found] ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
@ 2015-09-29 16:40 ` Russell King - ARM Linux
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2015-09-29 16:40 UTC (permalink / raw)
To: Robin Murphy
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Tomeu Vizoso,
Arnd Bergmann, Stephen Warren, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Thierry Reding, Vince Hsu, Nicolas Iooss,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Antonios Motakis, Mikko Perttunen
On Tue, Sep 29, 2015 at 05:27:12PM +0100, Robin Murphy wrote:
> Eh, swings and roundabouts. An argument denoting whether the flush is being
> called on the map or unmap path would be fine,
Sorry, that statement is wrong. It's not about whether you flush before
or after the DMA operation. I'm afraid I'm probably going to tell you
how to suck eggs here, because I don't think you quite "get it" with
non-dma-coherent modern CPUs.
Modern CPUs prefetch data into their caches, and they also randomly write
back data from their caches to memory. When performing a DMA operation
from device to memory, you need to do two things with CPU caches which
aren't coherent:
1. Before starting the DMA operation, you need to walk over the memory to
be mapped, ensuring that any dirty cache lines are written back. This
is to prevent dirty cache lines overwriting data that has already been
DMA'd from the device.
2. After the DMA operation has completed, you need to walk over the
memory again, invalidating any cache lines which may have been
speculatively loaded from that memory while DMA was running. These
cache lines may have been loaded prior to the DMA operation placing
the new data into memory.
So it's not a before-or-after choice: you always have to perform
write-back cache maintenance prior to any DMA operation, and then
invalidation cache maintenance after the DMA operation has completed,
for any mapping which the DMA may have written to (which means
device-to-memory and bidirectional mappings).
--
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
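In driver terms, the streaming DMA API packages up exactly this pairing
for a device-to-memory buffer; a minimal sketch (the function itself is
made up, error paths trimmed):

#include <linux/dma-mapping.h>

static void rx_buffer_example(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Step 1: pre-DMA maintenance - any dirty lines covering buf
	 * are dealt with before the device starts writing. */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return;

	/* ... program the device to DMA into buf and wait for it ... */

	/* Step 2: post-DMA maintenance - lines speculatively loaded
	 * while the DMA was in flight are invalidated. */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);
}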
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
@ 2015-09-29 17:13 ` Robin Murphy
0 siblings, 0 replies; 17+ messages in thread
From: Robin Murphy @ 2015-09-29 17:13 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thierry Reding, Nicolas Iooss, Vince Hsu, Mikko Perttunen
On 29/09/15 17:40, Russell King - ARM Linux wrote:
> On Tue, Sep 29, 2015 at 05:27:12PM +0100, Robin Murphy wrote:
>> Eh, swings and roundabouts. An argument denoting whether the flush is being
>> called on the map or unmap path would be fine,
>
> Sorry, that statement is wrong. It's not about whether you flush before
> or after the DMA operation. I'm afraid I'm probably going to tell you
> how to suck eggs here, because I don't think you quite "get it" with
> non-dma-coherent modern CPUs.
>
> Modern CPUs prefetch data into their caches, and they also randomly write
> back data from their caches to memory. When performing a DMA operation
> from device to memory, you need to do two things with CPU caches which
> aren't coherent:
>
> 1. Before starting the DMA operation, you need to walk over the memory to
> be mapped, ensuring that any dirty cache lines are written back. This
> is to prevent dirty cache lines overwriting data that has already been
> DMA'd from the device.
>
> 2. After the DMA operation has completed, you need to walk over the
> memory again, invalidating any cache lines which may have been
> speculatively loaded from that memory while DMA was running. These
> cache lines may have been loaded prior to the DMA operation placing
> the new data into memory.
>
> So it's not a before-or-after choice: you always have to perform
> write-back cache maintenance prior to any DMA operation, and then
> invalidation cache maintenance after the DMA operation has completed,
> for any mapping which the DMA may have written to (which means
> device-to-memory and bidirectional mappings).
Yup, I'm well aware of all that; in fact, you and I have already agreed
elsewhere that we can only really get away with using the streaming DMA
API to flush IOMMU page table updates _because_ the tables are never
written back to: data only ever goes from CPU to IOMMU, so we can skip
the problem of where to put an invalidation. You wrote the tegra-smmu
code that does this. The coherency of whatever device made the DMA API
call for which the IOMMU API is creating or removing a mapping is
irrelevant at this point - this is the DMA operation within the DMA
operation.
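A minimal sketch of the page-table flush pattern just described: push
the PTE updates out with a clean-only sync, with no invalidate needed
since the IOMMU never writes the tables (the u32 PTE size and the
pt_dma/first/count naming are assumptions for the example):

#include <linux/dma-mapping.h>

static void sync_pte_writes(struct device *smmu_dev, dma_addr_t pt_dma,
			    unsigned int first, unsigned int count)
{
	/* Data only ever flows CPU->IOMMU, so DMA_TO_DEVICE suffices. */
	dma_sync_single_for_device(smmu_dev, pt_dma + first * sizeof(u32),
				   count * sizeof(u32), DMA_TO_DEVICE);
}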
None of which has anything to do with the point I raised, which is that
if iommu_unmap() calls iommu_flush(), I want to issue TLB invalidations,
but if iommu_map() calls iommu_flush(), I don't.
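For illustration, a minimal sketch of a flush hook that distinguishes
the two paths; the enum, the callback shape and the helpers are all
hypothetical, not the interface from the posted series:

#include <linux/iommu.h>

enum iommu_flush_op {
	IOMMU_FLUSH_MAP,	/* PTEs were just created */
	IOMMU_FLUSH_UNMAP,	/* PTEs were just torn down */
};

/* Hypothetical driver-internal helpers. */
void example_sync_ptes(struct iommu_domain *domain, unsigned long iova,
		       size_t size);
void example_invalidate_iotlb(struct iommu_domain *domain,
			      unsigned long iova, size_t size);

static void example_flush(struct iommu_domain *domain, unsigned long iova,
			  size_t size, enum iommu_flush_op op)
{
	/* Both paths must make the PTE updates visible to the IOMMU. */
	example_sync_ptes(domain, iova, size);

	/* Only the unmap path pays for IOTLB invalidation. */
	if (op == IOMMU_FLUSH_UNMAP)
		example_invalidate_iotlb(domain, iova, size);
}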
^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread [~2015-09-29 17:13 UTC | newest]
Thread overview: 17+ messages
2015-09-29 5:25 [RFC PATCH 0/3] iommu: Add range flush operation Tomasz Figa
[not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 5:25 ` [RFC PATCH 1/3] iommu: Add support for out of band flushing Tomasz Figa
[not found] ` <1443504379-31841-2-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 9:32 ` Thierry Reding
[not found] ` <20150929093257.GE9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:56 ` Tomasz Figa
2015-09-29 5:25 ` [RFC PATCH 2/3] memory: tegra: add TLB cache line size Tomasz Figa
[not found] ` <1443504379-31841-3-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 9:43 ` Thierry Reding
[not found] ` <20150929094348.GF9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 12:11 ` Tomasz Figa
2015-09-29 5:25 ` [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing Tomasz Figa
2015-09-29 9:27 ` [RFC PATCH 0/3] iommu: Add range flush operation Thierry Reding
[not found] ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:54 ` Tomasz Figa
2015-09-29 12:22 ` Joerg Roedel
2015-09-29 12:20 ` Joerg Roedel
2015-09-29 14:20 ` Robin Murphy
[not found] ` <560A9E36.9030903-5wv7dgnIgG8@public.gmane.org>
2015-09-29 14:32 ` Russell King - ARM Linux
[not found] ` <20150929143241.GI21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 16:27 ` Robin Murphy
[not found] ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
2015-09-29 16:40 ` Russell King - ARM Linux
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 17:13 ` Robin Murphy