* [RFC PATCH 0/3] iommu: Add range flush operation
@ 2015-09-29 5:25 Tomasz Figa
[not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29 5:25 UTC (permalink / raw)
To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
Cc: Tomasz Figa, Joerg Roedel, Hiroshi Doyu, Stephen Warren,
Thierry Reding, Alexandre Courbot, Vince Hsu, Russell King,
Paul Walmsley, Tomeu Vizoso, Mikko Perttunen, Will Deacon,
Alex Williamson, Arnd Bergmann, Marek Szyprowski,
Antonios Motakis, Olav Haugan, Nicolas Iooss,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
linux-tegra-u79uwXL29TY76Z2rM5mHXA
Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
page by page, but it involves flushing the caches (CPU and IOMMU) for
every mapped page separately, which is unsuitable for use cases that
require low mapping latency. Similarly, iommu_unmap(), even though it
takes a full IOVA range as its argument, performs the unmapping in a
page-by-page manner.
To make the mapping operation more suitable for such use cases,
iommu_map_sg() and the .map_sg() callback in struct iommu_ops were
introduced, which allow particular IOMMU drivers to directly iterate
over SG entries, create the necessary mappings and flush everything in
one go.
This approach, however, has two drawbacks:
1) it does not do anything about unmap performance,
2) it requires each driver willing to have a fast map to implement its
   own SG iteration code, even though this is a mostly generic operation.
This series tries to mitigate the two issues above, while acknowledging
that the .map_sg() callback might still be necessary for some specific
platforms, which could need to iterate over SG elements inside driver
code. The proposed solution introduces a new .flush() callback, which
takes an IOVA range as its argument and is expected to flush all
respective caches (be it CPU, IOMMU TLB or whatever) to make any mapping
change in the given IOVA area visible to IOMMU clients. All 3 basic
map/unmap operations are then modified to call the .flush() callback at
the end of the operation.
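
In core code this boils down to the pattern below - a simplified sketch
extracted from patch 1, where __iommu_map() is the unflushed worker that
the patch splits out of iommu_map():

static void iommu_flush(struct iommu_domain *domain, unsigned long iova,
                        size_t size)
{
        /* The callback is optional; without it nothing changes. */
        if (domain->ops->flush)
                domain->ops->flush(domain, iova, size);
}

int iommu_map(struct iommu_domain *domain, unsigned long iova,
              phys_addr_t paddr, size_t size, int prot)
{
        /* Create the mapping without any flushing... */
        int ret = __iommu_map(domain, iova, paddr, size, prot);

        if (ret)
                return ret;

        /* ...then do one flush for the whole range instead of per page. */
        iommu_flush(domain, iova, size);
        return 0;
}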
Advantages of the proposed approach include:
1) the ability to use the default_iommu_map_sg() helper if all the
   driver needs for performance optimization is batching the flush,
2) no effect at all on existing code - the .flush() callback is made
   optional, and if it isn't implemented, drivers are expected to do the
   necessary flushes on a page-by-page basis in the respective
   (un)mapping callbacks,
3) the possibility of exporting the iommu_flush() operation and
   providing unsynchronized map/unmap operations for subsystems with
   even higher performance requirements (e.g. drivers/gpu/drm); see the
   usage sketch below.
The series includes a generic patch implementing the necessary changes
in the IOMMU API and two Tegra-specific patches that demonstrate the
implementation on the driver side and can be used for further testing.
Last, but not least, some performance numbers on Tegra210:
+-----------+--------------+-------------+------------+
| Operation | Size [bytes] | Before [us] | After [us] |
+-----------+--------------+-------------+------------+
| Map       | 128K         |         139 |         40 |
|           |              |         136 |         34 |
|           |              |         137 |         38 |
|           |              |         136 |         36 |
|           | 4M           |        3939 |       1163 |
|           |              |        3730 |       2389 |
|           |              |        3613 |        997 |
|           |              |        3622 |       1620 |
|           | ~18M         |       18635 |       4741 |
|           |              |       19261 |       6550 |
|           |              |       18473 |       9304 |
|           |              |       18125 |       5120 |
| Unmap     | 128K         |         128 |          7 |
|           |              |         122 |          8 |
|           |              |         119 |         10 |
|           |              |         123 |         12 |
|           | 4M           |        3829 |        151 |
|           |              |        3964 |        150 |
|           |              |        3908 |        145 |
|           |              |        3875 |        155 |
|           | ~18M         |       18570 |        683 |
|           |              |       18473 |        806 |
|           |              |       21020 |        643 |
|           |              |       21764 |        652 |
+-----------+--------------+-------------+------------+
The values were obtained by surrounding the calls to iommu_map_sg()
(with the default_iommu_map_sg() helper used as the .map_sg() callback)
and iommu_unmap() with ktime-based time measurement code, taking 4
samples for every buffer size. ~18M means around 17-19M due to the
variance in requested buffer sizes.
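
The measurement code itself is not part of the series; a minimal sketch
of the pattern described above (the timed_map_sg() helper and the
message format are illustrative only) would be:

#include <linux/iommu.h>
#include <linux/ktime.h>
#include <linux/scatterlist.h>

static size_t timed_map_sg(struct iommu_domain *domain, unsigned long iova,
                           struct scatterlist *sg, unsigned int nents)
{
        ktime_t start = ktime_get();
        size_t mapped;

        mapped = iommu_map_sg(domain, iova, sg, nents,
                              IOMMU_READ | IOMMU_WRITE);
        /* ktime_us_delta() returns the elapsed time in microseconds. */
        pr_info("iommu_map_sg: %zu bytes in %lld us\n", mapped,
                ktime_us_delta(ktime_get(), start));

        return mapped;
}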
Tomasz Figa (2):
  iommu: Add support for out of band flushing
  iommu/tegra-smmu: Make the driver use out of band flushing

Vince Hsu (1):
  memory: tegra: add TLB cache line size

 drivers/iommu/iommu.c           | 33 +++++++++++++--
 drivers/iommu/tegra-smmu.c      | 91 +++++++++++++++++++++++++++++++++++++----
 drivers/memory/tegra/tegra114.c |  1 +
 drivers/memory/tegra/tegra124.c |  3 ++
 drivers/memory/tegra/tegra210.c |  1 +
 drivers/memory/tegra/tegra30.c  |  1 +
 include/linux/iommu.h           |  2 +
 include/soc/tegra/mc.h          |  1 +
 8 files changed, 122 insertions(+), 11 deletions(-)
--
2.6.0.rc2.230.g3dd15c0
* [RFC PATCH 1/3] iommu: Add support for out of band flushing
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
@ 2015-09-29  5:25   ` Tomasz Figa
  [not found]     ` <1443504379-31841-2-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  2015-09-29  5:25   ` [RFC PATCH 2/3] memory: tegra: add TLB cache line size Tomasz Figa
  ` (4 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29  5:25 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
      Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA, Tomasz Figa,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA, Thierry Reding, Nicolas Iooss,
      Russell King, Vince Hsu, Mikko Perttunen

This patch adds a new "flush" callback to iommu_ops, which is supposed
to perform any necessary flushes within the given IOMMU domain to make
any changes to mappings of the given area [iova; iova + size) visible
to IOMMU clients.

The purpose is to let IOMMU drivers skip page-by-page flushes and
replace them with one flush of the full address range on devices which
support it.

Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
 drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
 include/linux/iommu.h |  2 ++
 2 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 049df49..d6bb8fe 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1286,8 +1286,15 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 	return pgsize;
 }
 
-int iommu_map(struct iommu_domain *domain, unsigned long iova,
-	      phys_addr_t paddr, size_t size, int prot)
+static void iommu_flush(struct iommu_domain *domain, unsigned long iova,
+			size_t size)
+{
+	if (domain->ops->flush)
+		domain->ops->flush(domain, iova, size);
+}
+
+static int __iommu_map(struct iommu_domain *domain, unsigned long iova,
+		       phys_addr_t paddr, size_t size, int prot)
 {
 	unsigned long orig_iova = iova;
 	unsigned int min_pagesz;
@@ -1340,6 +1347,20 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 
 	return ret;
 }
+
+int iommu_map(struct iommu_domain *domain, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	int ret;
+
+	ret = __iommu_map(domain, iova, paddr, size, prot);
+	if (ret)
+		return ret;
+
+	iommu_flush(domain, iova, size);
+
+	return 0;
+}
 EXPORT_SYMBOL_GPL(iommu_map);
 
 size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
@@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 		unmapped += unmapped_page;
 	}
 
+	iommu_flush(domain, orig_iova, unmapped);
 	trace_unmap(orig_iova, size, unmapped);
 	return unmapped;
 }
@@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
 		if (!IS_ALIGNED(s->offset, min_pagesz))
 			goto out_err;
 
-		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
+		ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
 		if (ret)
 			goto out_err;
 
 		mapped += s->length;
 	}
 
+	iommu_flush(domain, iova, mapped);
+
 	return mapped;
 
 out_err:
 	/* undo mappings already done */
 	iommu_unmap(domain, iova, mapped);
 
+	/* flush in case part of our mapping already got cached */
+	iommu_flush(domain, iova, mapped);
+
 	return 0;
 
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f9c1b6d..18e59a9 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -164,6 +164,8 @@ struct iommu_ops {
 			 size_t size);
 	size_t (*map_sg)(struct iommu_domain *domain, unsigned long iova,
 			 struct scatterlist *sg, unsigned int nents, int prot);
+	void (*flush)(struct iommu_domain *domain, unsigned long iova,
+		      size_t size);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
 	int (*add_device)(struct device *dev);
 	void (*remove_device)(struct device *dev);
-- 
2.6.0.rc2.230.g3dd15c0
* Re: [RFC PATCH 1/3] iommu: Add support for out of band flushing
  [not found]     ` <1443504379-31841-2-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
@ 2015-09-29  9:32       ` Thierry Reding
  [not found]         ` <20150929093257.GE9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Thierry Reding @ 2015-09-29  9:32 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
      Tomeu Vizoso, Stephen Warren, Antonios Motakis,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
      iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
      Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu

On Tue, Sep 29, 2015 at 02:25:24PM +0900, Tomasz Figa wrote:
> This patch adds a new "flush" callback to iommu_ops, which is supposed
> to perform any necessary flushes within the given IOMMU domain to make
> any changes to mappings of the given area [iova; iova + size) visible
> to IOMMU clients.
>
> The purpose is to let IOMMU drivers skip page-by-page flushes and
> replace them with one flush of the full address range on devices which
> support it.
>
> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
> ---
>  drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
>  include/linux/iommu.h |  2 ++
>  2 files changed, 32 insertions(+), 3 deletions(-)

I seem to remember that Rob Clark had proposed something like this back
before it was decided to introduce the ->map_sg() callback instead. I
can't find a reference to the discussion, so perhaps I'm misremembering,
but I'm adding Rob Clark in case he has any recollection of why the
outcome was what it was.

> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
[...]
> @@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>  		unmapped += unmapped_page;
>  	}
>  
> +	iommu_flush(domain, orig_iova, unmapped);
>  	trace_unmap(orig_iova, size, unmapped);
>  	return unmapped;
>  }
> @@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>  		if (!IS_ALIGNED(s->offset, min_pagesz))
>  			goto out_err;
>  
> -		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
> +		ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
>  		if (ret)
>  			goto out_err;
>  
>  		mapped += s->length;
>  	}
>  
> +	iommu_flush(domain, iova, mapped);
> +
>  	return mapped;
>  
>  out_err:
>  	/* undo mappings already done */
>  	iommu_unmap(domain, iova, mapped);
>  
> +	/* flush in case part of our mapping already got cached */
> +	iommu_flush(domain, iova, mapped);
> +
>  	return 0;
>  
>  }

iommu_unmap() already does an iommu_flush(), so why flush again after
iommu_unmap()?

Thierry
* Re: [RFC PATCH 1/3] iommu: Add support for out of band flushing
  [not found]         ` <20150929093257.GE9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
@ 2015-09-29 11:56           ` Tomasz Figa
  0 siblings, 0 replies; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29 11:56 UTC (permalink / raw)
  To: Thierry Reding
  Cc: open list:IOMMU DRIVERS, Joerg Roedel, Hiroshi Doyu,
      Stephen Warren, Alexandre Courbot, Vince Hsu, Russell King,
      Paul Walmsley, Mikko Perttunen, Tomeu Vizoso, Will Deacon,
      Alex Williamson, Marek Szyprowski, Arnd Bergmann, Antonios Motakis,
      Olav Haugan, Nicolas Iooss,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA, Rob Clark

On Tue, Sep 29, 2015 at 6:32 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 29, 2015 at 02:25:24PM +0900, Tomasz Figa wrote:
>> This patch adds a new "flush" callback to iommu_ops, which is supposed
>> to perform any necessary flushes within the given IOMMU domain to make
>> any changes to mappings of the given area [iova; iova + size) visible
>> to IOMMU clients.
>>
>> The purpose is to let IOMMU drivers skip page-by-page flushes and
>> replace them with one flush of the full address range on devices which
>> support it.
>>
>> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>> ---
>>  drivers/iommu/iommu.c | 33 ++++++++++++++++++++++++++++++---
>>  include/linux/iommu.h |  2 ++
>>  2 files changed, 32 insertions(+), 3 deletions(-)
>
> I seem to remember that Rob Clark had proposed something like this back
> before it was decided to introduce the ->map_sg() callback instead. I
> can't find a reference to the discussion, so perhaps I'm misremembering,
> but I'm adding Rob Clark in case he has any recollection of why the
> outcome was what it was.

Oops, I was supposed to add Rob, but forgot in the end. Thanks for
remembering.

>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> [...]
>> @@ -1389,6 +1410,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
>>  		unmapped += unmapped_page;
>>  	}
>>  
>> +	iommu_flush(domain, orig_iova, unmapped);
>>  	trace_unmap(orig_iova, size, unmapped);
>>  	return unmapped;
>>  }
>> @@ -1419,19 +1441,24 @@ size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long iova,
>>  		if (!IS_ALIGNED(s->offset, min_pagesz))
>>  			goto out_err;
>>  
>> -		ret = iommu_map(domain, iova + mapped, phys, s->length, prot);
>> +		ret = __iommu_map(domain, iova + mapped, phys, s->length, prot);
>>  		if (ret)
>>  			goto out_err;
>>  
>>  		mapped += s->length;
>>  	}
>>  
>> +	iommu_flush(domain, iova, mapped);
>> +
>>  	return mapped;
>>  
>>  out_err:
>>  	/* undo mappings already done */
>>  	iommu_unmap(domain, iova, mapped);
>>  
>> +	/* flush in case part of our mapping already got cached */
>> +	iommu_flush(domain, iova, mapped);
>> +
>>  	return 0;
>>  
>>  }
>
> iommu_unmap() already does an iommu_flush(), so why flush again after
> iommu_unmap()?

Right, my mistake. This should be removed. Thanks for catching it.

Best regards,
Tomasz
* [RFC PATCH 2/3] memory: tegra: add TLB cache line size
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  2015-09-29  5:25   ` [RFC PATCH 1/3] iommu: Add support for out of band flushing Tomasz Figa
@ 2015-09-29  5:25   ` Tomasz Figa
  [not found]     ` <1443504379-31841-3-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  2015-09-29  5:25   ` [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing Tomasz Figa
  ` (3 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29  5:25 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Vince Hsu, Tomasz Figa, Joerg Roedel, Hiroshi Doyu, Stephen Warren,
      Thierry Reding, Alexandre Courbot, Russell King, Paul Walmsley,
      Mikko Perttunen, Tomeu Vizoso, Will Deacon, Alex Williamson,
      Arnd Bergmann, Antonios Motakis, Nicolas Iooss, Olav Haugan,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

This patch adds the SMMU TLB line size to the Tegra SoC data struct so
that the SMMU driver can use this knowledge in code added by a further
patch.

Also add the missing number of TLB lines for Tegra124.

Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
[tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
---
 drivers/memory/tegra/tegra114.c | 1 +
 drivers/memory/tegra/tegra124.c | 3 +++
 drivers/memory/tegra/tegra210.c | 1 +
 drivers/memory/tegra/tegra30.c  | 1 +
 include/soc/tegra/mc.h          | 1 +
 5 files changed, 7 insertions(+)

diff --git a/drivers/memory/tegra/tegra114.c b/drivers/memory/tegra/tegra114.c
index ba8fff3..173d2dd 100644
--- a/drivers/memory/tegra/tegra114.c
+++ b/drivers/memory/tegra/tegra114.c
@@ -920,6 +920,7 @@ static const struct tegra_smmu_soc tegra114_smmu_soc = {
 	.supports_round_robin_arbitration = false,
 	.supports_request_limit = false,
 	.num_tlb_lines = 32,
+	.tlb_line_size = 16,
 	.num_asids = 4,
 };
 
diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
index 21e7255..ff12487 100644
--- a/drivers/memory/tegra/tegra124.c
+++ b/drivers/memory/tegra/tegra124.c
@@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
 	.num_swgroups = ARRAY_SIZE(tegra124_swgroups),
 	.supports_round_robin_arbitration = true,
 	.supports_request_limit = true,
+	.num_tlb_lines = 32,
+	.tlb_line_size = 32,
 	.num_asids = 128,
 };
 
@@ -1031,6 +1033,7 @@ static const struct tegra_smmu_soc tegra132_smmu_soc = {
 	.supports_round_robin_arbitration = true,
 	.supports_request_limit = true,
 	.num_tlb_lines = 32,
+	.tlb_line_size = 32,
 	.num_asids = 128,
 };
 
diff --git a/drivers/memory/tegra/tegra210.c b/drivers/memory/tegra/tegra210.c
index 5e144ab..bf6a9c1 100644
--- a/drivers/memory/tegra/tegra210.c
+++ b/drivers/memory/tegra/tegra210.c
@@ -1067,6 +1067,7 @@ static const struct tegra_smmu_soc tegra210_smmu_soc = {
 	.supports_round_robin_arbitration = true,
 	.supports_request_limit = true,
 	.num_tlb_lines = 32,
+	.tlb_line_size = 32,
 	.num_asids = 128,
 };
 
diff --git a/drivers/memory/tegra/tegra30.c b/drivers/memory/tegra/tegra30.c
index b447378..08951db 100644
--- a/drivers/memory/tegra/tegra30.c
+++ b/drivers/memory/tegra/tegra30.c
@@ -942,6 +942,7 @@ static const struct tegra_smmu_soc tegra30_smmu_soc = {
 	.supports_round_robin_arbitration = false,
 	.supports_request_limit = false,
 	.num_tlb_lines = 16,
+	.tlb_line_size = 16,
 	.num_asids = 4,
 };
 
diff --git a/include/soc/tegra/mc.h b/include/soc/tegra/mc.h
index 44202ff..a6dbb35 100644
--- a/include/soc/tegra/mc.h
+++ b/include/soc/tegra/mc.h
@@ -62,6 +62,7 @@ struct tegra_smmu_soc {
 	bool supports_request_limit;
 
 	unsigned int num_tlb_lines;
+	unsigned int tlb_line_size;
 	unsigned int num_asids;
 };
-- 
2.6.0.rc2.230.g3dd15c0
* Re: [RFC PATCH 2/3] memory: tegra: add TLB cache line size
  [not found]     ` <1443504379-31841-3-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
@ 2015-09-29  9:43       ` Thierry Reding
  [not found]         ` <20150929094348.GF9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Thierry Reding @ 2015-09-29  9:43 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
      Tomeu Vizoso, Stephen Warren, Antonios Motakis,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
      iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
      Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu

On Tue, Sep 29, 2015 at 02:25:25PM +0900, Tomasz Figa wrote:
> From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>
> This patch adds the SMMU TLB line size to the Tegra SoC data struct so
> that the SMMU driver can use this knowledge in code added by a further
> patch.

I think the line size should either be added in the same patch that adds
the feature which uses it, or the commit message should describe what
purpose it will be used for. As it is, this commit message leaves too
many questions unanswered.

> Also add the missing number of TLB lines for Tegra124.
>
> Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> [tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>

> diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
> index 21e7255..ff12487 100644
> --- a/drivers/memory/tegra/tegra124.c
> +++ b/drivers/memory/tegra/tegra124.c
> @@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
>  	.num_swgroups = ARRAY_SIZE(tegra124_swgroups),
>  	.supports_round_robin_arbitration = true,
>  	.supports_request_limit = true,
> +	.num_tlb_lines = 32,
> +	.tlb_line_size = 32,
>  	.num_asids = 128,
>  };

Oh my... try to fix one platform and break another. Fortunately it seems
like Tegra124 copes much better without the TLB, because I'm not seeing
any buffer underruns or similar on Tegra124 without this fix.

Anyway, this change is completely unrelated and fixes a regression (even
though it might not be noticeable in many use-cases), so can you please
split it out into a separate patch and add a

	Fixes: 11cec15bf3fb ("iommu/tegra-smmu: Parameterize number of TLB lines")

line to it? That patch went into v4.3-rc1 and it'd be nice to get this
fix in before the final v4.3. Feel free to add my Acked-by/Reviewed-by
as well.

Thierry
* Re: [RFC PATCH 2/3] memory: tegra: add TLB cache line size
  [not found]         ` <20150929094348.GF9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
@ 2015-09-29 12:11           ` Tomasz Figa
  0 siblings, 0 replies; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29 12:11 UTC (permalink / raw)
  To: Thierry Reding
  Cc: open list:IOMMU DRIVERS, Vince Hsu, Joerg Roedel, Hiroshi Doyu,
      Stephen Warren, Alexandre Courbot, Russell King, Paul Walmsley,
      Mikko Perttunen, Tomeu Vizoso, Will Deacon, Alex Williamson,
      Arnd Bergmann, Antonios Motakis, Nicolas Iooss, Olav Haugan,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Tue, Sep 29, 2015 at 6:43 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Tue, Sep 29, 2015 at 02:25:25PM +0900, Tomasz Figa wrote:
>> From: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>>
>> This patch adds the SMMU TLB line size to the Tegra SoC data struct so
>> that the SMMU driver can use this knowledge in code added by a further
>> patch.
>
> I think the line size should either be added in the same patch that adds
> the feature which uses it, or the commit message should describe what
> purpose it will be used for. As it is, this commit message leaves too
> many questions unanswered.

100% agreed. Anyway, I just put this patch in this RFC quickly so that
the whole series can be applied and tested if somebody wants to do so.
I'll update the commit message in the next version.

>> Also add the missing number of TLB lines for Tegra124.
>>
>> Signed-off-by: Vince Hsu <vince.h-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>> [tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org: Rebased, revised commit message.]
>> Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
>
>> diff --git a/drivers/memory/tegra/tegra124.c b/drivers/memory/tegra/tegra124.c
>> index 21e7255..ff12487 100644
>> --- a/drivers/memory/tegra/tegra124.c
>> +++ b/drivers/memory/tegra/tegra124.c
>> @@ -1007,6 +1007,8 @@ static const struct tegra_smmu_soc tegra124_smmu_soc = {
>>  	.num_swgroups = ARRAY_SIZE(tegra124_swgroups),
>>  	.supports_round_robin_arbitration = true,
>>  	.supports_request_limit = true,
>> +	.num_tlb_lines = 32,
>> +	.tlb_line_size = 32,
>>  	.num_asids = 128,
>>  };
>
> Oh my... try to fix one platform and break another. Fortunately it seems
> like Tegra124 copes much better without the TLB, because I'm not seeing
> any buffer underruns or similar on Tegra124 without this fix.
>
> Anyway, this change is completely unrelated and fixes a regression (even
> though it might not be noticeable in many use-cases), so can you please
> split it out into a separate patch and add a
>
> 	Fixes: 11cec15bf3fb ("iommu/tegra-smmu: Parameterize number of TLB lines")
>
> line to it? That patch went into v4.3-rc1 and it'd be nice to get this
> fix in before the final v4.3. Feel free to add my Acked-by/Reviewed-by
> as well.

Sounds good to me. We also have a patch adding locking around page table
get/put in map/unmap, which fixes a race between concurrent maps and
unmaps within the same page table. I'll send both soon.

Best regards,
Tomasz
* [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  2015-09-29  5:25   ` [RFC PATCH 1/3] iommu: Add support for out of band flushing Tomasz Figa
  2015-09-29  5:25   ` [RFC PATCH 2/3] memory: tegra: add TLB cache line size Tomasz Figa
@ 2015-09-29  5:25   ` Tomasz Figa
  2015-09-29  9:27   ` [RFC PATCH 0/3] iommu: Add range flush operation Thierry Reding
  ` (2 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29  5:25 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Tomasz Figa, Vince Hsu, Joerg Roedel, Hiroshi Doyu, Stephen Warren,
      Thierry Reding, Alexandre Courbot, Paul Walmsley, Russell King,
      Mikko Perttunen, Tomeu Vizoso, Will Deacon, Alex Williamson,
      Marek Szyprowski, Arnd Bergmann, Antonios Motakis, Nicolas Iooss,
      Olav Haugan, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

This patch modifies the tegra-smmu driver to perform PTC and TLB flushes
inside the iommu_ops .flush() callback instead of in the map and unmap
operations, so that large maps and unmaps are heavily optimized thanks
to the elimination of page-by-page flushing.

Signed-off-by: Tomasz Figa <tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
Signed-off-by: Vince Hsu <vinceh-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 drivers/iommu/tegra-smmu.c | 91 ++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 83 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 9305964..92b46d2 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -614,18 +614,54 @@ static void tegra_smmu_pte_put_use(struct tegra_smmu_as *as, unsigned long iova)
 	}
 }
 
+static void tegra_smmu_pte_put_use_range(struct tegra_smmu_as *as,
+		unsigned long iova, unsigned int len)
+{
+	unsigned int i;
+
+	for (i = 0; i < len; i++)
+		tegra_smmu_pte_put_use(as, iova + i * PAGE_SIZE);
+}
+
 static void tegra_smmu_set_pte(struct tegra_smmu_as *as, unsigned long iova,
 			       u32 *pte, dma_addr_t pte_dma, u32 val)
 {
-	struct tegra_smmu *smmu = as->smmu;
-	unsigned long offset = offset_in_page(pte);
-
 	*pte = val;
+}
+
+static void tegra_smmu_flush_pte_range(struct tegra_smmu_as *as,
+		unsigned long iova, unsigned int num_ptes, u32 *pte,
+		dma_addr_t pt_dma)
+{
+	struct tegra_smmu *smmu = as->smmu;
+	unsigned int tlb_lines_per_atom;
+	unsigned int ptes_per_tlb_line;
+	unsigned int ptes_per_atom;
+	unsigned long offset;
+	unsigned long iova_end;
+	int i;
+
+	ptes_per_atom = smmu->mc->soc->atom_size / sizeof(*pte);
+	ptes_per_tlb_line = smmu->soc->tlb_line_size / sizeof(*pte);
+	tlb_lines_per_atom = smmu->mc->soc->atom_size
+			   / smmu->soc->tlb_line_size;
+
+	offset = round_down(offset_in_page(pte), smmu->mc->soc->atom_size);
+
+	iova_end = iova + num_ptes * PAGE_SIZE;
+	iova = round_down(iova, ptes_per_atom * PAGE_SIZE);
+	iova_end = round_up(iova_end, ptes_per_atom * PAGE_SIZE);
+	num_ptes = (iova_end - iova) / PAGE_SIZE;
+
+	while (num_ptes) {
+		smmu_flush_ptc(smmu, pt_dma, offset);
+		for (i = 0; i < tlb_lines_per_atom; i++) {
+			smmu_flush_tlb_group(smmu, as->id, iova);
+			iova += ptes_per_tlb_line * PAGE_SIZE;
+		}
+		offset += smmu->mc->soc->atom_size;
+		num_ptes -= ptes_per_atom;
+	}
 
-	dma_sync_single_range_for_device(smmu->dev, pte_dma, offset,
-					 4, DMA_TO_DEVICE);
-	smmu_flush_ptc(smmu, pte_dma, offset);
-	smmu_flush_tlb_group(smmu, as->id, iova);
 	smmu_flush(smmu);
 }
 
@@ -662,11 +698,49 @@ static size_t tegra_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 		return 0;
 
 	tegra_smmu_set_pte(as, iova, pte, pte_dma, 0);
-	tegra_smmu_pte_put_use(as, iova);
 
 	return size;
 }
 
+static void tegra_smmu_flush(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	struct tegra_smmu_as *as = to_smmu_as(domain);
+	struct tegra_smmu *smmu = as->smmu;
+	u32 num = size >> PAGE_SHIFT;
+
+	might_sleep();
+
+	while (num) {
+		unsigned int pt_index = iova_pt_index(iova);
+		unsigned int len, end;
+		unsigned long offset;
+		dma_addr_t pte_dma;
+		u32 *pte;
+
+		end = pt_index + num;
+		if (end > SMMU_NUM_PTE)
+			end = SMMU_NUM_PTE;
+		len = end - pt_index;
+
+		pte = tegra_smmu_pte_lookup(as, iova, &pte_dma);
+		if (!pte)
+			goto next_pde;
+
+		offset = offset_in_page(pte);
+		dma_sync_single_range_for_device(smmu->dev, pte_dma, offset,
+						 sizeof(*pte) * len,
+						 DMA_TO_DEVICE);
+
+		tegra_smmu_flush_pte_range(as, iova, len, pte, pte_dma);
+		if (*pte == 0)
+			tegra_smmu_pte_put_use_range(as, iova, len);
+
+next_pde:
+		num -= len;
+		iova += len << PAGE_SHIFT;
+	}
+}
+
 static phys_addr_t tegra_smmu_iova_to_phys(struct iommu_domain *domain,
 					   dma_addr_t iova)
 {
@@ -743,6 +817,7 @@ static const struct iommu_ops tegra_smmu_ops = {
 	.map = tegra_smmu_map,
 	.unmap = tegra_smmu_unmap,
 	.map_sg = default_iommu_map_sg,
+	.flush = tegra_smmu_flush,
 	.iova_to_phys = tegra_smmu_iova_to_phys,
 
 	.pgsize_bitmap = SZ_4K,
-- 
2.6.0.rc2.230.g3dd15c0
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  ` (2 preceding siblings ...)
  2015-09-29  5:25   ` [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing Tomasz Figa
@ 2015-09-29  9:27   ` Thierry Reding
  [not found]     ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
  2015-09-29 12:20   ` Joerg Roedel
  2015-09-29 14:20   ` Robin Murphy
  5 siblings, 1 reply; 17+ messages in thread
From: Thierry Reding @ 2015-09-29  9:27 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
      Tomeu Vizoso, Stephen Warren, Antonios Motakis,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA, Will Deacon, Mikko Perttunen,
      iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Nicolas Iooss,
      Russell King, linux-tegra-u79uwXL29TY76Z2rM5mHXA, Vince Hsu

On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page, but it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly, iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs the unmapping in a
> page-by-page manner.
>
> To make the mapping operation more suitable for such use cases,
> iommu_map_sg() and the .map_sg() callback in struct iommu_ops were
> introduced, which allow particular IOMMU drivers to directly iterate
> over SG entries, create the necessary mappings and flush everything in
> one go.
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver willing to have a fast map to implement its
>    own SG iteration code, even though this is a mostly generic operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> that the .map_sg() callback might still be necessary for some specific
> platforms, which could need to iterate over SG elements inside driver
> code. The proposed solution introduces a new .flush() callback, which
> takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make any mapping
> change in the given IOVA area visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at
> the end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the
>    driver needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
>    optional, and if it isn't implemented, drivers are expected to do the
>    necessary flushes on a page-by-page basis in the respective
>    (un)mapping callbacks,
> 3) the possibility of exporting the iommu_flush() operation and
>    providing unsynchronized map/unmap operations for subsystems with
>    even higher performance requirements (e.g. drivers/gpu/drm).

That would require passing in some sort of flag that the core shouldn't
be flushing itself, right? Currently it would flush on every map/unmap.

> The series includes a generic patch implementing the necessary changes
> in the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and can be used for further testing.
>
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map       | 128K         |         139 |         40 |
> |           |              |         136 |         34 |
> |           |              |         137 |         38 |
> |           |              |         136 |         36 |
> |           | 4M           |        3939 |       1163 |
> |           |              |        3730 |       2389 |
> |           |              |        3613 |        997 |
> |           |              |        3622 |       1620 |
> |           | ~18M         |       18635 |       4741 |
> |           |              |       19261 |       6550 |
> |           |              |       18473 |       9304 |
> |           |              |       18125 |       5120 |
> | Unmap     | 128K         |         128 |          7 |
> |           |              |         122 |          8 |
> |           |              |         119 |         10 |
> |           |              |         123 |         12 |
> |           | 4M           |        3829 |        151 |
> |           |              |        3964 |        150 |
> |           |              |        3908 |        145 |
> |           |              |        3875 |        155 |
> |           | ~18M         |       18570 |        683 |
> |           |              |       18473 |        806 |
> |           |              |       21020 |        643 |
> |           |              |       21764 |        652 |
> +-----------+--------------+-------------+------------+
> The values were obtained by surrounding the calls to iommu_map_sg()
> (with the default_iommu_map_sg() helper used as the .map_sg() callback)
> and iommu_unmap() with ktime-based time measurement code, taking 4
> samples for every buffer size. ~18M means around 17-19M due to the
> variance in requested buffer sizes.

Those are pretty impressive numbers.

Thierry
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found]     ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
@ 2015-09-29 11:54       ` Tomasz Figa
  2015-09-29 12:22       ` Joerg Roedel
  1 sibling, 0 replies; 17+ messages in thread
From: Tomasz Figa @ 2015-09-29 11:54 UTC (permalink / raw)
  To: Thierry Reding
  Cc: open list:IOMMU DRIVERS, Joerg Roedel, Hiroshi Doyu,
      Stephen Warren, Alexandre Courbot, Vince Hsu, Russell King,
      Paul Walmsley, Tomeu Vizoso, Mikko Perttunen, Will Deacon,
      Alex Williamson, Arnd Bergmann, Marek Szyprowski, Antonios Motakis,
      Olav Haugan, Nicolas Iooss,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Tue, Sep 29, 2015 at 6:27 PM, Thierry Reding
<thierry.reding-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
> On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> > Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> > iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> > page by page, but it involves flushing the caches (CPU and IOMMU) for
> > every mapped page separately, which is unsuitable for use cases that
> > require low mapping latency. Similarly, iommu_unmap(), even though it
> > takes a full IOVA range as its argument, performs the unmapping in a
> > page-by-page manner.
> >
> > To make the mapping operation more suitable for such use cases,
> > iommu_map_sg() and the .map_sg() callback in struct iommu_ops were
> > introduced, which allow particular IOMMU drivers to directly iterate
> > over SG entries, create the necessary mappings and flush everything in
> > one go.
> >
> > This approach, however, has two drawbacks:
> > 1) it does not do anything about unmap performance,
> > 2) it requires each driver willing to have a fast map to implement its
> >    own SG iteration code, even though this is a mostly generic operation.
> >
> > This series tries to mitigate the two issues above, while acknowledging
> > that the .map_sg() callback might still be necessary for some specific
> > platforms, which could need to iterate over SG elements inside driver
> > code. The proposed solution introduces a new .flush() callback, which
> > takes an IOVA range as its argument and is expected to flush all
> > respective caches (be it CPU, IOMMU TLB or whatever) to make any mapping
> > change in the given IOVA area visible to IOMMU clients. All 3 basic
> > map/unmap operations are then modified to call the .flush() callback at
> > the end of the operation.
> >
> > Advantages of the proposed approach include:
> > 1) the ability to use the default_iommu_map_sg() helper if all the
> >    driver needs for performance optimization is batching the flush,
> > 2) no effect at all on existing code - the .flush() callback is made
> >    optional, and if it isn't implemented, drivers are expected to do the
> >    necessary flushes on a page-by-page basis in the respective
> >    (un)mapping callbacks,
> > 3) the possibility of exporting the iommu_flush() operation and
> >    providing unsynchronized map/unmap operations for subsystems with
> >    even higher performance requirements (e.g. drivers/gpu/drm).
>
> That would require passing in some sort of flag that the core shouldn't
> be flushing itself, right? Currently it would flush on every map/unmap.

Are you asking about 3) in particular? If so, I was thinking about
iommu_map_noflush() and iommu_unmap_noflush(), which could then be
wrapped by iommu_map() with a call to iommu_flush() added at the end.
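Roughly something like the sketch below (the names and the exact
placement are just an illustration of the idea, nothing final):

/* Map without flushing; the caller batches the flush itself. */
int iommu_map_noflush(struct iommu_domain *domain, unsigned long iova,
                      phys_addr_t paddr, size_t size, int prot);

/* Exported variant of the flush helper from patch 1. */
void iommu_flush(struct iommu_domain *domain, unsigned long iova,
                 size_t size);

/* iommu_map() then becomes a flushing wrapper: */
int iommu_map(struct iommu_domain *domain, unsigned long iova,
              phys_addr_t paddr, size_t size, int prot)
{
        int ret = iommu_map_noflush(domain, iova, paddr, size, prot);

        if (ret)
                return ret;

        iommu_flush(domain, iova, size);
        return 0;
}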
> >
> > The series includes a generic patch implementing the necessary changes
> > in the IOMMU API and two Tegra-specific patches that demonstrate the
> > implementation on the driver side and can be used for further testing.
> >
> > Last, but not least, some performance numbers on Tegra210:
> > +-----------+--------------+-------------+------------+
> > | Operation | Size [bytes] | Before [us] | After [us] |
> > +-----------+--------------+-------------+------------+
> > | Map       | 128K         |         139 |         40 |
> > |           |              |         136 |         34 |
> > |           |              |         137 |         38 |
> > |           |              |         136 |         36 |
> > |           | 4M           |        3939 |       1163 |
> > |           |              |        3730 |       2389 |
> > |           |              |        3613 |        997 |
> > |           |              |        3622 |       1620 |
> > |           | ~18M         |       18635 |       4741 |
> > |           |              |       19261 |       6550 |
> > |           |              |       18473 |       9304 |
> > |           |              |       18125 |       5120 |
> > | Unmap     | 128K         |         128 |          7 |
> > |           |              |         122 |          8 |
> > |           |              |         119 |         10 |
> > |           |              |         123 |         12 |
> > |           | 4M           |        3829 |        151 |
> > |           |              |        3964 |        150 |
> > |           |              |        3908 |        145 |
> > |           |              |        3875 |        155 |
> > |           | ~18M         |       18570 |        683 |
> > |           |              |       18473 |        806 |
> > |           |              |       21020 |        643 |
> > |           |              |       21764 |        652 |
> > +-----------+--------------+-------------+------------+
> > The values were obtained by surrounding the calls to iommu_map_sg()
> > (with the default_iommu_map_sg() helper used as the .map_sg() callback)
> > and iommu_unmap() with ktime-based time measurement code, taking 4
> > samples for every buffer size. ~18M means around 17-19M due to the
> > variance in requested buffer sizes.
>
> Those are pretty impressive numbers.

I was surprised myself that there is so much difference on this
platform, but it seems to converge with the downstream kernel. Moreover,
we can supposedly get even better results by simply invalidating the
full TLB above some length threshold.

Best regards,
Tomasz
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found]     ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
  2015-09-29 11:54       ` Tomasz Figa
@ 2015-09-29 12:22       ` Joerg Roedel
  1 sibling, 0 replies; 17+ messages in thread
From: Joerg Roedel @ 2015-09-29 12:22 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Tomasz Figa, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
      Hiroshi Doyu, Stephen Warren, Alexandre Courbot, Vince Hsu,
      Russell King, Paul Walmsley, Tomeu Vizoso, Mikko Perttunen,
      Will Deacon, Alex Williamson, Arnd Bergmann, Marek Szyprowski,
      Antonios Motakis, Olav Haugan, Nicolas Iooss,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

On Tue, Sep 29, 2015 at 11:27:14AM +0200, Thierry Reding wrote:
> On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> > 3) the possibility of exporting the iommu_flush() operation and
> >    providing unsynchronized map/unmap operations for subsystems with
> >    even higher performance requirements (e.g. drivers/gpu/drm).
>
> That would require passing in some sort of flag that the core shouldn't
> be flushing itself, right? Currently it would flush on every map/unmap.

Not necessarily. Introducing a flag would require updating all callers,
so while at it we could also just change the semantics of map/unmap to
require an explicit flush afterwards (only if it turns out that such an
iommu_flush() function is really needed).

	Joerg
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  ` (3 preceding siblings ...)
  2015-09-29  9:27   ` [RFC PATCH 0/3] iommu: Add range flush operation Thierry Reding
@ 2015-09-29 12:20   ` Joerg Roedel
  2015-09-29 14:20   ` Robin Murphy
  5 siblings, 0 replies; 17+ messages in thread
From: Joerg Roedel @ 2015-09-29 12:20 UTC (permalink / raw)
  To: Tomasz Figa
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Hiroshi Doyu,
      Stephen Warren, Thierry Reding, Alexandre Courbot, Vince Hsu,
      Russell King, Paul Walmsley, Tomeu Vizoso, Mikko Perttunen,
      Will Deacon, Alex Williamson, Arnd Bergmann, Marek Szyprowski,
      Antonios Motakis, Olav Haugan, Nicolas Iooss,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA

Hi Tomasz,

On Tue, Sep 29, 2015 at 02:25:23PM +0900, Tomasz Figa wrote:
> This series tries to mitigate the two issues above, while acknowledging
> that the .map_sg() callback might still be necessary for some specific
> platforms, which could need to iterate over SG elements inside driver
> code. The proposed solution introduces a new .flush() callback, which
> takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make any mapping
> change in the given IOVA area visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at
> the end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the
>    driver needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
>    optional, and if it isn't implemented, drivers are expected to do the
>    necessary flushes on a page-by-page basis in the respective
>    (un)mapping callbacks,
> 3) the possibility of exporting the iommu_flush() operation and
>    providing unsynchronized map/unmap operations for subsystems with
>    even higher performance requirements (e.g. drivers/gpu/drm).

Thanks for the patches, I really like the idea. The VT-d driver probably
also benefits a lot from this.

In the past I also proposed something like this, it was called a new
iommu_commit() API function for making changes from map/unmap visible
to the hardware. But this requires updating all callers first, so your
approach of doing an implicit flush at the end of map/unmap is better.

	Joerg
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
  ` (4 preceding siblings ...)
  2015-09-29 12:20   ` Joerg Roedel
@ 2015-09-29 14:20   ` Robin Murphy
  [not found]     ` <560A9E36.9030903-5wv7dgnIgG8@public.gmane.org>
  5 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2015-09-29 14:20 UTC (permalink / raw)
  To: Tomasz Figa,
      iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
      Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      Thierry Reding, Nicolas Iooss, Russell King, Vince Hsu,
      Mikko Perttunen

Hi Tomasz,

On 29/09/15 06:25, Tomasz Figa wrote:
> Currently the IOMMU subsystem provides 3 basic operations: iommu_map(),
> iommu_map_sg() and iommu_unmap(). iommu_map() can be used to map memory
> page by page, but it involves flushing the caches (CPU and IOMMU) for
> every mapped page separately, which is unsuitable for use cases that
> require low mapping latency. Similarly, iommu_unmap(), even though it
> takes a full IOVA range as its argument, performs the unmapping in a
> page-by-page manner.

This isn't necessarily the general case, though. If the IOMMU has a
coherent page table walk interface and its architecture prohibits
caching invalid PTEs, then the overhead on an unmap is only a TLB
invalidation and the overhead on a map is nothing.

> To make the mapping operation more suitable for such use cases,
> iommu_map_sg() and the .map_sg() callback in struct iommu_ops were
> introduced, which allow particular IOMMU drivers to directly iterate
> over SG entries, create the necessary mappings and flush everything in
> one go.
>
> This approach, however, has two drawbacks:
> 1) it does not do anything about unmap performance,
> 2) it requires each driver willing to have a fast map to implement its
>    own SG iteration code, even though this is a mostly generic operation.
>
> This series tries to mitigate the two issues above, while acknowledging
> that the .map_sg() callback might still be necessary for some specific
> platforms, which could need to iterate over SG elements inside driver
> code. The proposed solution introduces a new .flush() callback, which
> takes an IOVA range as its argument and is expected to flush all
> respective caches (be it CPU, IOMMU TLB or whatever) to make any mapping
> change in the given IOVA area visible to IOMMU clients. All 3 basic
> map/unmap operations are then modified to call the .flush() callback at
> the end of the operation.
>
> Advantages of the proposed approach include:
> 1) the ability to use the default_iommu_map_sg() helper if all the
>    driver needs for performance optimization is batching the flush,
> 2) no effect at all on existing code - the .flush() callback is made
>    optional, and if it isn't implemented, drivers are expected to do the
>    necessary flushes on a page-by-page basis in the respective
>    (un)mapping callbacks,
> 3) the possibility of exporting the iommu_flush() operation and
>    providing unsynchronized map/unmap operations for subsystems with
>    even higher performance requirements (e.g. drivers/gpu/drm).

A single callback doesn't really generalise well enough: If we wanted to
implement this in the ARM SMMU drivers to optimise the unmap() case [ask
Will how long he spends waiting for a software model to tear down an
entire VFIO domain invalidating one page at a time ;)], then we'd either
regress performance in the map() case with an unnecessary TLB flush, or
have to do a table walk in every flush() call to infer what actually
needs doing.

Personally I think it would be nicest to have two separate callbacks,
e.g. .map_sync/.unmap_sync, but at the very least some kind of
additional 'direction' kind of parameter would be necessary.
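Something along these lines, purely illustrative and not a proposal of
the final signatures:

/* Either two dedicated callbacks in iommu_ops... */
void (*map_sync)(struct iommu_domain *domain, unsigned long iova,
                 size_t size);
void (*unmap_sync)(struct iommu_domain *domain, unsigned long iova,
                   size_t size);

/* ...or a single one that is told which path it is called from: */
void (*flush)(struct iommu_domain *domain, unsigned long iova,
              size_t size, bool unmap);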
> The series includes a generic patch implementing the necessary changes
> in the IOMMU API and two Tegra-specific patches that demonstrate the
> implementation on the driver side and can be used for further testing.
>
> Last, but not least, some performance numbers on Tegra210:
> +-----------+--------------+-------------+------------+
> | Operation | Size [bytes] | Before [us] | After [us] |
> +-----------+--------------+-------------+------------+
> | Map       | 128K         |         139 |         40 |
> |           |              |         136 |         34 |
> |           |              |         137 |         38 |
> |           |              |         136 |         36 |
> |           | 4M           |        3939 |       1163 |
> |           |              |        3730 |       2389 |
> |           |              |        3613 |        997 |
> |           |              |        3622 |       1620 |
> |           | ~18M         |       18635 |       4741 |
> |           |              |       19261 |       6550 |
> |           |              |       18473 |       9304 |
> |           |              |       18125 |       5120 |
> | Unmap     | 128K         |         128 |          7 |
> |           |              |         122 |          8 |
> |           |              |         119 |         10 |
> |           |              |         123 |         12 |
> |           | 4M           |        3829 |        151 |
> |           |              |        3964 |        150 |
> |           |              |        3908 |        145 |
> |           |              |        3875 |        155 |
> |           | ~18M         |       18570 |        683 |
> |           |              |       18473 |        806 |
> |           |              |       21020 |        643 |
> |           |              |       21764 |        652 |
> +-----------+--------------+-------------+------------+
> The values were obtained by surrounding the calls to iommu_map_sg()
> (with the default_iommu_map_sg() helper used as the .map_sg() callback)
> and iommu_unmap() with ktime-based time measurement code, taking 4
> samples for every buffer size. ~18M means around 17-19M due to the
> variance in requested buffer sizes.

It would be interesting to know how much of the gain here is due to
batching up the TLB maintenance vs. doing the DMA sync in fewer total
pieces (with correspondingly fewer barriers). For the drivers which use
the io-pgtable framework, trying to do anything about the latter (where
it's relevant) looks essentially impossible without rewriting the whole
thing, but the former is definitely something we should be able to
handle and benefit from.

It certainly seems like a reasonable way to get closer to the kind of
iommu_map_range()/iommu_unmap_range() operations proposed before, but
with less trampling on the external API. Plus it's nicer than the
alternative workaround of having the driver claim that anything larger
than your basic page size is valid so you can batch up most of your
pagetable updates behind the API's back.

Robin.

> Tomasz Figa (2):
>   iommu: Add support for out of band flushing
>   iommu/tegra-smmu: Make the driver use out of band flushing
>
> Vince Hsu (1):
>   memory: tegra: add TLB cache line size
>
>  drivers/iommu/iommu.c           | 33 +++++++++++++--
>  drivers/iommu/tegra-smmu.c      | 91 +++++++++++++++++++++++++++++++++++++----
>  drivers/memory/tegra/tegra114.c |  1 +
>  drivers/memory/tegra/tegra124.c |  3 ++
>  drivers/memory/tegra/tegra210.c |  1 +
>  drivers/memory/tegra/tegra30.c  |  1 +
>  include/linux/iommu.h           |  2 +
>  include/soc/tegra/mc.h          |  1 +
>  8 files changed, 122 insertions(+), 11 deletions(-)
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
  [not found]     ` <560A9E36.9030903-5wv7dgnIgG8@public.gmane.org>
@ 2015-09-29 14:32       ` Russell King - ARM Linux
  [not found]         ` <20150929143241.GI21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
  0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2015-09-29 14:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Tomeu Vizoso,
      Arnd Bergmann, Stephen Warren, Will Deacon,
      linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tomasz Figa,
      iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
      Thierry Reding, Vince Hsu, Nicolas Iooss,
      linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
      Antonios Motakis, Mikko Perttunen

On Tue, Sep 29, 2015 at 03:20:38PM +0100, Robin Murphy wrote:
> A single callback doesn't really generalise well enough: If we wanted to
> implement this in the ARM SMMU drivers to optimise the unmap() case [ask
> Will how long he spends waiting for a software model to tear down an
> entire VFIO domain invalidating one page at a time ;)], then we'd either
> regress performance in the map() case with an unnecessary TLB flush, or
> have to do a table walk in every flush() call to infer what actually
> needs doing.

And this is the problem of frameworks. They get in the way of doing
things efficiently.

Fine, we have the DMA ops, and that calls a map_sg() method. What we
then need is a series of standardised library functions which can be
called to perform various actions.

Consider this: an IOMMU driver gets the raw scatterlist which the
driver passed. The IOMMU driver walks the scatterlist, creating the
IOMMU-side mapping, and writing the device DMA addresses and DMA lengths
to the scatterlist, possibly coalescing some of the entries. It
remembers the number of scatterlist entries that the DMA operation now
requires. The IOMMU code can set up whatever mappings it wants, using
whatever sizes it wants, to satisfy the requested scatterlist.

It then goes on to call the arch backend with the original scatterlist,
asking it to _only_ deal with the CPU coherency for the mapping. The
arch code walks the scatterlist again, this time dealing with the CPU
coherency part.

Finally, the IOMMU code returns the number of DMA scatterlist entries.

When it comes to tearing it down, it's a similar operation to the above,
except reversing those actions.

The only issue with this approach is that it opens up some of the cache
handling to the entire kernel, and that will be _too_ big a target for
idiotic driver writers to think they have permission to directly use
those interfaces. To solve this, I'd love to be able to have the linker
link together certain objects in the kernel build, and then convert some
global symbols to be local symbols, thus denying access to functions
that driver authors have no business whatsoever touching.

> Personally I think it would be nicest to have two separate callbacks,
> e.g. .map_sync/.unmap_sync, but at the very least some kind of
> additional 'direction' kind of parameter would be necessary.

No, not more callbacks - that's the framework thinking, not the library
thinking.

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
[not found] ` <20150929143241.GI21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
@ 2015-09-29 16:27 ` Robin Murphy
[not found] ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Robin Murphy @ 2015-09-29 16:27 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Tomasz Figa, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thierry Reding, Nicolas Iooss, Vince Hsu, Mikko Perttunen

On 29/09/15 15:32, Russell King - ARM Linux wrote:
> On Tue, Sep 29, 2015 at 03:20:38PM +0100, Robin Murphy wrote:
>> A single callback doesn't really generalise well enough: if we wanted to
>> implement this in the ARM SMMU drivers to optimise the unmap() case [ask
>> Will how long he spends waiting for a software model to tear down an
>> entire VFIO domain invalidating one page at a time ;)], then we'd either
>> regress performance in the map() case with an unnecessary TLB flush, or
>> have to do a table walk in every flush() call to infer what actually
>> needs doing.
>
> And this is the problem of frameworks. They get in the way of doing
> things efficiently.
>
> Fine, we have the DMA ops, and that calls a map_sg() method. What we
> then need is a series of standardised library functions which can be
> called to perform various actions.
>
> Consider this: an IOMMU driver gets the raw scatterlist which the
> driver passed. The IOMMU driver walks the scatterlist, creating the
> IOMMU-side mapping, and writing the device DMA addresses and DMA lengths
> to the scatterlist, possibly coalescing some of the entries. It
> remembers the number of scatterlist entries that the DMA operation now
> requires. The IOMMU code can set up whatever mappings it wants using

... and making that elided "set up whatever mappings it wants" step more
efficient is the sole thing that this patch set is trying to address. I
apologise for not really following what you're getting at here.

> whatever sizes it wants to satisfy the requested scatterlist.
>
> It then goes on to call the arch backend with the original scatterlist,
> asking it to _only_ deal with the CPU coherency for the mapping. The
> arch code walks the scatterlist again, this time dealing with the CPU
> coherency part.
>
> Finally, the IOMMU code returns the number of DMA scatterlist entries.
>
> When it comes to tearing it down, it's a similar operation to the above,
> except reversing those actions.
>
> The only issue with this approach is that it opens up some of the cache
> handling to the entire kernel, and that will be _too_ big a target for
> idiotic driver writers to think they have permission to directly use
> those interfaces. To solve this, I'd love to be able to have the linker
> link together certain objects in the kernel build, and then convert some
> global symbols to local symbols, thus denying access to functions that
> driver authors have no business whatsoever touching.
>
>> Personally I think it would be nicest to have two separate callbacks,
>> e.g. .map_sync/.unmap_sync, but at the very least some kind of
>> additional 'direction' parameter would be necessary.
>
> No, not more callbacks - that's the framework thinking, not the library
> thinking.

Eh, swings and roundabouts. An argument denoting whether the flush is
being called on the map or unmap path would be fine; it just means some
implementations will be doing an extra no-op function call half the
time. On closer inspection, the code in patch 3 _is_ using a table walk
to figure out whether the IOVA has been mapped or unmapped; it just
happens that this particular implementation needs to do that walk anyway
to sync the PTE updates, so it gets away with it.

Robin.

^ permalink raw reply	[flat|nested] 17+ messages in thread
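For concreteness, a minimal sketch of the direction-aware flush variant
discussed above; the IOMMU_FLUSH_* values, the dir parameter and the
example_smmu_*() helpers are illustrative assumptions, not interfaces
from the posted series:

#include <linux/iommu.h>

/* Hypothetical direction hint for a range-flush callback. */
enum iommu_flush_dir {
        IOMMU_FLUSH_MAP,        /* flush after creating mappings */
        IOMMU_FLUSH_UNMAP,      /* flush after tearing mappings down */
};

/*
 * With such a hint, an ARM-SMMU-style driver could sync its PTE writes
 * on both paths but pay for TLB invalidation only when unmapping,
 * avoiding both the map-path regression and the inference table walk.
 * Both helpers called below are placeholders for real driver internals.
 */
static void example_smmu_flush(struct iommu_domain *domain,
                               unsigned long iova, size_t size,
                               enum iommu_flush_dir dir)
{
        example_smmu_sync_ptes(domain, iova, size);
        if (dir == IOMMU_FLUSH_UNMAP)
                example_smmu_tlb_inv_range(domain, iova, size);
}

The separate .map_sync()/.unmap_sync() callbacks mentioned above would
express the same split, just resolved once at registration time rather
than by a branch on every call.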
[parent not found: <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>]
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
[not found] ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
@ 2015-09-29 16:40 ` Russell King - ARM Linux
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
0 siblings, 1 reply; 17+ messages in thread
From: Russell King - ARM Linux @ 2015-09-29 16:40 UTC (permalink / raw)
To: Robin Murphy
Cc: Olav Haugan, Alexandre Courbot, Paul Walmsley, Tomeu Vizoso,
Arnd Bergmann, Stephen Warren, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Tomasz Figa,
iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Thierry Reding, Vince Hsu, Nicolas Iooss,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Antonios Motakis, Mikko Perttunen

On Tue, Sep 29, 2015 at 05:27:12PM +0100, Robin Murphy wrote:
> Eh, swings and roundabouts. An argument denoting whether the flush is
> being called on the map or unmap path would be fine,

Sorry, that statement is wrong. It's not about whether you flush before
or after the DMA operation. I'm afraid I'm probably going to tell you
how to suck eggs here, because I don't think you quite "get it" with
non-dma-coherent modern CPUs.

Modern CPUs prefetch data into their caches, and they also randomly write
back data from their caches to memory. When performing a DMA operation
from device to memory, you need to do two things with CPU caches which
aren't coherent:

1. Before starting the DMA operation, you need to walk over the memory to
   be mapped, ensuring that any dirty cache lines are written back. This
   is to prevent dirty cache lines overwriting data that has already been
   DMA'd from the device.

2. After the DMA operation has completed, you need to walk over the
   memory again, invalidating any cache lines which may have been
   speculatively loaded from that memory while DMA was running. These
   cache lines may have been loaded prior to the DMA operation placing
   the new data into memory.

So, it's not a before-or-after choice: you always perform write-back
cache maintenance before any DMA operation, and then invalidation
maintenance after the DMA operation has completed for any mapping which
the DMA may have written to (which means device-to-memory and
bidirectional mappings).

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply	[flat|nested] 17+ messages in thread
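As a sketch of that ordering, assuming hypothetical cache_wb_range() and
cache_inv_range() stand-ins for whatever maintenance primitives the
architecture actually provides:

#include <linux/types.h>

/*
 * Maintenance around a device-to-memory (or bidirectional) DMA on a
 * non-coherent CPU.  Both cache helpers are hypothetical placeholders.
 */
static void example_dma_begin(void *buf, size_t len)
{
        /* Step 1: write back dirty lines, so that a later eviction
         * cannot overwrite data the device has already DMA'd. */
        cache_wb_range(buf, len);
}

static void example_dma_complete(void *buf, size_t len)
{
        /* Step 2: invalidate lines that may have been speculatively
         * loaded while the DMA ran, so reads see the device's data. */
        cache_inv_range(buf, len);
}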
[parent not found: <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>]
* Re: [RFC PATCH 0/3] iommu: Add range flush operation
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
@ 2015-09-29 17:13 ` Robin Murphy
0 siblings, 0 replies; 17+ messages in thread
From: Robin Murphy @ 2015-09-29 17:13 UTC (permalink / raw)
To: Russell King - ARM Linux
Cc: Tomasz Figa, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
Olav Haugan, Alexandre Courbot, Paul Walmsley, Arnd Bergmann,
Tomeu Vizoso, Stephen Warren, Antonios Motakis, Will Deacon,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-tegra-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Thierry Reding, Nicolas Iooss, Vince Hsu, Mikko Perttunen

On 29/09/15 17:40, Russell King - ARM Linux wrote:
> On Tue, Sep 29, 2015 at 05:27:12PM +0100, Robin Murphy wrote:
>> Eh, swings and roundabouts. An argument denoting whether the flush is
>> being called on the map or unmap path would be fine,
>
> Sorry, that statement is wrong. It's not about whether you flush before
> or after the DMA operation. I'm afraid I'm probably going to tell you
> how to suck eggs here, because I don't think you quite "get it" with
> non-dma-coherent modern CPUs.
>
> Modern CPUs prefetch data into their caches, and they also randomly write
> back data from their caches to memory. When performing a DMA operation
> from device to memory, you need to do two things with CPU caches which
> aren't coherent:
>
> 1. Before starting the DMA operation, you need to walk over the memory to
>    be mapped, ensuring that any dirty cache lines are written back. This
>    is to prevent dirty cache lines overwriting data that has already been
>    DMA'd from the device.
>
> 2. After the DMA operation has completed, you need to walk over the
>    memory again, invalidating any cache lines which may have been
>    speculatively loaded from that memory while DMA was running. These
>    cache lines may have been loaded prior to the DMA operation placing
>    the new data into memory.
>
> So, it's not a before-or-after choice: you always perform write-back
> cache maintenance before any DMA operation, and then invalidation
> maintenance after the DMA operation has completed for any mapping which
> the DMA may have written to (which means device-to-memory and
> bidirectional mappings).

Yup, I'm well aware of all that; in fact you and I have already agreed
elsewhere that we can only really get away with using the streaming DMA
API to flush IOMMU page table updates _because_ they aren't written back
to, thus data only ever goes from CPU->IOMMU and we can skip the problem
of where to put an invalidation; you wrote the tegra-smmu code that does
this.

The coherency of whatever device made the DMA API call for which the
IOMMU API is creating/removing a mapping is irrelevant at this point -
this is the DMA operation within the DMA operation.

None of which has anything to do with the point I raised, which is that
if iommu_unmap() calls iommu_flush(), I want to issue TLB invalidations,
but if iommu_map() calls iommu_flush(), I don't.

^ permalink raw reply	[flat|nested] 17+ messages in thread
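The page-table-flushing pattern referred to here, sketched against the
streaming DMA API; dma_sync_single_for_device() is a real kernel
interface, while the function itself and the 32-bit PTE size are
illustrative assumptions rather than the actual tegra-smmu code:

#include <linux/dma-mapping.h>

/*
 * Pushing PTE updates out to a non-coherent IOMMU: the page table is
 * written only by the CPU and read only by the IOMMU, so a for_device
 * sync suffices, and the question of where to put an invalidation
 * never arises on this path.
 */
static void example_flush_ptes(struct device *dev, dma_addr_t pt_dma,
                               unsigned int first_pte, unsigned int count)
{
        size_t offset = first_pte * sizeof(u32);        /* assumed PTE size */

        dma_sync_single_for_device(dev, pt_dma + offset,
                                   count * sizeof(u32), DMA_TO_DEVICE);
}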
end of thread, other threads:[~2015-09-29 17:13 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-29 5:25 [RFC PATCH 0/3] iommu: Add range flush operation Tomasz Figa
[not found] ` <1443504379-31841-1-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 5:25 ` [RFC PATCH 1/3] iommu: Add support for out of band flushing Tomasz Figa
[not found] ` <1443504379-31841-2-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 9:32 ` Thierry Reding
[not found] ` <20150929093257.GE9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:56 ` Tomasz Figa
2015-09-29 5:25 ` [RFC PATCH 2/3] memory: tegra: add TLB cache line size Tomasz Figa
[not found] ` <1443504379-31841-3-git-send-email-tfiga-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org>
2015-09-29 9:43 ` Thierry Reding
[not found] ` <20150929094348.GF9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 12:11 ` Tomasz Figa
2015-09-29 5:25 ` [RFC PATCH 3/3] iommu/tegra-smmu: Make the driver use out of band flushing Tomasz Figa
2015-09-29 9:27 ` [RFC PATCH 0/3] iommu: Add range flush operation Thierry Reding
[not found] ` <20150929092714.GD9460-AwZRO8vwLAwmlAP/+Wk3EA@public.gmane.org>
2015-09-29 11:54 ` Tomasz Figa
2015-09-29 12:22 ` Joerg Roedel
2015-09-29 12:20 ` Joerg Roedel
2015-09-29 14:20 ` Robin Murphy
[not found] ` <560A9E36.9030903-5wv7dgnIgG8@public.gmane.org>
2015-09-29 14:32 ` Russell King - ARM Linux
[not found] ` <20150929143241.GI21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 16:27 ` Robin Murphy
[not found] ` <560ABBE0.8020805-5wv7dgnIgG8@public.gmane.org>
2015-09-29 16:40 ` Russell King - ARM Linux
[not found] ` <20150929164014.GL21513-l+eeeJia6m9vn6HldHNs0ANdhmdF6hFW@public.gmane.org>
2015-09-29 17:13 ` Robin Murphy
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).