[PATCH v3 0/4] arm64: IOMMU-backed DMA mapping

* [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping
@ 2015-07-10 19:19 Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 1/4] iommu/iova: Avoid over-allocating when size-aligned Robin Murphy
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Robin Murphy @ 2015-07-10 19:19 UTC
  To: linux-arm-kernel

Hi all,

Here we are again, and I really hope this is good to merge this time.

Changes since v2[1]:
- Rework to use default domains. Of course, we don't have the requisite
  default domain support in arm/arm64 IOMMU drivers yet, but the grotty
  workarounds now end up corralled into one place outside the base code.
- Fixed the logic around atomic allocation where coherent devices would
  have ended up with non-cacheable buffers.
- Fixed a missing size-alignment which made only whole-page allocations
  succeed.
- Added more documentation, especially around the really confusing bit.

Once again, branch available at [2].

Looking ahead, I have some half-worked-out prototypes for DT-based
IOMMU group allocation - with that and Laurent's probe deferral series,
we should hopefully be able to implement the rest of default domain
support in the IOMMU drivers, and get rid of the hacks.

Robin.

[1]:http://thread.gmane.org/gmane.linux.kernel.iommu/9946
[2]:git://linux-arm.org/linux-rm iommu/dma

Robin Murphy (4):
  iommu/iova: Avoid over-allocating when size-aligned
  iommu: Implement common IOMMU ops for DMA mapping
  arm64: Add IOMMU dma_ops
  arm64: Hook up IOMMU dma_ops

 arch/arm64/Kconfig                   |   1 +
 arch/arm64/include/asm/dma-mapping.h |  15 +-
 arch/arm64/mm/dma-mapping.c          | 447 +++++++++++++++++++++++++++++
 drivers/iommu/Kconfig                |   7 +
 drivers/iommu/Makefile               |   1 +
 drivers/iommu/dma-iommu.c            | 536 +++++++++++++++++++++++++++++++++++
 drivers/iommu/intel-iommu.c          |   2 +
 drivers/iommu/iova.c                 |  23 +-
 include/linux/dma-iommu.h            |  84 ++++++
 include/linux/iommu.h                |   1 +
 10 files changed, 1092 insertions(+), 25 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

-- 
1.9.1


* [PATCH v3 1/4] iommu/iova: Avoid over-allocating when size-aligned
  2015-07-10 19:19 [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
@ 2015-07-10 19:19 ` Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Robin Murphy @ 2015-07-10 19:19 UTC
  To: linux-arm-kernel

Currently, allocating a size-aligned IOVA region quietly adjusts the
actual allocation size in the process, returning a rounded-up
power-of-two-sized allocation. This results in mismatched behaviour in
the IOMMU driver if the original size was not a power of two, where the
original size is mapped, but the rounded-up IOVA size is unmapped.

Whilst some IOMMUs will happily unmap already-unmapped pages, others
consider this an error, so fix it by computing the necessary alignment
padding without altering the actual allocation size. Also clean up by
making pad_size unsigned, since its callers always pass unsigned values
and negative padding makes little sense here anyway.

CC: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/intel-iommu.c |  2 ++
 drivers/iommu/iova.c        | 23 ++++++-----------------
 2 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a98a7b2..9210159 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3233,6 +3233,8 @@ static struct iova *intel_alloc_iova(struct device *dev,
 
 	/* Restrict dma_mask to the width that the iommu can handle */
 	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
+	/* Ensure we reserve the whole size-aligned region */
+	nrpages = __roundup_pow_of_two(nrpages);
 
 	if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
 		/*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index b7c3d92..29f2efc 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -120,19 +120,14 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
 	}
 }
 
-/* Computes the padding size required, to make the
- * the start address naturally aligned on its size
+/*
+ * Computes the padding size required, to make the start address
+ * naturally aligned on the power-of-two order of its size
  */
-static int
-iova_get_pad_size(int size, unsigned int limit_pfn)
+static unsigned int
+iova_get_pad_size(unsigned int size, unsigned int limit_pfn)
 {
-	unsigned int pad_size = 0;
-	unsigned int order = ilog2(size);
-
-	if (order)
-		pad_size = (limit_pfn + 1) % (1 << order);
-
-	return pad_size;
+	return (limit_pfn + 1 - size) & (__roundup_pow_of_two(size) - 1);
 }
 
 static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
@@ -265,12 +260,6 @@ alloc_iova(struct iova_domain *iovad, unsigned long size,
 	if (!new_iova)
 		return NULL;
 
-	/* If size aligned is set then round the size to
-	 * to next power of two.
-	 */
-	if (size_aligned)
-		size = __roundup_pow_of_two(size);
-
 	ret = __alloc_and_insert_iova_range(iovad, size, limit_pfn,
 			new_iova, size_aligned);
 
-- 
1.9.1
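
To see what the reworked iova_get_pad_size() computes, the arithmetic can
be reproduced in user space. This is only a sketch under stated
assumptions: roundup_pow2() is a hypothetical stand-in for the kernel's
__roundup_pow_of_two(), and alloc_start() is a hypothetical helper that
mirrors how the allocator appears to place a region top-down below
limit_pfn. Neither name is part of the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for __roundup_pow_of_two(); valid for n >= 1 */
static unsigned long roundup_pow2(unsigned long n)
{
	unsigned long p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Same expression as the patched iova_get_pad_size(): the number of
 * PFNs to skip so that the start of the region is aligned to the
 * power-of-two order of its size, without growing the region itself.
 */
static unsigned int pad_size(unsigned int size, unsigned int limit_pfn)
{
	return (limit_pfn + 1 - size) & (roundup_pow2(size) - 1);
}

/*
 * Hypothetical helper: lowest PFN of a region of 'size' pages placed
 * as high as possible at or below limit_pfn, after applying the padding.
 */
static unsigned int alloc_start(unsigned int size, unsigned int limit_pfn)
{
	return limit_pfn + 1 - size - pad_size(size, limit_pfn);
}
```

For a 3-page request with limit_pfn = 255, pad_size() gives 1 and the
region starts at PFN 252: aligned to 4 pages, yet still only 3 pages
long, so the mapped and unmapped sizes agree.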


* [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-10 19:19 [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 1/4] iommu/iova: Avoid over-allocating when size-aligned Robin Murphy
@ 2015-07-10 19:19 ` Robin Murphy
  2015-07-13 12:34   ` Yong Wu
  2015-07-14 17:16   ` Catalin Marinas
  2015-07-10 19:19 ` [PATCH v3 3/4] arm64: Add IOMMU dma_ops Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 4/4] arm64: Hook up " Robin Murphy
  3 siblings, 2 replies; 11+ messages in thread
From: Robin Murphy @ 2015-07-10 19:19 UTC
  To: linux-arm-kernel

Taking inspiration from the existing arch/arm code, break out some
generic functions to interface the DMA-API to the IOMMU-API. This will
do the bulk of the heavy lifting for IOMMU-backed dma-mapping.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/Kconfig     |   7 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/dma-iommu.c | 536 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h |  84 ++++++++
 include/linux/iommu.h     |   1 +
 5 files changed, 629 insertions(+)
 create mode 100644 drivers/iommu/dma-iommu.c
 create mode 100644 include/linux/dma-iommu.h

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index f1fb1d3..efb0e66 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -48,6 +48,13 @@ config OF_IOMMU
        def_bool y
        depends on OF && IOMMU_API
 
+# IOMMU-agnostic DMA-mapping layer
+config IOMMU_DMA
+	bool
+	depends on NEED_SG_DMA_LENGTH
+	select IOMMU_API
+	select IOMMU_IOVA
+
 config FSL_PAMU
 	bool "Freescale IOMMU support"
 	depends on PPC32
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index c6dcc51..f465cfb 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,6 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
new file mode 100644
index 0000000..f4559c7
--- /dev/null
+++ b/drivers/iommu/dma-iommu.c
@@ -0,0 +1,536 @@
+/*
+ * A fairly generic DMA-API to IOMMU-API glue layer.
+ *
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * based in part on arch/arm/mm/dma-mapping.c:
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+
+int iommu_dma_init(void)
+{
+	return iommu_iova_cache_init();
+}
+
+/**
+ * iommu_get_dma_cookie - Acquire DMA-API resources for a domain
+ * @domain: IOMMU domain to prepare for DMA-API usage
+ *
+ * IOMMU drivers should normally call this from their domain_alloc
+ * callback when domain->type == IOMMU_DOMAIN_DMA.
+ */
+int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad;
+
+	if (domain->dma_api_cookie)
+		return -EEXIST;
+
+	iovad = kzalloc(sizeof(*iovad), GFP_KERNEL);
+	domain->dma_api_cookie = iovad;
+
+	return iovad ? 0 : -ENOMEM;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ *
+ * IOMMU drivers should normally call this from their domain_free callback.
+ */
+void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+
+	if (!iovad)
+		return;
+
+	put_iova_domain(iovad);
+	kfree(iovad);
+	domain->dma_api_cookie = NULL;
+}
+EXPORT_SYMBOL(iommu_put_dma_cookie);
+
+/**
+ * iommu_dma_init_domain - Initialise a DMA mapping domain
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @base: IOVA at which the mappable address space starts
+ * @size: Size of IOVA space
+ *
+ * @base and @size should be exact multiples of IOMMU page granularity to
+ * avoid rounding surprises. If necessary, we reserve the page at address 0
+ * to ensure it is an invalid IOVA.
+ */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long order, base_pfn, end_pfn;
+
+	if (!iovad)
+		return -ENODEV;
+
+	/* Use the smallest supported page size for IOVA granularity */
+	order = __ffs(domain->ops->pgsize_bitmap);
+	base_pfn = max_t(unsigned long, 1, base >> order);
+	end_pfn = (base + size - 1) >> order;
+
+	/* Check the domain allows at least some access to the device... */
+	if (domain->geometry.force_aperture) {
+		if (base > domain->geometry.aperture_end ||
+		    base + size <= domain->geometry.aperture_start) {
+			pr_warn("specified DMA range outside IOMMU capability\n");
+			return -EFAULT;
+		}
+		/* ...then finally give it a kicking to make sure it fits */
+		base_pfn = max_t(unsigned long, base_pfn,
+				domain->geometry.aperture_start >> order);
+		end_pfn = min_t(unsigned long, end_pfn,
+				domain->geometry.aperture_end >> order);
+	}
+
+	init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
+	return 0;
+}
+
+/*
+ * IOVAs are IOMMU _input_ addresses, so there still exists the possibility
+ * for static bus translation between device output and IOMMU input (yuck).
+ */
+static inline dma_addr_t dev_dma_addr(struct device *dev, dma_addr_t addr)
+{
+	dma_addr_t offset = (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
+
+	BUG_ON(addr < offset);
+	return addr - offset;
+}
+
+/**
+ * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
+ * @dir: Direction of DMA transfer
+ * @coherent: Is the DMA master cache-coherent?
+ *
+ * Return: corresponding IOMMU API page protection flags
+ */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
+{
+	int prot = coherent ? IOMMU_CACHE : 0;
+
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		return prot | IOMMU_READ | IOMMU_WRITE;
+	case DMA_TO_DEVICE:
+		return prot | IOMMU_READ;
+	case DMA_FROM_DEVICE:
+		return prot | IOMMU_WRITE;
+	default:
+		return 0;
+	}
+}
+
+static struct iova *__alloc_iova(struct device *dev, size_t size, bool coherent)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long shift = iova_shift(iovad);
+	unsigned long length = iova_align(iovad, size) >> shift;
+	u64 dma_limit = coherent ? dev->coherent_dma_mask : dma_get_mask(dev);
+
+	/*
+	 * Enforce size-alignment to be safe - there should probably be
+	 * an attribute to control this per-device, or at least per-domain...
+	 */
+	return alloc_iova(iovad, length, dma_limit >> shift, true);
+}
+
+/* The IOVA allocator knows what we mapped, so just unmap whatever that was */
+static void __iommu_dma_unmap(struct iommu_domain *domain, dma_addr_t dma_addr)
+{
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	unsigned long shift = iova_shift(iovad);
+	unsigned long pfn = dma_addr >> shift;
+	struct iova *iova = find_iova(iovad, pfn);
+	size_t size = iova_size(iova) << shift;
+
+	/* ...and if we can't, then something is horribly, horribly wrong */
+	BUG_ON(iommu_unmap(domain, pfn << shift, size) < size);
+	__free_iova(iovad, iova);
+}
+
+static void __iommu_dma_free_pages(struct page **pages, int count)
+{
+	while (count--)
+		__free_page(pages[count]);
+	kvfree(pages);
+}
+
+static struct page **__iommu_dma_alloc_pages(unsigned int count, gfp_t gfp)
+{
+	struct page **pages;
+	unsigned int i = 0, array_size = count * sizeof(*pages);
+
+	if (array_size <= PAGE_SIZE)
+		pages = kzalloc(array_size, GFP_KERNEL);
+	else
+		pages = vzalloc(array_size);
+	if (!pages)
+		return NULL;
+
+	/* IOMMU can map any pages, so highmem can also be used here */
+	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
+
+	while (count) {
+		struct page *page = NULL;
+		int j, order = __fls(count);
+
+		/*
+		 * Higher-order allocations are a convenience rather
+		 * than a necessity, hence using __GFP_NORETRY until
+		 * falling back to single-page allocations.
+		 */
+		for (order = min(order, MAX_ORDER); order > 0; order--) {
+			page = alloc_pages(gfp | __GFP_NORETRY, order);
+			if (!page)
+				continue;
+			if (PageCompound(page)) {
+				if (!split_huge_page(page))
+					break;
+				__free_pages(page, order);
+			} else {
+				split_page(page, order);
+				break;
+			}
+		}
+		if (!page)
+			page = alloc_page(gfp);
+		if (!page) {
+			__iommu_dma_free_pages(pages, i);
+			return NULL;
+		}
+		j = 1 << order;
+		count -= j;
+		while (j--)
+			pages[i++] = page++;
+	}
+	return pages;
+}
+
+/**
+ * iommu_dma_free - Free a buffer allocated by iommu_dma_alloc()
+ * @dev: Device which owns this buffer
+ * @pages: Array of buffer pages as returned by iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @handle: DMA address of buffer
+ *
+ * Frees both the pages associated with the buffer, and the array
+ * describing them
+ */
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), *handle);
+	__iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+	*handle = DMA_ERROR_CODE;
+}
+
+/**
+ * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
+ * @dev: Device to allocate memory for. Must be a real device
+ *	 attached to an iommu_dma_domain
+ * @size: Size of buffer in bytes
+ * @gfp: Allocation flags
+ * @prot: IOMMU mapping flags
+ * @coherent: Which dma_mask to base IOVA allocation on
+ * @handle: Out argument for allocated DMA handle
+ * @flush_page: Arch callback to flush a single page from caches as
+ *		necessary. May be NULL for coherent allocations
+ *
+ * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
+ * but an IOMMU which supports smaller pages might not map the whole thing.
+ * For now, the buffer is unconditionally zeroed for compatibility
+ *
+ * Return: Array of struct page pointers describing the buffer,
+ *	   or NULL on failure.
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
+		int prot, bool coherent, dma_addr_t *handle,
+		void (*flush_page)(const void *, phys_addr_t))
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct page **pages;
+	struct sg_table sgt;
+	struct sg_mapping_iter miter;
+	dma_addr_t dma_addr;
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+	*handle = DMA_ERROR_CODE;
+
+	pages = __iommu_dma_alloc_pages(count, gfp);
+	if (!pages)
+		return NULL;
+
+	iova = __alloc_iova(dev, size, coherent);
+	if (!iova)
+		goto out_free_pages;
+
+	size = iova_align(iovad, size);
+	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
+		goto out_free_iova;
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot)
+			< size)
+		goto out_free_sg;
+
+	/* Using the non-flushing flag since we're doing our own */
+	sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
+	while (sg_miter_next(&miter)) {
+		memset(miter.addr, 0, PAGE_SIZE);
+		if (flush_page)
+			flush_page(miter.addr, page_to_phys(miter.page));
+	}
+	sg_miter_stop(&miter);
+	sg_free_table(&sgt);
+
+	*handle = dma_addr;
+	return pages;
+
+out_free_sg:
+	sg_free_table(&sgt);
+out_free_iova:
+	__free_iova(iovad, iova);
+out_free_pages:
+	__iommu_dma_free_pages(pages, count);
+	return NULL;
+}
+
+/**
+ * iommu_dma_mmap - Map a buffer into provided user VMA
+ * @pages: Array representing buffer from iommu_dma_alloc()
+ * @size: Size of buffer in bytes
+ * @vma: VMA describing requested userspace mapping
+ *
+ * Maps the pages of the buffer in @pages into @vma. The caller is responsible
+ * for verifying the correct size and protection of @vma beforehand.
+ */
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
+{
+	unsigned long uaddr = vma->vm_start;
+	unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	int ret = -ENXIO;
+
+	for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
+		ret = vm_insert_page(vma, uaddr, pages[i]);
+		if (ret)
+			break;
+		uaddr += PAGE_SIZE;
+	}
+	return ret;
+}
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot, bool coherent)
+{
+	dma_addr_t dma_addr;
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	phys_addr_t phys = page_to_phys(page) + offset;
+	size_t iova_off = iova_offset(iovad, phys);
+	size_t len = iova_align(iovad, size + iova_off);
+	struct iova *iova = __alloc_iova(dev, len, coherent);
+
+	if (!iova)
+		return DMA_ERROR_CODE;
+
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (!iommu_map(domain, dma_addr, phys - iova_off, len, prot))
+		return dev_dma_addr(dev, dma_addr + iova_off);
+
+	__free_iova(iovad, iova);
+	return DMA_ERROR_CODE;
+}
+
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), handle);
+}
+
+/*
+ * Go and look at iommu_dma_map_sg first; It's OK, I'll wait...
+ *
+ * ...right, now that the scatterlist pages are all contiguous from the
+ * device's viewpoint, we can collapse any buffer segments which run
+ * together (subject to the device's segment limitations), filling in
+ * the DMA fields at the same time as we run through the list.
+ */
+static int __finalise_sg(struct device *dev, struct scatterlist *sg, int nents,
+		dma_addr_t dma_addr)
+{
+	struct scatterlist *s, *seg = sg;
+	unsigned long seg_mask = dma_get_seg_boundary(dev);
+	unsigned int max_len = dma_get_max_seg_size(dev);
+	unsigned int seg_len = 0, seg_dma = 0;
+	int i, count = 1;
+
+	for_each_sg(sg, s, nents, i) {
+		/* Un-swizzling the fields here, hence the naming mismatch */
+		unsigned int s_offset = sg_dma_address(s);
+		unsigned int s_length = sg_dma_len(s);
+		unsigned int s_dma_len = s->length;
+
+		s->offset = s_offset;
+		s->length = s_length;
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+
+		/*
+		 * This ensures any concatenation we do doesn't exceed the
+		 * dma_parms limits, but it also won't fail if any segments
+		 * were out of spec to begin with - they'll just stay as-is.
+		 */
+		if (seg_len && (seg_dma + seg_len == dma_addr + s_offset) &&
+		    (seg_len + s_dma_len <= max_len) &&
+		    ((seg_dma & seg_mask) <= seg_mask - (seg_len + s_length))
+		   ) {
+			sg_dma_len(seg) += s_dma_len;
+		} else {
+			if (seg_len) {
+				seg = sg_next(seg);
+				count++;
+			}
+			sg_dma_len(seg) = s_dma_len - s_offset;
+			sg_dma_address(seg) = dma_addr + s_offset;
+
+			seg_len = s_offset;
+			seg_dma = dma_addr + s_offset;
+		}
+		seg_len += s_length;
+		dma_addr += s_dma_len;
+	}
+	return count;
+}
+
+/*
+ * If mapping failed, then just restore the original list,
+ * but making sure the DMA fields are invalidated.
+ */
+static void __invalidate_sg(struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	for_each_sg(sg, s, nents, i) {
+		if (sg_dma_address(s) != DMA_ERROR_CODE)
+			s->offset = sg_dma_address(s);
+		if (sg_dma_len(s))
+			s->length = sg_dma_len(s);
+		sg_dma_address(s) = DMA_ERROR_CODE;
+		sg_dma_len(s) = 0;
+	}
+}
+
+/*
+ * The DMA API client is passing in a scatterlist which could describe
+ * any old buffer layout, but the IOMMU API requires everything to be
+ * aligned to IOMMU pages. Hence the need for this complicated bit of
+ * impedance-matching, to be able to hand off a suitably-aligned list,
+ * but still preserve the original offsets and sizes for the caller.
+ */
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot, bool coherent)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iova_domain *iovad = domain->dma_api_cookie;
+	struct iova *iova;
+	struct scatterlist *s;
+	dma_addr_t dma_addr;
+	size_t iova_len = 0;
+	int i;
+
+	/*
+	 * Work out how much IOVA space we need, and align the segments to
+	 * IOVA granules for the IOMMU driver to handle. With some clever
+	 * trickery we can modify the list in-place, but reversibly, by
+	 * hiding the original data in the as-yet-unused DMA fields.
+	 */
+	for_each_sg(sg, s, nents, i) {
+		size_t s_offset = iova_offset(iovad, s->offset);
+		size_t s_length = s->length;
+
+		sg_dma_address(s) = s->offset;
+		sg_dma_len(s) = s_length;
+		s->offset -= s_offset;
+		s_length = iova_align(iovad, s_length + s_offset);
+		s->length = s_length;
+
+		iova_len += s_length;
+	}
+
+	iova = __alloc_iova(dev, iova_len, coherent);
+	if (!iova)
+		goto out_restore_sg;
+
+	/*
+	 * We'll leave any physical concatenation to the IOMMU driver's
+	 * implementation - it knows better than we do.
+	 */
+	dma_addr = iova_dma_addr(iovad, iova);
+	if (iommu_map_sg(domain, dma_addr, sg, nents, prot) < iova_len)
+		goto out_free_iova;
+
+	return __finalise_sg(dev, sg, nents, dev_dma_addr(dev, dma_addr));
+
+out_free_iova:
+	__free_iova(iovad, iova);
+out_restore_sg:
+	__invalidate_sg(sg, nents);
+	return 0;
+}
+
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
+{
+	/*
+	 * The scatterlist segments are mapped contiguously
+	 * in IOVA space, so this is incredibly easy.
+	 */
+	__iommu_dma_unmap(iommu_get_domain_for_dev(dev), sg_dma_address(sg));
+}
+
+int iommu_dma_supported(struct device *dev, u64 mask)
+{
+	/*
+	 * 'Special' IOMMUs which don't have the same addressing capability
+	 * as the CPU will have to wait until we have some way to query that
+	 * before they'll be able to use this framework.
+	 */
+	return 1;
+}
+
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
+{
+	return dma_addr == DMA_ERROR_CODE;
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
new file mode 100644
index 0000000..0fbefac
--- /dev/null
+++ b/include/linux/dma-iommu.h
@@ -0,0 +1,84 @@
+/*
+ * Copyright (C) 2014-2015 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+#ifndef __DMA_IOMMU_H
+#define __DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/iommu.h>
+
+#ifdef CONFIG_IOMMU_DMA
+
+int iommu_dma_init(void);
+
+/* Domain management interface for IOMMU drivers */
+int iommu_get_dma_cookie(struct iommu_domain *domain);
+void iommu_put_dma_cookie(struct iommu_domain *domain);
+
+/* Setup call for arch DMA mapping code */
+int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, u64 size);
+
+/* General helpers for DMA-API <-> IOMMU-API interaction */
+int dma_direction_to_prot(enum dma_data_direction dir, bool coherent);
+
+/*
+ * These implement the bulk of the relevant DMA mapping callbacks, but require
+ * the arch code to take care of attributes and cache maintenance
+ */
+struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
+		int prot, bool coherent, dma_addr_t *handle,
+		void (*flush_page)(const void *, phys_addr_t));
+void iommu_dma_free(struct device *dev, struct page **pages, size_t size,
+		dma_addr_t *handle);
+
+int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma);
+
+dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
+		unsigned long offset, size_t size, int prot, bool coherent);
+int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
+		int nents, int prot, bool coherent);
+
+/*
+ * Arch code with no special attribute handling may use these
+ * directly as DMA mapping callbacks for simplicity
+ */
+void iommu_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs);
+int iommu_dma_supported(struct device *dev, u64 mask);
+int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
+
+#else
+
+static inline int iommu_dma_init(void)
+{
+	return 0;
+}
+
+static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
+{
+	return 0;
+}
+
+static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
+{
+}
+
+#endif	/* CONFIG_IOMMU_DMA */
+
+#endif	/* __KERNEL__ */
+#endif	/* __DMA_IOMMU_H */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index dc767f7..19eee27 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -81,6 +81,7 @@ struct iommu_domain {
 	iommu_fault_handler_t handler;
 	void *handler_token;
 	struct iommu_domain_geometry geometry;
+	void *dma_api_cookie;
 };
 
 enum iommu_cap {
-- 
1.9.1
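
The in-place scatterlist trick in iommu_dma_map_sg() is easier to follow
in isolation. The sketch below is a user-space approximation, not the
kernel code: struct seg is a hypothetical, cut-down stand-in for struct
scatterlist, GRANULE is an assumed 4K IOVA granule, and sg_restore() is
simplified compared with __invalidate_sg() (it does not use a
DMA_ERROR_CODE sentinel). It shows each segment being widened to granule
boundaries while the original offset and length are stashed in the
not-yet-used DMA fields, so the list can be put back exactly as it was.

```c
#include <stddef.h>

#define GRANULE 4096u	/* assumed IOVA granule size in bytes */

/* Hypothetical, simplified stand-in for struct scatterlist */
struct seg {
	size_t offset;		/* CPU-side offset of the buffer */
	size_t length;		/* CPU-side length in bytes */
	size_t dma_address;	/* unused before mapping: scratch space */
	size_t dma_length;	/* unused before mapping: scratch space */
};

static size_t granule_offset(size_t x)
{
	return x & (GRANULE - 1);
}

static size_t granule_align(size_t x)
{
	return (x + GRANULE - 1) & ~(size_t)(GRANULE - 1);
}

/*
 * Align every segment to the granule in place, hiding the original
 * offset and length in the DMA fields. Returns the total amount of
 * IOVA space the aligned list needs.
 */
static size_t sg_align_in_place(struct seg *sg, int nents)
{
	size_t total = 0;
	int i;

	for (i = 0; i < nents; i++) {
		size_t off = granule_offset(sg[i].offset);

		sg[i].dma_address = sg[i].offset;
		sg[i].dma_length = sg[i].length;
		sg[i].offset -= off;
		sg[i].length = granule_align(sg[i].length + off);
		total += sg[i].length;
	}
	return total;
}

/* Undo sg_align_in_place(): restore the originals, clear the DMA fields */
static void sg_restore(struct seg *sg, int nents)
{
	int i;

	for (i = 0; i < nents; i++) {
		sg[i].offset = sg[i].dma_address;
		sg[i].length = sg[i].dma_length;
		sg[i].dma_address = 0;
		sg[i].dma_length = 0;
	}
}
```

With two segments of (offset 0x100, length 0x1000) and (offset 0, length
0x200), the aligned list needs 0x3000 bytes of IOVA space: the first
segment grows to two granules because of its leading offset, the second
to one. __finalise_sg() then performs the reverse swap on success, which
is why the field names in that function look mismatched.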


* [PATCH v3 3/4] arm64: Add IOMMU dma_ops
  2015-07-10 19:19 [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 1/4] iommu/iova: Avoid over-allocating when size-aligned Robin Murphy
  2015-07-10 19:19 ` [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
@ 2015-07-10 19:19 ` Robin Murphy
  2015-07-15  9:31   ` Catalin Marinas
  2015-07-10 19:19 ` [PATCH v3 4/4] arm64: Hook up " Robin Murphy
  3 siblings, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2015-07-10 19:19 UTC
  To: linux-arm-kernel

Taking some inspiration from the arch/arm code, implement the
arch-specific side of the DMA mapping ops using the new IOMMU-DMA layer.

Unfortunately the device setup code has to start out as a big ugly mess
in order to work usefully right now, as 'proper' operation depends on
changes to device probe and DMA configuration ordering, IOMMU groups for
platform devices, and default domain support in arm/arm64 IOMMU drivers.
The workarounds here need only exist until that work is finished.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/mm/dma-mapping.c | 423 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 423 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index d16a1ce..ccadfd4 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -526,3 +526,426 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+#ifdef CONFIG_IOMMU_DMA
+#include <linux/dma-iommu.h>
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+/* Thankfully, all cache ops are by VA so we can ignore phys here */
+static void flush_page(const void *virt, phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+
+	if (gfp & __GFP_WAIT) {
+		struct page **pages;
+		pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
+					     __pgprot(PROT_NORMAL_NC);
+
+		pgprot = __get_dma_pgprot(attrs, pgprot, coherent);
+		pages = iommu_dma_alloc(dev, size, gfp, ioprot, coherent,
+					handle, coherent ? NULL : flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, size, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = __alloc_from_pool(size, &page, gfp);
+		}
+		if (addr) {
+			*handle = iommu_dma_map_page(dev, page, 0, size,
+						     ioprot, false);
+			if (iommu_dma_mapping_error(dev, *handle)) {
+				if (coherent)
+					__free_pages(page, get_order(size));
+				else
+					__free_from_pool(addr, size);
+				addr = NULL;
+			}
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (__free_from_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+	} else if (is_vmalloc_addr(cpu_addr)) {
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, size, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+					     is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size,
+						 prot, coherent);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_unmap_area(sg_virt(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		__dma_map_area(sg_virt(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent),
+			coherent);
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device isn't yet fully created, and the
+ * IOMMU driver hasn't seen it yet, so we need this delayed attachment
+ * dance. Once IOMMU probe ordering is sorted to move the
+ * arch_setup_dma_ops() call later, all the notifier bits below become
+ * unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	struct iommu_domain *dma_domain;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+	/*
+	 * We expect the list to only contain the most recent addition
+	 * (which should be the same device as in @data) so just process
+	 * the whole thing blindly. If any previous attachments did happen
+	 * to fail, they get a free retry since the domains are still live.
+	 */
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (iommu_attach_device(master->dma_domain, master->dev)) {
+			pr_warn("Failed to attach device %s to IOMMU mapping; retaining platform DMA ops\n",
+				dev_name(master->dev));
+		} else {
+			master->dev->archdata.dma_ops = &iommu_dma_ops;
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	int ret;
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int queue_iommu_attach(struct iommu_domain *domain, struct device *dev)
+{
+	struct iommu_dma_notifier_data *iommudata = NULL;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return -ENOMEM;
+
+	iommudata->dev = dev;
+	iommudata->dma_domain = domain;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init arm64_iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+	return ret;
+}
+arch_initcall(arm64_iommu_dma_init);
+
+/* Hijack some domain feature flags for the stop-gap meddling below */
+#define __IOMMU_DOMAIN_ARM64		(1U << 31)
+#define __IOMMU_DOMAIN_ARM64_IOVA	(1U << 30)
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_domain *domain;
+	int err;
+
+	if (!ops)
+		return;
+	/*
+	 * In a perfect world, everything happened in the right order up to
+	 * here, and the IOMMU core has already attached the device to an
+	 * appropriate default domain for us to set up...
+	 */
+	domain = iommu_get_domain_for_dev(dev);
+	if (!domain) {
+		/*
+		 * Urgh. Reliable default domains for platform devices can't
+		 * happen anyway without some sensible way of handling
+		 * non-trivial groups. So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			domain = ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_ARM64;
+		if (!domain->dma_api_cookie) {
+			domain->type |= __IOMMU_DOMAIN_ARM64_IOVA;
+			if (iommu_get_dma_cookie(domain))
+				goto out_put_domain;
+		}
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size)) {
+		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
+			size, dev_name(dev));
+		goto out_put_domain;
+	}
+
+	if (dev->iommu_group)
+		err = iommu_attach_device(domain, dev);
+	else
+		err = queue_iommu_attach(domain, dev);
+
+	if (!err) {
+		dev->archdata.dma_ops = &iommu_dma_ops;
+		return;
+	}
+
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+		iommu_put_dma_cookie(domain);
+	if (domain->type & __IOMMU_DOMAIN_ARM64)
+		ops->domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
+}
+
+#else
+
+static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  struct iommu_ops *iommu)
+{ }
+
+#endif  /* CONFIG_IOMMU_DMA */
-- 
1.9.1


* [PATCH v3 4/4] arm64: Hook up IOMMU dma_ops
  2015-07-10 19:19 [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
                   ` (2 preceding siblings ...)
  2015-07-10 19:19 ` [PATCH v3 3/4] arm64: Add IOMMU dma_ops Robin Murphy
@ 2015-07-10 19:19 ` Robin Murphy
  3 siblings, 0 replies; 11+ messages in thread
From: Robin Murphy @ 2015-07-10 19:19 UTC (permalink / raw)
  To: linux-arm-kernel

With iommu_dma_ops in place, hook them up to the configuration code, so
IOMMU-fronted devices will get them automatically.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/Kconfig                   |  1 +
 arch/arm64/include/asm/dma-mapping.h | 15 +++++++--------
 arch/arm64/mm/dma-mapping.c          | 24 ++++++++++++++++++++++++
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0f6edb1..3509621 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -72,6 +72,7 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE
 	select HAVE_SYSCALL_TRACEPOINTS
+	select IOMMU_DMA if IOMMU_SUPPORT
 	select IRQ_DOMAIN
 	select IRQ_FORCED_THREADING
 	select MODULES_USE_ELF_RELA
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index f0d6d0b..7f9edcb 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -56,16 +56,15 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				      struct iommu_ops *iommu, bool coherent)
-{
-	if (!acpi_disabled && !dev->archdata.dma_ops)
-		dev->archdata.dma_ops = dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-}
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
+#ifdef CONFIG_IOMMU_DMA
+void arch_teardown_dma_ops(struct device *dev);
+#define arch_teardown_dma_ops	arch_teardown_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index ccadfd4..1e6085a 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -942,6 +942,20 @@ out_no_domain:
 	pr_warn("Failed to set up IOMMU domain for device %s\n", dev_name(dev));
 }
 
+void arch_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_ARM64_IOVA)
+			iommu_put_dma_cookie(domain);
+		if (domain->type & __IOMMU_DOMAIN_ARM64)
+			iommu_domain_free(domain);
+		dev->archdata.dma_ops = NULL;
+	}
+}
+
 #else
 
 static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
@@ -949,3 +963,13 @@ static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 { }
 
 #endif  /* CONFIG_IOMMU_DMA */
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
+{
+	if (!acpi_disabled && !dev->archdata.dma_ops)
+		dev->archdata.dma_ops = dma_ops;
+
+	dev->archdata.dma_coherent = coherent;
+	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+}
-- 
1.9.1


* [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-10 19:19 ` [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
@ 2015-07-13 12:34   ` Yong Wu
  2015-07-14 17:16   ` Catalin Marinas
  1 sibling, 0 replies; 11+ messages in thread
From: Yong Wu @ 2015-07-13 12:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, 2015-07-10 at 20:19 +0100, Robin Murphy wrote:
> Taking inspiration from the existing arch/arm code, break out some
> generic functions to interface the DMA-API to the IOMMU-API. This will
> do the bulk of the heavy lifting for IOMMU-backed dma-mapping.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/Kconfig     |   7 +
>  drivers/iommu/Makefile    |   1 +
>  drivers/iommu/dma-iommu.c | 536 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dma-iommu.h |  84 ++++++++
>  include/linux/iommu.h     |   1 +
>  5 files changed, 629 insertions(+)
>  create mode 100644 drivers/iommu/dma-iommu.c
>  create mode 100644 include/linux/dma-iommu.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index f1fb1d3..efb0e66 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -48,6 +48,13 @@ config OF_IOMMU
>         def_bool y
>         depends on OF && IOMMU_API
>  
> +# IOMMU-agnostic DMA-mapping layer
> +config IOMMU_DMA
> +	bool
> +	depends on NEED_SG_DMA_LENGTH
> +	select IOMMU_API
> +	select IOMMU_IOVA
> +
>  config FSL_PAMU
>  	bool "Freescale IOMMU support"
>  	depends on PPC32
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index c6dcc51..f465cfb 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,6 +1,7 @@
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> +obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
[...]
> +/**
> + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
> + * @dev: Device to allocate memory for. Must be a real device
> + *	 attached to an iommu_dma_domain
> + * @size: Size of buffer in bytes
> + * @gfp: Allocation flags
> + * @prot: IOMMU mapping flags
> + * @coherent: Which dma_mask to base IOVA allocation on
> + * @handle: Out argument for allocated DMA handle
> + * @flush_page: Arch callback to flush a single page from caches as
> + *		necessary. May be NULL for coherent allocations
> + *
> + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> + * but an IOMMU which supports smaller pages might not map the whole thing.
> + * For now, the buffer is unconditionally zeroed for compatibility
> + *
> + * Return: Array of struct page pointers describing the buffer,
> + *	   or NULL on failure.
> + */
> +struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> +		int prot, bool coherent, dma_addr_t *handle,
> +		void (*flush_page)(const void *, phys_addr_t))
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

Compared with DMA-v2, "struct iommu_dma_domain" has been deleted and the
iommu domain is now obtained from iommu_group->domain, so I have to
create an iommu_group.
Since struct iommu_group is defined in iommu.c, I cannot write something
like this: group->domain = ****.

After checking, I have to use iommu_attach_group.
Our code may then look like this:
//====
static int mtk_iommu_add_device(struct device *dev)
{
      *******
	if (!dev->archdata.dma_ops)/* Not a iommu client device */
		return -ENODEV;
	group = iommu_group_get(dev);
	if (!group)
		group = iommu_group_alloc();

	ret = iommu_group_add_device(group, dev);

	/* get the mtk_iommu_domain from the master iommu device */
	mtkdom = ****;
	/* attach the iommu domain */
	iommu_attach_group(&mtkdom->domain, group);
	iommu_group_put(group);
	return ret;
}

static int mtk_iommu_attach_device(struct iommu_domain *domain,
				   struct device *dev)
{
	struct mtk_iommu_domain *priv = to_mtk_domain(domain), *imudom;
	struct iommu_group *group;

	/* Reserve one iommu domain as the m4u domain which all 
	 * Multimedia modules share and free the others */
	if (!imudev->archdata.iommu)
		imudev->archdata.iommu = priv;
	else if (imudev->archdata.iommu != priv)
		iommu_domain_free(domain);

	group = iommu_group_get(dev);
	/*
	 * Return 0 while the attach device is from __iommu_attach_notifier.
	 * The iommu_group will be created in add_device after
	 * mtk-iommu-probe.
	 */
	if (!group)
		return 0;
	iommu_group_put(group);

	mtk_iommu_init_domain_context(priv); /* init the pagetable */
	mtk_iommu_config(priv, dev, true); /* config the iommu info */

	return 0;
}
//====
    Is this OK? I'm preparing the next patch along these lines; could you
give some suggestions about the flow?
    Thanks very much.

> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	struct iova *iova;
> +	struct page **pages;
> +	struct sg_table sgt;
> +	struct sg_mapping_iter miter;
> +	dma_addr_t dma_addr;
> +	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> +	*handle = DMA_ERROR_CODE;
> +
> +	pages = __iommu_dma_alloc_pages(count, gfp);
> +	if (!pages)
> +		return NULL;
> +
> +	iova = __alloc_iova(dev, size, coherent);
> +	if (!iova)
> +		goto out_free_pages;
> +
> +	size = iova_align(iovad, size);
> +	if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
> +		goto out_free_iova;
> +
> +	dma_addr = iova_dma_addr(iovad, iova);
> +	if (iommu_map_sg(domain, dma_addr, sgt.sgl, sgt.orig_nents, prot)
> +			< size)
> +		goto out_free_sg;
> +
> +	/* Using the non-flushing flag since we're doing our own */
> +	sg_miter_start(&miter, sgt.sgl, sgt.orig_nents, SG_MITER_FROM_SG);
> +	while (sg_miter_next(&miter)) {
> +		memset(miter.addr, 0, PAGE_SIZE);
> +		if (flush_page)
> +			flush_page(miter.addr, page_to_phys(miter.page));
> +	}
> +	sg_miter_stop(&miter);
> +	sg_free_table(&sgt);
> +
> +	*handle = dma_addr;
> +	return pages;
> +
> +out_free_sg:
> +	sg_free_table(&sgt);
> +out_free_iova:
> +	__free_iova(iovad, iova);
> +out_free_pages:
> +	__iommu_dma_free_pages(pages, count);
> +	return NULL;
> +}
> +
[...]
>  enum iommu_cap {


* [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-10 19:19 ` [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
  2015-07-13 12:34   ` Yong Wu
@ 2015-07-14 17:16   ` Catalin Marinas
  2015-07-15 15:50     ` Robin Murphy
  1 sibling, 1 reply; 11+ messages in thread
From: Catalin Marinas @ 2015-07-14 17:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 10, 2015 at 08:19:33PM +0100, Robin Murphy wrote:
> +/*
> + * IOVAs are IOMMU _input_ addresses, so there still exists the possibility
> + * for static bus translation between device output and IOMMU input (yuck).
> + */
> +static inline dma_addr_t dev_dma_addr(struct device *dev, dma_addr_t addr)
> +{
> +	dma_addr_t offset = (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
> +
> +	BUG_ON(addr < offset);
> +	return addr - offset;
> +}

Are these just theoretical or you expect to see some at some point? I
think the dma_limit in __alloc_iova() may also need to take the offset
into account (or just ignore them altogether for now).

> +
> +/**
> + * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
> + * @dir: Direction of DMA transfer
> + * @coherent: Is the DMA master cache-coherent?
> + *
> + * Return: corresponding IOMMU API page protection flags
> + */
> +int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
> +{
> +	int prot = coherent ? IOMMU_CACHE : 0;
> +
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		return prot | IOMMU_READ | IOMMU_WRITE;
> +	case DMA_TO_DEVICE:
> +		return prot | IOMMU_READ;
> +	case DMA_FROM_DEVICE:
> +		return prot | IOMMU_WRITE;
> +	default:
> +		return 0;
> +	}
> +}
> +
> +static struct iova *__alloc_iova(struct device *dev, size_t size, bool coherent)
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	unsigned long shift = iova_shift(iovad);
> +	unsigned long length = iova_align(iovad, size) >> shift;
> +	u64 dma_limit = coherent ? dev->coherent_dma_mask : dma_get_mask(dev);

"coherent" and "coherent_dma_mask" have related meanings here. As I can
see in patch 3, the coherent information passed all the way to this
function states whether the device is cache coherent or not (and whether
cache maintenance is needed). The coherent_dma_mask refers to an
allocation mask for the dma_alloc_coherent() API but that doesn't
necessarily mean that the device is cache coherent. Similarly, dma_mask
is used for streaming DMA.

You can rename it to coherent_api or simply pass a u64 dma_mask
directly.
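
A minimal userspace sketch of the second option, in which the caller picks
the mask and the allocator only ever sees a plain u64 limit. Nothing here
is the patch's code: struct toy_device, toy_iova_limit() and the
domain_end clamp are hypothetical stand-ins, used only to show that the
choice of mask follows the API in use and not the cache_coherent property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the struct device fields that matter here. */
struct toy_device {
	uint64_t dma_mask;          /* limit for streaming DMA (dma_map_*) */
	uint64_t coherent_dma_mask; /* limit for dma_alloc_coherent() buffers */
	bool cache_coherent;        /* hardware property, unrelated to either mask */
};

/*
 * The allocator only needs an upper bound. Clamping to the end of the
 * domain is an assumption of this sketch, not something the patch shows.
 */
static uint64_t toy_iova_limit(uint64_t dma_mask, uint64_t domain_end)
{
	assert(domain_end != 0);
	return dma_mask < domain_end ? dma_mask : domain_end;
}

/* Coherent-API allocations pass coherent_dma_mask... */
static uint64_t toy_alloc_limit(const struct toy_device *dev,
				uint64_t domain_end)
{
	return toy_iova_limit(dev->coherent_dma_mask, domain_end);
}

/* ...and streaming mappings pass dma_mask, whatever cache_coherent says. */
static uint64_t toy_map_limit(const struct toy_device *dev,
			      uint64_t domain_end)
{
	return toy_iova_limit(dev->dma_mask, domain_end);
}
```

With this shape, a cache-coherent device doing a streaming mapping is
still bounded by dma_mask, which is the case the bool parameter gets wrong.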

[...]
> +/**
> + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
> + * @dev: Device to allocate memory for. Must be a real device
> + *	 attached to an iommu_dma_domain
> + * @size: Size of buffer in bytes
> + * @gfp: Allocation flags
> + * @prot: IOMMU mapping flags
> + * @coherent: Which dma_mask to base IOVA allocation on
> + * @handle: Out argument for allocated DMA handle
> + * @flush_page: Arch callback to flush a single page from caches as
> + *		necessary. May be NULL for coherent allocations
> + *
> + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
> + * but an IOMMU which supports smaller pages might not map the whole thing.
> + * For now, the buffer is unconditionally zeroed for compatibility
> + *
> + * Return: Array of struct page pointers describing the buffer,
> + *	   or NULL on failure.
> + */
> +struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
> +		int prot, bool coherent, dma_addr_t *handle,
> +		void (*flush_page)(const void *, phys_addr_t))

So for this function, coherent should always be true since this is only
used with the coherent DMA API AFAICT.

> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	struct iova *iova;
> +	struct page **pages;
> +	struct sg_table sgt;
> +	struct sg_mapping_iter miter;
> +	dma_addr_t dma_addr;
> +	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
> +
> +	*handle = DMA_ERROR_CODE;
> +
> +	pages = __iommu_dma_alloc_pages(count, gfp);
> +	if (!pages)
> +		return NULL;
> +
> +	iova = __alloc_iova(dev, size, coherent);

And here just __alloc_iova(dev, size, true);

[...]
> +dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> +		unsigned long offset, size_t size, int prot, bool coherent)
> +{
> +	dma_addr_t dma_addr;
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	phys_addr_t phys = page_to_phys(page) + offset;
> +	size_t iova_off = iova_offset(iovad, phys);
> +	size_t len = iova_align(iovad, size + iova_off);
> +	struct iova *iova = __alloc_iova(dev, len, coherent);

Here __alloc_iova(dev, len, false);

[...]
> +/*
> + * The DMA API client is passing in a scatterlist which could describe
> + * any old buffer layout, but the IOMMU API requires everything to be
> + * aligned to IOMMU pages. Hence the need for this complicated bit of
> + * impedance-matching, to be able to hand off a suitably-aligned list,
> + * but still preserve the original offsets and sizes for the caller.
> + */
> +int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
> +		int nents, int prot, bool coherent)
> +{
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iova_domain *iovad = domain->dma_api_cookie;
> +	struct iova *iova;
> +	struct scatterlist *s;
> +	dma_addr_t dma_addr;
> +	size_t iova_len = 0;
> +	int i;
> +
> +	/*
> +	 * Work out how much IOVA space we need, and align the segments to
> +	 * IOVA granules for the IOMMU driver to handle. With some clever
> +	 * trickery we can modify the list in-place, but reversibly, by
> +	 * hiding the original data in the as-yet-unused DMA fields.
> +	 */
> +	for_each_sg(sg, s, nents, i) {
> +		size_t s_offset = iova_offset(iovad, s->offset);
> +		size_t s_length = s->length;
> +
> +		sg_dma_address(s) = s->offset;
> +		sg_dma_len(s) = s_length;
> +		s->offset -= s_offset;
> +		s_length = iova_align(iovad, s_length + s_offset);
> +		s->length = s_length;
> +
> +		iova_len += s_length;
> +	}
> +
> +	iova = __alloc_iova(dev, iova_len, coherent);

__alloc_iova(dev, iova_len, false);
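
For reference, the in-place but reversible alignment described in the
quoted comment can be modelled in userspace roughly as below. This is only
a sketch under stated assumptions: struct toy_sg is a hypothetical,
cut-down scatterlist entry, the granule is assumed to be a power of two,
and toy_restore_segments() is my guess at what undoing the trick looks
like, since the restoring half is not part of the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scatterlist entry: only the fields the trick touches. */
struct toy_sg {
	size_t offset;      /* CPU-side offset of the segment */
	size_t length;      /* CPU-side length in bytes */
	size_t dma_address; /* unused until mapped, so free to borrow */
	size_t dma_len;     /* likewise */
};

/* Align every segment to @granule and return the total IOVA bytes needed. */
static size_t toy_align_segments(struct toy_sg *sg, int nents, size_t granule)
{
	size_t total = 0;

	assert(granule && !(granule & (granule - 1)));
	for (int i = 0; i < nents; i++) {
		size_t off = sg[i].offset & (granule - 1);

		/* Stash the caller's view in the not-yet-used DMA fields. */
		sg[i].dma_address = sg[i].offset;
		sg[i].dma_len = sg[i].length;

		/* Round the start down and the end up to granule boundaries. */
		sg[i].offset -= off;
		sg[i].length = (sg[i].length + off + granule - 1) &
			       ~(granule - 1);
		total += sg[i].length;
	}
	return total;
}

/* Put the original offsets and lengths back, e.g. after a failed map. */
static void toy_restore_segments(struct toy_sg *sg, int nents)
{
	for (int i = 0; i < nents; i++) {
		sg[i].offset = sg[i].dma_address;
		sg[i].length = sg[i].dma_len;
	}
}
```

The point of borrowing the DMA fields is that no side allocation is needed
to remember the caller's layout while the list is handed to the IOMMU
driver.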

-- 
Catalin


* [PATCH v3 3/4] arm64: Add IOMMU dma_ops
  2015-07-10 19:19 ` [PATCH v3 3/4] arm64: Add IOMMU dma_ops Robin Murphy
@ 2015-07-15  9:31   ` Catalin Marinas
  2015-07-15 16:27     ` Robin Murphy
  0 siblings, 1 reply; 11+ messages in thread
From: Catalin Marinas @ 2015-07-15  9:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jul 10, 2015 at 08:19:34PM +0100, Robin Murphy wrote:
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index d16a1ce..ccadfd4 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -526,3 +526,426 @@ static int __init dma_debug_do_init(void)
>  	return 0;
>  }
>  fs_initcall(dma_debug_do_init);
> +
> +
> +#ifdef CONFIG_IOMMU_DMA
> +#include <linux/dma-iommu.h>
> +#include <linux/platform_device.h>
> +#include <linux/amba/bus.h>
> +
> +/* Thankfully, all cache ops are by VA so we can ignore phys here */
> +static void flush_page(const void *virt, phys_addr_t phys)
> +{
> +	__dma_flush_range(virt, virt + PAGE_SIZE);
> +}
> +
> +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
> +				 dma_addr_t *handle, gfp_t gfp,
> +				 struct dma_attrs *attrs)
> +{
> +	bool coherent = is_device_dma_coherent(dev);
> +	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
> +	void *addr;
> +
> +	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
> +		return NULL;
> +
> +	if (gfp & __GFP_WAIT) {
> +		struct page **pages;
> +		pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
> +					     __pgprot(PROT_NORMAL_NC);
> +
> +		pgprot = __get_dma_pgprot(attrs, pgprot, coherent);
> +		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	coherent,
> +					handle, coherent ? NULL : flush_page);

As I replied already on the other patch, the "coherent" argument here
should always be true.

BTW, why do we need to call flush_page via iommu_dma_alloc() and not
flush the buffer directly in the arch __iommu_alloc_attrs()? We already
have the pointer and size after remapping in the CPU address space, so
it would keep iommu_dma_alloc() simpler.

> +		if (!pages)
> +			return NULL;
> +
> +		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
> +					      __builtin_return_address(0));
> +		if (!addr)
> +			iommu_dma_free(dev, pages, size, handle);
> +	} else {
> +		struct page *page;
> +		/*
> +		 * In atomic context we can't remap anything, so we'll only
> +		 * get the virtually contiguous buffer we need by way of a
> +		 * physically contiguous allocation.
> +		 */
> +		if (coherent) {
> +			page = alloc_pages(gfp, get_order(size));
> +			addr = page ? page_address(page) : NULL;

We could even use __get_free_pages(gfp & ~__GFP_HIGHMEM) since we don't
have/need highmem on arm64.

> +		} else {
> +			addr = __alloc_from_pool(size, &page, gfp);
> +		}
> +		if (addr) {
> +			*handle = iommu_dma_map_page(dev, page, 0, size,
> +						     ioprot, false);

Why coherent == false?

> +			if (iommu_dma_mapping_error(dev, *handle)) {
> +				if (coherent)
> +					__free_pages(page, get_order(size));
> +				else
> +					__free_from_pool(addr, size);
> +				addr = NULL;
> +			}
> +		}
> +	}
> +	return addr;
> +}

In the second case here (!__GFP_WAIT), do we do any cache maintenance? I
can't see it and it's needed for the !coherent case.

> +static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
> +			       dma_addr_t handle, struct dma_attrs *attrs)
> +{
> +	/*
> +	 * @cpu_addr will be one of 3 things depending on how it was allocated:
> +	 * - A remapped array of pages from iommu_dma_alloc(), for all
> +	 *   non-atomic allocations.
> +	 * - A non-cacheable alias from the atomic pool, for atomic
> +	 *   allocations by non-coherent devices.
> +	 * - A normal lowmem address, for atomic allocations by
> +	 *   coherent devices.
> +	 * Hence how dodgy the below logic looks...
> +	 */
> +	if (__free_from_pool(cpu_addr, size)) {
> +		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
> +	} else if (is_vmalloc_addr(cpu_addr)){
> +		struct vm_struct *area = find_vm_area(cpu_addr);
> +
> +		if (WARN_ON(!area || !area->pages))
> +			return;
> +		iommu_dma_free(dev, area->pages, size, &handle);
> +		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
> +	} else {
> +		__free_pages(virt_to_page(cpu_addr), get_order(size));
> +		iommu_dma_unmap_page(dev, handle, size, 0, NULL);

Just slightly paranoid but it's better to unmap the page from the iommu
space before freeing (in case there is some rogue device still accessing
it).
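
To illustrate the ordering concern, here is a toy userspace model (all
names hypothetical, nothing here is kernel API). It tracks whether freed
pages were ever still reachable through a live IOMMU mapping, which is the
window a rogue device could exploit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one buffer with one IOMMU mapping. */
struct toy_buf {
	bool mapped;  /* device can still reach the pages via the IOMMU */
	bool freed;   /* pages have been returned to the allocator */
	bool exposed; /* freed pages were reachable by the device at some point */
};

static void toy_unmap(struct toy_buf *b)
{
	b->mapped = false;
}

static void toy_free(struct toy_buf *b)
{
	assert(!b->freed);	/* this model does not allow a double free */
	b->freed = true;
	if (b->mapped)
		b->exposed = true;	/* freed while still mapped */
}

/* The suggested order: tear down the mapping first, then free. */
static void toy_release(struct toy_buf *b)
{
	toy_unmap(b);
	toy_free(b);
}
```

Freeing first and unmapping afterwards sets exposed in this model, whereas
toy_release() never does.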

-- 
Catalin


* [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping
  2015-07-14 17:16   ` Catalin Marinas
@ 2015-07-15 15:50     ` Robin Murphy
  0 siblings, 0 replies; 11+ messages in thread
From: Robin Murphy @ 2015-07-15 15:50 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Catalin,

Thanks for the review.

On 14/07/15 18:16, Catalin Marinas wrote:
> On Fri, Jul 10, 2015 at 08:19:33PM +0100, Robin Murphy wrote:
>> +/*
>> + * IOVAs are IOMMU _input_ addresses, so there still exists the possibility
>> + * for static bus translation between device output and IOMMU input (yuck).
>> + */
>> +static inline dma_addr_t dev_dma_addr(struct device *dev, dma_addr_t addr)
>> +{
>> +	dma_addr_t offset = (dma_addr_t)dev->dma_pfn_offset << PAGE_SHIFT;
>> +
>> +	BUG_ON(addr < offset);
>> +	return addr - offset;
>> +}
>
> Are these just theoretical or you expect to see some at some point? I
> think the dma_limit in __alloc_iova() may also need to take the offset
> into account (or just ignore them altogether for now).

Right now I'm not aware of any platform which has both DMA offsets and 
an IOMMU on the same bus, and I would really hope it stays that way. 
This is just extra complication out of attempting to cover every 
possibility, and you're right about the alloc_iova oversight. I'll rip 
it out for simplicity, and remain hopeful that nobody ever builds 
anything mad enough to need it put back.
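
For the record, a rough user-space sketch of what accounting for the offset might have looked like, in case anyone does need it put back. Everything named sketch_* is hypothetical and not part of the patch; it assumes the IOVA limit would simply be the device's mask moved up by the static bus offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two struct device fields involved. */
struct sketch_dev {
	uint64_t dma_mask;       /* highest address the device itself can emit */
	uint64_t dma_pfn_offset; /* static bus offset, in pages */
};

static uint64_t sketch_bus_offset(const struct sketch_dev *dev)
{
	return dev->dma_pfn_offset << SKETCH_PAGE_SHIFT;
}

/*
 * IOVA -> address the device must emit, as dev_dma_addr() does in the
 * patch, but reporting failure instead of calling BUG_ON().
 */
static bool sketch_dev_dma_addr(const struct sketch_dev *dev, uint64_t iova,
				uint64_t *out)
{
	uint64_t offset = sketch_bus_offset(dev);

	assert(out != NULL);
	if (iova < offset)
		return false;
	*out = iova - offset;
	return true;
}

/* Highest IOVA the device can reach: its mask shifted up by the offset. */
static uint64_t sketch_iova_limit(const struct sketch_dev *dev)
{
	uint64_t offset = sketch_bus_offset(dev);

	if (dev->dma_mask > UINT64_MAX - offset)
		return UINT64_MAX; /* saturate rather than wrap */
	return dev->dma_mask + offset;
}
```

With a 32-bit mask and a 2GB offset, the usable IOVA window in this model runs from 0x80000000 up to 0x17fffffff, which is the part the current __alloc_iova() does not account for.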

>> +
>> +/**
>> + * dma_direction_to_prot - Translate DMA API directions to IOMMU API page flags
>> + * @dir: Direction of DMA transfer
>> + * @coherent: Is the DMA master cache-coherent?
>> + *
>> + * Return: corresponding IOMMU API page protection flags
>> + */
>> +int dma_direction_to_prot(enum dma_data_direction dir, bool coherent)
>> +{
>> +	int prot = coherent ? IOMMU_CACHE : 0;
>> +
>> +	switch (dir) {
>> +	case DMA_BIDIRECTIONAL:
>> +		return prot | IOMMU_READ | IOMMU_WRITE;
>> +	case DMA_TO_DEVICE:
>> +		return prot | IOMMU_READ;
>> +	case DMA_FROM_DEVICE:
>> +		return prot | IOMMU_WRITE;
>> +	default:
>> +		return 0;
>> +	}
>> +}
>> +
>> +static struct iova *__alloc_iova(struct device *dev, size_t size, bool coherent)
>> +{
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	unsigned long shift = iova_shift(iovad);
>> +	unsigned long length = iova_align(iovad, size) >> shift;
>> +	u64 dma_limit = coherent ? dev->coherent_dma_mask : dma_get_mask(dev);
>
> "coherent" and "coherent_dma_mask" have unrelated meanings here. As I can
> see in patch 3, the coherent information passed all the way to this
> function states whether the device is cache coherent or not (and whether
> cache maintenance is needed). The coherent_dma_mask refers to an
> allocation mask for the dma_alloc_coherent() API but that doesn't
> necessarily mean that the device is cache coherent. Similarly, dma_mask
> is used for streaming DMA.
>
> You can rename it to coherent_api or simply pass a u64 dma_mask
> directly.

Bleh, it seems that at some point along the way I got confused and 
started mistakenly thinking the DMA masks were about the device's 
ability to issue coherent/non-coherent transactions. I'll clean up the 
mess...
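
As a rough illustration of the "pass a u64 dma_mask directly" option, here is a user-space sketch, not the reworked patch. The sketch_* names and the struct are hypothetical; the only point is that the caller picks the mask by API (coherent vs. streaming), and the device's cache coherency never enters into it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the three separate per-device properties. */
struct sketch_dev {
	uint64_t dma_mask;          /* mask for the streaming DMA API */
	uint64_t coherent_dma_mask; /* mask for dma_alloc_coherent() */
	bool cache_coherent;        /* hardware property, unrelated to masks */
};

/* Highest allocatable IOVA page frame for a given mask and granule shift. */
static uint64_t sketch_limit_pfn(uint64_t dma_mask, unsigned int shift)
{
	assert(shift < 64);
	return dma_mask >> shift;
}

/* Coherent-API callers (the alloc path) pass the coherent mask... */
static uint64_t sketch_alloc_limit_pfn(const struct sketch_dev *dev,
				       unsigned int shift)
{
	return sketch_limit_pfn(dev->coherent_dma_mask, shift);
}

/* ...while streaming callers (map_page, map_sg) pass the streaming mask. */
static uint64_t sketch_streaming_limit_pfn(const struct sketch_dev *dev,
					   unsigned int shift)
{
	return sketch_limit_pfn(dev->dma_mask, shift);
}
```

Note that cache_coherent is deliberately unused by both helpers; it would only influence the IOMMU prot flags and the cache maintenance.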

> [...]
>> +/**
>> + * iommu_dma_alloc - Allocate and map a buffer contiguous in IOVA space
>> + * @dev: Device to allocate memory for. Must be a real device
>> + *	 attached to an iommu_dma_domain
>> + * @size: Size of buffer in bytes
>> + * @gfp: Allocation flags
>> + * @prot: IOMMU mapping flags
>> + * @coherent: Which dma_mask to base IOVA allocation on
>> + * @handle: Out argument for allocated DMA handle
>> + * @flush_page: Arch callback to flush a single page from caches as
>> + *		necessary. May be NULL for coherent allocations
>> + *
>> + * If @size is less than PAGE_SIZE, then a full CPU page will be allocated,
>> + * but an IOMMU which supports smaller pages might not map the whole thing.
>> + * For now, the buffer is unconditionally zeroed for compatibility
>> + *
>> + * Return: Array of struct page pointers describing the buffer,
>> + *	   or NULL on failure.
>> + */
>> +struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
>> +		int prot, bool coherent, dma_addr_t *handle,
>> +		void (*flush_page)(const void *, phys_addr_t))
>
> So for this function, coherent should always be true since this is only
> used with the coherent DMA API AFAICT.

Indeed.

>> +{
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	struct iova *iova;
>> +	struct page **pages;
>> +	struct sg_table sgt;
>> +	struct sg_mapping_iter miter;
>> +	dma_addr_t dma_addr;
>> +	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
>> +
>> +	*handle = DMA_ERROR_CODE;
>> +
>> +	pages = __iommu_dma_alloc_pages(count, gfp);
>> +	if (!pages)
>> +		return NULL;
>> +
>> +	iova = __alloc_iova(dev, size, coherent);
>
> And here just __alloc_iova(dev, size, true);

In fact, everything it wanted dev for is now available at all the 
callsites, so I'll rejig the whole interface.

Robin.

> [...]
>> +dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>> +		unsigned long offset, size_t size, int prot, bool coherent)
>> +{
>> +	dma_addr_t dma_addr;
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	phys_addr_t phys = page_to_phys(page) + offset;
>> +	size_t iova_off = iova_offset(iovad, phys);
>> +	size_t len = iova_align(iovad, size + iova_off);
>> +	struct iova *iova = __alloc_iova(dev, len, coherent);
>
> Here __alloc_iova(dev, len, false);
>
> [...]
>> +/*
>> + * The DMA API client is passing in a scatterlist which could describe
>> + * any old buffer layout, but the IOMMU API requires everything to be
>> + * aligned to IOMMU pages. Hence the need for this complicated bit of
>> + * impedance-matching, to be able to hand off a suitably-aligned list,
>> + * but still preserve the original offsets and sizes for the caller.
>> + */
>> +int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg,
>> +		int nents, int prot, bool coherent)
>> +{
>> +	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>> +	struct iova_domain *iovad = domain->dma_api_cookie;
>> +	struct iova *iova;
>> +	struct scatterlist *s;
>> +	dma_addr_t dma_addr;
>> +	size_t iova_len = 0;
>> +	int i;
>> +
>> +	/*
>> +	 * Work out how much IOVA space we need, and align the segments to
>> +	 * IOVA granules for the IOMMU driver to handle. With some clever
>> +	 * trickery we can modify the list in-place, but reversibly, by
>> +	 * hiding the original data in the as-yet-unused DMA fields.
>> +	 */
>> +	for_each_sg(sg, s, nents, i) {
>> +		size_t s_offset = iova_offset(iovad, s->offset);
>> +		size_t s_length = s->length;
>> +
>> +		sg_dma_address(s) = s->offset;
>> +		sg_dma_len(s) = s_length;
>> +		s->offset -= s_offset;
>> +		s_length = iova_align(iovad, s_length + s_offset);
>> +		s->length = s_length;
>> +
>> +		iova_len += s_length;
>> +	}
>> +
>> +	iova = __alloc_iova(dev, iova_len, coherent);
>
> __alloc_iova(dev, iova_len, false);
>
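
The in-place segment alignment quoted above can be modelled in user space roughly as follows. This is only a sketch: struct sketch_sg and both functions are hypothetical stand-ins for struct scatterlist and the kernel helpers, the granule is assumed to be a power of two, and the restore step is an assumption about how the "reversible" part could be undone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scatterlist fields the loop touches. */
struct sketch_sg {
	size_t offset;      /* CPU-side offset of the segment */
	size_t length;      /* CPU-side length of the segment */
	size_t dma_address; /* unused before mapping, borrowed as scratch */
	size_t dma_len;     /* likewise */
};

/*
 * Align every segment to the granule, stashing the caller's original
 * offset and length in the DMA fields. Returns the total IOVA space
 * needed to map the aligned list.
 */
static size_t sketch_align_segments(struct sketch_sg *sg, int nents,
				    size_t granule)
{
	size_t mask = granule - 1;
	size_t total = 0;
	int i;

	assert(granule != 0 && (granule & mask) == 0);
	for (i = 0; i < nents; i++) {
		size_t s_offset = sg[i].offset & mask;

		sg[i].dma_address = sg[i].offset;
		sg[i].dma_len = sg[i].length;
		sg[i].offset -= s_offset;
		sg[i].length = (sg[i].length + s_offset + mask) & ~mask;
		total += sg[i].length;
	}
	return total;
}

/* Undo the above, e.g. after a failed mapping attempt. */
static void sketch_restore_segments(struct sketch_sg *sg, int nents)
{
	int i;

	for (i = 0; i < nents; i++) {
		sg[i].offset = sg[i].dma_address;
		sg[i].length = sg[i].dma_len;
	}
}
```

For example, with a 4K granule a segment at offset 0x234 of length 0x1000 becomes offset 0 and length 0x2000, while the original 0x234/0x1000 pair stays recoverable from the DMA fields.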

* [PATCH v3 3/4] arm64: Add IOMMU dma_ops
  2015-07-15  9:31   ` Catalin Marinas
@ 2015-07-15 16:27     ` Robin Murphy
  2015-07-15 16:53       ` Catalin Marinas
  0 siblings, 1 reply; 11+ messages in thread
From: Robin Murphy @ 2015-07-15 16:27 UTC (permalink / raw)
  To: linux-arm-kernel

On 15/07/15 10:31, Catalin Marinas wrote:
> On Fri, Jul 10, 2015 at 08:19:34PM +0100, Robin Murphy wrote:
>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
>> index d16a1ce..ccadfd4 100644
>> --- a/arch/arm64/mm/dma-mapping.c
>> +++ b/arch/arm64/mm/dma-mapping.c
>> @@ -526,3 +526,426 @@ static int __init dma_debug_do_init(void)
>>   	return 0;
>>   }
>>   fs_initcall(dma_debug_do_init);
>> +
>> +
>> +#ifdef CONFIG_IOMMU_DMA
>> +#include <linux/dma-iommu.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/amba/bus.h>
>> +
>> +/* Thankfully, all cache ops are by VA so we can ignore phys here */
>> +static void flush_page(const void *virt, phys_addr_t phys)
>> +{
>> +	__dma_flush_range(virt, virt + PAGE_SIZE);
>> +}
>> +
>> +static void *__iommu_alloc_attrs(struct device *dev, size_t size,
>> +				 dma_addr_t *handle, gfp_t gfp,
>> +				 struct dma_attrs *attrs)
>> +{
>> +	bool coherent = is_device_dma_coherent(dev);
>> +	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
>> +	void *addr;
>> +
>> +	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
>> +		return NULL;
>> +
>> +	if (gfp & __GFP_WAIT) {
>> +		struct page **pages;
>> +		pgprot_t pgprot = coherent ? __pgprot(PROT_NORMAL) :
>> +					     __pgprot(PROT_NORMAL_NC);
>> +
>> +		pgprot = __get_dma_pgprot(attrs, pgprot, coherent);
>> +		pages = iommu_dma_alloc(dev, size, gfp, ioprot,	coherent,
>> +					handle, coherent ? NULL : flush_page);
>
> As I replied already on the other patch, the "coherent" argument here
> should always be true.
>
> BTW, why do we need to call flush_page via iommu_dma_alloc() and not
> flush the buffer directly in the arch __iommu_alloc_attrs()? We already
> have the pointer and size after remapping in the CPU address space, so it
> would keep the iommu_dma_alloc() simpler.

Mostly for the sake of moving arch/arm (and possibly other users) over 
later, where highmem and ATTR_NO_KERNEL_MAPPING make flushing the pages 
at the point of allocation seem the most sensible thing to do. Since 
iommu_dma_alloc already has a temporary scatterlist we can make use of 
the sg mapping iterator there, rather than have separate code to iterate 
over the pages (possibly with open-coded kmap/kunmap) in all the callers.
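
To make the shape of that concrete, here is a small user-space sketch of the "optional per-page callback" idea. It is not the kernel code: sketch_prepare_pages(), sketch_flush_fn and the counting callback are all hypothetical, and plain pointers stand in for pages walked by the sg mapping iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Arch-supplied hook; may be NULL when the device is cache-coherent. */
typedef void (*sketch_flush_fn)(const void *virt, size_t len);

/* Toy callback that only counts invocations, for demonstration. */
static size_t sketch_flush_calls;

static void sketch_count_flush(const void *virt, size_t len)
{
	(void)virt;
	(void)len;
	sketch_flush_calls++;
}

/*
 * Common-code side: zero each page and, if a hook was supplied, flush
 * it while we are already iterating, so callers need no loop of their
 * own.
 */
static void sketch_prepare_pages(void **pages, size_t count,
				 size_t page_size, sketch_flush_fn flush)
{
	size_t i;

	for (i = 0; i < count; i++) {
		assert(pages[i] != NULL);
		memset(pages[i], 0, page_size);
		if (flush)
			flush(pages[i], page_size);
	}
}
```

A coherent caller would pass NULL and get the zeroing only; a non-coherent caller would pass its flush routine and get one call per page.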

>> +		if (!pages)
>> +			return NULL;
>> +
>> +		addr = dma_common_pages_remap(pages, size, VM_USERMAP, pgprot,
>> +					      __builtin_return_address(0));
>> +		if (!addr)
>> +			iommu_dma_free(dev, pages, size, handle);
>> +	} else {
>> +		struct page *page;
>> +		/*
>> +		 * In atomic context we can't remap anything, so we'll only
>> +		 * get the virtually contiguous buffer we need by way of a
>> +		 * physically contiguous allocation.
>> +		 */
>> +		if (coherent) {
>> +			page = alloc_pages(gfp, get_order(size));
>> +			addr = page ? page_address(page) : NULL;
>
> We could even use __get_free_pages(gfp & ~__GFP_HIGHMEM) since we don't
> have/need highmem on arm64.

True, but then we'd have to dig the struct page back out to pass through 
to iommu_dma_map_page().

>> +		} else {
>> +			addr = __alloc_from_pool(size, &page, gfp);
>> +		}
>> +		if (addr) {
>> +			*handle = iommu_dma_map_page(dev, page, 0, size,
>> +						     ioprot, false);
>
> Why coherent == false?

I'm not sure I even know any more, but either way it means the wrong 
thing as discussed earlier, so it'll be going away.

>> +			if (iommu_dma_mapping_error(dev, *handle)) {
>> +				if (coherent)
>> +					__free_pages(page, get_order(size));
>> +				else
>> +					__free_from_pool(addr, size);
>> +				addr = NULL;
>> +			}
>> +		}
>> +	}
>> +	return addr;
>> +}
>
> In the second case here (!__GFP_WAIT), do we do any cache maintenance? I
> can't see it and it's needed for the !coherent case.

In the atomic non-coherent case, we're stealing from the atomic pool, so 
addr is already a non-cacheable alias (and alloc_from_pool does 
memset(0) through that). That shouldn't need anything extra, right?

>> +static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>> +			       dma_addr_t handle, struct dma_attrs *attrs)
>> +{
>> +	/*
>> +	 * @cpu_addr will be one of 3 things depending on how it was allocated:
>> +	 * - A remapped array of pages from iommu_dma_alloc(), for all
>> +	 *   non-atomic allocations.
>> +	 * - A non-cacheable alias from the atomic pool, for atomic
>> +	 *   allocations by non-coherent devices.
>> +	 * - A normal lowmem address, for atomic allocations by
>> +	 *   coherent devices.
>> +	 * Hence how dodgy the below logic looks...
>> +	 */
>> +	if (__free_from_pool(cpu_addr, size)) {
>> +		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
>> +	} else if (is_vmalloc_addr(cpu_addr)){
>> +		struct vm_struct *area = find_vm_area(cpu_addr);
>> +
>> +		if (WARN_ON(!area || !area->pages))
>> +			return;
>> +		iommu_dma_free(dev, area->pages, size, &handle);
>> +		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
>> +	} else {
>> +		__free_pages(virt_to_page(cpu_addr), get_order(size));
>> +		iommu_dma_unmap_page(dev, handle, size, 0, NULL);
>
> Just slightly paranoid but it's better to unmap the page from the iommu
> space before freeing (in case there is some rogue device still accessing
> it).
>

Agreed, I'll switch them round. Similarly, I'll move the zeroing in 
iommu_dma_alloc before the iommu_map too.
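
A toy user-space model of the ordering, just to pin down what "switch them round" means. The sketch_* names are hypothetical and nothing here touches a real IOMMU; the assert in the free helper encodes the rule that pages must no longer be device-visible when they go back to the allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_event { SKETCH_UNMAP = 1, SKETCH_FREE };

/* Hypothetical model of one DMA buffer and the order of teardown steps. */
struct sketch_buf {
	bool mapped;    /* still reachable by the device via the IOMMU */
	bool allocated; /* pages not yet returned to the allocator */
	enum sketch_event log[2];
	size_t nlog;
};

static void sketch_iommu_unmap(struct sketch_buf *b)
{
	assert(b->nlog < 2);
	b->mapped = false;
	b->log[b->nlog++] = SKETCH_UNMAP;
}

static void sketch_free_pages(struct sketch_buf *b)
{
	/* A rogue device must not be able to reach recycled memory. */
	assert(!b->mapped);
	assert(b->nlog < 2);
	b->allocated = false;
	b->log[b->nlog++] = SKETCH_FREE;
}

/* Teardown in the agreed order: unmap first, then free. */
static void sketch_release(struct sketch_buf *b)
{
	sketch_iommu_unmap(b);
	sketch_free_pages(b);
}
```

Calling the two helpers in the opposite order trips the assert in this model, which is the failure the reordering is meant to rule out.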

Robin.

* [PATCH v3 3/4] arm64: Add IOMMU dma_ops
  2015-07-15 16:27     ` Robin Murphy
@ 2015-07-15 16:53       ` Catalin Marinas
  0 siblings, 0 replies; 11+ messages in thread
From: Catalin Marinas @ 2015-07-15 16:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jul 15, 2015 at 05:27:22PM +0100, Robin Murphy wrote:
> On 15/07/15 10:31, Catalin Marinas wrote:
> >On Fri, Jul 10, 2015 at 08:19:34PM +0100, Robin Murphy wrote:
> >>+			if (iommu_dma_mapping_error(dev, *handle)) {
> >>+				if (coherent)
> >>+					__free_pages(page, get_order(size));
> >>+				else
> >>+					__free_from_pool(addr, size);
> >>+				addr = NULL;
> >>+			}
> >>+		}
> >>+	}
> >>+	return addr;
> >>+}
> >
> >In the second case here (!__GFP_WAIT), do we do any cache maintenance? I
> >can't see it and it's needed for the !coherent case.
> 
> In the atomic non-coherent case, we're stealing from the atomic pool, so
> addr is already a non-cacheable alias (and alloc_from_pool does memset(0)
> through that). That shouldn't need anything extra, right?

You are right, we already flushed the cache for the atomic pool when we
allocated it in atomic_pool_init().

-- 
Catalin

end of thread, other threads:[~2015-07-15 16:53 UTC | newest]

Thread overview: 11+ messages
2015-07-10 19:19 [PATCH v3 0/4] arm64: IOMMU-backed DMA mapping Robin Murphy
2015-07-10 19:19 ` [PATCH v3 1/4] iommu/iova: Avoid over-allocating when size-aligned Robin Murphy
2015-07-10 19:19 ` [PATCH v3 2/4] iommu: Implement common IOMMU ops for DMA mapping Robin Murphy
2015-07-13 12:34   ` Yong Wu
2015-07-14 17:16   ` Catalin Marinas
2015-07-15 15:50     ` Robin Murphy
2015-07-10 19:19 ` [PATCH v3 3/4] arm64: Add IOMMU dma_ops Robin Murphy
2015-07-15  9:31   ` Catalin Marinas
2015-07-15 16:27     ` Robin Murphy
2015-07-15 16:53       ` Catalin Marinas
2015-07-10 19:19 ` [PATCH v3 4/4] arm64: Hook up " Robin Murphy
