* [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware
@ 2011-09-16 17:51 Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen
` (6 more replies)
0 siblings, 7 replies; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
To: linux-arm-kernel
v2->v3:
- s/KB/KiB/ (David W)
v1->v2:
- split into patches (by keeping the old code around until all drivers are converted) (Joerg)
Ohad Ben-Cohen (6):
iommu/core: split mapping to page sizes as supported by the hardware
iommu/omap: announce supported page sizes
iommu/msm: announce supported page sizes
iommu/amd: announce supported page sizes
iommu/intel: announce supported page sizes
iommu/core: remove the temporary register_iommu_pgsize API
drivers/iommu/amd_iommu.c | 20 ++++++-
drivers/iommu/intel-iommu.c | 20 ++++++-
drivers/iommu/iommu.c | 130 +++++++++++++++++++++++++++++++++++++++----
drivers/iommu/msm_iommu.c | 8 ++-
drivers/iommu/omap-iommu.c | 6 ++-
drivers/iommu/omap-iovmm.c | 12 +---
include/linux/iommu.h | 7 +-
virt/kvm/iommu.c | 4 +-
8 files changed, 177 insertions(+), 30 deletions(-)
--
1.7.4.1
^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
@ 2011-09-16 17:51 ` Ohad Ben-Cohen
  2011-09-27 10:05   ` Roedel, Joerg
  2011-09-16 17:51 ` [PATCH v3 2/6] iommu/omap: announce supported page sizes Ohad Ben-Cohen
  ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
To: linux-arm-kernel

When mapping a memory region, split it into page sizes as supported by
the iommu hardware. Always prefer bigger pages, when possible, in order
to reduce the TLB pressure.

The logic to do that is now added to the IOMMU core, so neither the
iommu drivers themselves nor users of the IOMMU API have to duplicate
it.

This allows a more lenient granularity of mappings; traditionally the
IOMMU API took 'order' (of a page) as a mapping size, and directly let
the low-level iommu drivers handle the mapping, but now that the IOMMU
core can split arbitrary memory regions into pages, we can remove this
limitation, so users don't have to split those regions by themselves.

Currently the supported page sizes are advertised once and they then
remain static. That works well for OMAP (and seemingly MSM too) but it
would probably not fly with Intel's hardware, where the page size
capabilities seem to have the potential to differ between several DMA
remapping devices. This limitation can be dealt with later, if desired.
For now, the existing IOMMU API behavior is retained (see:
"iommu/intel: announce supported page sizes").

As requested, register_iommu() isn't changed yet, so we can convert the
IOMMU drivers in subsequent patches; after all the drivers are
converted, register_iommu() will be changed (and the temporary
register_iommu_pgsize() will be removed).

Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted to
send the mapping size in bytes instead of in page order.
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm at vger.kernel.org --- drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++++++++++--- drivers/iommu/omap-iovmm.c | 12 +--- include/linux/iommu.h | 6 +- virt/kvm/iommu.c | 4 +- 4 files changed, 157 insertions(+), 23 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index c68ff29..7c01c8c 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -16,6 +16,8 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#define pr_fmt(fmt) "%s: " fmt, __func__ + #include <linux/kernel.h> #include <linux/bug.h> #include <linux/types.h> @@ -23,15 +25,73 @@ #include <linux/slab.h> #include <linux/errno.h> #include <linux/iommu.h> +#include <linux/bitmap.h> static struct iommu_ops *iommu_ops; +/* bitmap of supported page sizes */ +static unsigned long *iommu_pgsize_bitmap; + +/* number of bits used to represent the supported pages */ +static unsigned int iommu_nr_page_bits; + +/* size of the smallest supported page (in bytes) */ +static unsigned int iommu_min_pagesz; + +/* bit number of the smallest supported page */ +static unsigned int iommu_min_page_idx; + +/** + * register_iommu() - register an IOMMU hardware + * @ops: iommu handlers + * @pgsize_bitmap: bitmap of page sizes supported by the hardware + * @nr_page_bits: size of @pgsize_bitmap (in bits) + * + * Note: this is a temporary function, which will be removed once + * all IOMMU drivers are converted. The only reason it exists is to + * allow splitting the pgsizes changes to several patches in order to ease + * the review. 
+ */ +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap, + unsigned int nr_page_bits) +{ + if (iommu_ops || iommu_pgsize_bitmap || !nr_page_bits) + BUG(); + + iommu_ops = ops; + iommu_pgsize_bitmap = pgsize_bitmap; + iommu_nr_page_bits = nr_page_bits; + + /* find the minimum page size and its index only once */ + iommu_min_page_idx = find_first_bit(pgsize_bitmap, nr_page_bits); + iommu_min_pagesz = 1 << iommu_min_page_idx; +} + +/* + * default pagesize bitmap, will be removed once all IOMMU drivers + * are converted + */ +static unsigned long default_iommu_pgsizes = ~0xFFFUL; + void register_iommu(struct iommu_ops *ops) { if (iommu_ops) BUG(); iommu_ops = ops; + + /* + * set default pgsize values, which retain the existing + * IOMMU API behavior: drivers will be called to map + * regions that are sized/aligned to order of 4KiB pages + */ + iommu_pgsize_bitmap = &default_iommu_pgsizes; + iommu_nr_page_bits = BITS_PER_LONG; + + /* find the minimum page size and its index only once */ + iommu_min_page_idx = find_first_bit(iommu_pgsize_bitmap, + iommu_nr_page_bits); + iommu_min_pagesz = 1 << iommu_min_page_idx; } bool iommu_found(void) @@ -109,26 +169,104 @@ int iommu_domain_has_cap(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_domain_has_cap); int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, int gfp_order, int prot) + phys_addr_t paddr, size_t size, int prot) { - size_t size; + int ret = 0; + + /* + * both the virtual address and the physical one, as well as + * the size of the mapping, must be aligned (at least) to the + * size of the smallest page supported by the hardware + */ + if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) { + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz " + "0x%x\n", iova, (unsigned long)paddr, + (unsigned long)size, iommu_min_pagesz); + return -EINVAL; + } + + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova, + (unsigned long)paddr, (unsigned long)size); + + while (size) { + unsigned long pgsize = iommu_min_pagesz; + unsigned long idx = iommu_min_page_idx; + unsigned long addr_merge = iova | paddr; + int order; + + /* find the max page size with which iova, paddr are aligned */ + for (;;) { + unsigned long try_pgsize; + + idx = find_next_bit(iommu_pgsize_bitmap, + iommu_nr_page_bits, idx + 1); + + /* no more pages to check ? */ + if (idx >= iommu_nr_page_bits) + break; + + try_pgsize = 1 << idx; - size = 0x1000UL << gfp_order; + /* page too big ? addresses not aligned ? 
*/ + if (size < try_pgsize || + !IS_ALIGNED(addr_merge, try_pgsize)) + break; - BUG_ON(!IS_ALIGNED(iova | paddr, size)); + pgsize = try_pgsize; + } - return iommu_ops->map(domain, iova, paddr, gfp_order, prot); + order = get_order(pgsize); + + pr_debug("mapping: iova 0x%lx pa 0x%lx order %d\n", iova, + (unsigned long)paddr, order); + + ret = iommu_ops->map(domain, iova, paddr, order, prot); + if (ret) + break; + + size -= pgsize; + iova += pgsize; + paddr += pgsize; + } + + return ret; } EXPORT_SYMBOL_GPL(iommu_map); -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order) +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) { - size_t size; + int order, unmapped_size, unmapped_order, total_unmapped = 0; + + /* + * The virtual address, as well as the size of the mapping, must be + * aligned (at least) to the size of the smallest page supported + * by the hardware + */ + if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) { + pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n", + iova, (unsigned long)size, iommu_min_pagesz); + return -EINVAL; + } + + pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova, + (unsigned long)size); + + while (size > total_unmapped) { + order = get_order(size - total_unmapped); + + unmapped_order = iommu_ops->unmap(domain, iova, order); + if (unmapped_order < 0) + return unmapped_order; + + pr_debug("unmapped: iova 0x%lx order %d\n", iova, + unmapped_order); - size = 0x1000UL << gfp_order; + unmapped_size = 0x1000UL << unmapped_order; - BUG_ON(!IS_ALIGNED(iova, size)); + iova += unmapped_size; + total_unmapped += unmapped_size; + } - return iommu_ops->unmap(domain, iova, gfp_order); + return get_order(total_unmapped); } EXPORT_SYMBOL_GPL(iommu_unmap); diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c index e8fdb88..f4dea5a 100644 --- a/drivers/iommu/omap-iovmm.c +++ b/drivers/iommu/omap-iovmm.c @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new, unsigned int i, j; struct scatterlist *sg; u32 da = new->da_start; - int order; if (!domain || !sgt) return -EINVAL; @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new, if (bytes_to_iopgsz(bytes) < 0) goto err_out; - order = get_order(bytes); - pr_debug("%s: [%d] %08x %08x(%x)\n", __func__, i, da, pa, bytes); - err = iommu_map(domain, da, pa, order, flags); + err = iommu_map(domain, da, pa, bytes, flags); if (err) goto err_out; @@ -448,10 +445,9 @@ err_out: size_t bytes; bytes = sg->length + sg->offset; - order = get_order(bytes); /* ignore failures.. 
we're already handling one */ - iommu_unmap(domain, da, order); + iommu_unmap(domain, da, bytes); da += bytes; } @@ -474,12 +470,10 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj, start = area->da_start; for_each_sg(sgt->sgl, sg, sgt->nents, i) { size_t bytes; - int order; bytes = sg->length + sg->offset; - order = get_order(bytes); - err = iommu_unmap(domain, start, order); + err = iommu_unmap(domain, start, bytes); if (err < 0) break; diff --git a/include/linux/iommu.h b/include/linux/iommu.h index d084e87..1806956 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -61,6 +61,8 @@ struct iommu_ops { #ifdef CONFIG_IOMMU_API extern void register_iommu(struct iommu_ops *ops); +extern void register_iommu_pgsize(struct iommu_ops *ops, + unsigned long *pgsize_bitmap, unsigned int nr_page_bits); extern bool iommu_found(void); extern struct iommu_domain *iommu_domain_alloc(void); extern void iommu_domain_free(struct iommu_domain *domain); @@ -69,9 +71,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, int gfp_order, int prot); + phys_addr_t paddr, size_t size, int prot); extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova, - int gfp_order); + size_t size); extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, unsigned long iova); extern int iommu_domain_has_cap(struct iommu_domain *domain, diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 78c80f6..ea142d3 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot) /* Map into IO address space */ r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn), - get_order(page_size), flags); + page_size, flags); if (r) { printk(KERN_ERR "kvm_iommu_map_address:" "iommu failed to map pfn=%llx\n", pfn); @@ -293,7 +293,7 @@ static void kvm_iommu_put_pages(struct kvm *kvm, pfn = phys >> PAGE_SHIFT; /* Unmap address from IO address space */ - order = iommu_unmap(domain, gfn_to_gpa(gfn), 0); + order = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE); unmap_pages = 1ULL << order; /* Unpin all pages we just unmapped to not leak any memory */ -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
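To make the splitting policy concrete, the greedy loop in the patch above can be simulated in plain userspace C. The 4KiB/64KiB/1MiB/16MiB bitmap below is an assumed, OMAP-like example (it is not taken from any driver in this thread), and the inner scan mirrors the patch's find_next_bit() walk; for the sample region it prints one 1MiB mapping, one 64KiB mapping and one 4KiB mapping:

	/* pgsplit.c -- standalone simulation of the splitting loop.
	 * Assumes iova, paddr and size are already aligned to the
	 * minimum page size, as the core checks before this loop.
	 */
	#include <stdio.h>

	/* assumed, OMAP-like bitmap: 4KiB, 64KiB, 1MiB, 16MiB */
	#define PGSIZES		((1UL << 12) | (1UL << 16) | (1UL << 20) | (1UL << 24))
	#define MIN_PAGESZ	(1UL << 12)

	int main(void)
	{
		unsigned long iova = 0x100000, paddr = 0x200000, size = 0x111000;

		while (size) {
			unsigned long pgsize = MIN_PAGESZ;
			int idx;

			/* scan upwards from the minimum page (bit 12) for the
			 * biggest supported page that still fits and with
			 * which both addresses are aligned */
			for (idx = 13; idx < (int)(8 * sizeof(long)); idx++) {
				unsigned long try_pgsize = 1UL << idx;

				if (!(PGSIZES & try_pgsize))
					continue;
				if (size < try_pgsize ||
				    ((iova | paddr) & (try_pgsize - 1)))
					break;
				pgsize = try_pgsize;
			}

			printf("map iova 0x%06lx pa 0x%06lx with a 0x%lx page\n",
			       iova, paddr, pgsize);

			iova += pgsize;
			paddr += pgsize;
			size -= pgsize;
		}
		return 0;
	}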
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen @ 2011-09-27 10:05 ` Roedel, Joerg 2011-09-27 12:26 ` Ohad Ben-Cohen 0 siblings, 1 reply; 36+ messages in thread From: Roedel, Joerg @ 2011-09-27 10:05 UTC (permalink / raw) To: linux-arm-kernel On Fri, Sep 16, 2011 at 01:51:41PM -0400, Ohad Ben-Cohen wrote: > drivers/iommu/iommu.c | 158 +++++++++++++++++++++++++++++++++++++++++--- > drivers/iommu/omap-iovmm.c | 12 +--- > include/linux/iommu.h | 6 +- > virt/kvm/iommu.c | 4 +- > 4 files changed, 157 insertions(+), 23 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index c68ff29..7c01c8c 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -16,6 +16,8 @@ > * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > */ > > +#define pr_fmt(fmt) "%s: " fmt, __func__ > + > #include <linux/kernel.h> > #include <linux/bug.h> > #include <linux/types.h> > @@ -23,15 +25,73 @@ > #include <linux/slab.h> > #include <linux/errno.h> > #include <linux/iommu.h> > +#include <linux/bitmap.h> > > static struct iommu_ops *iommu_ops; > > +/* bitmap of supported page sizes */ > +static unsigned long *iommu_pgsize_bitmap; > + > +/* number of bits used to represent the supported pages */ > +static unsigned int iommu_nr_page_bits; > + > +/* size of the smallest supported page (in bytes) */ > +static unsigned int iommu_min_pagesz; > + > +/* bit number of the smallest supported page */ > +static unsigned int iommu_min_page_idx; > + > +/** > + * register_iommu() - register an IOMMU hardware > + * @ops: iommu handlers > + * @pgsize_bitmap: bitmap of page sizes supported by the hardware > + * @nr_page_bits: size of @pgsize_bitmap (in bits) > + * > + * Note: this is a temporary function, which will be removed once > + * all IOMMU drivers are converted. The only reason it exists is to > + * allow splitting the pgsizes changes to several patches in order to ease > + * the review. > + */ > +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap, > + unsigned int nr_page_bits) I think a plain unsigned long value is sufficient to encode the supported page-sizes. No need to use a full bitmap. > int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot) > + phys_addr_t paddr, size_t size, int prot) > { > - size_t size; > + int ret = 0; > + > + /* > + * both the virtual address and the physical one, as well as > + * the size of the mapping, must be aligned (at least) to the > + * size of the smallest page supported by the hardware > + */ > + if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) { > + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz " > + "0x%x\n", iova, (unsigned long)paddr, > + (unsigned long)size, iommu_min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova, > + (unsigned long)paddr, (unsigned long)size); > + > + while (size) { > + unsigned long pgsize = iommu_min_pagesz; > + unsigned long idx = iommu_min_page_idx; > + unsigned long addr_merge = iova | paddr; > + int order; > + > + /* find the max page size with which iova, paddr are aligned */ > + for (;;) { > + unsigned long try_pgsize; > + > + idx = find_next_bit(iommu_pgsize_bitmap, > + iommu_nr_page_bits, idx + 1); > + > + /* no more pages to check ? 
*/ > + if (idx >= iommu_nr_page_bits) > + break; > + > + try_pgsize = 1 << idx; > > - size = 0x1000UL << gfp_order; > + /* page too big ? addresses not aligned ? */ > + if (size < try_pgsize || > + !IS_ALIGNED(addr_merge, try_pgsize)) > + break; > > - BUG_ON(!IS_ALIGNED(iova | paddr, size)); > + pgsize = try_pgsize; > + } With an unsigned long you can use plain and fast bit_ops instead of the full bitmap functions. > > - return iommu_ops->map(domain, iova, paddr, gfp_order, prot); > + order = get_order(pgsize); > + > + pr_debug("mapping: iova 0x%lx pa 0x%lx order %d\n", iova, > + (unsigned long)paddr, order); > + > + ret = iommu_ops->map(domain, iova, paddr, order, prot); > + if (ret) > + break; > + > + size -= pgsize; > + iova += pgsize; > + paddr += pgsize; > + } > + > + return ret; > } > EXPORT_SYMBOL_GPL(iommu_map); > -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 10:05 ` Roedel, Joerg
@ 2011-09-27 12:26   ` Ohad Ben-Cohen
  2011-09-27 13:12     ` Roedel, Joerg
  0 siblings, 1 reply; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-27 12:26 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Sep 27, 2011 at 1:05 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> On Fri, Sep 16, 2011 at 01:51:41PM -0400, Ohad Ben-Cohen wrote:
>>  int iommu_map(struct iommu_domain *domain, unsigned long iova,
>> -	      phys_addr_t paddr, int gfp_order, int prot)
>> +	      phys_addr_t paddr, size_t size, int prot)
>>  {
>> -	size_t size;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * both the virtual address and the physical one, as well as
>> +	 * the size of the mapping, must be aligned (at least) to the
>> +	 * size of the smallest page supported by the hardware
>> +	 */
>> +	if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) {
>> +		pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz "
>> +			"0x%x\n", iova, (unsigned long)paddr,
>> +			(unsigned long)size, iommu_min_pagesz);
>> +		return -EINVAL;
>> +	}
>> +
>> +	pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova,
>> +			(unsigned long)paddr, (unsigned long)size);
>> +
>> +	while (size) {
>> +		unsigned long pgsize = iommu_min_pagesz;
>> +		unsigned long idx = iommu_min_page_idx;
>> +		unsigned long addr_merge = iova | paddr;
>> +		int order;
>> +
>> +		/* find the max page size with which iova, paddr are aligned */
>> +		for (;;) {
>> +			unsigned long try_pgsize;
>> +
>> +			idx = find_next_bit(iommu_pgsize_bitmap,
>> +					iommu_nr_page_bits, idx + 1);
>> +
>> +			/* no more pages to check ? */
>> +			if (idx >= iommu_nr_page_bits)
>> +				break;
>> +
>> +			try_pgsize = 1 << idx;
>>
>> -	size = 0x1000UL << gfp_order;
>> +			/* page too big ? addresses not aligned ? */
>> +			if (size < try_pgsize ||
>> +					!IS_ALIGNED(addr_merge, try_pgsize))
>> +				break;
>>
>> -	BUG_ON(!IS_ALIGNED(iova | paddr, size));
>> +			pgsize = try_pgsize;
>> +		}
>
> With an unsigned long you can use plain and fast bit_ops instead of the
> full bitmap functions.

Not sure I follow; the only bit operation I'm using while mapping is
find_next_bit() (which is a bitops.h method).

What other faster variant are you referring to ?

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 12:26   ` Ohad Ben-Cohen
@ 2011-09-27 13:12     ` Roedel, Joerg
  2011-09-27 13:28       ` Ohad Ben-Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Roedel, Joerg @ 2011-09-27 13:12 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Sep 27, 2011 at 08:26:29AM -0400, Ohad Ben-Cohen wrote:
> > With an unsigned long you can use plain and fast bit_ops instead of the
> > full bitmap functions.
>
> Not sure I follow; the only bit operation I'm using while mapping is
> find_next_bit() (which is a bitops.h method).
>
> What other faster variant are you referring to ?

You pass a pointer to an unsigned long for the page-size bitmap. This
allows the use of an array of unsigned long. But a single unsigned long
is sufficient, and you can use functions like ffs() and fls() together
with shifting. These functions often translate to a single instruction
in the binary. The find_next_bit function has much more overhead
because it needs to handle the array-of-ulong case.

	Joerg

--
AMD Operating System Research Center
Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 36+ messages in thread
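For readers less familiar with the helpers under discussion: on a single unsigned long, the kernel's __ffs()/__fls() return the index of the lowest/highest set bit, which is all the page-size logic needs. A tiny userspace sketch; the __builtin_* calls standing in for the kernel helpers are an assumption for illustration:

	/* bitops_demo.c -- what __ffs()/__fls() buy us on one word */
	#include <stdio.h>

	static inline unsigned int my_ffs(unsigned long x)
	{
		return __builtin_ctzl(x);		/* lowest set bit */
	}

	static inline unsigned int my_fls(unsigned long x)
	{
		/* highest set bit */
		return 8 * sizeof(long) - 1 - __builtin_clzl(x);
	}

	int main(void)
	{
		/* bit k set <=> a 2^k-byte page is supported, so one word
		 * describes everything from 4KiB (bit 12) upwards */
		unsigned long pgsizes = (1UL << 12) | (1UL << 16) | (1UL << 24);

		printf("smallest page: 0x%lx\n", 1UL << my_ffs(pgsizes)); /* 4KiB */
		printf("biggest page:  0x%lx\n", 1UL << my_fls(pgsizes)); /* 16MiB */
		return 0;
	}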
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-09-27 13:12 ` Roedel, Joerg @ 2011-09-27 13:28 ` Ohad Ben-Cohen 2011-09-27 18:14 ` Roedel, Joerg 0 siblings, 1 reply; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-09-27 13:28 UTC (permalink / raw) To: linux-arm-kernel On Tue, Sep 27, 2011 at 4:12 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > You pass a pointer to an unsigned long for the page-size bitmap. This > allows to use an array of unsigned long. But a single unsigned long is > sufficient This is fine; I can change that if you like it better (though without doing the change below this is probably moot). > and you can use functions like ffs() and fls() together with > shifting. These functions often translate to a single intruction in the > binary. The find_next_bit function has much more overhead because it > needs to handle the array-of-ulong case. So you're suggesting to re-implement find_next_bit() using ffs()/fls() and shifting ? What's the point ? Sure, if we'll have a proven performance issue while using find_next_bit() we can think of doing this, but at this stage, this sounds to me like a premature optimization which isn't too elegant. At this point I strongly prefer readable, maintainable and easier to debug code over optimization which isn't proven to be necessary. Thanks, Ohad. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 13:28       ` Ohad Ben-Cohen
@ 2011-09-27 18:14         ` Roedel, Joerg
  2011-10-02 15:58           ` Ohad Ben-Cohen
  0 siblings, 1 reply; 36+ messages in thread
From: Roedel, Joerg @ 2011-09-27 18:14 UTC (permalink / raw)
To: linux-arm-kernel

On Tue, Sep 27, 2011 at 09:28:37AM -0400, Ohad Ben-Cohen wrote:
> So you're suggesting to re-implement find_next_bit() using ffs()/fls()
> and shifting ?

No. I suggest a simpler and shorter algorithm using the bit helpers.
Something like this:

	min_idx = __ffs(iommu_page_sizes);

	while (size) {
		/* Max alignment allowed by current physical address */
		phys_idx = __ffs(phys);

		/* Max alignment allowed by current size */
		size_idx = __fls(size);

		/* special case: iova == 0 */
		if (likely(phys))
			idx = min(phys_idx, size_idx);
		else
			idx = size_idx;

		BUG_ON(idx < min_idx);

		psize = 1UL << idx;

		/* search next smaller page-size supported */
		while (psize && !(iommu_page_sizes & psize))
			psize >>= 1;

		BUG_ON(psize == 0);

		iommu_ops->map(domain, iova, phys, get_order(psize), prot);

		iova += psize;
		phys += psize;
		size -= psize;
	}

It is only C-style pseudo-code, of course. These __ffs and __fls lines
all translate to a single instruction later. The find_next_bit()
function has a lot more overhead because it needs to take account of
real bitmaps (arrays of ulong). But this complexity is not required
here.

And yes, overhead is important when we implement the generic dma-ops
on top of the iommu-api because this will make the iommu_map function a
fast-path. So we really care about overhead here.

	Joerg

--
AMD Operating System Research Center
Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-09-27 18:14         ` Roedel, Joerg
@ 2011-10-02 15:58           ` Ohad Ben-Cohen
  2011-10-10  7:40             ` Ohad Ben-Cohen
  ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-02 15:58 UTC (permalink / raw)
To: linux-arm-kernel

(sorry for the late response; we had a big holiday here and I was
forced away from my keyboard :)

On Tue, Sep 27, 2011 at 9:14 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> No. I suggest a simpler and shorter algorithm using the bit helpers.

Ok, fair enough. I've revised the patches and attached the main one
below; please tell me if it looks ok, and then I'll resubmit the
entire patch set.

The main changes I've made relative to your pseudo-code are:
1. removed the internal while loop, and used several bit manipulations
   to replace it (see the sketch after this message)
2. removed the call to get_order(), and instead deduced the order from
   the page-size index we already have
3. considered iova alignment requirements too
4. removed the BUG_ON()s, since we're checking for minimal
   size/alignment conditions in the beginning of the map function

> And yes, overhead is important when we implement the generic dma-ops
> on top of the iommu-api because this will make the iommu_map function a
> fast-path. So we really care about overhead here.

Don't get me wrong; I don't underestimate the importance of performance
here. I just think that as long as some code is reasonably efficient
and not badly designed (so it's not too hard to optimize later if
needed), then it's usually better to postpone optimizations until we
benchmark and know for certain that the effort in squeezing away
several cpu cycles is noticeable.

In this case, I'm not entirely convinced that it is, because every map
call anyway ends up flushing the caches, which might take hundreds of
cpu cycles. In addition, at least the ARM version of find_next_bit
doesn't look terribly bad. YMMV though: I didn't find an optimized
assembly implementation of find_next_bit for x86, and the generic one
does look complex. In addition, the variable page size behavior of the
x86 iommu drivers may have caused many more iterations than any of the
ARM drivers would have had, so the penalty of running that generic
find_next_bit repeatedly could have accumulated into something
noticeable.

Anyway, here comes the revised patch. Thanks for your review!
Ohad.

^ permalink raw reply	[flat|nested] 36+ messages in thread
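The revised patch itself is quoted in Joerg's reply further below. As a quick illustration of change #1, the inner page-size scan collapses into a handful of bit operations; the following userspace sketch shows the selection math for one iteration, with __builtin stand-ins assumed for the kernel's __ffs()/__fls():

	/* pick_pgsize.c -- the bit-arithmetic page-size pick, in isolation */
	#include <stdio.h>

	/* assumed example bitmap: 4KiB, 64KiB, 1MiB, 16MiB */
	#define PGSIZE_BITMAP	((1UL << 12) | (1UL << 16) | (1UL << 20) | (1UL << 24))

	static inline unsigned long my_ffs(unsigned long x) { return __builtin_ctzl(x); }
	static inline unsigned long my_fls(unsigned long x)
	{
		return 8 * sizeof(long) - 1 - __builtin_clzl(x);
	}

	int main(void)
	{
		unsigned long iova = 0x100000, paddr = 0x200000, size = 0x90000;
		unsigned long addr_merge = iova | paddr;
		unsigned long pgsize_idx, pgsize;

		/* biggest page that still fits into 'size' */
		pgsize_idx = my_fls(size);

		/* cap it by the alignment of both iova and paddr */
		if (addr_merge)
			pgsize_idx = (pgsize_idx < my_ffs(addr_merge)) ?
					pgsize_idx : my_ffs(addr_merge);

		/* mask of all page sizes up to and including pgsize_idx */
		pgsize = (1UL << (pgsize_idx + 1)) - 1;

		/* keep the supported ones, then pick the biggest */
		pgsize &= PGSIZE_BITMAP;
		pgsize = 1UL << my_fls(pgsize);

		printf("first page picked: 0x%lx\n", pgsize); /* 0x10000 (64KiB) */
		return 0;
	}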
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-02 15:58 ` Ohad Ben-Cohen @ 2011-10-10 7:40 ` Ohad Ben-Cohen 2011-10-10 9:47 ` Roedel, Joerg 2011-10-10 12:52 ` KyongHo Cho 2 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-10 7:40 UTC (permalink / raw) To: linux-arm-kernel On Sun, Oct 2, 2011 at 5:58 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote: > Ok, fair enough. I've revised the patches and attached the main one > below; please tell me if it looks ok, and then I'll resubmit the > entire patch set. Ping ? ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-02 15:58 ` Ohad Ben-Cohen 2011-10-10 7:40 ` Ohad Ben-Cohen @ 2011-10-10 9:47 ` Roedel, Joerg 2011-10-10 13:59 ` Ohad Ben-Cohen 2011-10-10 12:52 ` KyongHo Cho 2 siblings, 1 reply; 36+ messages in thread From: Roedel, Joerg @ 2011-10-10 9:47 UTC (permalink / raw) To: linux-arm-kernel Hi Ohad, sorry, I was on vacation last week and had no time to look into this. On Sun, Oct 02, 2011 at 11:58:12AM -0400, Ohad Ben-Cohen wrote: > drivers/iommu/iommu.c | 138 ++++++++++++++++++++++++++++++++++++++++--- > drivers/iommu/omap-iovmm.c | 12 +--- > include/linux/iommu.h | 6 +- > virt/kvm/iommu.c | 4 +- > 4 files changed, 137 insertions(+), 23 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index a7b0862..f23563f 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -16,6 +16,8 @@ > * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > */ > > +#define pr_fmt(fmt) "%s: " fmt, __func__ > + > #include <linux/kernel.h> > #include <linux/bug.h> > #include <linux/types.h> > @@ -23,15 +25,54 @@ > #include <linux/slab.h> > #include <linux/errno.h> > #include <linux/iommu.h> > +#include <linux/bitmap.h> Is this still required? > > static struct iommu_ops *iommu_ops; > > +/* bitmap of supported page sizes */ > +static unsigned long iommu_pgsize_bitmap; > + > +/* size of the smallest supported page (in bytes) */ > +static unsigned int iommu_min_pagesz; > + > +/** > + * register_iommu() - register an IOMMU hardware > + * @ops: iommu handlers > + * @pgsize_bitmap: bitmap of page sizes supported by the hardware > + * > + * Note: this is a temporary function, which will be removed once > + * all IOMMU drivers are converted. The only reason it exists is to > + * allow splitting the pgsizes changes to several patches in order to ease > + * the review. > + */ > +void register_iommu_pgsize(struct iommu_ops *ops, unsigned long pgsize_bitmap) > +{ > + if (iommu_ops || iommu_pgsize_bitmap || !pgsize_bitmap) > + BUG(); > + > + iommu_ops = ops; > + iommu_pgsize_bitmap = pgsize_bitmap; > + > + /* find out the minimum page size only once */ > + iommu_min_pagesz = 1 << __ffs(pgsize_bitmap); > +} Hmm, I thought a little bit about that and came to the conculusion it might be best to just keep the page-sizes as a part of the iommu_ops structure. So there is no need to extend the register_iommu interface. Also, the bus_set_iommu interface is now in the -next branch. Would be good if you rebase the patches to that interface. 
You can find the current iommu tree with these changes at git://git.8bytes.org/scm/iommu.git > @@ -115,26 +156,103 @@ int iommu_domain_has_cap(struct iommu_domain *domain, > EXPORT_SYMBOL_GPL(iommu_domain_has_cap); > > int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot) > + phys_addr_t paddr, size_t size, int prot) > { > - size_t size; > + int ret = 0; > + > + /* > + * both the virtual address and the physical one, as well as > + * the size of the mapping, must be aligned (at least) to the > + * size of the smallest page supported by the hardware > + */ > + if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) { > + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz " > + "0x%x\n", iova, (unsigned long)paddr, > + (unsigned long)size, iommu_min_pagesz); > + return -EINVAL; > + } > > - size = 0x1000UL << gfp_order; > + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova, > + (unsigned long)paddr, (unsigned long)size); > > - BUG_ON(!IS_ALIGNED(iova | paddr, size)); > + while (size) { > + unsigned long pgsize, addr_merge = iova | paddr; > + unsigned int pgsize_idx; > > - return iommu_ops->map(domain, iova, paddr, gfp_order, prot); > + /* Max page size that still fits into 'size' */ > + pgsize_idx = __fls(size); > + > + /* need to consider alignment requirements ? */ > + if (likely(addr_merge)) { > + /* Max page size allowed by both iova and paddr */ > + unsigned int align_pgsize_idx = __ffs(addr_merge); > + > + pgsize_idx = min(pgsize_idx, align_pgsize_idx); > + } > + > + /* build a mask of acceptable page sizes */ > + pgsize = (1UL << (pgsize_idx + 1)) - 1; > + > + /* throw away page sizes not supported by the hardware */ > + pgsize &= iommu_pgsize_bitmap; I think we need some care here and check pgsize for 0. A BUG_ON should do. > + > + /* pick the biggest page */ > + pgsize_idx = __fls(pgsize); > + pgsize = 1UL << pgsize_idx; > + > + /* convert index to page order */ > + pgsize_idx -= PAGE_SHIFT; > + > + pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova, > + (unsigned long)paddr, pgsize_idx); > + > + ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot); > + if (ret) > + break; > + > + iova += pgsize; > + paddr += pgsize; > + size -= pgsize; > + } > + > + return ret; > } > EXPORT_SYMBOL_GPL(iommu_map); Otherwise this looks much better, thanks. > > -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order) > +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size) > { > - size_t size; > + int order, unmapped_size, unmapped_order, total_unmapped = 0; > + > + /* > + * The virtual address, as well as the size of the mapping, must be > + * aligned (at least) to the size of the smallest page supported > + * by the hardware > + */ > + if (!IS_ALIGNED(iova | size, iommu_min_pagesz)) { > + pr_err("unaligned: iova 0x%lx size 0x%lx min_pagesz 0x%x\n", > + iova, (unsigned long)size, iommu_min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("unmap this: iova 0x%lx size 0x%lx\n", iova, > + (unsigned long)size); > + > + while (size > total_unmapped) { > + order = get_order(size - total_unmapped); > + > + unmapped_order = iommu_ops->unmap(domain, iova, order); I think we should make sure that we call iommu_ops->unmap with the same parameters as iommu_ops->map. Otherwise we still need some page-size complexity in the iommu-drivers. 
> + if (unmapped_order < 0) > + return unmapped_order; > + > + pr_debug("unmapped: iova 0x%lx order %d\n", iova, > + unmapped_order); > > - size = 0x1000UL << gfp_order; > + unmapped_size = 0x1000UL << unmapped_order; > > - BUG_ON(!IS_ALIGNED(iova, size)); > + iova += unmapped_size; > + total_unmapped += unmapped_size; > + } > > - return iommu_ops->unmap(domain, iova, gfp_order); > + return get_order(total_unmapped); > } When we pass the size now it makes sense to also return the unmapped-size instead of the order. Also the semantics of iommu_unmap need some more discussion I think. We should require that iommu_map/iommu_unmap are called with the same parameters for the same range. Otherwise we can give no guarantees what a unmap-call will unmap. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10  9:47               ` Roedel, Joerg
@ 2011-10-10 13:59                 ` Ohad Ben-Cohen
  2011-10-10 15:36                   ` Roedel, Joerg
  0 siblings, 1 reply; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 13:59 UTC (permalink / raw)
To: linux-arm-kernel

[ -bouncing Hiroshi.DOYU at nokia.com, +not-bouncing hdoyu at nvidia.com : hi Hiroshi :) ]

Hi Joerg,

On Mon, Oct 10, 2011 at 11:47 AM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> sorry, I was on vacation last week and had no time to look into this.

Sure thing, thanks for replying!

>> +#include <linux/bitmap.h>
>
> Is this still required?

Nope, removed, thanks.

> Hmm, I thought a little bit about that and came to the conculusion it
> might be best to just keep the page-sizes as a part of the iommu_ops
> structure. So there is no need to extend the register_iommu interface.

Sure. That was one of my initial alternatives, but I decided against it
at that time. I'll bring it back - it will help with the bus_set_iommu
rebasing.

> Also, the bus_set_iommu interface is now in the -next branch. Would be
> good if you rebase the patches to that interface.

Sure. It's a little tricky though: which branch do I base this on ?
Are you ok with me basing this on your 'next' branch ? My current
stack depends at least on three branches of yours, so that would be
helpful for me (and fewer merge conflicts for you I guess :).

> I think we need some care here and check pgsize for 0. A BUG_ON should
> do.

I can add it if you prefer, but I don't think it can really happen:
basically, it means that we chose a too small and unsupported page
bit, which can't happen as long as we check for IS_ALIGNED(iova |
paddr | size, iommu_min_pagesz) in the beginning of iommu_map.

>> +		unmapped_order = iommu_ops->unmap(domain, iova, order);
>
> I think we should make sure that we call iommu_ops->unmap with the same
> parameters as iommu_ops->map. Otherwise we still need some page-size
> complexity in the iommu-drivers.

Ok, let's discuss the semantics of ->unmap().

There isn't a clear documentation of that API (we should probably add
some kernel docs after we nail it down now), but judging from the
existing users (mainly kvm) and drivers, it seems that iommu_map() and
iommu_unmap() aren't symmetric: users rely on unmap() to return the
actual size that was unmapped. IOMMU drivers, in turn, should check
which page is mapped on 'iova', unmap it, and return its size.

This way iommu_unmap() becomes very simple: it just iterates through
the region, relying on iommu_ops->unmap() to return the sizes that
were actually unmapped (very similar to how amd's iommu_unmap_page
works today). This also means that iommu_ops->unmap() doesn't really
need a size/order argument and we can remove it (after all drivers
fully migrate..).

The other approach which you suggest means symmetric iommu_map() and
iommu_unmap(). It means adding a 'paddr' parameter to iommu_unmap(),
which is easy, but maybe more concerning is the limitation that it
incurs: users will now have to call iommu_unmap() exactly as they
called iommu_map() beforehand. Not sure how well this will fly with
the existing users (kvm ?) and whether we really want to enforce this
(it doesn't mean drivers need to deal with page-size complexity; they
are required to unmap a single page at a time, and iommu_unmap() will
do the work for them).

Another discussion: I think we better change iommu_ops->map() to
directly take a 'size' (in bytes) instead of an 'order' (of pages).
Most (all?) drivers just immediately do 'size = 0x1000UL << gfp_order',
so this whole size -> order -> size back and forth seems redundant.

> When we pass the size now it makes sense to also return the
> unmapped-size instead of the order.

Sure.

Thanks for your review,
Ohad.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 13:59                 ` Ohad Ben-Cohen
@ 2011-10-10 15:36                   ` Roedel, Joerg
  2011-10-10 17:02                     ` Ohad Ben-Cohen
  2011-10-10 22:49                     ` Ohad Ben-Cohen
  0 siblings, 2 replies; 36+ messages in thread
From: Roedel, Joerg @ 2011-10-10 15:36 UTC (permalink / raw)
To: linux-arm-kernel

Hi Ohad,

On Mon, Oct 10, 2011 at 09:59:22AM -0400, Ohad Ben-Cohen wrote:
> > Also, the bus_set_iommu interface is now in the -next branch. Would be
> > good if you rebase the patches to that interface.
>
> Sure. It's a little tricky though: which branch do I base this on ?
> Are you ok with me basing this on your 'next' branch ? My current
> stack depends at least on three branches of yours, so that would be
> helpful for me (and fewer merge conflicts for you I guess :).

The master branch is best to base your patches on for generic work.
For more specific things like omap-only changes you can use the topic
branches.

> > I think we need some care here and check pgsize for 0. A BUG_ON should
> > do.
>
> I can add it if you prefer, but I don't think it can really happen:
> basically, it means that we chose a too small and unsupported page
> bit, which can't happen as long as we check for IS_ALIGNED(iova |
> paddr | size, iommu_min_pagesz) in the beginning of iommu_map.

It can happen when there is a bug somewhere :) So a BUG_ON will yell
then and makes debugging easier. An alternative is to use a WARN_ON
and let the map-call fail in this case.

> Ok, let's discuss the semantics of ->unmap().
>
> There isn't a clear documentation of that API (we should probably add
> some kernel docs after we nail it down now), but judging from the
> existing users (mainly kvm) and drivers, it seems that iommu_map() and
> iommu_unmap() aren't symmetric: users rely on unmap() to return the
> actual size that was unmapped. IOMMU drivers, in turn, should check
> which page is mapped on 'iova', unmap it, and return its size.

Right, currently the map/unmap calls are not symmetric. But I think
they should be, to get clean semantics. Without this requirement, and
with multiple page-sizes in use, the iommu code may have to unmap more
address space than requested. The user doesn't know what will be
unmapped, so it has to make sure that no DMA is happening while unmap
runs. When we require the calls to be symmetric we can give a guarantee
that only the requested region is unmapped, and allow DMA to the
untouched part of the address-space while unmap() is running. So where
the call-places do not follow this restriction we should convert them
mid-term.

> This way iommu_unmap() becomes very simple: it just iterates through
> the region, relying on iommu_ops->unmap() to return the sizes that
> were actually unmapped (very similar to how amd's iommu_unmap_page
> works today). This also means that iommu_ops->unmap() doesn't really
> need a size/order argument and we can remove it (after all drivers
> fully migrate..).

Yes, something like that. Probably the iommu_ops->unmap function should
be turned into an unmap_page function call which only takes an iova and
no size parameter. The iommu-driver unmaps the page pointing to that
iova and returns the size of the page unmapped. This still allows the
simple implementation for the unmap-call.

This change is no requirement for this patch-set, but if we agree on it
this patch-set should keep that direction in mind.

> The other approach which you suggest means symmetric iommu_map() and
> iommu_unmap(). It means adding a 'paddr' parameter to iommu_unmap(),
> which is easy, but maybe more concerning is the limitation that it
> incurs: users will now have to call iommu_unmap() exactly as they
> called iommu_map() beforehand. Not sure how well this will fly with
> the existing users (kvm ?) and whether we really want to enforce this
> (it doesn't mean drivers need to deal with page-size complexity; they
> are required to unmap a single page at a time, and iommu_unmap() will
> do the work for them).

It will work with KVM, that is no problem. We don't need to really
enforce the calls to be symmetric. But we can define that we only give
the guarantee about what will be unmapped when the calls are symmetric.
For example:

	iommu_map(  0, 0x100000);
	iommu_unmap(0, 0x100000); /* Guarantee that it will only unmap
				     the range 0-0x100000 */

whereas:

	iommu_map(  0, 0x100000);
	iommu_unmap(0, 0x1000);   /* Guarantees that 0-0x1000 is unmapped,
				     but other undefined parts of the
				     address space may be unmapped too,
				     up to the whole address space */

The alternative is that we implement page-splitting in the iommu_unmap
function. But that introduces complexity I am not sure we really need.
KVM for example just unmaps the whole address-space on destruction.
For the generic dma_ops this is also not required because the dma_map*
functions already have the requirement to be symmetric.

> Another discussion:
>
> I think we better change iommu_ops->map() to directly take a 'size'
> (in bytes) instead of an 'order' (of pages). Most (all?) drivers just
> immediately do 'size = 0x1000UL << gfp_order', so this whole size ->
> order -> size back and forth seems redundant.

Yes, this get_order thing should be changed to size long-term.

	Joerg

--
AMD Operating System Research Center
Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 36+ messages in thread
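The unmap_page direction discussed above can be visualized with a toy model: the core repeatedly calls a per-page driver hook that takes only an iova and returns the size of whatever page it finds there, so the core needs no page-size knowledge at all. Everything below (the names, the fake page table) is invented for illustration:

	/* toy_unmap.c -- userspace model of the per-page unmap idea */
	#include <stdio.h>

	static unsigned long driver_unmap_page(unsigned long iova)
	{
		/* look up whatever page is mapped at iova, "clear" it,
		 * and return its size -- 0 means nothing mapped there.
		 * Pretend 0x100000-0x224000 holds one 1MiB section,
		 * two 64KiB pages and four 4KiB pages. */
		if (iova == 0x100000)
			return 0x100000;		/* 1MiB section */
		if (iova == 0x200000 || iova == 0x210000)
			return 0x10000;			/* 64KiB pages */
		if (iova >= 0x220000 && iova < 0x224000)
			return 0x1000;			/* 4KiB pages */
		return 0;
	}

	int main(void)
	{
		unsigned long iova = 0x100000, size = 0x124000, total = 0;

		while (total < size) {
			unsigned long unmapped = driver_unmap_page(iova + total);

			if (!unmapped)
				break;
			printf("unmapped 0x%06lx at iova 0x%06lx\n",
			       unmapped, iova + total);
			total += unmapped;
		}
		printf("total unmapped: 0x%lx\n", total);
		return 0;
	}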
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 15:36                   ` Roedel, Joerg
@ 2011-10-10 17:02                     ` Ohad Ben-Cohen
  2011-10-10 22:01                       ` Ohad Ben-Cohen
  2011-10-11 10:19                       ` Roedel, Joerg
  1 sibling, 2 replies; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-10-10 17:02 UTC (permalink / raw)
To: linux-arm-kernel

Hi Joerg,

On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> The master branch is best to base your patches on for generic work.

Oh, great, thanks.

> It can happen when there is a bug somewhere :)

Hmm, bug ? ;) Ok, I'll add a BUG_ON :)

> Yes, something like that. Probably the iommu_ops->unmap function should
> be turned into an unmap_page function call which only takes an iova and
> no size parameter. The iommu-driver unmaps the page pointing to that
> iova and returns the size of the page unmapped. This still allows the
> simple implementation for the unmap-call.

Yes, exactly. It will take some time to migrate all drivers (today we
have 4 drivers, each of which implements slightly different ->unmap()
semantics), but at least let's not accept any new driver that doesn't
adhere to this, otherwise it's going to be even harder for the API to
evolve.

> This change is no requirement for this patch-set, but if we agree on it
> this patch-set should keep that direction in mind.

Definitely, thanks.

> We don't need to really enforce the calls to be symmetric. But we can
> define that we only give the guarantee about what will be unmapped
> when the calls are symmetric.

Sounds good to me. I'll add this to the kernel doc patch (which I'll
submit after this patch set materializes), and when/if we move to
symmetric only, we will update it.

> The alternative is that we implement page-splitting in the iommu_unmap
> function. But that introduces complexity I am not sure we really need.

Yeah, me neither.

> Yes, this get_order thing should be changed to size long-term.

Good. That should be a simple change, I'll do it after this patch set.

Thanks,
Ohad.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 17:02 ` Ohad Ben-Cohen @ 2011-10-10 22:01 ` Ohad Ben-Cohen 2011-10-11 10:21 ` Roedel, Joerg 2011-10-11 10:19 ` Roedel, Joerg 1 sibling, 1 reply; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-10 22:01 UTC (permalink / raw) To: linux-arm-kernel > On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: >> The master branch is best to base your patches on for generic work. It looks like the master branch is missing something like this: ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 22:01 ` Ohad Ben-Cohen @ 2011-10-11 10:21 ` Roedel, Joerg 0 siblings, 0 replies; 36+ messages in thread From: Roedel, Joerg @ 2011-10-11 10:21 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 10, 2011 at 06:01:06PM -0400, Ohad Ben-Cohen wrote: > > On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > >> The master branch is best to base your patches on for generic work. > > It looks like the master branch is missing something like this: > > >From acb316aa4bcaf383e8cb1580e30c8635e0a34369 Mon Sep 17 00:00:00 2001 > From: Ohad Ben-Cohen <ohad@wizery.com> > Date: Mon, 10 Oct 2011 23:55:51 +0200 > Subject: [PATCH] iommu/core: fix build issue > > Fix this: > > drivers/iommu/iommu.c: In function 'iommu_commit': > drivers/iommu/iommu.c:291: error: 'iommu_ops' undeclared (first use in > this function) > > Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> > --- > drivers/iommu/iommu.c | 4 ++-- > 1 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 909b0d2..a5131f1 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -288,7 +288,7 @@ EXPORT_SYMBOL_GPL(iommu_unmap); > > void iommu_commit(struct iommu_domain *domain) > { > - if (iommu_ops->commit) > - iommu_ops->commit(domain); > + if (domain->ops->commit) > + domain->ops->commit(domain); > } > EXPORT_SYMBOL_GPL(iommu_commit); > -- > 1.7.4.1 Whoops. This iommu_commit branch wasn't meant to be in master already. I'll try to fix that. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware
  2011-10-10 17:02                     ` Ohad Ben-Cohen
  2011-10-10 22:01                       ` Ohad Ben-Cohen
@ 2011-10-11 10:19                       ` Roedel, Joerg
  1 sibling, 0 replies; 36+ messages in thread
From: Roedel, Joerg @ 2011-10-11 10:19 UTC (permalink / raw)
To: linux-arm-kernel

Hi Ohad,

On Mon, Oct 10, 2011 at 01:02:46PM -0400, Ohad Ben-Cohen wrote:
> On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote:
> > Yes, something like that. Probably the iommu_ops->unmap function should
> > be turned into an unmap_page function call which only takes an iova and
> > no size parameter. The iommu-driver unmaps the page pointing to that
> > iova and returns the size of the page unmapped. This still allows the
> > simple implementation for the unmap-call.
>
> Yes, exactly. It will take some time to migrate all drivers (today we
> have 4 drivers, each of which implements slightly different ->unmap()
> semantics), but at least let's not accept any new driver that doesn't
> adhere to this, otherwise it's going to be even harder for the API to
> evolve.

Agreed. We should change the existing drivers one-by-one.

> > We don't need to really enforce the calls to be symmetric. But we can
> > define that we only give the guarantee about what will be unmapped
> > when the calls are symmetric.
>
> Sounds good to me. I'll add this to the kernel doc patch (which I'll
> submit after this patch set materializes), and when/if we move to
> symmetric only, we will update it.

Great. So if no one disagrees with this direction (this means all the
other people on Cc :) ) it is the way to go.

	Joerg

--
AMD Operating System Research Center
Advanced Micro Devices GmbH
Einsteinring 24
85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 15:36 ` Roedel, Joerg 2011-10-10 17:02 ` Ohad Ben-Cohen @ 2011-10-10 22:49 ` Ohad Ben-Cohen 2011-10-11 10:38 ` Roedel, Joerg 2011-10-11 14:59 ` KyongHo Cho 1 sibling, 2 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-10 22:49 UTC (permalink / raw) To: linux-arm-kernel Hi Joerg, On Mon, Oct 10, 2011 at 5:36 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > The master branch is best to base your patches on for generic work. Done. I've revised the patches and attached the main one below; please tell me if it looks ok, and then I'll resubmit the entire patch set. Thanks, Ohad. commit bf1d730b5f4f7631becfcd4be52693d85bfea36b Author: Ohad Ben-Cohen <ohad@wizery.com> Date: Mon Oct 10 23:50:55 2011 +0200 iommu/core: split mapping to page sizes as supported by the hardware When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches, and after all the drivers are converted, register_iommu will be changed (and the temporary default settings will be removed). Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to send the mapping size in bytes instead of in page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm at vger.kernel.org diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 73778b7..909b0d2 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -16,6 +16,8 @@ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ +#define pr_fmt(fmt) "%s: " fmt, __func__ + #include <linux/device.h> #include <linux/kernel.h> #include <linux/bug.h> @@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops) if (bus->iommu_ops != NULL) return -EBUSY; + /* + * Set the default pgsize values, which retain the existing + * IOMMU API behavior: drivers will be called to map + * regions that are sized/aligned to order of 4KiB pages. + * + * This will be removed once all drivers are migrated. 
+ */ + if (!ops->pgsize_bitmap) + ops->pgsize_bitmap = ~0xFFFUL; + + /* find out the minimum page size only once */ + ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap); + bus->iommu_ops = ops; /* Do IOMMU specific setup for this bus-type */ @@ -157,33 +172,117 @@ int iommu_domain_has_cap(struct iommu_domain *domain, EXPORT_SYMBOL_GPL(iommu_domain_has_cap); int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, int gfp_order, int prot) + phys_addr_t paddr, int size, int prot) { - size_t size; + unsigned long orig_iova = iova; + int ret = 0, orig_size = size; if (unlikely(domain->ops->map == NULL)) return -ENODEV; - size = PAGE_SIZE << gfp_order; + /* + * both the virtual address and the physical one, as well as + * the size of the mapping, must be aligned (at least) to the + * size of the smallest page supported by the hardware + */ + if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) { + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz " + "0x%x\n", iova, (unsigned long)paddr, + size, domain->ops->min_pagesz); + return -EINVAL; + } + + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova, + (unsigned long)paddr, size); + + while (size) { + unsigned long pgsize, addr_merge = iova | paddr; + unsigned int pgsize_idx; + + /* Max page size that still fits into 'size' */ + pgsize_idx = __fls(size); + + /* need to consider alignment requirements ? */ + if (likely(addr_merge)) { + /* Max page size allowed by both iova and paddr */ + unsigned int align_pgsize_idx = __ffs(addr_merge); + + pgsize_idx = min(pgsize_idx, align_pgsize_idx); + } + + /* build a mask of acceptable page sizes */ + pgsize = (1UL << (pgsize_idx + 1)) - 1; + + /* throw away page sizes not supported by the hardware */ + pgsize &= domain->ops->pgsize_bitmap; + + /* make sure we're still sane */ + BUG_ON(!pgsize); - BUG_ON(!IS_ALIGNED(iova | paddr, size)); + /* pick the biggest page */ + pgsize_idx = __fls(pgsize); + pgsize = 1UL << pgsize_idx; - return domain->ops->map(domain, iova, paddr, gfp_order, prot); + /* convert index to page order */ + pgsize_idx -= PAGE_SHIFT; + + pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova, + (unsigned long)paddr, pgsize_idx); + + ret = domain->ops->map(domain, iova, paddr, pgsize_idx, prot); + if (ret) + break; + + iova += pgsize; + paddr += pgsize; + size -= pgsize; + } + + /* unroll mapping in case something went wrong */ + if (ret) + iommu_unmap(domain, orig_iova, orig_size); + + return ret; } EXPORT_SYMBOL_GPL(iommu_map); -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order) +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int size) { - size_t size; + int order, unmapped_size, unmapped_order, total_unmapped = 0; if (unlikely(domain->ops->unmap == NULL)) return -ENODEV; - size = PAGE_SIZE << gfp_order; + /* + * The virtual address, as well as the size of the mapping, must be + * aligned (at least) to the size of the smallest page supported + * by the hardware + */ + if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) { + pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n", + iova, size, domain->ops->min_pagesz); + return -EINVAL; + } + + pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size); + + while (size > total_unmapped) { + order = get_order(size - total_unmapped); + + unmapped_order = domain->ops->unmap(domain, iova, order); + if (unmapped_order < 0) + break; + + pr_debug("unmapped: iova 0x%lx order %d\n", iova, + unmapped_order); + + unmapped_size = 0x1000UL << 
unmapped_order; - BUG_ON(!IS_ALIGNED(iova, size)); + iova += unmapped_size; + total_unmapped += unmapped_size; + } - return domain->ops->unmap(domain, iova, gfp_order); + return total_unmapped; } EXPORT_SYMBOL_GPL(iommu_unmap); diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c index e8fdb88..4a8f761 100644 --- a/drivers/iommu/omap-iovmm.c +++ b/drivers/iommu/omap-iovmm.c @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new, unsigned int i, j; struct scatterlist *sg; u32 da = new->da_start; - int order; if (!domain || !sgt) return -EINVAL; @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain *domain, struct iovm_struct *new, if (bytes_to_iopgsz(bytes) < 0) goto err_out; - order = get_order(bytes); - pr_debug("%s: [%d] %08x %08x(%x)\n", __func__, i, da, pa, bytes); - err = iommu_map(domain, da, pa, order, flags); + err = iommu_map(domain, da, pa, bytes, flags); if (err) goto err_out; @@ -448,10 +445,9 @@ err_out: size_t bytes; bytes = sg->length + sg->offset; - order = get_order(bytes); /* ignore failures.. we're already handling one */ - iommu_unmap(domain, da, order); + iommu_unmap(domain, da, bytes); da += bytes; } @@ -474,13 +470,11 @@ static void unmap_iovm_area(struct iommu_domain *domain, struct omap_iommu *obj, start = area->da_start; for_each_sg(sgt->sgl, sg, sgt->nents, i) { size_t bytes; - int order; bytes = sg->length + sg->offset; - order = get_order(bytes); - err = iommu_unmap(domain, start, order); - if (err < 0) + err = iommu_unmap(domain, start, bytes); + if (err < bytes) break; dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n", diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 710291f..a10a86c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -48,6 +48,22 @@ struct iommu_domain { #ifdef CONFIG_IOMMU_API +/** + * struct iommu_ops - iommu ops and capabilities + * @domain_init: init iommu domain + * @domain_destroy: destroy iommu domain + * @attach_dev: attach device to an iommu domain + * @detach_dev: detach device from an iommu domain + * @map: map a physically contiguous memory region to an iommu domain + * @unmap: unmap a physically contiguous memory region from an iommu domain + * @iova_to_phys: translate iova to physical address + * @domain_has_cap: domain capabilities query + * @commit: commit iommu domain + * @pgsize_bitmap: bitmap of supported page sizes + * @min_pagesz: smallest page size supported. note: this member is private + * to the IOMMU core, and maintained only for efficiency sake; + * drivers don't need to set it. 
+ */ struct iommu_ops { int (*domain_init)(struct iommu_domain *domain); void (*domain_destroy)(struct iommu_domain *domain); @@ -62,6 +78,8 @@ struct iommu_ops { int (*domain_has_cap)(struct iommu_domain *domain, unsigned long cap); void (*commit)(struct iommu_domain *domain); + unsigned long pgsize_bitmap; + unsigned int min_pagesz; }; extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops); @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); extern int iommu_map(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, int gfp_order, int prot); + phys_addr_t paddr, int size, int prot); extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova, - int gfp_order); + int size); extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, unsigned long iova); extern int iommu_domain_has_cap(struct iommu_domain *domain, diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index ca409be..26b3199 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot) /* Map into IO address space */ r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn), - get_order(page_size), flags); + page_size, flags); if (r) { printk(KERN_ERR "kvm_iommu_map_address:" "iommu failed to map pfn=%llx\n", pfn); @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm, while (gfn < end_gfn) { unsigned long unmap_pages; - int order; + int size; /* Get physical address */ phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn)); pfn = phys >> PAGE_SHIFT; /* Unmap address from IO address space */ - order = iommu_unmap(domain, gfn_to_gpa(gfn), 0); - unmap_pages = 1ULL << order; + size = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE); + unmap_pages = 1ULL << get_order(size); /* Unpin all pages we just unmapped to not leak any memory */ kvm_unpin_pages(kvm, pfn, unmap_pages); ^ permalink raw reply related [flat|nested] 36+ messages in thread
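The greedy page-size selection above is easiest to see with concrete numbers. Below is a minimal standalone sketch of the same arithmetic (hypothetical demo code, not part of the patch: fls_idx()/ffs_idx() stand in for the kernel's __fls()/__ffs(), and the bitmap is the OMAP-style 4KiB/64KiB/1MiB/16MiB set):

    #include <stdio.h>

    static unsigned int fls_idx(unsigned long x)  /* index of highest set bit */
    {
            unsigned int i = 0;

            while (x >>= 1)
                    i++;
            return i;
    }

    static unsigned int ffs_idx(unsigned long x)  /* index of lowest set bit */
    {
            unsigned int i = 0;

            while (!(x & 1)) {
                    x >>= 1;
                    i++;
            }
            return i;
    }

    int main(void)
    {
            unsigned long bitmap = (1UL << 12) | (1UL << 16) |
                                   (1UL << 20) | (1UL << 24);
            unsigned long iova = 0x100000, paddr = 0x200000, size = 0x110000;

            while (size) {
                    unsigned long pgsize, addr_merge = iova | paddr;
                    unsigned int pgsize_idx = fls_idx(size);

                    if (addr_merge) {
                            unsigned int align_idx = ffs_idx(addr_merge);

                            if (align_idx < pgsize_idx)
                                    pgsize_idx = align_idx;
                    }

                    /* sizes that fit and are aligned, filtered by the bitmap */
                    pgsize = ((1UL << (pgsize_idx + 1)) - 1) & bitmap;
                    pgsize = 1UL << fls_idx(pgsize);    /* biggest one left */

                    printf("map iova 0x%lx pa 0x%lx pgsize 0x%lx\n",
                           iova, paddr, pgsize);
                    iova += pgsize;
                    paddr += pgsize;
                    size -= pgsize;
            }
            return 0;
    }

Run against this 1MiB-aligned, 0x110000-byte region, it prints one 1MiB mapping followed by one 64KiB mapping, the biggest pages the alignment and remaining length allow.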
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 22:49 ` Ohad Ben-Cohen @ 2011-10-11 10:38 ` Roedel, Joerg 2011-10-11 17:01 ` Ohad Ben-Cohen 2011-10-11 14:59 ` KyongHo Cho 1 sibling, 1 reply; 36+ messages in thread From: Roedel, Joerg @ 2011-10-11 10:38 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 10, 2011 at 06:49:32PM -0400, Ohad Ben-Cohen wrote: > -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order) > +int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int size) > { > - size_t size; > + int order, unmapped_size, unmapped_order, total_unmapped = 0; > > if (unlikely(domain->ops->unmap == NULL)) > return -ENODEV; > > - size = PAGE_SIZE << gfp_order; > + /* > + * The virtual address, as well as the size of the mapping, must be > + * aligned (at least) to the size of the smallest page supported > + * by the hardware > + */ > + if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) { > + pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n", > + iova, size, domain->ops->min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size); > + > + while (size > total_unmapped) { > + order = get_order(size - total_unmapped); Hmm, this actually unmaps more than requested, even in the symmetric case. Imagine the user wants to unmap a 12KiB region; then order will be 2 and this code will tell the iommu-driver to unmap 16KiB. You need to make sure that you don't pass larger regions than requested to the low-level driver. Some logic like in the iommu_map function should do it. > + > + unmapped_order = domain->ops->unmap(domain, iova, order); > + if (unmapped_order < 0) > + break; > + > + pr_debug("unmapped: iova 0x%lx order %d\n", iova, > + unmapped_order); > + > + unmapped_size = 0x1000UL << unmapped_order; Please use PAGE_SIZE instead of 0x1000UL. > > - BUG_ON(!IS_ALIGNED(iova, size)); > + iova += unmapped_size; > + total_unmapped += unmapped_size; > + } > > - return domain->ops->unmap(domain, iova, gfp_order); > + return total_unmapped; > } > EXPORT_SYMBOL_GPL(iommu_unmap); > > diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c > index e8fdb88..4a8f761 100644 > --- a/drivers/iommu/omap-iovmm.c > +++ b/drivers/iommu/omap-iovmm.c > @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain > *domain, struct iovm_struct *new, > unsigned int i, j; > struct scatterlist *sg; > u32 da = new->da_start; > - int order; > > if (!domain || !sgt) > return -EINVAL; > @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain > *domain, struct iovm_struct *new, > if (bytes_to_iopgsz(bytes) < 0) > goto err_out; > > - order = get_order(bytes); > - > pr_debug("%s: [%d] %08x %08x(%x)\n", __func__, > i, da, pa, bytes); > > - err = iommu_map(domain, da, pa, order, flags); > + err = iommu_map(domain, da, pa, bytes, flags); > if (err) > goto err_out; > > @@ -448,10 +445,9 @@ err_out: > size_t bytes; > > bytes = sg->length + sg->offset; > - order = get_order(bytes); > > /* ignore failures..
we're already handling one */ > - iommu_unmap(domain, da, order); > + iommu_unmap(domain, da, bytes); > > da += bytes; > } > @@ -474,13 +470,11 @@ static void unmap_iovm_area(struct iommu_domain > *domain, struct omap_iommu *obj, > start = area->da_start; > for_each_sg(sgt->sgl, sg, sgt->nents, i) { > size_t bytes; > - int order; > > bytes = sg->length + sg->offset; > - order = get_order(bytes); > > - err = iommu_unmap(domain, start, order); > - if (err < 0) > + err = iommu_unmap(domain, start, bytes); > + if (err < bytes) > break; > > dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n", > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 710291f..a10a86c 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -48,6 +48,22 @@ struct iommu_domain { > > #ifdef CONFIG_IOMMU_API > > +/** > + * struct iommu_ops - iommu ops and capabilities > + * @domain_init: init iommu domain > + * @domain_destroy: destroy iommu domain > + * @attach_dev: attach device to an iommu domain > + * @detach_dev: detach device from an iommu domain > + * @map: map a physically contiguous memory region to an iommu domain > + * @unmap: unmap a physically contiguous memory region from an iommu domain > + * @iova_to_phys: translate iova to physical address > + * @domain_has_cap: domain capabilities query > + * @commit: commit iommu domain > + * @pgsize_bitmap: bitmap of supported page sizes > + * @min_pagesz: smallest page size supported. note: this member is private > + * to the IOMMU core, and maintained only for efficiency sake; > + * drivers don't need to set it. > + */ > struct iommu_ops { > int (*domain_init)(struct iommu_domain *domain); > void (*domain_destroy)(struct iommu_domain *domain); > @@ -62,6 +78,8 @@ struct iommu_ops { > int (*domain_has_cap)(struct iommu_domain *domain, > unsigned long cap); > void (*commit)(struct iommu_domain *domain); > + unsigned long pgsize_bitmap; > + unsigned int min_pagesz; > }; > > extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops); > @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, > extern void iommu_detach_device(struct iommu_domain *domain, > struct device *dev); > extern int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot); > + phys_addr_t paddr, int size, int prot); > extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova, > - int gfp_order); > + int size); > extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, > unsigned long iova); > extern int iommu_domain_has_cap(struct iommu_domain *domain, > diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c > index ca409be..26b3199 100644 > --- a/virt/kvm/iommu.c > +++ b/virt/kvm/iommu.c > @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct > kvm_memory_slot *slot) > > /* Map into IO address space */ > r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn), > - get_order(page_size), flags); > + page_size, flags); > if (r) { > printk(KERN_ERR "kvm_iommu_map_address:" > "iommu failed to map pfn=%llx\n", pfn); > @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm, > > while (gfn < end_gfn) { > unsigned long unmap_pages; > - int order; > + int size; > > /* Get physical address */ > phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn)); > pfn = phys >> PAGE_SHIFT; > > /* Unmap address from IO address space */ > - order = iommu_unmap(domain, gfn_to_gpa(gfn), 0); > - unmap_pages = 1ULL << order; > + size = iommu_unmap(domain, gfn_to_gpa(gfn), 
PAGE_SIZE); > + unmap_pages = 1ULL << get_order(size); > > /* Unpin all pages we just unmapped to not leak any memory */ > kvm_unpin_pages(kvm, pfn, unmap_pages); -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
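The over-unmap Joerg points out above is easy to reproduce in isolation; a standalone sketch (hypothetical demo code assuming a 4KiB PAGE_SIZE, with get_order() reimplemented here for illustration):

    #include <stdio.h>

    /* same rounding as the kernel's get_order() for a 4KiB PAGE_SIZE */
    static int get_order(unsigned long size)
    {
            unsigned long page = 0x1000UL;
            int order = 0;

            while (page < size) {
                    page <<= 1;
                    order++;
            }
            return order;
    }

    int main(void)
    {
            unsigned long size = 12 * 1024;         /* a 12KiB unmap request */
            int order = get_order(size);

            printf("get_order(12KiB) = %d, unmaps 0x%lx bytes\n",
                   order, 0x1000UL << order);       /* 2, 0x4000 (16KiB) */
            return 0;
    }

The order-based request necessarily rounds up to the next power of two, so the driver is asked to unmap 16KiB when only 12KiB was requested.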
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-11 10:38 ` Roedel, Joerg @ 2011-10-11 17:01 ` Ohad Ben-Cohen 2011-10-14 13:35 ` Roedel, Joerg 0 siblings, 1 reply; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-11 17:01 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > You need to make sure that you don;t pass larger regions then requested > to the low-level driver. Some logic like in the iommu_map function > should do it. You're right. But the solution IMHO should be getting rid of that size -> order -> size conversion which we do back and forth, as it's kinda pointless. > Please use PAGE_SIZE instead of 0x1000UL. This one would go away too if we remove the size -> order -> size conversions. In fact, iommu_unmap() just becomes so much simpler when we do that. I'm attaching two patches: first one that removes the size->order->size conversions we have, and then the usual pgsize patch rebased to it. Please take a look, Thanks, Ohad. ^ permalink raw reply [flat|nested] 36+ messages in thread
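With the conversions gone, the driver's ->unmap() can take and return bytes directly, and the core unmap loop collapses to the following shape (the full patch, quoted in the reply below, does exactly this):

    size_t left = size;

    while (left) {
            size_t unmapped = domain->ops->unmap(domain, iova, left);
            if (!unmapped)
                    break;

            iova += unmapped;
            left -= unmapped;
    }

    return size - left;

No order arithmetic remains, so neither the 0x1000UL constant nor the get_order() rounding is needed any longer.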
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-11 17:01 ` Ohad Ben-Cohen @ 2011-10-14 13:35 ` Roedel, Joerg 2011-10-14 17:03 ` Ohad Ben-Cohen 0 siblings, 1 reply; 36+ messages in thread From: Roedel, Joerg @ 2011-10-14 13:35 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 11, 2011 at 01:01:23PM -0400, Ohad Ben-Cohen wrote: > On Tue, Oct 11, 2011 at 12:38 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > > You need to make sure that you don;t pass larger regions then requested > > to the low-level driver. Some logic like in the iommu_map function > > should do it. > > You're right. But the solution IMHO should be getting rid of that size > -> order -> size conversion which we do back and forth, as it's kinda > pointless. Right. > > > Please use PAGE_SIZE instead of 0x1000UL. > > This one would go away too if we remove the size -> order -> size > conversions. In fact, iommu_unmap() just becomes so much simpler when > we do that. > > I'm attaching two patches: first one that removes the > size->order->size conversions we have, and then the usual pgsize patch > rebased to it. > > Please take a look, > > Thanks, > Ohad. > > >From 7102bda53a425872591f14f850ab031b1ca5dae1 Mon Sep 17 00:00:00 2001 > From: Ohad Ben-Cohen <ohad@wizery.com> > Date: Tue, 11 Oct 2011 16:50:38 +0200 > Subject: [PATCH 1/2] iommu/core: stop converting bytes to page order > back and forth > > Express sizes in bytes rather than in page order, to eliminate the > size->order->size conversions we have whenever the IOMMU API is calling > the low level drivers' map/unmap methods. > > Adopt all existing drivers. > > Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> > --- > drivers/iommu/amd_iommu.c | 13 +++++-------- > drivers/iommu/intel-iommu.c | 11 ++++------- > drivers/iommu/iommu.c | 8 +++++--- > drivers/iommu/msm_iommu.c | 19 +++++++------------ > drivers/iommu/omap-iommu.c | 14 +++++--------- > include/linux/iommu.h | 6 +++--- > 6 files changed, 29 insertions(+), 42 deletions(-) This patch looks good. Please include it in the page-size patch-set. > drivers/iommu/iommu.c | 112 +++++++++++++++++++++++++++++++++++++++---- > drivers/iommu/omap-iovmm.c | 17 ++---- > include/linux/iommu.h | 24 ++++++++- > virt/kvm/iommu.c | 8 ++-- > 4 files changed, 132 insertions(+), 29 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 5d042e8..9355632 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -16,6 +16,8 @@ > * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA > */ > > +#define pr_fmt(fmt) "%s: " fmt, __func__ > + > #include <linux/device.h> > #include <linux/kernel.h> > #include <linux/bug.h> > @@ -47,6 +49,19 @@ int bus_set_iommu(struct bus_type *bus, struct > iommu_ops *ops) > if (bus->iommu_ops != NULL) > return -EBUSY; > > + /* > + * Set the default pgsize values, which retain the existing > + * IOMMU API behavior: drivers will be called to map > + * regions that are sized/aligned to order of 4KiB pages. > + * > + * This will be removed once all drivers are migrated. > + */ > + if (!ops->pgsize_bitmap) > + ops->pgsize_bitmap = ~0xFFFUL; > + > + /* find out the minimum page size only once */ > + ops->min_pagesz = 1 << __ffs(ops->pgsize_bitmap); Hmm, I'd like to constify the iommu_ops structures in the future. This wouldn't work then anymore. How about renaming it to io_page_size (because it is the base page-size of the iommu) and let the driver set it? 
> + > bus->iommu_ops = ops; > > /* Do IOMMU specific setup for this bus-type */ > @@ -157,35 +172,110 @@ int iommu_domain_has_cap(struct iommu_domain *domain, > EXPORT_SYMBOL_GPL(iommu_domain_has_cap); > > int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot) > + phys_addr_t paddr, size_t size, int prot) > { > - size_t size; > + unsigned long orig_iova = iova; > + size_t orig_size = size; > + int ret = 0; > > if (unlikely(domain->ops->map == NULL)) > return -ENODEV; > > - size = PAGE_SIZE << gfp_order; > + /* > + * both the virtual address and the physical one, as well as > + * the size of the mapping, must be aligned (at least) to the > + * size of the smallest page supported by the hardware > + */ > + if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) { > + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz " > + "0x%x\n", iova, (unsigned long)paddr, > + size, domain->ops->min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova, > + (unsigned long)paddr, size); > + > + while (size) { > + unsigned long pgsize, addr_merge = iova | paddr; > + unsigned int pgsize_idx; > + > + /* Max page size that still fits into 'size' */ > + pgsize_idx = __fls(size); > + > + /* need to consider alignment requirements ? */ > + if (likely(addr_merge)) { > + /* Max page size allowed by both iova and paddr */ > + unsigned int align_pgsize_idx = __ffs(addr_merge); > + > + pgsize_idx = min(pgsize_idx, align_pgsize_idx); > + } > + > + /* build a mask of acceptable page sizes */ > + pgsize = (1UL << (pgsize_idx + 1)) - 1; > > - BUG_ON(!IS_ALIGNED(iova | paddr, size)); > + /* throw away page sizes not supported by the hardware */ > + pgsize &= domain->ops->pgsize_bitmap; > > - return domain->ops->map(domain, iova, paddr, size, prot); > + /* make sure we're still sane */ > + BUG_ON(!pgsize); > + > + /* pick the biggest page */ > + pgsize_idx = __fls(pgsize); > + pgsize = 1UL << pgsize_idx; > + > + pr_debug("mapping: iova 0x%lx pa 0x%lx pgsize %lu\n", iova, > + (unsigned long)paddr, pgsize); > + > + ret = domain->ops->map(domain, iova, paddr, pgsize, prot); > + if (ret) > + break; > + > + iova += pgsize; > + paddr += pgsize; > + size -= pgsize; > + } > + > + /* unroll mapping in case something went wrong */ > + if (ret) > + iommu_unmap(domain, orig_iova, orig_size - size); > + > + return ret; > } > EXPORT_SYMBOL_GPL(iommu_map); > > -int iommu_unmap(struct iommu_domain *domain, unsigned long iova, int gfp_order) > +size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, > size_t size) > { > - size_t size, unmapped; > + size_t unmapped, left = size; > > if (unlikely(domain->ops->unmap == NULL)) > return -ENODEV; > > - size = PAGE_SIZE << gfp_order; > + /* > + * The virtual address, as well as the size of the mapping, must be > + * aligned (at least) to the size of the smallest page supported > + * by the hardware > + */ > + if (!IS_ALIGNED(iova | size, domain->ops->min_pagesz)) { > + pr_err("unaligned: iova 0x%lx size 0x%x min_pagesz 0x%x\n", > + iova, size, domain->ops->min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("unmap this: iova 0x%lx size 0x%x\n", iova, size); > + > + while (left) { This needs to be (left > 0). The drivers are allowed to unmap more than requested, so this value may turn negative.
> + unmapped = domain->ops->unmap(domain, iova, left); > + if (!unmapped) > + break; > > - BUG_ON(!IS_ALIGNED(iova, size)); > + pr_debug("unmapped: iova 0x%lx size %u\n", iova, unmapped); > > - unmapped = domain->ops->unmap(domain, iova, size); > + iova += unmapped; > + left -= unmapped; > + } > > - return get_order(unmapped); > + return size - left; > } > EXPORT_SYMBOL_GPL(iommu_unmap); > > diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c > index e8fdb88..0b7b14c 100644 > --- a/drivers/iommu/omap-iovmm.c > +++ b/drivers/iommu/omap-iovmm.c > @@ -409,7 +409,6 @@ static int map_iovm_area(struct iommu_domain > *domain, struct iovm_struct *new, > unsigned int i, j; > struct scatterlist *sg; > u32 da = new->da_start; > - int order; > > if (!domain || !sgt) > return -EINVAL; > @@ -428,12 +427,10 @@ static int map_iovm_area(struct iommu_domain > *domain, struct iovm_struct *new, > if (bytes_to_iopgsz(bytes) < 0) > goto err_out; > > - order = get_order(bytes); > - > pr_debug("%s: [%d] %08x %08x(%x)\n", __func__, > i, da, pa, bytes); > > - err = iommu_map(domain, da, pa, order, flags); > + err = iommu_map(domain, da, pa, bytes, flags); > if (err) > goto err_out; > > @@ -448,10 +445,9 @@ err_out: > size_t bytes; > > bytes = sg->length + sg->offset; > - order = get_order(bytes); > > /* ignore failures.. we're already handling one */ > - iommu_unmap(domain, da, order); > + iommu_unmap(domain, da, bytes); > > da += bytes; > } > @@ -466,7 +462,8 @@ static void unmap_iovm_area(struct iommu_domain > *domain, struct omap_iommu *obj, > size_t total = area->da_end - area->da_start; > const struct sg_table *sgt = area->sgt; > struct scatterlist *sg; > - int i, err; > + int i; > + size_t unmapped; > > BUG_ON(!sgtable_ok(sgt)); > BUG_ON((!total) || !IS_ALIGNED(total, PAGE_SIZE)); > @@ -474,13 +471,11 @@ static void unmap_iovm_area(struct iommu_domain > *domain, struct omap_iommu *obj, > start = area->da_start; > for_each_sg(sgt->sgl, sg, sgt->nents, i) { > size_t bytes; > - int order; > > bytes = sg->length + sg->offset; > - order = get_order(bytes); > > - err = iommu_unmap(domain, start, order); > - if (err < 0) > + unmapped = iommu_unmap(domain, start, bytes); > + if (unmapped < bytes) > break; > > dev_dbg(obj->dev, "%s: unmap %08x(%x) %08x\n", > diff --git a/include/linux/iommu.h b/include/linux/iommu.h > index 6b6ed21..76d7ce4 100644 > --- a/include/linux/iommu.h > +++ b/include/linux/iommu.h > @@ -48,6 +48,22 @@ struct iommu_domain { > > #ifdef CONFIG_IOMMU_API > > +/** > + * struct iommu_ops - iommu ops and capabilities > + * @domain_init: init iommu domain > + * @domain_destroy: destroy iommu domain > + * @attach_dev: attach device to an iommu domain > + * @detach_dev: detach device from an iommu domain > + * @map: map a physically contiguous memory region to an iommu domain > + * @unmap: unmap a physically contiguous memory region from an iommu domain > + * @iova_to_phys: translate iova to physical address > + * @domain_has_cap: domain capabilities query > + * @commit: commit iommu domain > + * @pgsize_bitmap: bitmap of supported page sizes > + * @min_pagesz: smallest page size supported. note: this member is private > + * to the IOMMU core, and maintained only for efficiency sake; > + * drivers don't need to set it. 
> + */ > struct iommu_ops { > int (*domain_init)(struct iommu_domain *domain); > void (*domain_destroy)(struct iommu_domain *domain); > @@ -62,6 +78,8 @@ struct iommu_ops { > int (*domain_has_cap)(struct iommu_domain *domain, > unsigned long cap); > void (*commit)(struct iommu_domain *domain); > + unsigned long pgsize_bitmap; > + unsigned int min_pagesz; > }; > > extern int bus_set_iommu(struct bus_type *bus, struct iommu_ops *ops); > @@ -73,9 +91,9 @@ extern int iommu_attach_device(struct iommu_domain *domain, > extern void iommu_detach_device(struct iommu_domain *domain, > struct device *dev); > extern int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot); > -extern int iommu_unmap(struct iommu_domain *domain, unsigned long iova, > - int gfp_order); > + phys_addr_t paddr, size_t size, int prot); > +extern size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, > + size_t size); > extern phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, > unsigned long iova); > extern int iommu_domain_has_cap(struct iommu_domain *domain, > diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c > index ca409be..35ed096 100644 > --- a/virt/kvm/iommu.c > +++ b/virt/kvm/iommu.c > @@ -111,7 +111,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct > kvm_memory_slot *slot) > > /* Map into IO address space */ > r = iommu_map(domain, gfn_to_gpa(gfn), pfn_to_hpa(pfn), > - get_order(page_size), flags); > + page_size, flags); > if (r) { > printk(KERN_ERR "kvm_iommu_map_address:" > "iommu failed to map pfn=%llx\n", pfn); > @@ -292,15 +292,15 @@ static void kvm_iommu_put_pages(struct kvm *kvm, > > while (gfn < end_gfn) { > unsigned long unmap_pages; > - int order; > + size_t size; > > /* Get physical address */ > phys = iommu_iova_to_phys(domain, gfn_to_gpa(gfn)); > pfn = phys >> PAGE_SHIFT; > > /* Unmap address from IO address space */ > - order = iommu_unmap(domain, gfn_to_gpa(gfn), 0); > - unmap_pages = 1ULL << order; > + size = iommu_unmap(domain, gfn_to_gpa(gfn), PAGE_SIZE); > + unmap_pages = 1ULL << get_order(size); > > /* Unpin all pages we just unmapped to not leak any memory */ > kvm_unpin_pages(kvm, pfn, unmap_pages); > -- > 1.7.4.1 Btw, how have you tested this code? Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-14 13:35 ` Roedel, Joerg @ 2011-10-14 17:03 ` Ohad Ben-Cohen 2011-10-14 17:05 ` Ohad Ben-Cohen 2011-10-17 8:05 ` Ohad Ben-Cohen 0 siblings, 2 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-14 17:03 UTC (permalink / raw) To: linux-arm-kernel On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > This patch looks good. Please include it in the page-size patch-set. Will do. > Hmm, I'd like to constify the iommu_ops structures in the future. This > wouldn't work then anymore. How about renaming it to io_page_size > (because it is the base page-size of the iommu) and let the driver set > it? Sure, that'd work. Though asking the driver to provide this feels a bit redundant, as it already provided this info in the supported page sizes. I guess a cleaner solution might be allocating a 'struct iommu' in which drivers will set a 'const struct iommu_ops *' member. That new struct would then contain the base page-size and any other state the iommu core would ever need. This whole thing is quite marginal though and also very easy to change later, so we can start with the "driver-provided io_page_size" approach for now. > This needs to be (left > 0). The drivers are allowed to unmap more than > requested, so this value may turn negative. Good point. 'left' is size_t though, so I'll fix this a bit differently. > Btw, how have you tested this code? Of course. Every patch I send is always tested on both OMAP3 (with omap3isp) and OMAP4 (with rpmsg/remoteproc). In addition, everyone who uses my wip rpmsg tree (there are several developers who do) is always testing this without even knowing, as I'm always rebasing that tree on the latest iommu patches (my wip branches are not linear). Thanks, Ohad.
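Since 'left' is unsigned, one way to keep the arithmetic safe when a driver unmaps more than requested is to count the unmapped bytes upwards instead of counting 'left' down; a sketch of the idea (the actual resubmission may differ, and 'chunk' is an illustrative name):

    size_t unmapped = 0;

    while (unmapped < size) {
            size_t chunk = domain->ops->unmap(domain, iova,
                                              size - unmapped);
            if (!chunk)
                    break;

            iova += chunk;
            unmapped += chunk;
    }

    return unmapped;

Here a driver returning more than was asked for simply makes 'unmapped' exceed 'size' and the loop exits, so no unsigned underflow can occur.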
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-14 17:03 ` Ohad Ben-Cohen @ 2011-10-14 17:05 ` Ohad Ben-Cohen 2011-10-17 8:05 ` Ohad Ben-Cohen 1 sibling, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-14 17:05 UTC (permalink / raw) To: linux-arm-kernel On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote: >> Btw, how have you tested this code? > > Of course. Every patch I send is always tested on... *oops I thought you asked "Btw, have you tested this code?". I missed the "how" part :) ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-14 17:03 ` Ohad Ben-Cohen 2011-10-14 17:05 ` Ohad Ben-Cohen @ 2011-10-17 8:05 ` Ohad Ben-Cohen 2011-10-17 9:28 ` Roedel, Joerg 1 sibling, 1 reply; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-17 8:05 UTC (permalink / raw) To: linux-arm-kernel Hi Joerg, On Fri, Oct 14, 2011 at 7:03 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote: > On Fri, Oct 14, 2011 at 3:35 PM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: >> Hmm, I'd like to constify the iommu_ops structures in the future. This >> wouldn't work then anymore. How about renaming it to io_page_size >> (because it is the base page-size of the iommu) and let the driver set >> it? > > Sure, that'd work. Though asking the driver to provide this feels a > bit redundant, as it already provided this info in the supported page > sizes. > > I guess a cleaner solution might be allocating a 'struct iommu' in > which drivers will set a 'const struct iommu_ops *' member. That new > struct would then contain the base page-size and any other state the > iommu core would ever need. > > This whole thing is quite marginal though and also very easy to change > later, so we can start with the "driver-provided io_page_size" > approach for now. Sorry, I just couldn't get myself to sign off on this as it really feels wrong to me. This min_pagesz member is just cached by the core so that it doesn't need to look the value up every time we're mapping. Drivers shouldn't care about it, as it's completely internal to the iommu core. I'm afraid that pushing this to the drivers feels like redundant duplication of code and might also confuse developers. Let me please suggest two alternatives: a) drop this min_pagesz cache completely. iommu core would then redundantly re-calculate this every time something is mapped, but I doubt there will be a measurable impact on performance. b) keep the current implementation for now, and fix this later (when we constify struct iommu_ops *) by caching min_pagesz in a dynamically allocated iommu context. Since this future "constify" patch will anyway need to change 'struct bus_type', it would be a good opportunity to do this change at the same time. I don't mind which of those approaches to take, and I also don't mind doing (b) myself later, in a separate patch. Your call. >> This needs to be (left > 0). The drivers are allowed to unmap more than >> requested, so this value may turn negative. > > Good point. 'left' is size_t though, so I'll fix this a bit differently. Fixed, please take a look: ^ permalink raw reply [flat|nested] 36+ messages in thread
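In code, option a) would amount to dropping the cached member and deriving the value at the top of iommu_map() and iommu_unmap(), roughly like this (a sketch, not the final patch):

    /* derive the minimum supported page size on every call */
    unsigned int min_pagesz = 1 << __ffs(domain->ops->pgsize_bitmap);

    if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
            pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%zx min_pagesz 0x%x\n",
                   iova, (unsigned long)paddr, size, min_pagesz);
            return -EINVAL;
    }

A single __ffs() on entry is negligible next to the page-table walk each mapping does anyway, which is why the cache buys so little.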
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-17 8:05 ` Ohad Ben-Cohen @ 2011-10-17 9:28 ` Roedel, Joerg 2011-10-17 9:55 ` Ohad Ben-Cohen 0 siblings, 1 reply; 36+ messages in thread From: Roedel, Joerg @ 2011-10-17 9:28 UTC (permalink / raw) To: linux-arm-kernel Hi Ohad, On Mon, Oct 17, 2011 at 04:05:02AM -0400, Ohad Ben-Cohen wrote: > > This whole thing is quite marginal though and also very easy to change > > later, so we can start with the "driver-provided io_page_size" > > approach for now. > > Sorry, I just couldn't get myself to sign-off this as it really feels > wrong to me. > > This min_pagesz member is just cached by the core so it doesn't need to > look it up every time we're mapping. Drivers shouldn't care about it, as it's > completely internal to the iommu core. I'm afraid that pushing this to > the drivers > feels like redundant duplication of code and might also confuse developers. > > Let me please suggest two alternatives: > a) drop this min_pagesz cache completely. iommu core would then redundantly > re-calculate this every time something is mapped, but I hardly believe there > is going to be a measurable impact on performance. > b) keep the current implementation for now, and fix this later (when we constify > struct iommu_ops *) by caching min_pagesz in a dynamically allocated iommu > context. Since this future "constify" patch will anyway need to change 'struct > bus_type', it would be a good opportunity to do this change at the same time. > > I don't mind which of those approaches to take, and I also don't mind doing (b) > myself later, in a separate patch. Your call. I think option a) is the best. It should add only minimal overhead to the iommu_map path. > > >> This needs to be (left > 0). The drivers are allowed to unmap more then > >> requested, so this value may turn negative. > > > > Good point. 'left' is size_t though, so i'll fix this a bit differently. > > Fixed, please take a look: > > >From 00b8b9373fe2d73da0280ac1e6ade4a701c95140 Mon Sep 17 00:00:00 2001 > From: Ohad Ben-Cohen <ohad@wizery.com> > Date: Mon, 10 Oct 2011 23:50:55 +0200 > Subject: [PATCH] iommu/core: split mapping to page sizes as supported > by the hardware > > When mapping a memory region, split it to page sizes as supported > by the iommu hardware. Always prefer bigger pages, when possible, > in order to reduce the TLB pressure. > > The logic to do that is now added to the IOMMU core, so neither the iommu > drivers themselves nor users of the IOMMU API have to duplicate it. > > This allows a more lenient granularity of mappings; traditionally the > IOMMU API took 'order' (of a page) as a mapping size, and directly let > the low level iommu drivers handle the mapping, but now that the IOMMU > core can split arbitrary memory regions into pages, we can remove this > limitation, so users don't have to split those regions by themselves. > > Currently the supported page sizes are advertised once and they then > remain static. That works well for OMAP and MSM but it would probably > not fly well with intel's hardware, where the page size capabilities > seem to have the potential to be different between several DMA > remapping devices. > > register_iommu() currently sets a default pgsize behavior, so we can convert > the IOMMU drivers in subsequent patches. After all the drivers > are converted, the temporary default settings will be removed. 
> > Mainline users of the IOMMU API (kvm and omap-iovmm) are adapted > to deal with bytes instead of page order. > > Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! > > Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> > Cc: David Brown <davidb@codeaurora.org> > Cc: David Woodhouse <dwmw2@infradead.org> > Cc: Joerg Roedel <Joerg.Roedel@amd.com> > Cc: Stepan Moskovchenko <stepanm@codeaurora.org> > Cc: KyongHo Cho <pullip.cho@samsung.com> > Cc: Hiroshi DOYU <hdoyu@nvidia.com> > Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> > Cc: kvm at vger.kernel.org > --- > drivers/iommu/iommu.c | 124 +++++++++++++++++++++++++++++++++++++++----- > drivers/iommu/omap-iovmm.c | 17 ++---- > include/linux/iommu.h | 24 +++++++- > virt/kvm/iommu.c | 8 ++-- > 4 files changed, 141 insertions(+), 32 deletions(-) The patch looks good now. Please implement option a) and it should be fine. I will test it on an AMD IOMMU platform. We still need someone to test it on VT-d. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-17 9:28 ` Roedel, Joerg @ 2011-10-17 9:55 ` Ohad Ben-Cohen 0 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-17 9:55 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 17, 2011 at 11:28 AM, Roedel, Joerg <Joerg.Roedel@amd.com> wrote: > The patch looks good now. Please implement option a) and it should be > fine. Ok, will send in a few moments. > I will test it on an AMD IOMMU platform. We still need someone to > test it on VT-d. That could be great, yeah. Though in the page sizes we declare, we intentionally take a very conservative approach, which should effectively make this whole logic transparent to both of the x86 implementations: The ->map() handlers of both the AMD and the VT-d drivers should be invoked in the exact way they were invoked earlier (same order, same parameters), because before this patch-set, all IOMMU users only mapped page-ordered regions. With this patch-set in hand, the x86 drivers declare support for all possible page-ordered page sizes, so the IOMMU core should just be a pass-through, and call ->map() with the exact parameters that were provided to iommu_map(). Similarly, calls to ->unmap() should not change, because current mainline x86 users (kvm..) only unmap a single page at a time. With this patch set in hand, the iommu core will just trivially pass those requests through to the driver's ->unmap(). So I guess that if we can test this on AMD and see that indeed this conservative approach works (the driver gets the same ->map() and ->unmap() calls it got before) then it should work exactly the same with VT-d. But having someone with a VT-d setup test this would be great, of course. Thanks, Ohad.
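Concretely, for an old-style request the selection degenerates to a single call. Tracing the core's arithmetic for a 64KiB mapping at 64KiB-aligned addresses (illustrative numbers, following the iommu_map() loop above):

    /* iova = paddr = 0x10000, size = 0x10000, pgsize_bitmap = ~0xFFFUL */
    addr_merge = iova | paddr;                       /* 0x10000 */
    pgsize_idx = __fls(size);                        /* 16 */
    pgsize_idx = min(pgsize_idx, __ffs(addr_merge)); /* still 16 */
    pgsize = ((1UL << 17) - 1) & ~0xFFFUL;           /* mask: 4KiB..64KiB */
    pgsize = 1UL << __fls(pgsize);                   /* 0x10000, i.e. 64KiB */
    /* a single ->map(domain, iova, paddr, 0x10000, prot), as before */

The loop body runs once, consumes the whole region, and the driver sees exactly the call it would have seen under the old order-based API.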
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 22:49 ` Ohad Ben-Cohen 2011-10-11 10:38 ` Roedel, Joerg @ 2011-10-11 14:59 ` KyongHo Cho 2011-10-11 17:04 ` Ohad Ben-Cohen 1 sibling, 1 reply; 36+ messages in thread From: KyongHo Cho @ 2011-10-11 14:59 UTC (permalink / raw) To: linux-arm-kernel On Tue, Oct 11, 2011 at 7:49 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote: > int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot) > + phys_addr_t paddr, int size, int prot) > { Even though it is not realistic that size becomes larger than 1 << ((sizeof(int) * 8) - 1), I think the type of size should be size_t. > - size_t size; > + unsigned long orig_iova = iova; > + int ret = 0, orig_size = size; > > if (unlikely(domain->ops->map == NULL)) > return -ENODEV; > > - size = PAGE_SIZE << gfp_order; > + /* > + * both the virtual address and the physical one, as well as > + * the size of the mapping, must be aligned (at least) to the > + * size of the smallest page supported by the hardware > + */ > + if (!IS_ALIGNED(iova | paddr | size, domain->ops->min_pagesz)) { > + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%x min_pagesz " > + "0x%x\n", iova, (unsigned long)paddr, > + size, domain->ops->min_pagesz); > + return -EINVAL; > + } > + > + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%x\n", iova, > + (unsigned long)paddr, size); > + > + while (size) { > + unsigned long pgsize, addr_merge = iova | paddr; > + unsigned int pgsize_idx; > + > + /* Max page size that still fits into 'size' */ > + pgsize_idx = __fls(size); > + > + /* need to consider alignment requirements ? */ > + if (likely(addr_merge)) { > + /* Max page size allowed by both iova and paddr */ > + unsigned int align_pgsize_idx = __ffs(addr_merge); > + > + pgsize_idx = min(pgsize_idx, align_pgsize_idx); > + } > + > + /* build a mask of acceptable page sizes */ > + pgsize = (1UL << (pgsize_idx + 1)) - 1; > + > + /* throw away page sizes not supported by the hardware */ > + pgsize &= domain->ops->pgsize_bitmap; > + > + /* make sure we're still sane */ > + BUG_ON(!pgsize); > > - BUG_ON(!IS_ALIGNED(iova | paddr, size)); > + /* pick the biggest page */ > + pgsize_idx = __fls(pgsize); > + pgsize = 1UL << pgsize_idx; > > - return domain->ops->map(domain, iova, paddr, gfp_order, prot); > + /* convert index to page order */ > + pgsize_idx -= PAGE_SHIFT; > + > + pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova, > + (unsigned long)paddr, pgsize_idx); > + > + ret = domain->ops->map(domain, iova, paddr, pgsize_idx, prot); > + if (ret) > + break; > + > + iova += pgsize; > + paddr += pgsize; > + size -= pgsize; > + } > + > + /* unroll mapping in case something went wrong */ > + if (ret) > + iommu_unmap(domain, orig_iova, orig_size); > + domain->ops->map() might return an error because a mapping already exists in the range from iova to iova + size. Thus, it should be iommu_unmap(domain, orig_iova, orig_size - size) Regards, KyongHo ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-11 14:59 ` KyongHo Cho @ 2011-10-11 17:04 ` Ohad Ben-Cohen 0 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-11 17:04 UTC (permalink / raw) To: linux-arm-kernel Hi KyongHo, On Tue, Oct 11, 2011 at 4:59 PM, KyongHo Cho <pullip.cho@samsung.com> wrote: > I think the type of size should be size_t. .. > Thus, it should be > iommu_unmap(domain, orig_iova, orig_size - size) Agree with both suggestions, thanks! P.S. - please only keep the parts of my email which you are replying to; otherwise it's a bit hard to find your comments. Thanks, Ohad. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-02 15:58 ` Ohad Ben-Cohen 2011-10-10 7:40 ` Ohad Ben-Cohen 2011-10-10 9:47 ` Roedel, Joerg @ 2011-10-10 12:52 ` KyongHo Cho 2011-10-10 14:00 ` Ohad Ben-Cohen 2 siblings, 1 reply; 36+ messages in thread From: KyongHo Cho @ 2011-10-10 12:52 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 3, 2011 at 12:58 AM, Ohad Ben-Cohen <ohad@wizery.com> wrote: > int iommu_map(struct iommu_domain *domain, unsigned long iova, > - phys_addr_t paddr, int gfp_order, int prot) > + phys_addr_t paddr, size_t size, int prot) > { > - size_t size; > + int ret = 0; > + > + /* > + * both the virtual address and the physical one, as well as > + * the size of the mapping, must be aligned (at least) to the > + * size of the smallest page supported by the hardware > + */ > + if (!IS_ALIGNED(iova | paddr | size, iommu_min_pagesz)) { > + pr_err("unaligned: iova 0x%lx pa 0x%lx size 0x%lx min_pagesz " > + "0x%x\n", iova, (unsigned long)paddr, > + (unsigned long)size, iommu_min_pagesz); > + return -EINVAL; > + } > > - size = 0x1000UL << gfp_order; > + pr_debug("map: iova 0x%lx pa 0x%lx size 0x%lx\n", iova, > + (unsigned long)paddr, (unsigned long)size); > > - BUG_ON(!IS_ALIGNED(iova | paddr, size)); > + while (size) { > + unsigned long pgsize, addr_merge = iova | paddr; > + unsigned int pgsize_idx; > > - return iommu_ops->map(domain, iova, paddr, gfp_order, prot); > + /* Max page size that still fits into 'size' */ > + pgsize_idx = __fls(size); > + > + /* need to consider alignment requirements ? */ > + if (likely(addr_merge)) { > + /* Max page size allowed by both iova and paddr */ > + unsigned int align_pgsize_idx = __ffs(addr_merge); > + > + pgsize_idx = min(pgsize_idx, align_pgsize_idx); > + } > + > + /* build a mask of acceptable page sizes */ > + pgsize = (1UL << (pgsize_idx + 1)) - 1; > + > + /* throw away page sizes not supported by the hardware */ > + pgsize &= iommu_pgsize_bitmap; > + > + /* pick the biggest page */ > + pgsize_idx = __fls(pgsize); > + pgsize = 1UL << pgsize_idx; > + > + /* convert index to page order */ > + pgsize_idx -= PAGE_SHIFT; > + > + pr_debug("mapping: iova 0x%lx pa 0x%lx order %u\n", iova, > + (unsigned long)paddr, pgsize_idx); > + > + ret = iommu_ops->map(domain, iova, paddr, pgsize_idx, prot); > + if (ret) > + break; > + > + iova += pgsize; > + paddr += pgsize; > + size -= pgsize; > + } > + > + return ret; > } > EXPORT_SYMBOL_GPL(iommu_map); Don't we need to unmap all intermediate mappings if iommu_map() fails? I think iommu_map() must map the entire given area if it succeeds. Otherwise, it should map nothing. I think it can be simply done by calling iommu_unmap() with Joerg's previous suggestion about iommu_unmap(). Regards, KyongHo. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 1/6] iommu/core: split mapping to page sizes as supported by the hardware 2011-10-10 12:52 ` KyongHo Cho @ 2011-10-10 14:00 ` Ohad Ben-Cohen 0 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-10-10 14:00 UTC (permalink / raw) To: linux-arm-kernel On Mon, Oct 10, 2011 at 2:52 PM, KyongHo Cho <pullip.cho@samsung.com> wrote: > Do not we need to unmap all intermediate mappings if iommu_map() is failed? Good idea, I'll add it. Thanks! Ohad. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 2/6] iommu/omap: announce supported page sizes 2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen @ 2011-09-16 17:51 ` Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 3/6] iommu/msm: " Ohad Ben-Cohen ` (4 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw) To: linux-arm-kernel Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes. This way the IOMMU core can split any arbitrary-sized physically contiguous regions (that it needs to map) as needed. Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: Hiroshi DOYU <Hiroshi.DOYU@nokia.com> --- drivers/iommu/omap-iommu.c | 6 +++++- 1 files changed, 5 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c index 7e0188f..403dd6a 100644 --- a/drivers/iommu/omap-iommu.c +++ b/drivers/iommu/omap-iommu.c @@ -1202,6 +1202,9 @@ static int omap_iommu_domain_has_cap(struct iommu_domain *domain, return 0; } +/* bitmap of the page sizes supported by the OMAP IOMMU hardware */ +static unsigned long omap_iommu_pgsizes = SZ_4K | SZ_64K | SZ_1M | SZ_16M; + static struct iommu_ops omap_iommu_ops = { .domain_init = omap_iommu_domain_init, .domain_destroy = omap_iommu_domain_destroy, @@ -1225,7 +1228,8 @@ static int __init omap_iommu_init(void) return -ENOMEM; iopte_cachep = p; - register_iommu(&omap_iommu_ops); + /* we're only using the first 25 bits of the pgsizes bitmap */ + register_iommu_pgsize(&omap_iommu_ops, &omap_iommu_pgsizes, 25); return platform_driver_register(&omap_iommu_driver); } -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
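The "first 25 bits" comment follows directly from the largest advertised size: SZ_16M is 1 << 24, so bits 0 through 24 cover the whole bitmap. A quick standalone check (hypothetical demo code):

    #include <stdio.h>

    int main(void)
    {
            /* SZ_4K = 1 << 12, SZ_64K = 1 << 16, SZ_1M = 1 << 20, SZ_16M = 1 << 24 */
            unsigned long pgsizes = (1UL << 12) | (1UL << 16) |
                                    (1UL << 20) | (1UL << 24);
            unsigned int bits = 0;

            while (pgsizes >> bits)     /* count up to the highest set bit */
                    bits++;

            printf("%u bits needed\n", bits);   /* prints 25 */
            return 0;
    }

Passing 25 as nr_page_bits therefore describes exactly the sizes the OMAP IOMMU hardware supports, nothing more.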
* [PATCH v3 3/6] iommu/msm: announce supported page sizes 2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 2/6] iommu/omap: announce supported page sizes Ohad Ben-Cohen @ 2011-09-16 17:51 ` Ohad Ben-Cohen 2011-09-25 5:03 ` David Brown 2011-09-16 17:51 ` [PATCH v3 4/6] iommu/amd: " Ohad Ben-Cohen ` (3 subsequent siblings) 6 siblings, 1 reply; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw) To: linux-arm-kernel Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes. This way the IOMMU core can split any arbitrary-sized physically contiguous regions (that it needs to map) as needed. Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> --- drivers/iommu/msm_iommu.c | 8 +++++++- 1 files changed, 7 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c index d1733f6..a4ed116 100644 --- a/drivers/iommu/msm_iommu.c +++ b/drivers/iommu/msm_iommu.c @@ -676,6 +676,9 @@ fail: return 0; } +/* bitmap of the page sizes currently supported */ +static unsigned long msm_iommu_pgsizes = SZ_4K | SZ_64K | SZ_1M | SZ_16M; + static struct iommu_ops msm_iommu_ops = { .domain_init = msm_iommu_domain_init, .domain_destroy = msm_iommu_domain_destroy, @@ -728,7 +731,10 @@ static void __init setup_iommu_tex_classes(void) static int __init msm_iommu_init(void) { setup_iommu_tex_classes(); - register_iommu(&msm_iommu_ops); + + /* we're only using the first 25 bits of the pgsizes bitmap */ + register_iommu_pgsize(&msm_iommu_ops, &msm_iommu_pgsizes, 25); + return 0; } -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 3/6] iommu/msm: announce supported page sizes 2011-09-16 17:51 ` [PATCH v3 3/6] iommu/msm: " Ohad Ben-Cohen @ 2011-09-25 5:03 ` David Brown 0 siblings, 0 replies; 36+ messages in thread From: David Brown @ 2011-09-25 5:03 UTC (permalink / raw) To: linux-arm-kernel On Fri, Sep 16, 2011 at 08:51:43PM +0300, Ohad Ben-Cohen wrote: > Let the IOMMU core know we support 4KiB, 64KiB, 1MiB and 16MiB page sizes. > > This way the IOMMU core can split any arbitrary-sized physically > contiguous regions (that it needs to map) as needed. > > Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> > Cc: David Brown <davidb@codeaurora.org> > Cc: Stepan Moskovchenko <stepanm@codeaurora.org> > --- > drivers/iommu/msm_iommu.c | 8 +++++++- Acked-by: David Brown <davidb@codeaurora.org> -- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum. ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 4/6] iommu/amd: announce supported page sizes 2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen ` (2 preceding siblings ...) 2011-09-16 17:51 ` [PATCH v3 3/6] iommu/msm: " Ohad Ben-Cohen @ 2011-09-16 17:51 ` Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 5/6] iommu/intel: " Ohad Ben-Cohen ` (2 subsequent siblings) 6 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw) To: linux-arm-kernel Let the IOMMU core know we support arbitrary page sizes (as long as they're an order of 4KiB). This way the IOMMU core will retain the existing behavior we're used to; it will let us map regions that: - their size is an order of 4KiB - they are naturally aligned Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: Joerg Roedel <Joerg.Roedel@amd.com> --- drivers/iommu/amd_iommu.c | 21 ++++++++++++++++++++- 1 files changed, 20 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index a14f8dc..80191d2 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -2488,12 +2488,31 @@ static unsigned device_dma_ops_init(void) } /* + * This bitmap is used to advertise the page sizes our hardware support + * to the IOMMU core, which will then use this information to split + * physically contiguous memory regions it is mapping into page sizes + * that we support. + * + * Traditionally the IOMMU core just handed us the mappings directly, + * after making sure the size is an order of a 4KiB page and that the + * mapping has natural alignment. + * + * To retain this behavior, we currently advertise that we support + * all page sizes that are an order of 4KiB. + * + * If at some point we'd like to utilize the IOMMU core's new behavior, + * we could change this to advertise the real page sizes we support. + */ +static unsigned long amd_iommu_pgsizes = ~0xFFFUL; + +/* * The function which clues the AMD IOMMU driver into dma_ops. */ void __init amd_iommu_init_api(void) { - register_iommu(&amd_iommu_ops); + register_iommu_pgsize(&amd_iommu_ops, &amd_iommu_pgsizes, + BITS_PER_LONG); } int __init amd_iommu_init_dma_ops(void) -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
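~0xFFFUL sets every bit from 12 upwards, i.e. it advertises every power-of-two size from 4KiB up. A quick standalone illustration (hypothetical demo code):

    #include <stdio.h>

    int main(void)
    {
            unsigned long pgsizes = ~0xFFFUL;
            unsigned int i;

            /* bits 0..11 are clear: nothing smaller than 4KiB is offered */
            printf("bit 11 set? %d\n", !!(pgsizes & (1UL << 11)));  /* 0 */

            /* every bit from 12 up is set: 4KiB, 8KiB, 16KiB, 32KiB, ... */
            for (i = 12; i < 16; i++)
                    printf("bit %u -> 0x%lx bytes\n", i, 1UL << i);
            return 0;
    }

Because the old API only ever submitted naturally aligned, power-of-two-order regions, advertising all of these sizes makes the core's splitting a no-op for this driver.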
* [PATCH v3 5/6] iommu/intel: announce supported page sizes 2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen ` (3 preceding siblings ...) 2011-09-16 17:51 ` [PATCH v3 4/6] iommu/amd: " Ohad Ben-Cohen @ 2011-09-16 17:51 ` Ohad Ben-Cohen 2011-09-16 17:51 ` [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API Ohad Ben-Cohen 2011-09-27 8:56 ` [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen 6 siblings, 0 replies; 36+ messages in thread From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw) To: linux-arm-kernel Let the IOMMU core know we support arbitrary page sizes (as long as they're an order of 4KiB). This way the IOMMU core will retain the existing behavior we're used to; it will let us map regions that: - their size is an order of 4KiB - they are naturally aligned Note: Intel IOMMU hardware doesn't support arbitrary page sizes, but the driver does (it splits arbitrary-sized mappings into the pages supported by the hardware). To make everything simpler for now, though, this patch effectively tells the IOMMU core to keep giving this driver the same memory regions it did before, so nothing is changed as far as it's concerned. At this point, the page sizes announced remain static within the IOMMU core. To correctly utilize the pgsize-splitting of the IOMMU core by this driver, it seems that some core changes should still be done, because Intel's IOMMU page size capabilities seem to have the potential to be different between different DMA remapping devices. Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com> Cc: David Woodhouse <dwmw2@infradead.org> --- drivers/iommu/intel-iommu.c | 21 ++++++++++++++++++++- 1 files changed, 20 insertions(+), 1 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index c621c98..46c21a2 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -3426,6 +3426,24 @@ static struct notifier_block device_nb = { .notifier_call = device_notifier, }; +/* + * This bitmap is used to advertise the page sizes our hardware support + * to the IOMMU core, which will then use this information to split + * physically contiguous memory regions it is mapping into page sizes + * that we support. + * + * Traditionally the IOMMU core just handed us the mappings directly, + * after making sure the size is an order of a 4KiB page and that the + * mapping has natural alignment. + * + * To retain this behavior, we currently advertise that we support + * all page sizes that are an order of 4KiB. + * + * If at some point we'd like to utilize the IOMMU core's new behavior, + * we could change this to advertise the real page sizes we support. + */ +static unsigned long intel_iommu_pgsizes = ~0xFFFUL; + int __init intel_iommu_init(void) { int ret = 0; @@ -3486,7 +3504,8 @@ int __init intel_iommu_init(void) init_iommu_pm_ops(); - register_iommu(&intel_iommu_ops); + register_iommu_pgsize(&intel_iommu_ops, &intel_iommu_pgsizes, + BITS_PER_LONG); bus_register_notifier(&pci_bus_type, &device_nb); -- 1.7.4.1 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API
  2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
                   ` (4 preceding siblings ...)
  2011-09-16 17:51 ` [PATCH v3 5/6] iommu/intel: " Ohad Ben-Cohen
@ 2011-09-16 17:51 ` Ohad Ben-Cohen
  2011-09-27  8:56 ` [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
  6 siblings, 0 replies; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-16 17:51 UTC (permalink / raw)
To: linux-arm-kernel

Now that all IOMMU drivers are converted to the new
register_iommu_pgsize() API, the old code can be removed, and we can
s/register_iommu_pgsize/register_iommu/.

Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: David Brown <davidb@codeaurora.org>
Cc: Stepan Moskovchenko <stepanm@codeaurora.org>
---
 drivers/iommu/amd_iommu.c   |    3 +--
 drivers/iommu/intel-iommu.c |    3 +--
 drivers/iommu/iommu.c       |   34 +---------------------------------
 drivers/iommu/msm_iommu.c   |    2 +-
 drivers/iommu/omap-iommu.c  |    2 +-
 include/linux/iommu.h       |    5 ++---
 6 files changed, 7 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 80191d2..cf29645 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2511,8 +2511,7 @@ static unsigned long amd_iommu_pgsizes = ~0xFFFUL;
 
 void __init amd_iommu_init_api(void)
 {
-	register_iommu_pgsize(&amd_iommu_ops, &amd_iommu_pgsizes,
-							BITS_PER_LONG);
+	register_iommu(&amd_iommu_ops, &amd_iommu_pgsizes, BITS_PER_LONG);
 }
 
 int __init amd_iommu_init_dma_ops(void)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 46c21a2..c013b85 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3504,8 +3504,7 @@ int __init intel_iommu_init(void)
 
 	init_iommu_pm_ops();
 
-	register_iommu_pgsize(&intel_iommu_ops, &intel_iommu_pgsizes,
-							BITS_PER_LONG);
+	register_iommu(&intel_iommu_ops, &intel_iommu_pgsizes, BITS_PER_LONG);
 
 	bus_register_notifier(&pci_bus_type, &device_nb);
 
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7c01c8c..8bbd1aa 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -46,13 +46,8 @@ static unsigned int iommu_min_page_idx;
  * @ops: iommu handlers
  * @pgsize_bitmap: bitmap of page sizes supported by the hardware
  * @nr_page_bits: size of @pgsize_bitmap (in bits)
- *
- * Note: this is a temporary function, which will be removed once
- * all IOMMU drivers are converted. The only reason it exists is to
- * allow splitting the pgsizes changes to several patches in order to ease
- * the review.
  */
-void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
+void register_iommu(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
 		unsigned int nr_page_bits)
 {
 	if (iommu_ops || iommu_pgsize_bitmap || !nr_page_bits)
@@ -67,33 +62,6 @@ void register_iommu_pgsize(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
 	iommu_min_pagesz = 1 << iommu_min_page_idx;
 }
 
-/*
- * default pagesize bitmap, will be removed once all IOMMU drivers
- * are converted
- */
-static unsigned long default_iommu_pgsizes = ~0xFFFUL;
-
-void register_iommu(struct iommu_ops *ops)
-{
-	if (iommu_ops)
-		BUG();
-
-	iommu_ops = ops;
-
-	/*
-	 * set default pgsize values, which retain the existing
-	 * IOMMU API behavior: drivers will be called to map
-	 * regions that are sized/aligned to order of 4KiB pages
-	 */
-	iommu_pgsize_bitmap = &default_iommu_pgsizes;
-	iommu_nr_page_bits = BITS_PER_LONG;
-
-	/* find the minimum page size and its index only once */
-	iommu_min_page_idx = find_first_bit(iommu_pgsize_bitmap,
-						iommu_nr_page_bits);
-	iommu_min_pagesz = 1 << iommu_min_page_idx;
-}
-
 bool iommu_found(void)
 {
 	return iommu_ops != NULL;
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index a4ed116..e59ced9 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -733,7 +733,7 @@ static int __init msm_iommu_init(void)
 	setup_iommu_tex_classes();
 
 	/* we're only using the first 25 bits of the pgsizes bitmap */
-	register_iommu_pgsize(&msm_iommu_ops, &msm_iommu_pgsizes, 25);
+	register_iommu(&msm_iommu_ops, &msm_iommu_pgsizes, 25);
 
 	return 0;
 }
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 403dd6a..3d8ad87 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1229,7 +1229,7 @@ static int __init omap_iommu_init(void)
 	iopte_cachep = p;
 
 	/* we're only using the first 25 bits of the pgsizes bitmap */
-	register_iommu_pgsize(&omap_iommu_ops, &omap_iommu_pgsizes, 25);
+	register_iommu(&omap_iommu_ops, &omap_iommu_pgsizes, 25);
 
 	return platform_driver_register(&omap_iommu_driver);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 1806956..297893f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -60,9 +60,8 @@ struct iommu_ops {
 
 #ifdef CONFIG_IOMMU_API
 
-extern void register_iommu(struct iommu_ops *ops);
-extern void register_iommu_pgsize(struct iommu_ops *ops,
-		unsigned long *pgsize_bitmap, unsigned int nr_page_bits);
+extern void register_iommu(struct iommu_ops *ops, unsigned long *pgsize_bitmap,
+		unsigned int nr_page_bits);
 extern bool iommu_found(void);
 extern struct iommu_domain *iommu_domain_alloc(void);
 extern void iommu_domain_free(struct iommu_domain *domain);
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread
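With the temporary API gone, the registration contract is: bit k set
in the bitmap means the hardware can map pages of 2^k bytes, and
nr_page_bits says how many bits of the bitmap are meaningful. For
reference, a minimal sketch of what a converted driver's init path
looks like is shown below; the foo_* names are illustrative only and
not from the series (the real conversions are the msm/omap/amd/intel
hunks above).

#include <linux/init.h>
#include <linux/iommu.h>

/* bits 12, 16, 20 and 24: 4KiB, 64KiB, 1MiB and 16MiB pages */
static unsigned long foo_iommu_pgsizes = (1UL << 12) | (1UL << 16) |
					 (1UL << 20) | (1UL << 24);

static struct iommu_ops foo_iommu_ops = {
	/* .domain_init, .map, .unmap, ... as before */
};

static int __init foo_iommu_init(void)
{
	/* only the first 25 bits of the bitmap are used (up to 16MiB) */
	register_iommu(&foo_iommu_ops, &foo_iommu_pgsizes, 25);
	return 0;
}
subsys_initcall(foo_iommu_init);

Note that register_iommu() deliberately takes a pointer to the bitmap,
so a driver could in principle compute it at runtime from the hardware
capabilities instead of hardcoding it as above.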
* [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware
  2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
                   ` (5 preceding siblings ...)
  2011-09-16 17:51 ` [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API Ohad Ben-Cohen
@ 2011-09-27  8:56 ` Ohad Ben-Cohen
  6 siblings, 0 replies; 36+ messages in thread
From: Ohad Ben-Cohen @ 2011-09-27  8:56 UTC (permalink / raw)
To: linux-arm-kernel

ping

On Fri, Sep 16, 2011 at 8:51 PM, Ohad Ben-Cohen <ohad@wizery.com> wrote:
> v2->v3:
>
> - s/KB/KiB/ (David W)
>
> v1->v2:
>
> - split to patches (by keeping the old code around until all drivers are converted) (Joerg)
>
>
> Ohad Ben-Cohen (6):
>  iommu/core: split mapping to page sizes as supported by the hardware
>  iommu/omap: announce supported page sizes
>  iommu/msm: announce supported page sizes
>  iommu/amd: announce supported page sizes
>  iommu/intel: announce supported page sizes
>  iommu/core: remove the temporary register_iommu_pgsize API
>
>  drivers/iommu/amd_iommu.c   |   20 ++++++-
>  drivers/iommu/intel-iommu.c |   20 ++++++-
>  drivers/iommu/iommu.c       |  130 +++++++++++++++++++++++++++++++++++++++----
>  drivers/iommu/msm_iommu.c   |    8 ++-
>  drivers/iommu/omap-iommu.c  |    6 ++-
>  drivers/iommu/omap-iovmm.c  |   12 +---
>  include/linux/iommu.h       |    7 +-
>  virt/kvm/iommu.c            |    4 +-
>  8 files changed, 177 insertions(+), 30 deletions(-)
>
> --
> 1.7.4.1
>

^ permalink raw reply	[flat|nested] 36+ messages in thread
end of thread, other threads:[~2011-10-17  9:55 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-09-16 17:51 [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 1/6] iommu/core: " Ohad Ben-Cohen
2011-09-27 10:05   ` Roedel, Joerg
2011-09-27 12:26     ` Ohad Ben-Cohen
2011-09-27 13:12       ` Roedel, Joerg
2011-09-27 13:28         ` Ohad Ben-Cohen
2011-09-27 18:14           ` Roedel, Joerg
2011-10-02 15:58             ` Ohad Ben-Cohen
2011-10-10  7:40               ` Ohad Ben-Cohen
2011-10-10  9:47                 ` Roedel, Joerg
2011-10-10 13:59                   ` Ohad Ben-Cohen
2011-10-10 15:36                     ` Roedel, Joerg
2011-10-10 17:02                       ` Ohad Ben-Cohen
2011-10-10 22:01                         ` Ohad Ben-Cohen
2011-10-11 10:21                           ` Roedel, Joerg
2011-10-11 10:19                         ` Roedel, Joerg
2011-10-10 22:49                       ` Ohad Ben-Cohen
2011-10-11 10:38                         ` Roedel, Joerg
2011-10-11 17:01                           ` Ohad Ben-Cohen
2011-10-14 13:35                             ` Roedel, Joerg
2011-10-14 17:03                               ` Ohad Ben-Cohen
2011-10-14 17:05                                 ` Ohad Ben-Cohen
2011-10-17  8:05                                   ` Ohad Ben-Cohen
2011-10-17  9:28                                     ` Roedel, Joerg
2011-10-17  9:55                                       ` Ohad Ben-Cohen
2011-10-11 14:59                       ` KyongHo Cho
2011-10-11 17:04                         ` Ohad Ben-Cohen
2011-10-10 12:52                 ` KyongHo Cho
2011-10-10 14:00                   ` Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 2/6] iommu/omap: announce supported page sizes Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 3/6] iommu/msm: " Ohad Ben-Cohen
2011-09-25  5:03   ` David Brown
2011-09-16 17:51 ` [PATCH v3 4/6] iommu/amd: " Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 5/6] iommu/intel: " Ohad Ben-Cohen
2011-09-16 17:51 ` [PATCH v3 6/6] iommu/core: remove the temporary register_iommu_pgsize API Ohad Ben-Cohen
2011-09-27  8:56 ` [PATCH v3 0/6] iommu: split mapping to page sizes as supported by the hardware Ohad Ben-Cohen