* [PATCH 0/3] x86: restore old GART alloc_coherent
@ 2008-09-24 11:48 FUJITA Tomonori
2008-09-24 11:48 ` [PATCH 1/3] x86: export pci-nommu's alloc_coherent FUJITA Tomonori
` (2 more replies)
0 siblings, 3 replies; 11+ messages in thread
From: FUJITA Tomonori @ 2008-09-24 11:48 UTC (permalink / raw)
To: mingo; +Cc: linux-kernel, andi, joerg.roedel, fujita.tomonori
This pachset is against tip/iommu.
What this patchset does is restoring old GART alloc_coherent behavior
(before the alloc_coherent rewrite):
http://lkml.org/lkml/2008/8/12/200
Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32
for a device having dma_masks > 24bit < 32bits. If GART gets an
address that a device can't access to, GART tries to map the address
to a virtual I/O address that the device can access to.
But Andi pointed out, "The GART is somewhere in the 4GB range so you
cannot use it to map anything < 4GB. Also GART is pretty small."
http://lkml.org/lkml/2008/9/12/43
So it's possible that GART doesn't have virtual I/O address space that
a device can access to. The current behavior might not work for a
device having dma_masks > 24bit < 32bits. This patchset restores old
GART alloc_coherent behavior, which doesn't use GART hardware (if an
user doesn't enable force_iommu option).
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 1/3] x86: export pci-nommu's alloc_coherent 2008-09-24 11:48 [PATCH 0/3] x86: restore old GART alloc_coherent FUJITA Tomonori @ 2008-09-24 11:48 ` FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 2/3] revert "x86: make GART to respect device's dma_mask about virtual mappings" FUJITA Tomonori 2008-09-24 12:58 ` [PATCH 0/3] x86: restore old GART alloc_coherent Joerg Roedel 2008-09-25 9:04 ` Ingo Molnar 2 siblings, 1 reply; 11+ messages in thread From: FUJITA Tomonori @ 2008-09-24 11:48 UTC (permalink / raw) To: mingo; +Cc: linux-kernel, andi, joerg.roedel, fujita.tomonori This patch exports nommu_alloc_coherent (renamed dma_generic_alloc_coherent). GART needs this function. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- arch/x86/kernel/pci-dma.c | 31 +++++++++++++++++++++++++++++++ arch/x86/kernel/pci-nommu.c | 39 +-------------------------------------- include/asm-x86/dma-mapping.h | 3 +++ 3 files changed, 35 insertions(+), 38 deletions(-) diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index 23882c4..0a3824e 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -134,6 +134,37 @@ unsigned long iommu_num_pages(unsigned long addr, unsigned long len) EXPORT_SYMBOL(iommu_num_pages); #endif +void *dma_generic_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag) +{ + unsigned long dma_mask; + struct page *page; + dma_addr_t addr; + + dma_mask = dma_alloc_coherent_mask(dev, flag); + + flag |= __GFP_ZERO; +again: + page = alloc_pages_node(dev_to_node(dev), flag, get_order(size)); + if (!page) + return NULL; + + addr = page_to_phys(page); + if (!is_buffer_dma_capable(dma_mask, addr, size)) { + __free_pages(page, get_order(size)); + + if (dma_mask < DMA_32BIT_MASK && !(flag & GFP_DMA)) { + flag = (flag & ~GFP_DMA32) | GFP_DMA; + goto again; + } + + return NULL; + } + + *dma_addr = addr; + return page_address(page); +} + /* * See <Documentation/x86_64/boot-options.txt> for the iommu kernel parameter * documentation. diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c index 1c1c98a..c70ab5a 100644 --- a/arch/x86/kernel/pci-nommu.c +++ b/arch/x86/kernel/pci-nommu.c @@ -72,43 +72,6 @@ static int nommu_map_sg(struct device *hwdev, struct scatterlist *sg, return nents; } -static void * -nommu_alloc_coherent(struct device *hwdev, size_t size, - dma_addr_t *dma_addr, gfp_t gfp) -{ - unsigned long dma_mask; - int node; - struct page *page; - dma_addr_t addr; - - dma_mask = dma_alloc_coherent_mask(hwdev, gfp); - - gfp |= __GFP_ZERO; - - node = dev_to_node(hwdev); -again: - page = alloc_pages_node(node, gfp, get_order(size)); - if (!page) - return NULL; - - addr = page_to_phys(page); - if (!is_buffer_dma_capable(dma_mask, addr, size) && !(gfp & GFP_DMA)) { - free_pages((unsigned long)page_address(page), get_order(size)); - gfp |= GFP_DMA; - goto again; - } - - if (check_addr("alloc_coherent", hwdev, addr, size)) { - *dma_addr = addr; - flush_write_buffers(); - return page_address(page); - } - - free_pages((unsigned long)page_address(page), get_order(size)); - - return NULL; -} - static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr, dma_addr_t dma_addr) { @@ -116,7 +79,7 @@ static void nommu_free_coherent(struct device *dev, size_t size, void *vaddr, } struct dma_mapping_ops nommu_dma_ops = { - .alloc_coherent = nommu_alloc_coherent, + .alloc_coherent = dma_generic_alloc_coherent, .free_coherent = nommu_free_coherent, .map_single = nommu_map_single, .map_sg = nommu_map_sg, diff --git a/include/asm-x86/dma-mapping.h b/include/asm-x86/dma-mapping.h index 7b0fdd6..7b9f308 100644 --- a/include/asm-x86/dma-mapping.h +++ b/include/asm-x86/dma-mapping.h @@ -90,6 +90,9 @@ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) extern int dma_supported(struct device *hwdev, u64 mask); extern int dma_set_mask(struct device *dev, u64 mask); +extern void *dma_generic_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_addr, gfp_t flag); + static inline dma_addr_t dma_map_single(struct device *hwdev, void *ptr, size_t size, int direction) -- 1.5.4.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 2/3] revert "x86: make GART to respect device's dma_mask about virtual mappings" 2008-09-24 11:48 ` [PATCH 1/3] x86: export pci-nommu's alloc_coherent FUJITA Tomonori @ 2008-09-24 11:48 ` FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 3/3] x86: restore old GART alloc_coherent behavior FUJITA Tomonori 0 siblings, 1 reply; 11+ messages in thread From: FUJITA Tomonori @ 2008-09-24 11:48 UTC (permalink / raw) To: mingo; +Cc: linux-kernel, andi, joerg.roedel, fujita.tomonori This reverts: commit bee44f294efd8417f5e68553778a6cc957af1547 Author: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Date: Fri Sep 12 19:42:35 2008 +0900 x86: make GART to respect device's dma_mask about virtual mappings I wrote the above commit to fix a GART alloc_coherent regression, that can't handle a device having dma_masks > 24bit < 32bits, introduced by the alloc_coherent rewrite: http://lkml.org/lkml/2008/8/12/200 After the alloc_coherent rewrite, GART alloc_coherent tried to allocate pages with GFP_DMA32. If GART got an address that a device can't access to, GART mapped the address to a virtual I/O address. But GART mapping mechanism didn't take account of dma mask, so GART could use a virtual I/O address that the device can't access to again. The above commit modified GART mapping mechanism to take care of dma mask. But Andi pointed out, "The GART is somewhere in the 4GB range so you cannot use it to map anything < 4GB. Also GART is pretty small." http://lkml.org/lkml/2008/9/12/43 That means it's possible that GART doesn't have virtual I/O address space that a device can access to. The above commit (to modify GART mapping mechanism to take care of dma mask) can't fix the regression reliably so let's avoid making GART more complicated. We need a solution that always works for dma_masks > 24bit < 32bits. That's how GART worked before the alloc_coherent rewrite. Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- arch/x86/kernel/pci-gart_64.c | 39 +++++++++++---------------------------- 1 files changed, 11 insertions(+), 28 deletions(-) diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c index 9e390f1..7e08e46 100644 --- a/arch/x86/kernel/pci-gart_64.c +++ b/arch/x86/kernel/pci-gart_64.c @@ -83,34 +83,23 @@ static unsigned long next_bit; /* protected by iommu_bitmap_lock */ static int need_flush; /* global flush state. set for each gart wrap */ static unsigned long alloc_iommu(struct device *dev, int size, - unsigned long align_mask, u64 dma_mask) + unsigned long align_mask) { unsigned long offset, flags; unsigned long boundary_size; unsigned long base_index; - unsigned long limit; base_index = ALIGN(iommu_bus_base & dma_get_seg_boundary(dev), PAGE_SIZE) >> PAGE_SHIFT; boundary_size = ALIGN((unsigned long long)dma_get_seg_boundary(dev) + 1, PAGE_SIZE) >> PAGE_SHIFT; - limit = iommu_device_max_index(iommu_pages, - DIV_ROUND_UP(iommu_bus_base, PAGE_SIZE), - dma_mask >> PAGE_SHIFT); - spin_lock_irqsave(&iommu_bitmap_lock, flags); - - if (limit <= next_bit) { - need_flush = 1; - next_bit = 0; - } - - offset = iommu_area_alloc(iommu_gart_bitmap, limit, next_bit, + offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, next_bit, size, base_index, boundary_size, align_mask); - if (offset == -1 && next_bit) { + if (offset == -1) { need_flush = 1; - offset = iommu_area_alloc(iommu_gart_bitmap, limit, 0, + offset = iommu_area_alloc(iommu_gart_bitmap, iommu_pages, 0, size, base_index, boundary_size, align_mask); } @@ -239,14 +228,12 @@ nonforced_iommu(struct device *dev, unsigned long addr, size_t size) * Caller needs to check if the iommu is needed and flush. */ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem, - size_t size, int dir, unsigned long align_mask, - u64 dma_mask) + size_t size, int dir, unsigned long align_mask) { unsigned long npages = iommu_num_pages(phys_mem, size); - unsigned long iommu_page; + unsigned long iommu_page = alloc_iommu(dev, npages, align_mask); int i; - iommu_page = alloc_iommu(dev, npages, align_mask, dma_mask); if (iommu_page == -1) { if (!nonforced_iommu(dev, phys_mem, size)) return phys_mem; @@ -276,7 +263,7 @@ gart_map_single(struct device *dev, phys_addr_t paddr, size_t size, int dir) if (!need_iommu(dev, paddr, size)) return paddr; - bus = dma_map_area(dev, paddr, size, dir, 0, dma_get_mask(dev)); + bus = dma_map_area(dev, paddr, size, dir, 0); flush_gart(); return bus; @@ -327,7 +314,6 @@ static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg, { struct scatterlist *s; int i; - u64 dma_mask = dma_get_mask(dev); #ifdef CONFIG_IOMMU_DEBUG printk(KERN_DEBUG "dma_map_sg overflow\n"); @@ -337,8 +323,7 @@ static int dma_map_sg_nonforce(struct device *dev, struct scatterlist *sg, unsigned long addr = sg_phys(s); if (nonforced_iommu(dev, addr, s->length)) { - addr = dma_map_area(dev, addr, s->length, dir, 0, - dma_mask); + addr = dma_map_area(dev, addr, s->length, dir, 0); if (addr == bad_dma_address) { if (i > 0) gart_unmap_sg(dev, sg, i, dir); @@ -360,16 +345,14 @@ static int __dma_map_cont(struct device *dev, struct scatterlist *start, int nelems, struct scatterlist *sout, unsigned long pages) { - unsigned long iommu_start; - unsigned long iommu_page; + unsigned long iommu_start = alloc_iommu(dev, pages, 0); + unsigned long iommu_page = iommu_start; struct scatterlist *s; int i; - iommu_start = alloc_iommu(dev, pages, 0, dma_get_mask(dev)); if (iommu_start == -1) return -1; - iommu_page = iommu_start; for_each_sg(start, s, nelems, i) { unsigned long pages, addr; unsigned long phys_addr = s->dma_address; @@ -522,7 +505,7 @@ gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, align_mask = (1UL << get_order(size)) - 1; *dma_addr = dma_map_area(dev, paddr, size, DMA_BIDIRECTIONAL, - align_mask, dma_mask); + align_mask); flush_gart(); if (*dma_addr != bad_dma_address) -- 1.5.4.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 11:48 ` [PATCH 2/3] revert "x86: make GART to respect device's dma_mask about virtual mappings" FUJITA Tomonori @ 2008-09-24 11:48 ` FUJITA Tomonori 2008-09-24 12:11 ` Joerg Roedel ` (2 more replies) 0 siblings, 3 replies; 11+ messages in thread From: FUJITA Tomonori @ 2008-09-24 11:48 UTC (permalink / raw) To: mingo; +Cc: linux-kernel, andi, joerg.roedel, fujita.tomonori Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 for a device having dma_masks > 24bit < 32bits. If GART gets an address that a device can't access to, GART try to map the address to a virtual I/O address that the device can access to. But Andi pointed out, "The GART is somewhere in the 4GB range so you cannot use it to map anything < 4GB. Also GART is pretty small." http://lkml.org/lkml/2008/9/12/43 That is, it's possible that GART doesn't have virtual I/O address space that a device can access to. The above behavior doesn't work for a device having dma_masks > 24bit < 32bits. This patch restores old GART alloc_coherent behavior (before the alloc_coherent rewrite). Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> --- arch/x86/kernel/pci-gart_64.c | 43 +++++++++++++++++++--------------------- 1 files changed, 20 insertions(+), 23 deletions(-) diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c index 7e08e46..25c94fb 100644 --- a/arch/x86/kernel/pci-gart_64.c +++ b/arch/x86/kernel/pci-gart_64.c @@ -487,31 +487,28 @@ static void * gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, gfp_t flag) { - void *vaddr; dma_addr_t paddr; unsigned long align_mask; - u64 dma_mask = dma_alloc_coherent_mask(dev, flag); - - vaddr = (void *)__get_free_pages(flag | __GFP_ZERO, get_order(size)); - if (!vaddr) - return NULL; - - paddr = virt_to_phys(vaddr); - if (is_buffer_dma_capable(dma_mask, paddr, size)) { - *dma_addr = paddr; - return vaddr; - } - - align_mask = (1UL << get_order(size)) - 1; - - *dma_addr = dma_map_area(dev, paddr, size, DMA_BIDIRECTIONAL, - align_mask); - flush_gart(); - - if (*dma_addr != bad_dma_address) - return vaddr; - - free_pages((unsigned long)vaddr, get_order(size)); + struct page *page; + + if (force_iommu && !(flag & GFP_DMA)) { + flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); + page = alloc_pages(flag | __GFP_ZERO, get_order(size)); + if (!page) + return NULL; + + align_mask = (1UL << get_order(size)) - 1; + paddr = dma_map_area(dev, page_to_phys(page), size, + DMA_BIDIRECTIONAL, align_mask); + + flush_gart(); + if (paddr != bad_dma_address) { + *dma_addr = paddr; + return page_address(page); + } + __free_pages(page, get_order(size)); + } else + return dma_generic_alloc_coherent(dev, size, dma_addr, flag); return NULL; } -- 1.5.4.2 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 11:48 ` [PATCH 3/3] x86: restore old GART alloc_coherent behavior FUJITA Tomonori @ 2008-09-24 12:11 ` Joerg Roedel 2008-09-24 12:43 ` FUJITA Tomonori 2008-09-24 13:06 ` Alan Cox 2008-09-24 22:01 ` Andi Kleen 2 siblings, 1 reply; 11+ messages in thread From: Joerg Roedel @ 2008-09-24 12:11 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: mingo, linux-kernel, andi On Wed, Sep 24, 2008 at 08:48:37PM +0900, FUJITA Tomonori wrote: > Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 > for a device having dma_masks > 24bit < 32bits. If GART gets an > address that a device can't access to, GART try to map the address to > a virtual I/O address that the device can access to. > > But Andi pointed out, "The GART is somewhere in the 4GB range so you > cannot use it to map anything < 4GB. Also GART is pretty small." > > http://lkml.org/lkml/2008/9/12/43 > > That is, it's possible that GART doesn't have virtual I/O address > space that a device can access to. The above behavior doesn't work for > a device having dma_masks > 24bit < 32bits. > > This patch restores old GART alloc_coherent behavior (before the > alloc_coherent rewrite). Patchset looks good in principle. But I have a small question, see below. > > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> > --- > arch/x86/kernel/pci-gart_64.c | 43 +++++++++++++++++++--------------------- > 1 files changed, 20 insertions(+), 23 deletions(-) > > diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c > index 7e08e46..25c94fb 100644 > --- a/arch/x86/kernel/pci-gart_64.c > +++ b/arch/x86/kernel/pci-gart_64.c > @@ -487,31 +487,28 @@ static void * > gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, > gfp_t flag) > { > - void *vaddr; > dma_addr_t paddr; > unsigned long align_mask; > - u64 dma_mask = dma_alloc_coherent_mask(dev, flag); > - > - vaddr = (void *)__get_free_pages(flag | __GFP_ZERO, get_order(size)); > - if (!vaddr) > - return NULL; > - > - paddr = virt_to_phys(vaddr); > - if (is_buffer_dma_capable(dma_mask, paddr, size)) { > - *dma_addr = paddr; > - return vaddr; > - } > - > - align_mask = (1UL << get_order(size)) - 1; > - > - *dma_addr = dma_map_area(dev, paddr, size, DMA_BIDIRECTIONAL, > - align_mask); > - flush_gart(); > - > - if (*dma_addr != bad_dma_address) > - return vaddr; > - > - free_pages((unsigned long)vaddr, get_order(size)); > + struct page *page; > + > + if (force_iommu && !(flag & GFP_DMA)) { > + flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); > + page = alloc_pages(flag | __GFP_ZERO, get_order(size)); > + if (!page) > + return NULL; > + > + align_mask = (1UL << get_order(size)) - 1; > + paddr = dma_map_area(dev, page_to_phys(page), size, > + DMA_BIDIRECTIONAL, align_mask); Can't we check if a mapping is required before calling dma_map_area? Your recently introduced IOMMU helper functions should make that easy and GART address space is a rare resource. AFAIR this is also the behaviour of the old generic dma_alloc_coherent function. > + > + flush_gart(); This should depend on need_flush. > + if (paddr != bad_dma_address) { > + *dma_addr = paddr; > + return page_address(page); > + } > + __free_pages(page, get_order(size)); > + } else > + return dma_generic_alloc_coherent(dev, size, dma_addr, flag); > > return NULL; > } > -- > 1.5.4.2 > > -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 12:11 ` Joerg Roedel @ 2008-09-24 12:43 ` FUJITA Tomonori 2008-09-24 12:57 ` Joerg Roedel 0 siblings, 1 reply; 11+ messages in thread From: FUJITA Tomonori @ 2008-09-24 12:43 UTC (permalink / raw) To: joerg.roedel; +Cc: fujita.tomonori, mingo, linux-kernel, andi On Wed, 24 Sep 2008 14:11:27 +0200 Joerg Roedel <joerg.roedel@amd.com> wrote: > On Wed, Sep 24, 2008 at 08:48:37PM +0900, FUJITA Tomonori wrote: > > Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 > > for a device having dma_masks > 24bit < 32bits. If GART gets an > > address that a device can't access to, GART try to map the address to > > a virtual I/O address that the device can access to. > > > > But Andi pointed out, "The GART is somewhere in the 4GB range so you > > cannot use it to map anything < 4GB. Also GART is pretty small." > > > > http://lkml.org/lkml/2008/9/12/43 > > > > That is, it's possible that GART doesn't have virtual I/O address > > space that a device can access to. The above behavior doesn't work for > > a device having dma_masks > 24bit < 32bits. > > > > This patch restores old GART alloc_coherent behavior (before the > > alloc_coherent rewrite). > > Patchset looks good in principle. But I have a small question, see > below. > > > > > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> > > --- > > arch/x86/kernel/pci-gart_64.c | 43 +++++++++++++++++++--------------------- > > 1 files changed, 20 insertions(+), 23 deletions(-) > > > > diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c > > index 7e08e46..25c94fb 100644 > > --- a/arch/x86/kernel/pci-gart_64.c > > +++ b/arch/x86/kernel/pci-gart_64.c > > @@ -487,31 +487,28 @@ static void * > > gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, > > gfp_t flag) > > { > > - void *vaddr; > > dma_addr_t paddr; > > unsigned long align_mask; > > - u64 dma_mask = dma_alloc_coherent_mask(dev, flag); > > - > > - vaddr = (void *)__get_free_pages(flag | __GFP_ZERO, get_order(size)); > > - if (!vaddr) > > - return NULL; > > - > > - paddr = virt_to_phys(vaddr); > > - if (is_buffer_dma_capable(dma_mask, paddr, size)) { > > - *dma_addr = paddr; > > - return vaddr; > > - } > > - > > - align_mask = (1UL << get_order(size)) - 1; > > - > > - *dma_addr = dma_map_area(dev, paddr, size, DMA_BIDIRECTIONAL, > > - align_mask); > > - flush_gart(); > > - > > - if (*dma_addr != bad_dma_address) > > - return vaddr; > > - > > - free_pages((unsigned long)vaddr, get_order(size)); > > + struct page *page; > > + > > + if (force_iommu && !(flag & GFP_DMA)) { > > + flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); > > + page = alloc_pages(flag | __GFP_ZERO, get_order(size)); > > + if (!page) > > + return NULL; > > + > > + align_mask = (1UL << get_order(size)) - 1; > > + paddr = dma_map_area(dev, page_to_phys(page), size, > > + DMA_BIDIRECTIONAL, align_mask); > > Can't we check if a mapping is required before calling dma_map_area? > Your recently introduced IOMMU helper functions should make that easy > and GART address space is a rare resource. AFAIR this is also the > behaviour of the old generic dma_alloc_coherent function. I think that the behavior of the old generic dma_alloc_coherent function and GART is different. The old GART code does virtual mappings only with force_iommu option enabled. The old GART code always do virtual mappings with force_iommu option enabled (unless GFP_DMA is set). > > + > > + flush_gart(); > > This should depend on need_flush. Theoretically, yes, I think. But this patch restores the old GART code behavior. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 12:43 ` FUJITA Tomonori @ 2008-09-24 12:57 ` Joerg Roedel 0 siblings, 0 replies; 11+ messages in thread From: Joerg Roedel @ 2008-09-24 12:57 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: mingo, linux-kernel, andi On Wed, Sep 24, 2008 at 09:43:59PM +0900, FUJITA Tomonori wrote: > On Wed, 24 Sep 2008 14:11:27 +0200 > Joerg Roedel <joerg.roedel@amd.com> wrote: > > > On Wed, Sep 24, 2008 at 08:48:37PM +0900, FUJITA Tomonori wrote: > > > Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 > > > for a device having dma_masks > 24bit < 32bits. If GART gets an > > > address that a device can't access to, GART try to map the address to > > > a virtual I/O address that the device can access to. > > > > > > But Andi pointed out, "The GART is somewhere in the 4GB range so you > > > cannot use it to map anything < 4GB. Also GART is pretty small." > > > > > > http://lkml.org/lkml/2008/9/12/43 > > > > > > That is, it's possible that GART doesn't have virtual I/O address > > > space that a device can access to. The above behavior doesn't work for > > > a device having dma_masks > 24bit < 32bits. > > > > > > This patch restores old GART alloc_coherent behavior (before the > > > alloc_coherent rewrite). > > > > Patchset looks good in principle. But I have a small question, see > > below. > > > > > > > > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> > > > --- > > > arch/x86/kernel/pci-gart_64.c | 43 +++++++++++++++++++--------------------- > > > 1 files changed, 20 insertions(+), 23 deletions(-) > > > > > > diff --git a/arch/x86/kernel/pci-gart_64.c b/arch/x86/kernel/pci-gart_64.c > > > index 7e08e46..25c94fb 100644 > > > --- a/arch/x86/kernel/pci-gart_64.c > > > +++ b/arch/x86/kernel/pci-gart_64.c > > > @@ -487,31 +487,28 @@ static void * > > > gart_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_addr, > > > gfp_t flag) > > > { > > > - void *vaddr; > > > dma_addr_t paddr; > > > unsigned long align_mask; > > > - u64 dma_mask = dma_alloc_coherent_mask(dev, flag); > > > - > > > - vaddr = (void *)__get_free_pages(flag | __GFP_ZERO, get_order(size)); > > > - if (!vaddr) > > > - return NULL; > > > - > > > - paddr = virt_to_phys(vaddr); > > > - if (is_buffer_dma_capable(dma_mask, paddr, size)) { > > > - *dma_addr = paddr; > > > - return vaddr; > > > - } > > > - > > > - align_mask = (1UL << get_order(size)) - 1; > > > - > > > - *dma_addr = dma_map_area(dev, paddr, size, DMA_BIDIRECTIONAL, > > > - align_mask); > > > - flush_gart(); > > > - > > > - if (*dma_addr != bad_dma_address) > > > - return vaddr; > > > - > > > - free_pages((unsigned long)vaddr, get_order(size)); > > > + struct page *page; > > > + > > > + if (force_iommu && !(flag & GFP_DMA)) { > > > + flag &= ~(__GFP_DMA | __GFP_HIGHMEM | __GFP_DMA32); > > > + page = alloc_pages(flag | __GFP_ZERO, get_order(size)); > > > + if (!page) > > > + return NULL; > > > + > > > + align_mask = (1UL << get_order(size)) - 1; > > > + paddr = dma_map_area(dev, page_to_phys(page), size, > > > + DMA_BIDIRECTIONAL, align_mask); > > > > Can't we check if a mapping is required before calling dma_map_area? > > Your recently introduced IOMMU helper functions should make that easy > > and GART address space is a rare resource. AFAIR this is also the > > behaviour of the old generic dma_alloc_coherent function. > > I think that the behavior of the old generic dma_alloc_coherent > function and GART is different. > > The old GART code does virtual mappings only with force_iommu option > enabled. The old GART code always do virtual mappings with force_iommu > option enabled (unless GFP_DMA is set). Hmm true, the old code called map_simple and not map_single so it mapped always. > > > > > + > > > + flush_gart(); > > > > This should depend on need_flush. > > Theoretically, yes, I think. But this patch restores the old GART code > behavior. Ok. The map_simple function of GART did it unconditionally too so this is the old behavior. But its safe here to flush conditionally. I'll send a follow-up patch. Joerg -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 11:48 ` [PATCH 3/3] x86: restore old GART alloc_coherent behavior FUJITA Tomonori 2008-09-24 12:11 ` Joerg Roedel @ 2008-09-24 13:06 ` Alan Cox 2008-09-24 22:01 ` Andi Kleen 2 siblings, 0 replies; 11+ messages in thread From: Alan Cox @ 2008-09-24 13:06 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: mingo, linux-kernel, andi, joerg.roedel, fujita.tomonori On Wed, 24 Sep 2008 20:48:37 +0900 FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote: > Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 > for a device having dma_masks > 24bit < 32bits. If GART gets an > address that a device can't access to, GART try to map the address to > a virtual I/O address that the device can access to. > > But Andi pointed out, "The GART is somewhere in the 4GB range so you > cannot use it to map anything < 4GB. Also GART is pretty small." > > http://lkml.org/lkml/2008/9/12/43 > > That is, it's possible that GART doesn't have virtual I/O address > space that a device can access to. The above behavior doesn't work for > a device having dma_masks > 24bit < 32bits. > > This patch restores old GART alloc_coherent behavior (before the > alloc_coherent rewrite). > > Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Alan Cox <alan@redhat.com> This is indeed a specific problem found with things like older AACRAID where control blocks must be below 31bits and the GART is above 0x80000000 Alan ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 3/3] x86: restore old GART alloc_coherent behavior 2008-09-24 11:48 ` [PATCH 3/3] x86: restore old GART alloc_coherent behavior FUJITA Tomonori 2008-09-24 12:11 ` Joerg Roedel 2008-09-24 13:06 ` Alan Cox @ 2008-09-24 22:01 ` Andi Kleen 2 siblings, 0 replies; 11+ messages in thread From: Andi Kleen @ 2008-09-24 22:01 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: mingo, linux-kernel, andi, joerg.roedel Thanks for changing it back. Acked-by: Andi Kleen <ak@linux.intel.com> -- ak@linux.intel.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] x86: restore old GART alloc_coherent 2008-09-24 11:48 [PATCH 0/3] x86: restore old GART alloc_coherent FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 1/3] x86: export pci-nommu's alloc_coherent FUJITA Tomonori @ 2008-09-24 12:58 ` Joerg Roedel 2008-09-25 9:04 ` Ingo Molnar 2 siblings, 0 replies; 11+ messages in thread From: Joerg Roedel @ 2008-09-24 12:58 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: mingo, linux-kernel, andi On Wed, Sep 24, 2008 at 08:48:34PM +0900, FUJITA Tomonori wrote: > This pachset is against tip/iommu. > > What this patchset does is restoring old GART alloc_coherent behavior > (before the alloc_coherent rewrite): > > http://lkml.org/lkml/2008/8/12/200 > > Currently, GART alloc_coherent tries to allocate pages with GFP_DMA32 > for a device having dma_masks > 24bit < 32bits. If GART gets an > address that a device can't access to, GART tries to map the address > to a virtual I/O address that the device can access to. > > But Andi pointed out, "The GART is somewhere in the 4GB range so you > cannot use it to map anything < 4GB. Also GART is pretty small." > > http://lkml.org/lkml/2008/9/12/43 > > So it's possible that GART doesn't have virtual I/O address space that > a device can access to. The current behavior might not work for a > device having dma_masks > 24bit < 32bits. This patchset restores old > GART alloc_coherent behavior, which doesn't use GART hardware (if an > user doesn't enable force_iommu option). Acked-by: Joerg Roedel <joerg.roedel@amd.com> -- | AMD Saxony Limited Liability Company & Co. KG Operating | Wilschdorfer Landstr. 101, 01109 Dresden, Germany System | Register Court Dresden: HRA 4896 Research | General Partner authorized to represent: Center | AMD Saxony LLC (Wilmington, Delaware, US) | General Manager of AMD Saxony LLC: Dr. Hans-R. Deppe, Thomas McCoy ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 0/3] x86: restore old GART alloc_coherent 2008-09-24 11:48 [PATCH 0/3] x86: restore old GART alloc_coherent FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 1/3] x86: export pci-nommu's alloc_coherent FUJITA Tomonori 2008-09-24 12:58 ` [PATCH 0/3] x86: restore old GART alloc_coherent Joerg Roedel @ 2008-09-25 9:04 ` Ingo Molnar 2 siblings, 0 replies; 11+ messages in thread From: Ingo Molnar @ 2008-09-25 9:04 UTC (permalink / raw) To: FUJITA Tomonori; +Cc: linux-kernel, andi, joerg.roedel, Alan Cox * FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> wrote: > This pachset is against tip/iommu. > > What this patchset does is restoring old GART alloc_coherent behavior > (before the alloc_coherent rewrite): > > http://lkml.org/lkml/2008/8/12/200 applied your fixes to tip/x86/iommu: 1615965: x86 gart: remove unnecessary initialization 1d99088: x86: restore old GART alloc_coherent behavior ecef533: revert "x86: make GART to respect device's dma_mask about virtual mappings" 9f6ac57: x86: export pci-nommu's alloc_coherent thanks Fujita! Ingo ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-09-25 9:04 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-09-24 11:48 [PATCH 0/3] x86: restore old GART alloc_coherent FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 1/3] x86: export pci-nommu's alloc_coherent FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 2/3] revert "x86: make GART to respect device's dma_mask about virtual mappings" FUJITA Tomonori 2008-09-24 11:48 ` [PATCH 3/3] x86: restore old GART alloc_coherent behavior FUJITA Tomonori 2008-09-24 12:11 ` Joerg Roedel 2008-09-24 12:43 ` FUJITA Tomonori 2008-09-24 12:57 ` Joerg Roedel 2008-09-24 13:06 ` Alan Cox 2008-09-24 22:01 ` Andi Kleen 2008-09-24 12:58 ` [PATCH 0/3] x86: restore old GART alloc_coherent Joerg Roedel 2008-09-25 9:04 ` Ingo Molnar
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox