* [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
@ 2005-03-18 21:23 Matt Domsch
2005-03-19 6:09 ` Arjan van de Ven
2005-03-19 19:26 ` Andi Kleen
0 siblings, 2 replies; 7+ messages in thread
From: Matt Domsch @ 2005-03-18 21:23 UTC (permalink / raw)
To: ak, linux-kernel; +Cc: linux-scsi
For review and comment.
On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
there are any pages mapped above 4GB), pci_alloc_consistent() falls
back to using ZONE_DMA for all allocations, even if the device's
dma_mask could have supported using memory from other zones. Problems
can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
Patch below makes pci_alloc_consistent() for the nommu case (EM64T
processors) match the 2.6 implementation of dma_alloc_coherent(), with
the exception that this continues to use GFP_ATOMIC.
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Thanks,
Matt
--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
--- linux-2.4/arch/x86_64/kernel/pci-nommu.c Fri Feb 25 13:01:44 2005
+++ linux-2.4/arch/x86_64/kernel/pci-nommu.c Fri Feb 25 06:56:55 2005
@@ -13,18 +13,28 @@ void *pci_alloc_consistent(struct pci_de
 			   dma_addr_t *dma_handle)
 {
 	void *ret;
+	u64 mask;
+	int order = get_order(size);
 	int gfp = GFP_ATOMIC;
-
-	if (hwdev == NULL ||
-	    end_pfn > (hwdev->dma_mask>>PAGE_SHIFT) || /* XXX */
-	    (u32)hwdev->dma_mask < 0xffffffff)
-		gfp |= GFP_DMA;
-	ret = (void *)__get_free_pages(gfp, get_order(size));
-	if (ret != NULL) {
-		memset(ret, 0, size);
+	if (hwdev)
+		mask = hwdev->dma_mask;
+	else
+		mask = 0xffffffffULL;
+
+	for (;;) {
+		ret = (void *)__get_free_pages(gfp, order);
+		if (ret == NULL)
+			return NULL;
 		*dma_handle = virt_to_bus(ret);
+		if ((*dma_handle & ~mask) == 0)
+			break;
+		free_pages((unsigned long)ret, order);
+		if (gfp & GFP_DMA)
+			return NULL;
+		gfp |= GFP_DMA;
 	}
+	memset(ret, 0, size);
 	return ret;
 }
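
Editorial sketch (not part of the original posting): the retry logic in the patch above can be modelled in plain userspace C. Everything here is a hypothetical stand-in: SIM_GFP_DMA plays the role of GFP_DMA, sim_alloc_fn replaces __get_free_pages() plus virt_to_bus(), and sim_alloc_high_first() is an invented allocator that hands out memory above 4GB unless the DMA flag is set. It only illustrates the "allocate, test against the mask, retry once from the DMA zone" idea and makes no claim about real kernel behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for GFP_DMA; not the kernel's value. */
#define SIM_GFP_DMA 0x01u

/* True when every set bit of bus_addr is covered by the device's DMA mask. */
bool fits_dma_mask(uint64_t bus_addr, uint64_t mask)
{
	return (bus_addr & ~mask) == 0;
}

/* Hypothetical allocator type: returns a bus address, or 0 on failure. */
typedef uint64_t (*sim_alloc_fn)(unsigned int flags);

/*
 * Try an unrestricted allocation first. If the result is not reachable
 * through the mask, retry exactly once with the DMA flag, then give up.
 * The kernel code would call free_pages() on the rejected buffer; this
 * simulation has nothing to free.
 */
uint64_t alloc_within_mask(sim_alloc_fn alloc, uint64_t mask, unsigned int flags)
{
	for (;;) {
		uint64_t addr = alloc(flags);

		if (addr == 0)
			return 0;
		if (fits_dma_mask(addr, mask))
			return addr;
		if (flags & SIM_GFP_DMA)
			return 0;
		flags |= SIM_GFP_DMA;
	}
}

/* Invented allocator: high memory by default, low memory when the DMA flag is set. */
uint64_t sim_alloc_high_first(unsigned int flags)
{
	return (flags & SIM_GFP_DMA) ? 0x00800000ULL : 0x140000000ULL;
}
```

With a 32-bit mask, alloc_within_mask(sim_alloc_high_first, 0xffffffffULL, 0) rejects the first (high) address and returns the low one from the second attempt; with a full 64-bit mask the first address is accepted and the DMA zone is never touched.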
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-18 21:23 [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation Matt Domsch
@ 2005-03-19 6:09 ` Arjan van de Ven
2005-03-19 14:16 ` Matt Domsch
2005-03-19 19:26 ` Andi Kleen
1 sibling, 1 reply; 7+ messages in thread
From: Arjan van de Ven @ 2005-03-19 6:09 UTC (permalink / raw)
To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi
On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> For review and comment.
>
> On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> there are any pages mapped above 4GB), pci_alloc_consistent() falls
> back to using ZONE_DMA for all allocations, even if the device's
> dma_mask could have supported using memory from other zones. Problems
> can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
scsi_malloc no longer uses ZONE_DMA nowadays....
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-19 6:09 ` Arjan van de Ven
@ 2005-03-19 14:16 ` Matt Domsch
2005-03-19 16:27 ` Arjan van de Ven
0 siblings, 1 reply; 7+ messages in thread
From: Matt Domsch @ 2005-03-19 14:16 UTC (permalink / raw)
To: Arjan van de Ven; +Cc: ak, linux-kernel, linux-scsi
On Sat, Mar 19, 2005 at 07:09:45AM +0100, Arjan van de Ven wrote:
> On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> > For review and comment.
> >
> > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > back to using ZONE_DMA for all allocations, even if the device's
> > dma_mask could have supported using memory from other zones. Problems
> > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
>
> scsi_malloc no longer uses ZONE_DMA nowadays....
In 2.4.x it does. scsi_resize_dma_pool() has:
__get_free_pages(GFP_ATOMIC | GFP_DMA, 0);
scsi_init_minimal_dma_pool() has similar.
--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-19 14:16 ` Matt Domsch
@ 2005-03-19 16:27 ` Arjan van de Ven
0 siblings, 0 replies; 7+ messages in thread
From: Arjan van de Ven @ 2005-03-19 16:27 UTC (permalink / raw)
To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi
On Sat, 2005-03-19 at 08:16 -0600, Matt Domsch wrote:
> On Sat, Mar 19, 2005 at 07:09:45AM +0100, Arjan van de Ven wrote:
> > On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> > > For review and comment.
> > >
> > > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > > back to using ZONE_DMA for all allocations, even if the device's
> > > dma_mask could have supported using memory from other zones. Problems
> > > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> >
> > scsi_malloc no longer uses ZONE_DMA nowadays....
>
> In 2.4.x it does. scsi_resize_dma_pool() has:
> __get_free_pages(GFP_ATOMIC | GFP_DMA, 0);
> scsi_init_minimal_dma_pool() has similar.
>
oh you want to do major changes to the 2.4 tree... sounds like a bad
idea to change such vm behavior..
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-18 21:23 [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation Matt Domsch
2005-03-19 6:09 ` Arjan van de Ven
@ 2005-03-19 19:26 ` Andi Kleen
2005-03-19 22:17 ` Matt Domsch
1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2005-03-19 19:26 UTC (permalink / raw)
To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi
On Fri, Mar 18, 2005 at 03:23:44PM -0600, Matt Domsch wrote:
> For review and comment.
>
> On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> there are any pages mapped above 4GB), pci_alloc_consistent() falls
> back to using ZONE_DMA for all allocations, even if the device's
> dma_mask could have supported using memory from other zones. Problems
> can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
>
> Patch below makes pci_alloc_consistent() for the nommu case (EM64T
> processors) match the 2.6 implementation of dma_alloc_coherent(), with
> the exception that this continues to use GFP_ATOMIC.
You fixed the wrong code. The pci-nommu code is only used
when IOMMU is disabled in the Kconfig. But most kernels have
it enabled. You would need to change it in pci-gart.c too.
The reason it is like this is that nommu was always intended as a
hackish kludge that would only be used for debugging - little did we
know that it would become standard later.
-Andi
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-19 19:26 ` Andi Kleen
@ 2005-03-19 22:17 ` Matt Domsch
2005-03-22 21:51 ` Siddha, Suresh B
0 siblings, 1 reply; 7+ messages in thread
From: Matt Domsch @ 2005-03-19 22:17 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, linux-scsi
On Sat, Mar 19, 2005 at 08:26:45PM +0100, Andi Kleen wrote:
> On Fri, Mar 18, 2005 at 03:23:44PM -0600, Matt Domsch wrote:
> > For review and comment.
> >
> > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > back to using ZONE_DMA for all allocations, even if the device's
> > dma_mask could have supported using memory from other zones. Problems
> > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> >
> > Patch below makes pci_alloc_consistent() for the nommu case (EM64T
> > processors) match the 2.6 implementation of dma_alloc_coherent(), with
> > the exception that this continues to use GFP_ATOMIC.
>
> You fixed the wrong code. The pci-nommu code is only used
> when IOMMU is disabled in the Kconfig. But most kernels have
> it enabled. You would need to change it in pci-gart.c too.
OK, then how's this for review? Compiles clean, can't test it myself
for a few days.
--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
===== arch/x86_64/kernel/pci-gart.c 1.12 vs edited =====
--- 1.12/arch/x86_64/kernel/pci-gart.c 2004-06-03 05:29:36 -05:00
+++ edited/arch/x86_64/kernel/pci-gart.c 2005-03-19 15:56:34 -06:00
@@ -154,27 +154,37 @@ void *pci_alloc_consistent(struct pci_de
 	int gfp = GFP_ATOMIC;
 	int i;
 	unsigned long iommu_page;
+	dma_addr_t dma_mask;
 
-	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu)
+	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff)
 		gfp |= GFP_DMA;
 
+	dma_mask = hwdev ? hwdev->dma_mask : 0xffffffffULL;
+	if (dma_mask == 0)
+		dma_mask = 0xffffffffULL;
+
 	/*
-	 * First try to allocate continuous and use directly if already
-	 * in lowmem.
+	 * First try to allocate continuous and use directly if
+	 * our device supports it
 	 */
 	size = round_up(size, PAGE_SIZE);
+ again:
 	memory = (void *)__get_free_pages(gfp, get_order(size));
 	if (memory == NULL) {
 		return NULL;
 	} else {
-		int high = 0, mmu;
-		if (((unsigned long)virt_to_bus(memory) + size) > 0xffffffffUL)
-			high = 1;
-		mmu = high;
+		int high = (((unsigned long)virt_to_bus(memory) + size) & ~dma_mask) != 0;
+		int mmu = high;
 		if (force_mmu && !(gfp & GFP_DMA))
 			mmu = 1;
 		if (no_iommu) {
-			if (high) goto error;
+			if (high && (gfp & GFP_DMA))
+				goto error;
+			if (high) {
+				free_pages((unsigned long)memory, get_order(size));
+				gfp |= GFP_DMA;
+				goto again;
+			}
 			mmu = 0;
 		}
 		memset(memory, 0, size);
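
Editorial sketch (not part of the original posting): the two pieces of arithmetic in the pci-gart.c patch above, the mask defaulting and the "high" test, can be tried out in userspace C. The function names effective_dma_mask() and buffer_is_high() are invented for illustration, and plain uint64_t values stand in for dma_addr_t and virt_to_bus() results.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mirrors the defaulting in the patch: use a 32-bit mask when there is
 * no device, or when the device's mask was never set (reads as 0).
 */
uint64_t effective_dma_mask(bool have_dev, uint64_t dev_mask)
{
	uint64_t mask = have_dev ? dev_mask : 0xffffffffULL;

	if (mask == 0)
		mask = 0xffffffffULL;
	return mask;
}

/*
 * Mirrors the patch's expression: the address just past the end of the
 * buffer (bus_addr + size) is tested against the mask, replacing the
 * old fixed comparison with 0xffffffff.
 */
bool buffer_is_high(uint64_t bus_addr, size_t size, uint64_t mask)
{
	return ((bus_addr + size) & ~mask) != 0;
}
```

For example, a two-page buffer at 0xfffff000 is "high" for a device with a 32-bit mask but not for one with a 40-bit mask, which is the case the patch is trying to stop pushing into ZONE_DMA.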
* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
2005-03-19 22:17 ` Matt Domsch
@ 2005-03-22 21:51 ` Siddha, Suresh B
0 siblings, 0 replies; 7+ messages in thread
From: Siddha, Suresh B @ 2005-03-22 21:51 UTC (permalink / raw)
To: Matt Domsch; +Cc: Andi Kleen, linux-kernel, linux-scsi
On Sat, Mar 19, 2005 at 04:17:51PM -0600, Matt Domsch wrote:
> OK, then how's this for review? Compiles clean, can't test it myself
> for a few days.
>
> -		int high = 0, mmu;
> -		if (((unsigned long)virt_to_bus(memory) + size) > 0xffffffffUL)
> -			high = 1;
> -		mmu = high;
> +		int high = (((unsigned long)virt_to_bus(memory) + size) & ~dma_mask) != 0;
> +		int mmu = high;
Documentation/DMA-mapping.txt says consistent DMA mapping interface
will always return SAC addressable DMA address. Your patch breaks this
behavior. (Though I don't know the reason why this behavior is expected!)
Appended is a simple 2.4 patch which will sync the behavior with 2.6
thanks,
suresh
--
Sync 2.4 pci_alloc_consistent behavior with 2.6
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
diff -Nru linux-2.4.29/arch/ia64/lib/swiotlb.c linux-2.4.29-swiotlb/arch/ia64/lib/swiotlb.c
--- linux-2.4.29/arch/ia64/lib/swiotlb.c 2003-08-25 04:44:39.000000000 -0700
+++ linux-2.4.29-swiotlb/arch/ia64/lib/swiotlb.c 2005-03-22 10:51:21.968565920 -0800
@@ -50,13 +50,13 @@
  * Used to do a quick range check in swiotlb_unmap_single and swiotlb_sync_single, to see
  * if the memory was in fact allocated by this API.
  */
-static char *io_tlb_start, *io_tlb_end;
+char *io_tlb_start, *io_tlb_end;
 
 /*
  * The number of IO TLB blocks (in groups of 64) betweeen io_tlb_start and io_tlb_end.
  * This is command line adjustable via setup_io_tlb_npages.
  */
-static unsigned long io_tlb_nslabs = 1024;
+static unsigned long io_tlb_nslabs = 32768;
 
 /*
  * This is a free list describing the number of free entries available from each index
diff -Nru linux-2.4.29/arch/x86_64/kernel/pci-gart.c linux-2.4.29-swiotlb/arch/x86_64/kernel/pci-gart.c
--- linux-2.4.29/arch/x86_64/kernel/pci-gart.c 2004-08-07 16:26:04.000000000 -0700
+++ linux-2.4.29-swiotlb/arch/x86_64/kernel/pci-gart.c 2005-03-22 10:38:45.211610464 -0800
@@ -155,7 +155,7 @@
 	int i;
 	unsigned long iommu_page;
 
-	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu)
+	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || (no_iommu && !swiotlb))
 		gfp |= GFP_DMA;
 
 	/*
@@ -174,6 +174,22 @@
 		if (force_mmu && !(gfp & GFP_DMA))
 			mmu = 1;
 		if (no_iommu) {
+#ifdef CONFIG_SWIOTLB
+			if (swiotlb && high && hwdev) {
+				unsigned long dma_mask = 0;
+				if (hwdev->dma_mask == ~0UL) {
+					hwdev->dma_mask = 0xffffffff;
+					dma_mask = ~0UL;
+				}
+				*dma_handle = swiotlb_map_single(hwdev, memory, size,
+								 PCI_DMA_FROMDEVICE);
+				if (dma_mask)
+					hwdev->dma_mask = dma_mask;
+				memset(phys_to_virt(*dma_handle), 0, size);
+				free_pages((unsigned long)memory, get_order(size));
+				return phys_to_virt(*dma_handle);
+			}
+#endif
 			if (high) goto error;
 			mmu = 0;
 		}
@@ -218,8 +234,16 @@
 			 void *vaddr, dma_addr_t bus)
 {
 	unsigned long iommu_page;
-
+	extern char *io_tlb_start, *io_tlb_end;
+
 	size = round_up(size, PAGE_SIZE);
+#ifdef CONFIG_SWIOTLB
+	if (swiotlb && vaddr >= (void *)io_tlb_start &&
+	    vaddr < (void *)io_tlb_end) {
+		swiotlb_unmap_single (hwdev, bus, size, PCI_DMA_TODEVICE);
+		return;
+	}
+#endif
 	if (bus >= iommu_bus_base && bus < iommu_bus_base + iommu_size) {
 		unsigned pages = size >> PAGE_SHIFT;
 		iommu_page = (bus - iommu_bus_base) >> PAGE_SHIFT;
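
Editorial sketch (not part of the original posting): two patterns from the patch above, reduced to userspace C. The first temporarily narrows a 64-bit device mask to 32 bits around a mapping call so the returned handle stays SAC addressable, then restores it. The second is the pointer range test used to decide whether a buffer came from the bounce pool. struct sim_dev, sim_map_fn, sim_map_single() and the addresses it returns are all hypothetical; only the save/clamp/restore shape and the half-open range check are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device: only the DMA mask matters for this sketch. */
struct sim_dev {
	uint64_t dma_mask;
};

/* Hypothetical mapping routine: returns a bus address chosen by the mask. */
typedef uint64_t (*sim_map_fn)(const struct sim_dev *dev);

/*
 * As in the patch, clamp only when the mask is all ones, call the mapper
 * with the narrowed mask, then put the original mask back.
 */
uint64_t map_with_sac_mask(struct sim_dev *dev, sim_map_fn map)
{
	uint64_t saved = 0;
	uint64_t handle;

	if (dev->dma_mask == ~0ULL) {
		saved = dev->dma_mask;
		dev->dma_mask = 0xffffffffULL;
	}
	handle = map(dev);
	if (saved)
		dev->dma_mask = saved;
	return handle;
}

/* Invented mapper: hands out a high address only if the mask allows it. */
uint64_t sim_map_single(const struct sim_dev *dev)
{
	return dev->dma_mask > 0xffffffffULL ? 0x140000000ULL : 0x00c00000ULL;
}

/* Half-open range test, as in the pci_free_consistent() hunk: start <= vaddr < end. */
bool in_bounce_pool(const char *vaddr, const char *start, const char *end)
{
	return vaddr >= start && vaddr < end;
}
```

With a device whose mask is ~0ULL, map_with_sac_mask() returns the low address from sim_map_single() and leaves the device's mask unchanged afterwards.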