public inbox for linux-scsi@vger.kernel.org
* [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Matt Domsch @ 2005-03-18 21:23 UTC (permalink / raw)
  To: ak, linux-kernel; +Cc: linux-scsi

For review and comment.

On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
there are any pages mapped above 4GB), pci_alloc_consistent() falls
back to using ZONE_DMA for all allocations, even if the device's
dma_mask could have supported using memory from other zones.  Problems
can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.

Patch below makes pci_alloc_consistent() for the nommu case (EM64T
processors) match the 2.6 implementation of dma_alloc_coherent(), with
the exception that this continues to use GFP_ATOMIC.

Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>

Thanks,
Matt

-- 
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

--- linux-2.4/arch/x86_64/kernel/pci-nommu.c	Fri Feb 25 13:01:44 2005
+++ linux-2.4/arch/x86_64/kernel/pci-nommu.c	Fri Feb 25 06:56:55 2005
@@ -13,18 +13,28 @@ void *pci_alloc_consistent(struct pci_de
 			   dma_addr_t *dma_handle)
 {
 	void *ret;
+	u64 mask;
+	int order = get_order(size);
 	int gfp = GFP_ATOMIC;
-	
-	if (hwdev == NULL ||
-	    end_pfn > (hwdev->dma_mask>>PAGE_SHIFT) ||  /* XXX */
-	    (u32)hwdev->dma_mask < 0xffffffff)
-		gfp |= GFP_DMA;
-	ret = (void *)__get_free_pages(gfp, get_order(size));
 
-	if (ret != NULL) {
-		memset(ret, 0, size);
+	if (hwdev)
+		mask = hwdev->dma_mask;
+	else
+		mask = 0xffffffffULL;
+
+	for (;;) {
+		ret = (void *)__get_free_pages(gfp, order);
+		if (ret == NULL)
+			return NULL;
 		*dma_handle = virt_to_bus(ret);
+		if ((*dma_handle & ~mask) == 0)
+			break;
+		free_pages((unsigned long)ret, order);
+		if (gfp & GFP_DMA)
+			return NULL;
+		gfp |= GFP_DMA;
 	}
+	memset(ret, 0, size);
 	return ret;
 }
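The retry loop in the nommu patch above can be modeled outside the kernel. The sketch below is a hypothetical user-space model, not kernel code: `struct zone_model`, `model_alloc()` and their fields are invented for illustration, and each "zone" simply returns a fixed bus address instead of calling __get_free_pages(). It shows the intended effect: a device whose dma_mask covers the normal allocation never touches ZONE_DMA, and a 32-bit device falls back to ZONE_DMA only after the normal allocation is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two allocation sources. */
struct zone_model {
	uint64_t normal_addr;	/* bus address a plain GFP_ATOMIC allocation returns */
	uint64_t dma_addr;	/* bus address a GFP_DMA allocation returns */
	int dma_exhausted;	/* nonzero: ZONE_DMA is used up, GFP_DMA fails */
};

/*
 * Mirrors the loop in the patch: allocate without GFP_DMA, accept the
 * result if it fits the mask, otherwise retry once with GFP_DMA.
 * Returns 1 on success (filling *handle and *used_dma), 0 on failure.
 */
static int model_alloc(const struct zone_model *z, uint64_t mask,
		       uint64_t *handle, int *used_dma)
{
	int gfp_dma = 0;

	assert(z != NULL && handle != NULL && used_dma != NULL);
	for (;;) {
		uint64_t addr;

		if (gfp_dma && z->dma_exhausted)
			return 0;		/* allocation itself failed */
		addr = gfp_dma ? z->dma_addr : z->normal_addr;
		if ((addr & ~mask) == 0) {	/* same test as the patch */
			*handle = addr;
			*used_dma = gfp_dma;
			return 1;
		}
		if (gfp_dma)
			return 0;		/* even ZONE_DMA is out of range */
		gfp_dma = 1;			/* fall back to ZONE_DMA */
	}
}
```

For example, with normal_addr just above 4GB (0x100000000), a device with a 32-bit mask ends up with the ZONE_DMA address, while a device with a 64-bit mask keeps the normal allocation and leaves ZONE_DMA for devices that actually need it.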
 


* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Arjan van de Ven @ 2005-03-19  6:09 UTC (permalink / raw)
  To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi

On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> For review and comment.
> 
> On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> there are any pages mapped above 4GB), pci_alloc_consistent() falls
> back to using ZONE_DMA for all allocations, even if the device's
> dma_mask could have supported using memory from other zones.  Problems
> can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.

scsi_malloc no longer uses ZONE_DMA nowadays....




* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Matt Domsch @ 2005-03-19 14:16 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: ak, linux-kernel, linux-scsi

On Sat, Mar 19, 2005 at 07:09:45AM +0100, Arjan van de Ven wrote:
> On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> > For review and comment.
> > 
> > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > back to using ZONE_DMA for all allocations, even if the device's
> > dma_mask could have supported using memory from other zones.  Problems
> > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> 
> scsi_malloc no longer uses ZONE_DMA nowadays....

In 2.4.x it does.  scsi_resize_dma_pool() has:
      __get_free_pages(GFP_ATOMIC | GFP_DMA, 0);
scsi_init_minimal_dma_pool() has similar.




* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Arjan van de Ven @ 2005-03-19 16:27 UTC (permalink / raw)
  To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi

On Sat, 2005-03-19 at 08:16 -0600, Matt Domsch wrote:
> On Sat, Mar 19, 2005 at 07:09:45AM +0100, Arjan van de Ven wrote:
> > On Fri, 2005-03-18 at 15:23 -0600, Matt Domsch wrote:
> > > For review and comment.
> > > 
> > > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > > back to using ZONE_DMA for all allocations, even if the device's
> > > dma_mask could have supported using memory from other zones.  Problems
> > > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> > 
> > scsi_malloc no longer uses ZONE_DMA nowadays....
> 
> In 2.4.x it does.  scsi_resize_dma_pool() has:
>       __get_free_pages(GFP_ATOMIC | GFP_DMA, 0);
> scsi_init_minimal_dma_pool() has similar.
> 

Oh, you want to make major changes to the 2.4 tree? Changing VM behavior
like that in 2.4 sounds like a bad idea.




* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Andi Kleen @ 2005-03-19 19:26 UTC (permalink / raw)
  To: Matt Domsch; +Cc: ak, linux-kernel, linux-scsi

On Fri, Mar 18, 2005 at 03:23:44PM -0600, Matt Domsch wrote:
> For review and comment.
> 
> On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> there are any pages mapped above 4GB), pci_alloc_consistent() falls
> back to using ZONE_DMA for all allocations, even if the device's
> dma_mask could have supported using memory from other zones.  Problems
> can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> 
> Patch below makes pci_alloc_consistent() for the nommu case (EM64T
> processors) match the 2.6 implementation of dma_alloc_coherent(), with
> the exception that this continues to use GFP_ATOMIC.

You fixed the wrong code. The pci-nommu code is only used
when IOMMU is disabled in the Kconfig. But most kernels have
it enabled. You would need to change it in pci-gart.c too.
 
The reason it is like this is that nommu was always intended as a hackish
kludge that would only be used for debugging; little did we know that it
would become standard later.

-Andi


* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Matt Domsch @ 2005-03-19 22:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel, linux-scsi

On Sat, Mar 19, 2005 at 08:26:45PM +0100, Andi Kleen wrote:
> On Fri, Mar 18, 2005 at 03:23:44PM -0600, Matt Domsch wrote:
> > For review and comment.
> > 
> > On x86_64 systems with no IOMMU and with >4GB RAM (in fact, whenever
> > there are any pages mapped above 4GB), pci_alloc_consistent() falls
> > back to using ZONE_DMA for all allocations, even if the device's
> > dma_mask could have supported using memory from other zones.  Problems
> > can be seen when other ZONE_DMA users (SWIOTLB, scsi_malloc()) consume
> > all of ZONE_DMA, leaving none left for pci_alloc_consistent() use.
> > 
> > Patch below makes pci_alloc_consistent() for the nommu case (EM64T
> > processors) match the 2.6 implementation of dma_alloc_coherent(), with
> > the exception that this continues to use GFP_ATOMIC.
> 
> You fixed the wrong code. The pci-nommu code is only used
> when IOMMU is disabled in the Kconfig. But most kernels have
> it enabled. You would need to change it in pci-gart.c too.

OK, then how's this for review?  Compiles clean, can't test it myself
for a few days.


===== arch/x86_64/kernel/pci-gart.c 1.12 vs edited =====
--- 1.12/arch/x86_64/kernel/pci-gart.c	2004-06-03 05:29:36 -05:00
+++ edited/arch/x86_64/kernel/pci-gart.c	2005-03-19 15:56:34 -06:00
@@ -154,27 +154,37 @@ void *pci_alloc_consistent(struct pci_de
 	int gfp = GFP_ATOMIC;
 	int i;
 	unsigned long iommu_page;
+	dma_addr_t dma_mask;
 
-	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu)
+	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff)
 		gfp |= GFP_DMA;
 
+	dma_mask = hwdev ? hwdev->dma_mask : 0xffffffffULL;
+	if (dma_mask == 0)
+		dma_mask = 0xffffffffULL;
+
 	/* 
-	 * First try to allocate continuous and use directly if already 
-	 * in lowmem. 
+	 * First try to allocate continuous and use directly if
+	 * our device supports it
 	 */ 
 	size = round_up(size, PAGE_SIZE); 
+ again:
 	memory = (void *)__get_free_pages(gfp, get_order(size));
 	if (memory == NULL) {
 		return NULL; 
 	} else {
-		int high = 0, mmu;
-		if (((unsigned long)virt_to_bus(memory) + size) > 0xffffffffUL)
-			high = 1;
-		mmu = high;
+		int high = (((unsigned long)virt_to_bus(memory) + size) & ~dma_mask) != 0;
+		int mmu = high;
 		if (force_mmu && !(gfp & GFP_DMA)) 
 			mmu = 1;
 		if (no_iommu) { 
-			if (high) goto error;
+			if (high && (gfp & GFP_DMA))
+				goto error;
+			if (high) {
+				free_pages((unsigned long)memory, get_order(size));
+				gfp |= GFP_DMA;
+				goto again;
+			}
 			mmu = 0; 
 		} 	
 		memset(memory, 0, size); 
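The decision the patched pci-gart.c hunk makes can be sketched as two small functions. This is a hypothetical model: `buffer_is_high()`, `gart_decide()` and `enum gart_action` do not exist in pci-gart.c, and since the hunk is truncated here it assumes that mmu == 0 means the buffer is handed out directly while mmu == 1 means it is remapped through the GART. One deliberate difference: the sketch tests the last byte of the buffer (bus + size - 1), whereas the patch tests bus + size, which also treats a buffer ending exactly at the mask boundary as high.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcomes of the allocation decision. */
enum gart_action { USE_DIRECT, REMAP_VIA_IOMMU, RETRY_WITH_GFP_DMA, FAIL };

/* Nonzero if any byte of [bus, bus + size) lies outside dma_mask. */
static int buffer_is_high(uint64_t bus, uint64_t size, uint64_t dma_mask)
{
	assert(size > 0);
	return ((bus + size - 1) & ~dma_mask) != 0;
}

/*
 * Models the branch structure of the patched code:
 *   high      - result of buffer_is_high()
 *   gfp_dma   - the allocation was already made with GFP_DMA
 *   force_mmu - remapping forced even for buffers that fit the mask
 *   no_iommu  - no GART available
 */
static enum gart_action gart_decide(int high, int gfp_dma, int force_mmu,
				    int no_iommu)
{
	int mmu = high;

	if (force_mmu && !gfp_dma)
		mmu = 1;
	if (no_iommu) {
		if (high)
			return gfp_dma ? FAIL : RETRY_WITH_GFP_DMA;
		return USE_DIRECT;	/* the patch sets mmu = 0 here */
	}
	return mmu ? REMAP_VIA_IOMMU : USE_DIRECT;
}
```

Without an IOMMU, a high buffer gets exactly one retry from ZONE_DMA before the allocation fails; with a GART, a high buffer is remapped instead of reallocated.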


* Re: [PATCH 2.4.30-pre3] x86_64: pci_alloc_consistent() match 2.6 implementation
From: Siddha, Suresh B @ 2005-03-22 21:51 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Andi Kleen, linux-kernel, linux-scsi

On Sat, Mar 19, 2005 at 04:17:51PM -0600, Matt Domsch wrote:
> OK, then how's this for review?  Compiles clean, can't test it myself
> for a few days.
> 
> -		int high = 0, mmu;
> -		if (((unsigned long)virt_to_bus(memory) + size) > 0xffffffffUL)
> -			high = 1;
> -		mmu = high;
> +		int high = (((unsigned long)virt_to_bus(memory) + size) & ~dma_mask) != 0;
> +		int mmu = high;

Documentation/DMA-mapping.txt says that the consistent DMA mapping interface
will always return a SAC (single address cycle, i.e. 32-bit) addressable DMA
address. Your patch breaks this behavior.
(Though I don't know why this behavior is expected!)
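To make the SAC point concrete: with Matt's patch, a device whose dma_mask is all ones passes the mask test for a buffer above 4GB, but that buffer cannot be reached with a single 32-bit address cycle. The sketch below is a hypothetical illustration; `is_sac_addressable()`, `consistent_alloc_mask()` and `SAC_LIMIT` are invented names. Capping the mask at 32 bits is the same idea Suresh's patch applies temporarily to hwdev->dma_mask around swiotlb_map_single().

```c
#include <assert.h>
#include <stdint.h>

/* Highest bus address reachable with a single 32-bit address cycle. */
#define SAC_LIMIT 0xffffffffULL

/* Nonzero if the whole buffer [bus, bus + size) is SAC addressable. */
static int is_sac_addressable(uint64_t bus, uint64_t size)
{
	assert(size > 0);
	return bus + size - 1 <= SAC_LIMIT;
}

/*
 * Hypothetical helper: a mask that honours both the device's own
 * dma_mask and the 32-bit guarantee for consistent allocations.
 */
static uint64_t consistent_alloc_mask(uint64_t dev_mask)
{
	return dev_mask < SAC_LIMIT ? dev_mask : SAC_LIMIT;
}
```

With consistent_alloc_mask(~0ULL) equal to 0xffffffff, the `& ~dma_mask` test from Matt's patch would again reject a buffer at 0x100000000, while devices with narrower masks keep their own limit.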

Appended is a simple 2.4 patch that syncs the behavior with 2.6.

thanks,
suresh
--

Sync 2.4 pci_alloc_consistent behavior with 2.6

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>


diff -Nru linux-2.4.29/arch/ia64/lib/swiotlb.c linux-2.4.29-swiotlb/arch/ia64/lib/swiotlb.c
--- linux-2.4.29/arch/ia64/lib/swiotlb.c	2003-08-25 04:44:39.000000000 -0700
+++ linux-2.4.29-swiotlb/arch/ia64/lib/swiotlb.c	2005-03-22 10:51:21.968565920 -0800
@@ -50,13 +50,13 @@
  * Used to do a quick range check in swiotlb_unmap_single and swiotlb_sync_single, to see
  * if the memory was in fact allocated by this API.
  */
-static char *io_tlb_start, *io_tlb_end;
+char *io_tlb_start, *io_tlb_end;
 
 /*
  * The number of IO TLB blocks (in groups of 64) betweeen io_tlb_start and io_tlb_end.
  * This is command line adjustable via setup_io_tlb_npages.
  */
-static unsigned long io_tlb_nslabs = 1024;
+static unsigned long io_tlb_nslabs = 32768;
 
 /*
  * This is a free list describing the number of free entries available from each index
diff -Nru linux-2.4.29/arch/x86_64/kernel/pci-gart.c linux-2.4.29-swiotlb/arch/x86_64/kernel/pci-gart.c
--- linux-2.4.29/arch/x86_64/kernel/pci-gart.c	2004-08-07 16:26:04.000000000 -0700
+++ linux-2.4.29-swiotlb/arch/x86_64/kernel/pci-gart.c	2005-03-22 10:38:45.211610464 -0800
@@ -155,7 +155,7 @@
 	int i;
 	unsigned long iommu_page;
 
-	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || no_iommu)
+	if (hwdev == NULL || hwdev->dma_mask < 0xffffffff || (no_iommu && !swiotlb))
 		gfp |= GFP_DMA;
 
 	/* 
@@ -174,6 +174,22 @@
 		if (force_mmu && !(gfp & GFP_DMA)) 
 			mmu = 1;
 		if (no_iommu) { 
+#ifdef CONFIG_SWIOTLB
+			if (swiotlb && high && hwdev) {
+				unsigned long dma_mask = 0;
+				if (hwdev->dma_mask == ~0UL) {
+					hwdev->dma_mask = 0xffffffff;
+					dma_mask = ~0UL;
+				}
+				*dma_handle = swiotlb_map_single(hwdev, memory, size,
+						   		 PCI_DMA_FROMDEVICE);
+				if (dma_mask)
+					hwdev->dma_mask = dma_mask;
+				memset(phys_to_virt(*dma_handle), 0, size); 
+				free_pages((unsigned long)memory, get_order(size));
+				return phys_to_virt(*dma_handle);
+			}
+#endif
 			if (high) goto error;
 			mmu = 0; 
 		} 	
@@ -218,8 +234,16 @@
 			 void *vaddr, dma_addr_t bus)
 {
 	unsigned long iommu_page;
-
+ 	extern  char *io_tlb_start, *io_tlb_end;
+ 
 	size = round_up(size, PAGE_SIZE); 
+#ifdef CONFIG_SWIOTLB
+ 	if (swiotlb && vaddr >= (void *)io_tlb_start &&
+ 	    vaddr < (void *)io_tlb_end) {
+ 		swiotlb_unmap_single (hwdev, bus, size, PCI_DMA_TODEVICE);
+ 		return;
+ 	}
+#endif
 	if (bus >= iommu_bus_base && bus < iommu_bus_base + iommu_size) { 
 		unsigned pages = size >> PAGE_SHIFT;
 		iommu_page = (bus - iommu_bus_base) >> PAGE_SHIFT;
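The two swiotlb pieces of this patch, temporarily presenting a fully 64-bit device as 32-bit while bouncing the allocation, and recognizing bounce-pool addresses on the free path, can be sketched in user space. Everything here is hypothetical: `struct fake_pci_dev` stands in for struct pci_dev, `fake_bounce_map()` stands in for swiotlb_map_single() and only records the mask it was called with, and `in_bounce_pool()` mirrors the io_tlb_start/io_tlb_end range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev; only the mask matters here. */
struct fake_pci_dev {
	uint64_t dma_mask;
};

/* Mask seen by the last fake_bounce_map() call. */
static uint64_t last_mask_seen;

/* Hypothetical stand-in for swiotlb_map_single(): records the mask and
 * returns a fixed low bus address. */
static uint64_t fake_bounce_map(struct fake_pci_dev *dev)
{
	last_mask_seen = dev->dma_mask;
	return 0x1000000ULL;
}

/* Clamp/restore pattern from the patch: a fully 64-bit device is shown
 * to the bounce mapper as 32-bit, then its own mask is put back. */
static uint64_t map_with_sac_clamp(struct fake_pci_dev *dev)
{
	uint64_t saved = 0;	/* 0 means "nothing to restore" */
	uint64_t handle;

	assert(dev != NULL);
	if (dev->dma_mask == ~0ULL) {
		saved = dev->dma_mask;
		dev->dma_mask = 0xffffffffULL;
	}
	handle = fake_bounce_map(dev);
	if (saved)
		dev->dma_mask = saved;
	return handle;
}

/* Free-path check: was vaddr handed out from the pool [start, end)?
 * In portable C this comparison is only defined when all three
 * pointers refer to the same array. */
static int in_bounce_pool(const char *vaddr, const char *start,
			  const char *end)
{
	return vaddr >= start && vaddr < end;
}
```

Using zero as the "nothing to restore" marker mirrors the patch; it works because the only mask ever saved is the all-ones value.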

