linux-arm-kernel.lists.infradead.org archive mirror
* ixp4xx dmabounce
@ 2009-09-17 21:02 Brian Walsh
  2009-09-17 21:53 ` Russell King - ARM Linux
  2009-09-22 23:42 ` Mikael Pettersson
  0 siblings, 2 replies; 14+ messages in thread
From: Brian Walsh @ 2009-09-17 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

I am running into an issue with high speed USB devices running on an ixp4xx
processor.  It looks like a dmabounce related problem.  I am running on an
ixp425 with 128 MB of RAM.  I am attaching a Sierra Wireless USB phone module
which attaches as a high speed USB ethernet device.  I am running the 2.6.31
kernel version.

I found a patch which seems to be trying to address the same issue I am
having but it left the USB device unable to attach.

http://user.it.uu.se/~mikpe/linux/patches/2.6/patch-ixp4xx-disable-dmabounce-2.6.30-rc2

The relevant kernel messages with and without the patch applied are below.
The warning stack dump continues to stream out, leaving the system basically
unresponsive until the data transfer completes.  With that stack dump warning
removed from dma-mapping.c, the device is usable but operates at transfer
rates below those it achieves when attached to the full speed controller.

Any ideas or suggestions?

Thanks
Brian

---------------------------------------
Without dmabounce disable patch:
---------------------------------------

usb 6-3: new high speed USB device using ehci_hcd and address 2
usb 6-3: config 1 has an invalid interface number: 7 but max is 4
usb 6-3: config 1 has no interface number 2
usb 6-3: configuration #1 chosen from 1 choice
usb0: register 'sierra_net' at usb-0000:00:0f.2-3, Sierra Wireless
USB-Ethernet Modem, 56:1f:34:de:01:07
usbcore: registered new interface driver sierra_net
udev: renamed network interface usb0 to phone0
usbcore: registered new interface driver usbserial
USB Serial support registered for generic
usbcore: registered new interface driver usbserial_generic
usbserial: USB Serial Driver core
USB Serial support registered for Sierra USB modem
sierra 6-3:1.0: Sierra USB modem converter detected
usb 6-3: Sierra USB modem converter now attached to ttyUSB0
sierra 6-3:1.1: Sierra USB modem converter detected
usb 6-3: Sierra USB modem converter now attached to ttyUSB1
sierra 6-3:1.3: Sierra USB modem converter detected
usb 6-3: Sierra USB modem converter now attached to ttyUSB2
sierra 6-3:1.4: Sierra USB modem converter detected
usb 6-3: Sierra USB modem converter now attached to ttyUSB3
usbcore: registered new interface driver sierra
sierra: v.1.7.12:USB Driver for Sierra Wireless USB modems



------------[ cut here ]------------
WARNING: at arch/arm/mm/dma-mapping.c:369 dma_free_coherent+0x2c/0x218()
Modules linked in: af_packet ipv6 sierra usbserial pxa25x_udc
ixp4xx_eth ixp4xx_npe ixp4xx_qmgr libphy ehci_hcd sierra_net usbnetmii
ohci_hcd usb_storage usbcore sg scsi_mod
Backtrace:
[<c0021018>] (dump_backtrace+0x0/0x100) from [<c0021130>] (dump_stack+0x18/0x1c)
 r7:00000000 r6:c0022c68 r5:c0216a00 r4:00000171
[<c0021118>] (dump_stack+0x0/0x1c) from [<c002ee28>]
(warn_slowpath_common+0x4c/0x64)
[<c002eddc>] (warn_slowpath_common+0x0/0x64) from [<c002ee58>]
(warn_slowpath_null+0x18/0x1c)
 r7:00002000 r6:00002000 r5:ffc43000 r4:c7812c58
[<c002ee40>] (warn_slowpath_null+0x0/0x1c) from [<c0022c68>]
(dma_free_coherent+0x2c/0x218)
[<c0022c3c>] (dma_free_coherent+0x0/0x218) from [<c0025e18>]
(dma_unmap_single+0xec/0x110)
[<c0025d2c>] (dma_unmap_single+0x0/0x110) from [<bf06cb64>]
(unmap_urb_for_dma+0xd8/0x10c [usbcore])
 r8:c68f7e00 r7:ffc0d120 r6:c7a04a00 r5:c7a04a00 r4:c68f7d80
[<bf06ca8c>] (unmap_urb_for_dma+0x0/0x10c [usbcore]) from [<bf06cc04>]
(usb_hcd_giveback_urb+0x6c/0xf4 [usbcore])
 r5:00000000 r4:c68f7d80
[<bf06cb98>] (usb_hcd_giveback_urb+0x0/0xf4 [usbcore]) from
[<bf0c700c>] (ehci_urb_done+0xa0/0xa4 [ehci_hcd])
 r6:c68f7d80 r5:00000000 r4:c7a04a00
[<bf0c6f6c>] (ehci_urb_done+0x0/0xa4 [ehci_hcd]) from [<bf0c88ac>]
(qh_completions+0xb8/0x534 [ehci_hcd])
 r6:c7a04ad8 r5:ffc0c300 r4:00000000
[<bf0c87f4>] (qh_completions+0x0/0x534 [ehci_hcd]) from [<bf0c9ee4>]
(ehci_work+0x12c/0xa84 [ehci_hcd])
[<bf0c9db8>] (ehci_work+0x0/0xa84 [ehci_hcd]) from [<bf0cb66c>]
(ehci_irq+0x32c/0x354 [ehci_hcd])
[<bf0cb340>] (ehci_irq+0x0/0x354 [ehci_hcd]) from [<bf06dca8>]
(usb_hcd_irq+0x4c/0x9c [usbcore])
[<bf06dc5c>] (usb_hcd_irq+0x0/0x9c [usbcore]) from [<c005d938>]
(handle_IRQ_event+0x7c/0x188)
 r5:c7bc66c0 r4:c0252234
[<c005d8bc>] (handle_IRQ_event+0x0/0x188) from [<c005f2f0>]
(handle_level_irq+0x94/0xe4)
[<c005f25c>] (handle_level_irq+0x0/0xe4) from [<c001d06c>] (_text+0x6c/0x84)
 r5:c0254fac r4:00000017
[<c001d000>] (_text+0x0/0x84) from [<c01b5bc4>] (__irq_svc+0x24/0x60)
Exception stack(0xc024bf58 to 0xc024bfa0)
bf40:                                                       c0215a88 000000ab
bf60: 00000000 60000013 c024a000 c025d99c c001be08 c024e23c 0001a23c 690541c1
bf80: 0001a138 c024bfac c024bfb0 c024bfa0 c001e514 c001e4ac 60000013 ffffffff
 r6:00800000 r5:0000001f r4:ffffffff
[<c001e484>] (default_idle+0x0/0x2c) from [<c001e514>] (cpu_idle+0x64/0x9c)
[<c001e4b0>] (cpu_idle+0x0/0x9c) from [<c01af5b4>] (rest_init+0x58/0x6c)
 r4:c028dbf4
[<c01af55c>] (rest_init+0x0/0x6c) from [<c0008930>] (start_kernel+0x268/0x2d4)
[<c00086c8>] (start_kernel+0x0/0x2d4) from [<00008034>] (0x8034)
 r6:c001be04 r5:c025d9f8 r4:000039fd
---[ end trace fe6886bfb3a3248e ]---



---------------------------------------
With dmabounce disable patch:
---------------------------------------

usb 6-3: new high speed USB device using ehci_hcd and address 2
ehci_hcd 0000:00:0f.2: fatal error
ehci_hcd 0000:00:0f.2: HC died; cleaning up
hub 6-0:1.0: cannot reset port 3 (err = -19)
hub 6-0:1.0: cannot disable port 3 (err = -19)
hub 6-0:1.0: cannot reset port 3 (err = -19)
hub 6-0:1.0: cannot disable port 3 (err = -19)
hub 6-0:1.0: cannot reset port 3 (err = -19)
hub 6-0:1.0: cannot disable port 3 (err = -19)
hub 6-0:1.0: cannot reset port 3 (err = -19)
hub 6-0:1.0: cannot disable port 3 (err = -19)
hub 6-0:1.0: cannot disable port 3 (err = -19)
usb 3-2: new full speed USB device using ohci_hcd and address 2
usb 3-2: device descriptor read/64, error -62
usb 3-2: device descriptor read/64, error -62
usb 3-2: new full speed USB device using ohci_hcd and address 3
usb 3-2: device descriptor read/64, error -62
usb 3-2: device descriptor read/64, error -62
usb 3-2: new full speed USB device using ohci_hcd and address 4
usb 3-2: device not accepting address 4, error -62
usb 3-2: new full speed USB device using ohci_hcd and address 5
usb 3-2: device not accepting address 5, error -62
hub 3-0:1.0: unable to enumerate USB device on port 2


* ixp4xx dmabounce
  2009-09-17 21:02 ixp4xx dmabounce Brian Walsh
@ 2009-09-17 21:53 ` Russell King - ARM Linux
  2009-09-22 22:02   ` Brian Walsh
  2009-09-22 23:42 ` Mikael Pettersson
  1 sibling, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2009-09-17 21:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 17, 2009 at 05:02:59PM -0400, Brian Walsh wrote:
> Any ideas or suggestions?

It's caused because we don't allow dma_free_coherent() to be called from
IRQ context (which is reasonable because it needs to flush TLBs across
all processors on SMP systems.)

Unfortunately, with the DMA bounce code enabled, this function does get
called from IRQ context, and so tends to spit out these warnings.

I did have a patch which made dma_free_coherent() lazy, but it was
reported that they suffered disk corruption (though it was never
conclusive whether it was caused by the patch or not.)  Here's an
updated version of that patch.

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index db7b3e3..2d1dcb0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -16,6 +16,7 @@
 #include <linux/list.h>
 #include <linux/init.h>
 #include <linux/device.h>
+#include <linux/workqueue.h>
 #include <linux/dma-mapping.h>
 
 #include <asm/memory.h>
@@ -68,7 +69,6 @@
  * These are the page tables (2MB each) covering uncached, DMA consistent allocations
  */
 static pte_t *consistent_pte[NUM_CONSISTENT_PTES];
-static DEFINE_SPINLOCK(consistent_lock);
 
 /*
  * VM region handling support.
@@ -101,20 +101,24 @@ static DEFINE_SPINLOCK(consistent_lock);
  */
 struct arm_vm_region {
 	struct list_head	vm_list;
+	struct list_head	vm_gc;
 	unsigned long		vm_start;
 	unsigned long		vm_end;
 	struct page		*vm_pages;
 	int			vm_active;
 };
 
-static struct arm_vm_region consistent_head = {
-	.vm_list	= LIST_HEAD_INIT(consistent_head.vm_list),
-	.vm_start	= CONSISTENT_BASE,
-	.vm_end		= CONSISTENT_END,
-};
+struct arm_vm_region_head {
+	spinlock_t		vm_lock;
+	struct list_head	vm_list;
+	struct list_head	vm_gc;
+	unsigned long		vm_start;
+	unsigned long		vm_end;
+	struct work_struct	work;
+};	
 
 static struct arm_vm_region *
-arm_vm_region_alloc(struct arm_vm_region *head, size_t size, gfp_t gfp)
+arm_vm_region_alloc(struct arm_vm_region_head *head, size_t size, gfp_t gfp)
 {
 	unsigned long addr = head->vm_start, end = head->vm_end - size;
 	unsigned long flags;
@@ -124,7 +128,7 @@ arm_vm_region_alloc(struct arm_vm_region *head, size_t size, gfp_t gfp)
 	if (!new)
 		goto out;
 
-	spin_lock_irqsave(&consistent_lock, flags);
+	spin_lock_irqsave(&head->vm_lock, flags);
 
 	list_for_each_entry(c, &head->vm_list, vm_list) {
 		if ((addr + size) < addr)
@@ -145,17 +149,17 @@ arm_vm_region_alloc(struct arm_vm_region *head, size_t size, gfp_t gfp)
 	new->vm_end = addr + size;
 	new->vm_active = 1;
 
-	spin_unlock_irqrestore(&consistent_lock, flags);
+	spin_unlock_irqrestore(&head->vm_lock, flags);
 	return new;
 
  nospc:
-	spin_unlock_irqrestore(&consistent_lock, flags);
+	spin_unlock_irqrestore(&head->vm_lock, flags);
 	kfree(new);
  out:
 	return NULL;
 }
 
-static struct arm_vm_region *arm_vm_region_find(struct arm_vm_region *head, unsigned long addr)
+static struct arm_vm_region *arm_vm_region_find(struct arm_vm_region_head *head, unsigned long addr)
 {
 	struct arm_vm_region *c;
 	
@@ -168,10 +172,114 @@ static struct arm_vm_region *arm_vm_region_find(struct arm_vm_region *head, unsigned long ad
 	return c;
 }
 
+static void __dma_free(struct arm_vm_region *region);
+
+/*
+ * GC the region.  Walk the gc list, and free each entry.  This is done
+ * in process context, so __dma_free() can sleep as required.  Only
+ * after __dma_free() has completed do we take it off the active vm_list,
+ * at which point the region becomes available for further allocations.
+ */
+static void arm_vm_region_gc(struct work_struct *work)
+{
+	struct arm_vm_region_head *head = container_of(work, struct arm_vm_region_head, work);
+	unsigned long flags;
+	struct list_head h;
+	struct arm_vm_region *region, *tmp;
+
+	spin_lock_irqsave(&head->vm_lock, flags);
+	list_replace_init(&head->vm_gc, &h);
+	spin_unlock_irqrestore(&head->vm_lock, flags);
+
+	list_for_each_entry_safe(region, tmp, &h, vm_gc) {
+		__dma_free(region);
+
+		flush_tlb_kernel_range(region->vm_start, region->vm_end);
+
+		spin_lock_irqsave(&head->vm_lock, flags);
+		list_del(&region->vm_list);
+		spin_unlock_irqrestore(&head->vm_lock, flags);
+
+		kfree(region);
+	}
+}
+
+/*
+ * Mark the region not in use, and place it on to the gc list.
+ * Note: we leave the region on the active vm_list until the
+ * region is actually free, so we avoid reallocating the region.
+ */
+static struct arm_vm_region *arm_vm_region_free(struct arm_vm_region_head *head, unsigned long addr)
+{
+	unsigned long flags;
+	struct arm_vm_region *c;
+
+	spin_lock_irqsave(&head->vm_lock, flags);
+	c = arm_vm_region_find(head, addr);
+	if (c) {
+		c->vm_active = 0;
+		list_add(&c->vm_gc, &head->vm_gc);
+		schedule_work(&head->work);
+	}
+	spin_unlock_irqrestore(&head->vm_lock, flags);
+
+	return c;
+}
+
+static struct arm_vm_region_head consistent_head = {
+	.vm_lock	= __SPIN_LOCK_UNLOCKED(&consistent_head.vm_lock),
+	.vm_list	= LIST_HEAD_INIT(consistent_head.vm_list),
+	.vm_gc		= LIST_HEAD_INIT(consistent_head.vm_gc),
+	.vm_start	= CONSISTENT_BASE,
+	.vm_end		= CONSISTENT_END,
+	.work		= __WORK_INITIALIZER(consistent_head.work, arm_vm_region_gc),
+};
+
 #ifdef CONFIG_HUGETLB_PAGE
 #error ARM Coherent DMA allocator does not (yet) support huge TLB
 #endif
 
+static void __dma_free(struct arm_vm_region *region)
+{
+	unsigned long addr = region->vm_start;
+	pte_t *ptep;
+	int idx = CONSISTENT_PTE_INDEX(addr);
+	u32 off = CONSISTENT_OFFSET(addr) & (PTRS_PER_PTE-1);
+
+	ptep = consistent_pte[idx] + off;
+	do {
+		pte_t pte = ptep_get_and_clear(&init_mm, addr, ptep);
+		unsigned long pfn;
+
+		ptep++;
+		addr += PAGE_SIZE;
+		off++;
+		if (off >= PTRS_PER_PTE) {
+			off = 0;
+			ptep = consistent_pte[++idx];
+		}
+
+		if (!pte_none(pte) && pte_present(pte)) {
+			pfn = pte_pfn(pte);
+
+			if (pfn_valid(pfn)) {
+				struct page *page = pfn_to_page(pfn);
+
+				/*
+				 * x86 does not mark the pages reserved...
+				 */
+				ClearPageReserved(page);
+
+				__free_page(page);
+				continue;
+			}
+		}
+
+		printk(KERN_CRIT "%s: bad page in kernel page table\n",
+		       __func__);
+	} while (addr != region->vm_end);
+}
+
 static void *
 __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	    pgprot_t prot)
@@ -354,9 +462,9 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
 
 	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
 
-	spin_lock_irqsave(&consistent_lock, flags);
+	spin_lock_irqsave(&consistent_head.vm_lock, flags);
 	c = arm_vm_region_find(&consistent_head, (unsigned long)cpu_addr);
-	spin_unlock_irqrestore(&consistent_lock, flags);
+	spin_unlock_irqrestore(&consistent_head.vm_lock, flags);
 
 	if (c) {
 		unsigned long off = vma->vm_pgoff;
@@ -400,12 +508,6 @@ EXPORT_SYMBOL(dma_mmap_writecombine);
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)
 {
 	struct arm_vm_region *c;
-	unsigned long flags, addr;
-	pte_t *ptep;
-	int idx;
-	u32 off;
-
-	WARN_ON(irqs_disabled());
 
 	if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
 		return;
@@ -415,73 +517,20 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 		return;
 	}
 
-	size = PAGE_ALIGN(size);
-
-	spin_lock_irqsave(&consistent_lock, flags);
-	c = arm_vm_region_find(&consistent_head, (unsigned long)cpu_addr);
-	if (!c)
-		goto no_area;
-
-	c->vm_active = 0;
-	spin_unlock_irqrestore(&consistent_lock, flags);
+	c = arm_vm_region_free(&consistent_head, (unsigned long)cpu_addr);
+	if (!c) {
+		printk(KERN_ERR "%s: trying to free invalid coherent area: %p\n",
+		       __func__, cpu_addr);
+		dump_stack();
+		return;
+	}
 
+	size = PAGE_ALIGN(size);
 	if ((c->vm_end - c->vm_start) != size) {
 		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
 		       __func__, c->vm_end - c->vm_start, size);
 		dump_stack();
-		size = c->vm_end - c->vm_start;
 	}
-
-	idx = CONSISTENT_PTE_INDEX(c->vm_start);
-	off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
-	ptep = consistent_pte[idx] + off;
-	addr = c->vm_start;
-	do {
-		pte_t pte = ptep_get_and_clear(&init_mm, addr, ptep);
-		unsigned long pfn;
-
-		ptep++;
-		addr += PAGE_SIZE;
-		off++;
-		if (off >= PTRS_PER_PTE) {
-			off = 0;
-			ptep = consistent_pte[++idx];
-		}
-
-		if (!pte_none(pte) && pte_present(pte)) {
-			pfn = pte_pfn(pte);
-
-			if (pfn_valid(pfn)) {
-				struct page *page = pfn_to_page(pfn);
-
-				/*
-				 * x86 does not mark the pages reserved...
-				 */
-				ClearPageReserved(page);
-
-				__free_page(page);
-				continue;
-			}
-		}
-
-		printk(KERN_CRIT "%s: bad page in kernel page table\n",
-		       __func__);
-	} while (size -= PAGE_SIZE);
-
-	flush_tlb_kernel_range(c->vm_start, c->vm_end);
-
-	spin_lock_irqsave(&consistent_lock, flags);
-	list_del(&c->vm_list);
-	spin_unlock_irqrestore(&consistent_lock, flags);
-
-	kfree(c);
-	return;
-
- no_area:
-	spin_unlock_irqrestore(&consistent_lock, flags);
-	printk(KERN_ERR "%s: trying to free invalid coherent area: %p\n",
-	       __func__, cpu_addr);
-	dump_stack();
 }
 #else	/* !CONFIG_MMU */
 void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle)


* ixp4xx dmabounce
  2009-09-17 21:53 ` Russell King - ARM Linux
@ 2009-09-22 22:02   ` Brian Walsh
  2009-09-27 16:55     ` Russell King - ARM Linux
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Walsh @ 2009-09-22 22:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 17, 2009 at 5:53 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
>
> On Thu, Sep 17, 2009 at 05:02:59PM -0400, Brian Walsh wrote:
> > Any ideas or suggestions?
>
> It's caused because we don't allow dma_free_coherent() to be called from
> IRQ context (which is reasonable because it needs to flush TLBs across
> all processors on SMP systems.)
>

I am not running on an SMP system so would this even be a problem?

>
> Unfortunately, with the DMA bounce code enabled, this function does get
> called from IRQ context, and so tends to spit out these warnings.
>
> I did have a patch which made dma_free_coherent() lazy, but it was
> reported that they suffered disk corruption (though it was never
> conclusive whether it was caused by the patch or not.)  Here's an
> updated version of that patch.
>

I did not see any data rate improvement using this patch over just commenting
out the warning stack dump.  I am still seeing about half the data transfer rate
using the high speed ehci USB controller compared with the full speed ohci USB
controller.


* ixp4xx dmabounce
  2009-09-17 21:02 ixp4xx dmabounce Brian Walsh
  2009-09-17 21:53 ` Russell King - ARM Linux
@ 2009-09-22 23:42 ` Mikael Pettersson
  2009-09-23 14:40   ` Krzysztof Halasa
  1 sibling, 1 reply; 14+ messages in thread
From: Mikael Pettersson @ 2009-09-22 23:42 UTC (permalink / raw)
  To: linux-arm-kernel

Brian Walsh writes:
 > I am running into an issue with high speed USB devices running on an ixp4xx
 > processor.  It looks like a dmabounce related problem.  I am running on an
 > ixp425 with 128 MB of RAM.  I am attaching a Sierra Wireless USB phone module
 > which attaches as a high speed USB ethernet device.  I am running the 2.6.31
 > kernel version.
 > 
 > I found a patch which seems to be trying to address the same issue I am
 > having but it left the USB device unable to attach.
 > 
 > http://user.it.uu.se/~mikpe/linux/patches/2.6/patch-ixp4xx-disable-dmabounce-2.6.30-rc2
 > 
 > The relevant kernel messages with and without the patch applied are below.
 > The warning stack dump continues to stream out, leaving the system basically
 > unresponsive until the data transfer completes.  With that stack dump warning
 > removed from dma-mapping.c, the device is usable but operates at transfer
 > rates below those it achieves when attached to the full speed controller.
 > 
 > Any ideas or suggestions?

I strongly suspect that something on the USB or networking side
is allocating I/O buffers without observing the correct DMA APIs.
In particular, DMA buffers for a PCI device must respect that
device's dma_mask.

In my case I tested a libata-driven PCI ATA controller, and such
devices work because:
(a) they allocate buffers via the proper DMA APIs, and
(b) the block layer takes care of bouncing when necessary.

I'd add debugging code to trace what I/O buffers are being used,
and if they are invalid (above the 64MB limit), more code to see
who allocated them.

I think Krzysztof Halasa mentioned running ixp4xx devices with 128MB
RAM and a kernel hacked so kernel-private allocations would always be
served from memory below 64MB. I think he mentioned doing that because
of networking components that would ignore PCI DMA mask constraints.


* ixp4xx dmabounce
  2009-09-22 23:42 ` Mikael Pettersson
@ 2009-09-23 14:40   ` Krzysztof Halasa
  2009-09-24 16:02     ` Brian Walsh
  0 siblings, 1 reply; 14+ messages in thread
From: Krzysztof Halasa @ 2009-09-23 14:40 UTC (permalink / raw)
  To: linux-arm-kernel

Mikael Pettersson <mikpe@it.uu.se> writes:

> I strongly suspect that something on the USB or networking side
> is allocating I/O buffers without observing the correct DMA APIs.

At least the network stack allocates buffers ignoring the DMA masks.
The buffers may be allocated by one device (driver) and passed to
another device. The only plausible way to fix it is IMHO limiting all
skb allocations to the common mask (drivers would be free to either
handle or drop skbs outside of their mask).

This is relatively easy to implement and I'm going to try it, when time
permits.

> I think Krzysztof Halasa mentioned running ixp4xx devices with 128MB
> RAM and a kernel hacked so kernel-private allocations would always be
> served from memory below 64MB. I think he mentioned doing that because
> of networking components that would ignore PCI DMA mask constraints.

Right. This works fine for network buffers because they aren't that
large. The current patch is suboptimal, though.
-- 
Krzysztof Halasa


* ixp4xx dmabounce
  2009-09-23 14:40   ` Krzysztof Halasa
@ 2009-09-24 16:02     ` Brian Walsh
  2009-09-24 16:50       ` Mikael Pettersson
  2009-09-24 16:51       ` Krzysztof Halasa
  0 siblings, 2 replies; 14+ messages in thread
From: Brian Walsh @ 2009-09-24 16:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Sep 23, 2009 at 10:40 AM, Krzysztof Halasa <khc@pm.waw.pl> wrote:
> Mikael Pettersson <mikpe@it.uu.se> writes:
>
>> I strongly suspect that something on the USB or networking side
>> is allocating I/O buffers without observing the correct DMA APIs.
>
> At least the network stack allocates buffers ignoring the DMA masks.
> The buffers may be allocated by one device (driver) and passed to
> another device. The only plausible way to fix it is IMHO limiting all
> skb allocations to the common mask (drivers would be free to either
> handle or drop skbs outside of their mask).
>
> This is relatively easy to implement and I'm going to try it, when time
> permits.
>
>> I think Krzysztof Halasa mentioned running ixp4xx devices with 128MB
>> RAM and a kernel hacked so kernel-private allocations would always be
>> served from memory below 64MB. I think he mentioned doing that because
>> of networking components that would ignore PCI DMA mask constraints.
>
> Right. This works fine for network buffers because they aren't that
> large. The current patch is suboptimal, though.
> --
> Krzysztof Halasa
>

I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
about 6.3 Mbps IP data throughput when only using the ohci controller and
about 3.6 Mbps when the device is attached to the ehci controller.  This
device works fine when running the same testing attached to an x86
configured machine and gets about 18 Mbps IP data throughput.


* ixp4xx dmabounce
  2009-09-24 16:02     ` Brian Walsh
@ 2009-09-24 16:50       ` Mikael Pettersson
  2009-09-24 22:15         ` Brian Walsh
  2009-09-24 16:51       ` Krzysztof Halasa
  1 sibling, 1 reply; 14+ messages in thread
From: Mikael Pettersson @ 2009-09-24 16:50 UTC (permalink / raw)
  To: linux-arm-kernel

Brian Walsh writes:
 > On Wed, Sep 23, 2009 at 10:40 AM, Krzysztof Halasa <khc@pm.waw.pl> wrote:
 > > Mikael Pettersson <mikpe@it.uu.se> writes:
 > >
 > >> I strongly suspect that something on the USB or networking side
 > >> is allocating I/O buffers without observing the correct DMA APIs.
 > >
 > > At least the network stack allocates buffers ignoring the DMA masks.
 > > The buffers may be allocated by one device (driver) and passed to
 > > another device. The only plausible way to fix it is IMHO limiting all
 > > skb allocations to the common mask (drivers would be free to either
 > > handle or drop skbs outside of their mask).
 > >
 > > This is relatively easy to implement and I'm going to try it, when time
 > > permits.
 > >
 > >> I think Krzysztof Halasa mentioned running ixp4xx devices with 128MB
 > >> RAM and a kernel hacked so kernel-private allocations would always be
 > >> served from memory below 64MB. I think he mentioned doing that because
 > >> of networking components that would ignore PCI DMA mask constraints.
 > >
 > > Right. This works fine for network buffers because they aren't that
 > > large. The current patch is suboptimal, though.
 > > --
 > > Krzysztof Halasa
 > >
 > 
 > I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
 > about 6.3 Mbps IP data throughput when only using the ohci controller and
 > about 3.6 Mbps when the device is attached to the ehci controller.  This
 > device works fine when running the same testing attached to an x86
 > configured machine and gets about 18 Mbps IP data throughput.

If your application can operate in 64MB RAM, you may want to try
a kernel that includes only my ixp4xx disable dmabounce patch,
and boot it with mem=64M. (Look in the kernel boot log and verify
that it only sees 64M of RAM.)

If performance increases, then your performance loss is due to bounces.


* ixp4xx dmabounce
  2009-09-24 16:02     ` Brian Walsh
  2009-09-24 16:50       ` Mikael Pettersson
@ 2009-09-24 16:51       ` Krzysztof Halasa
  2009-09-24 16:57         ` Brian Walsh
  1 sibling, 1 reply; 14+ messages in thread
From: Krzysztof Halasa @ 2009-09-24 16:51 UTC (permalink / raw)
  To: linux-arm-kernel

Brian Walsh <brian@walsh.ws> writes:

> I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
> about 6.3 Mbps IP data throughput when only using the ohci controller and
> about 3.6 Mbps when the device is attached to the ehci controller.  This
> device works fine when running the same testing attached to an x86
> configured machine and gets about 18 Mbps IP data throughput.

I assume you've enabled CONFIG_ZONE_DMA_ALL_KERNEL :-)

Either the bouncing isn't the problem in this case, or the allocations
are GFP_USER.
-- 
Krzysztof Halasa


* ixp4xx dmabounce
  2009-09-24 16:51       ` Krzysztof Halasa
@ 2009-09-24 16:57         ` Brian Walsh
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Walsh @ 2009-09-24 16:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 24, 2009 at 12:51 PM, Krzysztof Halasa <khc@pm.waw.pl> wrote:
> Brian Walsh <brian@walsh.ws> writes:
>
>> I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
>> about 6.3 Mbps IP data throughput when only using the ohci controller and
>> about 3.6 Mbps when the device is attached to the ehci controller.  This
>> device works fine when running the same testing attached to an x86
>> configured machine and gets about 18 Mbps IP data throughput.
>
> I assume you've enabled CONFIG_ZONE_DMA_ALL_KERNEL :-)
>
> Either the bouncing isn't the problem in this case, or the allocations
> are GFP_USER.
> --
> Krzysztof Halasa
>

Ha.  Yes, I did enable that option.

I am doing my testing just by using wget to pull a file over the connection.


* ixp4xx dmabounce
  2009-09-24 16:50       ` Mikael Pettersson
@ 2009-09-24 22:15         ` Brian Walsh
  2009-09-24 23:34           ` Mikael Pettersson
  0 siblings, 1 reply; 14+ messages in thread
From: Brian Walsh @ 2009-09-24 22:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 24, 2009 at 12:50 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Brian Walsh writes:
>  > On Wed, Sep 23, 2009 at 10:40 AM, Krzysztof Halasa <khc@pm.waw.pl> wrote:
>  > > Mikael Pettersson <mikpe@it.uu.se> writes:
>  > >
>  > >> I strongly suspect that something on the USB or networking side
>  > >> is allocating I/O buffers without observing the correct DMA APIs.
>  > >
>  > > At least the network stack allocates buffers ignoring the DMA masks.
>  > > The buffers may be allocated by one device (driver) and passed to
>  > > another device. The only plausible way to fix it is IMHO limiting all
>  > > skb allocations to the common mask (drivers would be free to either
>  > > handle or drop skbs outside of their mask).
>  > >
>  > > This is relatively easy to implement and I'm going to try it, when time
>  > > permits.
>  > >
>  > >> I think Krzysztof Halasa mentioned running ixp4xx devices with 128MB
>  > >> RAM and a kernel hacked so kernel-private allocations would always be
>  > >> served from memory below 64MB. I think he mentioned doing that because
>  > >> of networking components that would ignore PCI DMA mask constraints.
>  > >
>  > > Right. This works fine for network buffers because they aren't that
>  > > large. The current patch is suboptimal, though.
>  > > --
>  > > Krzysztof Halasa
>  > >
>  >
>  > I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
>  > about 6.3 Mbps IP data throughput when only using the ohci controller and
>  > about 3.6 Mbps when the device is attached to the ehci controller.  This
>  > device works fine when running the same testing attached to an x86
>  > configured machine and gets about 18 Mbps IP data throughput.
>
> If your application can operate in 64MB RAM, you may want to try
> a kernel that includes only my ixp4xx disable dmabounce patch,
> and boot it with mem=64M. (Look in the kernel boot log and verify
> that it only sees 64M of RAM.)
>
> If performance increases, then your performance loss is due to bounces.
>

Mikael

I used your patch to disable legacy bounce, disabled support for > 64MB RAM,
and used the mem=64M kernel option.  There was no change in the data
throughput.

I am not sure where this leaves me.

Brian


* ixp4xx dmabounce
  2009-09-24 22:15         ` Brian Walsh
@ 2009-09-24 23:34           ` Mikael Pettersson
  2009-09-24 23:43             ` Brian Walsh
  0 siblings, 1 reply; 14+ messages in thread
From: Mikael Pettersson @ 2009-09-24 23:34 UTC (permalink / raw)
  To: linux-arm-kernel

Brian Walsh wrote:
> >  > I tried Krzysztof's patch and it had no noticeable effect.  I am still getting
> >  > about 6.3 Mbps IP data throughput when only using the ohci controller and
> >  > about 3.6 Mbps when the device is attached to the ehci controller.  This
> >  > device works fine when running the same testing attached to an x86
> >  > configured machine and gets about 18 Mbps IP data throughput.
> >
> > If your application can operate in 64MB RAM, you may want to try
> > a kernel that includes only my ixp4xx disable dmabounce patch,
> > and boot it with mem=64M. (Look in the kernel boot log and verify
> > that it only sees 64M of RAM.)
> >
> > If performance increases, then your performance loss is due to bounces.
> >
> 
> Mikael
> 
> I used your patch to disable legacy bounce, disabled support for > 64MB RAM,
> and used the mem=64M kernel option.  There was no change in the data
> throughput.
> 
> I am not sure where this leaves me.

To me it implies that the performance issues are unrelated to
your initial bouncing issues. Since you get better performance
from OHCI I'd have to suspect a hardware or driver issue with
your EHCI controller.


* ixp4xx dmabounce
  2009-09-24 23:34           ` Mikael Pettersson
@ 2009-09-24 23:43             ` Brian Walsh
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Walsh @ 2009-09-24 23:43 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 24, 2009 at 7:34 PM, Mikael Pettersson <mikpe@it.uu.se> wrote:
> Brian Walsh wrote:
>> >  > I tried Krzysztof's patch and it had no noticeable effect.  I am
>> >  > still getting about 6.3 Mbps IP data throughput when only using the
>> >  > ohci controller and about 3.6 Mbps when the device is attached to the
>> >  > ehci controller.  This device works fine when running the same test
>> >  > attached to an x86 configured machine and gets about 18 Mbps IP data
>> >  > throughput.
>> >
>> > If your application can operate in 64MB RAM, you may want to try
>> > a kernel that includes only my ixp4xx disable dmabounce patch,
>> > and boot it with mem=64M. (Look in the kernel boot log and verify
>> > that it only sees 64M of RAM.)
>> >
>> > If performance increases, then your performance loss is due to bounces.
>> >
>>
>> Mikael
>>
>> I used your patch to disable legacy bounce, disabled support for > 64MB RAM,
>> and used the mem=64M kernel option.  There was no change in the data
>> throughput.
>>
>> I am not sure where this leaves me.
>
> To me it implies that the performance issues are unrelated to
> your initial bouncing issues. Since you get better performance
> from OHCI I'd have to suspect a hardware or driver issue with
> your EHCI controller.
>

Yes, that is what I was thinking.  I just tried 3 other PCI USB controller
cards with different chipsets.  All had the same results.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ixp4xx dmabounce
  2009-09-22 22:02   ` Brian Walsh
@ 2009-09-27 16:55     ` Russell King - ARM Linux
  2009-09-29 15:16       ` Brian Walsh
  0 siblings, 1 reply; 14+ messages in thread
From: Russell King - ARM Linux @ 2009-09-27 16:55 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Sep 22, 2009 at 06:02:19PM -0400, Brian Walsh wrote:
> On Thu, Sep 17, 2009 at 5:53 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> >
> > On Thu, Sep 17, 2009 at 05:02:59PM -0400, Brian Walsh wrote:
> > > Any ideas or suggestions?
> >
> > It's caused because we don't allow dma_free_coherent() to be called from
> > IRQ context (which is reasonable because it needs to flush TLBs across
> > all processors on SMP systems.)
> 
> I am not running on an SMP system so would this even be a problem?

Yes - because it's there to ensure that the API is used in a consistent
way.

> > Unfortunately, with the DMA bounce code enabled, this function does get
> > called from IRQ context, and so tends to spit out these warnings.
> >
> > I did have a patch which made dma_free_coherent() lazy, but there was
> > a report of disk corruption (though it was never conclusive whether
> > it was caused by the patch or not.)  Here's an updated version of
> > that patch.
> >
> 
> I did not see any data rate improvement using this patch over just commenting
> out the warning stack dump.  I am still seeing about half the data transfer
> rate using the high speed ehci USB controller compared with the full speed
> ohci USB controller.

I didn't suggest it would improve the data rate - only that it should
fix the stack dump.

I'd really like to get this patch properly tested and confirmed that it
does _not_ actually cause corruption, so that I can get it merged.
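
The general idea of a lazy free can be illustrated with a small userspace
model. This is not the patch discussed above: every name here (deferred_buf,
make_buf, lazy_free, drain_deferred, in_irq_context) is hypothetical, and
plain malloc()/free() stand in for the coherent allocator. Code running in
"IRQ context" only queues the buffer; the expensive release happens later
from "process context".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of a buffer whose release is deferred. */
struct deferred_buf {
    void *cpu_addr;
    struct deferred_buf *next;
};

struct deferred_buf *deferred_head; /* buffers waiting to be released */
bool in_irq_context;                /* stand-in for a real context check */

/* Allocate a tracked buffer; returns NULL on allocation failure. */
struct deferred_buf *make_buf(size_t size)
{
    struct deferred_buf *buf = malloc(sizeof(*buf));
    if (!buf)
        return NULL;
    buf->cpu_addr = malloc(size);
    if (!buf->cpu_addr) {
        free(buf);
        return NULL;
    }
    buf->next = NULL;
    return buf;
}

/* Cheap enough for "IRQ context": only links the buffer onto the list. */
void lazy_free(struct deferred_buf *buf)
{
    buf->next = deferred_head;
    deferred_head = buf;
}

/* Must run in "process context": does the real frees, returns the count. */
size_t drain_deferred(void)
{
    size_t freed = 0;

    assert(!in_irq_context);
    while (deferred_head) {
        struct deferred_buf *buf = deferred_head;
        deferred_head = buf->next;
        free(buf->cpu_addr);
        free(buf);
        freed++;
    }
    return freed;
}
```

A real kernel implementation would also need to protect the list against
concurrent access from interrupt handlers, which this single-threaded model
leaves out.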

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ixp4xx dmabounce
  2009-09-27 16:55     ` Russell King - ARM Linux
@ 2009-09-29 15:16       ` Brian Walsh
  0 siblings, 0 replies; 14+ messages in thread
From: Brian Walsh @ 2009-09-29 15:16 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, Sep 27, 2009 at 12:55 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Sep 22, 2009 at 06:02:19PM -0400, Brian Walsh wrote:
>> On Thu, Sep 17, 2009 at 5:53 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> >
>> > On Thu, Sep 17, 2009 at 05:02:59PM -0400, Brian Walsh wrote:
>> > > Any ideas or suggestions?
>> >
>> > It's caused because we don't allow dma_free_coherent() to be called from
>> > IRQ context (which is reasonable because it needs to flush TLBs across
>> > all processors on SMP systems.)
>>
>> I am not running on an SMP system so would this even be a problem?
>
> Yes - because it's there to ensure that the API is used in a consistent
> way.

Right, I was just referring to it being a problem in my specific situation,
not in the general case.

>
>> > Unfortunately, with the DMA bounce code enabled, this function does get
>> > called from IRQ context, and so tends to spit out these warnings.
>> >
>> > I did have a patch which made dma_free_coherent() lazy, but there was
>> > a report of disk corruption (though it was never conclusive whether
>> > it was caused by the patch or not.)  Here's an updated version of
>> > that patch.
>> >
>>
>> I did not see any data rate improvement using this patch over just commenting
>> out the warning stack dump.  I am still seeing about half the data transfer
>> rate using the high speed ehci USB controller compared with the full speed
>> ohci USB controller.
>
> I didn't suggest it would improve the data rate - only that it should
> fix the stack dump.
>
> I'd really like to get this patch properly tested and confirmed that it
> does _not_ actually cause corruption, so that I can get it merged.
>

I will be able to help you test this patch once I get past this issue
I am dealing with.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2009-09-29 15:16 UTC | newest]

Thread overview: 14+ messages
2009-09-17 21:02 ixp4xx dmabounce Brian Walsh
2009-09-17 21:53 ` Russell King - ARM Linux
2009-09-22 22:02   ` Brian Walsh
2009-09-27 16:55     ` Russell King - ARM Linux
2009-09-29 15:16       ` Brian Walsh
2009-09-22 23:42 ` Mikael Pettersson
2009-09-23 14:40   ` Krzysztof Halasa
2009-09-24 16:02     ` Brian Walsh
2009-09-24 16:50       ` Mikael Pettersson
2009-09-24 22:15         ` Brian Walsh
2009-09-24 23:34           ` Mikael Pettersson
2009-09-24 23:43             ` Brian Walsh
2009-09-24 16:51       ` Krzysztof Halasa
2009-09-24 16:57         ` Brian Walsh
