public inbox for linux-kernel@vger.kernel.org
* USB storage crash report in 2.6 SMP
@ 2004-09-30 23:54 Hubert Tonneau
  2004-10-02 18:09 ` Olaf Hering
  0 siblings, 1 reply; 2+ messages in thread
From: Hubert Tonneau @ 2004-09-30 23:54 UTC
  To: linux-kernel

Copying a large amount of data (several gigabytes) between two USB 2.0 attached
disks will crash any Linux 2.6 SMP kernel, including 2.6.9-rc3.

The stack report is:
qh_completions 0x7B/0x118 [ehci_hcd]
scan_async
ehci_work
ehci_irq
handle_IRQ_event
common_interrupt
default_idle
default_idle
cpu_idle

The crash happens on any attempt to copy something like 100 GB.
Copying something like 1 GB or less works nicely.
Switching to a UP kernel solves the problem.
Switching to a 2.4 kernel solves the problem.
I tested it on two different machines, with two different sets of disks.



* Re: USB storage crash report in 2.6 SMP
  2004-09-30 23:54 USB storage crash report in 2.6 SMP Hubert Tonneau
@ 2004-10-02 18:09 ` Olaf Hering
  0 siblings, 0 replies; 2+ messages in thread
From: Olaf Hering @ 2004-10-02 18:09 UTC
  To: Hubert Tonneau; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 947 bytes --]

 On Thu, Sep 30, Hubert Tonneau wrote:

> Copying a large amount of data (several gigabytes) between two USB 2.0 attached
> disks will crash any Linux 2.6 SMP kernel, including 2.6.9-rc3.
> 
> The stack report is:
> qh_completions 0x7B/0x118 [ehci_hcd]

This was fixed a while ago, but the fix is not yet synced with Linus.

...
Maybe the call chain is something like this:

ehci_irq
spin_lock (&ehci->lock)
  ehci_work                             ehci_watchdog
    end_unlink_async
      qh_completions
        ehci_urb_done
        spin_unlock (&ehci->lock)
          usb_hcd_giveback_urb          spin_lock (&ehci->lock)
                                          ehci_work

Now ehci_watchdog can proceed until usb_hcd_giveback_urb returns; then
ehci_urb_done must wait until the watchdog is done. Both seem to operate
on the same list. I can't test it right now, the box crashed.
...


-- 
USB is for mice, FireWire is for men!

sUse lINUX ag, nÜRNBERG

[-- Attachment #2: ehci-0802.patch --]
[-- Type: text/plain, Size: 1508 bytes --]

This addresses an SMP-only issue with the EHCI driver, where only one CPU
should scan the schedule at a time (scanning is not re-entrant) but either
the IRQ handler or a watchdog timer could end up starting it.  Many thanks
to Olaf Hering for isolating the failure mode!

Once one CPU starts scanning, any other might as well finish right
away.  This fix just adds a flag to detect that case.

Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>

--- 1.93/drivers/usb/host/ehci-hcd.c	Mon Aug 23 16:48:53 2004
+++ edited/ehci-hcd.c	Thu Sep  2 16:05:47 2004
@@ -695,9 +695,18 @@
 	timer_action_done (ehci, TIMER_IO_WATCHDOG);
 	if (ehci->reclaim_ready)
 		end_unlink_async (ehci, regs);
+
+	/* another CPU may drop ehci->lock during a schedule scan while
+	 * it reports urb completions.  this flag guards against bogus
+	 * attempts at re-entrant schedule scanning.
+	 */
+	if (ehci->scanning)
+		return;
+	ehci->scanning = 1;
 	scan_async (ehci, regs);
 	if (ehci->next_uframe != -1)
 		scan_periodic (ehci, regs);
+	ehci->scanning = 0;
 
 	/* the IO watchdog guards against hardware or driver bugs that
 	 * misplace IRQs, and should let us run completely without IRQs.
--- 1.43/drivers/usb/host/ehci.h	Tue Aug 24 12:18:34 2004
+++ edited/ehci.h	Thu Sep  2 15:32:30 2004
@@ -53,6 +53,7 @@
 	struct ehci_qh		*async;
 	struct ehci_qh		*reclaim;
 	unsigned		reclaim_ready : 1;
+	unsigned		scanning : 1;
 
 	/* periodic schedule support */
 #define	DEFAULT_I_TDPS		1024		/* some HCs can do less */
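
The re-entrancy guard in the patch above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions, not the
driver code: every name below (model_hcd, model_work, model_watchdog,
model_irq, and so on) is hypothetical, the lock is a plain flag checked with
assert, and the "other CPU" is simulated by calling the watchdog path from
inside the window where the scan has dropped the lock to report a completion.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_hcd {
	int locked;             /* stands in for ehci->lock */
	unsigned scanning : 1;  /* the flag the patch adds */
	int use_guard;          /* 0 = behaviour before the patch */
	int depth;              /* scans currently in progress */
	int max_depth;          /* deepest overlap observed */
	int skipped;            /* re-entrant attempts turned away */
};

static void model_lock(struct model_hcd *h)   { assert(!h->locked); h->locked = 1; }
static void model_unlock(struct model_hcd *h) { assert(h->locked);  h->locked = 0; }

static void model_work(struct model_hcd *h);

/* Stands in for the watchdog timer firing on a second CPU. */
static void model_watchdog(struct model_hcd *h)
{
	model_lock(h);
	model_work(h);
	model_unlock(h);
}

/* Scan the schedule; the lock is dropped while a completion is reported. */
static void model_scan(struct model_hcd *h)
{
	h->depth++;
	if (h->depth > h->max_depth)
		h->max_depth = h->depth;
	if (h->depth == 1) {
		/* Only the outermost scan reports a completion in this model. */
		model_unlock(h);
		model_watchdog(h);  /* the other CPU gets in here */
		model_lock(h);
	}
	h->depth--;
}

/* Mirrors the shape of the patched code path: bail out if a scan is running. */
static void model_work(struct model_hcd *h)
{
	assert(h->locked);
	if (h->use_guard) {
		if (h->scanning) {
			h->skipped++;
			return;
		}
		h->scanning = 1;
	}
	model_scan(h);
	if (h->use_guard)
		h->scanning = 0;
}

/* Run one simulated interrupt; returns the deepest scan overlap seen. */
int model_irq(int use_guard, int *skipped)
{
	struct model_hcd h = {0};

	h.use_guard = use_guard;
	model_lock(&h);
	model_work(&h);
	model_unlock(&h);
	if (skipped)
		*skipped = h.skipped;
	return h.max_depth;
}
```

In this model, calling model_irq(0, &skipped) lets two scans overlap, while
model_irq(1, &skipped) keeps the overlap at one scan and counts the rejected
attempt in skipped. The flag is only read and written with the lock held,
which is why a plain bitfield is enough here.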
