Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?

public inbox for linux-ia64@vger.kernel.org
 help / color / mirror / Atom feed

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
       [not found]           ` <3FA28C9A.5010608@pacbell.net>
@ 2004-03-06  2:08             ` David Mosberger
  2004-03-06  2:13               ` David Mosberger
  2004-03-06  4:55               ` David Brownell
  0 siblings, 2 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-06  2:08 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

OK, finally a bit of progress.  If you remember back in October 2003 I
reported:

 > One-line summary: plug-in your USB keyboard, see your machine die.

 > So, I have this non-name USB keyboard (with built-in 2-port USB
 > hub) which reliably crashes 2.6.0-test{8,9} on both x86 and ia64.
 > In retrospect, it's clear to me that the same keyboard also
 > occasionally crashes 2.4 kernels, but there the problem appears
 > more seldom.  Perhaps once in 10 reboots and once the machine is
 > booted and the keyboard is running, it keeps on working.  The
 > keyboard in question is a BTC 5141H.

After this, I spent a (small) amount of time looking over the HID code
etc to see what could be causing it.  I could find nothing wrong so I
gave up, connected another USB keyboard, and basically ignored the
problem.  In retrospect, that was Good Thinking, because I was
apparently looking at the wrong code: the problem _does_ appear to be
coming from the USB HCD, not from the HIDeous code.

Specifically, after upgrading to 2.6.4-rc2, _all_ of the ia64 machines
I tested would crash as soon as they had _any_ USB keyboard plugged
in.  That is, the problem no longer was limited to the BTC keyboard,
which is special because it has a built-in hub.  This was encouraging.

Turns out it's this patch that was causing the crashes:

 http://linux.bkbits.net:8080/linux-2.5/cset@1.1619.1.17

That was strange, because even to my USB-untrained eye the patch
looked obviously correct.  However, I think the root cause of the
problem really has to do with a race-condition between the controller
and the driver.  In particular, if I apply the patch below, my USB
keyboards (including the BTC keyboard) work just fine!

=== drivers/usb/host/ohci-q.c 1.48 vs edited ==--- 1.48/drivers/usb/host/ohci-q.c	Tue Mar  2 05:52:46 2004
+++ edited/drivers/usb/host/ohci-q.c	Fri Mar  5 17:25:55 2004
@@ -438,7 +451,7 @@
 	 * behave.  frame_no wraps every 2^16 msec, and changes right before
 	 * SF is triggered.
 	 */
-	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
+	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 2;

 	/* rm_list is just singly linked, for simplicity */
 	ed->ed_next = ohci->ed_rm_list;

However, I think the root-cause of the problem may be this optimization
in ohci_irq():

	/* we can eliminate a (slow) readl() if _only_ WDH caused this irq */

Indeed, if I apply this patch instead:

=== drivers/usb/host/ohci-hcd.c 1.56 vs edited ==--- 1.56/drivers/usb/host/ohci-hcd.c	Tue Mar  2 05:52:40 2004
+++ edited/drivers/usb/host/ohci-hcd.c	Fri Mar  5 17:45:09 2004
@@ -584,7 +584,7 @@
  	int			ints; 

 	/* we can eliminate a (slow) readl() if _only_ WDH caused this irq */
-	if ((ohci->hcca->done_head != 0)
+	if (0 && (ohci->hcca->done_head != 0)
 			&& ! (le32_to_cpup (&ohci->hcca->done_head) & 0x01)) {
 		ints =  OHCI_INTR_WDH;

there are no crashes either.

So my theory is that I was seeing this sequence of events:

 - HCD signals WDH interrupt & sends DMA to update the frame number in
   the host-controller communication area (HCCA)

 - host gets interrupt, but skips readl() and hence reads a stale
   frame number N instead of the up-to-date value (N+1)

 - HCD cancels a transfer descriptor (TD), moves it to the "remove list"
   and calculates the frame number at which it can be remove from
   the host-controller's list as N+1

 - SOF interrupt arrives (probably was pending already?)

 - interrupt handler does a readl() and now sees the updated
   frame-number N+1

 - HCD sees that the cancelled TD's time stamp N+1 is <= the current
   current time stamp (N+1) and goes ahead and removes it from the
   host-list, while the controller is still looking at the entry being
   removed

 - HCD ends up dereferencing a bad pointer and ends up reading from
   address 0xf0000000, which on our ia64 machines is a read-only area,
   which then results in a machine-check abort

Does this sound plausible?

What beats me is why UHCI would have the same issue.  I know even less
about UHCI than I do about OHCI but perhaps there is a similar
problem.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  2:08             ` [linux-usb-devel] Re: serious 2.6 bug in USB subsystem? David Mosberger
@ 2004-03-06  2:13               ` David Mosberger
  2004-03-06  4:55               ` David Brownell
  1 sibling, 0 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-06  2:13 UTC (permalink / raw)
  To: davidm
  Cc: David Brownell, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

Typo-alert:

>>>>> On Fri, 5 Mar 2004 18:08:40 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:

  David>  - HCD ends up dereferencing a bad pointer and ends up
  David> reading from address 0xf0000000, which on our ia64 machines
  David> is a read-only area, which then results in a machine-check
  David> abort
              ^^^^^^^^^
              make that "write-only"

  --david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  2:08             ` [linux-usb-devel] Re: serious 2.6 bug in USB subsystem? David Mosberger
  2004-03-06  2:13               ` David Mosberger
@ 2004-03-06  4:55               ` David Brownell
  2004-03-06  5:49                 ` David Mosberger
  2004-03-06  9:17                 ` David Mosberger
  1 sibling, 2 replies; 25+ messages in thread
From: David Brownell @ 2004-03-06  4:55 UTC (permalink / raw)
  To: davidm; +Cc: Greg KH, vojtech, linux-usb-devel, linux-kernel, linux-ia64,
	pochini

David Mosberger wrote:
> OK, finally a bit of progress.  If you remember back in October 2003 I
> reported:
> 
>  > One-line summary: plug-in your USB keyboard, see your machine die.
> 
>  > So, I have this non-name USB keyboard (with built-in 2-port USB
>  > hub) which reliably crashes 2.6.0-test{8,9} on both x86 and ia64.
>  > In retrospect, it's clear to me that the same keyboard also
>  > occasionally crashes 2.4 kernels, but there the problem appears
>  > more seldom.  
> 
> Specifically, after upgrading to 2.6.4-rc2, _all_ of the ia64 machines
> I tested would crash as soon as they had _any_ USB keyboard plugged
> in.  That is, the problem no longer was limited to the BTC keyboard,
> which is special because it has a built-in hub.  This was encouraging.
> 
> Turns out it's this patch that was causing the crashes:
> 
>  http://linux.bkbits.net:8080/linux-2.5/cset@1.1619.1.17

Maybe in 2.6.4-rc2... but not in 2.6.0-test{8,9}!!

There's something wierd going on recently for sure, and it's
not caused just by that patch.  I'm not sure reverting that
would make things better overall right now; hard to say.

I'm thinking the "disable periodic schedule" patch is also
worth looking at.  I can imagine some silicon having bugs
if that's turned off (just like some _systems_ have issues
when it's left on).

> That was strange, because even to my USB-untrained eye the patch
> looked obviously correct.  However, I think the root cause of the
> problem really has to do with a race-condition between the controller
> and the driver.  In particular, if I apply the patch below, my USB
> keyboards (including the BTC keyboard) work just fine!
> 
> ...
> -	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
> +	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 2;

I've seen that _change_ behavior in some regression tests
(usbtest test11/test12), as if the extra msec let things
quiesce (so only one of two broken states showed, and
not the oopsable one) but not _fix_ it.

> However, I think the root-cause of the problem may be this optimization
> in ohci_irq():
> 
> 	/* we can eliminate a (slow) readl() if _only_ WDH caused this irq */
> 
> Indeed, if I apply this patch instead:
> 
> ...
>  	/* we can eliminate a (slow) readl() if _only_ WDH caused this irq */
> -	if ((ohci->hcca->done_head != 0)
> +	if (0 && (ohci->hcca->done_head != 0)
> ...
> 
> there are no crashes either.

See this post from Martin Diehl ... my response isn't out yet:

   http://marc.theaimsgroup.com/?l=linux-usb-devel&m\x107850825815775&w=2

The reason I keep ending up thinking that readl-elimination
must be OK (me agreeing with Martin) is that the memory there
came from dma_alloc_coherent() ... so if anything's wrong,
it'd be at most lack of rmb(), not a stale-cache kind of thing.

> So my theory is that I was seeing this sequence of events:
> 
>  - HCD signals WDH interrupt & sends DMA to update the frame number in
>    the host-controller communication area (HCCA)
> 
>  - host gets interrupt, but skips readl() and hence reads a stale
>    frame number N instead of the up-to-date value (N+1)

It reads the frame number directly from the controller, so it's
not possible that it can be so stale that an rmb() wouldn't fix
sequencing issues.

What might be possible though is that the donelist gets modified
by the time the unlinks get processed, with some extra TDs changing
state (from HC perspective) ... haven't explored that possibility.

>  - HCD cancels a transfer descriptor (TD), moves it to the "remove list"
>    and calculates the frame number at which it can be remove from
>    the host-controller's list as N+1

That'd be an ED on the remove list, not a TD.  Also in dma-coherent
memory.  The cancelation would apply to one or more of the TDs
queued to that ED.

>  - SOF interrupt arrives (probably was pending already?)
> 
>  - interrupt handler does a readl() and now sees the updated
>    frame-number N+1
> 
>  - HCD sees that the cancelled TD's time stamp N+1 is <= the current
>    current time stamp (N+1) and goes ahead and removes it from the
>    host-list, while the controller is still looking at the entry being
>    removed

This ED-level disagreement between the HC and HCD might explain
some issues.  I think the current trouble cases are usually where
there's only one TD queued to the ED, so that TD and the dummy
keep ping-ponging back and forth.  The HC certainly seems to
overwrite the "dummy TD" --

>  - HCD ends up dereferencing a bad pointer and ends up reading from
>    address 0xf0000000, which on our ia64 machines is a read-only area,
>    which then results in a machine-check abort

I'm surprised that DMA from a read-only area would be a problem!  :)
If OHCI is getting a PCI error, I'd expect a "UE" IRQ.

> Does this sound plausible?

Parts of it.  There's definite recent nastiness.  Of the type
that other eyes sometimes see better.

> What beats me is why UHCI would have the same issue.  I know even less
> about UHCI than I do about OHCI but perhaps there is a similar
> problem.

I still suspect some problem in the HID code.  But right now
I'm quite certain of a recent-ish OHCI issue.

- Dave

> 
> 	--david
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  4:55               ` David Brownell
@ 2004-03-06  5:49                 ` David Mosberger
  2004-03-06  7:21                   ` David Mosberger
  2004-03-06 16:37                   ` David Brownell
  2004-03-06  9:17                 ` David Mosberger
  1 sibling, 2 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-06  5:49 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

>>>>> On Fri, 05 Mar 2004 20:55:01 -0800, David Brownell <david-b@pacbell.net> said:
  >> Turns out it's this patch that was causing the crashes:

  >> http://linux.bkbits.net:8080/linux-2.5/cset@1.1619.1.17

  David.B> Maybe in 2.6.4-rc2... but not in 2.6.0-test{8,9}!!

Of course.  What I'm saying is that in 2.6.0-test{8,9} it was rare to
trigger the problem (only with BTC keyboard) and the change above made
it trivial to trigger the keyboard.  Basically, your fix in
cset 1.1619.1.17 made it more common for stuff to be unlinked in
the "deferred" (proper) manner and that made it much more likely to
trigger the bug.

  David.B> The reason I keep ending up thinking that readl-elimination
  David.B> must be OK (me agreeing with Martin) is that the memory
  David.B> there came from dma_alloc_coherent() ... so if anything's
  David.B> wrong, it'd be at most lack of rmb(), not a stale-cache
  David.B> kind of thing.

It's not an issue of DMA coherency, it's an issue of DMA vs. interrupt
ordering.  I believe the WHD interrupt is arriving at the CPU before
the DMA update to the HCCA is done.  In my second patch, the readl()
at the beginning of the interrupt ensures that the DMA update to
the HCCA is completed before the readl() returns data.

  David.B> It reads the frame number directly from the controller, so
  David.B> it's not possible that it can be so stale that an rmb()
  David.B> wouldn't fix sequencing issues.

Eh, it's read like this:

   #define OHCI_FRAME_NO(hccap) ((u16)le32_to_cpup(&(hccap)->frame_no))

   finish_unlinks (ohci, OHCI_FRAME_NO(ohci->hcca), ptregs);

The HCCA is in host memory.

  >> - HCD ends up dereferencing a bad pointer and ends up reading
  >> from address 0xf0000000, which on our ia64 machines is a
  >> read-only area, which then results in a machine-check abort

  David.B> I'm surprised that DMA from a read-only area would be a
  David.B> problem!  :) If OHCI is getting a PCI error, I'd expect a
  David.B> "UE" IRQ.

You must have not received my follow-up message.  There was a typo in
my message: it was supposed to say "write-only" area.

  David.B> I still suspect some problem in the HID code.  But right
  David.B> now I'm quite certain of a recent-ish OHCI issue.

I'm 99% certain that the problem I saw back in October (BTC keyboard)
was identical to the one triggered by cset 1.1619.1.17.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  5:49                 ` David Mosberger
@ 2004-03-06  7:21                   ` David Mosberger
  2004-03-06  8:39                     ` David Mosberger
  2004-03-06 16:37                   ` David Brownell
  1 sibling, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-06  7:21 UTC (permalink / raw)
  To: David Brownell, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini, davidm

>>>>> On Fri, 5 Mar 2004 21:49:20 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:

  David> It's not an issue of DMA coherency, it's an issue of DMA
  David> vs. interrupt ordering.  I believe the WHD interrupt is
  David> arriving at the CPU before the DMA update to the HCCA is
  David> done.

Actually, it looks like I misunderstood the OHCI spec on first reading.
It seems like the causal relationship goes like this:

 (1) Start of Frame -> (2) update HccaFrameNumber -> (3) trigger SF interrupt

Now, suppose you get a WDH interrupt between (1) and (2).  You'd read
the old frame-number yet by the time the interrupt from (3) arrives
the HC might already be accessing the ED that you're about to remove.

If this is correct, then the first patch is probably a better
approach:

=== drivers/usb/host/ohci-q.c 1.48 vs edited ==--- 1.48/drivers/usb/host/ohci-q.c	Tue Mar  2 05:52:46 2004
+++ edited/drivers/usb/host/ohci-q.c	Fri Mar  5 17:25:55 2004
@@ -438,7 +451,7 @@
 	 * behave.  frame_no wraps every 2^16 msec, and changes right before
 	 * SF is triggered.
 	 */
-	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
+	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 2;

 	/* rm_list is just singly linked, for simplicity */
 	ed->ed_next = ohci->ed_rm_list;

This actually makes tons of sense if you think of it like jiffies: you
need to make sure you delay at least one full frame-interval.  If you
set the tick to "+ 1" and the current tick is almost over, that
requirement is violated.  Setting it to "+ 2" should be safe.  The
only problem I can think of is if the delay between point (1) and (2)
were to exceed one frame-interval (1 msec).  While unlikely, the right
PCI topology and heavy bus traffic perhaps could cause such delays.
However, even then it's probably OK because the HC would presumably
stall when trying to update the HccaFrameNumber the second time and
the previous update hasn't completed yet.

Here is one little piece of evidence that's consistent with this
explanation: last week I tried to rip some audio tracks off a CD.
With PIO, this caused interrupts to get delayed 2-3msec and that
caused all kinds of weird effects on the USB bus.  Mostly, I'd
suddenly lose the keyboard or the mouse, though reconnecting them
would "fix" the problem for a short time.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  7:21                   ` David Mosberger
@ 2004-03-06  8:39                     ` David Mosberger
  0 siblings, 0 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-06  8:39 UTC (permalink / raw)
  To: davidm
  Cc: David Brownell, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

>>>>> On Fri, 5 Mar 2004 23:21:32 -0800, David Mosberger <davidm@linux.hpl.hp.com> said:

  David>  (1) Start of Frame -> (2) update HccaFrameNumber -> (3)
  David> trigger SF interrupt

  David> Now, suppose you get a WDH interrupt between (1) and (2).
  David> You'd read the old frame-number yet by the time the interrupt
  David> from (3) arrives the HC might already be accessing the ED
  David> that you're about to remove.

Sorry for the monologue---trying to learn how this is all supposed to
work...

The OHCI spec says that HccaFrameNumber is updated in this fashion:

 (a) send Start-of-Frame
 (b) HccaFrameNumber <- HcFmNumber.StartingFrame
 (c) start processing ED (& post SF intr if requested)

Since start_ed_unlink() uses the following sequence:

 (1) ed->hwINFO |= ED_DEQUEUE
 (2) ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1

Then as long as (1) is observed by the HC before (2) (which it should
be), the race I described isn't possible: if (2) read the "old"
frame-number, then the HC wouldn't have started step (c) yet and hence
the HC would observe step (1) and notice that the ED is being
dequeued.  Converseley, if the HC started to process the ED before (1)
completed (i.e., it missed the ED_DEQUEUE flag), then step (2) must
have been reading the the new frame-number.

OK, I see now the conundrum...

BTW: does the value 0xf0000000 bear any special meaning in USB?  We
already considered whether this would be a NULL-pointer after I/O MMU
translation but it is not: the I/O MMU window is at
0x40000000-0x80000000 on the machines in question.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  5:49                 ` David Mosberger
  2004-03-06  7:21                   ` David Mosberger
@ 2004-03-06 16:37                   ` David Brownell
  2004-03-08  6:18                     ` Grant Grundler
  1 sibling, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-06 16:37 UTC (permalink / raw)
  To: davidm; +Cc: Greg KH, vojtech, linux-usb-devel, linux-kernel, linux-ia64,
	pochini

David Mosberger wrote:

>   David.B> The reason I keep ending up thinking that readl-elimination
>   David.B> must be OK (me agreeing with Martin) is that the memory
>   David.B> there came from dma_alloc_coherent() ... so if anything's
>   David.B> wrong, it'd be at most lack of rmb(), not a stale-cache
>   David.B> kind of thing.
> 
> It's not an issue of DMA coherency, it's an issue of DMA vs. interrupt
> ordering.  I believe the WHD interrupt is arriving at the CPU before

Which is what I sketched to Martin, as the reason to be interested
in a patch that was equivalent to your second patch.

Unfortunately that doesn't check out as being "the" fix here.
Martin still saw the problem.  (And I don't see how it'd be
what gave me several hard lockups ... but it didn't work either,
and the day before, without that change, no lockup.)

> the DMA update to the HCCA is done.  In my second patch, the readl()
> at the beginning of the interrupt ensures that the DMA update to
> the HCCA is completed before the readl() returns data.

DMA-coherent memory is defined as "memory for which a write by either
the device or the processor can immediately be read by the processor
or device without having to worry about caching effects."  Such a
write-buffering mechanism is clearly a type of (write-)caching effect,
and readl() would be a kind of dma_rmb(), if you will.  I suspect the
docs are wrong about what dma-coherent means.

> If this is correct, then the first patch is probably a better
> approach:
> 
> ...
> -	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
> +	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 2;
> ..
> 
> This actually makes tons of sense if you think of it like jiffies: you
> need to make sure you delay at least one full frame-interval.  

Right, I had that theory too.  As I reported, it doesn't check out
as the only issue.  Plus, even assuming it's correct, adding another
millisecond slows down some common operations even more.  I'd rather
just slow down just the "right" paths, rather than all of them ... :)

- Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06 16:37                   ` David Brownell
@ 2004-03-08  6:18                     ` Grant Grundler
  2004-03-08 18:58                       ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: Grant Grundler @ 2004-03-08  6:18 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

On Sat, Mar 06, 2004 at 08:37:43AM -0800, David Brownell wrote:
> DMA-coherent memory is defined as "memory for which a write by either
> the device or the processor can immediately be read by the processor
> or device without having to worry about caching effects."

The use of "immediate" here means no other sync function needs to
be called to access the data - ie don't need to call pci_sync_single().

In general, the accesses are ordered following PCI ordering rules.
But every architecture (including x86) has issues with "inflight" DMA.
Line based Interrupts are delivered on a different path than DMA 
and thus ordering can't be enforced.
For example, the code around the following comment in drivers/net/tg3.c:
	/*
	 * Flush PCI write.  This also guarantees that our
	 * status block has been flushed to host memory.
	 */


> `Such a
> write-buffering mechanism is clearly a type of (write-)caching effect,

No - the data is still in flight and in some deterministic time frame
will become visible to the CPU.
Calling it a "caching effect" confuses the issues even worse.

> and readl() would be a kind of dma_rmb(), if you will.

Yes, that's correct - but it's orthogonal to "cache coherent".

> I suspect the docs are wrong about what dma-coherent means.

Not "wrong", just misunderstood. ;^)

hth,
grant

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-08  6:18                     ` Grant Grundler
@ 2004-03-08 18:58                       ` David Mosberger
  2004-03-08 21:48                         ` David Brownell
  0 siblings, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-08 18:58 UTC (permalink / raw)
  To: Grant Grundler
  Cc: David Brownell, davidm, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Sun, 7 Mar 2004 22:18:02 -0800, Grant Grundler <iod00d@hp.com> said:

  >> `Such a write-buffering mechanism is clearly a type of
  >> (write-)caching effect,

  Grant> No - the data is still in flight and in some deterministic
  Grant> time frame will become visible to the CPU.  Calling it a
  Grant> "caching effect" confuses the issues even worse.

That's why I'm so unhappy that the PCI interface used the term
"consistent" memory, when it should have said "coherent".  We
should nail a plate on everbody's forehead saying:

 consistency = coherency + ordering

Perhaps then people would start to have a clear distincition between
the meaning of the two terms (or at least it would force them to think
about it! ;-).

But in any case, as later experimentation confirmed, the USB bug isn't
(just) an ordering issue.  The order of operation described in the
OHCI spec does not rely on any specific order of interrupt delivery at
all, so I was wrong about that.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-08 18:58                       ` David Mosberger
@ 2004-03-08 21:48                         ` David Brownell
  2004-03-09  9:15                           ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-08 21:48 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

[-- Attachment #1: Type: text/plain, Size: 647 bytes --]

David Mosberger wrote:

>  consistency = coherency + ordering

That's one of the things I meant when I said the docs weren't
quite there yet ... the dma_*() API docs don't address how
to achieve the ordering, or even mention that it's an issue.

Plus, there are definitions of coherency that include event
ordering; it's not clear which definition is intended.


Anyway, see the attached patch.  It goes on top of 2.6.4-bk2
and passes my unlink tests (the ones that haven't been run
much on recent kernels!) on several different OHCI hosts.

Other issues may be lurking, but this seems to fix a handful
of nasties that I could reproduce.

- Dave


[-- Attachment #2: ohci-0308.patch --]
[-- Type: text/plain, Size: 3017 bytes --]

Fixes for some recent OHCI instability.

    - Never munge ed->hwTailP once it's set up, except when appending
      to the queue.

    - Don't clear control/bulk "current ED" registers when taking
      entries off the queue.  The chip may still be using the value.

    - Revert part of previous bk1 patch, which removed a fast path
      that makes a visible difference in some common operations.
      Removing it just slowed timings, and didn't fix anything.

    - Make the "unlink during submit" path behave better (SMP).


--- 1.50/drivers/usb/host/ohci-q.c	Tue Mar  2 22:48:07 2004
+++ edited/drivers/usb/host/ohci-q.c	Mon Mar  8 13:15:08 2004
@@ -282,7 +282,6 @@
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_CLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_controlcurrent);
 				// post those pci writes
 				(void) readl (&ohci->regs->control);
 			}
@@ -306,7 +305,6 @@
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_BLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_bulkcurrent);
 				// post those pci writes
 				(void) readl (&ohci->regs->control);
 			}
@@ -331,6 +329,19 @@
 		periodic_unlink (ohci, ed);
 		break;
 	}
+
+	/* NOTE: Except for a couple of exceptionally clean unlink cases
+	 * (like unlinking the only c/b ED, with no TDs) HCs may still be
+	 * caching this operational ED (or its address).  Safe unlinking
+	 * involves not marking it ED_IDLE till INTR_SF; we always do that
+	 * if td_list isn't empty.  Otherwise the race is small; but ...
+	 */
+	if (ed->state == ED_OPER) {
+		ed->state = ED_IDLE;
+		ed->hwINFO &= ~(ED_SKIP | ED_DEQUEUE);
+		ed->hwHeadP &= ~ED_H;
+		wmb ();
+	}
 }
 
 
@@ -794,8 +805,6 @@
 		next->next_dl_td = rev;	
 		rev = next;
 
-		if (ed->hwTailP == cpu_to_le32 (next->td_dma))
-			ed->hwTailP = next->hwNextTD;
 		ed->hwHeadP = next->hwNextTD | toggle;
 	}
 
@@ -941,12 +950,7 @@
 				continue;
 			}
 
-			/* patch pointers hc uses ... tail, if we're removing
-			 * an otherwise active td, and whatever td pointer
-			 * points to this td
-			 */
-			if (ed->hwTailP == cpu_to_le32 (td->td_dma))
-				ed->hwTailP = td->hwNextTD;
+			/* patch pointer hc uses */
 			savebits = *prev & ~cpu_to_le32 (TD_MASK);
 			*prev = td->hwNextTD | savebits;
 
@@ -1041,7 +1045,7 @@
 
 		/* clean schedule:  unlink EDs that are no longer busy */
 		if (list_empty (&ed->td_list))
-			start_ed_unlink (ohci, ed);
+			ed_deschedule (ohci, ed);
 		/* ... reenabling halted EDs only after fault cleanup */
 		else if ((ed->hwINFO & (ED_SKIP | ED_DEQUEUE)) == ED_SKIP) {
 			td = list_entry (ed->td_list.next, struct td, td_list);
--- 1.57/drivers/usb/host/ohci-hcd.c	Tue Mar  2 22:48:07 2004
+++ edited/drivers/usb/host/ohci-hcd.c	Mon Mar  8 12:39:24 2004
@@ -233,7 +233,7 @@
 	spin_lock (&urb->lock);
 	if (urb->status != -EINPROGRESS) {
 		spin_unlock (&urb->lock);
-
+		urb->hcpriv = urb_priv;
 		finish_urb (ohci, urb, 0);
 		retval = 0;
 		goto fail;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-08 21:48                         ` David Brownell
@ 2004-03-09  9:15                           ` David Mosberger
  2004-03-09 17:36                             ` David Brownell
  0 siblings, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-09  9:15 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

How about something along the following lines?  The patch is relative
to 2.6.4-rc1.  What it does is add a new state ED_DESCHEDULED, which
is treated exactly like ED_IDLE, except that in this state, the HC may
still be referring to the ED in question.  Thus, if
ohci_endpoint_disable() finds an ED in this state, it will take it
down via start_ed_unlink() and once that is done, the ED will be in
ED_IDLE state and hence safe for removal (HC won't be referencing it
anymore).

I left some printk's for low-frequency events enabled, to give you a
better idea of how this is working.  Note that I removed the
posting of the PCI-writes in ed_deschedule().  I don't think those
are needed.  As far as I can see, a write-memory barrier at the
end of this routine should suffice.

With this patch applied:

 - My dual-CPU Itanium machine boots fine (doesn't hang anymore
   as with 2.6.4-rc1)

 - My home workstation (single CPU Itanium with 3 USB devices)
   now boots and works fine again.  Moreover, I don't see any
   USB keyboard/mouse disconnects anymore when doing heavy
   IDE traffic.

The BTC keyboard, however, still does NOT work.  I'm fairly certain
now that this is indeed a separate problem in the HID.  The reason
being that I'm now fairly comfortable with the OHCI HCD and if this
bug truly were in the OHCI HCD, then why does it trigger with UHCI as
well (I haven't tried EHCI yet; maybe I should try connecting the BTC
keyboard via a USB 2.0 hub).

Comments?

	--david

=== drivers/usb/host/ohci-hcd.c 1.56 vs edited ==--- 1.56/drivers/usb/host/ohci-hcd.c	Tue Mar  2 05:52:40 2004
+++ edited/drivers/usb/host/ohci-hcd.c	Tue Mar  9 00:24:54 2004
@@ -239,8 +239,11 @@
 		goto fail;
 	}
 
-	/* schedule the ed if needed */
-	if (ed->state = ED_IDLE) {
+	if (ed->state = ED_UNLINK) {
+		printk("%s: ed %p being unlinked\n", __FUNCTION__, ed);
+		goto fail0;
+	} else if (ed->state = ED_IDLE || ed->state = ED_DESCHEDULED) {
+		/* schedule the ed if needed */
 		retval = ed_schedule (ohci, ed);
 		if (retval < 0)
 			goto fail0;
@@ -344,9 +347,15 @@
 	if (!ed)
 		goto done;
 
+	printk("%s(ed=%p, type=%d, state=%u)\n", __FUNCTION__, ed, ed->type, ed->state);
+
 	if (!HCD_IS_RUNNING (ohci->hcd.state))
 		ed->state = ED_IDLE;
 	switch (ed->state) {
+	case ED_DESCHEDULED:
+		start_ed_unlink (ohci, ed);
+		BUG_ON (ed->state != ED_UNLINK);
+		/* fall through */
 	case ED_UNLINK:		/* wait for hw to finish? */
 		/* major IRQ delivery trouble loses INTR_SF too... */
 		WARN_ON (limit-- = 0);
=== drivers/usb/host/ohci-q.c 1.48 vs edited ==--- 1.48/drivers/usb/host/ohci-q.c	Tue Mar  2 05:52:46 2004
+++ edited/drivers/usb/host/ohci-q.c	Tue Mar  9 00:39:39 2004
@@ -168,10 +168,16 @@
 {	 
 	int	branch;
 
+#if 0
+printk("%s: ed=%p type=%d\n", __FUNCTION__, ed, ed->type);
+#endif
 	ed->state = ED_OPER;
 	ed->ed_prev = 0;
 	ed->ed_next = 0;
 	ed->hwNextED = 0;
+	ed->hwINFO &= ~ED_SKIP;
+	WARN_ON (ed->hwHeadP & ED_H);
+	ed->hwHeadP &= ~ED_H;	/* XXX is this really needed?? --davidm */
 	wmb ();
 
 	/* we care about rm_list when setting CLE/BLE in case the HC was at
@@ -267,24 +273,30 @@
 		ed, ed->branch, ed->load, ed->interval);
 }
 
-/* unlink an ed from one of the HC chains. 
+/* Deschedule an ed from one of the HC chains.
  * just the link to the ed is unlinked.
  * the link from the ed still points to another operational ed or 0
- * so the HC can eventually finish the processing of the unlinked ed
+ * so the HC can eventually finish the processing of the unlinked ed.
+ * The ED isn't idle after returning from this routine, because the HC
+ * may continue to refer to it, but as far as the HCD is concerned,
+ * the ED is reusable.  If it is necessary to get the ED into the
+ * ED_IDLE state, the full unlink-sequence needs to be performed
+ * (via start_ed_unlink()).
  */
-static void ed_deschedule (struct ohci_hcd *ohci, struct ed *ed) 
+static void ed_deschedule (struct ohci_hcd *ohci, struct ed *ed)
 {
+#if 0
+printk("%s: ed=%p type=%d\n", __FUNCTION__, ed, ed->type);
+#endif
 	ed->hwINFO |= ED_SKIP;
 
 	switch (ed->type) {
 	case PIPE_CONTROL:
+		/* remove ED from the HC's list: */
 		if (ed->ed_prev = NULL) {
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_CLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_controlcurrent);
-				// post those pci writes
-				(void) readl (&ohci->regs->control);
 			}
 			writel (le32_to_cpup (&ed->hwNextED),
 				&ohci->regs->ed_controlhead);
@@ -292,6 +304,7 @@
 			ed->ed_prev->ed_next = ed->ed_next;
 			ed->ed_prev->hwNextED = ed->hwNextED;
 		}
+		/* remove ED from the HCD's list: */
 		if (ohci->ed_controltail = ed) {
 			ohci->ed_controltail = ed->ed_prev;
 			if (ohci->ed_controltail)
@@ -302,13 +315,11 @@
 		break;
 
 	case PIPE_BULK:
+		/* remove ED from the singly-linked HC's list: */
 		if (ed->ed_prev = NULL) {
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_BLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_bulkcurrent);
-				// post those pci writes
-				(void) readl (&ohci->regs->control);
 			}
 			writel (le32_to_cpup (&ed->hwNextED),
 				&ohci->regs->ed_bulkhead);
@@ -316,6 +327,7 @@
 			ed->ed_prev->ed_next = ed->ed_next;
 			ed->ed_prev->hwNextED = ed->hwNextED;
 		}
+		/* remove ED from the HCD's list: */
 		if (ohci->ed_bulktail = ed) {
 			ohci->ed_bulktail = ed->ed_prev;
 			if (ohci->ed_bulktail)
@@ -331,6 +343,12 @@
 		periodic_unlink (ohci, ed);
 		break;
 	}
+	ed->state = ED_DESCHEDULED;
+	/*
+	 * Ensure HCD sees these updates (in particular update of
+	 * hwINFO) before any following updates.
+	 */
+	wmb ();
 }
 
 
@@ -387,7 +405,7 @@
 	/* NOTE: only ep0 currently needs this "re"init logic, during
 	 * enumeration (after set_address, or if ep0 maxpacket >8).
 	 */
-  	if (ed->state = ED_IDLE) {
+  	if (ed->state = ED_IDLE || ed->state = ED_DESCHEDULED) {
 		u32	info;
 
 		info = usb_pipedevice (pipe);
@@ -418,6 +436,9 @@
 
 done:
 	spin_unlock_irqrestore (&ohci->lock, flags);
+#if 0
+printk("new ed %p: type=%u state=%u hwTailP=%x hwHeadP=%x\n", ed, ed->type, ed->state, ed->hwTailP, ed->hwHeadP);
+#endif
 	return ed; 
 }
 
@@ -428,17 +449,38 @@
  * real work is done at the next start frame (SF) hardware interrupt
  */
 static void start_ed_unlink (struct ohci_hcd *ohci, struct ed *ed)
-{    
+{
+	u32 mask = 0;
+
+	/* deschedule ED as far as the HCD is concerned: */
+	ed_deschedule (ohci, ed);
+
+	/* now initiate the unlink sequence to ensure the HC isn't using the ED anymore: */
+	if (ed->type = PIPE_CONTROL)
+		mask = OHCI_CTRL_CLE;
+	else if (ed->type = PIPE_BULK)
+		mask = OHCI_CTRL_BLE;
+	if (mask) {
+		/*
+		 * Disable scanning of the control or bulk list; this
+		 * ensures that when we get an interrupt with a frame
+		 * # greater than the current frame #, the HC isn't
+		 * using the list anymore.
+		 */
+		ohci->hc_control &= ~mask;
+		writel (ohci->hc_control, &ohci->regs->control);
+	}
 	ed->hwINFO |= ED_DEQUEUE;
 	ed->state = ED_UNLINK;
-	ed_deschedule (ohci, ed);
 
 	/* SF interrupt might get delayed; record the frame counter value that
 	 * indicates when the HC isn't looking at it, so concurrent unlinks
 	 * behave.  frame_no wraps every 2^16 msec, and changes right before
 	 * SF is triggered.
 	 */
+	rmb();	/* ensure ed_deschedule() work is done before we read the frame # */
 	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
+printk("%s: ed=%p tick=%u\n", __FUNCTION__, ed, ed->tick);
 
 	/* rm_list is just singly linked, for simplicity */
 	ed->ed_next = ohci->ed_rm_list;
@@ -452,6 +494,7 @@
 		// flush those pci writes
 		(void) readl (&ohci->regs->control);
 	}
+	/* ??? What if HCD isn't running?  Shouldn't that be handled as well?  */
 }
 
 /*-------------------------------------------------------------------------*
@@ -614,6 +657,7 @@
 		info = TD_CC | TD_DP_SETUP | TD_T_DATA0;
 		td_fill (ohci, info, urb->setup_dma, 8, urb, cnt++);
 		if (data_len > 0) {
+			BUG_ON(data_len >= 4096);
 			info = TD_CC | TD_R | TD_T_DATA1;
 			info |= is_out ? TD_DP_OUT : TD_DP_IN;
 			/* NOTE:  mishandles transfers >8K, some >4K */
@@ -758,6 +802,7 @@
 	struct list_head	*tmp = td->td_list.next;
 	u32			toggle = ed->hwHeadP & ED_C;
 
+printk("%s: ed=%p\n", __FUNCTION__, ed);
 	/* clear ed halt; this is the td that caused it, but keep it inactive
 	 * until its urb->complete() has a chance to clean up.
 	 */
@@ -794,8 +839,6 @@
 		next->next_dl_td = rev;	
 		rev = next;
 
-		if (ed->hwTailP = cpu_to_le32 (next->td_dma))
-			ed->hwTailP = next->hwNextTD;
 		ed->hwHeadP = next->hwNextTD | toggle;
 	}
 
@@ -881,6 +924,7 @@
 {
 	struct ed	*ed, **last;
 
+printk("%s(tick=%u)\n", __FUNCTION__, tick);
 rescan_all:
 	for (last = &ohci->ed_rm_list, ed = *last; ed != NULL; ed = *last) {
 		struct list_head	*entry, *tmp;
@@ -941,12 +985,7 @@
 				continue;
 			}
 
-			/* patch pointers hc uses ... tail, if we're removing
-			 * an otherwise active td, and whatever td pointer
-			 * points to this td
-			 */
-			if (ed->hwTailP = cpu_to_le32 (td->td_dma))
-				ed->hwTailP = td->hwNextTD;
+			/* patch pointers hc uses  */
 			savebits = *prev & ~cpu_to_le32 (TD_MASK);
 			*prev = td->hwNextTD | savebits;
 
@@ -957,23 +996,22 @@
 			/* if URB is done, clean up */
 			if (urb_priv->td_cnt = urb_priv->length) {
 				modified = completed = 1;
-				finish_urb (ohci, urb, regs);
+				finish_urb (ohci, urb, regs);	/* this drops ohci->lock! */
 			}
 		}
 		if (completed && !list_empty (&ed->td_list))
 			goto rescan_this;
 
 		/* ED's now officially unlinked, hc doesn't see */
+#if 1
+printk("%s: done with ed %p\n", __FUNCTION__, ed);
+#endif
 		ed->state = ED_IDLE;
 		ed->hwINFO &= ~(ED_SKIP | ED_DEQUEUE);
 		ed->hwHeadP &= ~ED_H;
 		ed->hwNextED = 0;
 
-		/* but if there's work queued, reschedule */
-		if (!list_empty (&ed->td_list)) {
-			if (HCD_IS_RUNNING(ohci->hcd.state))
-				ed_schedule (ohci, ed);
-		}
+		BUG_ON (!list_empty (&ed->td_list));
 
 		if (modified)
 			goto rescan_all;
@@ -1039,9 +1077,9 @@
   		if (urb_priv->td_cnt = urb_priv->length)
   			finish_urb (ohci, urb, regs);
 
-		/* clean schedule:  unlink EDs that are no longer busy */
+		/* clean schedule:  deschedule EDs that are no longer busy */
 		if (list_empty (&ed->td_list))
-			start_ed_unlink (ohci, ed);
+			ed_deschedule (ohci, ed);
 		/* ... reenabling halted EDs only after fault cleanup */
 		else if ((ed->hwINFO & (ED_SKIP | ED_DEQUEUE)) = ED_SKIP) {
 			td = list_entry (ed->td_list.next, struct td, td_list);
=== drivers/usb/host/ohci.h 1.19 vs edited ==--- 1.19/drivers/usb/host/ohci.h	Wed Feb 11 03:42:39 2004
+++ edited/drivers/usb/host/ohci.h	Mon Mar  8 23:03:31 2004
@@ -48,6 +48,7 @@
 #define ED_IDLE 	0x00		/* NOT linked to HC */
 #define ED_UNLINK 	0x01		/* being unlinked from hc */
 #define ED_OPER		0x02		/* IS linked to hc */
+#define ED_DESCHEDULED	0x03		/* idle for HCD, but HC may still hold reference to it */
 
 	u8			type; 		/* PIPE_{BULK,...} */
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09  9:15                           ` David Mosberger
@ 2004-03-09 17:36                             ` David Brownell
  2004-03-09 17:58                               ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-09 17:36 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

David Mosberger wrote:
> How about something along the following lines?  The patch is relative
> to 2.6.4-rc1.  What it does is add a new state ED_DESCHEDULED, which
> is treated exactly like ED_IDLE, except that in this state, the HC may
> still be referring to the ED in question.  Thus, if

Sounds exactly like ED_UNLINK -- except maybe that it's not
been put onto ed_rm_list (with ED_DEQUEUE set).

Why add another state?

The parts of this patch that came from the one I sent earlier
are obviously correct (what were your test results for that?),
and there's non-worrisome noise (printks etc).

But some parts worry me.  Like changing that code to BUG()
on a driver behavior that's perfectly reasonable; and removing
some of the PCI posting, which makes it easier for the HC
and its driver to disagree about schedule status.

> The BTC keyboard, however, still does NOT work.  I'm fairly certain
> now that this is indeed a separate problem in the HID.  The reason

That was my original suspicion, you may recall ... :)

- Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09 17:36                             ` David Brownell
@ 2004-03-09 17:58                               ` David Mosberger
  2004-03-09 20:39                                 ` David Brownell
  0 siblings, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-09 17:58 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Tue, 09 Mar 2004 09:36:53 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> David Mosberger wrote:
  >> How about something along the following lines?  The patch is
  >> relative to 2.6.4-rc1.  What it does is add a new state
  >> ED_DESCHEDULED, which is treated exactly like ED_IDLE, except
  >> that in this state, the HC may still be referring to the ED in
  >> question.  Thus, if

  David.B> Sounds exactly like ED_UNLINK -- except maybe that it's not
  David.B> been put onto ed_rm_list (with ED_DEQUEUE set).

  David.B> Why add another state?

So you can tell them apart.  See ohci_endpoint_disable().

You seem to think small races are OK.  I disagree.  The smaller the
race, the harder it is to debug (and yes, it _will_ bite you
eventually).  With my patch, you're _guaranteed_ that ED_IDLE means
the HC doesn't refer to the ED anymore so you can free the memory and
do whatever you want without worry about potentially causing the HC to
do something bad.

  David.B> But some parts worry me.  Like changing that code to BUG()
  David.B> on a driver behavior that's perfectly reasonable;

With my patch, the only reason you enter ED_UNLINK state is
ohci_endpoint_disable().  If you try to ohci_urb_enqueue() on an ED in
this state, it will fail, so I don't there is a problem (I see now
that I forgot to set the error-code for this case, that's obviously
something that needs to be fixed).  But perhaps you had different
semantics in mind for ED_UNLINK?

  David.B> removing some of the PCI posting, which makes it easier for
  David.B> the HC and its driver to disagree about schedule status.

Meaning what?  Please explain what sequence of events would cause
problems that could be solved by flushing the posted writes.  AFAICT,
the flushes are just expensive NO-OPs in this particular case.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09 17:58                               ` David Mosberger
@ 2004-03-09 20:39                                 ` David Brownell
  2004-03-09 23:32                                   ` David Mosberger
  2004-03-10  6:59                                   ` David Mosberger
  0 siblings, 2 replies; 25+ messages in thread
From: David Brownell @ 2004-03-09 20:39 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

David Mosberger wrote:
> 
>   > David Mosberger wrote:
>   >> ...  add a new state
>   >> ED_DESCHEDULED, which is treated exactly like ED_IDLE, except
>   >> that in this state, the HC may still be referring to the ED in
>   >> question.  Thus, if
> 
>   David.B> Sounds exactly like ED_UNLINK -- except maybe that it's not
>   David.B> been put onto ed_rm_list (with ED_DEQUEUE set).
>   David.B> Why add another state?
> 
> So you can tell them apart.  See ohci_endpoint_disable().

Not informative.  Why doesn't UNLINK work as-is?
If there's a bug in how that's handled, better to fix it.
I'd be willing to believe there is one, given proof.


> You seem to think small races are OK.  I disagree.  The smaller the

I certainly didn't say that, any more than you said
that there's no such thing as an engineering tradeoff!

(With which statement I'd likewise disagree.)

But I certainly did mean to say that I was skeptical
you were actually running into one particular race,
which I've had an eye on for some time.  (And which
shouldn't have the failure mode you reported, though
some other things might...)


>   David.B> But some parts worry me.  Like changing that code to BUG()
>   David.B> on a driver behavior that's perfectly reasonable;
> 
> With my patch, the only reason you enter ED_UNLINK state is
> ohci_endpoint_disable().  If you try to ohci_urb_enqueue() on an ED in
> this state, it will fail, so I don't there is a problem (I see now

It's entered in usb_unlink_urb() as always.  So as I
noted, your patch would make drivers BUG() in cases
that are perfectly reasonable according to the API.


> that I forgot to set the error-code for this case, that's obviously
> something that needs to be fixed).  But perhaps you had different
> semantics in mind for ED_UNLINK?

As I've said more than once on this thread.

- Dave


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09 20:39                                 ` David Brownell
@ 2004-03-09 23:32                                   ` David Mosberger
  2004-03-10  2:53                                     ` David Brownell
  2004-03-10  6:59                                   ` David Mosberger
  1 sibling, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-09 23:32 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

David B.,

I'll respond to your concerns about my OHCI changes later on.  I
wanted to make progress on the BTC HID issue first so we have the full
picture.

Remember how I was asking what's so special about 0xf00000 because
that's the address the OHCI HC was reading from that caused the crash?
Well, looky here what you get when you interpret the memory at address
0 as an ED:

 hw=(infoð000000 tailpð000000 headpð000000 nextEDð000000)

D'oh!

(I suspect this strange memory pattern is a left-over from the
memory-testing done by the firmware; I should say thanks to whoever
invented this pattern; the problem would have been much harder to
debug if the address hadn't convenently fallen on a write-only region
of the address-space.)

Anyhow, I have been wondering since about Sunday whether it's really
safe to write HcControlHeadED (and HcBulkHeadED) with 0.  The register
description itself is ambiguous.  Now I'm finding that Figure 6-5
"List Service Flow" and Section 6.4.2.2 "Locating Endpoint
Descriptors" are outright contradictory!

The flow-chart suggests that after loading CurrentED with the contents
of HeadED, the HC checks whether CurrentED is 0 and, if so, does nothing.
However, the text in Section 6.4.2.2 says:

 ... At this point, the Host Controller checks the BulkListFilled bit
 or ControlListFilled bit of the HcCommandStatus register.  If the
 respective "Filled" bit is set to 1, there is at least one Endpoint
 Descriptor on the list which needs service.  In this case, the
 HostController will copy the value of HcControlHeadED or HcBulkHeadED
 into HcControlCurrentED or HcBulkCurrentED respectively, clear the
 "Filled" bit to 0, and attempt to process the Endpoint Descriptor now
 present in the CurrentED register. ...

So, if the HC behaves as described in the text, then there is an
obvious race:

  HC						HCD

  - start new frame
  - find list enabled (CLE/BLE set)
  - find "Filled" bit enabled (CLF/BLF set)
						- ed_deschedule() gets called
						- turn off "list-enabled" bit
						- store 0 into *HeadED
  - read *HeadED and store in *CurrentEd
  - clear "Filled" bit to 0
  - process ED at "Current"

In other words, whenever de-scheduling the first ED on the control or
bulk list, there is a risk that with the right timing, you'll end up
processing the "ED" at address zero!

So I changed ohci-q.c from donig this:

	if (ed->ed_prev = NULL) {
		if (!ed->hwNextED) {
			ohci->hc_control &= ~OHCI_CTRL_CLE;
			writel (ohci->hc_control, &ohci->regs->control);
		}
		writel (le32_to_cpup (&ed->hwNextED),
			&ohci->regs->ed_controlhead);
	} else ...

to this:

	if (ed->ed_prev = NULL) {
		if (!ed->hwNextED) {
			ohci->hc_control &= ~OHCI_CTRL_CLE;
			writel (ohci->hc_control, &ohci->regs->control);
		} else
			writel (le32_to_cpup (&ed->hwNextED),
				&ohci->regs->ed_controlhead);
	} else ...

and now there are no more crashes when plugging in the BTC keyboard.
(Note: the change is safe only if the ED being removed remains valid
until the clearing of the CLE is observed, i.e, until the next start
of frame, which is the case with my patch from yesterday.)

Now the next problem is to figure out why one of the URBs submitted by
the HID consistently times out the first time round.  (Oh, and there
is an infinite loop in hidinput_connect() which is trivial to fix.)

	--david

PS: It would have been nice if I had been smart enough to check the
    memory at address 0 upfront (as described above), but in truth, I
    found the problem in the reverse order by noticing that the
    machine died right after/during the ed_deschedule() and the
    clearing of the *HeadED register was just about the only thing
    left that could cause the crash.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09 23:32                                   ` David Mosberger
@ 2004-03-10  2:53                                     ` David Brownell
  2004-03-10  6:11                                       ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-10  2:53 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

David Mosberger wrote:

> Anyhow, I have been wondering since about Sunday whether it's really
> safe to write HcControlHeadED (and HcBulkHeadED) with 0.  The register
> description itself is ambiguous.  Now I'm finding that Figure 6-5
> "List Service Flow" and Section 6.4.2.2 "Locating Endpoint
> Descriptors" are outright contradictory!

OHCI isn't as tightly specified as one might like.  There are also
differences in how vendors implemented it (notably initialization);
but at least there don't seem to be major incompatibilities that
crept in between implementations.

> So, if the HC behaves as described in the text, then there is an
> obvious race:
> ....
> So I changed ohci-q.c ... to this:
> 
> 	if (ed->ed_prev = NULL) {
> 		if (!ed->hwNextED) {
> 			ohci->hc_control &= ~OHCI_CTRL_CLE;
> 			writel (ohci->hc_control, &ohci->regs->control);
> 		} else

(added the ELSE)

> 			writel (le32_to_cpup (&ed->hwNextED),
> 				&ohci->regs->ed_controlhead);
> 	} else ...
> 
> and now there are no more crashes when plugging in the BTC keyboard.

Interesting.  I had those "else"s in the patch I was testing too,
and they were literally the last "is this really needed?" change I
removed before posting that patch yesterday.  I had tested it on
four different implementations of OHCI (two on SMP) and the "else"
didn't make a difference ... as it might not, given the race
you identified.   This is a good example of something better
found with a fresh set of eyes looking at spec and code!

Looks like that's really needed then!  Whose OHCI silicon are
you testing with, by the way?  Good catch, regardless.

- Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-10  2:53                                     ` David Brownell
@ 2004-03-10  6:11                                       ` David Mosberger
  0 siblings, 0 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-10  6:11 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Tue, 09 Mar 2004 18:53:58 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> Whose OHCI silicon David.B> are you testing with, by the way?

lspci claims it's a NEC chip:

a0:01.0 USB Controller: NEC Corporation USB (rev 41) (prog-if 10 [OHCI])
        Subsystem: NEC Corporation USB
        Flags: bus master, medium devsel, latency 128, IRQ 62
        Memory at 00000000d0062000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: <available only to root>

  --david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-09 20:39                                 ` David Brownell
  2004-03-09 23:32                                   ` David Mosberger
@ 2004-03-10  6:59                                   ` David Mosberger
  2004-03-10 16:22                                     ` David Brownell
  1 sibling, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-10  6:59 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Tue, 09 Mar 2004 12:39:52 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> David Mosberger wrote:
  >>  > David Mosberger wrote: >> ...  add a new state >>

  >> ED_DESCHEDULED, which is treated exactly like ED_IDLE, except >>
  >> that in this state, the HC may still be referring to the ED in >>
  >> question.  Thus, if

  David.B> Sounds exactly like ED_UNLINK -- except maybe that it's not
  David.B> been put onto ed_rm_list (with ED_DEQUEUE set).  Why add
  David.B> another state?

  >>  So you can tell them apart.  See ohci_endpoint_disable().

  David.B> Not informative.  Why doesn't UNLINK work as-is?  If
  David.B> there's a bug in how that's handled, better to fix it.  I'd
  David.B> be willing to believe there is one, given proof.

The current OHCI relies on the internals of the dma_pool()
implementation.  If the implementation changed to, say, modify the
memory that is free or, heaven forbid, return the memory to the
kernel, you'd get _extremely_ difficult to track down race conditions.
Even so, the code isn't race-free, like I explained yesterday:

 - ed_alloc() clears the ED to 0 via memset()
 - the order in which memset() clears memory is undefined (various
   from platform to platform etc)
 - thus you might get a case where hwTailP is 0 but hwHeadP
   is non-zero, which would cause the HC to happily start
   dereferencing the descriptor

There are probably other potential issues. Frankly, I _don't_ want to
have to deal with this.  Especially considering that the alternative
costs virtually nothing, requires just a few source code changes, and
contains the EDs that are potentially still referenced by the HC to
the HC code itself.

  David.B> But some parts worry me.  Like changing that code to BUG()
  David.B> on a driver behavior that's perfectly reasonable;

I think the patch below incorporates your suggestions and fixes the
problems you pointed out.  In particular:

 - dropped debug printing
 - allow URB submission during ED_UNLINK again
 - add a big fat comment in ed_deschedule() why we must not update
   controlhead when the list goes empty (it's quite counter-intuitive
   to _not_ do it and I'm worried someone in the feature may want
   to "fix" this again...)

The rest should be the samae as yesterday.

The patch is still against 2.6.4-rc2 (and still contains your hwTailP
update fixes).

Please consider for inclusion.

	--david

=== drivers/usb/host/ohci-hcd.c 1.56 vs edited ==--- 1.56/drivers/usb/host/ohci-hcd.c	Tue Mar  2 05:52:40 2004
+++ edited/drivers/usb/host/ohci-hcd.c	Tue Mar  9 22:29:08 2004
@@ -239,8 +239,8 @@
 		goto fail;
 	}
 
-	/* schedule the ed if needed */
-	if (ed->state = ED_IDLE) {
+	if (ed->state = ED_IDLE || ed->state = ED_DESCHEDULED) {
+		/* schedule the ed if needed */
 		retval = ed_schedule (ohci, ed);
 		if (retval < 0)
 			goto fail0;
@@ -347,6 +347,10 @@
 	if (!HCD_IS_RUNNING (ohci->hcd.state))
 		ed->state = ED_IDLE;
 	switch (ed->state) {
+	case ED_DESCHEDULED:
+		start_ed_unlink (ohci, ed);
+		BUG_ON (ed->state != ED_UNLINK);
+		/* fall through */
 	case ED_UNLINK:		/* wait for hw to finish? */
 		/* major IRQ delivery trouble loses INTR_SF too... */
 		WARN_ON (limit-- = 0);
=== drivers/usb/host/ohci-q.c 1.48 vs edited ==--- 1.48/drivers/usb/host/ohci-q.c	Tue Mar  2 05:52:46 2004
+++ edited/drivers/usb/host/ohci-q.c	Tue Mar  9 22:27:04 2004
@@ -172,6 +172,7 @@
 	ed->ed_prev = 0;
 	ed->ed_next = 0;
 	ed->hwNextED = 0;
+	ed->hwINFO &= ~ED_SKIP;
 	wmb ();
 
 	/* we care about rm_list when setting CLE/BLE in case the HC was at
@@ -187,6 +188,8 @@
 	switch (ed->type) {
 	case PIPE_CONTROL:
 		if (ohci->ed_controltail = NULL) {
+			/* Writing of control-head is safe only if CLE is off. */
+			BUG_ON (ohci->hc_control & OHCI_CTRL_CLE);
 			writel (ed->dma, &ohci->regs->ed_controlhead);
 		} else {
 			ohci->ed_controltail->ed_next = ed;
@@ -203,6 +206,8 @@
 
 	case PIPE_BULK:
 		if (ohci->ed_bulktail = NULL) {
+			/* Writing of bulk-head is safe only if BLE is off. */
+			BUG_ON (ohci->hc_control & OHCI_CTRL_BLE);
 			writel (ed->dma, &ohci->regs->ed_bulkhead);
 		} else {
 			ohci->ed_bulktail->ed_next = ed;
@@ -267,10 +272,15 @@
 		ed, ed->branch, ed->load, ed->interval);
 }
 
-/* unlink an ed from one of the HC chains. 
+/* Deschedule an ed from one of the HC chains.
  * just the link to the ed is unlinked.
  * the link from the ed still points to another operational ed or 0
- * so the HC can eventually finish the processing of the unlinked ed
+ * so the HC can eventually finish the processing of the unlinked ed.
+ * The ED isn't idle after returning from this routine, because the HC
+ * may continue to refer to it, but as far as the HCD is concerned,
+ * the ED is reusable.  If it is necessary to get the ED into the
+ * ED_IDLE state, the full unlink-sequence needs to be performed
+ * (via start_ed_unlink()).
  */
 static void ed_deschedule (struct ohci_hcd *ohci, struct ed *ed) 
 {
@@ -278,20 +288,33 @@
 
 	switch (ed->type) {
 	case PIPE_CONTROL:
+		/* remove ED from the HC's list: */
 		if (ed->ed_prev = NULL) {
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_CLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_controlcurrent);
-				// post those pci writes
-				(void) readl (&ohci->regs->control);
-			}
-			writel (le32_to_cpup (&ed->hwNextED),
-				&ohci->regs->ed_controlhead);
+				/*
+				 * Caveat: even though the list is
+				 * empty now, we MUST NOT write 0 to
+				 * controlhead.  The OHCI spec is
+				 * ambiguous on this point (Figure 6-5
+				 * and Section 6.4.2.2 specify
+				 * conflicting behaviors), but there
+				 * are definitely HCs out there which
+				 * will happily try to process an ED
+				 * from address 0.  It's OK not to
+				 * update controlhead because we leave
+				 * the ED alive for long enough that
+				 * this is safe.
+				 */
+			} else
+				writel (le32_to_cpup (&ed->hwNextED),
+					&ohci->regs->ed_controlhead);
 		} else {
 			ed->ed_prev->ed_next = ed->ed_next;
 			ed->ed_prev->hwNextED = ed->hwNextED;
 		}
+		/* remove ED from the HCD's list: */
 		if (ohci->ed_controltail = ed) {
 			ohci->ed_controltail = ed->ed_prev;
 			if (ohci->ed_controltail)
@@ -302,20 +325,33 @@
 		break;
 
 	case PIPE_BULK:
+		/* remove ED from the singly-linked HC's list: */
 		if (ed->ed_prev = NULL) {
 			if (!ed->hwNextED) {
 				ohci->hc_control &= ~OHCI_CTRL_BLE;
 				writel (ohci->hc_control, &ohci->regs->control);
-				writel (0, &ohci->regs->ed_bulkcurrent);
-				// post those pci writes
-				(void) readl (&ohci->regs->control);
-			}
-			writel (le32_to_cpup (&ed->hwNextED),
-				&ohci->regs->ed_bulkhead);
+				/*
+				 * Caveat: even though the list is
+				 * empty now, we MUST NOT write 0 to
+				 * bulkhead.  The OHCI spec is
+				 * ambiguous on this point (Figure 6-5
+				 * and Section 6.4.2.2 specify
+				 * conflicting behaviors), but there
+				 * are definitely HCs out there which
+				 * will happily try to process an ED
+				 * from address 0.  It's OK not to
+				 * update controlhead because we leave
+				 * the ED alive for long enough that
+				 * this is safe.
+				 */
+			} else
+				writel (le32_to_cpup (&ed->hwNextED),
+					&ohci->regs->ed_bulkhead);
 		} else {
 			ed->ed_prev->ed_next = ed->ed_next;
 			ed->ed_prev->hwNextED = ed->hwNextED;
 		}
+		/* remove ED from the HCD's list: */
 		if (ohci->ed_bulktail = ed) {
 			ohci->ed_bulktail = ed->ed_prev;
 			if (ohci->ed_bulktail)
@@ -331,6 +367,12 @@
 		periodic_unlink (ohci, ed);
 		break;
 	}
+	ed->state = ED_DESCHEDULED;
+	/*
+	 * Ensure HCD sees these updates (in particular update of
+	 * hwINFO) before any following updates.
+	 */
+	wmb ();
 }
 
 
@@ -387,7 +429,7 @@
 	/* NOTE: only ep0 currently needs this "re"init logic, during
 	 * enumeration (after set_address, or if ep0 maxpacket >8).
 	 */
-  	if (ed->state = ED_IDLE) {
+  	if (ed->state = ED_IDLE || ed->state = ED_DESCHEDULED) {
 		u32	info;
 
 		info = usb_pipedevice (pipe);
@@ -429,15 +471,35 @@
  */
 static void start_ed_unlink (struct ohci_hcd *ohci, struct ed *ed)
 {    
+	u32 mask = 0;
+
+	/* deschedule ED as far as the HCD is concerned: */
+	ed_deschedule (ohci, ed);
+
+	/* now initiate the unlink sequence to ensure the HC isn't using the ED anymore: */
+	if (ed->type = PIPE_CONTROL)
+		mask = OHCI_CTRL_CLE;
+	else if (ed->type = PIPE_BULK)
+		mask = OHCI_CTRL_BLE;
+	if (mask) {
+		/*
+		 * Disable scanning of the control or bulk list; this
+		 * ensures that when we get an interrupt with a frame
+		 * # greater than the current frame #, the HC isn't
+		 * using the list anymore.
+		 */
+		ohci->hc_control &= ~mask;
+		writel (ohci->hc_control, &ohci->regs->control);
+	}
 	ed->hwINFO |= ED_DEQUEUE;
 	ed->state = ED_UNLINK;
-	ed_deschedule (ohci, ed);
 
 	/* SF interrupt might get delayed; record the frame counter value that
 	 * indicates when the HC isn't looking at it, so concurrent unlinks
 	 * behave.  frame_no wraps every 2^16 msec, and changes right before
 	 * SF is triggered.
 	 */
+	rmb();	/* ensure ed_deschedule() work is done before we read the frame # */
 	ed->tick = OHCI_FRAME_NO(ohci->hcca) + 1;
 
 	/* rm_list is just singly linked, for simplicity */
@@ -452,6 +514,7 @@
 		// flush those pci writes
 		(void) readl (&ohci->regs->control);
 	}
+	/* ??? What if HCD isn't running?  Shouldn't that be handled as well?  */
 }
 
 /*-------------------------------------------------------------------------*
@@ -614,6 +677,7 @@
 		info = TD_CC | TD_DP_SETUP | TD_T_DATA0;
 		td_fill (ohci, info, urb->setup_dma, 8, urb, cnt++);
 		if (data_len > 0) {
+			BUG_ON(data_len >= 4096);
 			info = TD_CC | TD_R | TD_T_DATA1;
 			info |= is_out ? TD_DP_OUT : TD_DP_IN;
 			/* NOTE:  mishandles transfers >8K, some >4K */
@@ -794,8 +858,6 @@
 		next->next_dl_td = rev;	
 		rev = next;
 
-		if (ed->hwTailP = cpu_to_le32 (next->td_dma))
-			ed->hwTailP = next->hwNextTD;
 		ed->hwHeadP = next->hwNextTD | toggle;
 	}
 
@@ -941,12 +1003,7 @@
 				continue;
 			}
 
-			/* patch pointers hc uses ... tail, if we're removing
-			 * an otherwise active td, and whatever td pointer
-			 * points to this td
-			 */
-			if (ed->hwTailP = cpu_to_le32 (td->td_dma))
-				ed->hwTailP = td->hwNextTD;
+			/* patch pointers hc uses  */
 			savebits = *prev & ~cpu_to_le32 (TD_MASK);
 			*prev = td->hwNextTD | savebits;
 
@@ -957,7 +1014,7 @@
 			/* if URB is done, clean up */
 			if (urb_priv->td_cnt = urb_priv->length) {
 				modified = completed = 1;
-				finish_urb (ohci, urb, regs);
+				finish_urb (ohci, urb, regs);	/* this drops ohci->lock! */
 			}
 		}
 		if (completed && !list_empty (&ed->td_list))
@@ -1039,9 +1096,9 @@
   		if (urb_priv->td_cnt = urb_priv->length)
   			finish_urb (ohci, urb, regs);
 
-		/* clean schedule:  unlink EDs that are no longer busy */
+		/* clean schedule:  deschedule EDs that are no longer busy */
 		if (list_empty (&ed->td_list))
-			start_ed_unlink (ohci, ed);
+			ed_deschedule (ohci, ed);
 		/* ... reenabling halted EDs only after fault cleanup */
 		else if ((ed->hwINFO & (ED_SKIP | ED_DEQUEUE)) = ED_SKIP) {
 			td = list_entry (ed->td_list.next, struct td, td_list);
=== drivers/usb/host/ohci.h 1.19 vs edited ==--- 1.19/drivers/usb/host/ohci.h	Wed Feb 11 03:42:39 2004
+++ edited/drivers/usb/host/ohci.h	Mon Mar  8 23:03:31 2004
@@ -48,6 +48,7 @@
 #define ED_IDLE 	0x00		/* NOT linked to HC */
 #define ED_UNLINK 	0x01		/* being unlinked from hc */
 #define ED_OPER		0x02		/* IS linked to hc */
+#define ED_DESCHEDULED	0x03		/* idle for HCD, but HC may still hold reference to it */
 
 	u8			type; 		/* PIPE_{BULK,...} */
 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-10  6:59                                   ` David Mosberger
@ 2004-03-10 16:22                                     ` David Brownell
  2004-03-10 18:04                                       ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-10 16:22 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

I'll send out a revised patch later, thanks!  It's also good
this code got a more careful read.  It seems like some things
are not as obvious as I might like...

That patch will merge those list corruption fixes I sent, the
"else" you verified was needed (ugh!!!), and some of what you
include here.   It won't add new BUG_ON calls (WARN at worst)
or a duplicate ED state (see next).

>   >>   ...  add a new state
>   >> ED_DESCHEDULED, which is treated exactly like ED_IDLE, except
>   >> that in this state, the HC may still be referring to the ED in
>   >> question.  Thus, if
> 
>   David.B> Sounds exactly like ED_UNLINK -- except maybe that it's not
>   David.B> been put onto ed_rm_list (with ED_DEQUEUE set).  Why add
>   David.B> another state?
>   ...
> 
> The current OHCI relies on the internals of the dma_pool()
> implementation.  If the implementation changed to, say, modify the
> memory that is free or, heaven forbid, return the memory to the
> kernel, you'd get _extremely_ difficult to track down race conditions.

The current implementation _does_ poison memory on free, if
slab poisoning is enabled.  (That's why I asked if you were
using it.)

And that's been quite handy for reporting list corruption bugs,
from races or otherwise. But those list corruption bugs hit a
blind spot in that code:  it doesn't check for modify-after-free.
Which is why those bugs were able to hide for so long!

It'd be good if you said _how_ you think it relies on such
internals.  Some of your debug diagnostics wrongly claimed
allocation of "new" EDs when they were just being re-used.
That'd make intentional/safe re-use look like a bug or race.

> Even so, the code isn't race-free, like I explained yesterday:
> 
>  - ed_alloc() clears the ED to 0 via memset()
>  - the order in which memset() clears memory is undefined (various
>    from platform to platform etc)

There's a wmb() before any ED is handed off to the OHCI silicon;
that forces a defined order.  Top of ed_schedule().  First use,
or Nth re-use; no matter.

>  - thus you might get a case where hwTailP is 0 but hwHeadP
>    is non-zero, which would cause the HC to happily start
>    dereferencing the descriptor

If you assume a bug where the ED is freed but still in use,
that's hardly the only thing that'd go wrong!!  You can't
use such a potential bug to prove something else is broken.

- Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-10 16:22                                     ` David Brownell
@ 2004-03-10 18:04                                       ` David Mosberger
  2004-03-11  2:43                                         ` David Brownell
  0 siblings, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-10 18:04 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Wed, 10 Mar 2004 08:22:26 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> It won't add new BUG_ON calls (WARN at worst)

I put them there mostly as assertions.  What I'd really want there is
a DEBUG_BUG_ON, which is more like assert() in user-land (i.e.,
production code would drop the checks).  WARN_ON() would be fine, too.

  >> The current OHCI relies on the internals of the dma_pool()
  >> implementation.  If the implementation changed to, say, modify
  >> the memory that is free or, heaven forbid, return the memory to
  >> the kernel, you'd get _extremely_ difficult to track down race
  >> conditions.

  David.B> It'd be good if you said _how_ you think it relies on such
  David.B> internals.

I thought I did.  Suppose somebody changed the dma_pool code such that
it would overwrite freed memory with an 0xf00000000000000 pattern.  If
the HC can still hold a reference to a freed ED (it can without my
patch), the HC could see this kind of ED:

 hw=(info\0000000 tailpð000000 headp\0000000 nextEDð000000)

If so, the HC would go ahead and try to interpret the memory at
address 0 as a transfer descriptor.  Depending on the memory contents,
this could cause silent data corruption at an arbitrary address.

  >> - thus you might get a case where hwTailP is 0 but hwHeadP is
  >> non-zero, which would cause the HC to happily start dereferencing
  >> the descriptor

  David.B> If you assume a bug where the ED is freed but still in use,
  David.B> that's hardly the only thing that'd go wrong!!  You can't
  David.B> use such a potential bug to prove something else is broken.

You lost me here.  All I'm saying is that the current code has a
dangerous race that can corrupt memory, crash machines, or have all
sorts of other wild side-effects.  I never claimed this bug had
anything todo with the BTC keyboard problem.

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-10 18:04                                       ` David Mosberger
@ 2004-03-11  2:43                                         ` David Brownell
  2004-03-11  5:35                                           ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-11  2:43 UTC (permalink / raw)
  To: davidm
  Cc: Grant Grundler, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

David Mosberger wrote:

>   >> The current OHCI relies on the internals of the dma_pool()
>   >> implementation.  ...
>   David.B> It'd be good if you said _how_ you think it relies on such
>   David.B> internals.
> 
> I thought I did.  Suppose somebody changed the dma_pool code such that
> it would overwrite freed memory with an 0xf00000000000000 pattern. 

Erm, _anything_ the dma_pool code does with freed memory is legal.
Even the old "monkeys flying out of the back of the server" trick!  :)


Anyway, please (a) see if 2.6.4 works for you, and (b) direct any
future followups on this thread _only_ to linux-usb-devel.

- Dave



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-11  2:43                                         ` David Brownell
@ 2004-03-11  5:35                                           ` David Mosberger
  0 siblings, 0 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-11  5:35 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Grant Grundler, Greg KH, vojtech, linux-usb-devel,
	linux-kernel, linux-ia64, pochini

>>>>> On Wed, 10 Mar 2004 18:43:25 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> David Mosberger wrote:

  >> The current OHCI relies on the internals of the dma_pool()
  >> implementation.  ...

  David.B> It'd be good if you said _how_ you think it relies on such
  David.B> internals.

  >>  I thought I did.  Suppose somebody changed the dma_pool code
  >> such that it would overwrite freed memory with an
  >> 0xf00000000000000 pattern.

  David.B> Erm, _anything_ the dma_pool code does with freed memory is
  David.B> legal.

Glad to see we agree on that!

  David.B> Anyway, please (a) see if 2.6.4 works for you, and (b)
  David.B> direct any future followups on this thread _only_ to
  David.B> linux-usb-devel.

The patch that's in 2.6.4 definitely looks like a step in the right
direction.  I still have some concerns.  I'll follow up on
linux-usb-devel with more details.

Thanks,

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  4:55               ` David Brownell
  2004-03-06  5:49                 ` David Mosberger
@ 2004-03-06  9:17                 ` David Mosberger
  2004-03-06 17:30                   ` David Brownell
  1 sibling, 1 reply; 25+ messages in thread
From: David Mosberger @ 2004-03-06  9:17 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

>>>>> On Fri, 05 Mar 2004 20:55:01 -0800, David Brownell <david-b@pacbell.net> said:

  >> Does this sound plausible?

  David.B> Parts of it.  There's definite recent nastiness.  Of the
  David.B> type that other eyes sometimes see better.

Here is patch #3.  It also Works For Me.  I was wondering whether it
it is really safe to mess with the OHCI control registers the way
ed_deschedule() does at a time the OHCI is running.  To test this
theory, I delayed the ed_deschedule() handling to finish_unlinks(), as
shown in the patch below.  I don't know whether this is really safe as
far as the host's lists are concerned, but it does avoid the crashes.

What's the argument as to why it's safe to update the OHCI control
registers in ed_deschedule() at the time start_ed_unlink() is running?

	--david

=== drivers/usb/host/ohci-q.c 1.48 vs edited ==--- 1.48/drivers/usb/host/ohci-q.c	Tue Mar  2 05:52:46 2004
+++ edited/drivers/usb/host/ohci-q.c	Sat Mar  6 01:09:16 2004
@@ -274,7 +274,10 @@
  */
 static void ed_deschedule (struct ohci_hcd *ohci, struct ed *ed) 
 {
+#if 0
 	ed->hwINFO |= ED_SKIP;
+	wmb();
+#endif

 	switch (ed->type) {
 	case PIPE_CONTROL:
@@ -431,7 +434,12 @@
 {    
 	ed->hwINFO |= ED_DEQUEUE;
 	ed->state = ED_UNLINK;
+#if 0
 	ed_deschedule (ohci, ed);
+#else
+	ed->hwINFO |= ED_SKIP;
+	wmb();
+#endif

 	/* SF interrupt might get delayed; record the frame counter value that
 	 * indicates when the HC isn't looking at it, so concurrent unlinks
@@ -896,6 +904,11 @@
 				last = &ed->ed_next;
 				continue;
 			}
+
+#if 0
+#else
+			ed_deschedule (ohci, ed);
+#endif

 			if (!list_empty (&ed->td_list)) {
 				struct td	*td;

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06  9:17                 ` David Mosberger
@ 2004-03-06 17:30                   ` David Brownell
  2004-03-08 18:49                     ` David Mosberger
  0 siblings, 1 reply; 25+ messages in thread
From: David Brownell @ 2004-03-06 17:30 UTC (permalink / raw)
  To: davidm; +Cc: Greg KH, vojtech, linux-usb-devel, linux-kernel, linux-ia64,
	pochini

David Mosberger wrote:

> Here is patch #3.  It also Works For Me.  I was wondering whether it

I've had several "Works For Me" patches too, but then if the
silicion got kicked a bit differently it'd not behave... :(

> it is really safe to mess with the OHCI control registers the way
> ed_deschedule() does at a time the OHCI is running.  To test this

It must be, or we'd not have had a driver working for several
years now!

The quick stop/restart cycles haven't seemed to be a problem
with any OHCI silicion in the way they are with, for example,
VIA EHCI.

> theory, I delayed the ed_deschedule() handling to finish_unlinks(), as
> shown in the patch below.  I don't know whether this is really safe as
> far as the host's lists are concerned, but it does avoid the crashes.

My suspicions have been focussing on finish_unlinks().

That's really the only place the HCD does anything
that could corrupt the ED queues, which is what looks
to be happening.

Your change doesn't actually _unlink_ in the same way;
interesting change, I'll have to think about it.  It
certainly changes timings.

> What's the argument as to why it's safe to update the OHCI control
> registers in ed_deschedule() at the time start_ed_unlink() is running?

It's always safe to update those registers, except
that some silicon doesn't support that while the
controller is suspended.

- Dave

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [linux-usb-devel] Re: serious 2.6 bug in USB subsystem?
  2004-03-06 17:30                   ` David Brownell
@ 2004-03-08 18:49                     ` David Mosberger
  0 siblings, 0 replies; 25+ messages in thread
From: David Mosberger @ 2004-03-08 18:49 UTC (permalink / raw)
  To: David Brownell
  Cc: davidm, Greg KH, vojtech, linux-usb-devel, linux-kernel,
	linux-ia64, pochini

>>>>> On Sat, 06 Mar 2004 09:30:31 -0800, David Brownell <david-b@pacbell.net> said:

  David.B> David Mosberger wrote:

  >> Here is patch #3.  It also Works For Me.  I was wondering whether
  >> it

  David.B> I've had several "Works For Me" patches too, but then if
  David.B> the silicion got kicked a bit differently it'd not
  David.B> behave... :(

Just for completeness, I can confirm this.  Depending on the exact USB
configuration (varying number and types of devices connected via a
varying number of hubs), there are still problems.  So clearly my
patches didn't address the root cause (yet).

	--david

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2004-03-11  5:35 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <200310272235.h9RMZ9x1000602@napali.hpl.hp.com>
     [not found] ` <20031028013013.GA3991@kroah.com>
     [not found]   ` <200310280300.h9S30Hkw003073@napali.hpl.hp.com>
     [not found]     ` <3FA12A2E.4090308@pacbell.net>
     [not found]       ` <16289.29015.81760.774530@napali.hpl.hp.com>
     [not found]         ` <16289.55171.278494.17172@napali.hpl.hp.com>
     [not found]           ` <3FA28C9A.5010608@pacbell.net>
2004-03-06  2:08             ` [linux-usb-devel] Re: serious 2.6 bug in USB subsystem? David Mosberger
2004-03-06  2:13               ` David Mosberger
2004-03-06  4:55               ` David Brownell
2004-03-06  5:49                 ` David Mosberger
2004-03-06  7:21                   ` David Mosberger
2004-03-06  8:39                     ` David Mosberger
2004-03-06 16:37                   ` David Brownell
2004-03-08  6:18                     ` Grant Grundler
2004-03-08 18:58                       ` David Mosberger
2004-03-08 21:48                         ` David Brownell
2004-03-09  9:15                           ` David Mosberger
2004-03-09 17:36                             ` David Brownell
2004-03-09 17:58                               ` David Mosberger
2004-03-09 20:39                                 ` David Brownell
2004-03-09 23:32                                   ` David Mosberger
2004-03-10  2:53                                     ` David Brownell
2004-03-10  6:11                                       ` David Mosberger
2004-03-10  6:59                                   ` David Mosberger
2004-03-10 16:22                                     ` David Brownell
2004-03-10 18:04                                       ` David Mosberger
2004-03-11  2:43                                         ` David Brownell
2004-03-11  5:35                                           ` David Mosberger
2004-03-06  9:17                 ` David Mosberger
2004-03-06 17:30                   ` David Brownell
2004-03-08 18:49                     ` David Mosberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox