[BUG] pxa27x_udc: possible recursive locking detected in pxa_ep

linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
@ 2009-12-05 10:57 Antonio Ospite
  2009-12-06 18:34 ` Robert Jarzmik
  2009-12-12 14:28 ` Robert Jarzmik
  0 siblings, 2 replies; 17+ messages in thread
From: Antonio Ospite @ 2009-12-05 10:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I've run into this recently, I get it with 2.6.32 (plus some code for
the EZX platform) especially using ROOT_NFS over usblan. It looks like
I can also trigger it regularly by connecting and disconnecting usb
cable repeatedly while the kernel on the pxa system is loading
(in a _non_ ROOT_NFS scenario).

=============================================
[ INFO: possible recursive locking detected ]
2.6.32-ezx200912031127 #3
---------------------------------------------
portmap/579 is trying to acquire lock:
 (&ep->lock){-.....}, at: [<c019c640>] pxa_ep_queue+0x78/0x2ec

but task is already holding lock:
 (&ep->lock){-.....}, at: [<c019c640>] pxa_ep_queue+0x78/0x2ec

other info that might help us debug this:
2 locks held by portmap/579:
 #0:  (uids_mutex){+.+...}, at: [<c004e8c8>] alloc_uid+0x28/0x140
 #1:  (&ep->lock){-.....}, at: [<c019c640>] pxa_ep_queue+0x78/0x2ec

stack backtrace:
[<c002febc>] (unwind_backtrace+0x0/0xe0) from [<c0068484>] (validate_chain+0x5b0/0xdd8)
[<c0068484>] (validate_chain+0x5b0/0xdd8) from [<c00694ec>] (__lock_acquire+0x840/0x918)
[<c00694ec>] (__lock_acquire+0x840/0x918) from [<c006a49c>] (lock_acquire+0x60/0x74)
[<c006a49c>] (lock_acquire+0x60/0x74) from [<c02a4c6c>] (_spin_lock_irqsave+0x4c/0x60)
[<c02a4c6c>] (_spin_lock_irqsave+0x4c/0x60) from [<c019c640>] (pxa_ep_queue+0x78/0x2ec)
[<c019c640>] (pxa_ep_queue+0x78/0x2ec) from [<c019d450>] (rx_submit+0xd0/0x144)
[<c019d450>] (rx_submit+0xd0/0x144) from [<c019b63c>] (req_done+0x30/0x38)
[<c019b63c>] (req_done+0x30/0x38) from [<c019b900>] (handle_ep+0x218/0x228)
[<c019b900>] (handle_ep+0x218/0x228) from [<c019c84c>] (pxa_ep_queue+0x284/0x2ec)
[<c019c84c>] (pxa_ep_queue+0x284/0x2ec) from [<c019d450>] (rx_submit+0xd0/0x144)
[<c019d450>] (rx_submit+0xd0/0x144) from [<c019b63c>] (req_done+0x30/0x38)
[<c019b63c>] (req_done+0x30/0x38) from [<c019b900>] (handle_ep+0x218/0x228)
[<c019b900>] (handle_ep+0x218/0x228) from [<c019c240>] (pxa_udc_irq+0x74c/0x80c)
[<c019c240>] (pxa_udc_irq+0x74c/0x80c) from [<c0075ccc>] (handle_IRQ_event+0x28/0xf8)
[<c0075ccc>] (handle_IRQ_event+0x28/0xf8) from [<c0077f98>] (handle_level_irq+0x118/0x130)
[<c0077f98>] (handle_level_irq+0x118/0x130) from [<c0029070>] (asm_do_IRQ+0x70/0x94)
[<c0029070>] (asm_do_IRQ+0x70/0x94) from [<c0029ad0>] (__irq_svc+0x50/0xe0)
Exception stack(0xccad7f10 to 0xccad7f58)
7f00:                                     00000001 cc8ed5d0 00000110 20000013
7f20: c03aa88c 00000001 00000001 00000000 c03aa3d8 ccad6000 00000000 bef51c54
7f40: ccad6000 ccad7f58 c0066e08 c02a4934 20000013 ffffffff
[<c0029ad0>] (__irq_svc+0x50/0xe0) from [<c02a4934>] (_spin_unlock_irq+0x30/0x58)
[<c02a4934>] (_spin_unlock_irq+0x30/0x58) from [<c004e8e8>] (alloc_uid+0x48/0x140)
[<c004e8e8>] (alloc_uid+0x48/0x140) from [<c00527c0>] (set_user+0x28/0x9c)
[<c00527c0>] (set_user+0x28/0x9c) from [<c005426c>] (sys_setuid+0x58/0xc4)
[<c005426c>] (sys_setuid+0x58/0xc4) from [<c0029fc0>] (ret_fast_syscall+0x0/0x34)
BUG: spinlock lockup on CPU#0, portmap/579, c03c1d98
[<c002febc>] (unwind_backtrace+0x0/0xe0) from [<c013ebf4>] (_raw_spin_lock+0xe8/0x124)
[<c013ebf4>] (_raw_spin_lock+0xe8/0x124) from [<c02a4c74>] (_spin_lock_irqsave+0x54/0x60)
[<c02a4c74>] (_spin_lock_irqsave+0x54/0x60) from [<c019c640>] (pxa_ep_queue+0x78/0x2ec)
[<c019c640>] (pxa_ep_queue+0x78/0x2ec) from [<c019d450>] (rx_submit+0xd0/0x144)
[<c019d450>] (rx_submit+0xd0/0x144) from [<c019b63c>] (req_done+0x30/0x38)
[<c019b63c>] (req_done+0x30/0x38) from [<c019b900>] (handle_ep+0x218/0x228)
[<c019b900>] (handle_ep+0x218/0x228) from [<c019c84c>] (pxa_ep_queue+0x284/0x2ec)
[<c019c84c>] (pxa_ep_queue+0x284/0x2ec) from [<c019d450>] (rx_submit+0xd0/0x144)
[<c019d450>] (rx_submit+0xd0/0x144) from [<c019b63c>] (req_done+0x30/0x38)
[<c019b63c>] (req_done+0x30/0x38) from [<c019b900>] (handle_ep+0x218/0x228)
[<c019b900>] (handle_ep+0x218/0x228) from [<c019c240>] (pxa_udc_irq+0x74c/0x80c)
[<c019c240>] (pxa_udc_irq+0x74c/0x80c) from [<c0075ccc>] (handle_IRQ_event+0x28/0xf8)
[<c0075ccc>] (handle_IRQ_event+0x28/0xf8) from [<c0077f98>] (handle_level_irq+0x118/0x130)

Regards,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091205/d0c4afbd/attachment.sig>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-05 10:57 [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue Antonio Ospite
@ 2009-12-06 18:34 ` Robert Jarzmik
  2009-12-06 20:01   ` Alan Stern
  2009-12-06 20:13   ` David Brownell
  2009-12-12 14:28 ` Robert Jarzmik
  1 sibling, 2 replies; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-06 18:34 UTC (permalink / raw)
  To: linux-arm-kernel

Antonio Ospite <ospite@studenti.unina.it> writes:

> Hi,
>
> I've run into this recently, I get it with 2.6.32 (plus some code for
> the EZX platform) especially using ROOT_NFS over usblan. It looks like
> I can also trigger it regularly by connecting and disconnecting usb
> cable repeatedly while the kernel on the pxa system is loading
> (in a _non_ ROOT_NFS scenario).

Your discovery is very ... unfortunate for me.
What you discovered is a real locking issue in pxa27x_udc, which can be
outlined as :

 1) an irq comes in for endpoint 1 (OUT endpoint)
 2) irq handler kick in
    handle_ep()
 3) the packet is smaller than the endpoint fifo
    3a) it gets read fully
    3b) it's a usb short packet
    3c) the transfer is completed
    req_done() is called
 4) req_done() calls gadget layer
    req->req.complete()
 5) gadget layer complete() function pushes another request to pxa27x_udc
    (notice we're still in the irq handler)
    pxa_ep_queue()
    (notice we take the ep->lock)
 6) pxa27x_udc calls handle_ep()
 7) same as (3)
 8) same as (4)
 9) same as (5)
    => here, pxa_ep_queue() tries to take the ep->lock twice !!!
    => this is the deadlock

Summary is :
 irq_handler
   \
    -> gadget.complete()
       \
        -> pxa27x_udc.pxa_ep_queue() : implies ep->lock is taken
           \
            -> gadget.complete()
               \
                -> pxa27x_udc.pxa_ep_queue() : implies ep->lock is attempted
                                               ==> *deadlock*

The point here an architectural one : can the gadget layer, in its completion
method, call endpoint queuing methods ?

If so, when nuke() is called, gadget_complete() is always called, which could
call request queuing, etc ..., which will become an infinite loop.

I may modify the locking model of pxa27x_udc : whenether I call the gadget
complete() method, I relax the ep->lock, and take it just after. That makes me a
bit nervous, but I'll do it if this is the thing to do.

David, could you give me the point of view of the gadget architecture please ?

Cheers.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-06 18:34 ` Robert Jarzmik
@ 2009-12-06 20:01   ` Alan Stern
  2009-12-06 20:23     ` David Brownell
  2009-12-06 20:13   ` David Brownell
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Stern @ 2009-12-06 20:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 6 Dec 2009, Robert Jarzmik wrote:

> Your discovery is very ... unfortunate for me.
> What you discovered is a real locking issue in pxa27x_udc, which can be
> outlined as :
> 
>  1) an irq comes in for endpoint 1 (OUT endpoint)
>  2) irq handler kick in
>     handle_ep()
>  3) the packet is smaller than the endpoint fifo
>     3a) it gets read fully
>     3b) it's a usb short packet
>     3c) the transfer is completed
>     req_done() is called
>  4) req_done() calls gadget layer
>     req->req.complete()
>  5) gadget layer complete() function pushes another request to pxa27x_udc
>     (notice we're still in the irq handler)
>     pxa_ep_queue()
>     (notice we take the ep->lock)
>  6) pxa27x_udc calls handle_ep()
>  7) same as (3)
>  8) same as (4)
>  9) same as (5)
>     => here, pxa_ep_queue() tries to take the ep->lock twice !!!
>     => this is the deadlock
> 
> Summary is :
>  irq_handler
>    \
>     -> gadget.complete()
>        \
>         -> pxa27x_udc.pxa_ep_queue() : implies ep->lock is taken
>            \
>             -> gadget.complete()
>                \
>                 -> pxa27x_udc.pxa_ep_queue() : implies ep->lock is attempted
>                                                ==> *deadlock*
> 
> The point here an architectural one : can the gadget layer, in its completion
> method, call endpoint queuing methods ?
> 
> If so, when nuke() is called, gadget_complete() is always called, which could
> call request queuing, etc ..., which will become an infinite loop.
> 
> I may modify the locking model of pxa27x_udc : whenether I call the gadget
> complete() method, I relax the ep->lock, and take it just after. That makes me a
> bit nervous, but I'll do it if this is the thing to do.

That's what other device controller drivers do.  There's no choice.  
(Except that I'm not sure they have individual locks for endpoints -- 
just one single big spinlock.)

In fact, host controller drivers do the analogous thing.  When they
give back completed URBs, they release their private spinlocks.

> David, could you give me the point of view of the gadget architecture please ?

Alan Stern

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-06 20:01   ` Alan Stern
@ 2009-12-06 20:23     ` David Brownell
  2009-12-10 17:58       ` Robert Jarzmik
  0 siblings, 1 reply; 17+ messages in thread
From: David Brownell @ 2009-12-06 20:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 06 December 2009, Alan Stern wrote:
> That's what other device controller drivers do. ?There's no choice. ?
> (Except that I'm not sure they have individual locks for endpoints -- 
> just one single big spinlock.)

Right; such fine grain locking tends to be more hassle than it's
worth.  ISTR studies from a while back showing locks need to be
quite heavily contended before splitting them is worth much.  Any
per-controller lock for a UDC is unlikely to see much contention.

> In fact, host controller drivers do the analogous thing. ?When they
> give back completed URBs, they release their private spinlocks.

For *exactly* the same reason.  Upper level driver code shouldn't
need to worry about whether it's re-entering some random bit of
lower level code.  As a rule, it can't know about that code.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-06 20:23     ` David Brownell
@ 2009-12-10 17:58       ` Robert Jarzmik
  2009-12-10 21:01         ` David Brownell
  0 siblings, 1 reply; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-10 17:58 UTC (permalink / raw)
  To: linux-arm-kernel

David Brownell <david-b@pacbell.net> writes:

> Right; such fine grain locking tends to be more hassle than it's
> worth.  ISTR studies from a while back showing locks need to be
> quite heavily contended before splitting them is worth much.  Any
> per-controller lock for a UDC is unlikely to see much contention.
I would gladly look at a reference for the studies if you had it somewhere.

The pxa27x_udc driver is not in a position where parallelism will be a problem
(as there are no pxa multi-core device AFAIK). But if there was an evolution
with multiple cores, slow enough to be unable to saturate the USB bandwith
alone, but which could saturate it with both cores attacking the UDC, the
trouble of per-ep spinlock would be worth it ...

Well, maybe a future 48-cores pxa4xx, who knows ? :)

As a side note, from a very quick glance at pxa25x_udc and omap_udc, I'm a bit
surprised they didn't suffer from the same behaviour.

Cheers.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-10 17:58       ` Robert Jarzmik
@ 2009-12-10 21:01         ` David Brownell
  0 siblings, 0 replies; 17+ messages in thread
From: David Brownell @ 2009-12-10 21:01 UTC (permalink / raw)
  To: linux-arm-kernel

> As a side note, from a very quick glance at pxa25x_udc and omap_udc, I'm a bit
> surprised they didn't suffer from the same behaviour.

Haven't touched either of those drivers in years.  But a really
quick glance shows pxa25x_udc::done() setting a "stopped" flag,
and omap_udc::done() doing the same, before calling out to the
completion callback.

- Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-06 18:34 ` Robert Jarzmik
  2009-12-06 20:01   ` Alan Stern
@ 2009-12-06 20:13   ` David Brownell
  2009-12-10 17:49     ` Robert Jarzmik
  1 sibling, 1 reply; 17+ messages in thread
From: David Brownell @ 2009-12-06 20:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Sunday 06 December 2009, Robert Jarzmik wrote:
> I may modify the locking model of pxa27x_udc : whenether I call the gadget
> complete() method, I relax the ep->lock, and take it just after. That makes me a
> bit nervous, but I'll do it if this is the thing to do.
> 
> David, could you give me the point of view of the gadget architecture please ?

Dropping the lock before complete() is exactly the right thing to do.

But make sure your ep_queue() method understands that sometimes it's
only supposed to *queue* and not activate tx/rx.

So the IRQ handler sets some "don't touch the hardware" flag before
it drops that lock, reentrant queue() sees it, complete() returns with
the only change being queue updates ... and before the IRQ handler
returns, it looks to see if there's work to be done on that EP.

Most UDC drivers don't have hardware queues to worry about, so the
completion callback should also be able to dequeue() requests that
aren't yet being handled by the HW ... another nuance, but one that
won't usually mean extra work.

- Dave

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-06 20:13   ` David Brownell
@ 2009-12-10 17:49     ` Robert Jarzmik
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-10 17:49 UTC (permalink / raw)
  To: linux-arm-kernel

David Brownell <david-b@pacbell.net> writes:

> On Sunday 06 December 2009, Robert Jarzmik wrote:
> Dropping the lock before complete() is exactly the right thing to do.
Good, just as Alan said too.

> But make sure your ep_queue() method understands that sometimes it's
> only supposed to *queue* and not activate tx/rx.
Yes, I understand. A kind of atomic count, on all methods called from gadget,
which prevents reentrance (ie. no chains queue()->...->complete()->queue()).

As I have a bit of luck, all transfers are triggered through handle_ep(), which
if not called won't trigger any rx/tx (if counter was incremented).

Thanks for the info.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-05 10:57 [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue Antonio Ospite
  2009-12-06 18:34 ` Robert Jarzmik
@ 2009-12-12 14:28 ` Robert Jarzmik
  2009-12-12 16:31   ` Antonio Ospite
  1 sibling, 1 reply; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-12 14:28 UTC (permalink / raw)
  To: linux-arm-kernel

Antonio Ospite <ospite@studenti.unina.it> writes:

> Hi,
>
> I've run into this recently, I get it with 2.6.32 (plus some code for
> the EZX platform) especially using ROOT_NFS over usblan. It looks like
> I can also trigger it regularly by connecting and disconnecting usb
> cable repeatedly while the kernel on the pxa system is loading
> (in a _non_ ROOT_NFS scenario).

Hi Antonio,

Could I ask you to try that patch please. I can't trigger the deadlock on my
hardware, and I'd like some confirmation before I submit that one. Notice that
the patch deals with all endpoints excepting control endpoint (ie. USB EP0), but
that's not what matters now.

If you can give me some feedback, I would appreciate.

Cheers.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-12 14:28 ` Robert Jarzmik
@ 2009-12-12 16:31   ` Antonio Ospite
  2009-12-20 18:36     ` Robert Jarzmik
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Ospite @ 2009-12-12 16:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Sat, 12 Dec 2009 15:28:51 +0100
Robert Jarzmik <robert.jarzmik@free.fr> wrote:

> Antonio Ospite <ospite@studenti.unina.it> writes:
> 
> > Hi,
> >
> > I've run into this recently, I get it with 2.6.32 (plus some code for
> > the EZX platform) especially using ROOT_NFS over usblan. It looks like
> > I can also trigger it regularly by connecting and disconnecting usb
> > cable repeatedly while the kernel on the pxa system is loading
> > (in a _non_ ROOT_NFS scenario).
> 
> Hi Antonio,
> 
> Could I ask you to try that patch please. I can't trigger the deadlock on my
> hardware, and I'd like some confirmation before I submit that one. Notice that
> the patch deals with all endpoints excepting control endpoint (ie. USB EP0), but
> that's not what matters now.
> 
> If you can give me some feedback, I would appreciate.
>

Patch does not apply cleanly on top of 2.6.32: Hunk #13 FAILED at 2102.
This is in handle_ep(), you have this snippet:
	/* some code below uses registers not available for ep0 */
 	BUG_ON(is_ep0(ep));

which is not in the code I have, I rebased the patch, and tested it,
tho.

I only tested the patch in one of my previous scenarios: the insanely repeated
connect-disconnect manoeuvre, and I am hitting another problem, log appended
below. Note that usblan is working even after I get this message.

=================================
[ INFO: inconsistent lock state ]
2.6.32-ezxdev #45
---------------------------------
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
swapper/0 [HC0[0]:SC1[2]:HE1:SE0] takes:
 (&dev->req_lock){?.-...}, at: [<c019e5c4>] tx_complete+0x68/0x100
{IN-HARDIRQ-W} state was registered at:
  [<c00692a0>] __lock_acquire+0x5f4/0x918
  [<c006a49c>] lock_acquire+0x60/0x74
  [<c02a4b94>] _spin_lock+0x40/0x50
  [<c019d6b0>] gether_connect+0x7c/0x180
  [<c019e9f8>] geth_set_alt+0x34/0x40
  [<c019f0a0>] composite_setup+0x538/0x7b0
  [<c019be3c>] pxa_udc_irq+0x160/0x7dc
  [<c0075ccc>] handle_IRQ_event+0x28/0xf8
  [<c0077f98>] handle_level_irq+0x118/0x130
  [<c0029070>] asm_do_IRQ+0x70/0x94
  [<c0029ad0>] __irq_svc+0x50/0xe0
  [<c002b618>] default_idle+0x30/0x38
  [<c002b4f4>] cpu_idle+0x64/0xc0
  [<c0008aac>] start_kernel+0x31c/0x38c
  [<a0008034>] 0xa0008034
irq event stamp: 52984
hardirqs last  enabled at (52984): [<c02a49d4>] _spin_unlock_irqrestore+0x40/0x74
hardirqs last disabled at (52983): [<c02a4c78>] _spin_lock_irqsave+0x20/0x60
softirqs last  enabled at (52128): [<c0049274>] irq_exit+0x50/0xa4
softirqs last disabled at (52963): [<c0049274>] irq_exit+0x50/0xa4

other info that might help us debug this:
3 locks held by swapper/0:
 #0:  (&n->timer){+.-...}, at: [<c004d7e4>] run_timer_softirq+0x144/0x264
 #1:  (rcu_read_lock){.+.+..}, at: [<c020cad8>] dev_queue_xmit+0x110/0x430
 #2:  (_xmit_ETHER#2){+.-...}, at: [<c0219d0c>] sch_direct_xmit+0x40/0x1a4

stack backtrace:
[<c002febc>] (unwind_backtrace+0x0/0xe0) from [<c006677c>] (print_usage_bug+0x168/0x1ac)
[<c006677c>] (print_usage_bug+0x168/0x1ac) from [<c0066af8>] (mark_lock+0x338/0x5f0)
[<c0066af8>] (mark_lock+0x338/0x5f0) from [<c0069324>] (__lock_acquire+0x678/0x918)
[<c0069324>] (__lock_acquire+0x678/0x918) from [<c006a49c>] (lock_acquire+0x60/0x74)
[<c006a49c>] (lock_acquire+0x60/0x74) from [<c02a4b94>] (_spin_lock+0x40/0x50)
[<c02a4b94>] (_spin_lock+0x40/0x50) from [<c019e5c4>] (tx_complete+0x68/0x100)
[<c019e5c4>] (tx_complete+0x68/0x100) from [<c019b670>] (req_done+0x30/0x38)
[<c019b670>] (req_done+0x30/0x38) from [<c019b944>] (handle_ep+0x234/0x278)
[<c019b944>] (handle_ep+0x234/0x278) from [<c019c894>] (pxa_ep_queue+0x28c/0x2e4)
[<c019c894>] (pxa_ep_queue+0x28c/0x2e4) from [<c019e270>] (eth_start_xmit+0x1fc/0x2c8)
[<c019e270>] (eth_start_xmit+0x1fc/0x2c8) from [<c02098fc>] (dev_hard_start_xmit+0x264/0x334)
[<c02098fc>] (dev_hard_start_xmit+0x264/0x334) from [<c0219d38>] (sch_direct_xmit+0x6c/0x1a4)
[<c0219d38>] (sch_direct_xmit+0x6c/0x1a4) from [<c020cc04>] (dev_queue_xmit+0x23c/0x430)
[<c020cc04>] (dev_queue_xmit+0x23c/0x430) from [<c024b29c>] (arp_solicit+0x200/0x230)
[<c024b29c>] (arp_solicit+0x200/0x230) from [<c0213cd4>] (neigh_timer_handler+0x294/0x318)
[<c0213cd4>] (neigh_timer_handler+0x294/0x318) from [<c004d860>] (run_timer_softirq+0x1c0/0x264)
[<c004d860>] (run_timer_softirq+0x1c0/0x264) from [<c0048e6c>] (__do_softirq+0x9c/0x14c)
[<c0048e6c>] (__do_softirq+0x9c/0x14c) from [<c0049274>] (irq_exit+0x50/0xa4)
[<c0049274>] (irq_exit+0x50/0xa4) from [<c0029074>] (asm_do_IRQ+0x74/0x94)
[<c0029074>] (asm_do_IRQ+0x74/0x94) from [<c0029ad0>] (__irq_svc+0x50/0xe0)
Exception stack(0xc039ff78 to 0xc039ffc0)
ff60:                                                       00000001 00000004
ff80: 00000001 20000013 c039e000 c03a217c c03cd568 c03a2170 a0024e98 69054117
ffa0: a0024d60 00000000 c039fef8 c039ffc0 c0066f7c c002b618 20000013 ffffffff
[<c0029ad0>] (__irq_svc+0x50/0xe0) from [<c002b618>] (default_idle+0x30/0x38)
[<c002b618>] (default_idle+0x30/0x38) from [<c002b4f4>] (cpu_idle+0x64/0xc0)
[<c002b4f4>] (cpu_idle+0x64/0xc0) from [<c0008aac>] (start_kernel+0x31c/0x38c)
[<c0008aac>] (start_kernel+0x31c/0x38c) from [<a0008034>] (0xa0008034)

Thanks,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091212/651ca2fe/attachment.sig>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-12 16:31   ` Antonio Ospite
@ 2009-12-20 18:36     ` Robert Jarzmik
  2009-12-22 23:53       ` Antonio Ospite
  0 siblings, 1 reply; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-20 18:36 UTC (permalink / raw)
  To: linux-arm-kernel

Antonio Ospite <ospite@studenti.unina.it> writes:

> Patch does not apply cleanly on top of 2.6.32: Hunk #13 FAILED at 2102.
> This is in handle_ep(), you have this snippet:
> 	/* some code below uses registers not available for ep0 */
>  	BUG_ON(is_ep0(ep));
>
> which is not in the code I have, I rebased the patch, and tested it,
> tho.
>
> I only tested the patch in one of my previous scenarios: the insanely repeated
> connect-disconnect manoeuvre, and I am hitting another problem, log appended
> below. Note that usblan is working even after I get this message.
>
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.32-ezxdev #45
> ---------------------------------
<snip>

Hi Antonio,

I tried to trigger the same message, to no avail. Even after activating spinlock
debugging and plugging/unplugging like a mad man, nothing ...
Well, I must rely on your testing I'm afraid. Could you test this patch ? I
rebased my tree on v2.6.32 so you will have no git-am complaint.

Cheers.

-- 
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-20 18:36     ` Robert Jarzmik
@ 2009-12-22 23:53       ` Antonio Ospite
  2009-12-28 20:23         ` Robert Jarzmik
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Ospite @ 2009-12-22 23:53 UTC (permalink / raw)
  To: linux-arm-kernel

On Sun, 20 Dec 2009 19:36:52 +0100
Robert Jarzmik <robert.jarzmik@free.fr> wrote:

> Antonio Ospite <ospite@studenti.unina.it> writes:
>
[...]
> > =================================
> > [ INFO: inconsistent lock state ]
> > 2.6.32-ezxdev #45
> > ---------------------------------
> <snip>
> 
> Hi Antonio,
> 
> I tried to trigger the same message, to no avail. Even after activating spinlock
> debugging and plugging/unplugging like a mad man, nothing ...
> Well, I must rely on your testing I'm afraid. Could you test this patch ? I
> rebased my tree on v2.6.32 so you will have no git-am complaint.
>

With your latest patch on top of 2.6.32 I get the "possible recursive
locking" message at the *first* cable unplug/plug cycle, I am appending
it here, the log is partial because I dumped it from RAM and something
was lost.

=============================================
[ INFO: possible recursive locking detected ]
2.6.32-ezxdev #5
---------------------------------------------
swapper/0 is trying to acquire lock:
 (&ep->lock){-.....}, at: [<c019c6e0>] pxa_ep_queue+0x78/0x300

but task is already holding lock:
 (&ep->lock){-.....}, at: [<c019c158>] pxa_udc_irq+0x43c/0x7fc

other info that might help us debug this:
1 lock held by swapper/0:
 #0:  (&ep->lock){-.....}, at: [<c019c158>] pxa_udc_irq+0x43c/0x7fc

stack backtrace:
[<c002febc>] (unwind_backtrace+0x0/0xe0) from [<c0068484>] (validate_chain+0x5b0/0xdd8)
[<c0068484>] (validate_chain+0x5b0/0xdd8) from [<c00694ec>] (__lock_acquire+0x840/0x918)
[<c00694ec>] (__lock_acquire+0x840/0x918) from [<c006a49c>] (lock_acquire+0x60/0x74)
[<c006a49c>] (lock_acquire+0x60/0x74) from [<c02a4d24>] (_spin_lock_irqsave+0x4c/0x60)
[<c02a4d24>] (_spin_lock_irqsave+0x4c/0x60) from [<c019c6e0>] (pxa_ep_queue+0x78/0x300)
[<c019c6e0>] (pxa_ep_queue+0x78/0x300) from [<c019f348>] (composite_setup+0x764/0x7b0)
[<c019f348>] (composite_setup+0x764/0x7b0) from [<c019c29c>] (pxa_udc_irq+0x580/0x7fc)
[<c019c29c>] (pxa_udc_irq+0x580/0x7fc) from [<c0075ccc>] (handle_IRQ_event+0x28/0xf8)
[<c0075ccc>] (handle_IRQ_event+0x28/0xf8) from [<c0077f98>] (handle_level_irq+0x118/0x130)
[<c0077f98>] (handle_level_irq+0x118/0x130) from [<c0029070>] (asm_do_IRQ+0x70/0x94)
[<c0029070>] (asm_do_IRQ+0x70/0x94) from [<c0029ad0>] (__irq_svc+0x50/0xe0)
Exception stack(0xc039ff78 to 0xc039ffc0)
ff60:                                                       00000001 00000004
ff80: 00000001 20000013 c039e000 c03a217c c03cd568 c03a2170 a0024e98 69054117
ffa0: a0024d60 00000000 c039ff10 c039ffc0 c0066f7c c002b618 20000013 ffffffff
[<c0029ad0>] (__irq_svc+0x50/0xe0) from [<c002b618>] (default_idle+0x30/0x38)
[<c002b618>] (default_idle+0x30/0x38) from [<c002b4f4>] (cpu_idle+0x64/0xc0)
[<c002b4f4>] (cpu_idle+0x64/0xc0) from [<c0008aac>] (start_kernel+0x31c/0x38c)
[<c0008aac>] (start_kernel+0x31c/0x38c) from [<a0008034>] (0xa0008034)
BUG: spinlock lockup on CPU#0, swapper/0, c03c19d0
[<c002febc>] (unwind_backtrace+0x0/0xe0) from [<c013ebf4>] (_raw_spin_lock+0xe8/0x124)
[<c013ebf4>] (_raw_spin_lock+0xe8/0x124) from [<c02a4d2c>] (_spin_lock_irqsave+0x54/0x60)
[<c02a4d2c>] (_spin_lock_irqsave+0x54/0x60) from [<c019c6e0>] (pxa_ep_queue+0x78/0x300)
[<c019c6e0>] (pxa_ep_queue+0x78/0x300) from [<c019f348>] (composite_setup+0x764/0x7b0)
           ] (composite_setup+0x764/0x7b0) from [<c019c29c>] (pxa_udc_irq+0x580/0x7fc)
[<c019c29c>] (pxa_udc_irq+0x580/0x7fc) from [<c0075ccc>] (handle_IRQ_event+0x28/0xf8)
[<c0075ccc>] (handle_IRQ_event+0x28/0xf8) from [<c0077f98>] (handle_level_irq+0x118/0x130)
...

Could any furher info about my platform be useful to find out why you can't
reproduce the issue?

Thanks for your time.

All the best,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091223/cc831f9d/attachment.sig>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-22 23:53       ` Antonio Ospite
@ 2009-12-28 20:23         ` Robert Jarzmik
  2009-12-28 23:03           ` Antonio Ospite
  2010-03-30 21:26           ` Michael Trimarchi
  0 siblings, 2 replies; 17+ messages in thread
From: Robert Jarzmik @ 2009-12-28 20:23 UTC (permalink / raw)
  To: linux-arm-kernel

Antonio Ospite <ospite@studenti.unina.it> writes:

> With your latest patch on top of 2.6.32 I get the "possible recursive
> locking" message at the *first* cable unplug/plug cycle, I am appending
> it here, the log is partial because I dumped it from RAM and something
> was lost.

Thanks.
Now, lets have another try. This patch is a bit saner, and though I got the
"locking detected" you noticed without it, I didn't manage to get it with this
patch applied.

As your test platform is far better than mine, would you mind a bit of testing
... again. This patch applies on top of v2.6.32.

Cheers.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-28 20:23         ` Robert Jarzmik
@ 2009-12-28 23:03           ` Antonio Ospite
  2010-01-17 12:41             ` Antonio Ospite
  2010-03-30 21:26           ` Michael Trimarchi
  1 sibling, 1 reply; 17+ messages in thread
From: Antonio Ospite @ 2009-12-28 23:03 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 28 Dec 2009 21:23:55 +0100
Robert Jarzmik <robert.jarzmik@free.fr> wrote:

> Antonio Ospite <ospite@studenti.unina.it> writes:
> 
> > With your latest patch on top of 2.6.32 I get the "possible recursive
> > locking" message at the *first* cable unplug/plug cycle, I am appending
> > it here, the log is partial because I dumped it from RAM and something
> > was lost.
> 
> Thanks.
> Now, lets have another try. This patch is a bit saner, and though I got the
> "locking detected" you noticed without it, I didn't manage to get it with this
> patch applied.
> 
> As your test platform is far better than mine, would you mind a bit of testing
> ... again. This patch applies on top of v2.6.32.
>

Just FYI our repository has been updated to 2.6.33-rc2 and I haven't
been able to reproduce the message here yet _without_ the patch.
I'll go back to 2.6.32 and test again with and without this patch in
next days, and report promptly.

> Cheers.
> 
> --
> Robert
> 

Thanks,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20091229/3e2548c6/attachment.sig>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-28 23:03           ` Antonio Ospite
@ 2010-01-17 12:41             ` Antonio Ospite
  2010-01-17 19:33               ` Robert Jarzmik
  0 siblings, 1 reply; 17+ messages in thread
From: Antonio Ospite @ 2010-01-17 12:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, 29 Dec 2009 00:03:06 +0100
Antonio Ospite <ospite@studenti.unina.it> wrote:

> On Mon, 28 Dec 2009 21:23:55 +0100
> Robert Jarzmik <robert.jarzmik@free.fr> wrote:
> 
> > Antonio Ospite <ospite@studenti.unina.it> writes:
> > 
> > > With your latest patch on top of 2.6.32 I get the "possible recursive
> > > locking" message at the *first* cable unplug/plug cycle, I am appending
> > > it here, the log is partial because I dumped it from RAM and something
> > > was lost.
> > 
> > Thanks.
> > Now, lets have another try. This patch is a bit saner, and though I got the
> > "locking detected" you noticed without it, I didn't manage to get it with this
> > patch applied.
> > 
> > As your test platform is far better than mine, would you mind a bit of testing
> > ... again. This patch applies on top of v2.6.32.
> >
> 
> Just FYI our repository has been updated to 2.6.33-rc2 and I haven't
> been able to reproduce the message here yet _without_ the patch.
> I'll go back to 2.6.32 and test again with and without this patch in
> next days, and report promptly.
> 

Hi Robert,

Only today I've found some time to test the patch in a 2.6.32 setup, I
am not able to reproduce the issue with your latest patch _applied_,
while it it still there with unpatched code, so I'd say it is an
improvement.

Thanks,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-arm-kernel/attachments/20100117/85c09e71/attachment.sig>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2010-01-17 12:41             ` Antonio Ospite
@ 2010-01-17 19:33               ` Robert Jarzmik
  0 siblings, 0 replies; 17+ messages in thread
From: Robert Jarzmik @ 2010-01-17 19:33 UTC (permalink / raw)
  To: linux-arm-kernel

Antonio Ospite <ospite@studenti.unina.it> writes:

> Hi Robert,
>
> Only today I've found some time to test the patch in a 2.6.32 setup, I
> am not able to reproduce the issue with your latest patch _applied_,
> while it it still there with unpatched code, so I'd say it is an
> improvement.
Thanks for your time.
I'll submit it to the usb mailing list for review.

Cheers.

--
Robert

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue
  2009-12-28 20:23         ` Robert Jarzmik
  2009-12-28 23:03           ` Antonio Ospite
@ 2010-03-30 21:26           ` Michael Trimarchi
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Trimarchi @ 2010-03-30 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,


Robert Jarzmik wrote:
> Antonio Ospite <ospite@studenti.unina.it> writes:
>
>   
>> With your latest patch on top of 2.6.32 I get the "possible recursive
>> locking" message at the *first* cable unplug/plug cycle, I am appending
>> it here, the log is partial because I dumped it from RAM and something
>> was lost.
>>     
>
> Thanks.
> Now, lets have another try. This patch is a bit saner, and though I got the
> "locking detected" you noticed without it, I didn't manage to get it with this
> patch applied.
>
> As your test platform is far better than mine, would you mind a bit of testing
> ... again. This patch applies on top of v2.6.32.
>
> Cheers.
>
> --
> Robert
>
> >From 3ca70f842651f607790a2ff94f2b3a7ec223196d Mon Sep 17 00:00:00 2001
> From: Robert Jarzmik <robert.jarzmik@free.fr>
> Date: Sat, 12 Dec 2009 15:13:24 +0100
> Subject: [PATCH] pxa27x_udc: Fix deadlocks on request queueing
>
> As reported by Antonio, there are cases where the ep->lock
> can be taken twice, triggering a deadlock.
> The typical sequence is :
>  irq_handler
>    \
>     -> gadget.complete()
>        \
>         -> pxa27x_udc.pxa_ep_queue() : ep->lock is taken
>            \
>             -> gadget.complete()
>                \
>                 -> pxa27x_udc.pxa_ep_queue() : ep->lock is taken
>                                                ==> *deadlock*
> The patch fixes this by :
>  - releasing the lock each time gadget.complete() is called
>  - adding a check in handle_ep() to detect a recursive call,
>    in which case the function becomes on no-op.
>
> The patch is still not good enough for ep0. For this unique
> endpoint, another well thought over patch will be needed.
>
> Reported-by: Antonio Ospite <ospite@studenti.unina.it>
> Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
> ---
>  drivers/usb/gadget/pxa27x_udc.c |  114 +++++++++++++++++++++++++++------------
>  drivers/usb/gadget/pxa27x_udc.h |    6 ++
>  2 files changed, 85 insertions(+), 35 deletions(-)
>   
+       /* don't modify queue heads during completion callback */
+       if (spin_is_locked(&ep->lock)) {
+               spin_unlock(&ep->lock);
+               req->req.complete(&req->udc_usb_ep->usb_ep, &req->req);
+               spin_lock(&ep->lock);
+       } else
+               req->req.complete(&req->udc_usb_ep->usb_ep, &req->req);
 }

I have fixed in the same way without pass the flag

Michael

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-03-30 21:26 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-12-05 10:57 [BUG] pxa27x_udc: possible recursive locking detected in pxa_ep_queue Antonio Ospite
2009-12-06 18:34 ` Robert Jarzmik
2009-12-06 20:01   ` Alan Stern
2009-12-06 20:23     ` David Brownell
2009-12-10 17:58       ` Robert Jarzmik
2009-12-10 21:01         ` David Brownell
2009-12-06 20:13   ` David Brownell
2009-12-10 17:49     ` Robert Jarzmik
2009-12-12 14:28 ` Robert Jarzmik
2009-12-12 16:31   ` Antonio Ospite
2009-12-20 18:36     ` Robert Jarzmik
2009-12-22 23:53       ` Antonio Ospite
2009-12-28 20:23         ` Robert Jarzmik
2009-12-28 23:03           ` Antonio Ospite
2010-01-17 12:41             ` Antonio Ospite
2010-01-17 19:33               ` Robert Jarzmik
2010-03-30 21:26           ` Michael Trimarchi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).