From: Ondrej Zary <linux@rainbow-software.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: debugging oops after disconnecting Nexio USB touchscreen
Date: Thu, 3 Dec 2009 21:55:21 +0100
Message-ID: <200912032155.26018.linux@rainbow-software.org>
In-Reply-To: <Pine.LNX.4.44L0.0912031424270.4795-100000@iolanthe.rowland.org>

On Thursday 03 December 2009 20:39:35 Alan Stern wrote:
> On Thu, 3 Dec 2009, Ondrej Zary wrote:
> > Luckily, it appeared with usbmon active, here's the output:
>
> ...
>
> > > Also, try adding some more debugging output (and let's hope it doesn't
> > > also make the problem disappear).  In start_unlink_async(), just before
> > > your "after:" label, add
> > >
> > > 	ehci_info(ehci, "unlink qh %p %p\n", qh, qh->qh_next);
> > >
> > > In qh_link_async(), just after the wmb(), add
> > >
> > > 	ehci_info(ehci, "link qh %p %p\n", qh, qh->qh_next);
> > >
> > > In end_unlink_async(), just after the iaa_watchdog_done(ehci), add
> > >
> > > 	ehci_info(ehci, "end unlink qh %p %p\n", qh, qh->qh_next);
> > >
> > > And in qh_make(), just before the end, add
> > >
> > > 	ehci_info(ehci, "create qh %p, dev %s, ep %x\n",
> > > 		qh, urb->dev->devpath, urb->ep->desc.bEndpointAddress);
> >
> > Thanks for the suggestion, here's the output:
>
> I wish you hadn't removed all the "create qh" log messages.

I haven't removed them - I was surprised too that they are missing. I probably 
did something wrong (again).

> Anyway, it looks like the problem is caused by your driver overwriting
> the data structure owned by ehci-hcd.  Here's the important part of the
> log:
> > [  151.688299] ehci_hcd 0000:00:1d.7: link qh f65cf700 (null)
> > [  151.688428] ehci_hcd 0000:00:1d.7: unlink qh f65cf700 (null)
>
> Here f65cf700 is the only qh on the async list (it is linked in at the
> head and its qh_next pointer is NULL).
>
> > [  151.688497] ehci_hcd 0000:00:1d.7: link qh f65cf080 (null)
>
> Now f65cf080 is added to the start of the list.
>
> > [  151.688534] ehci_hcd 0000:00:1d.7: end unlink qh f65cf700 (null)
> > [  151.688546] ehci_hcd 0000:00:1d.7: link qh f65cf700 f65cf080
>
> And f65cf700 is added to the start, preceding f65cf080.
>
> > [  151.688675] ehci_hcd 0000:00:1d.7: unlink qh f65cf700 f65cf080
> > [  151.688784] ehci_hcd 0000:00:1d.7: end unlink qh f65cf700 f65cf080
>
> f65cf700 is removed from the start position, leaving f65cf080 at the
> start.
>
> > [  151.688796] ehci_hcd 0000:00:1d.7: link qh f65cf700 f65cf080
>
> It is added again at the start, preceding f65cf080.
>
> > [  151.688923] ehci_hcd 0000:00:1d.7: unlink qh f65cf700 f65cf080
> > [  151.689033] ehci_hcd 0000:00:1d.7: end unlink qh f65cf700 f65cf080
>
> It is removed again from the start position.
>
> > [  151.689045] ehci_hcd 0000:00:1d.7: link qh f65cf700 f65cf080
>
> It is added again at the start.
>
> > [  151.689106] usb 1-1.1: USB disconnect, address 9
> > [  152.712104] prev is NULL, qh=f65cf080, ehci->async=f65cf000
>
> Evidently prev is f65cf700->qh_next.  We know that the value was set to
> f65cf080 just above, and you added log messages to every place where
> ehci-hcd changes qh_next.  Hence something your driver did must have
> been responsible.  Does it access urb->hcpriv anywhere?

Thanks for explaining this.
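
The list walk described above can be modelled in userspace. This is only a
sketch, not the ehci-hcd code: struct qh, link_async(), find_prev(),
unlink_async() and replay_disconnect() are hypothetical stand-ins, and
qh_next is a plain pointer here. It replays the link/unlink sequence from
the log and then optionally clears f65cf700's qh_next, which is the kind of
overwrite that would make the search for the predecessor of f65cf080 end
at NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified queue head: only the software "next" link. */
struct qh {
	struct qh *qh_next;
};

/* Model of linking: the new qh goes right after the list head. */
static void link_async(struct qh *head, struct qh *qh)
{
	qh->qh_next = head->qh_next;
	head->qh_next = qh;
}

/* Walk from the head until we find the entry whose qh_next is qh.
 * Returns NULL if the chain ends first (qh is not reachable). */
static struct qh *find_prev(struct qh *head, struct qh *qh)
{
	struct qh *prev = head;

	while (prev != NULL && prev->qh_next != qh)
		prev = prev->qh_next;
	return prev;
}

/* Model of unlinking: returns -1 where the log printed "prev is NULL". */
static int unlink_async(struct qh *head, struct qh *qh)
{
	struct qh *prev = find_prev(head, qh);

	if (prev == NULL)
		return -1;
	prev->qh_next = qh->qh_next;
	qh->qh_next = NULL;
	return 0;
}

/* Replay the sequence from the log; head = f65cf000, a = f65cf700,
 * b = f65cf080.  If corrupt is nonzero, a's qh_next is cleared before
 * b is unlinked, simulating an outside overwrite. */
static int replay_disconnect(int corrupt)
{
	struct qh head = { NULL }, a = { NULL }, b = { NULL };
	int i, rc;

	link_async(&head, &a);		/* link qh f65cf700 (null) */
	rc = unlink_async(&head, &a);	/* unlink qh f65cf700 (null) */
	assert(rc == 0);
	link_async(&head, &b);		/* link qh f65cf080 (null) */
	link_async(&head, &a);		/* link qh f65cf700 f65cf080 */
	for (i = 0; i < 2; i++) {
		rc = unlink_async(&head, &a);
		assert(rc == 0);
		link_async(&head, &a);
	}
	/* The list is now head -> f65cf700 -> f65cf080. */
	assert(head.qh_next == &a && a.qh_next == &b);

	if (corrupt)
		a.qh_next = NULL;
	(void)rc;
	return unlink_async(&head, &b);	/* the unlink at disconnect */
}
```

In this model replay_disconnect(0) unlinks f65cf080 normally, while
replay_disconnect(1) reports the failure, because f65cf080 can no longer
be reached from the head.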

No, it doesn't access urb->hcpriv. The driver should not be doing anything
special. It just submits one interrupt urb, reads the replies and sends an
ACK (a bulk urb) when touch data is received. When idle, the device sends no
reply most of the time, and sometimes sends "8204abaa".
Here's the latest version: http://lkml.org/lkml/2009/12/3/74

> Incidentally, look at the usbmon trace:
> > f60eecc0 1501056647 S Bi:1:009:2 -115 128 <
> > f60eecc0 1501056905 C Bi:1:009:2 -32 0
> > f60eecc0 1501056916 S Bi:1:009:2 -115 128 <
> > f60eecc0 1501057172 C Bi:1:009:2 -32 0
> > f60eecc0 1501057183 S Bi:1:009:2 -115 128 <
> > f60eecc0 1501057394 C Bi:1:009:2 -32 0
>
> Why does your driver keep submitting the same request over and over
> again when each time it fails?

Looks like it's resubmitting the interrupt urb. The -EPIPE case is not
handled in the usbtouch_irq() callback. According to some other drivers,
-EPIPE means "halt" or "stall", which should be cleared by calling
usb_clear_halt(). That function cannot be called in interrupt context, though.
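
One way to picture the missing case is a small userspace model of the
status handling in a completion callback. This is a sketch under
assumptions, not the usbtouchscreen code: enum urb_action and
classify_urb_status() are hypothetical names. The assumption is that a
stalled endpoint should not be resubmitted straight away; the
usb_clear_halt() call would instead be deferred to process context (for
example a work item), since it may sleep.

```c
#include <errno.h>

/* Hypothetical outcomes for a completed urb. */
enum urb_action {
	URB_RESUBMIT,		/* submit the same urb again */
	URB_DEFER_CLEAR_HALT,	/* endpoint stalled: clear the halt later,
				 * outside interrupt context */
	URB_STOP		/* urb was killed or device is gone */
};

/* Decide what to do with a completed urb based on its status. */
static enum urb_action classify_urb_status(int status)
{
	switch (status) {
	case 0:
		/* Success: process the data, then resubmit. */
		return URB_RESUBMIT;
	case -EPIPE:
		/* Stall (shown as -32 in the usbmon trace): resubmitting
		 * immediately just fails again. */
		return URB_DEFER_CLEAR_HALT;
	case -ECONNRESET:
	case -ENOENT:
	case -ESHUTDOWN:
		/* The urb is being shut down: do not resubmit. */
		return URB_STOP;
	default:
		/* Other errors: retry, as a catch-all would do. */
		return URB_RESUBMIT;
	}
}
```

Without the -EPIPE case the status falls through to the default branch,
which would match the repeated S/C pairs with status -32 in the trace.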

>
> Alan Stern



-- 
Ondrej Zary
