All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ondrej Zary <linux@rainbow-software.org>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: debugging oops after disconnecting Nexio USB touchscreen
Date: Wed, 2 Dec 2009 09:52:08 +0100	[thread overview]
Message-ID: <200912020952.09354.linux@rainbow-software.org> (raw)
In-Reply-To: <Pine.LNX.4.44L0.0912011005140.2983-100000@iolanthe.rowland.org>

On Tuesday 01 December 2009, Alan Stern wrote:
> On Tue, 1 Dec 2009, Ondrej Zary wrote:
> > On Monday 30 November 2009, Alan Stern wrote:
> > > On Mon, 30 Nov 2009, Ondrej Zary wrote:
> > > > It does not make much sense to me but I think that it crashes iside
> > > > this list manipulation:
> > > >
> > > >         prev = ehci->async;
> > > >         while (prev->qh_next.qh != qh)
> > > >                 prev = prev->qh_next.qh;
> > >
> > > Yes, it's crashing in the "while" test because prev is NULL.  This
> > > means the code is looking for qh in the async list but not finding it.
> > > That's supposed to be impossible.
> > >
> > > The assembly code is peculiar because it includes stuff that isn't in
> > > the source code!  For example, right at this point (after the end of
> > > the loop) there's a test to see whether prev is NULL.  Where could that
> > > have come from?  Do you have any idea?
> >
> > I'm not sure, I might did something wrong and left it there from my
> > previous debugging attempt.
>
> Okay.  But it was part of the reason you had difficulty matching up the
> ehci-hcd.s file with the crash code dump.
>
> > > >         prev->hw_next = qh->hw_next;
> > > >         prev->qh_next = qh->qh_next;
> > > >         wmb ();
> > >
> > > These lines aren't reached.
> > >
> > > Does this happen every time you disconnect the Nexio?
> >
> > The crash happens almost always when disconnecting the touchscreen.
> > When booted without X, it often survives the first disconnect.
>
> Then it will be easy to recreate the problem.  :-)

Yes, seems so. But it's not that easy (see below).

> > > You can try patching that loop.  If prev is NULL then print an error
> > > message in the log, including the value of qh and the value of
> > > ehci->async, and jump past the following three statements.
> > >
> > > With that change the system shouldn't crash, although khubd might hang.
> > > But we still need to find out how this could have happened.  Try
> > > collecting a usbmon trace while running the test; then let's compare
> > > the usbmon output with the error messages in the log.
> >
> > gcc version is: gcc (Debian 4.3.4-6) 4.3.4
> >
> > Tried something like that before but it did not help at all.
> > The check is not triggered and it still oopses. Now it looks like this:
> >
> >         qh->qh_state = QH_STATE_UNLINK;
> >         ehci->reclaim = qh = qh_get (qh);
> >
> >         prev = ehci->async;
> >         if (!prev) {
> >                 printk("prev is NULL, qh=%p, ehci->async=%p\n", qh,
> > ehci->async); goto after;
> >         }
>
> [*] >         while (prev->qh_next.qh != qh) {
>
> >                 if (!prev) {
> >                         printk("prev is NULL, qh=%p, ehci->async=%p\n",
> > qh, ehci->async); goto after;
> >                 }
> >                 prev = prev->qh_next.qh;
> >         }
> >
> >         prev->hw_next = qh->hw_next;
> >         prev->qh_next = qh->qh_next;
> >         wmb ();
> >
> > after:
>
> No, that is wrong.  The [*] line can still perform an invalid
> dereference.  You need to move the inside test to the end of the loop,
> after the assignment statement.  Or else do this:

Oops, such a stupid mistake. Fixed now:
        prev = ehci->async;
        if (!prev) {
                printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
                goto after;
        }
        while (prev->qh_next.qh != qh) {
                prev = prev->qh_next.qh;
                if (!prev) {
                        printk("prev is NULL, qh=%p, ehci->async=%p\n", qh, ehci->async);
                        goto after;
                }
        }

        prev->hw_next = qh->hw_next;
        prev->qh_next = qh->qh_next;
        wmb ();
after:


It shows "prev is NULL, qh=f6581080, ehci->async=f6581000".

The problem is that activating usbmon causes the problem to disappear. No errors
in maybe 15 attempts. It appeared on 2nd attempt after unloading usbmon.

-- 
Ondrej Zary

  reply	other threads:[~2009-12-02  8:52 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-27 13:38 debugging oops after disconnecting Nexio USB touchscreen Ondrej Zary
2009-11-27 18:19 ` Alan Stern
2009-11-30 15:30   ` Ondrej Zary
2009-11-30 20:19     ` Alan Stern
2009-12-01 10:06       ` Ondrej Zary
2009-12-01 15:11         ` Alan Stern
2009-12-02  8:52           ` Ondrej Zary [this message]
2009-12-02  9:42             ` Oliver Neukum
2009-12-03  9:30               ` Ondrej Zary
2009-12-02 15:58             ` Alan Stern
2009-12-03 12:31               ` Ondrej Zary
2009-12-03 19:39                 ` Alan Stern
2009-12-03 20:55                   ` Ondrej Zary
2009-12-03 22:22                     ` Alan Stern
2009-12-04 12:22                       ` Ondrej Zary
2009-12-04 15:47                         ` Alan Stern
2009-12-04 19:17                           ` Ondrej Zary
2009-12-04 19:34                             ` Alan Stern
2009-12-04 19:55                               ` Ondrej Zary
2009-12-04 21:24                                 ` Alan Stern
2009-12-07  9:02                                   ` Ondrej Zary
2009-12-07 15:22                                     ` Alan Stern
2009-12-08  9:03                                       ` Ondrej Zary
2009-12-08 15:03                                         ` Alan Stern
2009-12-08 15:21                                       ` Ondrej Zary
2009-12-07 15:07                               ` Ondrej Zary
2009-12-07 16:02                                 ` Alan Stern
2009-12-10 15:40                                 ` Ondrej Zary
2009-12-10 20:38                                   ` Alan Stern
2009-12-11 19:42                                     ` Ondrej Zary
2009-12-11 20:49                                       ` Alan Stern
2009-12-05  7:36                       ` Andreas Mohr
2009-12-05 17:16                         ` Alan Stern
2009-12-06 11:38                           ` Andreas Mohr

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200912020952.09354.linux@rainbow-software.org \
    --to=linux@rainbow-software.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.