MUSB interrupt storm on device removal

From: Bin Liu <b-liu@ti.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: "Johan Hovold" <johan@kernel.org>, "Greg KH" <greg@kroah.com>,
	"Måns Rullgård" <mans@mansr.com>,
	linux-usb@vger.kernel.org
Subject: MUSB interrupt storm on device removal
Date: Wed, 23 Jan 2019 10:53:41 -0600
Message-ID: <20190123165341.GC18982@uda0271908>

On Wed, Jan 23, 2019 at 11:05:40AM -0500, Alan Stern wrote:
> On Wed, 23 Jan 2019, Bin Liu wrote:
> 
> > On Wed, Jan 23, 2019 at 03:55:47PM +0100, Johan Hovold wrote:
> > > On Wed, Jan 23, 2019 at 08:09:47AM -0600, Bin Liu wrote:
> > > > On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
> > > > > On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> > > 
> > > > > > That's not what any other host controller returns when a device is
> > > > > > > removed, so either you are going to have to fix all USB drivers for this
> > > > > > issue, or you need to fix the musb driver to not send this error for
> > > > > > when a device is removed (hint, do the latter...)
> > > > > 
> > > > > Right, this needs to be handled at the HCD level.
> > > > 
> > > > Any reason usb_serial_generic_read_bulk_callback() doesn't handle
> > > > -EPROTO in the same way as -EPIPE?
> > > 
> > > Since it is supposed to be intermittent unlike, for example, -ENOENT or
> > > -EPIPE (the latter which the device driver can recover from if it cares
> > > to implement clearing of halt).
> 
> Wait a minute.  Nothing says any of those errors is supposed to be 
> intermittent.  As long as an error has a systematic cause (as opposed 
> to random noise, for example), it will recur as often as the cause 
> does.
> 
> At least when -EPROTO errors are caused by device disconnect, we know 
> that they will eventually go away when the upstream hub reports the 
> port disconnect event.  But until then, an interrupt storm is certainly 
> possible.
> 
> > Okay, makes sense.
> > 
> > > 
> > > > > dwc2 fixed a similar lockup issue due to retried NAKed transaction by
> > > > > not retrying immediately:
> > > > > 
> > > > > 	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
> > > > 
> > > > Both cases are all about device removal, but this musb case is slightly
> > > > different from this dwc2 case.
> > > > 
> > > > It is all about re-transmitting which causes interrupt storm, but in
> > > > this dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
> > > > makes sense to delay it in the dwc2 driver as this referred patch does,
> > > >
> > > > but in this musb case, musb driver reports transaction error to the usb
> > > > serial driver, the usb serial driver issues the re-transmitting not the
> > > > musb driver, so I don't think the delay should be added in the musb
> > > > driver.
> > > 
> > > I didn't say it was exactly the same.
> > 
> > Yeah, I know. My point was the fix is in the place where re-transmitting
> > happens, but
> > 
> > > My point was that unless you fix this at the HCD level, you will need to
> > > add complex recovery handling to every USB driver and completion handler
> > > (~500 of those). But perhaps that is what is needed.
> > 
> > Okay, it probably makes sense to handle the case in the HCD because
> > there are far fewer HCDs.
> 
> One possibility is to giveback URBs with certain errors (such as
> -EPROTO) only at a frame boundary, or at 1-ms intervals.  This feels 
> like a very artificial solution, though.

My plan is to add an error counter to the musb driver's endpoint struct:
if -EPROTO has occurred a certain number of times in a row, for example
3, give back the URB with -EPIPE instead of -EPROTO. This is the simplest
solution I can think of. I hate expanding a struct unnecessarily, but
this is one of the cases where it is justified.
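
A minimal sketch of that counter idea, written as a standalone userspace
model rather than real driver code. The struct, the function name and the
threshold macro are all hypothetical; only the -EPROTO to -EPIPE
escalation after 3 consecutive errors comes from the plan above.

```c
#include <errno.h>

/* Hypothetical threshold: consecutive -EPROTO results before escalating. */
#define MODEL_EPROTO_LIMIT 3

/* Hypothetical stand-in for per-endpoint state; not the real musb struct. */
struct model_ep {
	unsigned int eproto_count;	/* consecutive -EPROTO completions */
};

/*
 * Map the raw transfer status to the status given back with the URB.
 * Any status other than -EPROTO resets the counter.
 */
static int model_giveback_status(struct model_ep *ep, int status)
{
	if (status != -EPROTO) {
		ep->eproto_count = 0;
		return status;
	}

	/* From the third consecutive -EPROTO onward, report -EPIPE. */
	if (++ep->eproto_count >= MODEL_EPROTO_LIMIT)
		return -EPIPE;

	return -EPROTO;
}
```

In this model the counter is not cleared after escalating, so the
endpoint keeps reporting -EPIPE until some other status is seen. Whether
the real fix should behave that way is an open design choice.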

> 
> > > I do see now that of all USB drivers we have two drivers that handle
> > > -EPROTO by resubmitting after a delay, while a handful explicitly deals
> > > with -EPROTO by simply stopping to resubmit (some probably bail out on
> > > all errors, but the majority appear to resubmit on -EPROTO).
> 
> Any driver which immediately retries an URB after getting -EPROTO or
> -EILSEQ or -ETIME, and has no mechanism for backing off or limiting the
> retries, is buggy.  As far as I can see, that's all there is to it.

Agreed, but given that the majority appear to resubmit on -EPROTO, as
Johan said, I think it is better to handle it in the HCD.
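
For comparison, the driver-side alternative (limit the retries and back
off between them) could look roughly like the standalone model below.
Everything here is hypothetical: the names, the retry cap of 5 and the
doubling delay are illustrative assumptions, not taken from any existing
driver. A real completion handler would schedule the resubmit after the
returned delay instead of resubmitting directly.

```c
#include <errno.h>

/* Hypothetical cap on consecutive retries after a transaction error. */
#define MODEL_MAX_RETRIES 5

/* Hypothetical per-device read state. */
struct model_reader {
	unsigned int retries;	/* consecutive failed completions */
};

/*
 * Decide what to do when a read URB completes.
 * Returns the delay in ms before resubmitting (0 = resubmit right away),
 * or -1 to stop resubmitting.
 */
static int model_read_complete(struct model_reader *r, int status)
{
	int delay_ms;

	switch (status) {
	case 0:
		r->retries = 0;
		return 0;
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		if (r->retries >= MODEL_MAX_RETRIES)
			return -1;
		/* Double the delay on each retry: 1, 2, 4, 8, 16 ms. */
		delay_ms = 1 << r->retries;
		r->retries++;
		return delay_ms;
	default:
		/* Anything else is treated as fatal in this model. */
		return -1;
	}
}
```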

> > Thanks for the info.
> > I will handle this case in musb driver.
> 
> Why doesn't the same problem occur with other types of host controller?

Not sure, I work on musb most of the time. Maybe other HCDs don't give
back URBs with -EPROTO in this error case.

The musb controller has a register bit indicating that the controller
has tried the transaction 3 times without receiving any response; the
musb driver then just gives back the URB with -EPROTO.
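
As a rough illustration of that mapping, here is a standalone model with
made-up bit names and positions (they are assumptions for this sketch,
not the actual musb register layout). The stall case is included only to
contrast it with the usual -EPIPE convention for a halted endpoint.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical status bits, chosen only for this model. */
#define MODEL_CSR_ERROR	(1u << 2)	/* three attempts, no response */
#define MODEL_CSR_STALL	(1u << 6)	/* endpoint answered with STALL */

/* Translate a (modelled) endpoint status register into an URB status. */
static int model_csr_to_status(uint16_t csr)
{
	if (csr & MODEL_CSR_STALL)
		return -EPIPE;
	if (csr & MODEL_CSR_ERROR)
		return -EPROTO;
	return 0;
}
```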

Regards,
-Bin.
