MUSB interrupt storm on device removal

From: Bin Liu <b-liu@ti.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: "Johan Hovold" <johan@kernel.org>, "Greg KH" <greg@kroah.com>,
	"Måns Rullgård" <mans@mansr.com>,
	linux-usb@vger.kernel.org
Subject: MUSB interrupt storm on device removal
Date: Wed, 23 Jan 2019 10:53:41 -0600
Message-ID: <20190123165341.GC18982@uda0271908>

On Wed, Jan 23, 2019 at 11:05:40AM -0500, Alan Stern wrote:
> On Wed, 23 Jan 2019, Bin Liu wrote:
> 
> > On Wed, Jan 23, 2019 at 03:55:47PM +0100, Johan Hovold wrote:
> > > On Wed, Jan 23, 2019 at 08:09:47AM -0600, Bin Liu wrote:
> > > > On Wed, Jan 23, 2019 at 09:55:49AM +0100, Johan Hovold wrote:
> > > > > On Wed, Jan 23, 2019 at 07:52:12AM +0100, Greg Kroah-Hartman wrote:
> > > 
> > > > > > That's not what any other host controller returns when a device is
> > > > > > > removed, so either you are going to have to fix all USB drivers for this
> > > > > > issue, or you need to fix the musb driver to not send this error for
> > > > > > when a device is removed (hint, do the latter...)
> > > > > 
> > > > > > Right, this needs to be handled at the HCD level.
> > > > 
> > > > Any reason usb_serial_generic_read_bulk_callback() doesn't handle
> > > > -EPROTO in the same way as -EPIPE?
> > > 
> > > > Since it is supposed to be intermittent, unlike, for example, -ENOENT or
> > > > -EPIPE (the latter of which the device driver can recover from if it
> > > > cares to implement clearing of halt).
> 
> Wait a minute.  Nothing says any of those errors is supposed to be 
> intermittent.  As long as an error has a systematic cause (as opposed 
> to random noise, for example), it will recur as often as the cause 
> does.
> 
> At least when -EPROTO errors are caused by device disconnect, we know 
> that they will eventually go away when the upstream hub reports the 
> port disconnect event.  But until then, an interrupt storm is certainly 
> possible.
> 
> > Okay, makes sense.
> > 
> > > 
> > > > > > dwc2 fixed a similar lockup issue, caused by retried NAKed
> > > > > > transactions, by not retrying immediately:
> > > > > 
> > > > > 	38d2b5fb75c1 ("usb: dwc2: host: Don't retry NAKed transactions right away")
> > > > 
> > > > > Both cases are about device removal, but the musb case is slightly
> > > > > different from the dwc2 case.
> > > > > 
> > > > > It is all about re-transmitting, which causes the interrupt storm. In
> > > > > the dwc2 case, it is the dwc2 driver doing the re-transmitting, so it
> > > > > makes sense to delay it in the dwc2 driver, as the referenced patch does.
> > > > > 
> > > > > In the musb case, the musb driver reports a transaction error to the usb
> > > > > serial driver, and the usb serial driver, not the musb driver, issues the
> > > > > re-transmit, so I don't think the delay should be added in the musb
> > > > > driver.
> > > 
> > > I didn't say it was exactly the same.
> > 
> > Yeah, I know. My point was the fix is in the place where re-transmitting
> > happens, but
> > 
> > > My point was that unless you fix this at the HCD level, you will need to
> > > add complex recovery handling to every USB driver and completion handler
> > > (~500 of those). But perhaps that is what is needed.
> > 
> > Okay, it probably makes sense to handle the case in the HCD because
> > there are far fewer HCDs.
> 
> One possibility is to giveback URBs with certain errors (such as
> -EPROTO) only at a frame boundary, or at 1-ms intervals.  This feels 
> like a very artificial solution, though.

My plan is to add an error counter to the musb driver's endpoint struct:
if -EPROTO has occurred consecutively a certain number of times, for
example 3, give back URBs with -EPIPE instead of -EPROTO. This is the
simplest solution I can think of. I hate expanding a struct unnecessarily,
but this is one of the cases where it seems warranted.
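
A rough user-space sketch of that counter idea, just to make the logic
concrete. The struct, function name and threshold macro below are
hypothetical stand-ins, not the real musb endpoint struct or any existing
musb function, and I am assuming the third consecutive -EPROTO is the first
one reported as -EPIPE:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical threshold: consecutive -EPROTO completions before escalating. */
#define HYP_EPROTO_ESCALATE_AFTER 3

/* Hypothetical stand-in for per-endpoint state; not the real musb struct. */
struct hyp_ep_err_state {
	unsigned int eproto_count;	/* consecutive -EPROTO completions so far */
};

/*
 * Map the raw completion status to the status given back with the URB.
 * Any status other than -EPROTO resets the counter and passes through.
 */
static int hyp_ep_filter_status(struct hyp_ep_err_state *ep, int status)
{
	assert(ep);

	if (status != -EPROTO) {
		ep->eproto_count = 0;
		return status;
	}

	/* Saturate at the threshold so the counter cannot wrap around. */
	if (ep->eproto_count < HYP_EPROTO_ESCALATE_AFTER)
		ep->eproto_count++;

	if (ep->eproto_count >= HYP_EPROTO_ESCALATE_AFTER)
		return -EPIPE;

	return -EPROTO;
}
```

With this shape the first two errors are still reported as -EPROTO, and the
escalation stays in effect until some other status resets the counter.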

> 
> > > I do see now that of all USB drivers we have two drivers that handle
> > > -EPROTO by resubmitting after a delay, while a handful explicitly deal
> > > with -EPROTO by simply no longer resubmitting (some probably bail out on
> > > all errors, but the majority appear to resubmit on -EPROTO).
> 
> Any driver which immediately retries an URB after getting -EPROTO or
> -EILSEQ or -ETIME, and has no mechanism for backing off or limiting the
> retries, is buggy.  As far as I can see, that's all there is to it.

Agreed, but given that the majority of drivers appear to resubmit on
-EPROTO, as Johan said, I think it is better to handle it in the HCD.
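
For comparison, the driver-side "back off and limit the retries" policy
could be sketched roughly as below. This is only an illustration in plain
user-space C: the names, the retry limit and the doubling delay are all my
assumptions, and a real completion handler would have to schedule the
resubmit through a timer or work item rather than return a number.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical policy values, chosen only for illustration. */
#define HYP_MAX_RETRIES		5
#define HYP_BASE_DELAY_MS	1

/* Hypothetical per-URB retry bookkeeping. */
struct hyp_retry_state {
	unsigned int errors;	/* consecutive transaction errors seen */
};

/*
 * Decide what to do after a completion.
 * Returns the delay in ms before resubmitting (0 = resubmit right away),
 * or -1 to stop resubmitting.
 */
static int hyp_next_resubmit_delay_ms(struct hyp_retry_state *rs, int status)
{
	int delay;

	assert(rs);

	switch (status) {
	case 0:
		/* Success clears the error streak. */
		rs->errors = 0;
		return 0;
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		/* Bounded retries with a doubling delay. */
		if (rs->errors >= HYP_MAX_RETRIES)
			return -1;
		delay = HYP_BASE_DELAY_MS << rs->errors;
		rs->errors++;
		return delay;
	default:
		/* Any other error: do not resubmit. */
		return -1;
	}
}
```

Doing this in every completion handler is exactly the per-driver work that
handling it once in the HCD would avoid.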

> > Thanks for the info.
> > I will handle this case in the musb driver.
> 
> Why doesn't the same problem occur with other types of host controller?

Not sure; I work on musb most of the time. Maybe other HCDs don't give
back URBs with -EPROTO in this error case.

The musb controller has a register bit indicating that the controller has
tried the transaction 3 times without receiving any response; the musb
driver then just gives back the URB with -EPROTO.

Regards,
-Bin.
