public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: Benjamin Berg <benjamin@sipsolutions.net>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Marcel Holtmann <marcel@holtmann.org>,
	linux-usb@vger.kernel.org, linux-bluetooth@vger.kernel.org
Subject: Re: Userspace enumeration hang while btusb tries to load firmware of removed device
Date: Thu, 04 Nov 2021 15:19:20 +0100	[thread overview]
Message-ID: <3f318d3b8b03c6cde8a871e802689fa4e90107be.camel@sipsolutions.net> (raw)
In-Reply-To: <20211104132828.GA1557201@rowland.harvard.edu>

[-- Attachment #1: Type: text/plain, Size: 3996 bytes --]

On Thu, 2021-11-04 at 09:28 -0400, Alan Stern wrote:
> On Thu, Nov 04, 2021 at 10:34:22AM +0100, Benjamin Berg wrote:
> > Hi Marcel and Alan,
> > 
> > On Wed, 2021-11-03 at 20:31 +0100, Marcel Holtmann wrote:
> > > > I'm not familiar with the btusb driver, so someone on the 
> > > > linux-bluetooth mailing list would have a better idea about this. 
> > > > However, it does look as though btusb keeps the device locked during the 
> > > > entire 10-second period while it tries to send over the firmware, and it 
> > > > doesn't abort the procedure when it starts getting disconnection errors 
> > > > but instead persists until a timeout expires.  Keeping the device locked 
> > > > would certainly block lsusb.
> > > > 
> > > > In general, locking the device during a firmware upload seems like
> > > > the right thing to do -- you don't want extraneous transfers from
> > > > other processes messing up the firmware!  So overall, it appears that
> > > > the whole problem would be solved if the firmware transfer were
> > > > aborted as soon as the -ENODEV errors start appearing.
> > > 
> > > the problem seems to be that we hitting HCI command timeout. So the
> > > firmware download is done via HCI commands. These commands are send
> > > to the transport driver btusb.c via hdev->send (as btusb_send_frame).
> > > This triggers the usb_submit_urb or queues them via data->deferred
> > > anchor. All this reports back the error properly except that nobody
> > > does anything with it.
> > > 
> > > See hci_send_frame() last portion:
> > > 
> > >         err = hdev->send(hdev, skb);                                             
> > >         if (err < 0) {                                                           
> > >                 bt_dev_err(hdev, "sending frame failed (%d)", err);              
> > >                 kfree_skb(skb);                                                  
> > >         }
> > > 
> > > And that is it. We are not checking for ENODEV or any error here.
> > > That means the failure of the HCI command gets only caught via the
> > > HCI command timeout. I don’t know how to do this yet, but you would
> > > have to look there to fail HCI command right away instead of waiting
> > > for the timeout.
> > 
> > Hmm, true, I don't see a "sending frame failed" error message during
> > the firmware download though.
> 
> It is in the log you posted:
> 
> [Mi Nov 3 11:55:23 2021] Bluetooth: hci0: Failed to send firmware data (-110)
> [Mi Nov 3 11:55:23 2021] Bluetooth: hci0: sending frame failed (-19)
> 
> But this occurred after the timeout, so maybe you had in mind something 
> occurring earlier.

Yep, that one happens when the driver tries to reset the device. We
need to catch the error earlier in order to avoid the 10s wait.

> >  You are right that this codepath is
> > loosing the error, but this does not seem to be the scenario we are
> > running into while loading the firmware. This error only happens later
> > on from the btintel_reset_to_bootloader function.
> > 
> > What seems to happen in the posted log is that the URB is initially
> > submitted just fine and the transfer errors out afterwards.
> > Unfortunately, the btusb_tx_complete is only used for statistics
> > (stat.err_tx is increased) and has no further error handling that could
> > abort the firmware upload.
> 
> While detecting the errors during URB completion would be nice, it isn't 
> necessary.  Things would work just as well if the disconnect error were 
> detected during submission of the following URB.

Ah, good point. The in-flight interrupt URB responsible of retrieving
the event from the device will fail. It should be sufficient to inject
a hardware error event at that point in order to fix this.

As such, a simple solution may be to call hci_reset_dev from inside
btusb_intr_complete and btusb_submit_intr_urb when the URB submission
fails (for "err != -EPERM").

Benjamin

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      reply	other threads:[~2021-11-04 14:21 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-03 14:55 Userspace enumeration hang while btusb tries to load firmware of removed device Benjamin Berg
2021-11-03 18:23 ` Alan Stern
2021-11-03 19:31   ` Marcel Holtmann
2021-11-03 20:11     ` Alan Stern
2021-11-09 17:03       ` Benjamin Berg
2021-11-04  9:34     ` Benjamin Berg
2021-11-04 13:28       ` Alan Stern
2021-11-04 14:19         ` Benjamin Berg [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3f318d3b8b03c6cde8a871e802689fa4e90107be.camel@sipsolutions.net \
    --to=benjamin@sipsolutions.net \
    --cc=linux-bluetooth@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=marcel@holtmann.org \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox