From: Alan Stern <stern@rowland.harvard.edu>
To: Marcel Holtmann <marcel@holtmann.org>
Cc: Benjamin Berg <benjamin@sipsolutions.net>,
linux-usb@vger.kernel.org, linux-bluetooth@vger.kernel.org
Subject: Re: Userspace enumeration hang while btusb tries to load firmware of removed device
Date: Wed, 3 Nov 2021 16:11:38 -0400 [thread overview]
Message-ID: <20211103201138.GC1529362@rowland.harvard.edu> (raw)
In-Reply-To: <BCD95F43-3C6E-4B50-9228-9F2AD93BBBA4@holtmann.org>
On Wed, Nov 03, 2021 at 08:31:03PM +0100, Marcel Holtmann wrote:
> Hi Alan,
>
> >> a user is seeing a hang in fprintd while enumerating devices which
> >> appears to be caused by an interaction of:
> >>
> >> * system is resuming from S3
> >> * btusb starts loading firmware
> >> * bluetooth device disappears (probably thinkpad_acpi rfkill)
> >> * libusb enumerates USB devices (fprintd in this case)
> >>
> >> When this happens, the firmware load fails after a timeout of 10s. It
> >> appears that if userspace queries information about the root hub in
> >> question during this time, it will hang until the btusb firmware load
> >> has timed out.
> >>
> >> Attaching the full kernel log, below an excerpt, you can see:
> >> * At :12 device removal: "usb 5-4: USB disconnect, device number 33"
> >> * libusb enumeration retrieves information about the usb5 root hub,
> >> and blocks on this
> >> * At :14 there is a tx timeout on hci0
> >> * At :23 the firmware load finally fails
> >> * Then usb_disable_device happens
> >> * libusb/fprintd gets the usb5 HUB information and continues its
> >> enumeration
> >>
> >> As I see it, there may be two issues:
> >> 1. userspace should not block due to the firmware load hanging
> >> 2. btusb should give up more quickly when the device disappears
> >>
> >> Does anyone have a good idea about the possible cause or how we can fix
> >> the problem?
> >>
> >> Downstream issue: https://bugzilla.redhat.com/show_bug.cgi?id=2019857
> >
> > I'm not familiar with the btusb driver, so someone on the
> > linux-bluetooth mailing list would have a better idea about this.
> > However, it does look as though btusb keeps the device locked during the
> > entire 10-second period while it tries to send over the firmware, and it
> > doesn't abort the procedure when it starts getting disconnection errors
> > but instead persists until a timeout expires. Keeping the device locked
> > would certainly block lsusb.
> >
> > In general, locking the device during a firmware upload seems like
> > the right thing to do -- you don't want extraneous transfers from
> > other processes messing up the firmware! So overall, it appears that
> > the whole problem would be solved if the firmware transfer were
> > aborted as soon as the -ENODEV errors start appearing.
>
> the problem seems to be that we hitting HCI command timeout. So the
> firmware download is done via HCI commands. These commands are send to
> the transport driver btusb.c via hdev->send (as btusb_send_frame).
> This triggers the usb_submit_urb or queues them via data->deferred
> anchor. All this reports back the error properly except that nobody
> does anything with it.
>
> See hci_send_frame() last portion:
>
> err = hdev->send(hdev, skb);
> if (err < 0) {
> bt_dev_err(hdev, "sending frame failed (%d)", err);
> kfree_skb(skb);
> }
>
> And that is it. We are not checking for ENODEV or any error here. That
> means the failure of the HCI command gets only caught via the HCI
> command timeout. I don’t know how to do this yet, but you would have
> to look there to fail HCI command right away instead of waiting for
> the timeout.
I have no idea how all the different layers work here. Clearly
something has to signal hdev->req_wait_q after setting hdev->req_status
to some appropriate value. Can this be done in btusb.c, either when the
URB is submitted or when it completes? Or in hci_send_frame?
Alan Stern
next prev parent reply other threads:[~2021-11-03 20:11 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-03 14:55 Userspace enumeration hang while btusb tries to load firmware of removed device Benjamin Berg
2021-11-03 18:23 ` Alan Stern
2021-11-03 19:31 ` Marcel Holtmann
2021-11-03 20:11 ` Alan Stern [this message]
2021-11-09 17:03 ` Benjamin Berg
2021-11-04 9:34 ` Benjamin Berg
2021-11-04 13:28 ` Alan Stern
2021-11-04 14:19 ` Benjamin Berg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211103201138.GC1529362@rowland.harvard.edu \
--to=stern@rowland.harvard.edu \
--cc=benjamin@sipsolutions.net \
--cc=linux-bluetooth@vger.kernel.org \
--cc=linux-usb@vger.kernel.org \
--cc=marcel@holtmann.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox