From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Oliver Neukum <oneukum@suse.com>,
Jonas Karlsson <jonas.karlsson@actia.se>,
Greg KH <gregkh@linuxfoundation.org>
Cc: "linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
Alan Stern <stern@rowland.harvard.edu>
Subject: Re: USB transaction errors causing RCU stalls and kernel panics
Date: Wed, 4 Mar 2020 18:21:38 +0200 [thread overview]
Message-ID: <4fa64e92-64ce-07f3-ed8e-ea4e07d091bb@linux.intel.com> (raw)
In-Reply-To: <1583331173.12738.26.camel@suse.com>
On 4.3.2020 16.12, Oliver Neukum wrote:
> Am Mittwoch, den 04.03.2020, 14:11 +0200 schrieb Mathias Nyman:
>> On 3.3.2020 22.08, Jonas Karlsson wrote:
>
>>
>> I recently got a report about a bit similar issue on a 4.4 stable kernel, so this
>> might not be xhci-cdns specific.
>>
>> That case involved autosuspend of the cdc-acm, and there was only a short burst of
>> transaction erros and resubmitted URBs even if the device was supposed to be suspended.
>> It looks like cdc_acm autosuspended even if it had URBs pending.
>
> That must not happen. Do you have details?
Shared what I got in:
https://drive.google.com/open?id=1PjwmIK97bIfugWL697lJCe65yuDVOZhx
cdc-acm is unfortunately not traced, but usb core and xhci is.
grep for "MATTU" to find the relevant places.
>
>> I'm guessing that in that case the transfer ring restarted even if link was already "suspeded",
>> causing transaction errors. Ring could be restarted if URBs were resubmitted
>> by the class driver when usb core suspends all interfaces, flushing all pending URBs which
>> calls the URB completion handler.
>
> How should a class driver do that? It gets -EPROTO and that's it,
>
> Regards
> Oliver
>
This is just my speculation, I haven't checked details yet.
It's not just the class driver that causes this, but a combination of
the following gaps in xhci, cdc-acm, and usb core.
A class driver autosuspending with URBs pending,
and not killing all URBs synchronously when usb core calls suspend for the interface drivers.
xhci restarting an endpoint ring due to a race between xhci_stop_device(), handling
a Transaction error, and having pending URB (unhandled trb) on the ring. [1]
Sleeping between setting port to U3 link state, and clearing udev->can_submit.
allowing URBs to be submitted during that gap when link is in U3 already.
URBs will complete with -EPROTO, and resubmitted until udev->can_submit is cleared.
usb_runtime_suspend()
usb_suspend_both() // suspend a device and its interfaces
for (each udev->actconfig->interface)
usb_suspend_interface(udev, intf, msg);
driver = to_usb_driver(intf->dev.driver);
status = driver->suspend(intf, msg); // URBs shuold be killed, sync, miss one??
usb_suspend_device()
generic_suspend()
usb_port_suspend()
hub_set_port_link_state(USB_SS_PORT_LS_U3)
xhci_stop_device()
xhci_set_link_state(USB_SS_PORT_LS_U3) // port link is in U3
msleep() // during this URBs can be resubmitteded and complete with -EPROTO loop
udev->can_submit = 0; // we can submit URBs until here (except the killed ones, they are flagged).
for (each endpoint)
usb_hcd_flush_endpoint(udev, udev->ep_out[i]);
usb_hcd_flush_endpoint(udev, udev->ep_in[i]);
[1]
xhci_stop_device()
xhci_queue_stop_endpoint()
-> interrupt (transfer event, ring has not stopped yet)
handle_tx_event() // bulk transfer with Transaction error
process_bulk_intr_td()
finish_td()
xhci_cleanup_halted_endpoint()
xhci_queue_reset_ep()
xhci_queue_new_dequeue_state()
-> interupt, command completion event for stop endpoint,
-> interrupt handle reset_ep command, xhci_handle_cmd_stop_ep()
-> interrupt handle new deq state
xhci_handle_cmd_set_deq()
ring_doorbell_for_active_rings(xhci, slot_id, ep_index) - this restarts the ring.
-Mathias
next prev parent reply other threads:[~2020-03-04 16:19 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-03-03 15:05 USB transaction errors causing RCU stalls and kernel panics Jonas Karlsson
2020-03-03 16:39 ` Greg KH
2020-03-03 20:08 ` Jonas Karlsson
2020-03-04 6:37 ` Greg KH
2020-03-04 10:29 ` Oliver Neukum
2020-03-04 12:11 ` Mathias Nyman
2020-03-04 14:12 ` Oliver Neukum
2020-03-04 16:21 ` Mathias Nyman [this message]
2020-03-06 1:31 ` Peter Chen
2020-03-09 14:21 ` Jonas Karlsson
2020-03-10 8:14 ` Peter Chen
2020-03-10 10:04 ` Jonas Karlsson
2020-03-10 11:04 ` Oliver Neukum
2020-03-10 11:21 ` Oliver Neukum
2020-03-10 12:26 ` Jonas Karlsson
2020-03-10 16:04 ` Jonas Karlsson
2020-03-10 16:11 ` Fabio Estevam
2020-03-11 6:25 ` Jonas Karlsson
2020-03-11 10:28 ` Oliver Neukum
2020-03-11 14:59 ` Jonas Karlsson
2020-03-12 13:45 ` Oliver Neukum
2020-03-12 15:37 ` Jonas Karlsson
2020-03-13 9:27 ` Oliver Neukum
2020-03-16 7:07 ` Jonas Karlsson
2020-03-23 11:37 ` Jonas Karlsson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4fa64e92-64ce-07f3-ed8e-ea4e07d091bb@linux.intel.com \
--to=mathias.nyman@linux.intel.com \
--cc=gregkh@linuxfoundation.org \
--cc=jonas.karlsson@actia.se \
--cc=linux-usb@vger.kernel.org \
--cc=oneukum@suse.com \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).