public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>,
	Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"niklas.neronin@linux.intel.com" <niklas.neronin@linux.intel.com>
Subject: Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
Date: Wed, 22 Apr 2026 07:30:54 +0200	[thread overview]
Message-ID: <20260422073054.0bd482ba.michal.pecio@gmail.com> (raw)
In-Reply-To: <54fd265d-4ae8-4573-b618-587af98176c9@rowland.harvard.edu>

On Tue, 21 Apr 2026 22:11:41 -0400, Alan Stern wrote:
> On Tue, Apr 07, 2026 at 11:24:01PM +0300, Mathias Nyman wrote:
> > On 4/7/26 18:23, Alan Stern wrote:  
> > > It's been a while now, and nobody has objected to the proposed
> > > plan for handling this issue, so I'm going to assume that
> > > everyone is on board with the idea.  
> > 
> > Yes, I support this
> > 
> > So basically usb core will call usb_clear_halt() after EPROTO URB
> > completion handler finishes, and xhci-hcd needs to prevent
> > bulk/interrupt endpoint from restarting after returning a EPROTO
> > URB up until usb_reset_endpoint() is called  
> 
> Can you confirm that it's also possible for xhci-hcd to prevent
> control endpoints from restarting when a transaction error (-EPROTO)
> occurs?  Up until usb_reset_endpoint() or a new callback?

Doable. They halt like any other and it's SW's choice how to restart.

BTW, what about EOVERFLOW? It's treated similarly by xhci-hcd.

> I've been thinking about how to implement all this, and some issues
> have to be solved.  In particular, suppose we have sent a Clear-Halt
> request for an endpoint that has gotten an error, and either the
> request times out or the device replies with a stall.

(... or the TT replies with STALL due to downstream bus EPROTO).

> My feeling is that either of these events would mean that the device
> is so far out of whack that the only thing to do is try resetting it.
> Any proposals for something a little less drastic?

Let's look at possible causes:

1. disconnected device
Doesn't matter what happens.

2. completely brain dead host controller
Ditto. Just be sure not to lock up so xhci-hcd can be reloaded.

3. temporary EMI or low link quality
This should clear itself after a few retries.

4. broken D+/D- wire in a LS/FS cable
Issues can last arbitrarily long and yet still clear.
Least disruptive solution: wait forever with sporadic retries.
Acceptable alternative: request user attention, i.e. disconnect.

Note: we would disconnect instantly if the opposite wire broke.

5. crashed device firmware
In this case a reset seems more productive than retrying forever.

A compromise betwen 4 and 5 could be to retry for some time, then
reset a few times, then disconnect.

6. device doesn't support clear-halt, stalls or does something odd

Nightmare fuel.

> Also, it seems reasonable to devote only a single thread to endpoint 
> error recovery.  Another possibility would be to have one thread for 
> each device having problems, but I think the likelihood of this 
> happening to multiple devices at once is pretty small unless the
> problem affects a hub upstream from all of them -- in which case
> having multiple threads wouldn't really help much.  Other opinions?

Well, another option is asynchronous URBs and "callback hell". For
instance, besides hcd methods, all xhci-hcd endpoint management is
asynchronous, tracks current state with bit flags and defers actions
blocked by flags until the flags are cleared. This includes waiting
for Reset Endpoint commands, TT clearing, ongoing unlinks, etc.

One practical complication is that hcd->endpoint_reset() may sleep.
But it will only extremely rarely take 5 seconds and time out.

Regards,
Michal

  reply	other threads:[~2026-04-22  5:31 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
2026-03-25  1:52   ` Thinh Nguyen
2026-03-25  9:38     ` Mathias Nyman
2026-03-26  1:19       ` Thinh Nguyen
2026-03-26 11:25         ` Mathias Nyman
2026-03-26 23:24           ` Thinh Nguyen
2026-03-30 12:51             ` Mathias Nyman
2026-03-30 14:17               ` stern
2026-03-31  9:34                 ` Mathias Nyman
2026-03-31 15:31                   ` stern
2026-04-01 22:08                     ` Mathias Nyman
2026-04-02  2:36                       ` stern
2026-04-03  1:59                         ` Thinh Nguyen
2026-04-03  2:42                           ` stern
2026-04-03  8:51                             ` Michal Pecio
2026-04-03 14:55                               ` stern
2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
2026-04-03 20:17                                   ` stern
2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
2026-04-04  1:54                               ` stern
2026-04-04 20:41                                 ` Thinh Nguyen
2026-04-04 21:54                                   ` Alan Stern
2026-04-04 22:15                                     ` Thinh Nguyen
2026-04-04 22:28                                       ` Thinh Nguyen
2026-04-05  1:30                                         ` Alan Stern
2026-04-05  3:10                                           ` Thinh Nguyen
2026-04-07 15:23                                             ` Alan Stern
2026-04-07 20:24                                               ` Mathias Nyman
2026-04-17 17:38                                                 ` Alan Stern
2026-04-17 21:48                                                   ` Michal Pecio
2026-04-18  2:34                                                     ` Alan Stern
2026-04-18  9:21                                                       ` Michal Pecio
2026-04-18 14:56                                                         ` Alan Stern
2026-04-22  2:11                                                 ` Alan Stern
2026-04-22  5:30                                                   ` Michal Pecio [this message]
2026-04-22 13:51                                                     ` Mathias Nyman
2026-04-22 15:35                                                       ` Alan Stern
2026-04-22 14:57                                                     ` Alan Stern
2026-04-01 22:08               ` Thinh Nguyen
2026-04-01 22:34                 ` Mathias Nyman
2026-04-01 22:47                   ` Thinh Nguyen
2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260422073054.0bd482ba.michal.pecio@gmail.com \
    --to=michal.pecio@gmail.com \
    --cc=Thinh.Nguyen@synopsys.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=niklas.neronin@linux.intel.com \
    --cc=oneukum@suse.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox