public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Mathias Nyman <mathias.nyman@linux.intel.com>,
	Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"niklas.neronin@linux.intel.com" <niklas.neronin@linux.intel.com>
Subject: Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
Date: Sat, 18 Apr 2026 11:21:46 +0200	[thread overview]
Message-ID: <20260418112146.3ae60b58.michal.pecio@gmail.com> (raw)
In-Reply-To: <e89a6683-4570-4ca6-95ea-0e9932614974@rowland.harvard.edu>

On Fri, 17 Apr 2026 22:34:58 -0400, Alan Stern wrote:
> Okay, good, we'll require all HCDs to reset control endpoints 
> automatically after every error and stall.

Are they not doing it?

Say that something like lsusb encounters protocol stall while another
URB from the class driver is pending, will the other URB time out just
because host endpoint halted on an earlier one?

> > Currently, by the time the URB is given back, its endpoint is
> > already in a "stopped but runnable" state and its sequence state is
> > zeroed. And it may have already been restarted if there are more
> > pending URBs.  
> 
> Ah, I was going to ask about that.  This will be different from the
> way bulk and interrupt endpoints will behave, but I think it is
> acceptable. Control endpoints aren't used for anything that requires
> high throughput; if a driver wants an error to prevent later
> transfers from starting right away then it can simply avoid
> submitting those later transfers until the earlier ones have
> completed.

Or it could unlink if the async giveback race is fixed with a new
callback separate from endpoint_reset(), but IDK if any demand exists.

Same thing with "chain unlinking" - unlink one URB and expect others
not to execute so that unlink completion can unlink them later. Looks
odd, but it's guaranteed by kerneldocs. And currently broken.

> > > Recovery from a transaction error on a bulk or interrupt endpoint 
> > > involves sending a Clear-Halt request to the device.  But if the
> > > error was caused by some sort of transient interference that
> > > hasn't ended yet, the Clear-Halt might itself fail with the same
> > > error. To handle this, we should retry the Clear-Halt at
> > > increasing time intervals. At what point should the core give up?
> > 
> > Good question, I don't know. One thing I noticed is that Windows
> > does tend to lose patience with completely unresponsive devices and
> > kicks them out, but I don't know the exact criteria.  
> 
> Two reasonable possibilities are 250 ms (because that's about how
> long an intermediate hub might take to notify the core about a
> disconnect) or 5 seconds (the normal timeout for control transfers).
> Of course, 5 seconds is an awfully long time to wait for a mouse or
> keyboard to recover, so maybe something in between would be best.

What happens after giving up? If control requests don't work, most
likely nothing works anyway. Reset may work, or not if it's bad cable.

Retrying too long may cause class drivers to time out on pending URBs,
not sure if it matters. Drivers may have no way to distinguish this
from any other timeout, not sure if this matters either.

> I will set things up so that an extraneous clear-halt (such as one 
> submitted by the driver) will prevent the core from doing its own.
> This leaves the possibility of the core clearing the halt and
> restarting the endpoint and then the driver doing it again, while the
> endpoint is running and the queue is nonempty.  Hopefully drivers
> avoid doing this.

Yes, that's just dodgy, what would such driver even expect to happen?
An URB may be in progress and then what? On xHCI we would need to throw
out this URB, so it simply isn't supported.

> But if it helps, I could print a warning if usb_clear_halt() is
> called for an endpoint that isn't stopped and has a nonempty queue.

Not sure what core considers a "stopped" endpoint. FYI, xhci-hcd
logs dev_err() when reset is attempted while URBs are running.

> > A related issue is clearing TT buffers. AFAIK this has no retries,
> > it fails silently and leaves the endpoint potentially broken, and
> > it is waited for to complete in case of usb_set_interface().  
> 
> Is there anything we can do besides calling usb_clear_halt() and 
> usb_reset_endpoint()?  If not, and data loss is unavoidable, then so
> be it.

If this "clear-halt by usbcore" materializes and survives confrotation
with the real world, it could make sense to look into combining TT
clearing with it. It's a similar thing, but tracked separately now.

One thing that could reduce data loss is never giving up on those
control requests, or resetting/disconnecting the device if giving up.
It's a general problem that control requests can fail and nobody has
much idea what to do then. Some drivers ignore errors. If the device
returns to operation, it may end up running in an unknown state.

This is apparently rare enough that nobody complains, though on
low- and full-speed it's relatively easy to produce artificially with
a particularly defective cable.

Regards,
Michal

  reply	other threads:[~2026-04-18  9:21 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
2026-03-25  1:52   ` Thinh Nguyen
2026-03-25  9:38     ` Mathias Nyman
2026-03-26  1:19       ` Thinh Nguyen
2026-03-26 11:25         ` Mathias Nyman
2026-03-26 23:24           ` Thinh Nguyen
2026-03-30 12:51             ` Mathias Nyman
2026-03-30 14:17               ` stern
2026-03-31  9:34                 ` Mathias Nyman
2026-03-31 15:31                   ` stern
2026-04-01 22:08                     ` Mathias Nyman
2026-04-02  2:36                       ` stern
2026-04-03  1:59                         ` Thinh Nguyen
2026-04-03  2:42                           ` stern
2026-04-03  8:51                             ` Michal Pecio
2026-04-03 14:55                               ` stern
2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
2026-04-03 20:17                                   ` stern
2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
2026-04-04  1:54                               ` stern
2026-04-04 20:41                                 ` Thinh Nguyen
2026-04-04 21:54                                   ` Alan Stern
2026-04-04 22:15                                     ` Thinh Nguyen
2026-04-04 22:28                                       ` Thinh Nguyen
2026-04-05  1:30                                         ` Alan Stern
2026-04-05  3:10                                           ` Thinh Nguyen
2026-04-07 15:23                                             ` Alan Stern
2026-04-07 20:24                                               ` Mathias Nyman
2026-04-17 17:38                                                 ` Alan Stern
2026-04-17 21:48                                                   ` Michal Pecio
2026-04-18  2:34                                                     ` Alan Stern
2026-04-18  9:21                                                       ` Michal Pecio [this message]
2026-04-18 14:56                                                         ` Alan Stern
2026-04-01 22:08               ` Thinh Nguyen
2026-04-01 22:34                 ` Mathias Nyman
2026-04-01 22:47                   ` Thinh Nguyen
2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260418112146.3ae60b58.michal.pecio@gmail.com \
    --to=michal.pecio@gmail.com \
    --cc=Thinh.Nguyen@synopsys.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=niklas.neronin@linux.intel.com \
    --cc=oneukum@suse.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox