Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error

public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed

From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
To: "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>
Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
	Mathias Nyman <mathias.nyman@linux.intel.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"michal.pecio@gmail.com" <michal.pecio@gmail.com>,
	"oneukum@suse.com" <oneukum@suse.com>,
	"niklas.neronin@linux.intel.com" <niklas.neronin@linux.intel.com>
Subject: Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
Date: Sat, 4 Apr 2026 20:41:36 +0000	[thread overview]
Message-ID: <20260404204133.3mcizeeokw3ln5r4@synopsys.com> (raw)
In-Reply-To: <616e2a64-6feb-4ee6-bf39-a6284549f18f@rowland.harvard.edu>

On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote:
> > > > How about this:
> > > > 
> > > > Introduce a halted flag the following conditions:
> > > > 
> > > > * Introduce the halted flag in usb_host_endpoint
> > > > * The halted flag must be implemented as a bit in a unsigned long so
> > > >   we can use atomic bit operation
> > > 
> > > I don't see why it needs to use atomic operations.  Unless you're 
> > > thinking that we might want to add other bitflags into the same unsigned 
> > > long?
> > 
> > Now that I think about it again, it's not needed if we have the
> > requirement for clearing the flag only after usb_reset_endpoint.
> > 
> > > 
> > > > * Only the HCD may set the halted flag, and only upon checking the
> > > >   first URB completing with a halted status
> > > 
> > > Every status other than 0 should mean that the endpoint's queue is 
> > > halted.  But not all statuses require a clear-halt or reset-endpoint to 
> > > recover.  For instance, a short transfer when the URB_SHORT_NOT_OK flag 
> > > is set.
> > 
> > It seems we're using the "halted" flag for different things. The halted
> > flag to me is the endpoint's state and rather than the endpoint queue
> > state. That is, if the endpoint is halted, there's an error that was
> > reported on the pipe. So, an -ECONNRESET would not cause a halted
> > endpoint.
> > 
> > > 
> > > Why do you want the HCD to set the flag instead of 
> > > usb_hcd_giveback_urb()?  Just one central place is simpler than in every 
> > > HCD.
> > 
> > I'm just thinking in term of whose role to return the pending URBs.
> > Typically the HCD handles when to give back URBs. If the HCD were to
> > handle giving back pending URBs due to halted endpoint, then it should
> > be the one to set the halted flag. However, if the core tracks and does
> > both setting and clearing of the halted flag, then the core should
> > handle canceling the URBs. As you said, if we can shift the burden to
> > the core, that would help keep the flow consistent and centralized.
> > 
> > > 
> > > > * Only the USB core may clear the halted flag, and only after
> > > >   usb_reset_endpoint returns (this makes sure the HCD drained and reset
> > > >   the endpoint before the flag is cleared and new URBs are accepted)
> > > > * The usb_reset_endpoint must be called after clear-halt, SetInterface,
> > > >   and SetConfiguration.
> > > 
> > > That's easy to do just by adding it into the appropriate core routines.
> > 
> > Yes.
> > 
> > > 
> > > > * The USB core will not attempt to unlink pending URBs due to halted
> > > >   condition
> > > > * The HCD is responsible for completing or canceling queued URBs
> > > >   when the halted flag is set. Cancelled and newly submitted URBs will
> > > >   be returned with -EPIPE as long as the halted flag is set
> > > 
> > > Why make the HCD responsible for draining the queue?  It's like setting 
> > > the halted flag; one central place is simpler and less error-prone than 
> > > lots of separate places.
> > > 
> > > For newly submitted URBs, should the submission fail with -EPIPE, or 
> > > should the submission succeed and then the URB complete with -EPIPE?  
> > > The first is simpler, but drivers probably aren't prepared for it -- it 
> > > would mean adding error handling to the submission code as well as to 
> > > the completion handler code.
> > > 
> > > (Actually, that wouldn't matter unless the driver was queuing up 
> > > multiple URBs, in which case it should be prepared to handle submission 
> > > errors in the middle.)
> > 
> > I didn't think about failing immediately on submission because we would
> > need to audit every class driver for that. Perhaps the core can
> > intercept the URBs and immediately schedule a completion handler with
> > the updated URB's status?
> > 
> > > 
> > > > * The class driver is responsible to check the halted flag to
> > > >   determine whether to initiate error recovery via usb_clear_halt
> > > > 
> > > > I'm trying to keep a clear separation of responsibilities between HCD
> > > > and the USB core. Also, I try to keep the halted flag more closely match
> > > > the state of the endpoint. 
> > > > 
> > > > Let me know what you think?
> > > 
> > > Making the flag match the endpoint state is a good idea.  But there is 
> > > some ambiguity: Do you mean the state in the device, or the state in the 
> > > HC hardware, or the state in the HCD?  After all, these aren't always 
> > > the same.  For instance, unlinking an URB won't affect the device's 
> > > state but it will affect the state on the host side.
> > > 
> > 
> > The HCD state. If we can let the core manage the halted state, then here
> > are the points where the halted state is updated (hopefully I didn't
> > miss any other ones):
> > 
> > Endpoint is halted (reported by HCD):
> > * URB returned with -EPROTO or -EPIPE
> > 
> > Endpoint is active (updated by the core):
> > * SetConfiguration, SetInterface, clear-halt
> 
> Hmmm.  What did you think of my recent proposal in a message to Michal?  
> Summarizing:
> 
> If the class driver wants to unlink queued URBs when a transaction error 
> occurs, it has to do so itself in the failed URB's completion handler.  
> We can make this easier by adding a usb_flush_endpoint_queue() routine 
> to the core.  In the meantime, the HCD ensures that the queue is stopped 
> before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
> -EREMOTEIO aren't considered to be transaction errors.)
> 
> When the completion handler returns, the core will automatically call 
> usb_clear_halt(), which will also reset the endpoint on the host side 
> and will restart the queue.  This also happens after SetConfiguration 
> and SetInterface, either explicitly or implicitly.

I like that the core will handle this automatically. But one concern:
How will the class driver know when the clear-halt complete so it can
perform the recovery? (ie. it shouldn't perform recovery immediately
after seeing -EPROTO)

> 
> For -EPIPE (device sent a STALL token), the class driver has to clear 
> the halt itself.  This is because stalls aren't errors; they are an 
> intentional part of the USB protocol.
> 
> -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet 
> received with URB_SHORT_NOT_OK set) are a little trickier.  The HCD may 
> or may not need to stop the queue for an unlink, possibly depending on 
> whether the URB being unlinked is the active one.  When a short packet 
> is received, the HC hardware may or may not stop its queue.  Either way, 
> the class driver shouldn't need to take any special recovery action; any 
> necessary actions should be taken automatically by the HCD and the core.
> 
> All of this applies only to bulk and interrupt endpoints.  Control 
> endpoints generally need error recovery only on the host side, because 
> the device resets automatically when it gets a new SETUP packet, and so 
> the HCD should handle whatever is needed.  Isochronous endpoints don't 
> ever halt and they shouldn't need to be reset when an error occurs.
> 
> Overall, this seems simpler than anything else we have discussed.
> 

The rest sounds good to me!

Thanks Alan,
Thinh

next prev parent reply	other threads:[~2026-04-04 20:42 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
2026-03-25  1:52   ` Thinh Nguyen
2026-03-25  9:38     ` Mathias Nyman
2026-03-26  1:19       ` Thinh Nguyen
2026-03-26 11:25         ` Mathias Nyman
2026-03-26 23:24           ` Thinh Nguyen
2026-03-30 12:51             ` Mathias Nyman
2026-03-30 14:17               ` stern
2026-03-31  9:34                 ` Mathias Nyman
2026-03-31 15:31                   ` stern
2026-04-01 22:08                     ` Mathias Nyman
2026-04-02  2:36                       ` stern
2026-04-03  1:59                         ` Thinh Nguyen
2026-04-03  2:42                           ` stern
2026-04-03  8:51                             ` Michal Pecio
2026-04-03 14:55                               ` stern
2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
2026-04-03 20:17                                   ` stern
2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
2026-04-04  1:54                               ` stern
2026-04-04 20:41                                 ` Thinh Nguyen [this message]
2026-04-04 21:54                                   ` Alan Stern
2026-04-04 22:15                                     ` Thinh Nguyen
2026-04-04 22:28                                       ` Thinh Nguyen
2026-04-05  1:30                                         ` Alan Stern
2026-04-05  3:10                                           ` Thinh Nguyen
2026-04-07 15:23                                             ` Alan Stern
2026-04-07 20:24                                               ` Mathias Nyman
2026-04-01 22:08               ` Thinh Nguyen
2026-04-01 22:34                 ` Mathias Nyman
2026-04-01 22:47                   ` Thinh Nguyen
2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260404204133.3mcizeeokw3ln5r4@synopsys.com \
    --to=thinh.nguyen@synopsys.com \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=michal.pecio@gmail.com \
    --cc=niklas.neronin@linux.intel.com \
    --cc=oneukum@suse.com \
    --cc=stern@rowland.harvard.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox