From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
To: "stern@rowland.harvard.edu" <stern@rowland.harvard.edu>
Cc: Thinh Nguyen <Thinh.Nguyen@synopsys.com>,
Mathias Nyman <mathias.nyman@linux.intel.com>,
"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
"michal.pecio@gmail.com" <michal.pecio@gmail.com>,
"oneukum@suse.com" <oneukum@suse.com>,
"niklas.neronin@linux.intel.com" <niklas.neronin@linux.intel.com>
Subject: Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
Date: Sat, 4 Apr 2026 20:41:36 +0000 [thread overview]
Message-ID: <20260404204133.3mcizeeokw3ln5r4@synopsys.com> (raw)
In-Reply-To: <616e2a64-6feb-4ee6-bf39-a6284549f18f@rowland.harvard.edu>
On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote:
> > > > How about this:
> > > >
> > > > Introduce a halted flag the following conditions:
> > > >
> > > > * Introduce the halted flag in usb_host_endpoint
> > > > * The halted flag must be implemented as a bit in a unsigned long so
> > > > we can use atomic bit operation
> > >
> > > I don't see why it needs to use atomic operations. Unless you're
> > > thinking that we might want to add other bitflags into the same unsigned
> > > long?
> >
> > Now that I think about it again, it's not needed if we have the
> > requirement for clearing the flag only after usb_reset_endpoint.
> >
> > >
> > > > * Only the HCD may set the halted flag, and only upon checking the
> > > > first URB completing with a halted status
> > >
> > > Every status other than 0 should mean that the endpoint's queue is
> > > halted. But not all statuses require a clear-halt or reset-endpoint to
> > > recover. For instance, a short transfer when the URB_SHORT_NOT_OK flag
> > > is set.
> >
> > It seems we're using the "halted" flag for different things. The halted
> > flag to me is the endpoint's state and rather than the endpoint queue
> > state. That is, if the endpoint is halted, there's an error that was
> > reported on the pipe. So, an -ECONNRESET would not cause a halted
> > endpoint.
> >
> > >
> > > Why do you want the HCD to set the flag instead of
> > > usb_hcd_giveback_urb()? Just one central place is simpler than in every
> > > HCD.
> >
> > I'm just thinking in term of whose role to return the pending URBs.
> > Typically the HCD handles when to give back URBs. If the HCD were to
> > handle giving back pending URBs due to halted endpoint, then it should
> > be the one to set the halted flag. However, if the core tracks and does
> > both setting and clearing of the halted flag, then the core should
> > handle canceling the URBs. As you said, if we can shift the burden to
> > the core, that would help keep the flow consistent and centralized.
> >
> > >
> > > > * Only the USB core may clear the halted flag, and only after
> > > > usb_reset_endpoint returns (this makes sure the HCD drained and reset
> > > > the endpoint before the flag is cleared and new URBs are accepted)
> > > > * The usb_reset_endpoint must be called after clear-halt, SetInterface,
> > > > and SetConfiguration.
> > >
> > > That's easy to do just by adding it into the appropriate core routines.
> >
> > Yes.
> >
> > >
> > > > * The USB core will not attempt to unlink pending URBs due to halted
> > > > condition
> > > > * The HCD is responsible for completing or canceling queued URBs
> > > > when the halted flag is set. Cancelled and newly submitted URBs will
> > > > be returned with -EPIPE as long as the halted flag is set
> > >
> > > Why make the HCD responsible for draining the queue? It's like setting
> > > the halted flag; one central place is simpler and less error-prone than
> > > lots of separate places.
> > >
> > > For newly submitted URBs, should the submission fail with -EPIPE, or
> > > should the submission succeed and then the URB complete with -EPIPE?
> > > The first is simpler, but drivers probably aren't prepared for it -- it
> > > would mean adding error handling to the submission code as well as to
> > > the completion handler code.
> > >
> > > (Actually, that wouldn't matter unless the driver was queuing up
> > > multiple URBs, in which case it should be prepared to handle submission
> > > errors in the middle.)
> >
> > I didn't think about failing immediately on submission because we would
> > need to audit every class driver for that. Perhaps the core can
> > intercept the URBs and immediately schedule a completion handler with
> > the updated URB's status?
> >
> > >
> > > > * The class driver is responsible to check the halted flag to
> > > > determine whether to initiate error recovery via usb_clear_halt
> > > >
> > > > I'm trying to keep a clear separation of responsibilities between HCD
> > > > and the USB core. Also, I try to keep the halted flag more closely match
> > > > the state of the endpoint.
> > > >
> > > > Let me know what you think?
> > >
> > > Making the flag match the endpoint state is a good idea. But there is
> > > some ambiguity: Do you mean the state in the device, or the state in the
> > > HC hardware, or the state in the HCD? After all, these aren't always
> > > the same. For instance, unlinking an URB won't affect the device's
> > > state but it will affect the state on the host side.
> > >
> >
> > The HCD state. If we can let the core manage the halted state, then here
> > are the points where the halted state is updated (hopefully I didn't
> > miss any other ones):
> >
> > Endpoint is halted (reported by HCD):
> > * URB returned with -EPROTO or -EPIPE
> >
> > Endpoint is active (updated by the core):
> > * SetConfiguration, SetInterface, clear-halt
>
> Hmmm. What did you think of my recent proposal in a message to Michal?
> Summarizing:
>
> If the class driver wants to unlink queued URBs when a transaction error
> occurs, it has to do so itself in the failed URB's completion handler.
> We can make this easier by adding a usb_flush_endpoint_queue() routine
> to the core. In the meantime, the HCD ensures that the queue is stopped
> before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and
> -EREMOTEIO aren't considered to be transaction errors.)
>
> When the completion handler returns, the core will automatically call
> usb_clear_halt(), which will also reset the endpoint on the host side
> and will restart the queue. This also happens after SetConfiguration
> and SetInterface, either explicitly or implicitly.
I like that the core will handle this automatically. But one concern:
How will the class driver know when the clear-halt complete so it can
perform the recovery? (ie. it shouldn't perform recovery immediately
after seeing -EPROTO)
>
> For -EPIPE (device sent a STALL token), the class driver has to clear
> the halt itself. This is because stalls aren't errors; they are an
> intentional part of the USB protocol.
>
> -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet
> received with URB_SHORT_NOT_OK set) are a little trickier. The HCD may
> or may not need to stop the queue for an unlink, possibly depending on
> whether the URB being unlinked is the active one. When a short packet
> is received, the HC hardware may or may not stop its queue. Either way,
> the class driver shouldn't need to take any special recovery action; any
> necessary actions should be taken automatically by the HCD and the core.
>
> All of this applies only to bulk and interrupt endpoints. Control
> endpoints generally need error recovery only on the host side, because
> the device resets automatically when it gets a new SETUP packet, and so
> the HCD should handle whatever is needed. Isochronous endpoints don't
> ever halt and they shouldn't need to be reset when an error occurs.
>
> Overall, this seems simpler than anything else we have discussed.
>
The rest sounds good to me!
Thanks Alan,
Thinh
next prev parent reply other threads:[~2026-04-04 20:42 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
2026-03-25 1:52 ` Thinh Nguyen
2026-03-25 9:38 ` Mathias Nyman
2026-03-26 1:19 ` Thinh Nguyen
2026-03-26 11:25 ` Mathias Nyman
2026-03-26 23:24 ` Thinh Nguyen
2026-03-30 12:51 ` Mathias Nyman
2026-03-30 14:17 ` stern
2026-03-31 9:34 ` Mathias Nyman
2026-03-31 15:31 ` stern
2026-04-01 22:08 ` Mathias Nyman
2026-04-02 2:36 ` stern
2026-04-03 1:59 ` Thinh Nguyen
2026-04-03 2:42 ` stern
2026-04-03 8:51 ` Michal Pecio
2026-04-03 14:55 ` stern
2026-04-03 19:13 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
2026-04-03 20:17 ` stern
2026-04-04 1:15 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
2026-04-04 1:54 ` stern
2026-04-04 20:41 ` Thinh Nguyen [this message]
2026-04-04 21:54 ` Alan Stern
2026-04-04 22:15 ` Thinh Nguyen
2026-04-04 22:28 ` Thinh Nguyen
2026-04-05 1:30 ` Alan Stern
2026-04-05 3:10 ` Thinh Nguyen
2026-04-07 15:23 ` Alan Stern
2026-04-07 20:24 ` Mathias Nyman
2026-04-01 22:08 ` Thinh Nguyen
2026-04-01 22:34 ` Mathias Nyman
2026-04-01 22:47 ` Thinh Nguyen
2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260404204133.3mcizeeokw3ln5r4@synopsys.com \
--to=thinh.nguyen@synopsys.com \
--cc=linux-usb@vger.kernel.org \
--cc=mathias.nyman@linux.intel.com \
--cc=michal.pecio@gmail.com \
--cc=niklas.neronin@linux.intel.com \
--cc=oneukum@suse.com \
--cc=stern@rowland.harvard.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox