* [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO @ 2026-03-23 12:25 Mathias Nyman 2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman 2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman 0 siblings, 2 replies; 33+ messages in thread From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw) To: linux-usb Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin, Mathias Nyman Hi These patches are related to the 'correctly handling EPROTO' discussion, and aim to fix the xhci endpoint automatic restart issue. It also makes sure that multi TD URBs are given back with proper EPROTO/EPIPE status if a stall/transaction error is triggered on any TD of the URB, not just the last one. endpoints will still be restarted when a new URB is queued, this does not fix the issue with BH URBs completion, but will print a debug message if a new URB is queued before previous 'tainted' pending URBs are cancelled. I haven't been able to test the actual codepaths in these patches yet. Can't easily queue several URBs and somehow trigger stall or transaction error on early URBs, but would like to get feeback on this solution early. Thanks Mathias Mathias Nyman (2): xhci: prevent automatic endpoint restart after stall or error xhci: Ensure URB is given back when endpoint halts on a multi-TD URB drivers/usb/host/xhci-ring.c | 50 +++++++++++++++++++++++++++++++----- drivers/usb/host/xhci.c | 7 +++++ drivers/usb/host/xhci.h | 4 ++- 3 files changed, 53 insertions(+), 8 deletions(-) -- 2.43.0 ^ permalink raw reply [flat|nested] 33+ messages in thread
* [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman @ 2026-03-23 12:25 ` Mathias Nyman 2026-03-25 1:52 ` Thinh Nguyen 2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman 1 sibling, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw) To: linux-usb Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin, Mathias Nyman Avoid automatically restarting bulk or interrupt transfers after a URB is given back due to stall or error. Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when it halted. The actual TD the endpoint halted on is marked TD_HALTED, and its URB is given back with proper EPROTO or EPIPE error code. Don't automatically restart an endpoint if the next queued TD after the TD_HALTED one is marked tainted. class driver is expected to cancel all those pending URBs before submitting more URBs, or before clearing halt, but is not forced to. Always start the endpoint when a new URB is queued, but print a debug message if there are still tainted URBs left uncancelled. The same code is used for clearing up cached TDs from xHC in both the URB stall/error case as when a class driver cancels URBs. In the cancel case endpoint is restarted after removing the cancelled URB. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> --- drivers/usb/host/xhci-ring.c | 48 ++++++++++++++++++++++++++++++------ drivers/usb/host/xhci.c | 7 ++++++ drivers/usb/host/xhci.h | 4 ++- 3 files changed, 51 insertions(+), 8 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 1cbefee3c4ca..f01f343a7e37 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -981,6 +981,31 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id, return ret; } +/* + * Mark the TD the endpoint halted on as TD_HALTED and add it to the cancelled + * td list, ensuring invalidate_cancelled_tds() clears the TD from xHC cache. + * Mark the other TDs on that bulk or interrupt endpoint as TD_TAINTED to + * prevent the ring from being restarted too early. + */ + +static void xhci_cancel_halted_tds(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, + struct xhci_td *td) +{ + struct xhci_ring *ring; + struct xhci_td *tdi; + + ring = xhci_urb_to_transfer_ring(xhci, td->urb); + + td->cancel_status = TD_HALTED; + if (list_empty(&td->cancelled_td_list)) + list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list); + + list_for_each_entry(tdi, &ring->td_list, td_list) { + if (!tdi->cancel_status && ring->type != TYPE_CTRL) + tdi->cancel_status = TD_TAINTED; + } +} + static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, struct xhci_td *td, @@ -999,10 +1024,8 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci, /* add td to cancelled list and let reset ep handler take care of it */ if (reset_type == EP_HARD_RESET) { ep->ep_state |= EP_HARD_CLEAR_TOGGLE; - if (td && list_empty(&td->cancelled_td_list)) { - list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list); - td->cancel_status = TD_HALTED; - } + if (td) + xhci_cancel_halted_tds(xhci, ep, td); } if (ep->ep_state & EP_HALTED) { @@ -1079,6 +1102,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep) case TD_CLEARED: /* TD is already no-op */ case TD_CLEARING_CACHE: /* set TR deq command already queued */ break; + case TD_TAINTED: case TD_DIRTY: /* TD is cached, clear it */ case TD_HALTED: case TD_CLEARING_CACHE_DEFERRED: @@ -1406,6 +1430,14 @@ void xhci_hc_died(struct xhci_hcd *xhci) usb_hc_died(xhci_to_hcd(xhci)); } +bool next_td_is_tainted(struct xhci_ring *ring) +{ + struct xhci_td *td; + + td = list_first_entry_or_null(&ring->td_list, struct xhci_td, td_list); + return td && td->cancel_status == TD_TAINTED; +} + /* * When we get a completion for a Set Transfer Ring Dequeue Pointer command, * we need to clear the set deq pending flag in the endpoint ring state, so that @@ -1543,13 +1575,15 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id, __func__); xhci_invalidate_cancelled_tds(ep); /* Try to restart the endpoint if all is done */ - ring_doorbell_for_active_rings(xhci, slot_id, ep_index); + if (!next_td_is_tainted(ep_ring)) + ring_doorbell_for_active_rings(xhci, slot_id, ep_index); /* Start giving back any TDs invalidated above */ xhci_giveback_invalidated_tds(ep); } else { /* Restart any rings with pending URBs */ - xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__); - ring_doorbell_for_active_rings(xhci, slot_id, ep_index); + xhci_dbg(ep->xhci, "%s: All cancelled TDs cleared\n", __func__); + if (!next_td_is_tainted(ep_ring)) + ring_doorbell_for_active_rings(xhci, slot_id, ep_index); } } diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index ef6d8662adec..8f417836ae1f 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1625,6 +1625,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag unsigned long flags; int ret = 0; unsigned int slot_id, ep_index; + struct xhci_ring *ring; unsigned int *ep_state; struct urb_priv *urb_priv; int num_tds; @@ -1661,6 +1662,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag } slot_id = urb->dev->slot_id; + ring = xhci_urb_to_transfer_ring(xhci, urb); if (!HCD_HW_ACCESSIBLE(hcd)) { ret = -ESHUTDOWN; @@ -1694,6 +1696,10 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag goto free_priv; } + ring = xhci_urb_to_transfer_ring(xhci, urb); + if (ring && next_td_is_tainted(ring)) + xhci_dbg(xhci, "WARN, enqueue URB without canceling old tainted URBs\n"); + switch (usb_endpoint_type(&urb->ep->desc)) { case USB_ENDPOINT_XFER_CONTROL: @@ -3354,6 +3360,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd, /* Bail out if toggle is already being cleared by a endpoint reset */ spin_lock_irqsave(&xhci->lock, flags); + if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) { ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE; spin_unlock_irqrestore(&xhci->lock, flags); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 2b0796f6d00e..0be040f87abf 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1292,7 +1292,8 @@ struct xhci_segment { }; enum xhci_cancelled_td_status { - TD_DIRTY = 0, + TD_TAINTED = 1, + TD_DIRTY, TD_HALTED, TD_CLEARING_CACHE, TD_CLEARING_CACHE_DEFERRED, @@ -1951,6 +1952,7 @@ void xhci_ring_doorbell_for_active_rings(struct xhci_hcd *xhci, void xhci_cleanup_command_queue(struct xhci_hcd *xhci); void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring); unsigned int count_trbs(u64 addr, u64 len); +bool next_td_is_tainted(struct xhci_ring *ring); int xhci_stop_endpoint_sync(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, int suspend, gfp_t gfp_flags); void xhci_process_cancelled_tds(struct xhci_virt_ep *ep); -- 2.43.0 ^ permalink raw reply related [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman @ 2026-03-25 1:52 ` Thinh Nguyen 2026-03-25 9:38 ` Mathias Nyman 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-03-25 1:52 UTC (permalink / raw) To: Mathias Nyman Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu, Thinh Nguyen, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Mon, Mar 23, 2026, Mathias Nyman wrote: > Avoid automatically restarting bulk or interrupt transfers after a > URB is given back due to stall or error. > > Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when > it halted. The actual TD the endpoint halted on is marked TD_HALTED, > and its URB is given back with proper EPROTO or EPIPE error code. > > Don't automatically restart an endpoint if the next queued TD after > the TD_HALTED one is marked tainted. Sounds good for -EPROTO, but will a clear-halt ring the corresponding the endpoint's doorbell for STALL endpoint? BR, Thinh > > class driver is expected to cancel all those pending URBs before > submitting more URBs, or before clearing halt, but is not forced to. > Always start the endpoint when a new URB is queued, but print a debug > message if there are still tainted URBs left uncancelled. > > The same code is used for clearing up cached TDs from xHC in both > the URB stall/error case as when a class driver cancels URBs. > In the cancel case endpoint is restarted after removing the cancelled > URB. > > Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> > --- > drivers/usb/host/xhci-ring.c | 48 ++++++++++++++++++++++++++++++------ > drivers/usb/host/xhci.c | 7 ++++++ > drivers/usb/host/xhci.h | 4 ++- > 3 files changed, 51 insertions(+), 8 deletions(-) > ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-25 1:52 ` Thinh Nguyen @ 2026-03-25 9:38 ` Mathias Nyman 2026-03-26 1:19 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-03-25 9:38 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 3/25/26 03:52, Thinh Nguyen wrote: > On Mon, Mar 23, 2026, Mathias Nyman wrote: >> Avoid automatically restarting bulk or interrupt transfers after a >> URB is given back due to stall or error. >> >> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when >> it halted. The actual TD the endpoint halted on is marked TD_HALTED, >> and its URB is given back with proper EPROTO or EPIPE error code. >> >> Don't automatically restart an endpoint if the next queued TD after >> the TD_HALTED one is marked tainted. > > Sounds good for -EPROTO, but will a clear-halt ring the corresponding > the endpoint's doorbell for STALL endpoint? > With this change xhci would not restart the stalled endpoint after a clear-halt request. The first usb_enqueue() call after clear-halt would start it. Could make sense to restart the endpoint after a clear-halt, and just add a small debug message if the next queued URB is marked 'tainted'. Thanks for the feedback Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-25 9:38 ` Mathias Nyman @ 2026-03-26 1:19 ` Thinh Nguyen 2026-03-26 11:25 ` Mathias Nyman 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-03-26 1:19 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Wed, Mar 25, 2026, Mathias Nyman wrote: > On 3/25/26 03:52, Thinh Nguyen wrote: > > On Mon, Mar 23, 2026, Mathias Nyman wrote: > > > Avoid automatically restarting bulk or interrupt transfers after a > > > URB is given back due to stall or error. > > > > > > Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when > > > it halted. The actual TD the endpoint halted on is marked TD_HALTED, > > > and its URB is given back with proper EPROTO or EPIPE error code. > > > > > > Don't automatically restart an endpoint if the next queued TD after > > > the TD_HALTED one is marked tainted. > > > > Sounds good for -EPROTO, but will a clear-halt ring the corresponding > > the endpoint's doorbell for STALL endpoint? > > > > With this change xhci would not restart the stalled endpoint after a clear-halt > request. The first usb_enqueue() call after clear-halt would start it. > > Could make sense to restart the endpoint after a clear-halt, and just add a small > debug message if the next queued URB is marked 'tainted'. > The -EPROTO should be handled differently than -EPIPE. A STALL endpoint is part of a normal usb flow. Should the class driver submit a new URB while the endpoint is STALL, we would always expect a STALL error response after the endpoint is restarted. That's not the case with -EPROTO where the data may be corrupted and/or the host and device are out of sync. We should not continue until the class driver do some recovery. IMHO, we can keep the handling of -EPIPE how it was before. Let the xHC tell whether the STALL error still persists instead of managing it by the xhci driver. Thanks, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-26 1:19 ` Thinh Nguyen @ 2026-03-26 11:25 ` Mathias Nyman 2026-03-26 23:24 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-03-26 11:25 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 3/26/26 03:19, Thinh Nguyen wrote: > On Wed, Mar 25, 2026, Mathias Nyman wrote: >> On 3/25/26 03:52, Thinh Nguyen wrote: >>> On Mon, Mar 23, 2026, Mathias Nyman wrote: >>>> Avoid automatically restarting bulk or interrupt transfers after a >>>> URB is given back due to stall or error. >>>> >>>> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when >>>> it halted. The actual TD the endpoint halted on is marked TD_HALTED, >>>> and its URB is given back with proper EPROTO or EPIPE error code. >>>> >>>> Don't automatically restart an endpoint if the next queued TD after >>>> the TD_HALTED one is marked tainted. >>> >>> Sounds good for -EPROTO, but will a clear-halt ring the corresponding >>> the endpoint's doorbell for STALL endpoint? >>> >> >> With this change xhci would not restart the stalled endpoint after a clear-halt >> request. The first usb_enqueue() call after clear-halt would start it. >> >> Could make sense to restart the endpoint after a clear-halt, and just add a small >> debug message if the next queued URB is marked 'tainted'. >> > > The -EPROTO should be handled differently than -EPIPE. A STALL endpoint > is part of a normal usb flow. Should the class driver submit a new URB > while the endpoint is STALL, we would always expect a STALL error > response after the endpoint is restarted. That's not the case with > -EPROTO where the data may be corrupted and/or the host and device are > out of sync. We should not continue until the class driver do some > recovery. IMHO, we can keep the handling of -EPIPE how it was before. > Let the xHC tell whether the STALL error still persists instead of > managing it by the xhci driver. > I agree that that we should restart the endpoint if class/core enqueues a new URB _after_ xhci gave back an URB with EPIPE after endpoint STALL. But I don't think we should restart the ring to continue processing URBs that were queued before the endpoint stalled. This would prevent the class/core from even attempting to retire the pending URBs, something USB2.0 spec, '5.8.5 Bulk Transfer Data Sequences' requires: "If a halt condition is detected on a bulk pipe due to transmission errors or a STALL handshake being returned from the endpoint, all pending IRPs are retired. Removal of the halt condition is achieved via software intervention through a separate control pipe." Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-26 11:25 ` Mathias Nyman @ 2026-03-26 23:24 ` Thinh Nguyen 2026-03-30 12:51 ` Mathias Nyman 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-03-26 23:24 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Thu, Mar 26, 2026, Mathias Nyman wrote: > On 3/26/26 03:19, Thinh Nguyen wrote: > > On Wed, Mar 25, 2026, Mathias Nyman wrote: > > > On 3/25/26 03:52, Thinh Nguyen wrote: > > > > On Mon, Mar 23, 2026, Mathias Nyman wrote: > > > > > Avoid automatically restarting bulk or interrupt transfers after a > > > > > URB is given back due to stall or error. > > > > > > > > > > Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when > > > > > it halted. The actual TD the endpoint halted on is marked TD_HALTED, > > > > > and its URB is given back with proper EPROTO or EPIPE error code. > > > > > > > > > > Don't automatically restart an endpoint if the next queued TD after > > > > > the TD_HALTED one is marked tainted. > > > > > > > > Sounds good for -EPROTO, but will a clear-halt ring the corresponding > > > > the endpoint's doorbell for STALL endpoint? > > > > > > > > > > With this change xhci would not restart the stalled endpoint after a clear-halt > > > request. The first usb_enqueue() call after clear-halt would start it. > > > > > > Could make sense to restart the endpoint after a clear-halt, and just add a small > > > debug message if the next queued URB is marked 'tainted'. > > > > > > > The -EPROTO should be handled differently than -EPIPE. A STALL endpoint > > is part of a normal usb flow. Should the class driver submit a new URB > > while the endpoint is STALL, we would always expect a STALL error > > response after the endpoint is restarted. That's not the case with > > -EPROTO where the data may be corrupted and/or the host and device are > > out of sync. We should not continue until the class driver do some > > recovery. IMHO, we can keep the handling of -EPIPE how it was before. > > Let the xHC tell whether the STALL error still persists instead of > > managing it by the xhci driver. > > > I agree that that we should restart the endpoint if class/core enqueues a new > URB _after_ xhci gave back an URB with EPIPE after endpoint STALL. > > But I don't think we should restart the ring to continue processing URBs that > were queued before the endpoint stalled. This would prevent the class/core > from even attempting to retire the pending URBs, something USB2.0 spec, > '5.8.5 Bulk Transfer Data Sequences' requires: > > "If a halt condition is detected on a bulk pipe due to transmission errors or > a STALL handshake being returned from the endpoint, all pending IRPs are > retired. Removal of the halt condition is achieved via software intervention > through a separate control pipe." > Fair point. Then the core will need to track the endpoint's STALL state and parse the clear-halt request to know which endpoint and when to clear the STALL before it can accept new URB. So the first usb_enqueue() call after clear-halt can start the endpoint again. The xhci will also need to have access to this state. Currently you have the xhci driver to "retire" the halted URBs. However, you also noted that class/core may attempt to retire the pending URBs. Who's expected to handle the retirement here? On a separate note, will you plan to implement the clear-halt for EPROTO in xhci? Thanks! Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-26 23:24 ` Thinh Nguyen @ 2026-03-30 12:51 ` Mathias Nyman 2026-03-30 14:17 ` stern 2026-04-01 22:08 ` Thinh Nguyen 0 siblings, 2 replies; 33+ messages in thread From: Mathias Nyman @ 2026-03-30 12:51 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 3/27/26 01:24, Thinh Nguyen wrote: > On Thu, Mar 26, 2026, Mathias Nyman wrote: >> On 3/26/26 03:19, Thinh Nguyen wrote: >>> On Wed, Mar 25, 2026, Mathias Nyman wrote: >>>> On 3/25/26 03:52, Thinh Nguyen wrote: >>>>> On Mon, Mar 23, 2026, Mathias Nyman wrote: >>>>>> Avoid automatically restarting bulk or interrupt transfers after a >>>>>> URB is given back due to stall or error. >>>>>> >>>>>> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when >>>>>> it halted. The actual TD the endpoint halted on is marked TD_HALTED, >>>>>> and its URB is given back with proper EPROTO or EPIPE error code. >>>>>> >>>>>> Don't automatically restart an endpoint if the next queued TD after >>>>>> the TD_HALTED one is marked tainted. >>>>> >>>>> Sounds good for -EPROTO, but will a clear-halt ring the corresponding >>>>> the endpoint's doorbell for STALL endpoint? >>>>> >>>> >>>> With this change xhci would not restart the stalled endpoint after a clear-halt >>>> request. The first usb_enqueue() call after clear-halt would start it. >>>> >>>> Could make sense to restart the endpoint after a clear-halt, and just add a small >>>> debug message if the next queued URB is marked 'tainted'. >>>> >>> >>> The -EPROTO should be handled differently than -EPIPE. A STALL endpoint >>> is part of a normal usb flow. Should the class driver submit a new URB >>> while the endpoint is STALL, we would always expect a STALL error >>> response after the endpoint is restarted. That's not the case with >>> -EPROTO where the data may be corrupted and/or the host and device are >>> out of sync. We should not continue until the class driver do some >>> recovery. IMHO, we can keep the handling of -EPIPE how it was before. >>> Let the xHC tell whether the STALL error still persists instead of >>> managing it by the xhci driver. >>> >> I agree that that we should restart the endpoint if class/core enqueues a new >> URB _after_ xhci gave back an URB with EPIPE after endpoint STALL. >> >> But I don't think we should restart the ring to continue processing URBs that >> were queued before the endpoint stalled. This would prevent the class/core >> from even attempting to retire the pending URBs, something USB2.0 spec, >> '5.8.5 Bulk Transfer Data Sequences' requires: >> >> "If a halt condition is detected on a bulk pipe due to transmission errors or >> a STALL handshake being returned from the endpoint, all pending IRPs are >> retired. Removal of the halt condition is achieved via software intervention >> through a separate control pipe." >> > > Fair point. Then the core will need to track the endpoint's STALL state > and parse the clear-halt request to know which endpoint and when to > clear the STALL before it can accept new URB. So the first usb_enqueue() > call after clear-halt can start the endpoint again. The xhci will also > need to have access to this state. Ideally xhci driver would return the URB with EPIPE after STALL, and not continue processing URBs before a clear halt is sent, or a new URB is enqueued. USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB. Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to the bh workqueue. Then after urb->complete() work is called and core is sure class driver is aware of the EPIPE, then core would clear the flag and flush the buffered URBs to the host controller drivers, restarting the ring Class driver urb->complete() would most likely retire/cancel the pending URBs, both the earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core. Class driver should clear the halt, but is free to do whatever it wants. It could choose to send a new URB without clearing the halt, have it processed, trigger STALL again, and get URB returned with EPIPE status. I don't think most class/usb device drivers really handle stall that well. Above might be wishful thinking. > > Currently you have the xhci driver to "retire" the halted URBs. However, > you also noted that class/core may attempt to retire the pending URBs. > Who's expected to handle the retirement here? Maybe we should let core retire the pending URBS, and only fix the xhci driver 'automatic endpoint restart after stall' part after that core change is done. Should cause less regression. > > On a separate note, will you plan to implement the clear-halt for EPROTO > in xhci? I don't think this should be part of xhci driver. Decision to send control requests to the device should be done by core or class drivers. Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-30 12:51 ` Mathias Nyman @ 2026-03-30 14:17 ` stern 2026-03-31 9:34 ` Mathias Nyman 2026-04-01 22:08 ` Thinh Nguyen 1 sibling, 1 reply; 33+ messages in thread From: stern @ 2026-03-30 14:17 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote: > Ideally xhci driver would return the URB with EPIPE after STALL, and not continue > processing URBs before a clear halt is sent, or a new URB is enqueued. > USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this > STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB. > > Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to > the bh workqueue. Then after urb->complete() work is called and core is sure class > driver is aware of the EPIPE, then core would clear the flag and flush the buffered > URBs to the host controller drivers, restarting the ring This is not practical; it would result in a big slowdown for large bulk transfers. Doing this would mean the core could not send an URB to the HCD until all the previous URBs for that endpoint had completed and the interrupt handler had run, which would add significant latency to transfers. Exactly what should happen to URBs coming after one that completes with -EPIPE is not immediately obvious. If the HCD did try to send them to the device right away then they would also be stalled, which is obviously non-productive. The only guarantee the kernel makes about this situation is that the endpoint queue won't restart until all completed or unlinked URBs have been given back (likewise for -EPROTO errors). The only safe course available to class drivers is to unlink all the pending URBs. In theory the core could do this for all drivers automatically, but at present it doesn't. > Class driver urb->complete() would most likely retire/cancel the pending URBs, both the > earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core. > Class driver should clear the halt, but is free to do whatever it wants. > It could choose to send a new URB without clearing the halt, > have it processed, trigger STALL again, and get URB returned with EPIPE status. > > I don't think most class/usb device drivers really handle stall that well. > Above might be wishful thinking. Indeed. We can make life a little easier for drivers, but clearing the endpoint halt is up to them. > > Currently you have the xhci driver to "retire" the halted URBs. However, > > you also noted that class/core may attempt to retire the pending URBs. > > Who's expected to handle the retirement here? > > Maybe we should let core retire the pending URBS, and only fix the xhci driver > 'automatic endpoint restart after stall' part after that core change is done. > > Should cause less regression. > > > > > On a separate note, will you plan to implement the clear-halt for EPROTO > > in xhci? > > I don't think this should be part of xhci driver. Decision to send control requests > to the device should be done by core or class drivers. Here's a troubling consequence for people to consider: Suppose an endpoint queue stops with -EPROTO or -EPIPE, and before the class driver gets around to calling usb_clear_halt(), it is unbound. What happens the next time a driver binds to the device and tries to use the endpoint? In particular: What does xhci-hcd do if an URB is submitted for an endpoint whose queue is currently stopped? Does it reject the submission with some sort of error code, or does it go ahead and add the URB to the end of the non-advancing queue? If the latter is true, how will a newly bound driver ever discover that the queue is stopped? Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-30 14:17 ` stern @ 2026-03-31 9:34 ` Mathias Nyman 2026-03-31 15:31 ` stern 0 siblings, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-03-31 9:34 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 3/30/26 17:17, stern@rowland.harvard.edu wrote: > On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote: >> Ideally xhci driver would return the URB with EPIPE after STALL, and not continue >> processing URBs before a clear halt is sent, or a new URB is enqueued. >> USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this >> STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB. >> >> Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to >> the bh workqueue. Then after urb->complete() work is called and core is sure class >> driver is aware of the EPIPE, then core would clear the flag and flush the buffered >> URBs to the host controller drivers, restarting the ring > > This is not practical; it would result in a big slowdown for large bulk > transfers. Doing this would mean the core could not send an URB to the > HCD until all the previous URBs for that endpoint had completed and the > interrupt handler had run, which would add significant latency to > transfers. The URB submit buffering in core would only take place if endpoint is halted with EPIPE/EPROTO. Additional latency should be limited to the time between interrupt handler returns a URB with EPIPE/EPROTO, and the bh workqueue finishing urb->complete() for that URB URBs would normally be sent to HCD directly. Yes, this might end up being quite complex, need to make sure it solves more issues than it creates. Usb core is aware of the halted endpoint before the class driver due to the bh workqueue giveback. To me it would make sense to make core responsible for babysitting the class driver urb submission for the time it withholds this information. Whole reason for this is to prevent new URB submission from HCD restarting an endpoint after HCD gave back a halted URB, and HCD assuming core/class drivers are aware of the halt when the new URB is submitted. Other option is that usb core would just refuse class driver from submitting URBs during this time. usb_submit_urb() would return with an error, but I think this might impact existing class drivers more. > > Exactly what should happen to URBs coming after one that completes with > -EPIPE is not immediately obvious. If the HCD did try to send them to > the device right away then they would also be stalled, which is > obviously non-productive. The only guarantee the kernel makes > about this situation is that the endpoint queue won't restart > until all completed or unlinked URBs have been given back (likewise for > -EPROTO errors). My take is that endpoint should stop processing URBs, existing pending URBs should be retired by class/core, new URBs should restart the endpoint but new URBs are only permitted _after_ submitter is aware of the halt. A class driver testing USB specification should be able to resubmit a stalled URB (EPIPE) and expect it to complete with EPIPE again until it clears the halt. > > The only safe course available to class drivers is to unlink all the > pending URBs. In theory the core could do this for all drivers > automatically, but at present it doesn't. > >> Class driver urb->complete() would most likely retire/cancel the pending URBs, both the >> earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core. >> Class driver should clear the halt, but is free to do whatever it wants. >> It could choose to send a new URB without clearing the halt, >> have it processed, trigger STALL again, and get URB returned with EPIPE status. >> >> I don't think most class/usb device drivers really handle stall that well. >> Above might be wishful thinking. > > Indeed. We can make life a little easier for drivers, but clearing the > endpoint halt is up to them. > >>> Currently you have the xhci driver to "retire" the halted URBs. However, >>> you also noted that class/core may attempt to retire the pending URBs. >>> Who's expected to handle the retirement here? >> >> Maybe we should let core retire the pending URBS, and only fix the xhci driver >> 'automatic endpoint restart after stall' part after that core change is done. >> >> Should cause less regression. >> >>> >>> On a separate note, will you plan to implement the clear-halt for EPROTO >>> in xhci? >> >> I don't think this should be part of xhci driver. Decision to send control requests >> to the device should be done by core or class drivers. > > Here's a troubling consequence for people to consider: Suppose an > endpoint queue stops with -EPROTO or -EPIPE, and before the class driver > gets around to calling usb_clear_halt(), it is unbound. What happens > the next time a driver binds to the device and tries to use the > endpoint? The disable/enable interface and set config calls during unbind/bind should, if I remember correctly flush pending URBs and drop and re-add the endpoint, clearing the xhci side halt and reset toggle. > > In particular: What does xhci-hcd do if an URB is submitted for an > endpoint whose queue is currently stopped? Does it reject the > submission with some sort of error code, or does it go ahead and add the > URB to the end of the non-advancing queue? If the latter is true, how > will a newly bound driver ever discover that the queue is stopped? xhci-hcd will queue the new URB and restart the ring, if device side endpoint remains halted due to uncleared halt then transfer stalls and URB is returned with EPIPE. Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-31 9:34 ` Mathias Nyman @ 2026-03-31 15:31 ` stern 2026-04-01 22:08 ` Mathias Nyman 0 siblings, 1 reply; 33+ messages in thread From: stern @ 2026-03-31 15:31 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Tue, Mar 31, 2026 at 12:34:54PM +0300, Mathias Nyman wrote: > On 3/30/26 17:17, stern@rowland.harvard.edu wrote: > > On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote: > > > Ideally xhci driver would return the URB with EPIPE after STALL, and not continue > > > processing URBs before a clear halt is sent, or a new URB is enqueued. > > > USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this > > > STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB. > > > > > > Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to > > > the bh workqueue. Then after urb->complete() work is called and core is sure class > > > driver is aware of the EPIPE, then core would clear the flag and flush the buffered > > > URBs to the host controller drivers, restarting the ring > > > > This is not practical; it would result in a big slowdown for large bulk > > transfers. Doing this would mean the core could not send an URB to the > > HCD until all the previous URBs for that endpoint had completed and the > > interrupt handler had run, which would add significant latency to > > transfers. > > The URB submit buffering in core would only take place if endpoint is halted > with EPIPE/EPROTO. > > Additional latency should be limited to the time between interrupt handler > returns a URB with EPIPE/EPROTO, and the bh workqueue finishing urb->complete() > for that URB > > URBs would normally be sent to HCD directly. > > Yes, this might end up being quite complex, need to make sure it solves more > issues than it creates. Can that be done precisely? I doubt it -- there are races between the core and the HCD and between the HCD and the hardware. As for doing it imprecisely... I'm not sure the advantages outweigh the complexity. Anyway, if the class driver is well behaved, this buffering won't be needed. > Usb core is aware of the halted endpoint before the class driver due to the bh > workqueue giveback. To me it would make sense to make core responsible for babysitting > the class driver urb submission for the time it withholds this information. > > Whole reason for this is to prevent new URB submission from HCD restarting an endpoint > after HCD gave back a halted URB, and HCD assuming core/class drivers are aware of > the halt when the new URB is submitted. > > Other option is that usb core would just refuse class driver from submitting URBs > during this time. usb_submit_urb() would return with an error, but I think this might > impact existing class drivers more. How about this instead? We add a "halted" flag to the usb_host_endpoint structure, and the core will set this flag whenever a bulk or interrupt URB gets a status other than 0 (before putting the URB on the bh list). If an URB has one of these statuses, when its completion handler returns the core will unlink all the URBs queued to the same endpoint. Finally, the "halted" flag should be cleared after a completion handler returns if there are no more unlinked URBs still in the queue or URBs waiting on the bh list to be given back. The result of this is that any URB remaining in the queue when the flag is cleared must have been submitted by the class driver _after_ the failing URB's completion handler has run. We can assume the class driver knows what it's doing in this case. The endpoint queue shouldn't be restarted until the "halted" flag is cleared. Either right away, if there are any URBs in the queue, or not until the next URB is submitted. Doing this might require a new HCD callback. (It would also mean the kerneldoc for usb_unlink_urb() would need to be updated, because the endpoint might restart before all the completion handlers for the unlinked URBs have run.) What I'm trying to do here is come up with a single, consistent proposal for exactly when halted endpoint queues should restart. Maybe someone else has a better suggestion. We also should add a special case for control endpoints because their halt conditions can get cleared automatically when the device receives a SETUP packet. > > Exactly what should happen to URBs coming after one that completes with > > -EPIPE is not immediately obvious. If the HCD did try to send them to > > the device right away then they would also be stalled, which is > > obviously non-productive. The only guarantee the kernel makes > > about this situation is that the endpoint queue won't restart > > until all completed or unlinked URBs have been given back (likewise for > > -EPROTO errors). > > My take is that endpoint should stop processing URBs, existing pending URBs > should be retired by class/core, new URBs should restart the endpoint but new URBs > are only permitted _after_ submitter is aware of the halt. Yes, that's more or less what my proposal attempts to accomplish. > A class driver testing USB specification should be able to resubmit a stalled URB > (EPIPE) and expect it to complete with EPIPE again until it clears the halt. Agreed. > > Here's a troubling consequence for people to consider: Suppose an > > endpoint queue stops with -EPROTO or -EPIPE, and before the class driver > > gets around to calling usb_clear_halt(), it is unbound. What happens > > the next time a driver binds to the device and tries to use the > > endpoint? > > The disable/enable interface and set config calls during unbind/bind should, > if I remember correctly flush pending URBs and drop and re-add the endpoint, > clearing the xhci side halt and reset toggle. usb_probe_interface() doesn't do any of that stuff, other than a deferred switch to altsetting 0 if needed. usb_unbind_interface() does call usb_enable_interface() if the interface is already in altsetting 0, but the reset_ep argument is false so the endpoint state doesn't get affected. Should that be changed? Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-31 15:31 ` stern @ 2026-04-01 22:08 ` Mathias Nyman 2026-04-02 2:36 ` stern 0 siblings, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-04-01 22:08 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 3/31/26 18:31, stern@rowland.harvard.edu wrote: > > How about this instead? We add a "halted" flag to the usb_host_endpoint > structure, and the core will set this flag whenever a bulk or interrupt > URB gets a status other than 0 (before putting the URB on the bh list). > If an URB has one of these statuses, when its completion handler returns > the core will unlink all the URBs queued to the same endpoint. Finally, > the "halted" flag should be cleared after a completion handler returns > if there are no more unlinked URBs still in the queue or URBs waiting on > the bh list to be given back. > > The result of this is that any URB remaining in the queue when the flag > is cleared must have been submitted by the class driver _after_ the > failing URB's completion handler has run. We can assume the class > driver knows what it's doing in this case. > > The endpoint queue shouldn't be restarted until the "halted" flag is > cleared. Either right away, if there are any URBs in the queue, or not > until the next URB is submitted. Doing this might require a new HCD > callback. (It would also mean the kerneldoc for usb_unlink_urb() would > need to be updated, because the endpoint might restart before all the > completion handlers for the unlinked URBs have run.) > > What I'm trying to do here is come up with a single, consistent proposal > for exactly when halted endpoint queues should restart. Maybe someone > else has a better suggestion. Sounds like a possible solution to me. Just to clarify, core should unlink the remaining URBs queued to that endpoint after setting the "halted" flag, but before URB completion is called. "Halted" flag should be cleared after URB completion returns, and endpoint should be restarted if there are any pending URBs. This allows the class driver URB completion handler to re-queue the halted URB without core unlinking it. > >>> Here's a troubling consequence for people to consider: Suppose an >>> endpoint queue stops with -EPROTO or -EPIPE, and before the class driver >>> gets around to calling usb_clear_halt(), it is unbound. What happens >>> the next time a driver binds to the device and tries to use the >>> endpoint? >> >> The disable/enable interface and set config calls during unbind/bind should, >> if I remember correctly flush pending URBs and drop and re-add the endpoint, >> clearing the xhci side halt and reset toggle. > > usb_probe_interface() doesn't do any of that stuff, other than a > deferred switch to altsetting 0 if needed. > > usb_unbind_interface() does call usb_enable_interface() if the interface > is already in altsetting 0, but the reset_ep argument is false so the > endpoint state doesn't get affected. Should that be changed? Looks like this needs more attention. Interface driver unbind/bind with halted endpoint could be an issue. I don't have an answer right now. Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-01 22:08 ` Mathias Nyman @ 2026-04-02 2:36 ` stern 2026-04-03 1:59 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: stern @ 2026-04-02 2:36 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote: > On 3/31/26 18:31, stern@rowland.harvard.edu wrote: > > > > How about this instead? We add a "halted" flag to the usb_host_endpoint > > structure, and the core will set this flag whenever a bulk or interrupt > > URB gets a status other than 0 (before putting the URB on the bh list). > > If an URB has one of these statuses, when its completion handler returns > > the core will unlink all the URBs queued to the same endpoint. Finally, > > the "halted" flag should be cleared after a completion handler returns > > if there are no more unlinked URBs still in the queue or URBs waiting on > > the bh list to be given back. > > > > The result of this is that any URB remaining in the queue when the flag > > is cleared must have been submitted by the class driver _after_ the > > failing URB's completion handler has run. We can assume the class > > driver knows what it's doing in this case. > > > > The endpoint queue shouldn't be restarted until the "halted" flag is > > cleared. Either right away, if there are any URBs in the queue, or not > > until the next URB is submitted. Doing this might require a new HCD > > callback. (It would also mean the kerneldoc for usb_unlink_urb() would > > need to be updated, because the endpoint might restart before all the > > completion handlers for the unlinked URBs have run.) > > > > What I'm trying to do here is come up with a single, consistent proposal > > for exactly when halted endpoint queues should restart. Maybe someone > > else has a better suggestion. > > Sounds like a possible solution to me. > > Just to clarify, core should unlink the remaining URBs queued to that endpoint > after setting the "halted" flag, but before URB completion is called. Above I wrote that the remaining URBs should be unlinked _after_ the completion handler is called. If we did the unlinks before then the class driver might submit a new URB after the unlinks were finished and before the completion handler learned about the transaction error, and this new URB then wouldn't get unlinked. It's a race between completion of one URB and submission of another. > "Halted" flag should be cleared after URB completion returns, and endpoint > should be restarted if there are any pending URBs. To be clear, the flag should be cleared after the completion handlers for _all_ the unlinked URBs (as well as the URB getting the original error) have returned. > This allows the class driver URB completion handler to re-queue the halted URB > without core unlinking it. The completion handler shouldn't do this, because it would mean resubmitting without doing a clear-halt first. (Completion handlers can't do clear-halts because they run in atomic context.) If it does try to do this anyway, I see nothing wrong with the core unlinking the resubmitted URB. (Are you going to ask about verification tests that set the endpoint's Halt feature, submit an URB, wait for it to fail with -EPIPE, and then submit another URB from within the completion handler? :-) ) > > > > Here's a troubling consequence for people to consider: Suppose an > > > > endpoint queue stops with -EPROTO or -EPIPE, and before the class driver > > > > gets around to calling usb_clear_halt(), it is unbound. What happens > > > > the next time a driver binds to the device and tries to use the > > > > endpoint? > > > > > > The disable/enable interface and set config calls during unbind/bind should, > > > if I remember correctly flush pending URBs and drop and re-add the endpoint, > > > clearing the xhci side halt and reset toggle. > > > > usb_probe_interface() doesn't do any of that stuff, other than a > > deferred switch to altsetting 0 if needed. > > > > usb_unbind_interface() does call usb_enable_interface() if the interface > > is already in altsetting 0, but the reset_ep argument is false so the > > endpoint state doesn't get affected. Should that be changed? > > Looks like this needs more attention. Interface driver unbind/bind with > halted endpoint could be an issue. I don't have an answer right now. We can ponder it... Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-02 2:36 ` stern @ 2026-04-03 1:59 ` Thinh Nguyen 2026-04-03 2:42 ` stern 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-03 1:59 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Mathias Nyman, Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote: > On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote: > > On 3/31/26 18:31, stern@rowland.harvard.edu wrote: > > > > > > How about this instead? We add a "halted" flag to the usb_host_endpoint > > > structure, and the core will set this flag whenever a bulk or interrupt > > > URB gets a status other than 0 (before putting the URB on the bh list). > > > If an URB has one of these statuses, when its completion handler returns > > > the core will unlink all the URBs queued to the same endpoint. Finally, > > > the "halted" flag should be cleared after a completion handler returns > > > if there are no more unlinked URBs still in the queue or URBs waiting on > > > the bh list to be given back. > > > > > > The result of this is that any URB remaining in the queue when the flag > > > is cleared must have been submitted by the class driver _after_ the > > > failing URB's completion handler has run. We can assume the class > > > driver knows what it's doing in this case. > > > > > > The endpoint queue shouldn't be restarted until the "halted" flag is > > > cleared. Either right away, if there are any URBs in the queue, or not > > > until the next URB is submitted. Doing this might require a new HCD > > > callback. (It would also mean the kerneldoc for usb_unlink_urb() would > > > need to be updated, because the endpoint might restart before all the > > > completion handlers for the unlinked URBs have run.) > > > > > > What I'm trying to do here is come up with a single, consistent proposal > > > for exactly when halted endpoint queues should restart. Maybe someone > > > else has a better suggestion. > > > > Sounds like a possible solution to me. > > > > Just to clarify, core should unlink the remaining URBs queued to that endpoint > > after setting the "halted" flag, but before URB completion is called. > > Above I wrote that the remaining URBs should be unlinked _after_ the > completion handler is called. If we did the unlinks before then the > class driver might submit a new URB after the unlinks were finished and > before the completion handler learned about the transaction error, and > this new URB then wouldn't get unlinked. > > It's a race between completion of one URB and submission of another. > > > "Halted" flag should be cleared after URB completion returns, and endpoint > > should be restarted if there are any pending URBs. > > To be clear, the flag should be cleared after the completion handlers > for _all_ the unlinked URBs (as well as the URB getting the original > error) have returned. > > > This allows the class driver URB completion handler to re-queue the halted URB > > without core unlinking it. > > The completion handler shouldn't do this, because it would mean > resubmitting without doing a clear-halt first. (Completion handlers > can't do clear-halts because they run in atomic context.) If it does > try to do this anyway, I see nothing wrong with the core unlinking the > resubmitted URB. > > (Are you going to ask about verification tests that set the endpoint's > Halt feature, submit an URB, wait for it to fail with -EPIPE, and then > submit another URB from within the completion handler? :-) ) > How about this: Introduce a halted flag the following conditions: * Introduce the halted flag in usb_host_endpoint * The halted flag must be implemented as a bit in a unsigned long so we can use atomic bit operation * Only the HCD may set the halted flag, and only upon checking the first URB completing with a halted status * Only the USB core may clear the halted flag, and only after usb_reset_endpoint returns (this makes sure the HCD drained and reset the endpoint before the flag is cleared and new URBs are accepted) * The usb_reset_endpoint must be called after clear-halt, SetInterface, and SetConfiguration. * The USB core will not attempt to unlink pending URBs due to halted condition * The HCD is responsible for completing or canceling queued URBs when the halted flag is set. Cancelled and newly submitted URBs will be returned with -EPIPE as long as the halted flag is set * The class driver is responsible to check the halted flag to determine whether to initiate error recovery via usb_clear_halt I'm trying to keep a clear separation of responsibilities between HCD and the USB core. Also, I try to keep the halted flag more closely match the state of the endpoint. Let me know what you think? BR, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-03 1:59 ` Thinh Nguyen @ 2026-04-03 2:42 ` stern 2026-04-03 8:51 ` Michal Pecio 2026-04-04 1:15 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen 0 siblings, 2 replies; 33+ messages in thread From: stern @ 2026-04-03 2:42 UTC (permalink / raw) To: Thinh Nguyen Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Fri, Apr 03, 2026 at 01:59:58AM +0000, Thinh Nguyen wrote: > On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote: > > On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote: > > > On 3/31/26 18:31, stern@rowland.harvard.edu wrote: > > > > > > > > How about this instead? We add a "halted" flag to the usb_host_endpoint > > > > structure, and the core will set this flag whenever a bulk or interrupt > > > > URB gets a status other than 0 (before putting the URB on the bh list). > > > > If an URB has one of these statuses, when its completion handler returns > > > > the core will unlink all the URBs queued to the same endpoint. Finally, > > > > the "halted" flag should be cleared after a completion handler returns > > > > if there are no more unlinked URBs still in the queue or URBs waiting on > > > > the bh list to be given back. > > > > > > > > The result of this is that any URB remaining in the queue when the flag > > > > is cleared must have been submitted by the class driver _after_ the > > > > failing URB's completion handler has run. We can assume the class > > > > driver knows what it's doing in this case. > > > > > > > > The endpoint queue shouldn't be restarted until the "halted" flag is > > > > cleared. Either right away, if there are any URBs in the queue, or not > > > > until the next URB is submitted. Doing this might require a new HCD > > > > callback. (It would also mean the kerneldoc for usb_unlink_urb() would > > > > need to be updated, because the endpoint might restart before all the > > > > completion handlers for the unlinked URBs have run.) > > > > > > > > What I'm trying to do here is come up with a single, consistent proposal > > > > for exactly when halted endpoint queues should restart. Maybe someone > > > > else has a better suggestion. > How about this: > > Introduce a halted flag the following conditions: > > * Introduce the halted flag in usb_host_endpoint > * The halted flag must be implemented as a bit in a unsigned long so > we can use atomic bit operation I don't see why it needs to use atomic operations. Unless you're thinking that we might want to add other bitflags into the same unsigned long? > * Only the HCD may set the halted flag, and only upon checking the > first URB completing with a halted status Every status other than 0 should mean that the endpoint's queue is halted. But not all statuses require a clear-halt or reset-endpoint to recover. For instance, a short transfer when the URB_SHORT_NOT_OK flag is set. Why do you want the HCD to set the flag instead of usb_hcd_giveback_urb()? Just one central place is simpler than in every HCD. > * Only the USB core may clear the halted flag, and only after > usb_reset_endpoint returns (this makes sure the HCD drained and reset > the endpoint before the flag is cleared and new URBs are accepted) > * The usb_reset_endpoint must be called after clear-halt, SetInterface, > and SetConfiguration. That's easy to do just by adding it into the appropriate core routines. > * The USB core will not attempt to unlink pending URBs due to halted > condition > * The HCD is responsible for completing or canceling queued URBs > when the halted flag is set. Cancelled and newly submitted URBs will > be returned with -EPIPE as long as the halted flag is set Why make the HCD responsible for draining the queue? It's like setting the halted flag; one central place is simpler and less error-prone than lots of separate places. For newly submitted URBs, should the submission fail with -EPIPE, or should the submission succeed and then the URB complete with -EPIPE? The first is simpler, but drivers probably aren't prepared for it -- it would mean adding error handling to the submission code as well as to the completion handler code. (Actually, that wouldn't matter unless the driver was queuing up multiple URBs, in which case it should be prepared to handle submission errors in the middle.) > * The class driver is responsible to check the halted flag to > determine whether to initiate error recovery via usb_clear_halt > > I'm trying to keep a clear separation of responsibilities between HCD > and the USB core. Also, I try to keep the halted flag more closely match > the state of the endpoint. > > Let me know what you think? Making the flag match the endpoint state is a good idea. But there is some ambiguity: Do you mean the state in the device, or the state in the HC hardware, or the state in the HCD? After all, these aren't always the same. For instance, unlinking an URB won't affect the device's state but it will affect the state on the host side. Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-03 2:42 ` stern @ 2026-04-03 8:51 ` Michal Pecio 2026-04-03 14:55 ` stern 2026-04-04 1:15 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen 1 sibling, 1 reply; 33+ messages in thread From: Michal Pecio @ 2026-04-03 8:51 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, oneukum@suse.com, niklas.neronin@linux.intel.com On Thu, 2 Apr 2026 22:42:39 -0400, stern@rowland.harvard.edu wrote: > Every status other than 0 should mean that the endpoint's queue is > halted. But not all statuses require a clear-halt or reset-endpoint > to recover. For instance, a short transfer when the URB_SHORT_NOT_OK > flag is set. This doesn't work in xhci-hcd and I'm not sure if it can. I don't recall any way to make endpoints halt on short transfer at all. The driver does exactly two things with this flag: 1. Isochronus frames get EREMOTEIO status when it's detected, although ISTR that URB_SHORT_NOT_OK isn't defined for isochronous URBs. 2. All others log a dbg() message. Supposedly, core is responsible for updating urb->status then. > > * The USB core will not attempt to unlink pending URBs due to halted > > condition > > * The HCD is responsible for completing or canceling queued URBs > > when the halted flag is set. Cancelled and newly submitted URBs > > will be returned with -EPIPE as long as the halted flag is set > > Why make the HCD responsible for draining the queue? It's like > setting the halted flag; one central place is simpler and less > error-prone than lots of separate places. Is emptying the queue really a good idea at all? It obviously breaks lazy drivers which just ignore errors and continue with the URBs they have already submitted. Removing the URBs only adds more data loss. > For newly submitted URBs, should the submission fail with -EPIPE, or > should the submission succeed and then the URB complete with -EPIPE? > The first is simpler, but drivers probably aren't prepared for it Are they truly expecting the alternative? I bet most of them would usb_clear_halt() for each received EPIPE, or not at all. If completion unlinks remaining URBs before returning, this question doesn't exist (if we fix restarting before completion). If it doesn't unlink them, who even knows what the driver wants? For example, some drivers resubmit the URB while handling EPROTO and don't unlink anything. To me, it says "screw data loss and continue". It admittedly doesn't work on SuperSpeed anymore, but not all drivers need to worry about SuperSpeed. Including some old and lazy ones. Regards, Michal ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-03 8:51 ` Michal Pecio @ 2026-04-03 14:55 ` stern 2026-04-03 19:13 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio 0 siblings, 1 reply; 33+ messages in thread From: stern @ 2026-04-03 14:55 UTC (permalink / raw) To: Michal Pecio Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, oneukum@suse.com, niklas.neronin@linux.intel.com On Fri, Apr 03, 2026 at 10:51:45AM +0200, Michal Pecio wrote: > On Thu, 2 Apr 2026 22:42:39 -0400, stern@rowland.harvard.edu wrote: > > Every status other than 0 should mean that the endpoint's queue is > > halted. But not all statuses require a clear-halt or reset-endpoint > > to recover. For instance, a short transfer when the URB_SHORT_NOT_OK > > flag is set. > > This doesn't work in xhci-hcd and I'm not sure if it can. I don't recall > any way to make endpoints halt on short transfer at all. > > The driver does exactly two things with this flag: > 1. Isochronus frames get EREMOTEIO status when it's detected, although > ISTR that URB_SHORT_NOT_OK isn't defined for isochronous URBs. > 2. All others log a dbg() message. Supposedly, core is responsible for > updating urb->status then. Truly? What happens if usb-storage is doing a large bulk-IN transfer comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but the last one), and the device indicates an early end-of-data by sending a short packet in the middle? Will xhci-hcd store the information in the following bulk-IN status transaction to the transfer buffer for the next URB in the queue of data URBs? (That's a misleading question. The BOT class specification requires the device to stall the bulk-IN endpoint following a short data transfer, so this scenario wouldn't arise -- although not all devices follow the spec in this regard. And it might arise in other protocols. Regardless, it's hard for me to believe there's no way to tell an xHCI controller to stop the endpoint on a short transfer. EHCI, OHCI, and UHCI all have ways to do it.) > > > * The USB core will not attempt to unlink pending URBs due to halted > > > condition > > > * The HCD is responsible for completing or canceling queued URBs > > > when the halted flag is set. Cancelled and newly submitted URBs > > > will be returned with -EPIPE as long as the halted flag is set > > > > Why make the HCD responsible for draining the queue? It's like > > setting the halted flag; one central place is simpler and less > > error-prone than lots of separate places. > > Is emptying the queue really a good idea at all? It obviously breaks > lazy drivers which just ignore errors and continue with the URBs they > have already submitted. Removing the URBs only adds more data loss. We have to decide on a single approach that can work for everybody. If it means that some class drivers will have to be changed, so be it. A lazy driver that just ignores errors will currently not work very well if at all following an error on an xHCI host controller. But yes, it will work with a USB-2 controller. If we want to minimize the changes needed in class drivers then we will have to make the core and HCDs recover automatically from transaction errors. Non-lazy class drivers still have a chance to do their own recovery when the completion handler for the bad URB runs. To make their lives easier, we could add a usb_purge_endpoint_queue() routine to the core; this seems a lot safer than summarily unlinking all the URBs all the time. Is that what we should do? > > For newly submitted URBs, should the submission fail with -EPIPE, or > > should the submission succeed and then the URB complete with -EPIPE? > > The first is simpler, but drivers probably aren't prepared for it > > Are they truly expecting the alternative? I bet most of them would > usb_clear_halt() for each received EPIPE, or not at all. They might do usb_clear_halt() if the completed URB had -EPIPE status. But they won't do it when usb_submit_urb() returns -EPIPE (or any other error, for that matter). > If completion unlinks remaining URBs before returning, this question > doesn't exist (if we fix restarting before completion). If it doesn't > unlink them, who even knows what the driver wants? Good point. > For example, some drivers resubmit the URB while handling EPROTO and > don't unlink anything. To me, it says "screw data loss and continue". > > It admittedly doesn't work on SuperSpeed anymore, but not all drivers > need to worry about SuperSpeed. Including some old and lazy ones. Think of it this way: If any URBs remain in the queue when the completion handler returns, we can assume the class driver doesn't want them to be unlinked. When that happens, the best thing the core/HCD can do is issue a clear-halt and then restart the endpoint, so the queued URBs will run. Except for the -EPIPE case, where the device halted its side of the endpoint and sent a STALL packet -- it's reasonable to require class drivers to handle that case themselves. On the other hand, if the queue is empty when the completion handler returns (or if all the URBs still in the queue have already been unlinked but not yet given back), is there any harm in the core issuing a clear-halt and restarting the endpoint (with the same exception as above)? This doesn't allow the class driver to do any further error recovery of its own, but when you get down to it, what else is there for it to do? Also, this way we don't have to reject any URBs submitted while endpoint is stopped. They simply get added to the end of the queue, so they will run when the endpoint is restarted. Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: xhci-hcd and URB_SHORT_NOT_OK 2026-04-03 14:55 ` stern @ 2026-04-03 19:13 ` Michal Pecio 2026-04-03 20:17 ` stern 0 siblings, 1 reply; 33+ messages in thread From: Michal Pecio @ 2026-04-03 19:13 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, oneukum@suse.com, niklas.neronin@linux.intel.com On Fri, 3 Apr 2026 10:55:58 -0400, stern@rowland.harvard.edu wrote: > > This doesn't work in xhci-hcd and I'm not sure if it can. I don't > > recall any way to make endpoints halt on short transfer at all. > > > > The driver does exactly two things with this flag: > > 1. Isochronus frames get EREMOTEIO status when it's detected, > > although ISTR that URB_SHORT_NOT_OK isn't defined for isochronous > > 2. All others log a dbg() message. Supposedly, core is > > responsible for updating urb->status then. > > Truly? What happens if usb-storage is doing a large bulk-IN transfer > comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but > the last one), and the device indicates an early end-of-data by > sending a short packet in the middle? Will xhci-hcd store the > information in the following bulk-IN status transaction to the > transfer buffer for the next URB in the queue of data URBs? Not sure if I understand the question, but if you mean that the driver submits multiple "data" URBs followed by a "status" URB to the same pipe and expects it to stop when any data URB completes short, so that all others can be unlinked before the status transfer, then nope, not gonna happen. It will be racing with status being written to the data URBs and likely lose. The status URB may never complete. I really don't recall any way of stopping on short transfers and the relevant xHCI 4.10.1.1 section isn't helpful. It's possible to stop unconditionally and decide to restart or not, at some performance cost. But short transfer always terminates the current URB, so natural way is to coalesce all data into a big SG URB. Perhaps xhci-hcd could do it transparently, though that would get awkward if a driver doesn't unlink the remaining SHORT_NOT_OK URBs - we would need to "rewind". On the upside, I don't see many users of SHORT_NOT_OK URB. This string appears in Documentation, include, drivers/usb/host, files ending wih hcd.c, devio/usbip, and usb_sg_init(). The latter is probably what you were referring to, and it seems to have a shortcut for SG-capable HCs, which hopefully doesn't require SHORT_NOT_OK to be functional? Regards, Michal ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: xhci-hcd and URB_SHORT_NOT_OK 2026-04-03 19:13 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio @ 2026-04-03 20:17 ` stern 0 siblings, 0 replies; 33+ messages in thread From: stern @ 2026-04-03 20:17 UTC (permalink / raw) To: Michal Pecio Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, oneukum@suse.com, niklas.neronin@linux.intel.com On Fri, Apr 03, 2026 at 09:13:37PM +0200, Michal Pecio wrote: > On Fri, 3 Apr 2026 10:55:58 -0400, stern@rowland.harvard.edu wrote: > > > This doesn't work in xhci-hcd and I'm not sure if it can. I don't > > > recall any way to make endpoints halt on short transfer at all. > > > > > > The driver does exactly two things with this flag: > > > 1. Isochronus frames get EREMOTEIO status when it's detected, > > > although ISTR that URB_SHORT_NOT_OK isn't defined for isochronous > > > 2. All others log a dbg() message. Supposedly, core is > > > responsible for updating urb->status then. > > > > Truly? What happens if usb-storage is doing a large bulk-IN transfer > > comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but > > the last one), and the device indicates an early end-of-data by > > sending a short packet in the middle? Will xhci-hcd store the > > information in the following bulk-IN status transaction to the > > transfer buffer for the next URB in the queue of data URBs? > > Not sure if I understand the question, but if you mean that the driver > submits multiple "data" URBs followed by a "status" URB to the same > pipe and expects it to stop when any data URB completes short, so that > all others can be unlinked before the status transfer, then nope, not > gonna happen. It will be racing with status being written to the data > URBs and likely lose. The status URB may never complete. > > I really don't recall any way of stopping on short transfers and the > relevant xHCI 4.10.1.1 section isn't helpful. It's possible to stop > unconditionally and decide to restart or not, at some performance cost. > > But short transfer always terminates the current URB, so natural way is > to coalesce all data into a big SG URB. Perhaps xhci-hcd could do it > transparently, though that would get awkward if a driver doesn't unlink > the remaining SHORT_NOT_OK URBs - we would need to "rewind". > > On the upside, I don't see many users of SHORT_NOT_OK URB. This string > appears in Documentation, include, drivers/usb/host, files ending wih > hcd.c, devio/usbip, and usb_sg_init(). The latter is probably what you > were referring to, and it seems to have a shortcut for SG-capable HCs, > which hopefully doesn't require SHORT_NOT_OK to be functional? usb_sg_init() _is_ what I was thinking of, and I had forgotten about the single-URB shortcut for SG-capable HCs. Clearly we're also going to have to update the kerneldoc, not just the code. If an HCD doesn't support SG then it needs to support stopping an endpoint on short transfers. Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-03 2:42 ` stern 2026-04-03 8:51 ` Michal Pecio @ 2026-04-04 1:15 ` Thinh Nguyen 2026-04-04 1:54 ` stern 1 sibling, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-04 1:15 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Thu, Apr 02, 2026, stern@rowland.harvard.edu wrote: > On Fri, Apr 03, 2026 at 01:59:58AM +0000, Thinh Nguyen wrote: > > On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote: > > > On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote: > > > > On 3/31/26 18:31, stern@rowland.harvard.edu wrote: > > > > > > > > > > How about this instead? We add a "halted" flag to the usb_host_endpoint > > > > > structure, and the core will set this flag whenever a bulk or interrupt > > > > > URB gets a status other than 0 (before putting the URB on the bh list). > > > > > If an URB has one of these statuses, when its completion handler returns > > > > > the core will unlink all the URBs queued to the same endpoint. Finally, > > > > > the "halted" flag should be cleared after a completion handler returns > > > > > if there are no more unlinked URBs still in the queue or URBs waiting on > > > > > the bh list to be given back. > > > > > > > > > > The result of this is that any URB remaining in the queue when the flag > > > > > is cleared must have been submitted by the class driver _after_ the > > > > > failing URB's completion handler has run. We can assume the class > > > > > driver knows what it's doing in this case. > > > > > > > > > > The endpoint queue shouldn't be restarted until the "halted" flag is > > > > > cleared. Either right away, if there are any URBs in the queue, or not > > > > > until the next URB is submitted. Doing this might require a new HCD > > > > > callback. (It would also mean the kerneldoc for usb_unlink_urb() would > > > > > need to be updated, because the endpoint might restart before all the > > > > > completion handlers for the unlinked URBs have run.) > > > > > > > > > > What I'm trying to do here is come up with a single, consistent proposal > > > > > for exactly when halted endpoint queues should restart. Maybe someone > > > > > else has a better suggestion. > > > How about this: > > > > Introduce a halted flag the following conditions: > > > > * Introduce the halted flag in usb_host_endpoint > > * The halted flag must be implemented as a bit in a unsigned long so > > we can use atomic bit operation > > I don't see why it needs to use atomic operations. Unless you're > thinking that we might want to add other bitflags into the same unsigned > long? Now that I think about it again, it's not needed if we have the requirement for clearing the flag only after usb_reset_endpoint. > > > * Only the HCD may set the halted flag, and only upon checking the > > first URB completing with a halted status > > Every status other than 0 should mean that the endpoint's queue is > halted. But not all statuses require a clear-halt or reset-endpoint to > recover. For instance, a short transfer when the URB_SHORT_NOT_OK flag > is set. It seems we're using the "halted" flag for different things. The halted flag to me is the endpoint's state and rather than the endpoint queue state. That is, if the endpoint is halted, there's an error that was reported on the pipe. So, an -ECONNRESET would not cause a halted endpoint. > > Why do you want the HCD to set the flag instead of > usb_hcd_giveback_urb()? Just one central place is simpler than in every > HCD. I'm just thinking in term of whose role to return the pending URBs. Typically the HCD handles when to give back URBs. If the HCD were to handle giving back pending URBs due to halted endpoint, then it should be the one to set the halted flag. However, if the core tracks and does both setting and clearing of the halted flag, then the core should handle canceling the URBs. As you said, if we can shift the burden to the core, that would help keep the flow consistent and centralized. > > > * Only the USB core may clear the halted flag, and only after > > usb_reset_endpoint returns (this makes sure the HCD drained and reset > > the endpoint before the flag is cleared and new URBs are accepted) > > * The usb_reset_endpoint must be called after clear-halt, SetInterface, > > and SetConfiguration. > > That's easy to do just by adding it into the appropriate core routines. Yes. > > > * The USB core will not attempt to unlink pending URBs due to halted > > condition > > * The HCD is responsible for completing or canceling queued URBs > > when the halted flag is set. Cancelled and newly submitted URBs will > > be returned with -EPIPE as long as the halted flag is set > > Why make the HCD responsible for draining the queue? It's like setting > the halted flag; one central place is simpler and less error-prone than > lots of separate places. > > For newly submitted URBs, should the submission fail with -EPIPE, or > should the submission succeed and then the URB complete with -EPIPE? > The first is simpler, but drivers probably aren't prepared for it -- it > would mean adding error handling to the submission code as well as to > the completion handler code. > > (Actually, that wouldn't matter unless the driver was queuing up > multiple URBs, in which case it should be prepared to handle submission > errors in the middle.) I didn't think about failing immediately on submission because we would need to audit every class driver for that. Perhaps the core can intercept the URBs and immediately schedule a completion handler with the updated URB's status? > > > * The class driver is responsible to check the halted flag to > > determine whether to initiate error recovery via usb_clear_halt > > > > I'm trying to keep a clear separation of responsibilities between HCD > > and the USB core. Also, I try to keep the halted flag more closely match > > the state of the endpoint. > > > > Let me know what you think? > > Making the flag match the endpoint state is a good idea. But there is > some ambiguity: Do you mean the state in the device, or the state in the > HC hardware, or the state in the HCD? After all, these aren't always > the same. For instance, unlinking an URB won't affect the device's > state but it will affect the state on the host side. > The HCD state. If we can let the core manage the halted state, then here are the points where the halted state is updated (hopefully I didn't miss any other ones): Endpoint is halted (reported by HCD): * URB returned with -EPROTO or -EPIPE Endpoint is active (updated by the core): * SetConfiguration, SetInterface, clear-halt Thanks, Thinh ps. I'll be out of town next week. Apologies if my response will be delayed... ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 1:15 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen @ 2026-04-04 1:54 ` stern 2026-04-04 20:41 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: stern @ 2026-04-04 1:54 UTC (permalink / raw) To: Thinh Nguyen Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote: > > > How about this: > > > > > > Introduce a halted flag the following conditions: > > > > > > * Introduce the halted flag in usb_host_endpoint > > > * The halted flag must be implemented as a bit in a unsigned long so > > > we can use atomic bit operation > > > > I don't see why it needs to use atomic operations. Unless you're > > thinking that we might want to add other bitflags into the same unsigned > > long? > > Now that I think about it again, it's not needed if we have the > requirement for clearing the flag only after usb_reset_endpoint. > > > > > > * Only the HCD may set the halted flag, and only upon checking the > > > first URB completing with a halted status > > > > Every status other than 0 should mean that the endpoint's queue is > > halted. But not all statuses require a clear-halt or reset-endpoint to > > recover. For instance, a short transfer when the URB_SHORT_NOT_OK flag > > is set. > > It seems we're using the "halted" flag for different things. The halted > flag to me is the endpoint's state and rather than the endpoint queue > state. That is, if the endpoint is halted, there's an error that was > reported on the pipe. So, an -ECONNRESET would not cause a halted > endpoint. > > > > > Why do you want the HCD to set the flag instead of > > usb_hcd_giveback_urb()? Just one central place is simpler than in every > > HCD. > > I'm just thinking in term of whose role to return the pending URBs. > Typically the HCD handles when to give back URBs. If the HCD were to > handle giving back pending URBs due to halted endpoint, then it should > be the one to set the halted flag. However, if the core tracks and does > both setting and clearing of the halted flag, then the core should > handle canceling the URBs. As you said, if we can shift the burden to > the core, that would help keep the flow consistent and centralized. > > > > > > * Only the USB core may clear the halted flag, and only after > > > usb_reset_endpoint returns (this makes sure the HCD drained and reset > > > the endpoint before the flag is cleared and new URBs are accepted) > > > * The usb_reset_endpoint must be called after clear-halt, SetInterface, > > > and SetConfiguration. > > > > That's easy to do just by adding it into the appropriate core routines. > > Yes. > > > > > > * The USB core will not attempt to unlink pending URBs due to halted > > > condition > > > * The HCD is responsible for completing or canceling queued URBs > > > when the halted flag is set. Cancelled and newly submitted URBs will > > > be returned with -EPIPE as long as the halted flag is set > > > > Why make the HCD responsible for draining the queue? It's like setting > > the halted flag; one central place is simpler and less error-prone than > > lots of separate places. > > > > For newly submitted URBs, should the submission fail with -EPIPE, or > > should the submission succeed and then the URB complete with -EPIPE? > > The first is simpler, but drivers probably aren't prepared for it -- it > > would mean adding error handling to the submission code as well as to > > the completion handler code. > > > > (Actually, that wouldn't matter unless the driver was queuing up > > multiple URBs, in which case it should be prepared to handle submission > > errors in the middle.) > > I didn't think about failing immediately on submission because we would > need to audit every class driver for that. Perhaps the core can > intercept the URBs and immediately schedule a completion handler with > the updated URB's status? > > > > > > * The class driver is responsible to check the halted flag to > > > determine whether to initiate error recovery via usb_clear_halt > > > > > > I'm trying to keep a clear separation of responsibilities between HCD > > > and the USB core. Also, I try to keep the halted flag more closely match > > > the state of the endpoint. > > > > > > Let me know what you think? > > > > Making the flag match the endpoint state is a good idea. But there is > > some ambiguity: Do you mean the state in the device, or the state in the > > HC hardware, or the state in the HCD? After all, these aren't always > > the same. For instance, unlinking an URB won't affect the device's > > state but it will affect the state on the host side. > > > > The HCD state. If we can let the core manage the halted state, then here > are the points where the halted state is updated (hopefully I didn't > miss any other ones): > > Endpoint is halted (reported by HCD): > * URB returned with -EPROTO or -EPIPE > > Endpoint is active (updated by the core): > * SetConfiguration, SetInterface, clear-halt Hmmm. What did you think of my recent proposal in a message to Michal? Summarizing: If the class driver wants to unlink queued URBs when a transaction error occurs, it has to do so itself in the failed URB's completion handler. We can make this easier by adding a usb_flush_endpoint_queue() routine to the core. In the meantime, the HCD ensures that the queue is stopped before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and -EREMOTEIO aren't considered to be transaction errors.) When the completion handler returns, the core will automatically call usb_clear_halt(), which will also reset the endpoint on the host side and will restart the queue. This also happens after SetConfiguration and SetInterface, either explicitly or implicitly. For -EPIPE (device sent a STALL token), the class driver has to clear the halt itself. This is because stalls aren't errors; they are an intentional part of the USB protocol. -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet received with URB_SHORT_NOT_OK set) are a little trickier. The HCD may or may not need to stop the queue for an unlink, possibly depending on whether the URB being unlinked is the active one. When a short packet is received, the HC hardware may or may not stop its queue. Either way, the class driver shouldn't need to take any special recovery action; any necessary actions should be taken automatically by the HCD and the core. All of this applies only to bulk and interrupt endpoints. Control endpoints generally need error recovery only on the host side, because the device resets automatically when it gets a new SETUP packet, and so the HCD should handle whatever is needed. Isochronous endpoints don't ever halt and they shouldn't need to be reset when an error occurs. Overall, this seems simpler than anything else we have discussed. Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 1:54 ` stern @ 2026-04-04 20:41 ` Thinh Nguyen 2026-04-04 21:54 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-04 20:41 UTC (permalink / raw) To: stern@rowland.harvard.edu Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote: > On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote: > > > > How about this: > > > > > > > > Introduce a halted flag the following conditions: > > > > > > > > * Introduce the halted flag in usb_host_endpoint > > > > * The halted flag must be implemented as a bit in a unsigned long so > > > > we can use atomic bit operation > > > > > > I don't see why it needs to use atomic operations. Unless you're > > > thinking that we might want to add other bitflags into the same unsigned > > > long? > > > > Now that I think about it again, it's not needed if we have the > > requirement for clearing the flag only after usb_reset_endpoint. > > > > > > > > > * Only the HCD may set the halted flag, and only upon checking the > > > > first URB completing with a halted status > > > > > > Every status other than 0 should mean that the endpoint's queue is > > > halted. But not all statuses require a clear-halt or reset-endpoint to > > > recover. For instance, a short transfer when the URB_SHORT_NOT_OK flag > > > is set. > > > > It seems we're using the "halted" flag for different things. The halted > > flag to me is the endpoint's state and rather than the endpoint queue > > state. That is, if the endpoint is halted, there's an error that was > > reported on the pipe. So, an -ECONNRESET would not cause a halted > > endpoint. > > > > > > > > Why do you want the HCD to set the flag instead of > > > usb_hcd_giveback_urb()? Just one central place is simpler than in every > > > HCD. > > > > I'm just thinking in term of whose role to return the pending URBs. > > Typically the HCD handles when to give back URBs. If the HCD were to > > handle giving back pending URBs due to halted endpoint, then it should > > be the one to set the halted flag. However, if the core tracks and does > > both setting and clearing of the halted flag, then the core should > > handle canceling the URBs. As you said, if we can shift the burden to > > the core, that would help keep the flow consistent and centralized. > > > > > > > > > * Only the USB core may clear the halted flag, and only after > > > > usb_reset_endpoint returns (this makes sure the HCD drained and reset > > > > the endpoint before the flag is cleared and new URBs are accepted) > > > > * The usb_reset_endpoint must be called after clear-halt, SetInterface, > > > > and SetConfiguration. > > > > > > That's easy to do just by adding it into the appropriate core routines. > > > > Yes. > > > > > > > > > * The USB core will not attempt to unlink pending URBs due to halted > > > > condition > > > > * The HCD is responsible for completing or canceling queued URBs > > > > when the halted flag is set. Cancelled and newly submitted URBs will > > > > be returned with -EPIPE as long as the halted flag is set > > > > > > Why make the HCD responsible for draining the queue? It's like setting > > > the halted flag; one central place is simpler and less error-prone than > > > lots of separate places. > > > > > > For newly submitted URBs, should the submission fail with -EPIPE, or > > > should the submission succeed and then the URB complete with -EPIPE? > > > The first is simpler, but drivers probably aren't prepared for it -- it > > > would mean adding error handling to the submission code as well as to > > > the completion handler code. > > > > > > (Actually, that wouldn't matter unless the driver was queuing up > > > multiple URBs, in which case it should be prepared to handle submission > > > errors in the middle.) > > > > I didn't think about failing immediately on submission because we would > > need to audit every class driver for that. Perhaps the core can > > intercept the URBs and immediately schedule a completion handler with > > the updated URB's status? > > > > > > > > > * The class driver is responsible to check the halted flag to > > > > determine whether to initiate error recovery via usb_clear_halt > > > > > > > > I'm trying to keep a clear separation of responsibilities between HCD > > > > and the USB core. Also, I try to keep the halted flag more closely match > > > > the state of the endpoint. > > > > > > > > Let me know what you think? > > > > > > Making the flag match the endpoint state is a good idea. But there is > > > some ambiguity: Do you mean the state in the device, or the state in the > > > HC hardware, or the state in the HCD? After all, these aren't always > > > the same. For instance, unlinking an URB won't affect the device's > > > state but it will affect the state on the host side. > > > > > > > The HCD state. If we can let the core manage the halted state, then here > > are the points where the halted state is updated (hopefully I didn't > > miss any other ones): > > > > Endpoint is halted (reported by HCD): > > * URB returned with -EPROTO or -EPIPE > > > > Endpoint is active (updated by the core): > > * SetConfiguration, SetInterface, clear-halt > > Hmmm. What did you think of my recent proposal in a message to Michal? > Summarizing: > > If the class driver wants to unlink queued URBs when a transaction error > occurs, it has to do so itself in the failed URB's completion handler. > We can make this easier by adding a usb_flush_endpoint_queue() routine > to the core. In the meantime, the HCD ensures that the queue is stopped > before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and > -EREMOTEIO aren't considered to be transaction errors.) > > When the completion handler returns, the core will automatically call > usb_clear_halt(), which will also reset the endpoint on the host side > and will restart the queue. This also happens after SetConfiguration > and SetInterface, either explicitly or implicitly. I like that the core will handle this automatically. But one concern: How will the class driver know when the clear-halt complete so it can perform the recovery? (ie. it shouldn't perform recovery immediately after seeing -EPROTO) > > For -EPIPE (device sent a STALL token), the class driver has to clear > the halt itself. This is because stalls aren't errors; they are an > intentional part of the USB protocol. > > -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet > received with URB_SHORT_NOT_OK set) are a little trickier. The HCD may > or may not need to stop the queue for an unlink, possibly depending on > whether the URB being unlinked is the active one. When a short packet > is received, the HC hardware may or may not stop its queue. Either way, > the class driver shouldn't need to take any special recovery action; any > necessary actions should be taken automatically by the HCD and the core. > > All of this applies only to bulk and interrupt endpoints. Control > endpoints generally need error recovery only on the host side, because > the device resets automatically when it gets a new SETUP packet, and so > the HCD should handle whatever is needed. Isochronous endpoints don't > ever halt and they shouldn't need to be reset when an error occurs. > > Overall, this seems simpler than anything else we have discussed. > The rest sounds good to me! Thanks Alan, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 20:41 ` Thinh Nguyen @ 2026-04-04 21:54 ` Alan Stern 2026-04-04 22:15 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2026-04-04 21:54 UTC (permalink / raw) To: Thinh Nguyen Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote: > On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote: > > Summarizing: > > > > If the class driver wants to unlink queued URBs when a transaction error > > occurs, it has to do so itself in the failed URB's completion handler. > > We can make this easier by adding a usb_flush_endpoint_queue() routine > > to the core. In the meantime, the HCD ensures that the queue is stopped > > before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and > > -EREMOTEIO aren't considered to be transaction errors.) > > > > When the completion handler returns, the core will automatically call > > usb_clear_halt(), which will also reset the endpoint on the host side > > and will restart the queue. This also happens after SetConfiguration > > and SetInterface, either explicitly or implicitly. > > I like that the core will handle this automatically. But one concern: > How will the class driver know when the clear-halt complete so it can > perform the recovery? (ie. it shouldn't perform recovery immediately > after seeing -EPROTO) It doesn't know, and it doesn't need to know. Any recovery URBs the class driver wants to send can be submitted as usual, and they will be added onto the queue. When the core resets the endpoint, the queue will start going again and the URBs will run. If the class driver wants to take some other action (like submitting URBs to a different endpoint) before using the endpoint that stopped, it's free to do so. It only has to make sure that it doesn't submit any URBs to the stopped endpoint until after the other action is finished -- which is what it would do anyway. (And maybe it has to unlink any URBs that are already queued, which can be done with a simple function call.) > > For -EPIPE (device sent a STALL token), the class driver has to clear > > the halt itself. This is because stalls aren't errors; they are an > > intentional part of the USB protocol. > > > > -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet > > received with URB_SHORT_NOT_OK set) are a little trickier. The HCD may > > or may not need to stop the queue for an unlink, possibly depending on > > whether the URB being unlinked is the active one. When a short packet > > is received, the HC hardware may or may not stop its queue. Either way, > > the class driver shouldn't need to take any special recovery action; any > > necessary actions should be taken automatically by the HCD and the core. > > > > All of this applies only to bulk and interrupt endpoints. Control > > endpoints generally need error recovery only on the host side, because > > the device resets automatically when it gets a new SETUP packet, and so > > the HCD should handle whatever is needed. Isochronous endpoints don't > > ever halt and they shouldn't need to be reset when an error occurs. > > > > Overall, this seems simpler than anything else we have discussed. > > > > The rest sounds good to me! Good! Let's wait to hear from Michal, Mathias, and Oliver. Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 21:54 ` Alan Stern @ 2026-04-04 22:15 ` Thinh Nguyen 2026-04-04 22:28 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-04 22:15 UTC (permalink / raw) To: Alan Stern Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026, Alan Stern wrote: > On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote: > > On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote: > > > Summarizing: > > > > > > If the class driver wants to unlink queued URBs when a transaction error > > > occurs, it has to do so itself in the failed URB's completion handler. > > > We can make this easier by adding a usb_flush_endpoint_queue() routine > > > to the core. In the meantime, the HCD ensures that the queue is stopped > > > before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and > > > -EREMOTEIO aren't considered to be transaction errors.) > > > > > > When the completion handler returns, the core will automatically call > > > usb_clear_halt(), which will also reset the endpoint on the host side > > > and will restart the queue. This also happens after SetConfiguration > > > and SetInterface, either explicitly or implicitly. > > > > I like that the core will handle this automatically. But one concern: > > How will the class driver know when the clear-halt complete so it can > > perform the recovery? (ie. it shouldn't perform recovery immediately > > after seeing -EPROTO) > > It doesn't know, and it doesn't need to know. Any recovery URBs the > class driver wants to send can be submitted as usual, and they will be > added onto the queue. When the core resets the endpoint, the queue will > start going again and the URBs will run. > > If the class driver wants to take some other action (like submitting > URBs to a different endpoint) before using the endpoint that stopped, > it's free to do so. It only has to make sure that it doesn't submit any > URBs to the stopped endpoint until after the other action is finished -- > which is what it would do anyway. (And maybe it has to unlink any URBs > that are already queued, which can be done with a simple function call.) > Then the xhci must make sure that it should not ring the doorbell to restart the endpoint when giving back the canceled URBs. It should only do so on newly submitted URBs. We can add a requirement such that if the class driver submitted the recovery URBs prior to completing the usb_reset_endpoint (which should be done after clear-halt), then the HCD may keep those URBs on a queue and only process those URBs and restart the endpoint afterward. BR, Thinh > > > For -EPIPE (device sent a STALL token), the class driver has to clear > > > the halt itself. This is because stalls aren't errors; they are an > > > intentional part of the USB protocol. > > > > > > -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet > > > received with URB_SHORT_NOT_OK set) are a little trickier. The HCD may > > > or may not need to stop the queue for an unlink, possibly depending on > > > whether the URB being unlinked is the active one. When a short packet > > > is received, the HC hardware may or may not stop its queue. Either way, > > > the class driver shouldn't need to take any special recovery action; any > > > necessary actions should be taken automatically by the HCD and the core. > > > > > > All of this applies only to bulk and interrupt endpoints. Control > > > endpoints generally need error recovery only on the host side, because > > > the device resets automatically when it gets a new SETUP packet, and so > > > the HCD should handle whatever is needed. Isochronous endpoints don't > > > ever halt and they shouldn't need to be reset when an error occurs. > > > > > > Overall, this seems simpler than anything else we have discussed. > > > > > > > The rest sounds good to me! > > Good! Let's wait to hear from Michal, Mathias, and Oliver. > > Alan ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 22:15 ` Thinh Nguyen @ 2026-04-04 22:28 ` Thinh Nguyen 2026-04-05 1:30 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-04 22:28 UTC (permalink / raw) To: Alan Stern Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026, Thinh Nguyen wrote: > On Sat, Apr 04, 2026, Alan Stern wrote: > > On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote: > > > On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote: > > > > Summarizing: > > > > > > > > If the class driver wants to unlink queued URBs when a transaction error > > > > occurs, it has to do so itself in the failed URB's completion handler. > > > > We can make this easier by adding a usb_flush_endpoint_queue() routine > > > > to the core. In the meantime, the HCD ensures that the queue is stopped > > > > before giving back the URB. (Note: -EPIPE, -ENOENT, -ECONNRESET, and > > > > -EREMOTEIO aren't considered to be transaction errors.) > > > > > > > > When the completion handler returns, the core will automatically call > > > > usb_clear_halt(), which will also reset the endpoint on the host side > > > > and will restart the queue. This also happens after SetConfiguration > > > > and SetInterface, either explicitly or implicitly. > > > > > > I like that the core will handle this automatically. But one concern: > > > How will the class driver know when the clear-halt complete so it can > > > perform the recovery? (ie. it shouldn't perform recovery immediately > > > after seeing -EPROTO) > > > > It doesn't know, and it doesn't need to know. Any recovery URBs the > > class driver wants to send can be submitted as usual, and they will be > > added onto the queue. When the core resets the endpoint, the queue will > > start going again and the URBs will run. > > > > If the class driver wants to take some other action (like submitting > > URBs to a different endpoint) before using the endpoint that stopped, > > it's free to do so. It only has to make sure that it doesn't submit any > > URBs to the stopped endpoint until after the other action is finished -- > > which is what it would do anyway. (And maybe it has to unlink any URBs > > that are already queued, which can be done with a simple function call.) > > > > Then the xhci must make sure that it should not ring the doorbell to > restart the endpoint when giving back the canceled URBs. It should only > do so on newly submitted URBs. Ignore this comment, it's not restarting the endpoint in the case of unlinking. > > We can add a requirement such that if the class driver submitted the > recovery URBs prior to completing the usb_reset_endpoint (which should > be done after clear-halt), then the HCD may keep those URBs on a queue > and only process those URBs and restart the endpoint afterward. > Actually, adding this new requirement would be tricky because we don't know whether it's recovery URBs or not. BR, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-04 22:28 ` Thinh Nguyen @ 2026-04-05 1:30 ` Alan Stern 2026-04-05 3:10 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2026-04-05 1:30 UTC (permalink / raw) To: Thinh Nguyen Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026 at 10:28:24PM +0000, Thinh Nguyen wrote: > On Sat, Apr 04, 2026, Thinh Nguyen wrote: > > On Sat, Apr 04, 2026, Alan Stern wrote: > > > If the class driver wants to take some other action (like submitting > > > URBs to a different endpoint) before using the endpoint that stopped, > > > it's free to do so. It only has to make sure that it doesn't submit any > > > URBs to the stopped endpoint until after the other action is finished -- > > > which is what it would do anyway. (And maybe it has to unlink any URBs > > > that are already queued, which can be done with a simple function call.) > > > > > > > Then the xhci must make sure that it should not ring the doorbell to > > restart the endpoint when giving back the canceled URBs. It should only > > do so on newly submitted URBs. > > Ignore this comment, it's not restarting the endpoint in the case of > unlinking. I was going to say that xhci-hcd shouldn't restart the endpoint until the usb_reset_endpoint() call is made. Whether or not it rings the doorbell at that time may depend on whether there are any URBs on the queue; that's a relatively unimportant implementation detail. > > We can add a requirement such that if the class driver submitted the > > recovery URBs prior to completing the usb_reset_endpoint (which should > > be done after clear-halt), then the HCD may keep those URBs on a queue > > and only process those URBs and restart the endpoint afterward. > > > > Actually, adding this new requirement would be tricky because we don't > know whether it's recovery URBs or not. The purpose of the submitted URBs doesn't matter. The HCD shouldn't restart the endpoint until the usb_reset_endpoint() call occurs. Also, I should point out that usbcore will call usb_clear_halt() and usb_reset_endpoint(), presumably using a work queue for the calls. The class driver doesn't need to do it -- in fact, doing those things could lead to errors because the endpoint may already be running (the core may already have made the calls). Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-05 1:30 ` Alan Stern @ 2026-04-05 3:10 ` Thinh Nguyen 2026-04-07 15:23 ` Alan Stern 0 siblings, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-05 3:10 UTC (permalink / raw) To: Alan Stern Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Sat, Apr 04, 2026, Alan Stern wrote: > On Sat, Apr 04, 2026 at 10:28:24PM +0000, Thinh Nguyen wrote: > > On Sat, Apr 04, 2026, Thinh Nguyen wrote: > > > On Sat, Apr 04, 2026, Alan Stern wrote: > > > > If the class driver wants to take some other action (like submitting > > > > URBs to a different endpoint) before using the endpoint that stopped, > > > > it's free to do so. It only has to make sure that it doesn't submit any > > > > URBs to the stopped endpoint until after the other action is finished -- > > > > which is what it would do anyway. (And maybe it has to unlink any URBs > > > > that are already queued, which can be done with a simple function call.) > > > > > > > > > > Then the xhci must make sure that it should not ring the doorbell to > > > restart the endpoint when giving back the canceled URBs. It should only > > > do so on newly submitted URBs. > > > > Ignore this comment, it's not restarting the endpoint in the case of > > unlinking. > > I was going to say that xhci-hcd shouldn't restart the endpoint until > the usb_reset_endpoint() call is made. Whether or not it rings the > doorbell at that time may depend on whether there are any URBs on the > queue; that's a relatively unimportant implementation detail. > > > > We can add a requirement such that if the class driver submitted the > > > recovery URBs prior to completing the usb_reset_endpoint (which should > > > be done after clear-halt), then the HCD may keep those URBs on a queue > > > and only process those URBs and restart the endpoint afterward. > > > > > > > Actually, adding this new requirement would be tricky because we don't > > know whether it's recovery URBs or not. > > The purpose of the submitted URBs doesn't matter. The HCD shouldn't > restart the endpoint until the usb_reset_endpoint() call occurs. > > Also, I should point out that usbcore will call usb_clear_halt() and > usb_reset_endpoint(), presumably using a work queue for the calls. The > class driver doesn't need to do it -- in fact, doing those things could > lead to errors because the endpoint may already be running (the core may > already have made the calls). > That's good. This is what I wanted to confirm. May need to update how xhci handles usb_reset_endpoint() because I believe it's expected the endpoint transfer ring to be drained prior (by the class driver unlinking URBs). Thanks for the clarifications, Thinh ps. Have a good weekend! I'll be back a week after. ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-05 3:10 ` Thinh Nguyen @ 2026-04-07 15:23 ` Alan Stern 2026-04-07 20:24 ` Mathias Nyman 0 siblings, 1 reply; 33+ messages in thread From: Alan Stern @ 2026-04-07 15:23 UTC (permalink / raw) To: Thinh Nguyen Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com It's been a while now, and nobody has objected to the proposed plan for handling this issue, so I'm going to assume that everyone is on board with the idea. There is a loose end still to be straightened out. It concerns handling of -EREMOTEIO errors (short packet received with URB_SHORT_NOT_OK set). While some HCDs -- especially those supporting SG -- may not stop the endpoint queue when this error occurs, other HCDs will do so. The question is how the core should tell them to start it up again. This shouldn't happen until after the completion handler returns. Short packets don't cause any loss of synchronization between the endpoint state on the host and on the device, so -EREMOTEIO doesn't require usb_clear_halt() or usb_reset_endpoint() for recovery. This means we need to find some other way to tell the HCD when the queue can restart. Should we create a new hc_driver callback specifically for this purpose? Related question: Although URB_SHORT_NOT_OK is allowed for all types of IN URB other than isochronous, does its queue-stopping property make sense for control or interrupt URBs? As far as I know, no kernel code uses it for them, although some userspace code might (via usbfs). Alan Stern ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-07 15:23 ` Alan Stern @ 2026-04-07 20:24 ` Mathias Nyman 0 siblings, 0 replies; 33+ messages in thread From: Mathias Nyman @ 2026-04-07 20:24 UTC (permalink / raw) To: Alan Stern, Thinh Nguyen Cc: linux-usb@vger.kernel.org, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 4/7/26 18:23, Alan Stern wrote: > It's been a while now, and nobody has objected to the proposed plan for > handling this issue, so I'm going to assume that everyone is on board > with the idea. Yes, I support this So basically usb core will call usb_clear_halt() after EPROTO URB completion handler finishes, and xhci-hcd needs to prevent bulk/interrupt endpoint from restarting after returning a EPROTO URB up until usb_reset_endpoint() is called I also support adding usb_purge_endpoint_queue(), but it doesn't have to be done at the same time as the EPROTO changes. > > There is a loose end still to be straightened out. It concerns handling > of -EREMOTEIO errors (short packet received with URB_SHORT_NOT_OK set). > While some HCDs -- especially those supporting SG -- may not stop the > endpoint queue when this error occurs, other HCDs will do so. The > question is how the core should tell them to start it up again. This > shouldn't happen until after the completion handler returns. > > Short packets don't cause any loss of synchronization between the > endpoint state on the host and on the device, so -EREMOTEIO doesn't > require usb_clear_halt() or usb_reset_endpoint() for recovery. This > means we need to find some other way to tell the HCD when the queue can > restart. Should we create a new hc_driver callback specifically for > this purpose? For xHC the issue is the other way around as Michal pointed out. Can't find a way to stop the endpoint on short packets. Can only manually stop the endpoint when xhci-hcd detects the short transfer event, when xHC likely already processing the next URB. Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-03-30 12:51 ` Mathias Nyman 2026-03-30 14:17 ` stern @ 2026-04-01 22:08 ` Thinh Nguyen 2026-04-01 22:34 ` Mathias Nyman 1 sibling, 1 reply; 33+ messages in thread From: Thinh Nguyen @ 2026-04-01 22:08 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Mon, Mar 30, 2026, Mathias Nyman wrote: > > > > > On a separate note, will you plan to implement the clear-halt for EPROTO > > in xhci? > > I don't think this should be part of xhci driver. Decision to send control requests > to the device should be done by core or class drivers. > This not like STALL where it's standardized for the core or class driver to know how to handle. The programming sequence for the errors that resulted in EPROTO from xhci is specific to xhci. That is, the xhci reset endpoint command will reset the bulk sequence, it's specific to xhci. The xhci spec recommends to send a clear-halt for this scenario, not the USB spec or any other class specific spec. So we should not delegate this to the core or class driver to handle. BR, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-01 22:08 ` Thinh Nguyen @ 2026-04-01 22:34 ` Mathias Nyman 2026-04-01 22:47 ` Thinh Nguyen 0 siblings, 1 reply; 33+ messages in thread From: Mathias Nyman @ 2026-04-01 22:34 UTC (permalink / raw) To: Thinh Nguyen Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On 4/2/26 01:08, Thinh Nguyen wrote: > On Mon, Mar 30, 2026, Mathias Nyman wrote: >> >>> >>> On a separate note, will you plan to implement the clear-halt for EPROTO >>> in xhci? >> >> I don't think this should be part of xhci driver. Decision to send control requests >> to the device should be done by core or class drivers. >> > > This not like STALL where it's standardized for the core or class driver > to know how to handle. The programming sequence for the errors that > resulted in EPROTO from xhci is specific to xhci. That is, the xhci > reset endpoint command will reset the bulk sequence, it's specific to > xhci. The xhci spec recommends to send a clear-halt for this scenario, > not the USB spec or any other class specific spec. So we should not > delegate this to the core or class driver to handle. > USB 2 Specification does mention handling halt conditions due to transmission erros _OR_ STALL handshake with clearing halt and resetting both host and device endpoint toggle. See USB 2 5.7.5 Interrupt Transfer Data Sequences "If a halt condition is detected on an interrupt pipe due to transmission errors or a STALL handshake being returned from the endpoint, all pending IRPs are retired. Removal of the halt condition is achieved via software intervention through a separate control pipe. This recovery will reset the data toggle bit to DATA0 for the endpoint on both the host and the device" 5.8.5 Bulk Transfer Data Sequences "If a halt condition is detected on a bulk pipe due to transmission errors or a STALL handshake being returned from the endpoint, all pending IRPs are retired. Removal of the halt condition is achieved via software intervention through a separate control pipe. This recovery will reset the data toggle bit to DATA0 for the endpoint on both the host and the device" Thanks Mathias ^ permalink raw reply [flat|nested] 33+ messages in thread
* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error 2026-04-01 22:34 ` Mathias Nyman @ 2026-04-01 22:47 ` Thinh Nguyen 0 siblings, 0 replies; 33+ messages in thread From: Thinh Nguyen @ 2026-04-01 22:47 UTC (permalink / raw) To: Mathias Nyman Cc: Thinh Nguyen, linux-usb@vger.kernel.org, stern@rowland.harvard.edu, michal.pecio@gmail.com, oneukum@suse.com, niklas.neronin@linux.intel.com On Thu, Apr 02, 2026, Mathias Nyman wrote: > On 4/2/26 01:08, Thinh Nguyen wrote: > > On Mon, Mar 30, 2026, Mathias Nyman wrote: > > > > > > > > > > > On a separate note, will you plan to implement the clear-halt for EPROTO > > > > in xhci? > > > > > > I don't think this should be part of xhci driver. Decision to send control requests > > > to the device should be done by core or class drivers. > > > > > > > This not like STALL where it's standardized for the core or class driver > > to know how to handle. The programming sequence for the errors that > > resulted in EPROTO from xhci is specific to xhci. That is, the xhci > > reset endpoint command will reset the bulk sequence, it's specific to > > xhci. The xhci spec recommends to send a clear-halt for this scenario, > > not the USB spec or any other class specific spec. So we should not > > delegate this to the core or class driver to handle. > > > USB 2 Specification does mention handling halt conditions due to transmission > erros _OR_ STALL handshake with clearing halt and resetting both host and device > endpoint toggle. > > See USB 2 > > 5.7.5 Interrupt Transfer Data Sequences > > "If a halt condition is detected on an interrupt pipe due to transmission errors or > a STALL handshake being returned from the endpoint, all pending IRPs are retired. > Removal of the halt condition is achieved via software intervention through a > separate control pipe. This recovery will reset the data toggle bit to DATA0 for > the endpoint on both the host and the device" > > 5.8.5 Bulk Transfer Data Sequences > > "If a halt condition is detected on a bulk pipe due to transmission errors or a > STALL handshake being returned from the endpoint, all pending IRPs are retired. > Removal of the halt condition is achieved via software intervention through a > separate control pipe. This recovery will reset the data toggle bit to DATA0 > for the endpoint on both the host and the device" > Ah~ I stand corrected. My memory served me wrong. Thanks for pointing this out. BR, Thinh ^ permalink raw reply [flat|nested] 33+ messages in thread
* [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB 2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman 2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman @ 2026-03-23 12:25 ` Mathias Nyman 1 sibling, 0 replies; 33+ messages in thread From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw) To: linux-usb Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin, Mathias Nyman Make sure URB is given back by xhci driver after a stall (EPIPE) or transaction error (EPROTO) on one of the TDs in the URB. Most bulk, interrupt and control transfer URBs consist of one TD, so adding that TD to the cancel list was enough to give back the entire URB when invalidating and giving back cancelled TDs Some bulk URBs that need an additional zero length packet consist of two TDs. Make sure the second TD is added to the cancel list so that URB is given back when clearing the internal endpoint halt. Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> --- drivers/usb/host/xhci-ring.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index f01f343a7e37..5305ad7be025 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -982,8 +982,9 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id, } /* - * Mark the TD the endpoint halted on as TD_HALTED and add it to the cancelled - * td list, ensuring invalidate_cancelled_tds() clears the TD from xHC cache. + * Mark the TD the endpoint halted on as TD_HALTED, add all TDs of that URB + * to the cancelled td list, ensuring invalidate_cancelled_tds() clears the + * TD from xHC cache and URB is given back. * Mark the other TDs on that bulk or interrupt endpoint as TD_TAINTED to * prevent the ring from being restarted too early. */ @@ -997,10 +998,11 @@ static void xhci_cancel_halted_tds(struct xhci_hcd *xhci, struct xhci_virt_ep *e ring = xhci_urb_to_transfer_ring(xhci, td->urb); td->cancel_status = TD_HALTED; - if (list_empty(&td->cancelled_td_list)) - list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list); list_for_each_entry(tdi, &ring->td_list, td_list) { + if (tdi->urb == td->urb && list_empty(&tdi->cancelled_td_list)) + list_add_tail(&tdi->cancelled_td_list, &ep->cancelled_td_list); + if (!tdi->cancel_status && ring->type != TYPE_CTRL) tdi->cancel_status = TD_TAINTED; } -- 2.43.0 ^ permalink raw reply related [flat|nested] 33+ messages in thread
end of thread, other threads:[~2026-04-07 20:24 UTC | newest] Thread overview: 33+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman 2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman 2026-03-25 1:52 ` Thinh Nguyen 2026-03-25 9:38 ` Mathias Nyman 2026-03-26 1:19 ` Thinh Nguyen 2026-03-26 11:25 ` Mathias Nyman 2026-03-26 23:24 ` Thinh Nguyen 2026-03-30 12:51 ` Mathias Nyman 2026-03-30 14:17 ` stern 2026-03-31 9:34 ` Mathias Nyman 2026-03-31 15:31 ` stern 2026-04-01 22:08 ` Mathias Nyman 2026-04-02 2:36 ` stern 2026-04-03 1:59 ` Thinh Nguyen 2026-04-03 2:42 ` stern 2026-04-03 8:51 ` Michal Pecio 2026-04-03 14:55 ` stern 2026-04-03 19:13 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio 2026-04-03 20:17 ` stern 2026-04-04 1:15 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen 2026-04-04 1:54 ` stern 2026-04-04 20:41 ` Thinh Nguyen 2026-04-04 21:54 ` Alan Stern 2026-04-04 22:15 ` Thinh Nguyen 2026-04-04 22:28 ` Thinh Nguyen 2026-04-05 1:30 ` Alan Stern 2026-04-05 3:10 ` Thinh Nguyen 2026-04-07 15:23 ` Alan Stern 2026-04-07 20:24 ` Mathias Nyman 2026-04-01 22:08 ` Thinh Nguyen 2026-04-01 22:34 ` Mathias Nyman 2026-04-01 22:47 ` Thinh Nguyen 2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox