[RFC PATCH 0/2] fix xhci endpoint restart at EPROTO

public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO
@ 2026-03-23 12:25 Mathias Nyman
  2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
  2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman
  0 siblings, 2 replies; 33+ messages in thread
From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw)
  To: linux-usb
  Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin,
	Mathias Nyman

Hi

These patches are related to the 'correctly handling EPROTO' discussion,
and aim to fix the xhci endpoint automatic restart issue.

It also makes sure that multi TD URBs are given back with proper
EPROTO/EPIPE status if a stall/transaction error is triggered on
any TD of the URB, not just the last one.

endpoints will still be restarted when a new URB is queued, this does
not fix the issue with BH URBs completion, but will print a debug message
if a new URB is queued before previous 'tainted' pending URBs are cancelled.

I haven't been able to test the actual codepaths in these patches yet.
Can't easily queue several URBs and somehow trigger stall or transaction
error on early URBs, but would like to get feeback on this solution early.

Thanks
Mathias

Mathias Nyman (2):
  xhci: prevent automatic endpoint restart after stall or error
  xhci: Ensure URB is given back when endpoint halts on a multi-TD URB

 drivers/usb/host/xhci-ring.c | 50 +++++++++++++++++++++++++++++++-----
 drivers/usb/host/xhci.c      |  7 +++++
 drivers/usb/host/xhci.h      |  4 ++-
 3 files changed, 53 insertions(+), 8 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
@ 2026-03-23 12:25 ` Mathias Nyman
  2026-03-25  1:52   ` Thinh Nguyen
  2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman
  1 sibling, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw)
  To: linux-usb
  Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin,
	Mathias Nyman

Avoid automatically restarting bulk or interrupt transfers after a
URB is given back due to stall or error.

Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
it halted. The actual TD the endpoint halted on is marked TD_HALTED,
and its URB is given back with proper EPROTO or EPIPE error code.

Don't automatically restart an endpoint if the next queued TD after
the TD_HALTED one is marked tainted.

class driver is expected to cancel all those pending URBs before
submitting more URBs, or before clearing halt, but is not forced to.
Always start the endpoint when a new URB is queued, but print a debug
message if there are still tainted URBs left uncancelled.

The same code is used for clearing up cached TDs from xHC in both
the URB stall/error case as when a class driver cancels URBs.
In the cancel case endpoint is restarted after removing the cancelled
URB.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 48 ++++++++++++++++++++++++++++++------
 drivers/usb/host/xhci.c      |  7 ++++++
 drivers/usb/host/xhci.h      |  4 ++-
 3 files changed, 51 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1cbefee3c4ca..f01f343a7e37 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -981,6 +981,31 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id,
 	return ret;
 }
 
+/*
+ * Mark the TD the endpoint halted on as TD_HALTED and add it to the cancelled
+ * td list, ensuring invalidate_cancelled_tds() clears the TD from xHC cache.
+ * Mark the other TDs on that bulk or interrupt endpoint as TD_TAINTED to
+ * prevent the ring from being restarted too early.
+ */
+
+static void xhci_cancel_halted_tds(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
+				   struct xhci_td *td)
+{
+	struct xhci_ring *ring;
+	struct xhci_td *tdi;
+
+	ring = xhci_urb_to_transfer_ring(xhci, td->urb);
+
+	td->cancel_status = TD_HALTED;
+	if (list_empty(&td->cancelled_td_list))
+		list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list);
+
+	list_for_each_entry(tdi, &ring->td_list, td_list) {
+		if (!tdi->cancel_status && ring->type != TYPE_CTRL)
+			tdi->cancel_status = TD_TAINTED;
+	}
+}
+
 static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
 				struct xhci_virt_ep *ep,
 				struct xhci_td *td,
@@ -999,10 +1024,8 @@ static int xhci_handle_halted_endpoint(struct xhci_hcd *xhci,
 	/* add td to cancelled list and let reset ep handler take care of it */
 	if (reset_type == EP_HARD_RESET) {
 		ep->ep_state |= EP_HARD_CLEAR_TOGGLE;
-		if (td && list_empty(&td->cancelled_td_list)) {
-			list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list);
-			td->cancel_status = TD_HALTED;
-		}
+		if (td)
+			xhci_cancel_halted_tds(xhci, ep, td);
 	}
 
 	if (ep->ep_state & EP_HALTED) {
@@ -1079,6 +1102,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 			case TD_CLEARED: /* TD is already no-op */
 			case TD_CLEARING_CACHE: /* set TR deq command already queued */
 				break;
+			case TD_TAINTED:
 			case TD_DIRTY: /* TD is cached, clear it */
 			case TD_HALTED:
 			case TD_CLEARING_CACHE_DEFERRED:
@@ -1406,6 +1430,14 @@ void xhci_hc_died(struct xhci_hcd *xhci)
 		usb_hc_died(xhci_to_hcd(xhci));
 }
 
+bool next_td_is_tainted(struct xhci_ring *ring)
+{
+	struct xhci_td *td;
+
+	td = list_first_entry_or_null(&ring->td_list, struct xhci_td, td_list);
+	return td && td->cancel_status == TD_TAINTED;
+}
+
 /*
  * When we get a completion for a Set Transfer Ring Dequeue Pointer command,
  * we need to clear the set deq pending flag in the endpoint ring state, so that
@@ -1543,13 +1575,15 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 			 __func__);
 		xhci_invalidate_cancelled_tds(ep);
 		/* Try to restart the endpoint if all is done */
-		ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+		if (!next_td_is_tainted(ep_ring))
+			ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
 		/* Start giving back any TDs invalidated above */
 		xhci_giveback_invalidated_tds(ep);
 	} else {
 		/* Restart any rings with pending URBs */
-		xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__);
-		ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
+		xhci_dbg(ep->xhci, "%s: All cancelled TDs cleared\n", __func__);
+		if (!next_td_is_tainted(ep_ring))
+			ring_doorbell_for_active_rings(xhci, slot_id, ep_index);
 	}
 }
 
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index ef6d8662adec..8f417836ae1f 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1625,6 +1625,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 	unsigned long flags;
 	int ret = 0;
 	unsigned int slot_id, ep_index;
+	struct xhci_ring *ring;
 	unsigned int *ep_state;
 	struct urb_priv	*urb_priv;
 	int num_tds;
@@ -1661,6 +1662,7 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 	}
 
 	slot_id = urb->dev->slot_id;
+	ring = xhci_urb_to_transfer_ring(xhci, urb);
 
 	if (!HCD_HW_ACCESSIBLE(hcd)) {
 		ret = -ESHUTDOWN;
@@ -1694,6 +1696,10 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 		goto free_priv;
 	}
 
+	ring = xhci_urb_to_transfer_ring(xhci, urb);
+	if (ring && next_td_is_tainted(ring))
+		xhci_dbg(xhci, "WARN, enqueue URB without canceling old tainted URBs\n");
+
 	switch (usb_endpoint_type(&urb->ep->desc)) {
 
 	case USB_ENDPOINT_XFER_CONTROL:
@@ -3354,6 +3360,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 
 	/* Bail out if toggle is already being cleared by a endpoint reset */
 	spin_lock_irqsave(&xhci->lock, flags);
+
 	if (ep->ep_state & EP_HARD_CLEAR_TOGGLE) {
 		ep->ep_state &= ~EP_HARD_CLEAR_TOGGLE;
 		spin_unlock_irqrestore(&xhci->lock, flags);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 2b0796f6d00e..0be040f87abf 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1292,7 +1292,8 @@ struct xhci_segment {
 };
 
 enum xhci_cancelled_td_status {
-	TD_DIRTY = 0,
+	TD_TAINTED = 1,
+	TD_DIRTY,
 	TD_HALTED,
 	TD_CLEARING_CACHE,
 	TD_CLEARING_CACHE_DEFERRED,
@@ -1951,6 +1952,7 @@ void xhci_ring_doorbell_for_active_rings(struct xhci_hcd *xhci,
 void xhci_cleanup_command_queue(struct xhci_hcd *xhci);
 void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring);
 unsigned int count_trbs(u64 addr, u64 len);
+bool next_td_is_tainted(struct xhci_ring *ring);
 int xhci_stop_endpoint_sync(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 			    int suspend, gfp_t gfp_flags);
 void xhci_process_cancelled_tds(struct xhci_virt_ep *ep);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB
  2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
  2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
@ 2026-03-23 12:25 ` Mathias Nyman
  1 sibling, 0 replies; 33+ messages in thread
From: Mathias Nyman @ 2026-03-23 12:25 UTC (permalink / raw)
  To: linux-usb
  Cc: stern, Thinh.Nguyen, michal.pecio, oneukum, niklas.neronin,
	Mathias Nyman

Make sure URB is given back by xhci driver after a stall (EPIPE)
or transaction error (EPROTO) on one of the TDs in the URB.

Most bulk, interrupt and control transfer URBs consist of one TD,
so adding that TD to the cancel list was enough to give back the
entire URB when invalidating and giving back cancelled TDs

Some bulk URBs that need an additional zero length packet consist
of two TDs. Make sure the second TD is added to the cancel list so
that URB is given back when clearing the internal endpoint halt.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index f01f343a7e37..5305ad7be025 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -982,8 +982,9 @@ static int xhci_reset_halted_ep(struct xhci_hcd *xhci, unsigned int slot_id,
 }
 
 /*
- * Mark the TD the endpoint halted on as TD_HALTED and add it to the cancelled
- * td list, ensuring invalidate_cancelled_tds() clears the TD from xHC cache.
+ * Mark the TD the endpoint halted on as TD_HALTED, add all TDs of that URB
+ * to the cancelled td list, ensuring invalidate_cancelled_tds() clears the
+ * TD from xHC cache and URB is given back.
  * Mark the other TDs on that bulk or interrupt endpoint as TD_TAINTED to
  * prevent the ring from being restarted too early.
  */
@@ -997,10 +998,11 @@ static void xhci_cancel_halted_tds(struct xhci_hcd *xhci, struct xhci_virt_ep *e
 	ring = xhci_urb_to_transfer_ring(xhci, td->urb);
 
 	td->cancel_status = TD_HALTED;
-	if (list_empty(&td->cancelled_td_list))
-		list_add_tail(&td->cancelled_td_list, &ep->cancelled_td_list);
 
 	list_for_each_entry(tdi, &ring->td_list, td_list) {
+		if (tdi->urb == td->urb && list_empty(&tdi->cancelled_td_list))
+			list_add_tail(&tdi->cancelled_td_list, &ep->cancelled_td_list);
+
 		if (!tdi->cancel_status && ring->type != TYPE_CTRL)
 			tdi->cancel_status = TD_TAINTED;
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
@ 2026-03-25  1:52   ` Thinh Nguyen
  2026-03-25  9:38     ` Mathias Nyman
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-03-25  1:52 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu,
	Thinh Nguyen, michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Mon, Mar 23, 2026, Mathias Nyman wrote:
> Avoid automatically restarting bulk or interrupt transfers after a
> URB is given back due to stall or error.
> 
> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
> it halted. The actual TD the endpoint halted on is marked TD_HALTED,
> and its URB is given back with proper EPROTO or EPIPE error code.
> 
> Don't automatically restart an endpoint if the next queued TD after
> the TD_HALTED one is marked tainted.

Sounds good for -EPROTO, but will a clear-halt ring the corresponding
the endpoint's doorbell for STALL endpoint?

BR,
Thinh

> 
> class driver is expected to cancel all those pending URBs before
> submitting more URBs, or before clearing halt, but is not forced to.
> Always start the endpoint when a new URB is queued, but print a debug
> message if there are still tainted URBs left uncancelled.
> 
> The same code is used for clearing up cached TDs from xHC in both
> the URB stall/error case as when a class driver cancels URBs.
> In the cancel case endpoint is restarted after removing the cancelled
> URB.
> 
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> ---
>  drivers/usb/host/xhci-ring.c | 48 ++++++++++++++++++++++++++++++------
>  drivers/usb/host/xhci.c      |  7 ++++++
>  drivers/usb/host/xhci.h      |  4 ++-
>  3 files changed, 51 insertions(+), 8 deletions(-)
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-25  1:52   ` Thinh Nguyen
@ 2026-03-25  9:38     ` Mathias Nyman
  2026-03-26  1:19       ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-03-25  9:38 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On 3/25/26 03:52, Thinh Nguyen wrote:
> On Mon, Mar 23, 2026, Mathias Nyman wrote:
>> Avoid automatically restarting bulk or interrupt transfers after a
>> URB is given back due to stall or error.
>>
>> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
>> it halted. The actual TD the endpoint halted on is marked TD_HALTED,
>> and its URB is given back with proper EPROTO or EPIPE error code.
>>
>> Don't automatically restart an endpoint if the next queued TD after
>> the TD_HALTED one is marked tainted.
> 
> Sounds good for -EPROTO, but will a clear-halt ring the corresponding
> the endpoint's doorbell for STALL endpoint?
> 

With this change xhci would not restart the stalled endpoint after a clear-halt
request. The first usb_enqueue() call after clear-halt would start it.

Could make sense to restart the endpoint after a clear-halt, and just add a small
debug message if the next queued URB is marked 'tainted'.

Thanks for the feedback
Mathias


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-25  9:38     ` Mathias Nyman
@ 2026-03-26  1:19       ` Thinh Nguyen
  2026-03-26 11:25         ` Mathias Nyman
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-03-26  1:19 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org,
	stern@rowland.harvard.edu, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Wed, Mar 25, 2026, Mathias Nyman wrote:
> On 3/25/26 03:52, Thinh Nguyen wrote:
> > On Mon, Mar 23, 2026, Mathias Nyman wrote:
> > > Avoid automatically restarting bulk or interrupt transfers after a
> > > URB is given back due to stall or error.
> > > 
> > > Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
> > > it halted. The actual TD the endpoint halted on is marked TD_HALTED,
> > > and its URB is given back with proper EPROTO or EPIPE error code.
> > > 
> > > Don't automatically restart an endpoint if the next queued TD after
> > > the TD_HALTED one is marked tainted.
> > 
> > Sounds good for -EPROTO, but will a clear-halt ring the corresponding
> > the endpoint's doorbell for STALL endpoint?
> > 
> 
> With this change xhci would not restart the stalled endpoint after a clear-halt
> request. The first usb_enqueue() call after clear-halt would start it.
> 
> Could make sense to restart the endpoint after a clear-halt, and just add a small
> debug message if the next queued URB is marked 'tainted'.
> 

The -EPROTO should be handled differently than -EPIPE. A STALL endpoint
is part of a normal usb flow. Should the class driver submit a new URB
while the endpoint is STALL, we would always expect a STALL error
response after the endpoint is restarted. That's not the case with
-EPROTO where the data may be corrupted and/or the host and device are
out of sync. We should not continue until the class driver do some
recovery. IMHO, we can keep the handling of -EPIPE how it was before.
Let the xHC tell whether the STALL error still persists instead of
managing it by the xhci driver.

Thanks,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-26  1:19       ` Thinh Nguyen
@ 2026-03-26 11:25         ` Mathias Nyman
  2026-03-26 23:24           ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-03-26 11:25 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On 3/26/26 03:19, Thinh Nguyen wrote:
> On Wed, Mar 25, 2026, Mathias Nyman wrote:
>> On 3/25/26 03:52, Thinh Nguyen wrote:
>>> On Mon, Mar 23, 2026, Mathias Nyman wrote:
>>>> Avoid automatically restarting bulk or interrupt transfers after a
>>>> URB is given back due to stall or error.
>>>>
>>>> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
>>>> it halted. The actual TD the endpoint halted on is marked TD_HALTED,
>>>> and its URB is given back with proper EPROTO or EPIPE error code.
>>>>
>>>> Don't automatically restart an endpoint if the next queued TD after
>>>> the TD_HALTED one is marked tainted.
>>>
>>> Sounds good for -EPROTO, but will a clear-halt ring the corresponding
>>> the endpoint's doorbell for STALL endpoint?
>>>
>>
>> With this change xhci would not restart the stalled endpoint after a clear-halt
>> request. The first usb_enqueue() call after clear-halt would start it.
>>
>> Could make sense to restart the endpoint after a clear-halt, and just add a small
>> debug message if the next queued URB is marked 'tainted'.
>>
> 
> The -EPROTO should be handled differently than -EPIPE. A STALL endpoint
> is part of a normal usb flow. Should the class driver submit a new URB
> while the endpoint is STALL, we would always expect a STALL error
> response after the endpoint is restarted. That's not the case with
> -EPROTO where the data may be corrupted and/or the host and device are
> out of sync. We should not continue until the class driver do some
> recovery. IMHO, we can keep the handling of -EPIPE how it was before.
> Let the xHC tell whether the STALL error still persists instead of
> managing it by the xhci driver.
> 
I agree that that we should restart the endpoint if class/core enqueues a new
URB _after_ xhci gave back an URB with EPIPE after endpoint STALL.

But I don't think we should restart the ring to continue processing URBs that
were queued before the endpoint stalled. This would prevent the class/core
from even attempting to retire the pending URBs, something USB2.0 spec,
'5.8.5 Bulk Transfer Data Sequences' requires:

"If a halt condition is detected on a bulk pipe due to transmission errors or
  a STALL handshake being returned from the endpoint, all pending IRPs are
  retired.  Removal of the halt condition is achieved via software intervention
  through a separate control pipe."

Thanks
Mathias

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-26 11:25         ` Mathias Nyman
@ 2026-03-26 23:24           ` Thinh Nguyen
  2026-03-30 12:51             ` Mathias Nyman
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-03-26 23:24 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org,
	stern@rowland.harvard.edu, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Thu, Mar 26, 2026, Mathias Nyman wrote:
> On 3/26/26 03:19, Thinh Nguyen wrote:
> > On Wed, Mar 25, 2026, Mathias Nyman wrote:
> > > On 3/25/26 03:52, Thinh Nguyen wrote:
> > > > On Mon, Mar 23, 2026, Mathias Nyman wrote:
> > > > > Avoid automatically restarting bulk or interrupt transfers after a
> > > > > URB is given back due to stall or error.
> > > > > 
> > > > > Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
> > > > > it halted. The actual TD the endpoint halted on is marked TD_HALTED,
> > > > > and its URB is given back with proper EPROTO or EPIPE error code.
> > > > > 
> > > > > Don't automatically restart an endpoint if the next queued TD after
> > > > > the TD_HALTED one is marked tainted.
> > > > 
> > > > Sounds good for -EPROTO, but will a clear-halt ring the corresponding
> > > > the endpoint's doorbell for STALL endpoint?
> > > > 
> > > 
> > > With this change xhci would not restart the stalled endpoint after a clear-halt
> > > request. The first usb_enqueue() call after clear-halt would start it.
> > > 
> > > Could make sense to restart the endpoint after a clear-halt, and just add a small
> > > debug message if the next queued URB is marked 'tainted'.
> > > 
> > 
> > The -EPROTO should be handled differently than -EPIPE. A STALL endpoint
> > is part of a normal usb flow. Should the class driver submit a new URB
> > while the endpoint is STALL, we would always expect a STALL error
> > response after the endpoint is restarted. That's not the case with
> > -EPROTO where the data may be corrupted and/or the host and device are
> > out of sync. We should not continue until the class driver do some
> > recovery. IMHO, we can keep the handling of -EPIPE how it was before.
> > Let the xHC tell whether the STALL error still persists instead of
> > managing it by the xhci driver.
> > 
> I agree that that we should restart the endpoint if class/core enqueues a new
> URB _after_ xhci gave back an URB with EPIPE after endpoint STALL.
> 
> But I don't think we should restart the ring to continue processing URBs that
> were queued before the endpoint stalled. This would prevent the class/core
> from even attempting to retire the pending URBs, something USB2.0 spec,
> '5.8.5 Bulk Transfer Data Sequences' requires:
> 
> "If a halt condition is detected on a bulk pipe due to transmission errors or
>  a STALL handshake being returned from the endpoint, all pending IRPs are
>  retired.  Removal of the halt condition is achieved via software intervention
>  through a separate control pipe."
> 

Fair point. Then the core will need to track the endpoint's STALL state
and parse the clear-halt request to know which endpoint and when to
clear the STALL before it can accept new URB. So the first usb_enqueue()
call after clear-halt can start the endpoint again. The xhci will also
need to have access to this state.

Currently you have the xhci driver to "retire" the halted URBs. However,
you also noted that class/core may attempt to retire the pending URBs.
Who's expected to handle the retirement here?

On a separate note, will you plan to implement the clear-halt for EPROTO
in xhci?

Thanks!
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-26 23:24           ` Thinh Nguyen
@ 2026-03-30 12:51             ` Mathias Nyman
  2026-03-30 14:17               ` stern
  2026-04-01 22:08               ` Thinh Nguyen
  0 siblings, 2 replies; 33+ messages in thread
From: Mathias Nyman @ 2026-03-30 12:51 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On 3/27/26 01:24, Thinh Nguyen wrote:
> On Thu, Mar 26, 2026, Mathias Nyman wrote:
>> On 3/26/26 03:19, Thinh Nguyen wrote:
>>> On Wed, Mar 25, 2026, Mathias Nyman wrote:
>>>> On 3/25/26 03:52, Thinh Nguyen wrote:
>>>>> On Mon, Mar 23, 2026, Mathias Nyman wrote:
>>>>>> Avoid automatically restarting bulk or interrupt transfers after a
>>>>>> URB is given back due to stall or error.
>>>>>>
>>>>>> Introduce a 'TD_TAINTED' state for pending TDs queued on a endpoint when
>>>>>> it halted. The actual TD the endpoint halted on is marked TD_HALTED,
>>>>>> and its URB is given back with proper EPROTO or EPIPE error code.
>>>>>>
>>>>>> Don't automatically restart an endpoint if the next queued TD after
>>>>>> the TD_HALTED one is marked tainted.
>>>>>
>>>>> Sounds good for -EPROTO, but will a clear-halt ring the corresponding
>>>>> the endpoint's doorbell for STALL endpoint?
>>>>>
>>>>
>>>> With this change xhci would not restart the stalled endpoint after a clear-halt
>>>> request. The first usb_enqueue() call after clear-halt would start it.
>>>>
>>>> Could make sense to restart the endpoint after a clear-halt, and just add a small
>>>> debug message if the next queued URB is marked 'tainted'.
>>>>
>>>
>>> The -EPROTO should be handled differently than -EPIPE. A STALL endpoint
>>> is part of a normal usb flow. Should the class driver submit a new URB
>>> while the endpoint is STALL, we would always expect a STALL error
>>> response after the endpoint is restarted. That's not the case with
>>> -EPROTO where the data may be corrupted and/or the host and device are
>>> out of sync. We should not continue until the class driver do some
>>> recovery. IMHO, we can keep the handling of -EPIPE how it was before.
>>> Let the xHC tell whether the STALL error still persists instead of
>>> managing it by the xhci driver.
>>>
>> I agree that that we should restart the endpoint if class/core enqueues a new
>> URB _after_ xhci gave back an URB with EPIPE after endpoint STALL.
>>
>> But I don't think we should restart the ring to continue processing URBs that
>> were queued before the endpoint stalled. This would prevent the class/core
>> from even attempting to retire the pending URBs, something USB2.0 spec,
>> '5.8.5 Bulk Transfer Data Sequences' requires:
>>
>> "If a halt condition is detected on a bulk pipe due to transmission errors or
>>   a STALL handshake being returned from the endpoint, all pending IRPs are
>>   retired.  Removal of the halt condition is achieved via software intervention
>>   through a separate control pipe."
>>
> 
> Fair point. Then the core will need to track the endpoint's STALL state
> and parse the clear-halt request to know which endpoint and when to
> clear the STALL before it can accept new URB. So the first usb_enqueue()
> call after clear-halt can start the endpoint again. The xhci will also
> need to have access to this state.

Ideally xhci driver would return the URB with EPIPE after STALL, and not continue
processing URBs before a clear halt is sent, or a new URB is enqueued.
USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this
STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB.

Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to
the bh workqueue. Then after urb->complete() work is called and core is sure class
driver  is aware of the EPIPE, then core would clear the flag and flush the buffered
URBs to the host controller drivers, restarting the ring

Class driver urb->complete() would most likely retire/cancel the pending URBs, both the
earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core.
Class driver should clear the halt, but is free to do whatever it wants.
It could choose to send a new URB without clearing the halt,
have it processed, trigger STALL again, and get URB returned with EPIPE status.

I don't think most class/usb device drivers really handle stall that well.
Above might be wishful thinking.

> 
> Currently you have the xhci driver to "retire" the halted URBs. However,
> you also noted that class/core may attempt to retire the pending URBs.
> Who's expected to handle the retirement here?

Maybe we should let core retire the pending URBS, and only fix the xhci driver
'automatic endpoint restart after stall' part after that core change is done.

Should cause less regression.

> 
> On a separate note, will you plan to implement the clear-halt for EPROTO
> in xhci?

I don't think this should be part of xhci driver. Decision to send control requests
to the device should be done by core or class drivers.

Thanks
Mathias



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-30 12:51             ` Mathias Nyman
@ 2026-03-30 14:17               ` stern
  2026-03-31  9:34                 ` Mathias Nyman
  2026-04-01 22:08               ` Thinh Nguyen
  1 sibling, 1 reply; 33+ messages in thread
From: stern @ 2026-03-30 14:17 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote:
> Ideally xhci driver would return the URB with EPIPE after STALL, and not continue
> processing URBs before a clear halt is sent, or a new URB is enqueued.
> USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this
> STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB.
> 
> Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to
> the bh workqueue. Then after urb->complete() work is called and core is sure class
> driver  is aware of the EPIPE, then core would clear the flag and flush the buffered
> URBs to the host controller drivers, restarting the ring

This is not practical; it would result in a big slowdown for large bulk 
transfers.  Doing this would mean the core could not send an URB to the 
HCD until all the previous URBs for that endpoint had completed and the 
interrupt handler had run, which would add significant latency to 
transfers.

Exactly what should happen to URBs coming after one that completes with 
-EPIPE is not immediately obvious.  If the HCD did try to send them to 
the device right away then they would also be stalled, which is 
obviously non-productive.  The only guarantee the kernel makes 
about this situation is that the endpoint queue won't restart 
until all completed or unlinked URBs have been given back (likewise for 
-EPROTO errors).

The only safe course available to class drivers is to unlink all the 
pending URBs.  In theory the core could do this for all drivers 
automatically, but at present it doesn't.

> Class driver urb->complete() would most likely retire/cancel the pending URBs, both the
> earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core.
> Class driver should clear the halt, but is free to do whatever it wants.
> It could choose to send a new URB without clearing the halt,
> have it processed, trigger STALL again, and get URB returned with EPIPE status.
> 
> I don't think most class/usb device drivers really handle stall that well.
> Above might be wishful thinking.

Indeed.  We can make life a little easier for drivers, but clearing the 
endpoint halt is up to them.

> > Currently you have the xhci driver to "retire" the halted URBs. However,
> > you also noted that class/core may attempt to retire the pending URBs.
> > Who's expected to handle the retirement here?
> 
> Maybe we should let core retire the pending URBS, and only fix the xhci driver
> 'automatic endpoint restart after stall' part after that core change is done.
> 
> Should cause less regression.
> 
> > 
> > On a separate note, will you plan to implement the clear-halt for EPROTO
> > in xhci?
> 
> I don't think this should be part of xhci driver. Decision to send control requests
> to the device should be done by core or class drivers.

Here's a troubling consequence for people to consider: Suppose an 
endpoint queue stops with -EPROTO or -EPIPE, and before the class driver 
gets around to calling usb_clear_halt(), it is unbound.  What happens 
the next time a driver binds to the device and tries to use the 
endpoint?

In particular: What does xhci-hcd do if an URB is submitted for an 
endpoint whose queue is currently stopped?  Does it reject the 
submission with some sort of error code, or does it go ahead and add the 
URB to the end of the non-advancing queue?  If the latter is true, how 
will a newly bound driver ever discover that the queue is stopped?

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-30 14:17               ` stern
@ 2026-03-31  9:34                 ` Mathias Nyman
  2026-03-31 15:31                   ` stern
  0 siblings, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-03-31  9:34 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On 3/30/26 17:17, stern@rowland.harvard.edu wrote:
> On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote:
>> Ideally xhci driver would return the URB with EPIPE after STALL, and not continue
>> processing URBs before a clear halt is sent, or a new URB is enqueued.
>> USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this
>> STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB.
>>
>> Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to
>> the bh workqueue. Then after urb->complete() work is called and core is sure class
>> driver  is aware of the EPIPE, then core would clear the flag and flush the buffered
>> URBs to the host controller drivers, restarting the ring
> 
> This is not practical; it would result in a big slowdown for large bulk
> transfers.  Doing this would mean the core could not send an URB to the
> HCD until all the previous URBs for that endpoint had completed and the
> interrupt handler had run, which would add significant latency to
> transfers.

The URB submit buffering in core would only take place if endpoint is halted
with EPIPE/EPROTO.

Additional latency should be limited to the time between interrupt handler
returns a URB with EPIPE/EPROTO, and the bh workqueue finishing urb->complete()
for that URB

URBs would normally be sent to HCD directly.

Yes, this might end up being quite complex, need to make sure it solves more
issues than it creates.

Usb core is aware of the halted endpoint before the class driver due to the bh
workqueue giveback. To me it would make sense to make core responsible for babysitting
the class driver urb submission for the time it withholds this information.

Whole reason for this is to prevent new URB submission from HCD restarting an endpoint
after HCD gave back a halted URB, and HCD assuming core/class drivers are aware of
the halt when the new URB is submitted.

Other option is that usb core would just refuse class driver from submitting URBs
during this time. usb_submit_urb() would return with an error, but I think this might
impact existing class drivers more.

> 
> Exactly what should happen to URBs coming after one that completes with
> -EPIPE is not immediately obvious.  If the HCD did try to send them to
> the device right away then they would also be stalled, which is
> obviously non-productive.  The only guarantee the kernel makes
> about this situation is that the endpoint queue won't restart
> until all completed or unlinked URBs have been given back (likewise for
> -EPROTO errors).

My take is that endpoint should stop processing URBs, existing pending URBs
should be retired by class/core, new URBs should restart the endpoint but new URBs
are only permitted _after_ submitter is aware of the halt.

A class driver testing USB specification should be able to resubmit a stalled URB
(EPIPE) and expect it to complete with EPIPE again until it clears the halt.

> 
> The only safe course available to class drivers is to unlink all the
> pending URBs.  In theory the core could do this for all drivers
> automatically, but at present it doesn't.
> 
>> Class driver urb->complete() would most likely retire/cancel the pending URBs, both the
>> earlier queued 'tainted' URBs, and the most recent 'buffered' URBs in usb core.
>> Class driver should clear the halt, but is free to do whatever it wants.
>> It could choose to send a new URB without clearing the halt,
>> have it processed, trigger STALL again, and get URB returned with EPIPE status.
>>
>> I don't think most class/usb device drivers really handle stall that well.
>> Above might be wishful thinking.
> 
> Indeed.  We can make life a little easier for drivers, but clearing the
> endpoint halt is up to them.
> 
>>> Currently you have the xhci driver to "retire" the halted URBs. However,
>>> you also noted that class/core may attempt to retire the pending URBs.
>>> Who's expected to handle the retirement here?
>>
>> Maybe we should let core retire the pending URBS, and only fix the xhci driver
>> 'automatic endpoint restart after stall' part after that core change is done.
>>
>> Should cause less regression.
>>
>>>
>>> On a separate note, will you plan to implement the clear-halt for EPROTO
>>> in xhci?
>>
>> I don't think this should be part of xhci driver. Decision to send control requests
>> to the device should be done by core or class drivers.
> 
> Here's a troubling consequence for people to consider: Suppose an
> endpoint queue stops with -EPROTO or -EPIPE, and before the class driver
> gets around to calling usb_clear_halt(), it is unbound.  What happens
> the next time a driver binds to the device and tries to use the
> endpoint?

The disable/enable interface and set config calls during unbind/bind should,
if I remember correctly flush pending URBs and drop and re-add the endpoint,
clearing the xhci side halt and reset toggle.

> 
> In particular: What does xhci-hcd do if an URB is submitted for an
> endpoint whose queue is currently stopped?  Does it reject the
> submission with some sort of error code, or does it go ahead and add the
> URB to the end of the non-advancing queue?  If the latter is true, how
> will a newly bound driver ever discover that the queue is stopped?

xhci-hcd will queue the new URB  and restart the ring, if device side endpoint
remains halted due to uncleared halt then transfer stalls and URB is returned
with EPIPE.

Thanks
Mathias

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-31  9:34                 ` Mathias Nyman
@ 2026-03-31 15:31                   ` stern
  2026-04-01 22:08                     ` Mathias Nyman
  0 siblings, 1 reply; 33+ messages in thread
From: stern @ 2026-03-31 15:31 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Tue, Mar 31, 2026 at 12:34:54PM +0300, Mathias Nyman wrote:
> On 3/30/26 17:17, stern@rowland.harvard.edu wrote:
> > On Mon, Mar 30, 2026 at 03:51:46PM +0300, Mathias Nyman wrote:
> > > Ideally xhci driver would return the URB with EPIPE after STALL, and not continue
> > > processing URBs before a clear halt is sent, or a new URB is enqueued.
> > > USB core would hold off submitting URBs to xhci, buffering URBs enqueued for this
> > > STALLED endpoint until class driver urb->complete() finishes for that EPIPE URB.
> > > 
> > > Usb core could flag this endpoint as "halt_pending" before adding the EPIPE URB to
> > > the bh workqueue. Then after urb->complete() work is called and core is sure class
> > > driver  is aware of the EPIPE, then core would clear the flag and flush the buffered
> > > URBs to the host controller drivers, restarting the ring
> > 
> > This is not practical; it would result in a big slowdown for large bulk
> > transfers.  Doing this would mean the core could not send an URB to the
> > HCD until all the previous URBs for that endpoint had completed and the
> > interrupt handler had run, which would add significant latency to
> > transfers.
> 
> The URB submit buffering in core would only take place if endpoint is halted
> with EPIPE/EPROTO.
> 
> Additional latency should be limited to the time between interrupt handler
> returns a URB with EPIPE/EPROTO, and the bh workqueue finishing urb->complete()
> for that URB
> 
> URBs would normally be sent to HCD directly.
> 
> Yes, this might end up being quite complex, need to make sure it solves more
> issues than it creates.

Can that be done precisely?  I doubt it -- there are races between the 
core and the HCD and between the HCD and the hardware.

As for doing it imprecisely...  I'm not sure the advantages outweigh the 
complexity.  Anyway, if the class driver is well behaved, this buffering 
won't be needed.

> Usb core is aware of the halted endpoint before the class driver due to the bh
> workqueue giveback. To me it would make sense to make core responsible for babysitting
> the class driver urb submission for the time it withholds this information.
> 
> Whole reason for this is to prevent new URB submission from HCD restarting an endpoint
> after HCD gave back a halted URB, and HCD assuming core/class drivers are aware of
> the halt when the new URB is submitted.
> 
> Other option is that usb core would just refuse class driver from submitting URBs
> during this time. usb_submit_urb() would return with an error, but I think this might
> impact existing class drivers more.

How about this instead?  We add a "halted" flag to the usb_host_endpoint 
structure, and the core will set this flag whenever a bulk or interrupt 
URB gets a status other than 0 (before putting the URB on the bh list).  
If an URB has one of these statuses, when its completion handler returns 
the core will unlink all the URBs queued to the same endpoint.  Finally, 
the "halted" flag should be cleared after a completion handler returns 
if there are no more unlinked URBs still in the queue or URBs waiting on 
the bh list to be given back.

The result of this is that any URB remaining in the queue when the flag 
is cleared must have been submitted by the class driver _after_ the 
failing URB's completion handler has run.  We can assume the class 
driver knows what it's doing in this case.

The endpoint queue shouldn't be restarted until the "halted" flag is 
cleared.  Either right away, if there are any URBs in the queue, or not 
until the next URB is submitted.  Doing this might require a new HCD 
callback.  (It would also mean the kerneldoc for usb_unlink_urb() would 
need to be updated, because the endpoint might restart before all the 
completion handlers for the unlinked URBs have run.)

What I'm trying to do here is come up with a single, consistent proposal 
for exactly when halted endpoint queues should restart.  Maybe someone 
else has a better suggestion.

We also should add a special case for control endpoints because their 
halt conditions can get cleared automatically when the device receives a 
SETUP packet.

> > Exactly what should happen to URBs coming after one that completes with
> > -EPIPE is not immediately obvious.  If the HCD did try to send them to
> > the device right away then they would also be stalled, which is
> > obviously non-productive.  The only guarantee the kernel makes
> > about this situation is that the endpoint queue won't restart
> > until all completed or unlinked URBs have been given back (likewise for
> > -EPROTO errors).
> 
> My take is that endpoint should stop processing URBs, existing pending URBs
> should be retired by class/core, new URBs should restart the endpoint but new URBs
> are only permitted _after_ submitter is aware of the halt.

Yes, that's more or less what my proposal attempts to accomplish.

> A class driver testing USB specification should be able to resubmit a stalled URB
> (EPIPE) and expect it to complete with EPIPE again until it clears the halt.

Agreed.  

> > Here's a troubling consequence for people to consider: Suppose an
> > endpoint queue stops with -EPROTO or -EPIPE, and before the class driver
> > gets around to calling usb_clear_halt(), it is unbound.  What happens
> > the next time a driver binds to the device and tries to use the
> > endpoint?
> 
> The disable/enable interface and set config calls during unbind/bind should,
> if I remember correctly flush pending URBs and drop and re-add the endpoint,
> clearing the xhci side halt and reset toggle.

usb_probe_interface() doesn't do any of that stuff, other than a 
deferred switch to altsetting 0 if needed.

usb_unbind_interface() does call usb_enable_interface() if the interface 
is already in altsetting 0, but the reset_ep argument is false so the 
endpoint state doesn't get affected.  Should that be changed?

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-30 12:51             ` Mathias Nyman
  2026-03-30 14:17               ` stern
@ 2026-04-01 22:08               ` Thinh Nguyen
  2026-04-01 22:34                 ` Mathias Nyman
  1 sibling, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-01 22:08 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org,
	stern@rowland.harvard.edu, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Mon, Mar 30, 2026, Mathias Nyman wrote:
> 
> > 
> > On a separate note, will you plan to implement the clear-halt for EPROTO
> > in xhci?
> 
> I don't think this should be part of xhci driver. Decision to send control requests
> to the device should be done by core or class drivers.
> 

This not like STALL where it's standardized for the core or class driver
to know how to handle. The programming sequence for the errors that
resulted in EPROTO from xhci is specific to xhci. That is, the xhci
reset endpoint command will reset the bulk sequence, it's specific to
xhci. The xhci spec recommends to send a clear-halt for this scenario,
not the USB spec or any other class specific spec. So we should not
delegate this to the core or class driver to handle.

BR,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-03-31 15:31                   ` stern
@ 2026-04-01 22:08                     ` Mathias Nyman
  2026-04-02  2:36                       ` stern
  0 siblings, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-04-01 22:08 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On 3/31/26 18:31, stern@rowland.harvard.edu wrote:
> 
> How about this instead?  We add a "halted" flag to the usb_host_endpoint
> structure, and the core will set this flag whenever a bulk or interrupt
> URB gets a status other than 0 (before putting the URB on the bh list).
> If an URB has one of these statuses, when its completion handler returns
> the core will unlink all the URBs queued to the same endpoint.  Finally,
> the "halted" flag should be cleared after a completion handler returns
> if there are no more unlinked URBs still in the queue or URBs waiting on
> the bh list to be given back.
> 
> The result of this is that any URB remaining in the queue when the flag
> is cleared must have been submitted by the class driver _after_ the
> failing URB's completion handler has run.  We can assume the class
> driver knows what it's doing in this case.
> 
> The endpoint queue shouldn't be restarted until the "halted" flag is
> cleared.  Either right away, if there are any URBs in the queue, or not
> until the next URB is submitted.  Doing this might require a new HCD
> callback.  (It would also mean the kerneldoc for usb_unlink_urb() would
> need to be updated, because the endpoint might restart before all the
> completion handlers for the unlinked URBs have run.)
> 
> What I'm trying to do here is come up with a single, consistent proposal
> for exactly when halted endpoint queues should restart.  Maybe someone
> else has a better suggestion.

Sounds like a possible solution to me.

Just to clarify, core should unlink the remaining URBs queued to that endpoint
after setting the "halted" flag, but before URB completion is called.
"Halted" flag should be cleared after URB completion returns, and endpoint
should be restarted if there are any pending URBs.

This allows the class driver URB completion handler to re-queue the halted URB
without core unlinking it.

> 
>>> Here's a troubling consequence for people to consider: Suppose an
>>> endpoint queue stops with -EPROTO or -EPIPE, and before the class driver
>>> gets around to calling usb_clear_halt(), it is unbound.  What happens
>>> the next time a driver binds to the device and tries to use the
>>> endpoint?
>>
>> The disable/enable interface and set config calls during unbind/bind should,
>> if I remember correctly flush pending URBs and drop and re-add the endpoint,
>> clearing the xhci side halt and reset toggle.
> 
> usb_probe_interface() doesn't do any of that stuff, other than a
> deferred switch to altsetting 0 if needed.
> 
> usb_unbind_interface() does call usb_enable_interface() if the interface
> is already in altsetting 0, but the reset_ep argument is false so the
> endpoint state doesn't get affected.  Should that be changed?

Looks like this needs more attention. Interface driver unbind/bind with
halted endpoint could be an issue. I don't have an answer right now.

Thanks
Mathias


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-01 22:08               ` Thinh Nguyen
@ 2026-04-01 22:34                 ` Mathias Nyman
  2026-04-01 22:47                   ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Mathias Nyman @ 2026-04-01 22:34 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: linux-usb@vger.kernel.org, stern@rowland.harvard.edu,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On 4/2/26 01:08, Thinh Nguyen wrote:
> On Mon, Mar 30, 2026, Mathias Nyman wrote:
>>
>>>
>>> On a separate note, will you plan to implement the clear-halt for EPROTO
>>> in xhci?
>>
>> I don't think this should be part of xhci driver. Decision to send control requests
>> to the device should be done by core or class drivers.
>>
> 
> This not like STALL where it's standardized for the core or class driver
> to know how to handle. The programming sequence for the errors that
> resulted in EPROTO from xhci is specific to xhci. That is, the xhci
> reset endpoint command will reset the bulk sequence, it's specific to
> xhci. The xhci spec recommends to send a clear-halt for this scenario,
> not the USB spec or any other class specific spec. So we should not
> delegate this to the core or class driver to handle.
> 
USB 2 Specification does mention handling halt conditions due to  transmission
erros _OR_ STALL handshake with clearing halt and resetting both host and device
endpoint toggle.

See USB 2

5.7.5 Interrupt Transfer Data Sequences

"If a halt condition is detected on an interrupt pipe due to transmission errors or
  a STALL handshake being returned from the endpoint, all pending IRPs are retired.
  Removal of the halt condition is achieved via software intervention through a
  separate control pipe. This recovery will reset the data toggle bit to DATA0 for
  the endpoint on both the host and the device"

5.8.5 Bulk Transfer Data Sequences

"If a halt condition is detected on a bulk pipe due to transmission errors or a
  STALL handshake being returned from the endpoint, all pending IRPs are retired.
  Removal of the halt condition is achieved via software intervention through a
  separate control pipe. This recovery will reset the data toggle bit to DATA0
  for the endpoint on both the host and the device"

Thanks
Mathias

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-01 22:34                 ` Mathias Nyman
@ 2026-04-01 22:47                   ` Thinh Nguyen
  0 siblings, 0 replies; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-01 22:47 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org,
	stern@rowland.harvard.edu, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Thu, Apr 02, 2026, Mathias Nyman wrote:
> On 4/2/26 01:08, Thinh Nguyen wrote:
> > On Mon, Mar 30, 2026, Mathias Nyman wrote:
> > > 
> > > > 
> > > > On a separate note, will you plan to implement the clear-halt for EPROTO
> > > > in xhci?
> > > 
> > > I don't think this should be part of xhci driver. Decision to send control requests
> > > to the device should be done by core or class drivers.
> > > 
> > 
> > This not like STALL where it's standardized for the core or class driver
> > to know how to handle. The programming sequence for the errors that
> > resulted in EPROTO from xhci is specific to xhci. That is, the xhci
> > reset endpoint command will reset the bulk sequence, it's specific to
> > xhci. The xhci spec recommends to send a clear-halt for this scenario,
> > not the USB spec or any other class specific spec. So we should not
> > delegate this to the core or class driver to handle.
> > 
> USB 2 Specification does mention handling halt conditions due to  transmission
> erros _OR_ STALL handshake with clearing halt and resetting both host and device
> endpoint toggle.
> 
> See USB 2
> 
> 5.7.5 Interrupt Transfer Data Sequences
> 
> "If a halt condition is detected on an interrupt pipe due to transmission errors or
>  a STALL handshake being returned from the endpoint, all pending IRPs are retired.
>  Removal of the halt condition is achieved via software intervention through a
>  separate control pipe. This recovery will reset the data toggle bit to DATA0 for
>  the endpoint on both the host and the device"
> 
> 5.8.5 Bulk Transfer Data Sequences
> 
> "If a halt condition is detected on a bulk pipe due to transmission errors or a
>  STALL handshake being returned from the endpoint, all pending IRPs are retired.
>  Removal of the halt condition is achieved via software intervention through a
>  separate control pipe. This recovery will reset the data toggle bit to DATA0
>  for the endpoint on both the host and the device"
> 

Ah~ I stand corrected. My memory served me wrong. Thanks for pointing
this out.

BR,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-01 22:08                     ` Mathias Nyman
@ 2026-04-02  2:36                       ` stern
  2026-04-03  1:59                         ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: stern @ 2026-04-02  2:36 UTC (permalink / raw)
  To: Mathias Nyman
  Cc: Thinh Nguyen, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote:
> On 3/31/26 18:31, stern@rowland.harvard.edu wrote:
> > 
> > How about this instead?  We add a "halted" flag to the usb_host_endpoint
> > structure, and the core will set this flag whenever a bulk or interrupt
> > URB gets a status other than 0 (before putting the URB on the bh list).
> > If an URB has one of these statuses, when its completion handler returns
> > the core will unlink all the URBs queued to the same endpoint.  Finally,
> > the "halted" flag should be cleared after a completion handler returns
> > if there are no more unlinked URBs still in the queue or URBs waiting on
> > the bh list to be given back.
> > 
> > The result of this is that any URB remaining in the queue when the flag
> > is cleared must have been submitted by the class driver _after_ the
> > failing URB's completion handler has run.  We can assume the class
> > driver knows what it's doing in this case.
> > 
> > The endpoint queue shouldn't be restarted until the "halted" flag is
> > cleared.  Either right away, if there are any URBs in the queue, or not
> > until the next URB is submitted.  Doing this might require a new HCD
> > callback.  (It would also mean the kerneldoc for usb_unlink_urb() would
> > need to be updated, because the endpoint might restart before all the
> > completion handlers for the unlinked URBs have run.)
> > 
> > What I'm trying to do here is come up with a single, consistent proposal
> > for exactly when halted endpoint queues should restart.  Maybe someone
> > else has a better suggestion.
> 
> Sounds like a possible solution to me.
> 
> Just to clarify, core should unlink the remaining URBs queued to that endpoint
> after setting the "halted" flag, but before URB completion is called.

Above I wrote that the remaining URBs should be unlinked _after_ the 
completion handler is called.  If we did the unlinks before then the 
class driver might submit a new URB after the unlinks were finished and 
before the completion handler learned about the transaction error, and 
this new URB then wouldn't get unlinked.

It's a race between completion of one URB and submission of another.

> "Halted" flag should be cleared after URB completion returns, and endpoint
> should be restarted if there are any pending URBs.

To be clear, the flag should be cleared after the completion handlers 
for _all_ the unlinked URBs (as well as the URB getting the original 
error) have returned.

> This allows the class driver URB completion handler to re-queue the halted URB
> without core unlinking it.

The completion handler shouldn't do this, because it would mean 
resubmitting without doing a clear-halt first.  (Completion handlers 
can't do clear-halts because they run in atomic context.)  If it does 
try to do this anyway, I see nothing wrong with the core unlinking the 
resubmitted URB.

(Are you going to ask about verification tests that set the endpoint's 
Halt feature, submit an URB, wait for it to fail with -EPIPE, and then 
submit another URB from within the completion handler?  :-) )

> > > > Here's a troubling consequence for people to consider: Suppose an
> > > > endpoint queue stops with -EPROTO or -EPIPE, and before the class driver
> > > > gets around to calling usb_clear_halt(), it is unbound.  What happens
> > > > the next time a driver binds to the device and tries to use the
> > > > endpoint?
> > > 
> > > The disable/enable interface and set config calls during unbind/bind should,
> > > if I remember correctly flush pending URBs and drop and re-add the endpoint,
> > > clearing the xhci side halt and reset toggle.
> > 
> > usb_probe_interface() doesn't do any of that stuff, other than a
> > deferred switch to altsetting 0 if needed.
> > 
> > usb_unbind_interface() does call usb_enable_interface() if the interface
> > is already in altsetting 0, but the reset_ep argument is false so the
> > endpoint state doesn't get affected.  Should that be changed?
> 
> Looks like this needs more attention. Interface driver unbind/bind with
> halted endpoint could be an issue. I don't have an answer right now.

We can ponder it...

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-02  2:36                       ` stern
@ 2026-04-03  1:59                         ` Thinh Nguyen
  2026-04-03  2:42                           ` stern
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-03  1:59 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Mathias Nyman, Thinh Nguyen, linux-usb@vger.kernel.org,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote:
> On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote:
> > On 3/31/26 18:31, stern@rowland.harvard.edu wrote:
> > > 
> > > How about this instead?  We add a "halted" flag to the usb_host_endpoint
> > > structure, and the core will set this flag whenever a bulk or interrupt
> > > URB gets a status other than 0 (before putting the URB on the bh list).
> > > If an URB has one of these statuses, when its completion handler returns
> > > the core will unlink all the URBs queued to the same endpoint.  Finally,
> > > the "halted" flag should be cleared after a completion handler returns
> > > if there are no more unlinked URBs still in the queue or URBs waiting on
> > > the bh list to be given back.
> > > 
> > > The result of this is that any URB remaining in the queue when the flag
> > > is cleared must have been submitted by the class driver _after_ the
> > > failing URB's completion handler has run.  We can assume the class
> > > driver knows what it's doing in this case.
> > > 
> > > The endpoint queue shouldn't be restarted until the "halted" flag is
> > > cleared.  Either right away, if there are any URBs in the queue, or not
> > > until the next URB is submitted.  Doing this might require a new HCD
> > > callback.  (It would also mean the kerneldoc for usb_unlink_urb() would
> > > need to be updated, because the endpoint might restart before all the
> > > completion handlers for the unlinked URBs have run.)
> > > 
> > > What I'm trying to do here is come up with a single, consistent proposal
> > > for exactly when halted endpoint queues should restart.  Maybe someone
> > > else has a better suggestion.
> > 
> > Sounds like a possible solution to me.
> > 
> > Just to clarify, core should unlink the remaining URBs queued to that endpoint
> > after setting the "halted" flag, but before URB completion is called.
> 
> Above I wrote that the remaining URBs should be unlinked _after_ the 
> completion handler is called.  If we did the unlinks before then the 
> class driver might submit a new URB after the unlinks were finished and 
> before the completion handler learned about the transaction error, and 
> this new URB then wouldn't get unlinked.
> 
> It's a race between completion of one URB and submission of another.
> 
> > "Halted" flag should be cleared after URB completion returns, and endpoint
> > should be restarted if there are any pending URBs.
> 
> To be clear, the flag should be cleared after the completion handlers 
> for _all_ the unlinked URBs (as well as the URB getting the original 
> error) have returned.
> 
> > This allows the class driver URB completion handler to re-queue the halted URB
> > without core unlinking it.
> 
> The completion handler shouldn't do this, because it would mean 
> resubmitting without doing a clear-halt first.  (Completion handlers 
> can't do clear-halts because they run in atomic context.)  If it does 
> try to do this anyway, I see nothing wrong with the core unlinking the 
> resubmitted URB.
> 
> (Are you going to ask about verification tests that set the endpoint's 
> Halt feature, submit an URB, wait for it to fail with -EPIPE, and then 
> submit another URB from within the completion handler?  :-) )
> 

How about this:

Introduce a halted flag the following conditions:

* Introduce the halted flag in usb_host_endpoint
* The halted flag must be implemented as a bit in a unsigned long so
  we can use atomic bit operation
* Only the HCD may set the halted flag, and only upon checking the
  first URB completing with a halted status
* Only the USB core may clear the halted flag, and only after
  usb_reset_endpoint returns (this makes sure the HCD drained and reset
  the endpoint before the flag is cleared and new URBs are accepted)
* The usb_reset_endpoint must be called after clear-halt, SetInterface,
  and SetConfiguration.
* The USB core will not attempt to unlink pending URBs due to halted
  condition
* The HCD is responsible for completing or canceling queued URBs
  when the halted flag is set. Cancelled and newly submitted URBs will
  be returned with -EPIPE as long as the halted flag is set
* The class driver is responsible to check the halted flag to
  determine whether to initiate error recovery via usb_clear_halt

I'm trying to keep a clear separation of responsibilities between HCD
and the USB core. Also, I try to keep the halted flag more closely match
the state of the endpoint. 

Let me know what you think?

BR,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-03  1:59                         ` Thinh Nguyen
@ 2026-04-03  2:42                           ` stern
  2026-04-03  8:51                             ` Michal Pecio
  2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
  0 siblings, 2 replies; 33+ messages in thread
From: stern @ 2026-04-03  2:42 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Fri, Apr 03, 2026 at 01:59:58AM +0000, Thinh Nguyen wrote:
> On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote:
> > On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote:
> > > On 3/31/26 18:31, stern@rowland.harvard.edu wrote:
> > > > 
> > > > How about this instead?  We add a "halted" flag to the usb_host_endpoint
> > > > structure, and the core will set this flag whenever a bulk or interrupt
> > > > URB gets a status other than 0 (before putting the URB on the bh list).
> > > > If an URB has one of these statuses, when its completion handler returns
> > > > the core will unlink all the URBs queued to the same endpoint.  Finally,
> > > > the "halted" flag should be cleared after a completion handler returns
> > > > if there are no more unlinked URBs still in the queue or URBs waiting on
> > > > the bh list to be given back.
> > > > 
> > > > The result of this is that any URB remaining in the queue when the flag
> > > > is cleared must have been submitted by the class driver _after_ the
> > > > failing URB's completion handler has run.  We can assume the class
> > > > driver knows what it's doing in this case.
> > > > 
> > > > The endpoint queue shouldn't be restarted until the "halted" flag is
> > > > cleared.  Either right away, if there are any URBs in the queue, or not
> > > > until the next URB is submitted.  Doing this might require a new HCD
> > > > callback.  (It would also mean the kerneldoc for usb_unlink_urb() would
> > > > need to be updated, because the endpoint might restart before all the
> > > > completion handlers for the unlinked URBs have run.)
> > > > 
> > > > What I'm trying to do here is come up with a single, consistent proposal
> > > > for exactly when halted endpoint queues should restart.  Maybe someone
> > > > else has a better suggestion.

> How about this:
> 
> Introduce a halted flag the following conditions:
> 
> * Introduce the halted flag in usb_host_endpoint
> * The halted flag must be implemented as a bit in a unsigned long so
>   we can use atomic bit operation

I don't see why it needs to use atomic operations.  Unless you're 
thinking that we might want to add other bitflags into the same unsigned 
long?

> * Only the HCD may set the halted flag, and only upon checking the
>   first URB completing with a halted status

Every status other than 0 should mean that the endpoint's queue is 
halted.  But not all statuses require a clear-halt or reset-endpoint to 
recover.  For instance, a short transfer when the URB_SHORT_NOT_OK flag 
is set.

Why do you want the HCD to set the flag instead of 
usb_hcd_giveback_urb()?  Just one central place is simpler than in every 
HCD.

> * Only the USB core may clear the halted flag, and only after
>   usb_reset_endpoint returns (this makes sure the HCD drained and reset
>   the endpoint before the flag is cleared and new URBs are accepted)
> * The usb_reset_endpoint must be called after clear-halt, SetInterface,
>   and SetConfiguration.

That's easy to do just by adding it into the appropriate core routines.

> * The USB core will not attempt to unlink pending URBs due to halted
>   condition
> * The HCD is responsible for completing or canceling queued URBs
>   when the halted flag is set. Cancelled and newly submitted URBs will
>   be returned with -EPIPE as long as the halted flag is set

Why make the HCD responsible for draining the queue?  It's like setting 
the halted flag; one central place is simpler and less error-prone than 
lots of separate places.

For newly submitted URBs, should the submission fail with -EPIPE, or 
should the submission succeed and then the URB complete with -EPIPE?  
The first is simpler, but drivers probably aren't prepared for it -- it 
would mean adding error handling to the submission code as well as to 
the completion handler code.

(Actually, that wouldn't matter unless the driver was queuing up 
multiple URBs, in which case it should be prepared to handle submission 
errors in the middle.)

> * The class driver is responsible to check the halted flag to
>   determine whether to initiate error recovery via usb_clear_halt
> 
> I'm trying to keep a clear separation of responsibilities between HCD
> and the USB core. Also, I try to keep the halted flag more closely match
> the state of the endpoint. 
> 
> Let me know what you think?

Making the flag match the endpoint state is a good idea.  But there is 
some ambiguity: Do you mean the state in the device, or the state in the 
HC hardware, or the state in the HCD?  After all, these aren't always 
the same.  For instance, unlinking an URB won't affect the device's 
state but it will affect the state on the host side.

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-03  2:42                           ` stern
@ 2026-04-03  8:51                             ` Michal Pecio
  2026-04-03 14:55                               ` stern
  2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
  1 sibling, 1 reply; 33+ messages in thread
From: Michal Pecio @ 2026-04-03  8:51 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Thu, 2 Apr 2026 22:42:39 -0400, stern@rowland.harvard.edu wrote:
> Every status other than 0 should mean that the endpoint's queue is 
> halted.  But not all statuses require a clear-halt or reset-endpoint
> to recover.  For instance, a short transfer when the URB_SHORT_NOT_OK
> flag is set.

This doesn't work in xhci-hcd and I'm not sure if it can. I don't recall
any way to make endpoints halt on short transfer at all.

The driver does exactly two things with this flag:
1. Isochronus frames get EREMOTEIO status when it's detected, although
   ISTR that URB_SHORT_NOT_OK isn't defined for isochronous URBs.
2. All others log a dbg() message. Supposedly, core is responsible for
   updating urb->status then.

> > * The USB core will not attempt to unlink pending URBs due to halted
> >   condition
> > * The HCD is responsible for completing or canceling queued URBs
> >   when the halted flag is set. Cancelled and newly submitted URBs
> > will be returned with -EPIPE as long as the halted flag is set  
> 
> Why make the HCD responsible for draining the queue?  It's like
> setting the halted flag; one central place is simpler and less
> error-prone than lots of separate places.

Is emptying the queue really a good idea at all? It obviously breaks
lazy drivers which just ignore errors and continue with the URBs they
have already submitted. Removing the URBs only adds more data loss.

> For newly submitted URBs, should the submission fail with -EPIPE, or 
> should the submission succeed and then the URB complete with -EPIPE?  
> The first is simpler, but drivers probably aren't prepared for it

Are they truly expecting the alternative? I bet most of them would
usb_clear_halt() for each received EPIPE, or not at all.

If completion unlinks remaining URBs before returning, this question
doesn't exist (if we fix restarting before completion). If it doesn't
unlink them, who even knows what the driver wants?

For example, some drivers resubmit the URB while handling EPROTO and
don't unlink anything. To me, it says "screw data loss and continue".

It admittedly doesn't work on SuperSpeed anymore, but not all drivers
need to worry about SuperSpeed. Including some old and lazy ones.

Regards,
Michal

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-03  8:51                             ` Michal Pecio
@ 2026-04-03 14:55                               ` stern
  2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
  0 siblings, 1 reply; 33+ messages in thread
From: stern @ 2026-04-03 14:55 UTC (permalink / raw)
  To: Michal Pecio
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Fri, Apr 03, 2026 at 10:51:45AM +0200, Michal Pecio wrote:
> On Thu, 2 Apr 2026 22:42:39 -0400, stern@rowland.harvard.edu wrote:
> > Every status other than 0 should mean that the endpoint's queue is 
> > halted.  But not all statuses require a clear-halt or reset-endpoint
> > to recover.  For instance, a short transfer when the URB_SHORT_NOT_OK
> > flag is set.
> 
> This doesn't work in xhci-hcd and I'm not sure if it can. I don't recall
> any way to make endpoints halt on short transfer at all.
> 
> The driver does exactly two things with this flag:
> 1. Isochronus frames get EREMOTEIO status when it's detected, although
>    ISTR that URB_SHORT_NOT_OK isn't defined for isochronous URBs.
> 2. All others log a dbg() message. Supposedly, core is responsible for
>    updating urb->status then.

Truly?  What happens if usb-storage is doing a large bulk-IN transfer 
comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but the 
last one), and the device indicates an early end-of-data by sending a 
short packet in the middle?  Will xhci-hcd store the information in the 
following bulk-IN status transaction to the transfer buffer for the next 
URB in the queue of data URBs?

(That's a misleading question.  The BOT class specification requires the 
device to stall the bulk-IN endpoint following a short data transfer, so 
this scenario wouldn't arise -- although not all devices follow the spec 
in this regard.  And it might arise in other protocols.  Regardless, 
it's hard for me to believe there's no way to tell an xHCI controller to 
stop the endpoint on a short transfer.  EHCI, OHCI, and UHCI all have 
ways to do it.)

> > > * The USB core will not attempt to unlink pending URBs due to halted
> > >   condition
> > > * The HCD is responsible for completing or canceling queued URBs
> > >   when the halted flag is set. Cancelled and newly submitted URBs
> > > will be returned with -EPIPE as long as the halted flag is set  
> > 
> > Why make the HCD responsible for draining the queue?  It's like
> > setting the halted flag; one central place is simpler and less
> > error-prone than lots of separate places.
> 
> Is emptying the queue really a good idea at all? It obviously breaks
> lazy drivers which just ignore errors and continue with the URBs they
> have already submitted. Removing the URBs only adds more data loss.

We have to decide on a single approach that can work for everybody.  If 
it means that some class drivers will have to be changed, so be it.

A lazy driver that just ignores errors will currently not work very well 
if at all following an error on an xHCI host controller.  But yes, it 
will work with a USB-2 controller.

If we want to minimize the changes needed in class drivers then we will 
have to make the core and HCDs recover automatically from transaction 
errors.  Non-lazy class drivers still have a chance to do their own 
recovery when the completion handler for the bad URB runs.  To make 
their lives easier, we could add a usb_purge_endpoint_queue() routine to 
the core; this seems a lot safer than summarily unlinking all the URBs 
all the time.

Is that what we should do?

> > For newly submitted URBs, should the submission fail with -EPIPE, or 
> > should the submission succeed and then the URB complete with -EPIPE?  
> > The first is simpler, but drivers probably aren't prepared for it
> 
> Are they truly expecting the alternative? I bet most of them would
> usb_clear_halt() for each received EPIPE, or not at all.

They might do usb_clear_halt() if the completed URB had -EPIPE status.  
But they won't do it when usb_submit_urb() returns -EPIPE (or any other 
error, for that matter).

> If completion unlinks remaining URBs before returning, this question
> doesn't exist (if we fix restarting before completion). If it doesn't
> unlink them, who even knows what the driver wants?

Good point.

> For example, some drivers resubmit the URB while handling EPROTO and
> don't unlink anything. To me, it says "screw data loss and continue".
> 
> It admittedly doesn't work on SuperSpeed anymore, but not all drivers
> need to worry about SuperSpeed. Including some old and lazy ones.

Think of it this way: If any URBs remain in the queue when the 
completion handler returns, we can assume the class driver doesn't want 
them to be unlinked.  When that happens, the best thing the core/HCD can 
do is issue a clear-halt and then restart the endpoint, so the queued 
URBs will run.  Except for the -EPIPE case, where the device halted its 
side of the endpoint and sent a STALL packet -- it's reasonable to 
require class drivers to handle that case themselves.

On the other hand, if the queue is empty when the completion handler 
returns (or if all the URBs still in the queue have already been 
unlinked but not yet given back), is there any harm in the core issuing 
a clear-halt and restarting the endpoint (with the same exception as 
above)?  This doesn't allow the class driver to do any further error 
recovery of its own, but when you get down to it, what else is there for 
it to do?

Also, this way we don't have to reject any URBs submitted while endpoint 
is stopped.  They simply get added to the end of the queue, so they will 
run when the endpoint is restarted.

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xhci-hcd and URB_SHORT_NOT_OK
  2026-04-03 14:55                               ` stern
@ 2026-04-03 19:13                                 ` Michal Pecio
  2026-04-03 20:17                                   ` stern
  0 siblings, 1 reply; 33+ messages in thread
From: Michal Pecio @ 2026-04-03 19:13 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Fri, 3 Apr 2026 10:55:58 -0400, stern@rowland.harvard.edu wrote:
> > This doesn't work in xhci-hcd and I'm not sure if it can. I don't
> > recall any way to make endpoints halt on short transfer at all.
> > 
> > The driver does exactly two things with this flag:
> > 1. Isochronus frames get EREMOTEIO status when it's detected,
> >    although ISTR that URB_SHORT_NOT_OK isn't defined for isochronous
> > 2. All others log a dbg() message. Supposedly, core is
> >    responsible for updating urb->status then.  
> 
> Truly?  What happens if usb-storage is doing a large bulk-IN transfer
> comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but
> the last one), and the device indicates an early end-of-data by
> sending a short packet in the middle?  Will xhci-hcd store the
> information in the following bulk-IN status transaction to the
> transfer buffer for the next URB in the queue of data URBs?

Not sure if I understand the question, but if you mean that the driver
submits multiple "data" URBs followed by a "status" URB to the same
pipe and expects it to stop when any data URB completes short, so that
all others can be unlinked before the status transfer, then nope, not
gonna happen. It will be racing with status being written to the data
URBs and likely lose. The status URB may never complete.

I really don't recall any way of stopping on short transfers and the
relevant xHCI 4.10.1.1 section isn't helpful. It's possible to stop
unconditionally and decide to restart or not, at some performance cost.

But short transfer always terminates the current URB, so natural way is
to coalesce all data into a big SG URB. Perhaps xhci-hcd could do it
transparently, though that would get awkward if a driver doesn't unlink
the remaining SHORT_NOT_OK URBs - we would need to "rewind".

On the upside, I don't see many users of SHORT_NOT_OK URB. This string
appears in Documentation, include, drivers/usb/host, files ending wih
hcd.c, devio/usbip, and usb_sg_init(). The latter is probably what you
were referring to, and it seems to have a shortcut for SG-capable HCs,
which hopefully doesn't require SHORT_NOT_OK to be functional?

Regards,
Michal

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: xhci-hcd and URB_SHORT_NOT_OK
  2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
@ 2026-04-03 20:17                                   ` stern
  0 siblings, 0 replies; 33+ messages in thread
From: stern @ 2026-04-03 20:17 UTC (permalink / raw)
  To: Michal Pecio
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Fri, Apr 03, 2026 at 09:13:37PM +0200, Michal Pecio wrote:
> On Fri, 3 Apr 2026 10:55:58 -0400, stern@rowland.harvard.edu wrote:
> > > This doesn't work in xhci-hcd and I'm not sure if it can. I don't
> > > recall any way to make endpoints halt on short transfer at all.
> > > 
> > > The driver does exactly two things with this flag:
> > > 1. Isochronus frames get EREMOTEIO status when it's detected,
> > >    although ISTR that URB_SHORT_NOT_OK isn't defined for isochronous
> > > 2. All others log a dbg() message. Supposedly, core is
> > >    responsible for updating urb->status then.  
> > 
> > Truly?  What happens if usb-storage is doing a large bulk-IN transfer
> > comprising multiple data URBs (with URB_SHORT_NOT_OK set in all but
> > the last one), and the device indicates an early end-of-data by
> > sending a short packet in the middle?  Will xhci-hcd store the
> > information in the following bulk-IN status transaction to the
> > transfer buffer for the next URB in the queue of data URBs?
> 
> Not sure if I understand the question, but if you mean that the driver
> submits multiple "data" URBs followed by a "status" URB to the same
> pipe and expects it to stop when any data URB completes short, so that
> all others can be unlinked before the status transfer, then nope, not
> gonna happen. It will be racing with status being written to the data
> URBs and likely lose. The status URB may never complete.
> 
> I really don't recall any way of stopping on short transfers and the
> relevant xHCI 4.10.1.1 section isn't helpful. It's possible to stop
> unconditionally and decide to restart or not, at some performance cost.
> 
> But short transfer always terminates the current URB, so natural way is
> to coalesce all data into a big SG URB. Perhaps xhci-hcd could do it
> transparently, though that would get awkward if a driver doesn't unlink
> the remaining SHORT_NOT_OK URBs - we would need to "rewind".
> 
> On the upside, I don't see many users of SHORT_NOT_OK URB. This string
> appears in Documentation, include, drivers/usb/host, files ending wih
> hcd.c, devio/usbip, and usb_sg_init(). The latter is probably what you
> were referring to, and it seems to have a shortcut for SG-capable HCs,
> which hopefully doesn't require SHORT_NOT_OK to be functional?

usb_sg_init() _is_ what I was thinking of, and I had forgotten about the 
single-URB shortcut for SG-capable HCs.

Clearly we're also going to have to update the kerneldoc, not just the 
code.  If an HCD doesn't support SG then it needs to support stopping an 
endpoint on short transfers.

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-03  2:42                           ` stern
  2026-04-03  8:51                             ` Michal Pecio
@ 2026-04-04  1:15                             ` Thinh Nguyen
  2026-04-04  1:54                               ` stern
  1 sibling, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-04  1:15 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Thu, Apr 02, 2026, stern@rowland.harvard.edu wrote:
> On Fri, Apr 03, 2026 at 01:59:58AM +0000, Thinh Nguyen wrote:
> > On Wed, Apr 01, 2026, stern@rowland.harvard.edu wrote:
> > > On Thu, Apr 02, 2026 at 01:08:31AM +0300, Mathias Nyman wrote:
> > > > On 3/31/26 18:31, stern@rowland.harvard.edu wrote:
> > > > > 
> > > > > How about this instead?  We add a "halted" flag to the usb_host_endpoint
> > > > > structure, and the core will set this flag whenever a bulk or interrupt
> > > > > URB gets a status other than 0 (before putting the URB on the bh list).
> > > > > If an URB has one of these statuses, when its completion handler returns
> > > > > the core will unlink all the URBs queued to the same endpoint.  Finally,
> > > > > the "halted" flag should be cleared after a completion handler returns
> > > > > if there are no more unlinked URBs still in the queue or URBs waiting on
> > > > > the bh list to be given back.
> > > > > 
> > > > > The result of this is that any URB remaining in the queue when the flag
> > > > > is cleared must have been submitted by the class driver _after_ the
> > > > > failing URB's completion handler has run.  We can assume the class
> > > > > driver knows what it's doing in this case.
> > > > > 
> > > > > The endpoint queue shouldn't be restarted until the "halted" flag is
> > > > > cleared.  Either right away, if there are any URBs in the queue, or not
> > > > > until the next URB is submitted.  Doing this might require a new HCD
> > > > > callback.  (It would also mean the kerneldoc for usb_unlink_urb() would
> > > > > need to be updated, because the endpoint might restart before all the
> > > > > completion handlers for the unlinked URBs have run.)
> > > > > 
> > > > > What I'm trying to do here is come up with a single, consistent proposal
> > > > > for exactly when halted endpoint queues should restart.  Maybe someone
> > > > > else has a better suggestion.
> 
> > How about this:
> > 
> > Introduce a halted flag the following conditions:
> > 
> > * Introduce the halted flag in usb_host_endpoint
> > * The halted flag must be implemented as a bit in a unsigned long so
> >   we can use atomic bit operation
> 
> I don't see why it needs to use atomic operations.  Unless you're 
> thinking that we might want to add other bitflags into the same unsigned 
> long?

Now that I think about it again, it's not needed if we have the
requirement for clearing the flag only after usb_reset_endpoint.

> 
> > * Only the HCD may set the halted flag, and only upon checking the
> >   first URB completing with a halted status
> 
> Every status other than 0 should mean that the endpoint's queue is 
> halted.  But not all statuses require a clear-halt or reset-endpoint to 
> recover.  For instance, a short transfer when the URB_SHORT_NOT_OK flag 
> is set.

It seems we're using the "halted" flag for different things. The halted
flag to me is the endpoint's state and rather than the endpoint queue
state. That is, if the endpoint is halted, there's an error that was
reported on the pipe. So, an -ECONNRESET would not cause a halted
endpoint.

> 
> Why do you want the HCD to set the flag instead of 
> usb_hcd_giveback_urb()?  Just one central place is simpler than in every 
> HCD.

I'm just thinking in term of whose role to return the pending URBs.
Typically the HCD handles when to give back URBs. If the HCD were to
handle giving back pending URBs due to halted endpoint, then it should
be the one to set the halted flag. However, if the core tracks and does
both setting and clearing of the halted flag, then the core should
handle canceling the URBs. As you said, if we can shift the burden to
the core, that would help keep the flow consistent and centralized.

> 
> > * Only the USB core may clear the halted flag, and only after
> >   usb_reset_endpoint returns (this makes sure the HCD drained and reset
> >   the endpoint before the flag is cleared and new URBs are accepted)
> > * The usb_reset_endpoint must be called after clear-halt, SetInterface,
> >   and SetConfiguration.
> 
> That's easy to do just by adding it into the appropriate core routines.

Yes.

> 
> > * The USB core will not attempt to unlink pending URBs due to halted
> >   condition
> > * The HCD is responsible for completing or canceling queued URBs
> >   when the halted flag is set. Cancelled and newly submitted URBs will
> >   be returned with -EPIPE as long as the halted flag is set
> 
> Why make the HCD responsible for draining the queue?  It's like setting 
> the halted flag; one central place is simpler and less error-prone than 
> lots of separate places.
> 
> For newly submitted URBs, should the submission fail with -EPIPE, or 
> should the submission succeed and then the URB complete with -EPIPE?  
> The first is simpler, but drivers probably aren't prepared for it -- it 
> would mean adding error handling to the submission code as well as to 
> the completion handler code.
> 
> (Actually, that wouldn't matter unless the driver was queuing up 
> multiple URBs, in which case it should be prepared to handle submission 
> errors in the middle.)

I didn't think about failing immediately on submission because we would
need to audit every class driver for that. Perhaps the core can
intercept the URBs and immediately schedule a completion handler with
the updated URB's status?

> 
> > * The class driver is responsible to check the halted flag to
> >   determine whether to initiate error recovery via usb_clear_halt
> > 
> > I'm trying to keep a clear separation of responsibilities between HCD
> > and the USB core. Also, I try to keep the halted flag more closely match
> > the state of the endpoint. 
> > 
> > Let me know what you think?
> 
> Making the flag match the endpoint state is a good idea.  But there is 
> some ambiguity: Do you mean the state in the device, or the state in the 
> HC hardware, or the state in the HCD?  After all, these aren't always 
> the same.  For instance, unlinking an URB won't affect the device's 
> state but it will affect the state on the host side.
> 

The HCD state. If we can let the core manage the halted state, then here
are the points where the halted state is updated (hopefully I didn't
miss any other ones):

Endpoint is halted (reported by HCD):
* URB returned with -EPROTO or -EPIPE

Endpoint is active (updated by the core):
* SetConfiguration, SetInterface, clear-halt

Thanks,
Thinh

ps. I'll be out of town next week. Apologies if my response will be
delayed...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
@ 2026-04-04  1:54                               ` stern
  2026-04-04 20:41                                 ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: stern @ 2026-04-04  1:54 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote:
> > > How about this:
> > > 
> > > Introduce a halted flag the following conditions:
> > > 
> > > * Introduce the halted flag in usb_host_endpoint
> > > * The halted flag must be implemented as a bit in a unsigned long so
> > >   we can use atomic bit operation
> > 
> > I don't see why it needs to use atomic operations.  Unless you're 
> > thinking that we might want to add other bitflags into the same unsigned 
> > long?
> 
> Now that I think about it again, it's not needed if we have the
> requirement for clearing the flag only after usb_reset_endpoint.
> 
> > 
> > > * Only the HCD may set the halted flag, and only upon checking the
> > >   first URB completing with a halted status
> > 
> > Every status other than 0 should mean that the endpoint's queue is 
> > halted.  But not all statuses require a clear-halt or reset-endpoint to 
> > recover.  For instance, a short transfer when the URB_SHORT_NOT_OK flag 
> > is set.
> 
> It seems we're using the "halted" flag for different things. The halted
> flag to me is the endpoint's state and rather than the endpoint queue
> state. That is, if the endpoint is halted, there's an error that was
> reported on the pipe. So, an -ECONNRESET would not cause a halted
> endpoint.
> 
> > 
> > Why do you want the HCD to set the flag instead of 
> > usb_hcd_giveback_urb()?  Just one central place is simpler than in every 
> > HCD.
> 
> I'm just thinking in term of whose role to return the pending URBs.
> Typically the HCD handles when to give back URBs. If the HCD were to
> handle giving back pending URBs due to halted endpoint, then it should
> be the one to set the halted flag. However, if the core tracks and does
> both setting and clearing of the halted flag, then the core should
> handle canceling the URBs. As you said, if we can shift the burden to
> the core, that would help keep the flow consistent and centralized.
> 
> > 
> > > * Only the USB core may clear the halted flag, and only after
> > >   usb_reset_endpoint returns (this makes sure the HCD drained and reset
> > >   the endpoint before the flag is cleared and new URBs are accepted)
> > > * The usb_reset_endpoint must be called after clear-halt, SetInterface,
> > >   and SetConfiguration.
> > 
> > That's easy to do just by adding it into the appropriate core routines.
> 
> Yes.
> 
> > 
> > > * The USB core will not attempt to unlink pending URBs due to halted
> > >   condition
> > > * The HCD is responsible for completing or canceling queued URBs
> > >   when the halted flag is set. Cancelled and newly submitted URBs will
> > >   be returned with -EPIPE as long as the halted flag is set
> > 
> > Why make the HCD responsible for draining the queue?  It's like setting 
> > the halted flag; one central place is simpler and less error-prone than 
> > lots of separate places.
> > 
> > For newly submitted URBs, should the submission fail with -EPIPE, or 
> > should the submission succeed and then the URB complete with -EPIPE?  
> > The first is simpler, but drivers probably aren't prepared for it -- it 
> > would mean adding error handling to the submission code as well as to 
> > the completion handler code.
> > 
> > (Actually, that wouldn't matter unless the driver was queuing up 
> > multiple URBs, in which case it should be prepared to handle submission 
> > errors in the middle.)
> 
> I didn't think about failing immediately on submission because we would
> need to audit every class driver for that. Perhaps the core can
> intercept the URBs and immediately schedule a completion handler with
> the updated URB's status?
> 
> > 
> > > * The class driver is responsible to check the halted flag to
> > >   determine whether to initiate error recovery via usb_clear_halt
> > > 
> > > I'm trying to keep a clear separation of responsibilities between HCD
> > > and the USB core. Also, I try to keep the halted flag more closely match
> > > the state of the endpoint. 
> > > 
> > > Let me know what you think?
> > 
> > Making the flag match the endpoint state is a good idea.  But there is 
> > some ambiguity: Do you mean the state in the device, or the state in the 
> > HC hardware, or the state in the HCD?  After all, these aren't always 
> > the same.  For instance, unlinking an URB won't affect the device's 
> > state but it will affect the state on the host side.
> > 
> 
> The HCD state. If we can let the core manage the halted state, then here
> are the points where the halted state is updated (hopefully I didn't
> miss any other ones):
> 
> Endpoint is halted (reported by HCD):
> * URB returned with -EPROTO or -EPIPE
> 
> Endpoint is active (updated by the core):
> * SetConfiguration, SetInterface, clear-halt

Hmmm.  What did you think of my recent proposal in a message to Michal?  
Summarizing:

If the class driver wants to unlink queued URBs when a transaction error 
occurs, it has to do so itself in the failed URB's completion handler.  
We can make this easier by adding a usb_flush_endpoint_queue() routine 
to the core.  In the meantime, the HCD ensures that the queue is stopped 
before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
-EREMOTEIO aren't considered to be transaction errors.)

When the completion handler returns, the core will automatically call 
usb_clear_halt(), which will also reset the endpoint on the host side 
and will restart the queue.  This also happens after SetConfiguration 
and SetInterface, either explicitly or implicitly.

For -EPIPE (device sent a STALL token), the class driver has to clear 
the halt itself.  This is because stalls aren't errors; they are an 
intentional part of the USB protocol.

-ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet 
received with URB_SHORT_NOT_OK set) are a little trickier.  The HCD may 
or may not need to stop the queue for an unlink, possibly depending on 
whether the URB being unlinked is the active one.  When a short packet 
is received, the HC hardware may or may not stop its queue.  Either way, 
the class driver shouldn't need to take any special recovery action; any 
necessary actions should be taken automatically by the HCD and the core.

All of this applies only to bulk and interrupt endpoints.  Control 
endpoints generally need error recovery only on the host side, because 
the device resets automatically when it gets a new SETUP packet, and so 
the HCD should handle whatever is needed.  Isochronous endpoints don't 
ever halt and they shouldn't need to be reset when an error occurs.

Overall, this seems simpler than anything else we have discussed.

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04  1:54                               ` stern
@ 2026-04-04 20:41                                 ` Thinh Nguyen
  2026-04-04 21:54                                   ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-04 20:41 UTC (permalink / raw)
  To: stern@rowland.harvard.edu
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> On Sat, Apr 04, 2026 at 01:15:52AM +0000, Thinh Nguyen wrote:
> > > > How about this:
> > > > 
> > > > Introduce a halted flag the following conditions:
> > > > 
> > > > * Introduce the halted flag in usb_host_endpoint
> > > > * The halted flag must be implemented as a bit in a unsigned long so
> > > >   we can use atomic bit operation
> > > 
> > > I don't see why it needs to use atomic operations.  Unless you're 
> > > thinking that we might want to add other bitflags into the same unsigned 
> > > long?
> > 
> > Now that I think about it again, it's not needed if we have the
> > requirement for clearing the flag only after usb_reset_endpoint.
> > 
> > > 
> > > > * Only the HCD may set the halted flag, and only upon checking the
> > > >   first URB completing with a halted status
> > > 
> > > Every status other than 0 should mean that the endpoint's queue is 
> > > halted.  But not all statuses require a clear-halt or reset-endpoint to 
> > > recover.  For instance, a short transfer when the URB_SHORT_NOT_OK flag 
> > > is set.
> > 
> > It seems we're using the "halted" flag for different things. The halted
> > flag to me is the endpoint's state and rather than the endpoint queue
> > state. That is, if the endpoint is halted, there's an error that was
> > reported on the pipe. So, an -ECONNRESET would not cause a halted
> > endpoint.
> > 
> > > 
> > > Why do you want the HCD to set the flag instead of 
> > > usb_hcd_giveback_urb()?  Just one central place is simpler than in every 
> > > HCD.
> > 
> > I'm just thinking in term of whose role to return the pending URBs.
> > Typically the HCD handles when to give back URBs. If the HCD were to
> > handle giving back pending URBs due to halted endpoint, then it should
> > be the one to set the halted flag. However, if the core tracks and does
> > both setting and clearing of the halted flag, then the core should
> > handle canceling the URBs. As you said, if we can shift the burden to
> > the core, that would help keep the flow consistent and centralized.
> > 
> > > 
> > > > * Only the USB core may clear the halted flag, and only after
> > > >   usb_reset_endpoint returns (this makes sure the HCD drained and reset
> > > >   the endpoint before the flag is cleared and new URBs are accepted)
> > > > * The usb_reset_endpoint must be called after clear-halt, SetInterface,
> > > >   and SetConfiguration.
> > > 
> > > That's easy to do just by adding it into the appropriate core routines.
> > 
> > Yes.
> > 
> > > 
> > > > * The USB core will not attempt to unlink pending URBs due to halted
> > > >   condition
> > > > * The HCD is responsible for completing or canceling queued URBs
> > > >   when the halted flag is set. Cancelled and newly submitted URBs will
> > > >   be returned with -EPIPE as long as the halted flag is set
> > > 
> > > Why make the HCD responsible for draining the queue?  It's like setting 
> > > the halted flag; one central place is simpler and less error-prone than 
> > > lots of separate places.
> > > 
> > > For newly submitted URBs, should the submission fail with -EPIPE, or 
> > > should the submission succeed and then the URB complete with -EPIPE?  
> > > The first is simpler, but drivers probably aren't prepared for it -- it 
> > > would mean adding error handling to the submission code as well as to 
> > > the completion handler code.
> > > 
> > > (Actually, that wouldn't matter unless the driver was queuing up 
> > > multiple URBs, in which case it should be prepared to handle submission 
> > > errors in the middle.)
> > 
> > I didn't think about failing immediately on submission because we would
> > need to audit every class driver for that. Perhaps the core can
> > intercept the URBs and immediately schedule a completion handler with
> > the updated URB's status?
> > 
> > > 
> > > > * The class driver is responsible to check the halted flag to
> > > >   determine whether to initiate error recovery via usb_clear_halt
> > > > 
> > > > I'm trying to keep a clear separation of responsibilities between HCD
> > > > and the USB core. Also, I try to keep the halted flag more closely match
> > > > the state of the endpoint. 
> > > > 
> > > > Let me know what you think?
> > > 
> > > Making the flag match the endpoint state is a good idea.  But there is 
> > > some ambiguity: Do you mean the state in the device, or the state in the 
> > > HC hardware, or the state in the HCD?  After all, these aren't always 
> > > the same.  For instance, unlinking an URB won't affect the device's 
> > > state but it will affect the state on the host side.
> > > 
> > 
> > The HCD state. If we can let the core manage the halted state, then here
> > are the points where the halted state is updated (hopefully I didn't
> > miss any other ones):
> > 
> > Endpoint is halted (reported by HCD):
> > * URB returned with -EPROTO or -EPIPE
> > 
> > Endpoint is active (updated by the core):
> > * SetConfiguration, SetInterface, clear-halt
> 
> Hmmm.  What did you think of my recent proposal in a message to Michal?  
> Summarizing:
> 
> If the class driver wants to unlink queued URBs when a transaction error 
> occurs, it has to do so itself in the failed URB's completion handler.  
> We can make this easier by adding a usb_flush_endpoint_queue() routine 
> to the core.  In the meantime, the HCD ensures that the queue is stopped 
> before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
> -EREMOTEIO aren't considered to be transaction errors.)
> 
> When the completion handler returns, the core will automatically call 
> usb_clear_halt(), which will also reset the endpoint on the host side 
> and will restart the queue.  This also happens after SetConfiguration 
> and SetInterface, either explicitly or implicitly.

I like that the core will handle this automatically. But one concern:
How will the class driver know when the clear-halt complete so it can
perform the recovery? (ie. it shouldn't perform recovery immediately
after seeing -EPROTO)

> 
> For -EPIPE (device sent a STALL token), the class driver has to clear 
> the halt itself.  This is because stalls aren't errors; they are an 
> intentional part of the USB protocol.
> 
> -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet 
> received with URB_SHORT_NOT_OK set) are a little trickier.  The HCD may 
> or may not need to stop the queue for an unlink, possibly depending on 
> whether the URB being unlinked is the active one.  When a short packet 
> is received, the HC hardware may or may not stop its queue.  Either way, 
> the class driver shouldn't need to take any special recovery action; any 
> necessary actions should be taken automatically by the HCD and the core.
> 
> All of this applies only to bulk and interrupt endpoints.  Control 
> endpoints generally need error recovery only on the host side, because 
> the device resets automatically when it gets a new SETUP packet, and so 
> the HCD should handle whatever is needed.  Isochronous endpoints don't 
> ever halt and they shouldn't need to be reset when an error occurs.
> 
> Overall, this seems simpler than anything else we have discussed.
> 

The rest sounds good to me!

Thanks Alan,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04 20:41                                 ` Thinh Nguyen
@ 2026-04-04 21:54                                   ` Alan Stern
  2026-04-04 22:15                                     ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2026-04-04 21:54 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote:
> On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> > Summarizing:
> > 
> > If the class driver wants to unlink queued URBs when a transaction error 
> > occurs, it has to do so itself in the failed URB's completion handler.  
> > We can make this easier by adding a usb_flush_endpoint_queue() routine 
> > to the core.  In the meantime, the HCD ensures that the queue is stopped 
> > before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
> > -EREMOTEIO aren't considered to be transaction errors.)
> > 
> > When the completion handler returns, the core will automatically call 
> > usb_clear_halt(), which will also reset the endpoint on the host side 
> > and will restart the queue.  This also happens after SetConfiguration 
> > and SetInterface, either explicitly or implicitly.
> 
> I like that the core will handle this automatically. But one concern:
> How will the class driver know when the clear-halt complete so it can
> perform the recovery? (ie. it shouldn't perform recovery immediately
> after seeing -EPROTO)

It doesn't know, and it doesn't need to know.  Any recovery URBs the 
class driver wants to send can be submitted as usual, and they will be 
added onto the queue.  When the core resets the endpoint, the queue will 
start going again and the URBs will run.

If the class driver wants to take some other action (like submitting 
URBs to a different endpoint) before using the endpoint that stopped, 
it's free to do so.  It only has to make sure that it doesn't submit any 
URBs to the stopped endpoint until after the other action is finished -- 
which is what it would do anyway.  (And maybe it has to unlink any URBs 
that are already queued, which can be done with a simple function call.)

> > For -EPIPE (device sent a STALL token), the class driver has to clear 
> > the halt itself.  This is because stalls aren't errors; they are an 
> > intentional part of the USB protocol.
> > 
> > -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet 
> > received with URB_SHORT_NOT_OK set) are a little trickier.  The HCD may 
> > or may not need to stop the queue for an unlink, possibly depending on 
> > whether the URB being unlinked is the active one.  When a short packet 
> > is received, the HC hardware may or may not stop its queue.  Either way, 
> > the class driver shouldn't need to take any special recovery action; any 
> > necessary actions should be taken automatically by the HCD and the core.
> > 
> > All of this applies only to bulk and interrupt endpoints.  Control 
> > endpoints generally need error recovery only on the host side, because 
> > the device resets automatically when it gets a new SETUP packet, and so 
> > the HCD should handle whatever is needed.  Isochronous endpoints don't 
> > ever halt and they shouldn't need to be reset when an error occurs.
> > 
> > Overall, this seems simpler than anything else we have discussed.
> > 
> 
> The rest sounds good to me!

Good!  Let's wait to hear from Michal, Mathias, and Oliver.

Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04 21:54                                   ` Alan Stern
@ 2026-04-04 22:15                                     ` Thinh Nguyen
  2026-04-04 22:28                                       ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-04 22:15 UTC (permalink / raw)
  To: Alan Stern
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026, Alan Stern wrote:
> On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote:
> > On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> > > Summarizing:
> > > 
> > > If the class driver wants to unlink queued URBs when a transaction error 
> > > occurs, it has to do so itself in the failed URB's completion handler.  
> > > We can make this easier by adding a usb_flush_endpoint_queue() routine 
> > > to the core.  In the meantime, the HCD ensures that the queue is stopped 
> > > before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
> > > -EREMOTEIO aren't considered to be transaction errors.)
> > > 
> > > When the completion handler returns, the core will automatically call 
> > > usb_clear_halt(), which will also reset the endpoint on the host side 
> > > and will restart the queue.  This also happens after SetConfiguration 
> > > and SetInterface, either explicitly or implicitly.
> > 
> > I like that the core will handle this automatically. But one concern:
> > How will the class driver know when the clear-halt complete so it can
> > perform the recovery? (ie. it shouldn't perform recovery immediately
> > after seeing -EPROTO)
> 
> It doesn't know, and it doesn't need to know.  Any recovery URBs the 
> class driver wants to send can be submitted as usual, and they will be 
> added onto the queue.  When the core resets the endpoint, the queue will 
> start going again and the URBs will run.
> 
> If the class driver wants to take some other action (like submitting 
> URBs to a different endpoint) before using the endpoint that stopped, 
> it's free to do so.  It only has to make sure that it doesn't submit any 
> URBs to the stopped endpoint until after the other action is finished -- 
> which is what it would do anyway.  (And maybe it has to unlink any URBs 
> that are already queued, which can be done with a simple function call.)
> 

Then the xhci must make sure that it should not ring the doorbell to
restart the endpoint when giving back the canceled URBs. It should only
do so on newly submitted URBs.

We can add a requirement such that if the class driver submitted the
recovery URBs prior to completing the usb_reset_endpoint (which should
be done after clear-halt), then the HCD may keep those URBs on a queue
and only process those URBs and restart the endpoint afterward.

BR,
Thinh

> > > For -EPIPE (device sent a STALL token), the class driver has to clear 
> > > the halt itself.  This is because stalls aren't errors; they are an 
> > > intentional part of the USB protocol.
> > > 
> > > -ENOENT and -ECONNRESET (URB was unlinked) and -EREMOTEIO (short packet 
> > > received with URB_SHORT_NOT_OK set) are a little trickier.  The HCD may 
> > > or may not need to stop the queue for an unlink, possibly depending on 
> > > whether the URB being unlinked is the active one.  When a short packet 
> > > is received, the HC hardware may or may not stop its queue.  Either way, 
> > > the class driver shouldn't need to take any special recovery action; any 
> > > necessary actions should be taken automatically by the HCD and the core.
> > > 
> > > All of this applies only to bulk and interrupt endpoints.  Control 
> > > endpoints generally need error recovery only on the host side, because 
> > > the device resets automatically when it gets a new SETUP packet, and so 
> > > the HCD should handle whatever is needed.  Isochronous endpoints don't 
> > > ever halt and they shouldn't need to be reset when an error occurs.
> > > 
> > > Overall, this seems simpler than anything else we have discussed.
> > > 
> > 
> > The rest sounds good to me!
> 
> Good!  Let's wait to hear from Michal, Mathias, and Oliver.
> 
> Alan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04 22:15                                     ` Thinh Nguyen
@ 2026-04-04 22:28                                       ` Thinh Nguyen
  2026-04-05  1:30                                         ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-04 22:28 UTC (permalink / raw)
  To: Alan Stern
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026, Thinh Nguyen wrote:
> On Sat, Apr 04, 2026, Alan Stern wrote:
> > On Sat, Apr 04, 2026 at 08:41:36PM +0000, Thinh Nguyen wrote:
> > > On Fri, Apr 03, 2026, stern@rowland.harvard.edu wrote:
> > > > Summarizing:
> > > > 
> > > > If the class driver wants to unlink queued URBs when a transaction error 
> > > > occurs, it has to do so itself in the failed URB's completion handler.  
> > > > We can make this easier by adding a usb_flush_endpoint_queue() routine 
> > > > to the core.  In the meantime, the HCD ensures that the queue is stopped 
> > > > before giving back the URB.  (Note: -EPIPE, -ENOENT, -ECONNRESET, and 
> > > > -EREMOTEIO aren't considered to be transaction errors.)
> > > > 
> > > > When the completion handler returns, the core will automatically call 
> > > > usb_clear_halt(), which will also reset the endpoint on the host side 
> > > > and will restart the queue.  This also happens after SetConfiguration 
> > > > and SetInterface, either explicitly or implicitly.
> > > 
> > > I like that the core will handle this automatically. But one concern:
> > > How will the class driver know when the clear-halt complete so it can
> > > perform the recovery? (ie. it shouldn't perform recovery immediately
> > > after seeing -EPROTO)
> > 
> > It doesn't know, and it doesn't need to know.  Any recovery URBs the 
> > class driver wants to send can be submitted as usual, and they will be 
> > added onto the queue.  When the core resets the endpoint, the queue will 
> > start going again and the URBs will run.
> > 
> > If the class driver wants to take some other action (like submitting 
> > URBs to a different endpoint) before using the endpoint that stopped, 
> > it's free to do so.  It only has to make sure that it doesn't submit any 
> > URBs to the stopped endpoint until after the other action is finished -- 
> > which is what it would do anyway.  (And maybe it has to unlink any URBs 
> > that are already queued, which can be done with a simple function call.)
> > 
> 
> Then the xhci must make sure that it should not ring the doorbell to
> restart the endpoint when giving back the canceled URBs. It should only
> do so on newly submitted URBs.

Ignore this comment, it's not restarting the endpoint in the case of
unlinking.

> 
> We can add a requirement such that if the class driver submitted the
> recovery URBs prior to completing the usb_reset_endpoint (which should
> be done after clear-halt), then the HCD may keep those URBs on a queue
> and only process those URBs and restart the endpoint afterward.
> 

Actually, adding this new requirement would be tricky because we don't
know whether it's recovery URBs or not.

BR,
Thinh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-04 22:28                                       ` Thinh Nguyen
@ 2026-04-05  1:30                                         ` Alan Stern
  2026-04-05  3:10                                           ` Thinh Nguyen
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2026-04-05  1:30 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026 at 10:28:24PM +0000, Thinh Nguyen wrote:
> On Sat, Apr 04, 2026, Thinh Nguyen wrote:
> > On Sat, Apr 04, 2026, Alan Stern wrote:
> > > If the class driver wants to take some other action (like submitting 
> > > URBs to a different endpoint) before using the endpoint that stopped, 
> > > it's free to do so.  It only has to make sure that it doesn't submit any 
> > > URBs to the stopped endpoint until after the other action is finished -- 
> > > which is what it would do anyway.  (And maybe it has to unlink any URBs 
> > > that are already queued, which can be done with a simple function call.)
> > > 
> > 
> > Then the xhci must make sure that it should not ring the doorbell to
> > restart the endpoint when giving back the canceled URBs. It should only
> > do so on newly submitted URBs.
> 
> Ignore this comment, it's not restarting the endpoint in the case of
> unlinking.

I was going to say that xhci-hcd shouldn't restart the endpoint until 
the usb_reset_endpoint() call is made.  Whether or not it rings the 
doorbell at that time may depend on whether there are any URBs on the 
queue; that's a relatively unimportant implementation detail.

> > We can add a requirement such that if the class driver submitted the
> > recovery URBs prior to completing the usb_reset_endpoint (which should
> > be done after clear-halt), then the HCD may keep those URBs on a queue
> > and only process those URBs and restart the endpoint afterward.
> > 
> 
> Actually, adding this new requirement would be tricky because we don't
> know whether it's recovery URBs or not.

The purpose of the submitted URBs doesn't matter.  The HCD shouldn't 
restart the endpoint until the usb_reset_endpoint() call occurs.

Also, I should point out that usbcore will call usb_clear_halt() and 
usb_reset_endpoint(), presumably using a work queue for the calls.  The 
class driver doesn't need to do it -- in fact, doing those things could 
lead to errors because the endpoint may already be running (the core may 
already have made the calls).

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-05  1:30                                         ` Alan Stern
@ 2026-04-05  3:10                                           ` Thinh Nguyen
  2026-04-07 15:23                                             ` Alan Stern
  0 siblings, 1 reply; 33+ messages in thread
From: Thinh Nguyen @ 2026-04-05  3:10 UTC (permalink / raw)
  To: Alan Stern
  Cc: Thinh Nguyen, Mathias Nyman, linux-usb@vger.kernel.org,
	michal.pecio@gmail.com, oneukum@suse.com,
	niklas.neronin@linux.intel.com

On Sat, Apr 04, 2026, Alan Stern wrote:
> On Sat, Apr 04, 2026 at 10:28:24PM +0000, Thinh Nguyen wrote:
> > On Sat, Apr 04, 2026, Thinh Nguyen wrote:
> > > On Sat, Apr 04, 2026, Alan Stern wrote:
> > > > If the class driver wants to take some other action (like submitting 
> > > > URBs to a different endpoint) before using the endpoint that stopped, 
> > > > it's free to do so.  It only has to make sure that it doesn't submit any 
> > > > URBs to the stopped endpoint until after the other action is finished -- 
> > > > which is what it would do anyway.  (And maybe it has to unlink any URBs 
> > > > that are already queued, which can be done with a simple function call.)
> > > > 
> > > 
> > > Then the xhci must make sure that it should not ring the doorbell to
> > > restart the endpoint when giving back the canceled URBs. It should only
> > > do so on newly submitted URBs.
> > 
> > Ignore this comment, it's not restarting the endpoint in the case of
> > unlinking.
> 
> I was going to say that xhci-hcd shouldn't restart the endpoint until 
> the usb_reset_endpoint() call is made.  Whether or not it rings the 
> doorbell at that time may depend on whether there are any URBs on the 
> queue; that's a relatively unimportant implementation detail.
> 
> > > We can add a requirement such that if the class driver submitted the
> > > recovery URBs prior to completing the usb_reset_endpoint (which should
> > > be done after clear-halt), then the HCD may keep those URBs on a queue
> > > and only process those URBs and restart the endpoint afterward.
> > > 
> > 
> > Actually, adding this new requirement would be tricky because we don't
> > know whether it's recovery URBs or not.
> 
> The purpose of the submitted URBs doesn't matter.  The HCD shouldn't 
> restart the endpoint until the usb_reset_endpoint() call occurs.
> 
> Also, I should point out that usbcore will call usb_clear_halt() and 
> usb_reset_endpoint(), presumably using a work queue for the calls.  The 
> class driver doesn't need to do it -- in fact, doing those things could 
> lead to errors because the endpoint may already be running (the core may 
> already have made the calls).
> 

That's good. This is what I wanted to confirm.

May need to update how xhci handles usb_reset_endpoint() because I
believe it's expected the endpoint transfer ring to be drained prior (by
the class driver unlinking URBs).

Thanks for the clarifications,
Thinh

ps. Have a good weekend! I'll be back a week after.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-05  3:10                                           ` Thinh Nguyen
@ 2026-04-07 15:23                                             ` Alan Stern
  2026-04-07 20:24                                               ` Mathias Nyman
  0 siblings, 1 reply; 33+ messages in thread
From: Alan Stern @ 2026-04-07 15:23 UTC (permalink / raw)
  To: Thinh Nguyen
  Cc: Mathias Nyman, linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

It's been a while now, and nobody has objected to the proposed plan for 
handling this issue, so I'm going to assume that everyone is on board 
with the idea.

There is a loose end still to be straightened out.  It concerns handling 
of -EREMOTEIO errors (short packet received with URB_SHORT_NOT_OK set).  
While some HCDs -- especially those supporting SG -- may not stop the 
endpoint queue when this error occurs, other HCDs will do so.  The 
question is how the core should tell them to start it up again.  This 
shouldn't happen until after the completion handler returns.

Short packets don't cause any loss of synchronization between the 
endpoint state on the host and on the device, so -EREMOTEIO doesn't 
require usb_clear_halt() or usb_reset_endpoint() for recovery.  This 
means we need to find some other way to tell the HCD when the queue can 
restart.  Should we create a new hc_driver callback specifically for 
this purpose?

Related question: Although URB_SHORT_NOT_OK is allowed for all types of 
IN URB other than isochronous, does its queue-stopping property make 
sense for control or interrupt URBs?  As far as I know, no kernel code 
uses it for them, although some userspace code might (via usbfs).

Alan Stern

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error
  2026-04-07 15:23                                             ` Alan Stern
@ 2026-04-07 20:24                                               ` Mathias Nyman
  0 siblings, 0 replies; 33+ messages in thread
From: Mathias Nyman @ 2026-04-07 20:24 UTC (permalink / raw)
  To: Alan Stern, Thinh Nguyen
  Cc: linux-usb@vger.kernel.org, michal.pecio@gmail.com,
	oneukum@suse.com, niklas.neronin@linux.intel.com

On 4/7/26 18:23, Alan Stern wrote:
> It's been a while now, and nobody has objected to the proposed plan for
> handling this issue, so I'm going to assume that everyone is on board
> with the idea.

Yes, I support this

So basically usb core will call usb_clear_halt() after EPROTO URB completion
handler finishes, and xhci-hcd needs to prevent bulk/interrupt endpoint
from restarting after returning a EPROTO URB up until usb_reset_endpoint()
is called

I also support adding usb_purge_endpoint_queue(), but it doesn't have to
be done at the same time as the EPROTO changes.

> 
> There is a loose end still to be straightened out.  It concerns handling
> of -EREMOTEIO errors (short packet received with URB_SHORT_NOT_OK set).
> While some HCDs -- especially those supporting SG -- may not stop the
> endpoint queue when this error occurs, other HCDs will do so.  The
> question is how the core should tell them to start it up again.  This
> shouldn't happen until after the completion handler returns.
> 
> Short packets don't cause any loss of synchronization between the
> endpoint state on the host and on the device, so -EREMOTEIO doesn't
> require usb_clear_halt() or usb_reset_endpoint() for recovery.  This
> means we need to find some other way to tell the HCD when the queue can
> restart.  Should we create a new hc_driver callback specifically for
> this purpose?

For xHC the issue is the other way around as Michal pointed out.

Can't find a way to stop the endpoint on short packets.
Can only manually stop the endpoint when xhci-hcd detects the short transfer
event, when xHC likely already processing the next URB.

Thanks
Mathias


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2026-04-07 20:24 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-23 12:25 [RFC PATCH 0/2] fix xhci endpoint restart at EPROTO Mathias Nyman
2026-03-23 12:25 ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Mathias Nyman
2026-03-25  1:52   ` Thinh Nguyen
2026-03-25  9:38     ` Mathias Nyman
2026-03-26  1:19       ` Thinh Nguyen
2026-03-26 11:25         ` Mathias Nyman
2026-03-26 23:24           ` Thinh Nguyen
2026-03-30 12:51             ` Mathias Nyman
2026-03-30 14:17               ` stern
2026-03-31  9:34                 ` Mathias Nyman
2026-03-31 15:31                   ` stern
2026-04-01 22:08                     ` Mathias Nyman
2026-04-02  2:36                       ` stern
2026-04-03  1:59                         ` Thinh Nguyen
2026-04-03  2:42                           ` stern
2026-04-03  8:51                             ` Michal Pecio
2026-04-03 14:55                               ` stern
2026-04-03 19:13                                 ` xhci-hcd and URB_SHORT_NOT_OK Michal Pecio
2026-04-03 20:17                                   ` stern
2026-04-04  1:15                             ` [RFC PATCH 1/2] xhci: prevent automatic endpoint restart after stall or error Thinh Nguyen
2026-04-04  1:54                               ` stern
2026-04-04 20:41                                 ` Thinh Nguyen
2026-04-04 21:54                                   ` Alan Stern
2026-04-04 22:15                                     ` Thinh Nguyen
2026-04-04 22:28                                       ` Thinh Nguyen
2026-04-05  1:30                                         ` Alan Stern
2026-04-05  3:10                                           ` Thinh Nguyen
2026-04-07 15:23                                             ` Alan Stern
2026-04-07 20:24                                               ` Mathias Nyman
2026-04-01 22:08               ` Thinh Nguyen
2026-04-01 22:34                 ` Mathias Nyman
2026-04-01 22:47                   ` Thinh Nguyen
2026-03-23 12:25 ` [RFC PATCH 2/2] xhci: Ensure URB is given back when endpoint halts on a multi-TD URB Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox