public inbox for linux-usb@vger.kernel.org
From: "Michał Pecio" <michal.pecio@gmail.com>
To: mathias.nyman@linux.intel.com
Cc: gregkh@linuxfoundation.org, linux-usb@vger.kernel.org
Subject: Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
Date: Fri, 7 Mar 2025 07:54:29 +0100
Message-ID: <20250307075429.5f9d1d4e@foxbook>
In-Reply-To: <20250306144954.3507700-13-mathias.nyman@linux.intel.com>

> Ensure that an endpoint halted due to device STALL is not
> restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
> the device.
> 
> The host side of the endpoint may otherwise be started early by the
> 'Set TR Deq' command completion handler which is called if dequeue
> is moved past a cancelled or halted TD.
> 
> Prevent this with a new flag set for bulk and interrupt endpoints
> when a Stall Error is received. Clear it in hcd->endpoint_reset()
> which is called after Clear_Feature(ENDPOINT_HALT) is sent.
> 
> Also add a debug message if a class driver queues a new URB after
> the STALL. Note that class driver might not be aware of the STALL
> yet when it submits the URB as URBs are given back in BH.
> 
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Sorry for chiming in this late, but I hadn't looked closely at some
of these xhci/for-next patches until now.

This one is unfortunately incomplete, as follows:

> drivers/usb/host/xhci-ring.c | 7 +++++--
> drivers/usb/host/xhci.c      | 6 ++++++
> drivers/usb/host/xhci.h      | 3 ++-
> 3 files changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>index c2e15a27338b..7643ab9ec3b4 100644
>--- a/drivers/usb/host/xhci-ring.c
>+++ b/drivers/usb/host/xhci-ring.c
>@@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
> 	 * pointer command pending because the device can choose to start any
> 	 * stream once the endpoint is on the HW schedule.
> 	 */
>-	if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) ||
>-	    (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT))
>+	if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED |
>+			EP_CLEARING_TT | EP_STALLED))
> 		return;

Any flag added to this list needs to be added to xhci_urb_dequeue() too,
so that it knows the endpoint is held in the Stopped state and URBs can
be unlinked without trying to stop it again.

There really should be a helper function used both here and there, but
those Stop EP patches were meant for stable and I strived to make them
small and noninvasive. Then I forgot about this cleanup.

NB: I also forgot about fixes for a bunch of low-impact halted EP
handling bugs. I will try to rebase them and send them out today or
over the weekend.

>  	trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
> @@ -2555,6 +2555,9 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>  
>  		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
>  		return;
> +	case COMP_STALL_ERROR:
> +		ep->ep_state |= EP_STALLED;
> +		break;
>  	default:
>  		/* do nothing */
>  		break;
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 3f2cd546a7a2..0c22b78358b9 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1604,6 +1604,11 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
>  		goto free_priv;
>  	}
>  
> +	/* Class driver might not be aware ep halted due to async URB giveback */
> +	if (*ep_state & EP_STALLED)
> +		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
> +			urb);
> +
>  	switch (usb_endpoint_type(&urb->ep->desc)) {
>  
>  	case USB_ENDPOINT_XFER_CONTROL:
> @@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
>  		return;
>  
>  	ep = &vdev->eps[ep_index];
> +	ep->ep_state &= ~EP_STALLED;

... and clearing any of those flags has always been followed by calling
xhci_ring_ep_doorbell() again, to ensure that the endpoint is restarted
if it has URBs on it but restart was held off due to the flag.

xhci_urb_dequeue() relies on this too, because it looked like a sensible
design: if you have reasons not to run the EP, you set a flag. When the
reasons are gone, you clear the flag and the EP is running again.
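
To illustrate the pattern, here is a minimal userspace sketch, not driver
code. The flag values are the ones quoted from xhci.h in this patch;
everything prefixed model_ or MODEL_ is hypothetical and only stands in
for struct xhci_virt_ep and xhci_ring_ep_doorbell().

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted from xhci.h in this patch. */
#define SET_DEQ_PENDING		(1 << 0)
#define EP_HALTED		(1 << 1)
#define EP_STOP_CMD_PENDING	(1 << 2)
#define EP_CLEARING_TT		(1 << 8)
#define EP_STALLED		(1 << 9)

/* Same set of flags the patched doorbell check tests for. */
#define MODEL_EP_NO_RESTART	(EP_STOP_CMD_PENDING | SET_DEQ_PENDING | \
				 EP_HALTED | EP_CLEARING_TT | EP_STALLED)

/* Hypothetical stand-in for struct xhci_virt_ep. */
struct model_ep {
	unsigned int ep_state;
	unsigned int doorbell_rings;	/* counts simulated doorbell writes */
};

/* Returns true if the doorbell was rung, false if a flag held it off. */
static inline bool model_ring_ep_doorbell(struct model_ep *ep)
{
	if (ep->ep_state & MODEL_EP_NO_RESTART)
		return false;
	ep->doorbell_rings++;
	return true;
}

/*
 * Clear one hold-off flag and immediately retry the doorbell, so that
 * an endpoint with queued URBs is not left stopped forever.
 */
static inline bool model_clear_flag_and_restart(struct model_ep *ep,
						unsigned int flag)
{
	assert(flag & MODEL_EP_NO_RESTART);
	ep->ep_state &= ~flag;
	return model_ring_ep_doorbell(ep);
}
```

In this model, clearing EP_STALLED without the doorbell retry leaves
doorbell_rings unchanged, which is the situation the hunk above creates.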

> 	/* Bail out if toggle is already being cleared by a endpoint reset */
> 	spin_lock_irqsave(&xhci->lock, flags);
>diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
>index cd96e0a8c593..4ee14f651d36 100644
>--- a/drivers/usb/host/xhci.h
>+++ b/drivers/usb/host/xhci.h
>@@ -664,7 +664,7 @@ struct xhci_virt_ep {
> 	unsigned int			err_count;
> 	unsigned int			ep_state;
> #define SET_DEQ_PENDING		(1 << 0)
>-#define EP_HALTED		(1 << 1)	/* For stall handling */
>+#define EP_HALTED		(1 << 1)	/* Halted host ep handling */
> #define EP_STOP_CMD_PENDING	(1 << 2)	/* For URB cancellation */
> /* Transitioning the endpoint to using streams, don't enqueue URBs */
> #define EP_GETTING_STREAMS	(1 << 3)
>@@ -675,6 +675,7 @@ struct xhci_virt_ep {
> #define EP_SOFT_CLEAR_TOGGLE	(1 << 7)
> /* usb_hub_clear_tt_buffer is in progress */
> #define EP_CLEARING_TT		(1 << 8)
>+#define EP_STALLED		(1 << 9)	/* For stall handling */

I guess usage rules of those flags should be documented somewhere here
and helpers added such as:

xhci_ep_cancel_pending()
xhci_ep_held_stopped()

to improve maintainability and prevent similar problems in the future.
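
A rough, self-contained sketch of what such helpers might look like. The
two helper names are the ones suggested above, but the grouping of flags
into each mask is only an assumption, and xhci_ep_doorbell_blocked() is a
hypothetical name. Real helpers would presumably take a
struct xhci_virt_ep pointer rather than a bare ep_state value.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as quoted from xhci.h in this patch. */
#define SET_DEQ_PENDING		(1 << 0)
#define EP_HALTED		(1 << 1)
#define EP_STOP_CMD_PENDING	(1 << 2)
#define EP_CLEARING_TT		(1 << 8)
#define EP_STALLED		(1 << 9)

/* Assumed: a command is pending whose handler sorts out cancelled TDs. */
#define EP_CANCEL_PENDING_MASK	(SET_DEQ_PENDING | EP_HALTED | \
				 EP_STOP_CMD_PENDING)
/* Assumed: no command pending, but the endpoint must stay stopped. */
#define EP_HELD_STOPPED_MASK	(EP_CLEARING_TT | EP_STALLED)

static_assert((EP_CANCEL_PENDING_MASK & EP_HELD_STOPPED_MASK) == 0,
	      "a flag should belong to exactly one group");

static inline bool xhci_ep_cancel_pending(unsigned int ep_state)
{
	return ep_state & EP_CANCEL_PENDING_MASK;
}

static inline bool xhci_ep_held_stopped(unsigned int ep_state)
{
	return ep_state & EP_HELD_STOPPED_MASK;
}

/* Hypothetical: the single check the doorbell path could share. */
static inline bool xhci_ep_doorbell_blocked(unsigned int ep_state)
{
	return xhci_ep_cancel_pending(ep_state) ||
	       xhci_ep_held_stopped(ep_state);
}
```

With the masks defined in one place, a new flag like EP_STALLED is added
once and both the doorbell check and the unlink path pick it up.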


I could sit and write something, I still have this stuff quite fresh
in memory after spending a few weeks debugging those crazy HW races.

Regards,
Michal
