All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michał Pecio" <michal.pecio@gmail.com>
To: mathias.nyman@linux.intel.com
Cc: gregkh@linuxfoundation.org, linux-usb@vger.kernel.org
Subject: Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
Date: Fri, 7 Mar 2025 07:54:29 +0100	[thread overview]
Message-ID: <20250307075429.5f9d1d4e@foxbook> (raw)
In-Reply-To: <20250306144954.3507700-13-mathias.nyman@linux.intel.com>

> Ensure that an endpoint halted due to device STALL is not
> restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
> the device.
> 
> The host side of the endpoint may otherwise be started early by the
> 'Set TR Deq' command completion handler which is called if dequeue
> is moved past a cancelled or halted TD.
> 
> Prevent this with a new flag set for bulk and interrupt endpoints
> when a Stall Error is received. Clear it in hcd->endpoint_reset()
> which is called after Clear_Feature(ENDPOINT_HALT) is sent.
> 
> Also add a debug message if a class driver queues a new URB after
> the STALL. Note that class driver might not be aware of the STALL
> yet when it submits the URB as URBs are given back in BH.
> 
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Sorry for coming this late, but I haven't looked closely at some
of those xhci/for-next patches before.

This one is unfortunately incomplete, as follows:

> drivers/usb/host/xhci-ring.c | 7 +++++--
> drivers/usb/host/xhci.c      | 6 ++++++
> drivers/usb/host/xhci.h      | 3 ++-
> 3 files changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>index c2e15a27338b..7643ab9ec3b4 100644
>--- a/drivers/usb/host/xhci-ring.c
>+++ b/drivers/usb/host/xhci-ring.c
>@@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
> 	 * pointer command pending because the device can choose to start any
> 	 * stream once the endpoint is on the HW schedule.
> 	 */
>-	if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) ||
>-	    (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT))
>+	if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED |
>+			EP_CLEARING_TT | EP_STALLED))
> 		return;

Any flag added to this list needs to be added to xhci_urb_dequeue() too
so it knowns that the endpoint is held in Stopped state and URBs can be
unlinked without trying to stop it again.

There really should be a helper function used both here and there, but
those Stop EP patches were meant for stable and I strived to make them
small and noninvasive. Then I forgot about this cleanup.

NB: I also forgot about a bunch of low-impact halted EP handling bugs,
I will try to rebase and send them out today or over the weekend.

>  	trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
> @@ -2555,6 +2555,9 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>  
>  		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
>  		return;
> +	case COMP_STALL_ERROR:
> +		ep->ep_state |= EP_STALLED;
> +		break;
>  	default:
>  		/* do nothing */
>  		break;
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 3f2cd546a7a2..0c22b78358b9 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1604,6 +1604,11 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
>  		goto free_priv;
>  	}
>  
> +	/* Class driver might not be aware ep halted due to async URB giveback */
> +	if (*ep_state & EP_STALLED)
> +		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
> +			urb);
> +
>  	switch (usb_endpoint_type(&urb->ep->desc)) {
>  
>  	case USB_ENDPOINT_XFER_CONTROL:
> @@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
>  		return;
>  
>  	ep = &vdev->eps[ep_index];
> +	ep->ep_state &= ~EP_STALLED;

... and clearing any of those flags has always been followed by calling
xhci_ring_ep_doorbell() again, to ensure that the endpoint is restarted
if it has URBs on it but restart was held off due to the flag.

xhci_urb_dequeue() relies on this too, because it looked lke sensible
design: if you have reasons not to run the EP, you set a flag. Reasons
are gone, you clear the flag and it's running again.

> 	/* Bail out if toggle is already being cleared by a endpoint reset */
> 	spin_lock_irqsave(&xhci->lock, flags);
>diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
>index cd96e0a8c593..4ee14f651d36 100644
>--- a/drivers/usb/host/xhci.h
>+++ b/drivers/usb/host/xhci.h
>@@ -664,7 +664,7 @@ struct xhci_virt_ep {
> 	unsigned int			err_count;
> 	unsigned int			ep_state;
> #define SET_DEQ_PENDING		(1 << 0)
>-#define EP_HALTED		(1 << 1)	/* For stall handling */
>+#define EP_HALTED		(1 << 1)	/* Halted host ep handling */
> #define EP_STOP_CMD_PENDING	(1 << 2)	/* For URB cancellation */
> /* Transitioning the endpoint to using streams, don't enqueue URBs */
> #define EP_GETTING_STREAMS	(1 << 3)
>@@ -675,6 +675,7 @@ struct xhci_virt_ep {
> #define EP_SOFT_CLEAR_TOGGLE	(1 << 7)
> /* usb_hub_clear_tt_buffer is in progress */
> #define EP_CLEARING_TT		(1 << 8)
>+#define EP_STALLED		(1 << 9)	/* For stall handling */

I guess usage rules of those flags should be documented somewhere here
and helpers added such as:

xhci_ep_cancel_pending()
xhci_ep_held_stopped()

to improve maintainability and prevent similar problems in the future.


I could sit and write something, I still have this stuff quite fresh
in memory after spending a few weeks debugging those crazy HW races.

Regards,
Michal

  reply	other threads:[~2025-03-07  6:54 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
2025-03-06 14:49 ` [PATCH 01/15] xhci: show correct U1 and U2 timeout values in debug messages Mathias Nyman
2025-03-06 14:49 ` [PATCH 02/15] usb: xhci: remove redundant update_ring_for_set_deq_completion() function Mathias Nyman
2025-03-06 14:49 ` [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid Mathias Nyman
2025-03-06 14:52   ` Greg KH
2025-03-06 15:29     ` Mathias Nyman
2025-03-06 15:42       ` Greg KH
2025-03-06 14:49 ` [PATCH 04/15] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Mathias Nyman
2025-03-06 14:49 ` [PATCH 05/15] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Mathias Nyman
2025-03-06 14:49 ` [PATCH 06/15] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Mathias Nyman
2025-03-06 14:49 ` [PATCH 07/15] usb: xhci: Skip only one TD on Ring Underrun/Overrun Mathias Nyman
2025-03-06 14:49 ` [PATCH 08/15] usb: xhci: correct debug message page size calculation Mathias Nyman
2025-03-06 14:49 ` [PATCH 09/15] usb: xhci: set page size to the xHCI-supported size Mathias Nyman
2025-03-06 14:49 ` [PATCH 10/15] usb: xhci: refactor trb_in_td() to be static Mathias Nyman
2025-03-06 14:49 ` [PATCH 11/15] usb: xhci: move debug capabilities from trb_in_td() to handle_tx_event() Mathias Nyman
2025-03-06 14:49 ` [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors Mathias Nyman
2025-03-07  6:54   ` Michał Pecio [this message]
2025-03-07 14:23     ` Mathias Nyman
2025-03-07 15:44       ` Michał Pecio
2025-03-07 16:18         ` Mathias Nyman
2025-03-06 14:49 ` [PATCH 13/15] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Mathias Nyman
2025-03-06 14:49 ` [PATCH 14/15] usb: xhci: Unify duplicate inc_enq() code Mathias Nyman
2025-03-06 14:49 ` [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints Mathias Nyman
2025-03-07  8:27   ` Michał Pecio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250307075429.5f9d1d4e@foxbook \
    --to=michal.pecio@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.