All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH 6/6] usb: xhci: Update comments about Stop Endpoint races
Date: Mon, 10 Mar 2025 09:42:13 +0100	[thread overview]
Message-ID: <20250310094213.3afca51c@foxbook> (raw)
In-Reply-To: <20250310093605.2b3d0425@foxbook>

The switch statement has grown beyond its original EP_STATE_HALTED case,
so move related comments inside this case and shorten them somewhat.

Shorten EP_STATE_STOPPED comments, add some common context at the top.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 70 ++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 31 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 241cd82672a6..759e8f612b4d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1199,20 +1199,21 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 	trace_xhci_handle_cmd_stop_ep(ep_ctx);
 
 	if (comp_code == COMP_CONTEXT_STATE_ERROR) {
-	/*
-	 * If stop endpoint command raced with a halting endpoint we need to
-	 * reset the host side endpoint first.
-	 * If the TD we halted on isn't cancelled the TD should be given back
-	 * with a proper error code, and the ring dequeue moved past the TD.
-	 * If streams case we can't find hw_deq, or the TD we halted on so do a
-	 * soft reset.
-	 *
-	 * Proper error code is unknown here, it would be -EPIPE if device side
-	 * of enadpoit halted (aka STALL), and -EPROTO if not (transaction error)
-	 * We use -EPROTO, if device is stalled it should return a stall error on
-	 * next transfer, which then will return -EPIPE, and device side stall is
-	 * noted and cleared by class driver.
-	 */
+
+		/*
+		 * This happens if the endpoint was not in the Running state. Its state now may
+		 * be other than at the time the command failed. We need to look at the Endpoint
+		 * Context and driver state to figure out what went wrong and what to do next.
+		 *
+		 * Many HCs are kind enough to hide their internal state transitions from us and
+		 * never fail Stop Endpoint after a doorbell ring. Others, including vendors like
+		 * NEC, ASMedia or Intel, may fail the command and begin running microseconds or
+		 * even milliseconds later (up to an ESIT on NEC periodic endpoints).
+		 *
+		 * We may see Running or Halted state after the command really failed on Stopped.
+		 * We may see Stopped after it failed on Halted, if somebody resets the endpoint.
+		 */
+
 		switch (GET_EP_CTX_STATE(ep_ctx)) {
 		case EP_STATE_HALTED:
 			xhci_dbg(xhci, "Stop ep completion raced with stall\n");
@@ -1222,8 +1223,23 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 			 */
 			if (ep->ep_state & EP_HALTED)
 				goto reset_done;
-
+			/*
+			 * Maybe reset failed with -ENOMEM. We are paranoid that a buggy HC may
+			 * mishandle the race and produce no transfer event before this command
+			 * completion. Or the endpoint actually was restarting, rejected Stop
+			 * Endpoint, then started and halted. The transfer event may only come
+			 * after this completion and some ASMedia HCs don't report such events.
+			 *
+			 * Try to reset the host endpoint now. Locate the halted TD, update its
+			 * status and cancel it. Reset EP completion takes care of the rest.
+			 *
+			 * Proper status is unknown here, it would be -EPIPE if the device side
+			 * endpoint stalled and -EPROTO otherwise (Transaction Error, etc).
+			 * We use -EPROTO, if device is stalled it should keep returning STALL
+			 * on future transfers which will hopefully receive normal handling.
+			 */
 			if (ep->ep_state & EP_HAS_STREAMS) {
+				/* We don't know which stream failed, attempt Soft Retry */
 				reset_type = EP_SOFT_RESET;
 			} else {
 				reset_type = EP_HARD_RESET;
@@ -1231,39 +1247,31 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 				if (td)
 					td->status = -EPROTO;
 			}
-			/* reset ep, reset handler cleans up cancelled tds */
 			err = xhci_handle_halted_endpoint(xhci, ep, td, reset_type);
 			xhci_dbg(xhci, "Stop ep completion resetting ep, status %d\n", err);
 			if (err)
+				/* persistent problem or disconnection, we must give back TDs */
 				break;
 reset_done:
 			/* Reset EP handler will clean up cancelled TDs */
 			ep->ep_state &= ~EP_STOP_CMD_PENDING;
 			return;
+
 		case EP_STATE_STOPPED:
 			/*
-			 * Per xHCI 4.6.9, Stop Endpoint command on a Stopped
-			 * EP is a Context State Error, and EP stays Stopped.
-			 *
-			 * But maybe it failed on Halted, and somebody ran Reset
-			 * Endpoint later. EP state is now Stopped and EP_HALTED
-			 * still set because Reset EP handler will run after us.
+			 * Maybe the command failed in Halted state and the transfer event queued
+			 * Reset Endpoint. The endpoint is now Stopped and EP_HALTED is still set,
+			 * because Reset Endpoint completion handler will run after this one.
 			 */
 			if (ep->ep_state & EP_HALTED)
 				break;
 			/*
-			 * On some HCs EP state remains Stopped for some tens of
-			 * us to a few ms or more after a doorbell ring, and any
-			 * new Stop Endpoint fails without aborting the restart.
-			 * This handler may run quickly enough to still see this
-			 * Stopped state, but it will soon change to Running.
-			 *
-			 * Assume this bug on unexpected Stop Endpoint failures.
-			 * Keep retrying until the EP starts and stops again.
+			 * We don't queue Stop Endpoint on Stopped endpoints. Assume the pending
+			 * restart case and keep retrying until it starts and stops again.
 			 */
 			fallthrough;
+
 		case EP_STATE_RUNNING:
-			/* Race, HW handled stop ep cmd before ep was running */
 			xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n",
 					GET_EP_CTX_STATE(ep_ctx));
 			/*
-- 
2.48.1

  parent reply	other threads:[~2025-03-10  8:42 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-10  8:36 [PATCH 0/6] xHCI: endpoint state maintainability and small fixes Michal Pecio
2025-03-10  8:36 ` [PATCH 1/6] usb: xhci: Document endpoint state management Michal Pecio
2025-03-10  9:25   ` Mathias Nyman
2025-03-10  8:37 ` [PATCH 2/6] usb: xhci: Deduplicate some endpoint state flag lists Michal Pecio
2025-03-10  9:51   ` Mathias Nyman
2025-03-11  0:13     ` Michał Pecio
2025-03-10  8:38 ` [PATCH 3/6] usb: xhci: Only set EP_HARD_CLEAR_TOGGLE after queuing Reset Endpoint Michal Pecio
2025-03-10  8:39 ` usb: xhci: Don't change the status of stalled TDs on failed Stop EP Michal Pecio
2025-03-10  8:43   ` Michal Pecio
2025-03-10  8:40 ` [PATCH 4/6] " Michal Pecio
2025-03-10 13:54   ` Mathias Nyman
2025-03-10  8:41 ` [PATCH 5/6] usb: xhci: Avoid Stop Endpoint retry loop if the endpoint seems Running Michal Pecio
2025-03-10  8:42 ` Michal Pecio [this message]
2025-03-11 15:41 ` [PATCH 0/6] xHCI: endpoint state maintainability and small fixes Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250310094213.3afca51c@foxbook \
    --to=michal.pecio@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.