[PATCH 00/15] xhci features for usb-next

public inbox for linux-usb@vger.kernel.org

* [PATCH 00/15] xhci features for usb-next
@ 2025-03-06 14:49 Mathias Nyman
  2025-03-06 14:49 ` [PATCH 01/15] xhci: show correct U1 and U2 timeout values in debug messages Mathias Nyman
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman

Hi Greg

A set of xhci reworks, refactoring and minor improvements for usb-next.
A lot of the work is related to transfer event handling.

Thanks
Mathias

Mathias Nyman (3):
  xhci: show correct U1 and U2 timeout values in debug messages
  xhci: Prevent early endpoint restart when handling STALL errors.
  xhci: Handle spurious events on Etron host isoc endpoints

Michal Pecio (7):
  usb: xhci: Don't skip on Stopped - Length Invalid
  usb: xhci: Complete 'error mid TD' transfers when handling Missed
    Service
  usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
  usb: xhci: Expedite skipping missed isoch TDs on modern HCs
  usb: xhci: Skip only one TD on Ring Underrun/Overrun
  usb: xhci: Apply the link chain quirk on NEC isoc endpoints
  usb: xhci: Unify duplicate inc_enq() code

Niklas Neronin (5):
  usb: xhci: remove redundant update_ring_for_set_deq_completion()
    function
  usb: xhci: correct debug message page size calculation
  usb: xhci: set page size to the xHCI-supported size
  usb: xhci: refactor trb_in_td() to be static
  usb: xhci: move debug capabilities from trb_in_td() to
    handle_tx_event()

 drivers/usb/host/xhci-mem.c  |  34 +--
 drivers/usb/host/xhci-ring.c | 397 +++++++++++++++++------------------
 drivers/usb/host/xhci.c      |  14 +-
 drivers/usb/host/xhci.h      |  28 ++-
 4 files changed, 240 insertions(+), 233 deletions(-)

-- 
2.43.0



* [PATCH 01/15] xhci: show correct U1 and U2 timeout values in debug messages
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 02/15] usb: xhci: remove redundant update_ring_for_set_deq_completion() function Mathias Nyman
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman

The U2 timeout value is encoded in 256 microsecond intervals; convert it
to microseconds before printing. The U1 timeout value is already in
microseconds, but the debug message incorrectly labelled it "ms".

Unwrap the debug message strings while modifying them anyway.

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
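The unit handling described in the commit message can be sketched as
follows. This is a minimal userspace illustration, not driver code; the
helper names u1_timeout_to_us() and u2_timeout_to_us() are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration; not part of the xhci driver. */

/* U1 timeout values are already expressed in microseconds. */
static inline uint64_t u1_timeout_to_us(uint64_t u1_timeout)
{
	return u1_timeout;
}

/* U2 timeout values count 256-microsecond intervals. */
static inline uint64_t u2_timeout_to_us(uint64_t u2_timeout)
{
	return u2_timeout * 256;
}
```

For example, an encoded U2 value of 2 would be printed as 512us.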
 drivers/usb/host/xhci.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 45653114ccd7..3f2cd546a7a2 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4755,8 +4755,8 @@ static u16 xhci_calculate_u1_timeout(struct xhci_hcd *xhci,
 	 */
 	if (timeout_ns <= USB3_LPM_U1_MAX_TIMEOUT)
 		return timeout_ns;
-	dev_dbg(&udev->dev, "Hub-initiated U1 disabled "
-			"due to long timeout %llu ms\n", timeout_ns);
+	dev_dbg(&udev->dev, "Hub-initiated U1 disabled due to long timeout %lluus\n",
+		timeout_ns);
 	return xhci_get_timeout_no_hub_lpm(udev, USB3_LPM_U1);
 }
 
@@ -4813,8 +4813,8 @@ static u16 xhci_calculate_u2_timeout(struct xhci_hcd *xhci,
 	 */
 	if (timeout_ns <= USB3_LPM_U2_MAX_TIMEOUT)
 		return timeout_ns;
-	dev_dbg(&udev->dev, "Hub-initiated U2 disabled "
-			"due to long timeout %llu ms\n", timeout_ns);
+	dev_dbg(&udev->dev, "Hub-initiated U2 disabled due to long timeout %lluus\n",
+		timeout_ns * 256);
 	return xhci_get_timeout_no_hub_lpm(udev, USB3_LPM_U2);
 }
 
-- 
2.43.0



* [PATCH 02/15] usb: xhci: remove redundant update_ring_for_set_deq_completion() function
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
  2025-03-06 14:49 ` [PATCH 01/15] xhci: show correct U1 and U2 timeout values in debug messages Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid Mathias Nyman
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Niklas Neronin, Mathias Nyman

From: Niklas Neronin <niklas.neronin@linux.intel.com>

The function is a remnant from a previous implementation and is now
redundant. There is no longer a need to search for the dequeue pointer,
as both the TRB and segment dequeue pointers are saved within
'queued_deq_seg' and 'queued_deq_ptr'.

Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 41 ++----------------------------------
 1 file changed, 2 insertions(+), 39 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..23cf20026359 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1332,43 +1332,6 @@ void xhci_hc_died(struct xhci_hcd *xhci)
 		usb_hc_died(xhci_to_hcd(xhci));
 }
 
-static void update_ring_for_set_deq_completion(struct xhci_hcd *xhci,
-		struct xhci_virt_device *dev,
-		struct xhci_ring *ep_ring,
-		unsigned int ep_index)
-{
-	union xhci_trb *dequeue_temp;
-
-	dequeue_temp = ep_ring->dequeue;
-
-	/* If we get two back-to-back stalls, and the first stalled transfer
-	 * ends just before a link TRB, the dequeue pointer will be left on
-	 * the link TRB by the code in the while loop.  So we have to update
-	 * the dequeue pointer one segment further, or we'll jump off
-	 * the segment into la-la-land.
-	 */
-	if (trb_is_link(ep_ring->dequeue)) {
-		ep_ring->deq_seg = ep_ring->deq_seg->next;
-		ep_ring->dequeue = ep_ring->deq_seg->trbs;
-	}
-
-	while (ep_ring->dequeue != dev->eps[ep_index].queued_deq_ptr) {
-		/* We have more usable TRBs */
-		ep_ring->dequeue++;
-		if (trb_is_link(ep_ring->dequeue)) {
-			if (ep_ring->dequeue ==
-					dev->eps[ep_index].queued_deq_ptr)
-				break;
-			ep_ring->deq_seg = ep_ring->deq_seg->next;
-			ep_ring->dequeue = ep_ring->deq_seg->trbs;
-		}
-		if (ep_ring->dequeue == dequeue_temp) {
-			xhci_dbg(xhci, "Unable to find new dequeue pointer\n");
-			break;
-		}
-	}
-}
-
 /*
  * When we get a completion for a Set Transfer Ring Dequeue Pointer command,
  * we need to clear the set deq pending flag in the endpoint ring state, so that
@@ -1473,8 +1436,8 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id,
 			/* Update the ring's dequeue segment and dequeue pointer
 			 * to reflect the new position.
 			 */
-			update_ring_for_set_deq_completion(xhci, ep->vdev,
-				ep_ring, ep_index);
+			ep_ring->deq_seg = ep->queued_deq_seg;
+			ep_ring->dequeue = ep->queued_deq_ptr;
 		} else {
 			xhci_warn(xhci, "Mismatch between completed Set TR Deq Ptr command & xHCI internal state.\n");
 			xhci_warn(xhci, "ep deq seg = %p, deq ptr = %p\n",
-- 
2.43.0



* [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
  2025-03-06 14:49 ` [PATCH 01/15] xhci: show correct U1 and U2 timeout values in debug messages Mathias Nyman
  2025-03-06 14:49 ` [PATCH 02/15] usb: xhci: remove redundant update_ring_for_set_deq_completion() function Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:52   ` Greg KH
  2025-03-06 14:49 ` [PATCH 04/15] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Mathias Nyman
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, stable, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

Up until commit d56b0b2ab142 ("usb: xhci: ensure skipped isoc TDs are
returned when isoc ring is stopped") in v6.11, the driver didn't skip
missed isochronous TDs when handling Stopped and Stopped - Length
Invalid events. Instead, it erroneously cleared the skip flag, which
would cause the ring to get stuck, as future events won't match the
missed TD which is never removed from the queue until it's cancelled.

This buggy logic seems to have been in place substantially unchanged
since the 3.x series over 10 years ago, which probably speaks first
and foremost about relative rarity of this case in normal usage, but
by the spec I see no reason why it shouldn't be possible.

After d56b0b2ab142, TDs are immediately skipped when handling those
Stopped events. This poses a potential problem in case of Stopped -
Length Invalid, which occurs either on completed TDs (likely already
given back) or Link and No-Op TRBs. Such an event won't be recognized
as matching any TD (unless it's the rare Link TRB inside a TD) and
will result in skipping all pending TDs, giving them back possibly
before they are done, risking isoc data loss and maybe UAF by HW.

As a compromise, don't skip and don't clear the skip flag on this
kind of event. Then the next event will skip missed TDs. A downside
of not handling Stopped - Length Invalid on a Link inside a TD is
that if the TD is cancelled, its actual length will not be updated
to account for TRBs (silently) completed before the TD was stopped.

I had no luck producing this sequence of completion events so there
is no compelling demonstration of any resulting disaster. It may be
a very rare, obscure condition. The sole motivation for this patch
is that if such an unlikely event does occur, I'd rather risk reporting
a cancelled partially done isoc frame as empty than gamble with UAF.

This will be fixed more properly by looking at Stopped event's TRB
pointer when making skipping decisions, but such rework is unlikely
to be backported to v6.12, which will stay around for a few years.

Fixes: d56b0b2ab142 ("usb: xhci: ensure skipped isoc TDs are returned when isoc ring is stopped")
Cc: stable@vger.kernel.org
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 23cf20026359..6fb48d30ec21 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2828,6 +2828,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		if (!ep_seg) {
 
 			if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+				/* this event is unlikely to match any TD, don't skip them all */
+				if (trb_comp_code == COMP_STOPPED_LENGTH_INVALID)
+					return 0;
+
 				skip_isoc_td(xhci, td, ep, status);
 				if (!list_empty(&ep_ring->td_list))
 					continue;
-- 
2.43.0



* [PATCH 04/15] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (2 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 05/15] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Mathias Nyman
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

Missed Service Error after an error mid TD means that the failed TD has
already been passed by the xHC without acknowledgment of the final TRB,
a known hardware bug. So don't wait any more and give back the TD.

Reproduced on NEC uPD720200 under conditions of ludicrously bad USB link
quality, confirmed to behave as expected using dynamic debug.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 6fb48d30ec21..47aaaf4eb92a 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2752,7 +2752,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		xhci_dbg(xhci,
 			 "Miss service interval error for slot %u ep %u, set skip flag\n",
 			 slot_id, ep_index);
-		return 0;
+		break;
 	case COMP_NO_PING_RESPONSE_ERROR:
 		ep->skip = true;
 		xhci_dbg(xhci,
@@ -2800,6 +2800,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		xhci_dequeue_td(xhci, td, ep_ring, td->status);
 	}
 
+	/* Missed TDs will be skipped on the next event */
+	if (trb_comp_code == COMP_MISSED_SERVICE_ERROR)
+		return 0;
+
 	if (list_empty(&ep_ring->td_list)) {
 		/*
 		 * Don't print wanings if ring is empty due to a stopped endpoint generating an
-- 
2.43.0



* [PATCH 05/15] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (3 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 04/15] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 06/15] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Mathias Nyman
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

The TRB pointer of these events points at enqueue at the time of error
occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we
are handling the event, a new TD may be queued at this ring position.

I can trigger this race by raising interrupt moderation to increase IRQ
handling delay. Similar delay may occur naturally due to system load.

If this ever happens after a Missed Service Error, missed TDs will be
skipped and the new TD processed as if it matched the event. It could
be given back prematurely, risking data loss or buffer UAF by the xHC.

Don't complete TDs on xrun events and don't warn if queued TDs don't
match the event's TRB pointer, which can be NULL or a link/no-op TRB.
Don't warn if there are no queued TDs at all.

Now that it's safe, also handle xrun events if the skip flag is clear.
This ensures completion of any TD stuck in 'error mid TD' state right
before the xrun event, which could happen if a driver submits a finite
number of URBs to a buggy HC and then an error occurs on the last TD.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 47aaaf4eb92a..d34f46b63006 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2627,6 +2627,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	int status = -EINPROGRESS;
 	struct xhci_ep_ctx *ep_ctx;
 	u32 trb_comp_code;
+	bool ring_xrun_event = false;
 
 	slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
 	ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
@@ -2733,14 +2734,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 * Underrun Event for OUT Isoch endpoint.
 		 */
 		xhci_dbg(xhci, "Underrun event on slot %u ep %u\n", slot_id, ep_index);
-		if (ep->skip)
-			break;
-		return 0;
+		ring_xrun_event = true;
+		break;
 	case COMP_RING_OVERRUN:
 		xhci_dbg(xhci, "Overrun event on slot %u ep %u\n", slot_id, ep_index);
-		if (ep->skip)
-			break;
-		return 0;
+		ring_xrun_event = true;
+		break;
 	case COMP_MISSED_SERVICE_ERROR:
 		/*
 		 * When encounter missed service error, one or more isoc tds
@@ -2813,6 +2812,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 */
 		if (trb_comp_code != COMP_STOPPED &&
 		    trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
+		    !ring_xrun_event &&
 		    !ep_ring->last_td_was_short) {
 			xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
 				  slot_id, ep_index);
@@ -2847,6 +2847,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 				goto check_endpoint_halted;
 			}
 
+			/* TD was queued after xrun, maybe xrun was on a link, don't panic yet */
+			if (ring_xrun_event)
+				return 0;
+
 			/*
 			 * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
 			 * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
@@ -2893,6 +2897,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	 */
 	} while (ep->skip);
 
+	/* Get out if a TD was queued at enqueue after the xrun occurred */
+	if (ring_xrun_event)
+		return 0;
+
 	if (trb_comp_code == COMP_SHORT_PACKET)
 		ep_ring->last_td_was_short = true;
 	else
-- 
2.43.0



* [PATCH 06/15] usb: xhci: Expedite skipping missed isoch TDs on modern HCs
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (4 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 05/15] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 07/15] usb: xhci: Skip only one TD on Ring Underrun/Overrun Mathias Nyman
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

xHCI spec rev. 1.0 allowed the TRB pointer of Missed Service events
to be NULL. Having no idea which of the queued TDs were missed and
which are waiting, we can only set a flag to skip missed TDs later.

But HCs are also allowed to give us a pointer to the last missed TRB,
and this became mandatory in spec rev. 1.1 and later.

Use this pointer, if available, to immediately skip all missed TDs.
This reduces latency and risk of skipping-related bugs, because we
can now leave the skip flag cleared for future events.

Handle Missed Service Error events as 'error mid TD', if applicable,
because rev. 1.0 spec explicitly says so in notes to 4.10.3.2 and
later revs in 4.10.3.2 and 4.11.2.5.2. Notes to 4.9.1 seem to apply.

Tested on ASM1142 and ASM3142 v1.1 xHCs which provide TRB pointers.
Tested on AMD, Etron, Renesas v1.0 xHCs which provide TRB pointers.
Tested on NEC v0.96 and VIA v1.0 xHCs which send a NULL pointer.

Change inspired by a discussion about realtime USB audio.

Link: https://lore.kernel.org/linux-usb/76e1a191-020d-4a76-97f6-237f9bd0ede0@gmx.net/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index d34f46b63006..e871dd61a636 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2439,6 +2439,12 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		if (ep_trb != td->end_trb)
 			td->error_mid_td = true;
 		break;
+	case COMP_MISSED_SERVICE_ERROR:
+		frame->status = -EXDEV;
+		sum_trbs_for_length = true;
+		if (ep_trb != td->end_trb)
+			td->error_mid_td = true;
+		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
 		frame->status = -EPROTO;
@@ -2749,8 +2755,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 */
 		ep->skip = true;
 		xhci_dbg(xhci,
-			 "Miss service interval error for slot %u ep %u, set skip flag\n",
-			 slot_id, ep_index);
+			 "Miss service interval error for slot %u ep %u, set skip flag%s\n",
+			 slot_id, ep_index, ep_trb_dma ? ", skip now" : "");
 		break;
 	case COMP_NO_PING_RESPONSE_ERROR:
 		ep->skip = true;
@@ -2799,8 +2805,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		xhci_dequeue_td(xhci, td, ep_ring, td->status);
 	}
 
-	/* Missed TDs will be skipped on the next event */
-	if (trb_comp_code == COMP_MISSED_SERVICE_ERROR)
+	/* If the TRB pointer is NULL, missed TDs will be skipped on the next event */
+	if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb_dma)
 		return 0;
 
 	if (list_empty(&ep_ring->td_list)) {
-- 
2.43.0



* [PATCH 07/15] usb: xhci: Skip only one TD on Ring Underrun/Overrun
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (5 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 06/15] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 08/15] usb: xhci: correct debug message page size calculation Mathias Nyman
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

If skipping is deferred to events other than Missed Service Error itself,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.

And in case of ring xrun, we can't know where the xrun happened either.

If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.

If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.

Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e871dd61a636..70b896297494 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2843,8 +2843,21 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 					return 0;

 				skip_isoc_td(xhci, td, ep, status);
-				if (!list_empty(&ep_ring->td_list))
+
+				if (!list_empty(&ep_ring->td_list)) {
+					if (ring_xrun_event) {
+						/*
+						 * If we are here, we are on xHCI 1.0 host with no
+						 * idea how many TDs were missed or where the xrun
+						 * occurred. New TDs may have been added after the
+						 * xrun, so skip only one TD to be safe.
+						 */
+						xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+								slot_id, ep_index);
+						return 0;
+					}
 					continue;
+				}

 				xhci_dbg(xhci, "All TDs skipped for slot %u ep %u. Clear skip flag.\n",
 					 slot_id, ep_index);
-- 
2.43.0


* [PATCH 08/15] usb: xhci: correct debug message page size calculation
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (6 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 07/15] usb: xhci: Skip only one TD on Ring Underrun/Overrun Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 09/15] usb: xhci: set page size to the xHCI-supported size Mathias Nyman
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Niklas Neronin, Mathias Nyman

From: Niklas Neronin <niklas.neronin@linux.intel.com>

The ffs() function returns the index of the first set bit, starting from 1.
If no bits are set, it returns zero. This behavior causes an off-by-one
page size in the debug message, as the page size calculation [1]
is zero-based, while ffs() is one-based.

Fix this by subtracting one from the result of ffs(). Note that since
variable 'val' is unsigned, subtracting one from zero will result in the
maximum unsigned integer value. Consequently, the condition 'if (val < 16)'
will still function correctly.

[1] Page size: 2^(n+12), where 'n' is the set page size bit.

Fixes: 81720ec5320c ("usb: host: xhci: use ffs() in xhci_mem_init()")
Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
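The off-by-one described in the commit message can be reproduced in
userspace. The sketch below assumes the POSIX ffs() from <strings.h>
behaves like the kernel's ffs() for this purpose; page_size_kib() is a
hypothetical helper, not a driver function.

```c
#include <stdint.h>
#include <strings.h>	/* POSIX ffs(): 1-based index of the first set bit */

/*
 * Hypothetical helper: return the smallest supported page size in KiB for
 * a Page Size register value, or 0 if none of bits 15:0 is set.
 */
static unsigned int page_size_kib(uint32_t page_size_reg)
{
	/* ffs() returns 0 when no bit is set, so n wraps to UINT_MAX */
	unsigned int n = ffs((int)page_size_reg) - 1;

	if (n >= 16)
		return 0;

	/* Page size is 2^(n+12) bytes, with a zero-based n */
	return (1u << (n + 12)) / 1024;
}
```

With a register value of 0x1, the zero-based index gives 4K. Without the
subtraction the same value would be reported as 8K.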
 drivers/usb/host/xhci-mem.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..dc5bcd8db4c0 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -2391,10 +2391,10 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	page_size = readl(&xhci->op_regs->page_size);
 	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
 			"Supported page size register = 0x%x", page_size);
-	i = ffs(page_size);
-	if (i < 16)
+	val = ffs(page_size) - 1;
+	if (val < 16)
 		xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"Supported page size of %iK", (1 << (i+12)) / 1024);
+			"Supported page size of %iK", (1 << (val + 12)) / 1024);
 	else
 		xhci_warn(xhci, "WARN: no supported page size\n");
 	/* Use 4K pages, since that's common and the minimum the HC supports */
-- 
2.43.0



* [PATCH 09/15] usb: xhci: set page size to the xHCI-supported size
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (7 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 08/15] usb: xhci: correct debug message page size calculation Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 10/15] usb: xhci: refactor trb_in_td() to be static Mathias Nyman
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Niklas Neronin, Mathias Nyman

From: Niklas Neronin <niklas.neronin@linux.intel.com>

The current xHCI driver does not validate whether a page size of 4096
bytes is supported. Address the issue by setting the page size to the
value supported by the xHCI controller, as read from the Page Size
register. In the event of an unexpected value, default to a 4K page size.

Additionally, this commit removes unnecessary debug messages and instead
prints the supported and used page size once.

The xHCI controller supports page sizes of 2^(n+12) bytes, where 'n'
is the Page Size Bit. Only one page size is supported, with a maximum
page size of 128 MB.

Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
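The selection logic described in the commit message can be sketched in
userspace as below. The names PAGE_SIZE_MASK, is_pow2() and
hcd_page_size() are hypothetical stand-ins for illustration; the mask
value and the 4K fallback follow the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_MASK 0xffffu	/* bits 15:0 of the Page Size register */

/* True if exactly one bit is set */
static bool is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical stand-in for the page size selection; returns bytes */
static uint32_t hcd_page_size(uint32_t page_size_reg)
{
	uint32_t bits = page_size_reg & PAGE_SIZE_MASK;

	/* Zero or more than one bit set is unexpected: fall back to 4K */
	if (!is_pow2(bits))
		bits = 1;

	/* Bit n set means a page size of 2^(n+12) bytes */
	return bits << 12;
}
```

A register value of 0x1 yields 4096 bytes and bit 15 yields 2^27 bytes.
Values such as 0x0 or 0x3 fall back to 4096 bytes.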
 drivers/usb/host/xhci-mem.c | 34 ++++++++++++++++++----------------
 drivers/usb/host/xhci.h     |  8 ++++----
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index dc5bcd8db4c0..a7fdfa00eb48 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1953,7 +1953,6 @@ void xhci_mem_cleanup(struct xhci_hcd *xhci)
 	xhci->interrupters = NULL;
 
 	xhci->page_size = 0;
-	xhci->page_shift = 0;
 	xhci->usb2_rhub.bus_state.bus_suspended = 0;
 	xhci->usb3_rhub.bus_state.bus_suspended = 0;
 }
@@ -2372,6 +2371,22 @@ xhci_create_secondary_interrupter(struct usb_hcd *hcd, unsigned int segs,
 }
 EXPORT_SYMBOL_GPL(xhci_create_secondary_interrupter);
 
+static void xhci_hcd_page_size(struct xhci_hcd *xhci)
+{
+	u32 page_size;
+
+	page_size = readl(&xhci->op_regs->page_size) & XHCI_PAGE_SIZE_MASK;
+	if (!is_power_of_2(page_size)) {
+		xhci_warn(xhci, "Invalid page size register = 0x%x\n", page_size);
+		/* Fallback to 4K page size, since that's common */
+		page_size = 1;
+	}
+
+	xhci->page_size = page_size << 12;
+	xhci_dbg_trace(xhci, trace_xhci_dbg_init, "HCD page size set to %iK",
+		       xhci->page_size >> 10);
+}
+
 int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 {
 	struct xhci_interrupter *ir;
@@ -2379,7 +2394,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	dma_addr_t	dma;
 	unsigned int	val, val2;
 	u64		val_64;
-	u32		page_size, temp;
+	u32		temp;
 	int		i;
 
 	INIT_LIST_HEAD(&xhci->cmd_list);
@@ -2388,20 +2403,7 @@ int xhci_mem_init(struct xhci_hcd *xhci, gfp_t flags)
 	INIT_DELAYED_WORK(&xhci->cmd_timer, xhci_handle_command_timeout);
 	init_completion(&xhci->cmd_ring_stop_completion);
 
-	page_size = readl(&xhci->op_regs->page_size);
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"Supported page size register = 0x%x", page_size);
-	val = ffs(page_size) - 1;
-	if (val < 16)
-		xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"Supported page size of %iK", (1 << (val + 12)) / 1024);
-	else
-		xhci_warn(xhci, "WARN: no supported page size\n");
-	/* Use 4K pages, since that's common and the minimum the HC supports */
-	xhci->page_shift = 12;
-	xhci->page_size = 1 << xhci->page_shift;
-	xhci_dbg_trace(xhci, trace_xhci_dbg_init,
-			"HCD page size set to %iK", xhci->page_size / 1024);
+	xhci_hcd_page_size(xhci);
 
 	/*
 	 * Program the Number of Device Slots Enabled field in the CONFIG
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 8c164340a2c3..5b8751b86008 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -211,6 +211,9 @@ struct xhci_op_regs {
 #define CONFIG_CIE		(1 << 9)
 /* bits 10:31 - reserved and should be preserved */
 
+/* bits 15:0 - HCD page shift bit */
+#define XHCI_PAGE_SIZE_MASK     0xffff
+
 /**
  * struct xhci_intr_reg - Interrupt Register Set
  * @irq_pending:	IMAN - Interrupt Management Register.  Used to enable
@@ -1514,10 +1517,7 @@ struct xhci_hcd {
 	u16		max_interrupters;
 	/* imod_interval in ns (I * 250ns) */
 	u32		imod_interval;
-	/* 4KB min, 128MB max */
-	int		page_size;
-	/* Valid values are 12 to 20, inclusive */
-	int		page_shift;
+	u32		page_size;
 	/* MSI-X/MSI vectors */
 	int		nvecs;
 	/* optional clocks */
-- 
2.43.0



* [PATCH 10/15] usb: xhci: refactor trb_in_td() to be static
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (8 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 09/15] usb: xhci: set page size to the xHCI-supported size Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 11/15] usb: xhci: move debug capabilities from trb_in_td() to handle_tx_event() Mathias Nyman
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Niklas Neronin, Mathias Nyman

From: Niklas Neronin <niklas.neronin@linux.intel.com>

Relocate trb_in_td() and mark it as static, as it's exclusively utilized
in xhci-ring.c. This adjustment lays the groundwork for future rework of
the function.

The function's logic remains unchanged; only its access specifier is
altered to static and a redundant "else" is removed on line 325
(due to checkpatch.pl complaining).

Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 122 +++++++++++++++++------------------
 drivers/usb/host/xhci.h      |   2 -
 2 files changed, 61 insertions(+), 63 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 70b896297494..8c7258afb6bf 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -277,6 +277,67 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
 	}
 }
 
+/*
+ * If the suspect DMA address is a TRB in this TD, this function returns that
+ * TRB's segment. Otherwise it returns 0.
+ */
+static struct xhci_segment *trb_in_td(struct xhci_hcd *xhci, struct xhci_td *td,
+				      dma_addr_t suspect_dma, bool debug)
+{
+	dma_addr_t start_dma;
+	dma_addr_t end_seg_dma;
+	dma_addr_t end_trb_dma;
+	struct xhci_segment *cur_seg;
+
+	start_dma = xhci_trb_virt_to_dma(td->start_seg, td->start_trb);
+	cur_seg = td->start_seg;
+
+	do {
+		if (start_dma == 0)
+			return NULL;
+		/* We may get an event for a Link TRB in the middle of a TD */
+		end_seg_dma = xhci_trb_virt_to_dma(cur_seg,
+				&cur_seg->trbs[TRBS_PER_SEGMENT - 1]);
+		/* If the end TRB isn't in this segment, this is set to 0 */
+		end_trb_dma = xhci_trb_virt_to_dma(cur_seg, td->end_trb);
+
+		if (debug)
+			xhci_warn(xhci,
+				"Looking for event-dma %016llx trb-start %016llx trb-end %016llx seg-start %016llx seg-end %016llx\n",
+				(unsigned long long)suspect_dma,
+				(unsigned long long)start_dma,
+				(unsigned long long)end_trb_dma,
+				(unsigned long long)cur_seg->dma,
+				(unsigned long long)end_seg_dma);
+
+		if (end_trb_dma > 0) {
+			/* The end TRB is in this segment, so suspect should be here */
+			if (start_dma <= end_trb_dma) {
+				if (suspect_dma >= start_dma && suspect_dma <= end_trb_dma)
+					return cur_seg;
+			} else {
+				/* Case for one segment with
+				 * a TD wrapped around to the top
+				 */
+				if ((suspect_dma >= start_dma &&
+							suspect_dma <= end_seg_dma) ||
+						(suspect_dma >= cur_seg->dma &&
+						 suspect_dma <= end_trb_dma))
+					return cur_seg;
+			}
+			return NULL;
+		}
+		/* Might still be somewhere in this segment */
+		if (suspect_dma >= start_dma && suspect_dma <= end_seg_dma)
+			return cur_seg;
+
+		cur_seg = cur_seg->next;
+		start_dma = xhci_trb_virt_to_dma(cur_seg, &cur_seg->trbs[0]);
+	} while (cur_seg != td->start_seg);
+
+	return NULL;
+}
+
 /*
  * Return number of free normal TRBs from enqueue to dequeue pointer on ring.
  * Not counting an assumed link TRB at end of each TRBS_PER_SEGMENT sized segment.
@@ -2079,67 +2140,6 @@ static void handle_port_status(struct xhci_hcd *xhci, union xhci_trb *event)
 	spin_lock(&xhci->lock);
 }
 
-/*
- * If the suspect DMA address is a TRB in this TD, this function returns that
- * TRB's segment. Otherwise it returns 0.
- */
-struct xhci_segment *trb_in_td(struct xhci_hcd *xhci, struct xhci_td *td, dma_addr_t suspect_dma,
-			       bool debug)
-{
-	dma_addr_t start_dma;
-	dma_addr_t end_seg_dma;
-	dma_addr_t end_trb_dma;
-	struct xhci_segment *cur_seg;
-
-	start_dma = xhci_trb_virt_to_dma(td->start_seg, td->start_trb);
-	cur_seg = td->start_seg;
-
-	do {
-		if (start_dma == 0)
-			return NULL;
-		/* We may get an event for a Link TRB in the middle of a TD */
-		end_seg_dma = xhci_trb_virt_to_dma(cur_seg,
-				&cur_seg->trbs[TRBS_PER_SEGMENT - 1]);
-		/* If the end TRB isn't in this segment, this is set to 0 */
-		end_trb_dma = xhci_trb_virt_to_dma(cur_seg, td->end_trb);
-
-		if (debug)
-			xhci_warn(xhci,
-				"Looking for event-dma %016llx trb-start %016llx trb-end %016llx seg-start %016llx seg-end %016llx\n",
-				(unsigned long long)suspect_dma,
-				(unsigned long long)start_dma,
-				(unsigned long long)end_trb_dma,
-				(unsigned long long)cur_seg->dma,
-				(unsigned long long)end_seg_dma);
-
-		if (end_trb_dma > 0) {
-			/* The end TRB is in this segment, so suspect should be here */
-			if (start_dma <= end_trb_dma) {
-				if (suspect_dma >= start_dma && suspect_dma <= end_trb_dma)
-					return cur_seg;
-			} else {
-				/* Case for one segment with
-				 * a TD wrapped around to the top
-				 */
-				if ((suspect_dma >= start_dma &&
-							suspect_dma <= end_seg_dma) ||
-						(suspect_dma >= cur_seg->dma &&
-						 suspect_dma <= end_trb_dma))
-					return cur_seg;
-			}
-			return NULL;
-		} else {
-			/* Might still be somewhere in this segment */
-			if (suspect_dma >= start_dma && suspect_dma <= end_seg_dma)
-				return cur_seg;
-		}
-		cur_seg = cur_seg->next;
-		start_dma = xhci_trb_virt_to_dma(cur_seg, &cur_seg->trbs[0]);
-	} while (cur_seg != td->start_seg);
-
-	return NULL;
-}
-
 static void xhci_clear_hub_tt_buffer(struct xhci_hcd *xhci, struct xhci_td *td,
 		struct xhci_virt_ep *ep)
 {
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 5b8751b86008..cd96e0a8c593 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1884,8 +1884,6 @@ int xhci_set_interrupter_moderation(struct xhci_interrupter *ir,
 
 /* xHCI ring, segment, TRB, and TD functions */
 dma_addr_t xhci_trb_virt_to_dma(struct xhci_segment *seg, union xhci_trb *trb);
-struct xhci_segment *trb_in_td(struct xhci_hcd *xhci, struct xhci_td *td,
-			       dma_addr_t suspect_dma, bool debug);
 int xhci_is_vendor_info_code(struct xhci_hcd *xhci, unsigned int trb_comp_code);
 void xhci_ring_cmd_db(struct xhci_hcd *xhci);
 int xhci_queue_slot_control(struct xhci_hcd *xhci, struct xhci_command *cmd,
-- 
2.43.0



* [PATCH 11/15] usb: xhci: move debug capabilities from trb_in_td() to handle_tx_event()
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (9 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 10/15] usb: xhci: refactor trb_in_td() to be static Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors Mathias Nyman
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Niklas Neronin, Mathias Nyman

From: Niklas Neronin <niklas.neronin@linux.intel.com>

Function trb_in_td() currently includes debug capabilities that are
triggered when its debug argument is set to true. The only consumer of
these debug capabilities is handle_tx_event(), which calls trb_in_td()
twice, once for its primary functionality and a second time solely for
debugging purposes if the first call returns 'NULL'.

This approach is inefficient and can lead to confusion, as trb_in_td()
executes the same code with identical arguments twice, differing only in
the debug output during the second execution.

To enhance clarity and efficiency, move the debug capabilities out of
trb_in_td() and integrate them directly into handle_tx_event().
This change reduces the argument count of trb_in_td() and ensures that
debug steps are executed only when necessary, streamlining the function's
operation.
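
The pattern can be sketched in userspace as follows. This is only an
illustration, not the driver code: struct td_range, addr_in_td() and
handle_event() are hypothetical names, and the wrap-around check models
the single-segment case only. The lookup takes no debug flag, and the
caller prints diagnostics only when the lookup fails.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a TD living in one ring segment. */
struct td_range {
	uint64_t seg_start;	/* DMA address of the first TRB in the segment */
	uint64_t seg_end;	/* DMA address of the last TRB in the segment */
	uint64_t td_start;	/* DMA address of the TD's first TRB */
	uint64_t td_end;	/* DMA address of the TD's last TRB */
};

/* Pure lookup: no debug argument, no side effects. */
static bool addr_in_td(const struct td_range *td, uint64_t addr)
{
	if (td->td_start <= td->td_end)
		return addr >= td->td_start && addr <= td->td_end;

	/* TD wrapped around from the end of the segment to its top. */
	return (addr >= td->td_start && addr <= td->seg_end) ||
	       (addr >= td->seg_start && addr <= td->td_end);
}

/* The caller owns the diagnostics and prints them only on failure. */
static int handle_event(const struct td_range *td, uint64_t event_addr)
{
	if (addr_in_td(td, event_addr))
		return 0;

	fprintf(stderr, "event %#llx not part of TD at %#llx - %#llx\n",
		(unsigned long long)event_addr,
		(unsigned long long)td->td_start,
		(unsigned long long)td->td_end);
	return -1;
}
```

With this split the lookup runs once per event, and the failure path
costs nothing unless it is actually taken.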

Signed-off-by: Niklas Neronin <niklas.neronin@linux.intel.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 38 ++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 8c7258afb6bf..c2e15a27338b 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -281,8 +281,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
  * If the suspect DMA address is a TRB in this TD, this function returns that
  * TRB's segment. Otherwise it returns 0.
  */
-static struct xhci_segment *trb_in_td(struct xhci_hcd *xhci, struct xhci_td *td,
-				      dma_addr_t suspect_dma, bool debug)
+static struct xhci_segment *trb_in_td(struct xhci_td *td, dma_addr_t suspect_dma)
 {
 	dma_addr_t start_dma;
 	dma_addr_t end_seg_dma;
@@ -301,15 +300,6 @@ static struct xhci_segment *trb_in_td(struct xhci_hcd *xhci, struct xhci_td *td,
 		/* If the end TRB isn't in this segment, this is set to 0 */
 		end_trb_dma = xhci_trb_virt_to_dma(cur_seg, td->end_trb);
 
-		if (debug)
-			xhci_warn(xhci,
-				"Looking for event-dma %016llx trb-start %016llx trb-end %016llx seg-start %016llx seg-end %016llx\n",
-				(unsigned long long)suspect_dma,
-				(unsigned long long)start_dma,
-				(unsigned long long)end_trb_dma,
-				(unsigned long long)cur_seg->dma,
-				(unsigned long long)end_seg_dma);
-
 		if (end_trb_dma > 0) {
 			/* The end TRB is in this segment, so suspect should be here */
 			if (start_dma <= end_trb_dma) {
@@ -1075,7 +1065,7 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 					 td->urb->stream_id);
 		hw_deq &= ~0xf;
 
-		if (td->cancel_status == TD_HALTED || trb_in_td(xhci, td, hw_deq, false)) {
+		if (td->cancel_status == TD_HALTED || trb_in_td(td, hw_deq)) {
 			switch (td->cancel_status) {
 			case TD_CLEARED: /* TD is already no-op */
 			case TD_CLEARING_CACHE: /* set TR deq command already queued */
@@ -1165,7 +1155,7 @@ static struct xhci_td *find_halted_td(struct xhci_virt_ep *ep)
 		hw_deq = xhci_get_hw_deq(ep->xhci, ep->vdev, ep->ep_index, 0);
 		hw_deq &= ~0xf;
 		td = list_first_entry(&ep->ring->td_list, struct xhci_td, td_list);
-		if (trb_in_td(ep->xhci, td, hw_deq, false))
+		if (trb_in_td(td, hw_deq))
 			return td;
 	}
 	return NULL;
@@ -2800,7 +2790,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	 */
 	td = list_first_entry_or_null(&ep_ring->td_list, struct xhci_td, td_list);
 
-	if (td && td->error_mid_td && !trb_in_td(xhci, td, ep_trb_dma, false)) {
+	if (td && td->error_mid_td && !trb_in_td(td, ep_trb_dma)) {
 		xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
 		xhci_dequeue_td(xhci, td, ep_ring, td->status);
 	}
@@ -2833,7 +2823,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 				      td_list);
 
 		/* Is this a TRB in the currently executing TD? */
-		ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
+		ep_seg = trb_in_td(td, ep_trb_dma);
 
 		if (!ep_seg) {
 
@@ -2893,12 +2883,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 			}
 
 			/* HC is busted, give up! */
-			xhci_err(xhci,
-				 "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
-				 ep_index, trb_comp_code);
-			trb_in_td(xhci, td, ep_trb_dma, true);
-
-			return -ESHUTDOWN;
+			goto debug_finding_td;
 		}
 
 		if (ep->skip) {
@@ -2955,6 +2940,17 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 	return 0;
 
+debug_finding_td:
+	xhci_err(xhci, "Event dma %pad for ep %d status %d not part of TD at %016llx - %016llx\n",
+		 &ep_trb_dma, ep_index, trb_comp_code,
+		 (unsigned long long)xhci_trb_virt_to_dma(td->start_seg, td->start_trb),
+		 (unsigned long long)xhci_trb_virt_to_dma(td->end_seg, td->end_trb));
+
+	xhci_for_each_ring_seg(ep_ring->first_seg, ep_seg)
+		xhci_warn(xhci, "Ring seg %u dma %pad\n", ep_seg->num, &ep_seg->dma);
+
+	return -ESHUTDOWN;
+
 err_out:
 	xhci_err(xhci, "@%016llx %08x %08x %08x %08x\n",
 		 (unsigned long long) xhci_trb_virt_to_dma(
-- 
2.43.0



* [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (10 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 11/15] usb: xhci: move debug capabilities from trb_in_td() to handle_tx_event() Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-07  6:54   ` Michał Pecio
  2025-03-06 14:49 ` [PATCH 13/15] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Mathias Nyman
                   ` (2 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman

Ensure that an endpoint halted due to device STALL is not
restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
the device.

The host side of the endpoint may otherwise be started early by the
'Set TR Deq' command completion handler which is called if dequeue
is moved past a cancelled or halted TD.

Prevent this with a new flag set for bulk and interrupt endpoints
when a Stall Error is received. Clear it in hcd->endpoint_reset()
which is called after Clear_Feature(ENDPOINT_HALT) is sent.

Also add a debug message if a class driver queues a new URB after the
STALL. Note that the class driver might not be aware of the STALL
yet when it submits the URB, as URBs are given back in BH.
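
The intended flag lifecycle can be modelled in userspace roughly like
this. The bit values are copied from the patch; struct ep_model and the
three helper functions are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>

/* Bit values copied from the patch; everything else is hypothetical. */
#define SET_DEQ_PENDING		(1 << 0)
#define EP_HALTED		(1 << 1)
#define EP_STOP_CMD_PENDING	(1 << 2)
#define EP_CLEARING_TT		(1 << 8)
#define EP_STALLED		(1 << 9)

struct ep_model {
	unsigned int ep_state;
};

/* Models the doorbell gate: any blocking flag suppresses the restart. */
static bool may_ring_doorbell(const struct ep_model *ep)
{
	return !(ep->ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING |
				 EP_HALTED | EP_CLEARING_TT | EP_STALLED));
}

/* Models a Stall Error completion on a bulk or interrupt endpoint. */
static void on_stall_error(struct ep_model *ep)
{
	ep->ep_state |= EP_STALLED;
}

/* Models the endpoint reset that follows Clear_Feature(ENDPOINT_HALT). */
static void on_endpoint_reset(struct ep_model *ep)
{
	ep->ep_state &= ~EP_STALLED;
}
```

In this model a completed 'Set TR Deq' command clears SET_DEQ_PENDING
but leaves EP_STALLED set, so the doorbell stays blocked until the
endpoint reset runs.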

Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 7 +++++--
 drivers/usb/host/xhci.c      | 6 ++++++
 drivers/usb/host/xhci.h      | 3 ++-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index c2e15a27338b..7643ab9ec3b4 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
 	 * pointer command pending because the device can choose to start any
 	 * stream once the endpoint is on the HW schedule.
 	 */
-	if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) ||
-	    (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT))
+	if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED |
+			EP_CLEARING_TT | EP_STALLED))
 		return;
 
 	trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
@@ -2555,6 +2555,9 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 
 		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
 		return;
+	case COMP_STALL_ERROR:
+		ep->ep_state |= EP_STALLED;
+		break;
 	default:
 		/* do nothing */
 		break;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 3f2cd546a7a2..0c22b78358b9 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -1604,6 +1604,11 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
 		goto free_priv;
 	}
 
+	/* Class driver might not be aware ep halted due to async URB giveback */
+	if (*ep_state & EP_STALLED)
+		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
+			urb);
+
 	switch (usb_endpoint_type(&urb->ep->desc)) {
 
 	case USB_ENDPOINT_XFER_CONTROL:
@@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
 		return;
 
 	ep = &vdev->eps[ep_index];
+	ep->ep_state &= ~EP_STALLED;
 
 	/* Bail out if toggle is already being cleared by a endpoint reset */
 	spin_lock_irqsave(&xhci->lock, flags);
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index cd96e0a8c593..4ee14f651d36 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -664,7 +664,7 @@ struct xhci_virt_ep {
 	unsigned int			err_count;
 	unsigned int			ep_state;
 #define SET_DEQ_PENDING		(1 << 0)
-#define EP_HALTED		(1 << 1)	/* For stall handling */
+#define EP_HALTED		(1 << 1)	/* Halted host ep handling */
 #define EP_STOP_CMD_PENDING	(1 << 2)	/* For URB cancellation */
 /* Transitioning the endpoint to using streams, don't enqueue URBs */
 #define EP_GETTING_STREAMS	(1 << 3)
@@ -675,6 +675,7 @@ struct xhci_virt_ep {
 #define EP_SOFT_CLEAR_TOGGLE	(1 << 7)
 /* usb_hub_clear_tt_buffer is in progress */
 #define EP_CLEARING_TT		(1 << 8)
+#define EP_STALLED		(1 << 9)	/* For stall handling */
 	/* ----  Related to URB cancellation ---- */
 	struct list_head	cancelled_td_list;
 	struct xhci_hcd		*xhci;
-- 
2.43.0



* [PATCH 13/15] usb: xhci: Apply the link chain quirk on NEC isoc endpoints
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (11 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 14/15] usb: xhci: Unify duplicate inc_enq() code Mathias Nyman
  2025-03-06 14:49 ` [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints Mathias Nyman
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, stable, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

Two clearly different specimens of NEC uPD720200 (one with start/stop
bug, one without) were seen to cause IOMMU faults after some Missed
Service Errors. The faulting address is immediately after a transfer
ring segment, and patched dynamic debug messages revealed that the MSE
was received when waiting for a TD near the end of that segment:

[ 1.041954] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0
[ 1.042120] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000]
[ 1.042146] xhci_hcd: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000]
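
The address arithmetic behind that observation can be checked with a
small sketch. It assumes 16-byte TRBs, 256 TRBs per segment and
segments aligned to their own size; the helper names are hypothetical
and none of this is driver code.

```c
#include <stdint.h>

#define TRB_SIZE		16u	/* bytes per TRB */
#define TRBS_PER_SEGMENT	256u	/* assumed segment length */
#define SEGMENT_SIZE		(TRB_SIZE * TRBS_PER_SEGMENT)

/* Base of the segment containing addr, assuming segment-size alignment. */
static uint32_t seg_base(uint32_t addr)
{
	return addr & ~(SEGMENT_SIZE - 1);
}

/* Index of the TRB at addr within its segment. */
static unsigned int trb_index(uint32_t addr)
{
	return (addr - seg_base(addr)) / TRB_SIZE;
}

/* Address fetched next if the HC steps over the Link TRB instead of following it. */
static uint32_t addr_after_ignored_link(uint32_t addr)
{
	return seg_base(addr) + SEGMENT_SIZE;
}
```

Under these assumptions the expected TD at ffa08fe0 is TRB 254 of its
segment, the Link TRB sits at index 255 (ffa08ff0), and the first
address past the segment is ffa09000, which matches the first faulting
address in the log.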

It gets even funnier if the next page is a ring segment accessible to
the HC. Below, it reports MSE in segment at ff1e8000, plows through a
zero-filled page at ff1e9000 and starts reporting events for TRBs in
page at ff1ea000 every microframe, instead of jumping to seg ff1e6000.

[ 7.041671] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 7.041999] xhci_hcd: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 7.042011] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042028] xhci_hcd: All TDs skipped for slot 1 ep 2. Clear skip flag.
[ 7.042134] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042138] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 7.042144] xhci_hcd: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.042259] xhci_hcd: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 7.042262] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 7.042266] xhci_hcd: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820

At some point completion events change from Isoch Buffer Overrun to
Short Packet and the HC finally finds cycle bit mismatch in ff1ec000.

[ 7.098130] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 7.098132] xhci_hcd: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.098254] xhci_hcd: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 7.098256] xhci_hcd: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820
[ 7.098379] xhci_hcd: Overrun event on slot 1 ep 2

It's possible that data from the isochronous device were written to
random buffers of pending TDs on other endpoints (either IN or OUT),
other devices or even other HCs in the same IOMMU domain.

Lastly, an error from a different USB device on another HC. Was it
caused by the above? I don't know, but it may have been. The disk
was working without any other issues and generated PCIe traffic to
starve the NEC of upstream BW and trigger those MSEs. The two HCs
shared one x1 slot by means of a commercial "PCIe splitter" board.

[ 7.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 7.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[ 7.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00
[ 7.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

Fortunately, it appears that this ridiculous bug is avoided by setting
the chain bit of Link TRBs on isochronous rings. Other ancient HCs are
also known to expect the bit to be set and to ignore Link TRBs if it
is not. Reportedly, the 0.95 spec guaranteed that the bit is set.

The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports
tens of MSEs per second and runs into the bug within seconds. Chaining
Link TRBs allows the same workload to run for many minutes, many times.

No negative side effects seen in UVC recording and UAC playback with a
few devices at full speed, high speed and SuperSpeed.

The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202
or on the old Etron EJ168 and VIA VL805 (but the VL805 has another bug).

[shorten line length of log snippets in commit message -Mathias]
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 4ee14f651d36..d9d7cd1906f3 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1760,11 +1760,20 @@ static inline void xhci_write_64(struct xhci_hcd *xhci,
 }

-/* Link TRB chain should always be set on 0.95 hosts, and AMD 0.96 ISOC rings */
+/*
+ * Reportedly, some chapters of v0.95 spec said that Link TRB always has its chain bit set.
+ * Other chapters and later specs say that it should only be set if the link is inside a TD
+ * which continues from the end of one segment to the next segment.
+ *
+ * Some 0.95 hardware was found to misbehave if any link TRB doesn't have the chain bit set.
+ *
+ * 0.96 hardware from AMD and NEC was found to ignore unchained isochronous link TRBs when
+ * "resynchronizing the pipe" after a Missed Service Error.
+ */
 static inline bool xhci_link_chain_quirk(struct xhci_hcd *xhci, enum xhci_ring_type type)
 {
 	return (xhci->quirks & XHCI_LINK_TRB_QUIRK) ||
-	       (type == TYPE_ISOC && (xhci->quirks & XHCI_AMD_0x96_HOST));
+	       (type == TYPE_ISOC && (xhci->quirks & (XHCI_AMD_0x96_HOST | XHCI_NEC_HOST)));
 }

 /* xHCI debugging */
-- 
2.43.0


* [PATCH 14/15] usb: xhci: Unify duplicate inc_enq() code
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (12 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 13/15] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-06 14:49 ` [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints Mathias Nyman
  14 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Michal Pecio, Mathias Nyman

From: Michal Pecio <michal.pecio@gmail.com>

Extract a block of code copied from inc_enq() into a separate function
and call it from inc_enq() and the other function which used this code.
Remove the pointless 'next' variable which only aliases ring->enqueue.
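
The resulting structure can be sketched as a small userspace model,
with one shared helper and two callers. The struct layouts, the tiny
segment size and the function names past_link(), advance() and
prepare() are hypothetical. The model leaves out the link chain quirk,
the memory barrier and tracing.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRBS_PER_SEG	4		/* tiny segments, for illustration only */
#define TRB_CYCLE	(1u << 0)
#define TRB_CHAIN	(1u << 4)

struct trb {
	uint32_t control;
	bool is_link;		/* true for the last TRB of each segment */
	bool toggles_cycle;	/* true for the link TRB of the last segment */
};

struct seg {
	struct trb trbs[TRBS_PER_SEG];
	struct seg *next;
};

struct ring {
	struct seg *enq_seg;
	struct trb *enqueue;
	unsigned int cycle_state;
	unsigned int num_segs;
};

/* Shared helper: follow link TRBs until enqueue rests on an ordinary TRB. */
static void past_link(struct ring *r, uint32_t chain)
{
	unsigned int links = 0;

	while (r->enqueue->is_link) {
		r->enqueue->control &= ~TRB_CHAIN;
		r->enqueue->control |= chain;
		r->enqueue->control ^= TRB_CYCLE;	/* hand the link over */
		if (r->enqueue->toggles_cycle)
			r->cycle_state ^= 1;
		r->enq_seg = r->enq_seg->next;
		r->enqueue = r->enq_seg->trbs;
		if (links++ > r->num_segs)
			break;		/* malformed ring made of links only */
	}
}

/* Caller 1: advance enqueue after a TRB has been queued. */
static void advance(struct ring *r, bool more_trbs_coming)
{
	uint32_t chain = r->enqueue->control & TRB_CHAIN;

	if (r->enqueue == &r->enq_seg->trbs[TRBS_PER_SEG - 1])
		return;		/* refuse to step past the segment */

	r->enqueue++;
	if (r->enqueue->is_link && (chain || more_trbs_coming))
		past_link(r, chain);
}

/* Caller 2: before queuing, make sure enqueue is not parked on a link. */
static void prepare(struct ring *r)
{
	if (r->enqueue->is_link)
		past_link(r, 0);
}
```

In the model, advance() leaves enqueue parked on a link TRB when the
TD is complete and no more TRBs are coming, and the next prepare()
call moves it into the following segment.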

Note: I don't know if any 0.95 xHC ever reached series production, but
"AMD 0.96 host" appears to be the "Llano" family APU. Example dmesg at
https://linux-hardware.org/?probe=79d5cfd4fd&log=dmesg

pci 0000:00:10.0: [1022:7812] type 00 class 0x0c0330
hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 130 +++++++++++++++--------------------
 1 file changed, 55 insertions(+), 75 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 7643ab9ec3b4..2df94ed3152c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -204,79 +204,84 @@ void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring)
 }
 
 /*
- * See Cycle bit rules. SW is the consumer for the event ring only.
- *
- * If we've just enqueued a TRB that is in the middle of a TD (meaning the
- * chain bit is set), then set the chain bit in all the following link TRBs.
- * If we've enqueued the last TRB in a TD, make sure the following link TRBs
- * have their chain bit cleared (so that each Link TRB is a separate TD).
- *
- * Section 6.4.4.1 of the 0.95 spec says link TRBs cannot have the chain bit
- * set, but other sections talk about dealing with the chain bit set.  This was
- * fixed in the 0.96 specification errata, but we have to assume that all 0.95
- * xHCI hardware can't handle the chain bit being cleared on a link TRB.
- *
- * @more_trbs_coming:	Will you enqueue more TRBs before calling
- *			prepare_transfer()?
+ * If enqueue points at a link TRB, follow links until an ordinary TRB is reached.
+ * Toggle the cycle bit of passed link TRBs and optionally chain them.
  */
-static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
-			bool more_trbs_coming)
+static void inc_enq_past_link(struct xhci_hcd *xhci, struct xhci_ring *ring, u32 chain)
 {
-	u32 chain;
-	union xhci_trb *next;
 	unsigned int link_trb_count = 0;
 
-	chain = le32_to_cpu(ring->enqueue->generic.field[3]) & TRB_CHAIN;
-
-	if (last_trb_on_seg(ring->enq_seg, ring->enqueue)) {
-		xhci_err(xhci, "Tried to move enqueue past ring segment\n");
-		return;
-	}
-
-	next = ++(ring->enqueue);
-
-	/* Update the dequeue pointer further if that was a link TRB */
-	while (trb_is_link(next)) {
+	while (trb_is_link(ring->enqueue)) {
 
 		/*
-		 * If the caller doesn't plan on enqueueing more TDs before
-		 * ringing the doorbell, then we don't want to give the link TRB
-		 * to the hardware just yet. We'll give the link TRB back in
-		 * prepare_ring() just before we enqueue the TD at the top of
-		 * the ring.
-		 */
-		if (!chain && !more_trbs_coming)
-			break;
-
-		/* If we're not dealing with 0.95 hardware or isoc rings on
-		 * AMD 0.96 host, carry over the chain bit of the previous TRB
-		 * (which may mean the chain bit is cleared).
+		 * Section 6.4.4.1 of the 0.95 spec says link TRBs cannot have the chain bit
+		 * set, but other sections talk about dealing with the chain bit set. This was
+		 * fixed in the 0.96 specification errata, but we have to assume that all 0.95
+		 * xHCI hardware can't handle the chain bit being cleared on a link TRB.
+		 *
+		 * On 0.95 and some 0.96 HCs the chain bit is set once at segment initialization
+		 * and never changed here. On all others, modify it as requested by the caller.
 		 */
 		if (!xhci_link_chain_quirk(xhci, ring->type)) {
-			next->link.control &= cpu_to_le32(~TRB_CHAIN);
-			next->link.control |= cpu_to_le32(chain);
+			ring->enqueue->link.control &= cpu_to_le32(~TRB_CHAIN);
+			ring->enqueue->link.control |= cpu_to_le32(chain);
 		}
+
 		/* Give this link TRB to the hardware */
 		wmb();
-		next->link.control ^= cpu_to_le32(TRB_CYCLE);
+		ring->enqueue->link.control ^= cpu_to_le32(TRB_CYCLE);
 
 		/* Toggle the cycle bit after the last ring segment. */
-		if (link_trb_toggles_cycle(next))
+		if (link_trb_toggles_cycle(ring->enqueue))
 			ring->cycle_state ^= 1;
 
 		ring->enq_seg = ring->enq_seg->next;
 		ring->enqueue = ring->enq_seg->trbs;
-		next = ring->enqueue;
 
 		trace_xhci_inc_enq(ring);
 
 		if (link_trb_count++ > ring->num_segs) {
-			xhci_warn(xhci, "%s: Ring link TRB loop\n", __func__);
+			xhci_warn(xhci, "Link TRB loop at enqueue\n");
 			break;
 		}
 	}
 }
 
+/*
+ * See Cycle bit rules. SW is the consumer for the event ring only.
+ *
+ * If we've just enqueued a TRB that is in the middle of a TD (meaning the
+ * chain bit is set), then set the chain bit in all the following link TRBs.
+ * If we've enqueued the last TRB in a TD, make sure the following link TRBs
+ * have their chain bit cleared (so that each Link TRB is a separate TD).
+ *
+ * @more_trbs_coming:	Will you enqueue more TRBs before calling
+ *			prepare_transfer()?
+ */
+static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
+			bool more_trbs_coming)
+{
+	u32 chain;
+
+	chain = le32_to_cpu(ring->enqueue->generic.field[3]) & TRB_CHAIN;
+
+	if (last_trb_on_seg(ring->enq_seg, ring->enqueue)) {
+		xhci_err(xhci, "Tried to move enqueue past ring segment\n");
+		return;
+	}
+
+	ring->enqueue++;
+
+	/*
+	 * If we are in the middle of a TD or the caller plans to enqueue more
+	 * TDs as one transfer (eg. control), traverse any link TRBs right now.
+	 * Otherwise, enqueue can stay on a link until the next prepare_ring().
+	 * This avoids enqueue entering deq_seg and simplifies ring expansion.
+	 */
+	if (trb_is_link(ring->enqueue) && (chain || more_trbs_coming))
+		inc_enq_past_link(xhci, ring, chain);
+}
+
 /*
  * If the suspect DMA address is a TRB in this TD, this function returns that
  * TRB's segment. Otherwise it returns 0.
@@ -3213,7 +3218,6 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
 static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring,
 		u32 ep_state, unsigned int num_trbs, gfp_t mem_flags)
 {
-	unsigned int link_trb_count = 0;
 	unsigned int new_segs = 0;
 
 	/* Make sure the endpoint has been added to xHC schedule */
@@ -3261,33 +3265,9 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring,
 		}
 	}
 
-	while (trb_is_link(ep_ring->enqueue)) {
-		/* If we're not dealing with 0.95 hardware or isoc rings
-		 * on AMD 0.96 host, clear the chain bit.
-		 */
-		if (!xhci_link_chain_quirk(xhci, ep_ring->type))
-			ep_ring->enqueue->link.control &=
-				cpu_to_le32(~TRB_CHAIN);
-		else
-			ep_ring->enqueue->link.control |=
-				cpu_to_le32(TRB_CHAIN);
-
-		wmb();
-		ep_ring->enqueue->link.control ^= cpu_to_le32(TRB_CYCLE);
-
-		/* Toggle the cycle bit after the last ring segment. */
-		if (link_trb_toggles_cycle(ep_ring->enqueue))
-			ep_ring->cycle_state ^= 1;
-
-		ep_ring->enq_seg = ep_ring->enq_seg->next;
-		ep_ring->enqueue = ep_ring->enq_seg->trbs;
-
-		/* prevent infinite loop if all first trbs are link trbs */
-		if (link_trb_count++ > ep_ring->num_segs) {
-			xhci_warn(xhci, "Ring is an endless link TRB loop\n");
-			return -EINVAL;
-		}
-	}
+	/* Ensure that new TRBs won't overwrite a link */
+	if (trb_is_link(ep_ring->enqueue))
+		inc_enq_past_link(xhci, ep_ring, 0);
 
 	if (last_trb_on_seg(ep_ring->enq_seg, ep_ring->enqueue)) {
 		xhci_warn(xhci, "Missing link TRB at end of ring segment\n");
-- 
2.43.0



* [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints
  2025-03-06 14:49 [PATCH 00/15] xhci features for usb-next Mathias Nyman
                   ` (13 preceding siblings ...)
  2025-03-06 14:49 ` [PATCH 14/15] usb: xhci: Unify duplicate inc_enq() code Mathias Nyman
@ 2025-03-06 14:49 ` Mathias Nyman
  2025-03-07  8:27   ` Michał Pecio
  14 siblings, 1 reply; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 14:49 UTC (permalink / raw)
  To: gregkh; +Cc: linux-usb, Mathias Nyman, Kuangyi Chiang, Michal Pecio

Unplugging a USB3.0 webcam from Etron hosts while streaming results
in errors like this:

[ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
[ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
[ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670

Etron xHC generates two transfer events for the TRB if an error is
detected while processing the last TRB of an isoc TD.

The first event can be any sort of error (like USB Transaction or
Babble Detected, etc), and the final event is Success.

The xHCI driver will handle the TD after the first event and remove it
from its internal list, and then print a "Transfer event TRB DMA ptr
not part of current TD" error message after the final event.

Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
transaction error mid TD.") is designed to address isoc transaction
errors, but unfortunately it doesn't account for this scenario.

This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a success
event follows a 'short transfer' event, but the TD the event points to
is already given back.

Expand the spurious success 'short transfer' event handling to cover
the spurious success after error on Etron hosts.

Kuangyi Chiang reported this issue and submitted a different solution
based on using error_mid_td. This commit message is mostly taken
from that patch.

Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20241028025337.6372-6-ki.chiang65@gmail.com/
Tested-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Tested-by: Michal Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/host/xhci-ring.c | 38 ++++++++++++++++++++++++------------
 drivers/usb/host/xhci.h      |  2 +-
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 2df94ed3152c..0f8acbb9cd21 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2611,6 +2611,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
 	return 0;
 }
 
+static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
+					   struct xhci_ring *ring)
+{
+	switch (ring->old_trb_comp_code) {
+	case COMP_SHORT_PACKET:
+		return xhci->quirks & XHCI_SPURIOUS_SUCCESS;
+	case COMP_USB_TRANSACTION_ERROR:
+	case COMP_BABBLE_DETECTED_ERROR:
+	case COMP_ISOCH_BUFFER_OVERRUN:
+		return xhci->quirks & XHCI_ETRON_HOST &&
+			ring->type == TYPE_ISOC;
+	default:
+		return false;
+	}
+}
+
 /*
  * If this function returns an error condition, it means it got a Transfer
  * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
@@ -2665,8 +2681,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	case COMP_SUCCESS:
 		if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
 			trb_comp_code = COMP_SHORT_PACKET;
-			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
-				 slot_id, ep_index, ep_ring->last_td_was_short);
+			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
+				 slot_id, ep_index, ep_ring->old_trb_comp_code);
 		}
 		break;
 	case COMP_SHORT_PACKET:
@@ -2817,7 +2833,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		if (trb_comp_code != COMP_STOPPED &&
 		    trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
 		    !ring_xrun_event &&
-		    !ep_ring->last_td_was_short) {
+		    !xhci_spurious_success_tx_event(xhci, ep_ring)) {
 			xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
 				  slot_id, ep_index);
 		}
@@ -2882,11 +2898,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 			/*
 			 * Some hosts give a spurious success event after a short
-			 * transfer. Ignore it.
+			 * transfer or error on last TRB. Ignore it.
 			 */
-			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-			    ep_ring->last_td_was_short) {
-				ep_ring->last_td_was_short = false;
+			if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
+				xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
+					 &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
+				ep_ring->old_trb_comp_code = trb_comp_code;
 				return 0;
 			}
 
@@ -2909,15 +2926,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	 */
 	} while (ep->skip);
 
+	ep_ring->old_trb_comp_code = trb_comp_code;
+
 	/* Get out if a TD was queued at enqueue after the xrun occurred */
 	if (ring_xrun_event)
 		return 0;
 
-	if (trb_comp_code == COMP_SHORT_PACKET)
-		ep_ring->last_td_was_short = true;
-	else
-		ep_ring->last_td_was_short = false;
-
 	ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
 	trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma);
 
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index d9d7cd1906f3..6c00062a9acc 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1375,7 +1375,7 @@ struct xhci_ring {
 	unsigned int		num_trbs_free; /* used only by xhci DbC */
 	unsigned int		bounce_buf_len;
 	enum xhci_ring_type	type;
-	bool			last_td_was_short;
+	u32			old_trb_comp_code;
 	struct radix_tree_root	*trb_address_map;
 };
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid
  2025-03-06 14:49 ` [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid Mathias Nyman
@ 2025-03-06 14:52   ` Greg KH
  2025-03-06 15:29     ` Mathias Nyman
  0 siblings, 1 reply; 24+ messages in thread
From: Greg KH @ 2025-03-06 14:52 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb, Michal Pecio, stable

On Thu, Mar 06, 2025 at 04:49:42PM +0200, Mathias Nyman wrote:
> From: Michal Pecio <michal.pecio@gmail.com>
> 
> Up until commit d56b0b2ab142 ("usb: xhci: ensure skipped isoc TDs are
> returned when isoc ring is stopped") in v6.11, the driver didn't skip
> missed isochronous TDs when handling Stopped and Stopped - Length
> Invalid events. Instead, it erroneously cleared the skip flag, which
> would cause the ring to get stuck, as future events won't match the
> missed TD which is never removed from the queue until it's cancelled.
> 
> This buggy logic seems to have been in place substantially unchanged
> since the 3.x series over 10 years ago, which probably speaks first
> and foremost about relative rarity of this case in normal usage, but
> by the spec I see no reason why it shouldn't be possible.
> 
> After d56b0b2ab142, TDs are immediately skipped when handling those
> Stopped events. This poses a potential problem in case of Stopped -
> Length Invalid, which occurs either on completed TDs (likely already
> given back) or Link and No-Op TRBs. Such event won't be recognized
> as matching any TD (unless it's the rare Link TRB inside a TD) and
> will result in skipping all pending TDs, giving them back possibly
> before they are done, risking isoc data loss and maybe UAF by HW.
> 
> As a compromise, don't skip and don't clear the skip flag on this
> kind of event. Then the next event will skip missed TDs. A downside
> of not handling Stopped - Length Invalid on a Link inside a TD is
> that if the TD is cancelled, its actual length will not be updated
> to account for TRBs (silently) completed before the TD was stopped.
> 
> I had no luck producing this sequence of completion events so there
> is no compelling demonstration of any resulting disaster. It may be
> a very rare, obscure condition. The sole motivation for this patch
> is that if such unlikely event does occur, I'd rather risk reporting
> a cancelled partially done isoc frame as empty than gamble with UAF.
> 
> This will be fixed more properly by looking at Stopped event's TRB
> pointer when making skipping decisions, but such rework is unlikely
> to be backported to v6.12, which will stay around for a few years.
> 
> Fixes: d56b0b2ab142 ("usb: xhci: ensure skipped isoc TDs are returned when isoc ring is stopped")
> Cc: stable@vger.kernel.org
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Why is a patch cc: stable buried here in a series for linux-next?  It
will be many many weeks before it gets out to anyone else, is that
intentional?

Same for the other commit in this series tagged that way.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid
  2025-03-06 14:52   ` Greg KH
@ 2025-03-06 15:29     ` Mathias Nyman
  2025-03-06 15:42       ` Greg KH
  0 siblings, 1 reply; 24+ messages in thread
From: Mathias Nyman @ 2025-03-06 15:29 UTC (permalink / raw)
  To: Greg KH; +Cc: linux-usb, Michal Pecio, stable

On 6.3.2025 16.52, Greg KH wrote:
> On Thu, Mar 06, 2025 at 04:49:42PM +0200, Mathias Nyman wrote:
> Why is a patch cc: stable buried here in a series for linux-next?  It
> will be many many weeks before it gets out to anyone else, is that
> intentional?
> 
> Same for the other commit in this series tagged that way.

These are both kind of half theoretical issues that have been
around for years without more complaints. No need to rush them to
stable. Balance between regression risk vs adding them to stable.

This patch for example states:

"I had no luck producing this sequence of completion events so there
  is no compelling demonstration of any resulting disaster. It may be
  a very rare, obscure condition. The sole motivation for this patch
  is that if such unlikely event does occur, I'd rather risk reporting
  a cancelled partially done isoc frame as empty than gamble with UAF."

Thanks
Mathias

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 03/15] usb: xhci: Don't skip on Stopped - Length Invalid
  2025-03-06 15:29     ` Mathias Nyman
@ 2025-03-06 15:42       ` Greg KH
  0 siblings, 0 replies; 24+ messages in thread
From: Greg KH @ 2025-03-06 15:42 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb, Michal Pecio, stable

On Thu, Mar 06, 2025 at 05:29:30PM +0200, Mathias Nyman wrote:
> On 6.3.2025 16.52, Greg KH wrote:
> > On Thu, Mar 06, 2025 at 04:49:42PM +0200, Mathias Nyman wrote:
> > Why is a patch cc: stable buried here in a series for linux-next?  It
> > will be many many weeks before it gets out to anyone else, is that
> > intentional?
> > 
> > Same for the other commit in this series tagged that way.
> 
> These are both kind of half theoretical issues that have been
> around for years without more complaints. No need to rush them to
> stable. Balance between regression risk vs adding them to stable.
> 
> This patch for example states:
> 
> "I had no luck producing this sequence of completion events so there
>  is no compelling demonstration of any resulting disaster. It may be
>  a very rare, obscure condition. The sole motivation for this patch
>  is that if such unlikely event does occur, I'd rather risk reporting
>  a cancelled partially done isoc frame as empty than gamble with UAF."

Ok, fair enough, just seeing patches languish in -next that are tagged
for stable looks odd.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
  2025-03-06 14:49 ` [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors Mathias Nyman
@ 2025-03-07  6:54   ` Michał Pecio
  2025-03-07 14:23     ` Mathias Nyman
  0 siblings, 1 reply; 24+ messages in thread
From: Michał Pecio @ 2025-03-07  6:54 UTC (permalink / raw)
  To: mathias.nyman; +Cc: gregkh, linux-usb

> Ensure that an endpoint halted due to device STALL is not
> restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
> the device.
> 
> The host side of the endpoint may otherwise be started early by the
> 'Set TR Deq' command completion handler which is called if dequeue
> is moved past a cancelled or halted TD.
> 
> Prevent this with a new flag set for bulk and interrupt endpoints
> when a Stall Error is received. Clear it in hcd->endpoint_reset()
> which is called after Clear_Feature(ENDPOINT_HALT) is sent.
> 
> Also add a debug message if a class driver queues a new URB after
> the STALL. Note that class driver might not be aware of the STALL
> yet when it submits the URB as URBs are given back in BH.
> 
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Sorry for coming this late, but I haven't looked closely at some
of those xhci/for-next patches before.

This one is unfortunately incomplete, as follows:

> drivers/usb/host/xhci-ring.c | 7 +++++--
> drivers/usb/host/xhci.c      | 6 ++++++
> drivers/usb/host/xhci.h      | 3 ++-
> 3 files changed, 13 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>index c2e15a27338b..7643ab9ec3b4 100644
>--- a/drivers/usb/host/xhci-ring.c
>+++ b/drivers/usb/host/xhci-ring.c
>@@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
> 	 * pointer command pending because the device can choose to start any
> 	 * stream once the endpoint is on the HW schedule.
> 	 */
>-	if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) ||
>-	    (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT))
>+	if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED |
>+			EP_CLEARING_TT | EP_STALLED))
> 		return;

Any flag added to this list needs to be added to xhci_urb_dequeue() too
so it knows that the endpoint is held in Stopped state and URBs can be
unlinked without trying to stop it again.

There really should be a helper function used both here and there, but
those Stop EP patches were meant for stable and I strived to make them
small and noninvasive. Then I forgot about this cleanup.

NB: I also forgot about a bunch of low-impact halted EP handling bugs,
I will try to rebase and send them out today or over the weekend.

>  	trace_xhci_ring_ep_doorbell(slot_id, DB_VALUE(ep_index, stream_id));
> @@ -2555,6 +2555,9 @@ static void process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>  
>  		xhci_handle_halted_endpoint(xhci, ep, td, EP_SOFT_RESET);
>  		return;
> +	case COMP_STALL_ERROR:
> +		ep->ep_state |= EP_STALLED;
> +		break;
>  	default:
>  		/* do nothing */
>  		break;
> diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
> index 3f2cd546a7a2..0c22b78358b9 100644
> --- a/drivers/usb/host/xhci.c
> +++ b/drivers/usb/host/xhci.c
> @@ -1604,6 +1604,11 @@ static int xhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag
>  		goto free_priv;
>  	}
>  
> +	/* Class driver might not be aware ep halted due to async URB giveback */
> +	if (*ep_state & EP_STALLED)
> +		dev_dbg(&urb->dev->dev, "URB %p queued before clearing halt\n",
> +			urb);
> +
>  	switch (usb_endpoint_type(&urb->ep->desc)) {
>  
>  	case USB_ENDPOINT_XFER_CONTROL:
> @@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
>  		return;
>  
>  	ep = &vdev->eps[ep_index];
> +	ep->ep_state &= ~EP_STALLED;

... and clearing any of those flags has always been followed by calling
xhci_ring_ep_doorbell() again, to ensure that the endpoint is restarted
if it has URBs on it but restart was held off due to the flag.

xhci_urb_dequeue() relies on this too, because it looked like sensible
design: if you have reasons not to run the EP, you set a flag. Reasons
are gone, you clear the flag and it's running again.
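
As a rough illustration of that rule (not driver code), the sketch
below models an endpoint whose doorbell is only rung when no hold flag
is set. The flag names and bit values are those quoted from xhci.h in
this patch; struct ep_model, EP_HOLD_MASK, ep_model_ring_doorbell() and
ep_model_clear_flag() are hypothetical names made up for the example,
and the queued-URB counter is an assumption standing in for the real
TD list.

```c
#include <stdbool.h>

/* Flag bits as quoted from xhci.h in this patch. */
#define SET_DEQ_PENDING		(1 << 0)
#define EP_HALTED		(1 << 1)
#define EP_STOP_CMD_PENDING	(1 << 2)
#define EP_CLEARING_TT		(1 << 8)
#define EP_STALLED		(1 << 9)

/* Hypothetical: every flag that makes the doorbell code bail out. */
#define EP_HOLD_MASK	(EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED | \
			 EP_CLEARING_TT | EP_STALLED)

/* Hypothetical stand-in for struct xhci_virt_ep. */
struct ep_model {
	unsigned int ep_state;
	unsigned int queued_urbs;	/* assumption: models a non-empty TD list */
	unsigned int doorbell_rings;	/* counts how often the EP was (re)started */
};

/* Ring the doorbell unless a hold flag is set or nothing is queued. */
static inline bool ep_model_ring_doorbell(struct ep_model *ep)
{
	if (ep->ep_state & EP_HOLD_MASK)
		return false;
	if (!ep->queued_urbs)
		return false;
	ep->doorbell_rings++;
	return true;
}

/* "Reasons are gone": clear one flag, then always retry the doorbell. */
static inline bool ep_model_clear_flag(struct ep_model *ep, unsigned int flag)
{
	ep->ep_state &= ~flag;
	return ep_model_ring_doorbell(ep);
}
```

In this model the endpoint restarts only when the last hold flag goes
away, regardless of which code path happens to clear it.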

> 	/* Bail out if toggle is already being cleared by a endpoint reset */
> 	spin_lock_irqsave(&xhci->lock, flags);
>diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
>index cd96e0a8c593..4ee14f651d36 100644
>--- a/drivers/usb/host/xhci.h
>+++ b/drivers/usb/host/xhci.h
>@@ -664,7 +664,7 @@ struct xhci_virt_ep {
> 	unsigned int			err_count;
> 	unsigned int			ep_state;
> #define SET_DEQ_PENDING		(1 << 0)
>-#define EP_HALTED		(1 << 1)	/* For stall handling */
>+#define EP_HALTED		(1 << 1)	/* Halted host ep handling */
> #define EP_STOP_CMD_PENDING	(1 << 2)	/* For URB cancellation */
> /* Transitioning the endpoint to using streams, don't enqueue URBs */
> #define EP_GETTING_STREAMS	(1 << 3)
>@@ -675,6 +675,7 @@ struct xhci_virt_ep {
> #define EP_SOFT_CLEAR_TOGGLE	(1 << 7)
> /* usb_hub_clear_tt_buffer is in progress */
> #define EP_CLEARING_TT		(1 << 8)
>+#define EP_STALLED		(1 << 9)	/* For stall handling */

I guess usage rules of those flags should be documented somewhere here
and helpers added such as:

xhci_ep_cancel_pending()
xhci_ep_held_stopped()

to improve maintainability and prevent similar problems in the future.
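
One possible shape for such helpers, as a minimal standalone sketch.
The two helper names come from the suggestion above, but the semantics
given to them here are a guess; xhci_ep_may_ring_doorbell(), enum
dequeue_action and xhci_ep_dequeue_action() are hypothetical. Whether
EP_STALLED belongs in the held-stopped group is exactly what this
thread is debating, so its placement below is an assumption.

```c
#include <stdbool.h>

/* Flag bits as quoted from xhci.h in this patch. */
#define SET_DEQ_PENDING		(1 << 0)
#define EP_HALTED		(1 << 1)
#define EP_STOP_CMD_PENDING	(1 << 2)
#define EP_CLEARING_TT		(1 << 8)
#define EP_STALLED		(1 << 9)

/* Assumed meaning: a command is in flight and its completion handler
 * will sort out cancelled TDs. */
static inline bool xhci_ep_cancel_pending(unsigned int ep_state)
{
	return ep_state & (EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING);
}

/* Assumed meaning: no command is pending, but the endpoint is kept
 * Stopped until some external condition clears. */
static inline bool xhci_ep_held_stopped(unsigned int ep_state)
{
	return ep_state & (EP_CLEARING_TT | EP_STALLED);
}

/* Hypothetical: the doorbell check expressed through both helpers. */
static inline bool xhci_ep_may_ring_doorbell(unsigned int ep_state)
{
	return !xhci_ep_cancel_pending(ep_state) &&
	       !xhci_ep_held_stopped(ep_state);
}

/* Hypothetical: what an URB unlink would do in each situation. */
enum dequeue_action {
	DEQ_WAIT_FOR_HANDLER,	/* a completion handler gives the URB back */
	DEQ_GIVE_BACK_NOW,	/* EP is stopped, process cancelled TDs directly */
	DEQ_QUEUE_STOP_EP,	/* EP may be running, stop it first */
};

static inline enum dequeue_action xhci_ep_dequeue_action(unsigned int ep_state)
{
	if (xhci_ep_cancel_pending(ep_state))
		return DEQ_WAIT_FOR_HANDLER;
	if (xhci_ep_held_stopped(ep_state))
		return DEQ_GIVE_BACK_NOW;
	return DEQ_QUEUE_STOP_EP;
}
```

Sharing one pair of predicates between the doorbell and dequeue paths
would keep the two flag lists from drifting apart when a new flag is
added.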


I could sit and write something, I still have this stuff quite fresh
in memory after spending a few weeks debugging those crazy HW races.

Regards,
Michal

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints
  2025-03-06 14:49 ` [PATCH 15/15] xhci: Handle spurious events on Etron host isoc enpoints Mathias Nyman
@ 2025-03-07  8:27   ` Michał Pecio
  0 siblings, 0 replies; 24+ messages in thread
From: Michał Pecio @ 2025-03-07  8:27 UTC (permalink / raw)
  To: mathias.nyman; +Cc: gregkh, ki.chiang65, linux-usb, michal.pecio

> Unplugging a USB3.0 webcam from Etron hosts while streaming results
> in errors like this:
> 
> [ 2.646387] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 2.646446] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8630 trb-start 000000002fdf8640 trb-end 000000002fdf8650
> [ 2.646560] xhci_hcd 0000:03:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 18 comp_code 13
> [ 2.646568] xhci_hcd 0000:03:00.0: Looking for event-dma 000000002fdf8660 trb-start 000000002fdf8670 trb-end 000000002fdf8670
> 
> Etron xHC generates two transfer events for the TRB if an error is
> detected while processing the last TRB of an isoc TD.
> 
> The first event can be any sort of error (like USB Transaction or
> Babble Detected, etc), and the final event is Success.
> 
> The xHCI driver will handle the TD after the first event and remove
> it from its internal list, and then print a "Transfer event TRB DMA
> ptr not part of current TD" error message after the final event.
> 
> Commit 5372c65e1311 ("xhci: process isoc TD properly when there was a
> transaction error mid TD.") is designed to address isoc transaction
> errors, but unfortunately it doesn't account for this scenario.
> 
> This issue is similar to the XHCI_SPURIOUS_SUCCESS case where a
> success event follows a 'short transfer' event, but the TD the event
> points to is already given back.
> 
> Expand the spurious success 'short transfer' event handling to cover
> the spurious success after error on Etron hosts.
> 
> Kuangyi Chiang reported this issue and submitted a different solution
> based on using error_mid_td. This commit message is mostly taken
> from that patch.
> 
> Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Closes: https://lore.kernel.org/linux-usb/20241028025337.6372-6-ki.chiang65@gmail.com/
> Tested-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Tested-by: Michal Pecio <michal.pecio@gmail.com>
> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>

Such a simple HW quirk would be an obvious candidate for stable if
a Short Packet refactor weren't bundled with it.

And it is subtly broken. I could swear that I have mailed you about
it, maybe you missed it or I didn't explain myself clearly enough.

> ---
>  drivers/usb/host/xhci-ring.c | 38 ++++++++++++++++++++++++------------
>  drivers/usb/host/xhci.h      |  2 +-
>  2 files changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 2df94ed3152c..0f8acbb9cd21 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2611,6 +2611,22 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
>  	return 0;
>  }
>  
> +static bool xhci_spurious_success_tx_event(struct xhci_hcd *xhci,
> +					   struct xhci_ring *ring)
> +{
> +	switch (ring->old_trb_comp_code) {
> +	case COMP_SHORT_PACKET:
> +		return xhci->quirks & XHCI_SPURIOUS_SUCCESS;

XHCI_SPURIOUS_SUCCESS applies to practically all HCs, so this code
will typically boil down to:

return (ring->old_trb_comp_code == COMP_SHORT_PACKET);

> +	case COMP_USB_TRANSACTION_ERROR:
> +	case COMP_BABBLE_DETECTED_ERROR:
> +	case COMP_ISOCH_BUFFER_OVERRUN:
> +		return xhci->quirks & XHCI_ETRON_HOST &&
> +			ring->type == TYPE_ISOC;
> +	default:
> +		return false;
> +	}
> +}
> +
>  /*
>   * If this function returns an error condition, it means it got a Transfer
>   * event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
> @@ -2665,8 +2681,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  	case COMP_SUCCESS:
>  		if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
>  			trb_comp_code = COMP_SHORT_PACKET;
> -			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
> -				 slot_id, ep_index, ep_ring->last_td_was_short);
> +			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td comp code %d\n",
> +				 slot_id, ep_index, ep_ring->old_trb_comp_code);
>  		}
>  		break;
>  	case COMP_SHORT_PACKET:
> @@ -2817,7 +2833,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  		if (trb_comp_code != COMP_STOPPED &&
>  		    trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
>  		    !ring_xrun_event &&
> -		    !ep_ring->last_td_was_short) {
> +		    !xhci_spurious_success_tx_event(xhci, ep_ring)) {
>  			xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
>  				  slot_id, ep_index);
>  		}
> @@ -2882,11 +2898,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  
>  			/*
>  			 * Some hosts give a spurious success event after a short
> -			 * transfer. Ignore it.
> +			 * transfer or error on last TRB. Ignore it.
>  			 */
> -			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
> -			    ep_ring->last_td_was_short) {
> -				ep_ring->last_td_was_short = false;

'last_td_was_short' means "expect one more event", and it is being
cleared here after receiving said event, or at least suspecting so.

> +			if (xhci_spurious_success_tx_event(xhci, ep_ring)) {
> +				xhci_dbg(xhci, "Spurious event dma %pad, comp_code %u after %u\n",
> +					 &ep_trb_dma, trb_comp_code, ep_ring->old_trb_comp_code);
> +				ep_ring->old_trb_comp_code = trb_comp_code;

Proper equivalent here would be to reset old_trb_comp_code to some
"impossible" value (0, -1) so that xhci_spurious_success_tx_event()
ceases returning true. Otherwise, this branch will trigger again on
the next event if it's for a wrong transfer (dangerous HW or SW bug).

Specifically and explicitly, two problems are created:

1. The "one more event" we expect will always be COMP_SHORT_PACKET,
   so this code will keep silently ignoring invalid events until some
   event is handled without error or is other than Short Packet.

2. There are endpoints (e.g. async/adaptive audio, usb-serial IN, IIRC
   some UAS too) where all or most transfers complete with Short Packet
   as a matter of routine. This code will silently ignore errors until
   an event is handled without error, so it will ignore all errors.

   IOW, "TRB DMA ptr not part of current TD" can never show up as far
   as I can tell.

>  				return 0;
>  			}
>  
> @@ -2909,15 +2926,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>  	 */
>  	} while (ep->skip);
>  
> +	ep_ring->old_trb_comp_code = trb_comp_code;
> +
>  	/* Get out if a TD was queued at enqueue after the xrun occurred */
>  	if (ring_xrun_event)
>  		return 0;
>  
> -	if (trb_comp_code == COMP_SHORT_PACKET)
> -		ep_ring->last_td_was_short = true;
> -	else
> -		ep_ring->last_td_was_short = false;
> -
>  	ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
>  	trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb, ep_trb_dma);
>  
> diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
> index d9d7cd1906f3..6c00062a9acc 100644
> --- a/drivers/usb/host/xhci.h
> +++ b/drivers/usb/host/xhci.h
> @@ -1375,7 +1375,7 @@ struct xhci_ring {
>  	unsigned int		num_trbs_free; /* used only by xhci DbC */
>  	unsigned int		bounce_buf_len;
>  	enum xhci_ring_type	type;
> -	bool			last_td_was_short;
> +	u32			old_trb_comp_code;
>  	struct radix_tree_root	*trb_address_map;
>  };
>  
> -- 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
  2025-03-07  6:54   ` Michał Pecio
@ 2025-03-07 14:23     ` Mathias Nyman
  2025-03-07 15:44       ` Michał Pecio
  0 siblings, 1 reply; 24+ messages in thread
From: Mathias Nyman @ 2025-03-07 14:23 UTC (permalink / raw)
  To: Michał Pecio; +Cc: gregkh, linux-usb

On 7.3.2025 8.54, Michał Pecio wrote:
>> Ensure that an endpoint halted due to device STALL is not
>> restarted before a Clear_Feature(ENDPOINT_HALT) request is sent to
>> the device.
>>
>> The host side of the endpoint may otherwise be started early by the
>> 'Set TR Deq' command completion handler which is called if dequeue
>> is moved past a cancelled or halted TD.
>>
>> Prevent this with a new flag set for bulk and interrupt endpoints
>> when a Stall Error is received. Clear it in hcd->endpoint_reset()
>> which is called after Clear_Feature(ENDPOINT_HALT) is sent.
>>
>> Also add a debug message if a class driver queues a new URB after
>> the STALL. Note that class driver might not be aware of the STALL
>> yet when it submits the URB as URBs are given back in BH.
>>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
> 
> Sorry for coming this late, but I haven't looked closely at some
> of those xhci/for-next patches before.
> 
> This one is unfortunately incomplete, as follows:
> 
>> drivers/usb/host/xhci-ring.c | 7 +++++--
>> drivers/usb/host/xhci.c      | 6 ++++++
>> drivers/usb/host/xhci.h      | 3 ++-
>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
>> index c2e15a27338b..7643ab9ec3b4 100644
>> --- a/drivers/usb/host/xhci-ring.c
>> +++ b/drivers/usb/host/xhci-ring.c
>> @@ -556,8 +556,8 @@ void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
>> 	 * pointer command pending because the device can choose to start any
>> 	 * stream once the endpoint is on the HW schedule.
>> 	 */
>> -	if ((ep_state & EP_STOP_CMD_PENDING) || (ep_state & SET_DEQ_PENDING) ||
>> -	    (ep_state & EP_HALTED) || (ep_state & EP_CLEARING_TT))
>> +	if (ep_state & (EP_STOP_CMD_PENDING | SET_DEQ_PENDING | EP_HALTED |
>> +			EP_CLEARING_TT | EP_STALLED))
>> 		return;
> 
> Any flag added to this list needs to be added to xhci_urb_dequeue() too
> so it knows that the endpoint is held in Stopped state and URBs can be
> unlinked without trying to stop it again.

In this case it's intentional.

If we prevent xhci_urb_dequeue() from queuing a stop endpoint command due to a flag,
then we must make sure the cancelled URB is given back in the same place we clear
the flag, like we do in the command completion handlers that clear EP_HALTED and
  SET_DEQ_PENDING.

The EP_STALLED flag is cleared after a ClearFeature(ENDPOINT_HALT) control transfer
request is (successfully?) sent to the device.
If we only give back those cancelled URBs after this then we create a situation where
cancelled URB giveback is blocked and depends on the completion of another transfer
on a different endpoint.
I don't want this dependency.

It's possible that this could create some type of deadlock where the class driver ends
up waiting for cancelled URBs to be given back before it sends the request to clear
the halt, and xhci won't give back the cancelled URBs before the
ClearFeature(ENDPOINT_HALT) request completes.

Let's look at the cases where xhci_urb_dequeue() is called between setting and clearing
this new EP_STALLED flag.

EP_HALTED is set under the same spinlock as EP_STALLED, so URBs dequeued during this time
will be added to the cancelled list and given back in the xhci_handle_cmd_reset_ep()
completion handler, where EP_HALTED is also cleared. If dequeue needs to be moved then
SET_DEQ_PENDING is set, and cancelled URBs will be given back in the
xhci_handle_cmd_set_deq() completion handler.

At this stage we know the endpoint is in the stopped state, and will remain so until
EP_STALLED is cleared. If xhci_urb_dequeue() is called now then a stop endpoint command
will be queued. It will complete with a context state error due to the endpoint already
being stopped, but the URB will be given back in one of the completion handlers
mentioned before.

We could improve this codepath a bit by adding:

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 0f8acbb9cd21..c8d1651c9703 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1244,7 +1244,7 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
                          * Endpoint later. EP state is now Stopped and EP_HALTED
                          * still set because Reset EP handler will run after us.
                          */
-                       if (ep->ep_state & EP_HALTED)
+                       if (ep->ep_state & (EP_HALTED | EP_STALLED))
                                 break;
                         /*
                          * On some HCs EP state remains Stopped for some tens of

>>   	case USB_ENDPOINT_XFER_CONTROL:
>> @@ -3202,6 +3207,7 @@ static void xhci_endpoint_reset(struct usb_hcd *hcd,
>>   		return;
>>   
>>   	ep = &vdev->eps[ep_index];
>> +	ep->ep_state &= ~EP_STALLED;
> 
> ... and clearing any of those flags has always been followed by calling
> xhci_ring_ep_doorbell() again, to ensure that the endpoint is restarted
> if it has URBs on it but restart was held off due to the flag.
> 

Probably no harm in ringing the doorbell here. Should not be needed as there
shouldn't be any pending URBs, see usb core message.c comment for usb_clear_halt():

  * This is used to clear halt conditions for bulk and interrupt endpoints,
  * as reported by URB completion status.  Endpoints that are halted are
  * sometimes referred to as being "stalled".  Such endpoints are unable
  * to transmit or receive data until the halt status is cleared.  Any URBs
  * queued for such an endpoint should normally be unlinked by the driver
  * before clearing the halt condition, as described in sections 5.7.5
  * and 5.8.5 of the USB 2.0 spec.

But I don't see any harm in ringing the doorbell here either.

Thanks
Mathias

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
  2025-03-07 14:23     ` Mathias Nyman
@ 2025-03-07 15:44       ` Michał Pecio
  2025-03-07 16:18         ` Mathias Nyman
  0 siblings, 1 reply; 24+ messages in thread
From: Michał Pecio @ 2025-03-07 15:44 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: gregkh, linux-usb

On Fri, 7 Mar 2025 16:23:17 +0200, Mathias Nyman wrote:
> > Any flag added to this list needs to be added to xhci_urb_dequeue()
> > too so it knows that the endpoint is held in Stopped state and
> > URBs can be unlinked without trying to stop it again.  
> 
> In this case it's intentional.
> 
> If we prevent xhci_urb_dequeue() from queuing a stop endpoint command
> due to a flag, then we must make sure the cancelled URB is given back
> in the same place we clear the flag, like we do in the command
> completion handlers that clear EP_HALTED and SET_DEQ_PENDING.

I'm not sure why this would be, what's the problem with the approach
used for EP_CLEARING_TT currently? And if there is a problem, doesn't
EP_CLEARING_TT also have this problem?

In this case, xhci_urb_dequeue() simply takes xhci->lock and calls:

void xhci_process_cancelled_tds(struct xhci_virt_ep *ep)
{
        xhci_invalidate_cancelled_tds(ep);
        xhci_giveback_invalidated_tds(ep);
}

Unlinked URBs are either given back instantly, or Set TR Dequeue is
queued (and flagged on ep->ep_state) and the rest of the process goes
same way as usual when called from xhci_handle_cmd_stop_ep(). 

The EP will be restarted when the last flag is cleared, which may be
either SET_DEQ_PENDING or EP_CLEARING_TT/EP_STALLED.

It's practically an optimization which eliminates the dummy Stop EP
command from the process. I thought EP_STALLED could use it.

> The EP_STALLED flag is cleared after a ClearFeature(ENDPOINT_HALT)
> control transfer request is (successfully?) sent to the device.
> If we only give back those cancelled URBs after this then we create a
> situation where cancelled urb giveback is blocked and depends on the
> completion of another transfer on a different endpoint.
> I don't want this dependency.

No doubt, that would be unbounded latency and asking for trouble.

> It's possible that this could create some type of deadlock where
> the class driver ends up waiting for cancelled URBs to be given back
> before it sends the request to clear the halt, and xhci won't give
> back the cancelled URBs before the ClearFeature(ENDPOINT_HALT) request
> completes.
> 
> Let's look at the cases where xhci_urb_dequeue() is called between
> setting and clearing this new EP_STALLED flag.
> 
> EP_HALTED is set under the same spinlock as EP_STALLED, so urbs
> dequeued during this time will be added to cancelled list, and given
> back in xhci_handle_cmd_reset_ep() completion handler where also
> EP_HALTED is cleared. If dequeue needs to be moved then
> SET_DEQ_PENDING is set, and cancelled urbs will be given back in
> xhci_handle_cmd_set_deq() completion handler.
> 
> At this stage we know the endpoint is in the stopped state, and will
> remain so until EP_STALLED is cleared. If xhci_urb_dequeue() is called
> now then a stop endpoint command will be queued, it will complete with
> a context state error due to the endpoint already being stopped, but
> the URB will be given back in one of the completion handlers mentioned
> before.

Yes, it works, but in this case the "shortcut" will also work.

One problem with pointless Stop EP commands I remember is that there
is code in xhci-hub.c:xhci_stop_device() which avoids queuing Stop EP
on stopped endpoints, supposedly because it triggers some HW bug.

So the idea of these Stop EP patches was to eliminate such cases. It
also simplifies the completion handler and avoids needing:

> We could improve this codepath a bit by adding:
> [...]


Michal


* Re: [PATCH 12/15] xhci: Prevent early endpoint restart when handling STALL errors.
  2025-03-07 15:44       ` Michał Pecio
@ 2025-03-07 16:18         ` Mathias Nyman
  0 siblings, 0 replies; 24+ messages in thread
From: Mathias Nyman @ 2025-03-07 16:18 UTC (permalink / raw)
  To: Michał Pecio; +Cc: gregkh, linux-usb

On 7.3.2025 17.44, Michał Pecio wrote:
> On Fri, 7 Mar 2025 16:23:17 +0200, Mathias Nyman wrote:
>>> Any flag added to this list needs to be added to xhci_urb_dequeue()
>>> too so it knows that the endpoint is held in Stopped state and
>>> URBs can be unlinked without trying to stop it again.
>>
>> In this case it's intentional.
>>
>> If we prevent xhci_urb_dequeue() from queuing a stop endpoint command
>> due to a flag, then we must make sure the cancelled URB is given back
>> in the same place we clear the flag, like we do in the command
>> completion handlers that clear EP_HALTED and SET_DEQ_PENDING.
> 
> I'm not sure why this would be, what's the problem with the approach
> used for EP_CLEARING_TT currently? And if there is a problem, doesn't
> EP_CLEARING_TT also have this problem?
> 
> In this case, xhci_urb_dequeue() simply takes xhci->lock and calls:
> 
> void xhci_process_cancelled_tds(struct xhci_virt_ep *ep)
> {
>          xhci_invalidate_cancelled_tds(ep);
>          xhci_giveback_invalidated_tds(ep);
> }
> 
> Unlinked URBs are either given back instantly, or Set TR Dequeue is
> queued (and flagged on ep->ep_state) and the rest of the process goes
> the same way as usual when called from xhci_handle_cmd_stop_ep().
> 
> The EP will be restarted when the last flag is cleared, which may be
> either SET_DEQ_PENDING or EP_CLEARING_TT/EP_STALLED.
> 
> It's practically an optimization which eliminates the dummy Stop EP
> command from the process. I thought EP_STALLED could use it.
> 

This should work, and avoid that unnecessary stop endpoint command.

Just need to make sure we check for EP_STALLED flag after the other
(EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING) flags in
xhci_urb_dequeue(), just like EP_CLEARING_TT case.
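
A rough stand-alone sketch of that check order, as a model only and not the
actual xhci_urb_dequeue() code: enum model_action, model_dequeue_action()
and the bit values are hypothetical names and placeholders.

```c
/* Illustrative bit values only; the real definitions live in xhci.h. */
#define SET_DEQ_PENDING      (1u << 0)
#define EP_HALTED            (1u << 1)
#define EP_STOP_CMD_PENDING  (1u << 2)
#define EP_CLEARING_TT       (1u << 3)
#define EP_STALLED           (1u << 4)

/* What the dequeue path would do with a cancelled URB in this model. */
enum model_action {
	WAIT_FOR_HANDLER,	/* a pending command's completion handler gives it back */
	PROCESS_NOW,		/* endpoint held stopped, handle cancelled TDs directly */
	QUEUE_STOP_EP,		/* endpoint may be running, stop it first */
};

static enum model_action model_dequeue_action(unsigned int ep_state)
{
	/* Checked first: a command is in flight and its handler does the work. */
	if (ep_state & (EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING))
		return WAIT_FOR_HANDLER;

	/* Checked second: no command pending, but the endpoint is held stopped. */
	if (ep_state & (EP_CLEARING_TT | EP_STALLED))
		return PROCESS_NOW;

	return QUEUE_STOP_EP;
}
```

The order matters in this model: with EP_HALTED | EP_STALLED set the result
is WAIT_FOR_HANDLER, so the URB is left to the Reset Endpoint completion
handler instead of being processed while that command is still pending.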

Also need to protect clearing the EP_STALLED flag with the lock.

I'll either send an updated patch next week, or during the rc cycle if
that's too late.

Thanks
Mathias


