linux-usb.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/9] Various xhci fixes and improvements
@ 2024-08-27 17:46 Michal Pecio
  2024-08-27 17:47 ` [PATCH 1/9] usb: xhci: Fix double newline in a debug message Michal Pecio
                   ` (10 more replies)
  0 siblings, 11 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:46 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Hi,

Here's a handful of bug fixes, cleanups and improvements for the xHCI
driver.

The first is a trivial fix, the second also fixes a bug albeit a less
trivial one. These two maybe deserve to go into usb-linus, the latter
perhaps to stable as well, but the patch doesn't apply as-is.

The rest are new functionality or code cleanups with low anticipated
user impact.

All patches were applied and tested on v6.11-rc5, all compiled cleanly
and worked without regressions with some HID, storage, audio, video.
Functional changes received additional functional testing described in
their respective changelogs.


Notably missing is a solution to the "Underrun after Missed Service"
problem. To recap, Underrun/Overrun typically has a NULL TRB pointer
and ep_ring->td_list contains some missed TDs and possibly a few that
have been added after the underrun occured but before we got the IRQ.
No way to tell them apart, at least by the spec, as far as I see.

On USB 3.1+ hosts we can get out of it with "expedited skipping" - it
ensures that the ring is never left with stale TDs in the first place.

On USB 3.0 hosts the underrun handler *will* sometimes face skipping
and it needs to make a decision.

Currently, it skips everything. This may cause DMA-after-free - not
great on IN endpoints - and "TRB DMA ptr not part of current TD" or
"WARN Event TRB with no TDs queued" later.

The obvious alterantive is to never skip on empty ring events, but it
can deadlock a driver which waits for its URBs to be given back when
all of them were missed and we can't get a valid dequeue from the HC.


I wonder if it would make sense to always skip the first queued TD
when we get an MSE with NULL pointer? I think it's legal, and likely
sufficient to avoid the stupid deadlock.

Just a last minute idea. I will think about it, but now I'd like to
flush this patch queue before it grows to infinity ;)

Thanks,
Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/9] usb: xhci: Fix double newline in a debug message
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
@ 2024-08-27 17:47 ` Michal Pecio
  2024-08-27 17:48 ` [PATCH 2/9] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:47 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

xhci_dbg_trace() appends a newline to the format string,
don't call it with a newline terminated string.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..e960609dcf51 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -723,9 +723,9 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
 	ep->queued_deq_seg = new_seg;
 	ep->queued_deq_ptr = new_deq;
 
 	xhci_dbg_trace(xhci, trace_xhci_dbg_cancel_urb,
-		       "Set TR Deq ptr 0x%llx, cycle %u\n", addr, new_cycle);
+		       "Set TR Deq ptr 0x%llx, cycle %u", addr, new_cycle);
 
 	/* Stop the TD queueing code from ringing the doorbell until
 	 * this command completes.  The HC won't set the dequeue pointer
 	 * if the ring is running, and ringing the doorbell starts the
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/9] usb: xhci: Fix handling errors mid TD followed by other errors
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
  2024-08-27 17:47 ` [PATCH 1/9] usb: xhci: Fix double newline in a debug message Michal Pecio
@ 2024-08-27 17:48 ` Michal Pecio
  2024-08-27 17:49 ` [PATCH 3/9] usb: xhci: Clean up the TD skipping loop Michal Pecio
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:48 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Some host controllers fail to produce the final completion event on an
isochronous TD which experienced an error mid TD. We deal with it by
flagging such TDs and checking if the next event points at the flagged
TD or at the next one, and completing the flagged TD if the latter.

This is not enough, because the next TD may be missed by the xHC. And
we need to do the check early, or errors on the next TD may confuse us.

If the next TD experiences a Missed Service Error, we will set the skip
flag on the endpoint and then attempt skipping TDs when yet another
event arrives. In such scenario, we ought to report the 'erorr mid TD'
transfer as such rather than skip it. We also need to detect that it is
there and in need of handling, despite never receiving a completion of
the subsequent TD. For this purpose, add a check for ep->skip flag and
move the whole error handling logic before the skipping logic.

Note that when we see both td->error_mid_td and ep->skip, they surely
must have been set in this order. Otherwise, skip flag would have been
cleared when the error mid TD was being handled.

Another problem case are Stopped Endpoint events. If we see one after
an error mid TD, we naively assume that it's a Forced Stopped Event
because it doesn't match the pending TD, even though it might actually
be a Stopped Event on the next TD, which we fail to recognize. This is
fixed similarly, by reordering erorr handling before FSE handling.

Lastly, error mid TD handling is reordered with last_td_was_short. This
is harmless, because these two cases are mutually exclusive - only one
can happen in any given execution of handle_tx_event().

And if we are here, add a FIXME comment on the last_td_was_short thing
because it's a wholly broken idea.

Tested on the NEC host and a USB camera with flaky cable. Dynamic debug
confirmed that Transaction Errors are sometimes seen, sometimes mid-TD,
sometimes followed by Missed Service. In such cases, they were finished
properly before skipping began.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 78 ++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e960609dcf51..401c34ff2260 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2798,8 +2798,39 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 		/* Is this a TRB in the currently executing TD? */
 		ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
 
+		if (!ep_seg) {
+			/*
+			 * xhci 4.10.2 states isoc endpoints should continue
+			 * processing the next TD if there was an error mid TD.
+			 * So host like NEC don't generate an event for the last
+			 * isoc TRB even if the IOC flag is set.
+			 * xhci 4.9.1 states that if there are errors in mult-TRB
+			 * TDs xHC should generate an error for that TRB, and if xHC
+			 * proceeds to the next TD it should genete an event for
+			 * any TRB with IOC flag on the way. Other host follow this.
+			 * So this event might be for the next TD.
+			 */
+			if (td->error_mid_td &&
+			    !list_is_last(&td->td_list, &ep_ring->td_list)) {
+				struct xhci_td *td_next = list_next_entry(td, td_list);
+
+				/* Is this a TRB in the next queued TD? */
+				/* *OR* did we miss some TDs meanwhile? */
+				ep_seg = trb_in_td(xhci, td_next, ep_trb_dma, false);
+				if (ep_seg || ep->skip) {
+					/* give back previous TD, start handling new */
+					xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+					ep_ring->dequeue = td->last_trb;
+					ep_ring->deq_seg = td->last_trb_seg;
+					inc_deq(xhci, ep_ring);
+					xhci_td_cleanup(xhci, td, ep_ring, td->status);
+					td = td_next;
+				}
+			}
+		}
+
 		if (!ep_seg) {
 
 			if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
 				skip_isoc_td(xhci, td, ep, status);
@@ -2820,53 +2851,24 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 			/*
 			 * Some hosts give a spurious success event after a short
 			 * transfer. Ignore it.
+			 * FIXME xHCI 4.10.1.1: this should be freed now, not mid-TD
 			 */
 			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
 			    ep_ring->last_td_was_short) {
 				ep_ring->last_td_was_short = false;
 				return 0;
 			}
 
-			/*
-			 * xhci 4.10.2 states isoc endpoints should continue
-			 * processing the next TD if there was an error mid TD.
-			 * So host like NEC don't generate an event for the last
-			 * isoc TRB even if the IOC flag is set.
-			 * xhci 4.9.1 states that if there are errors in mult-TRB
-			 * TDs xHC should generate an error for that TRB, and if xHC
-			 * proceeds to the next TD it should genete an event for
-			 * any TRB with IOC flag on the way. Other host follow this.
-			 * So this event might be for the next TD.
-			 */
-			if (td->error_mid_td &&
-			    !list_is_last(&td->td_list, &ep_ring->td_list)) {
-				struct xhci_td *td_next = list_next_entry(td, td_list);
-
-				ep_seg = trb_in_td(xhci, td_next, ep_trb_dma, false);
-				if (ep_seg) {
-					/* give back previous TD, start handling new */
-					xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
-					ep_ring->dequeue = td->last_trb;
-					ep_ring->deq_seg = td->last_trb_seg;
-					inc_deq(xhci, ep_ring);
-					xhci_td_cleanup(xhci, td, ep_ring, td->status);
-					td = td_next;
-				}
-			}
-
-			if (!ep_seg) {
-				/* HC is busted, give up! */
-				xhci_err(xhci,
-					"ERROR Transfer event TRB DMA ptr not "
-					"part of current TD ep_index %d "
-					"comp_code %u\n", ep_index,
-					trb_comp_code);
-				trb_in_td(xhci, td, ep_trb_dma, true);
-
-				return -ESHUTDOWN;
-			}
+			/* HC is busted, give up! */
+			xhci_err(xhci,
+				"ERROR Transfer event TRB DMA ptr not "
+				"part of current TD ep_index %d "
+				"comp_code %u\n", ep_index,
+				trb_comp_code);
+			trb_in_td(xhci, td, ep_trb_dma, true);
+			return -ESHUTDOWN;
 		}
 
 		if (ep->skip) {
 			xhci_dbg(xhci,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/9] usb: xhci: Clean up the TD skipping loop
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
  2024-08-27 17:47 ` [PATCH 1/9] usb: xhci: Fix double newline in a debug message Michal Pecio
  2024-08-27 17:48 ` [PATCH 2/9] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
@ 2024-08-27 17:49 ` Michal Pecio
  2024-08-27 17:50 ` [PATCH 4/9] usb: xhci: Expedite processing missed isoch TDs on modern HCs Michal Pecio
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:49 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Recent reworks left this loop ending with:

		if (ep->skip)
			ep->skip = false;
	while (ep->skip);

which obviously cannot ever repeat. Repetition is only possible by a
'continue' statement earlier in the loop body.

Move the "tail" which only executes once out of the loop body.

Bring the cases of "Found TD" and "skip isoc td" closely together to
consolidate the skipping logic in one place and improve clarity. Now
this code properly controls the loop condition.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 102 ++++++++++++++++++-----------------
 1 file changed, 52 insertions(+), 50 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 401c34ff2260..e65cc80cb285 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2829,63 +2829,65 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 				}
 			}
 		}
 
-		if (!ep_seg) {
-
-			if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
-				skip_isoc_td(xhci, td, ep, status);
-				continue;
-			}
-
-			/*
-			 * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
-			 * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
-			 * pointer still at the previous TRB of the current TD. The previous TRB
-			 * maybe a Link TD or the last TRB of the previous TD. The command
-			 * completion handle will take care the rest.
-			 */
-			if (trb_comp_code == COMP_STOPPED ||
-			    trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
-				return 0;
-			}
-
-			/*
-			 * Some hosts give a spurious success event after a short
-			 * transfer. Ignore it.
-			 * FIXME xHCI 4.10.1.1: this should be freed now, not mid-TD
-			 */
-			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-			    ep_ring->last_td_was_short) {
-				ep_ring->last_td_was_short = false;
-				return 0;
-			}
-
-			/* HC is busted, give up! */
-			xhci_err(xhci,
-				"ERROR Transfer event TRB DMA ptr not "
-				"part of current TD ep_index %d "
-				"comp_code %u\n", ep_index,
-				trb_comp_code);
-			trb_in_td(xhci, td, ep_trb_dma, true);
-			return -ESHUTDOWN;
-		}
-
+		/*
+		 * If ep->skip is set, it means there are missed tds on the
+		 * endpoint ring need to take care of.
+		 * Process them as short transfer until reach the td pointed by
+		 * the event.
+		 */
 		if (ep->skip) {
-			xhci_dbg(xhci,
-				 "Found td. Clear skip flag for slot %u ep %u.\n",
-				 slot_id, ep_index);
-			ep->skip = false;
+			if (ep_seg) {
+				xhci_dbg(xhci,
+					 "Found td. Clear skip flag for slot %u ep %u.\n",
+					 slot_id, ep_index);
+				ep->skip = false;
+			} else {
+				if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
+					skip_isoc_td(xhci, td, ep, status);
+				else
+					break;
+			}
 		}
 
-	/*
-	 * If ep->skip is set, it means there are missed tds on the
-	 * endpoint ring need to take care of.
-	 * Process them as short transfer until reach the td pointed by
-	 * the event.
-	 */
 	} while (ep->skip);
 
+	if (!ep_seg) {
+
+		/*
+		 * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
+		 * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
+		 * pointer still at the previous TRB of the current TD. The previous TRB
+		 * maybe a Link TD or the last TRB of the previous TD. The command
+		 * completion handle will take care the rest.
+		 */
+		if (trb_comp_code == COMP_STOPPED ||
+		    trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
+			return 0;
+		}
+
+		/*
+		 * Some hosts give a spurious success event after a short
+		 * transfer. Ignore it.
+		 * FIXME xHCI 4.10.1.1: this should be freed now, not mid-TD
+		 */
+		if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+		    ep_ring->last_td_was_short) {
+			ep_ring->last_td_was_short = false;
+			return 0;
+		}
+
+		/* HC is busted, give up! */
+		xhci_err(xhci,
+			"ERROR Transfer event TRB DMA ptr not "
+			"part of current TD ep_index %d "
+			"comp_code %u\n", ep_index,
+			trb_comp_code);
+		trb_in_td(xhci, td, ep_trb_dma, true);
+		return -ESHUTDOWN;
+	}
+
 	if (trb_comp_code == COMP_SHORT_PACKET)
 		ep_ring->last_td_was_short = true;
 	else
 		ep_ring->last_td_was_short = false;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/9] usb: xhci: Expedite processing missed isoch TDs on modern HCs
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (2 preceding siblings ...)
  2024-08-27 17:49 ` [PATCH 3/9] usb: xhci: Clean up the TD skipping loop Michal Pecio
@ 2024-08-27 17:50 ` Michal Pecio
  2024-08-27 17:51 ` [PATCH 5/9] usb: xhci: Simplify error_mid_td marking Michal Pecio
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:50 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

xHCI spec rev. 1.0 allowed the TRB pointer of Missed Service events
to be NULL. In such case we have no idea which queued TD were missed
and which are still waiting for execution (and the spec forbids us
guessing based on frame numbers), so all we can do is set a flag on
the endpoint and wait for a new event with a valid dequeue pointer.

But hosts are also allowed to give us pointer to the last missed TRB
and this became mandatory in spec rev. 1.1 and later.

Use this pointer, if available, to immediately skip all missed TDs.
This may be particularly helpful for applications which queue very
few URBs/TDs for the sake of lowest possible latency.

It also saves the day if the next event is an underrun/overrun and
we will not get a valid dequeue pointer with it. If we then saw the
skip flag, we would have no way of knowing how many TDs were missed
and how many were queued after the host encountered ring underrun.

Handle Missed Service Error events as "error mid TD", if applicable,
because rev. 1.0 spec excplicitly says so in notes to 4.10.3.2. The
notes are no longer present in later revs, but rules of 4.9.1 should
apply universally.

Note that handle_tx_event() can cope with a host reporting MSE on an
early TRB of a TD and then failing to signal the final TRB. However,
in such (hopefully rare) case latency is not improved by this patch.

Tested on ASMedia ASM3142. This USB 3.1 host gives valid pointers in
Missed Service events and the skipping loop works until it finds the
last missed TRB indicated by the host. Then the skip flag is cleared
and the TRB passed to process_isoc_td() like any other.

Tested on NEC and VIA VL805. These USB 3.0 hosts give NULL pointers
so the event handler sets the skip flag and returns, as expected.

Change inspired by a recent discussion about realtime USB audio.

Link: https://lore.kernel.org/linux-usb/76e1a191-020d-4a76-97f6-237f9bd0ede0@gmx.net/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e65cc80cb285..cc0420021683 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2413,8 +2413,14 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		frame->status = -EOVERFLOW;
 		if (ep_trb != td->last_trb)
 			td->error_mid_td = true;
 		break;
+	case COMP_MISSED_SERVICE_ERROR:
+		frame->status = -EXDEV;
+		sum_trbs_for_length = true;
+		if (ep_trb != td->last_trb)
+			td->error_mid_td = true;
+		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
 		frame->status = -EPROTO;
 		break;
@@ -2730,13 +2736,17 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 * When encounter missed service error, one or more isoc tds
 		 * may be missed by xHC.
 		 * Set skip flag of the ep_ring; Complete the missed tds as
 		 * short transfer when process the ep_ring next time.
+		 * If the xHC tells us the last missed TRB (ep_trb_dma != NULL)
+		 * perform the skipping right away.
 		 */
 		ep->skip = true;
 		xhci_dbg(xhci,
-			 "Miss service interval error for slot %u ep %u, set skip flag\n",
-			 slot_id, ep_index);
+			 "Miss service interval error for slot %u ep %u, set skip flag, go ahead %d\n",
+			 slot_id, ep_index, !!ep_trb_dma);
+		if (ep_trb_dma)
+			break;
 		return 0;
 	case COMP_NO_PING_RESPONSE_ERROR:
 		ep->skip = true;
 		xhci_dbg(xhci,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/9] usb: xhci: Simplify error_mid_td marking
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (3 preceding siblings ...)
  2024-08-27 17:50 ` [PATCH 4/9] usb: xhci: Expedite processing missed isoch TDs on modern HCs Michal Pecio
@ 2024-08-27 17:51 ` Michal Pecio
  2024-08-27 17:52 ` [PATCH 6/9] usb: xhci: Sanity check "spurious success" handling Michal Pecio
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:51 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

It doesn't matter whether we set this flag or not on the final TRB,
because the TD is about to be immediately "finished" anyway.

Without these checks the code looks cleaner and easier to follow.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index cc0420021683..c777cb897579 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2410,26 +2410,23 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		sum_trbs_for_length = true;
 		fallthrough;
 	case COMP_ISOCH_BUFFER_OVERRUN:
 		frame->status = -EOVERFLOW;
-		if (ep_trb != td->last_trb)
-			td->error_mid_td = true;
+		td->error_mid_td = true;
 		break;
 	case COMP_MISSED_SERVICE_ERROR:
 		frame->status = -EXDEV;
 		sum_trbs_for_length = true;
-		if (ep_trb != td->last_trb)
-			td->error_mid_td = true;
+		td->error_mid_td = true;
 		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
 		frame->status = -EPROTO;
 		break;
 	case COMP_USB_TRANSACTION_ERROR:
 		frame->status = -EPROTO;
 		sum_trbs_for_length = true;
-		if (ep_trb != td->last_trb)
-			td->error_mid_td = true;
+		td->error_mid_td = true;
 		break;
 	case COMP_STOPPED:
 		sum_trbs_for_length = true;
 		break;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/9] usb: xhci: Sanity check "spurious success" handling
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (4 preceding siblings ...)
  2024-08-27 17:51 ` [PATCH 5/9] usb: xhci: Simplify error_mid_td marking Michal Pecio
@ 2024-08-27 17:52 ` Michal Pecio
  2024-08-27 17:53 ` [PATCH 7/9] usb: xhci: Don't lie about trb_comp_code in handle_tx_event() Michal Pecio
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:52 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

It's not spurious, it's expected, it's required by the spec since its
final 1.0 revision 14 years ago, and it's handled incorrectly here.
But until somebody puts some effort to get it all right, try at least
not to do obviously wrong things here.

This code claims to handle "spurious" Success events, but in reality
it is ready and willing to silently swallow any kind of event, on most
host controllers these days, after any short transfer.

It got in my way while debugging genuinely incorrect events from the
xHC, which this code thought were meant for it, because it has no way
of knowing better, because it's utterly broken.

Limit it at least to only accept Success and Short Packet completions
so that rightful warnings will be printed in other cases.

So far I found no instances of this change exposing previously hidden
errors, besides the aforementioned case of a buggy xHC. The buggy xHC
completely fails to acknowledge some TDs in any way. It proceeeds to
the next TD, whose event then doesn't match the current TD, so if the
prior TD got a short packet, the "spurious" code swallows the event.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index c777cb897579..e19c8a17b59c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2785,9 +2785,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 			 */
 
 			if (!(trb_comp_code == COMP_STOPPED ||
 			      trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
-			      ep_ring->last_td_was_short)) {
+			      (trb_comp_code == COMP_SUCCESS && ep_ring->last_td_was_short) ||
+			      (trb_comp_code == COMP_SHORT_PACKET && ep_ring->last_td_was_short))) {
 				xhci_warn(xhci, "WARN Event TRB for slot %u ep %d with no TDs queued?\n",
 					  slot_id, ep_index);
 			}
 			if (ep->skip) {
@@ -2878,9 +2879,11 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		 * transfer. Ignore it.
 		 * FIXME xHCI 4.10.1.1: this should be freed now, not mid-TD
 		 */
 		if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-		    ep_ring->last_td_was_short) {
+		    ep_ring->last_td_was_short &&
+		    (trb_comp_code == COMP_SUCCESS ||
+		    trb_comp_code == COMP_SHORT_PACKET)) {
 			ep_ring->last_td_was_short = false;
 			return 0;
 		}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/9] usb: xhci: Don't lie about trb_comp_code in handle_tx_event()
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (5 preceding siblings ...)
  2024-08-27 17:52 ` [PATCH 6/9] usb: xhci: Sanity check "spurious success" handling Michal Pecio
@ 2024-08-27 17:53 ` Michal Pecio
  2024-08-27 17:54 ` [PATCH 8/9] usb: xhci: Print completion code in the empty ring warning Michal Pecio
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:53 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

This variable normally is an endian-corrected copy of the completion
code received from hardware, except for one case where it is changed
in order to trick some later code into setting some flag.

This can be confusing when analyzing or debugging the function and
the false completion code is sometimes written to dmesg too.

For even more confusion, functions called by handle_tx_event() also
have same-named variables but they initialize them from scratch from
the hardware event, undoing this change within their scope.

Use a dedicated local variable instead of such machinations.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e19c8a17b59c..58e6d0280e00 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2613,8 +2613,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	union xhci_trb *ep_trb;
 	int status = -EINPROGRESS;
 	struct xhci_ep_ctx *ep_ctx;
 	u32 trb_comp_code;
+	bool short_packet = false;
 
 	slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
 	ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
@@ -2645,14 +2646,15 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 	 * transfer type
 	 */
 	case COMP_SUCCESS:
 		if (EVENT_TRB_LEN(le32_to_cpu(event->transfer_len)) != 0) {
-			trb_comp_code = COMP_SHORT_PACKET;
+			short_packet = true;
 			xhci_dbg(xhci, "Successful completion on short TX for slot %u ep %u with last td short %d\n",
 				 slot_id, ep_index, ep_ring->last_td_was_short);
 		}
 		break;
 	case COMP_SHORT_PACKET:
+		short_packet = true;
 		break;
 	/* Completion codes for endpoint stopped state */
 	case COMP_STOPPED:
 		xhci_dbg(xhci, "Stopped on Transfer TRB for slot %u ep %u\n",
@@ -2896,12 +2898,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		trb_in_td(xhci, td, ep_trb_dma, true);
 		return -ESHUTDOWN;
 	}
 
-	if (trb_comp_code == COMP_SHORT_PACKET)
-		ep_ring->last_td_was_short = true;
-	else
-		ep_ring->last_td_was_short = false;
+	ep_ring->last_td_was_short = short_packet;
 
 	ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
 	trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb);
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 8/9] usb: xhci: Print completion code in the empty ring warning
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (6 preceding siblings ...)
  2024-08-27 17:53 ` [PATCH 7/9] usb: xhci: Don't lie about trb_comp_code in handle_tx_event() Michal Pecio
@ 2024-08-27 17:54 ` Michal Pecio
  2024-08-27 17:55 ` [PATCH 9/9] usb: xhci: Clean up and rename bad stream event handler Michal Pecio
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:54 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

This gives a better picture of what was happening when the ring
was found unexpectedly empty.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 58e6d0280e00..9d1af76955cd 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2789,10 +2789,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 			if (!(trb_comp_code == COMP_STOPPED ||
 			      trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
 			      (trb_comp_code == COMP_SUCCESS && ep_ring->last_td_was_short) ||
 			      (trb_comp_code == COMP_SHORT_PACKET && ep_ring->last_td_was_short))) {
-				xhci_warn(xhci, "WARN Event TRB for slot %u ep %d with no TDs queued?\n",
-					  slot_id, ep_index);
+				xhci_warn(xhci, "WARN Event TRB for slot %u ep %d comp_code %d with no TDs queued?\n",
+					  slot_id, ep_index, trb_comp_code);
 			}
 			if (ep->skip) {
 				ep->skip = false;
 				xhci_dbg(xhci, "td_list is empty while skip flag set. Clear skip flag for slot %u ep %u.\n",
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 9/9] usb: xhci: Clean up and rename bad stream event handler
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (7 preceding siblings ...)
  2024-08-27 17:54 ` [PATCH 8/9] usb: xhci: Print completion code in the empty ring warning Michal Pecio
@ 2024-08-27 17:55 ` Michal Pecio
  2024-08-28 10:21 ` [PATCH 0/9] Various xhci fixes and improvements FPS
  2024-08-28 11:14 ` Mathias Nyman
  10 siblings, 0 replies; 16+ messages in thread
From: Michal Pecio @ 2024-08-27 17:55 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: linux-usb

Function handle_transferless_tx_event() pretends that it sometimes
deals with certain isochronous events, but it really doesn't.

All isochronous events are handled in the big switch statement of
handle_tx_event(). The above function is never called on isoch EPs
because xhci_dma_to_transfer_ring() never returns NULL on those.

The only conceivable way we could end up there is if an isoch EP
is marked EP_HAS_STREAMS or its ring pointer is NULL, or we get an
X-run event on a non-isoch EP. Either way it's a blatant bug, so
don't "handle" this with a no-op, but print the default warning.

The actual meaningful work done by this function is cleaning up
after various problems on streams in questionable state, so give
it a more specific name and add a bit of comment.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9d1af76955cd..aebbdbfd8da8 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2566,10 +2566,14 @@ static int process_bulk_intr_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 
 	return finish_td(xhci, ep, ep_ring, td, trb_comp_code);
 }
 
-/* Transfer events which don't point to a transfer TRB, see xhci 4.17.4 */
-static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
+/*
+ * Transfer events which don't point to a transfer TRB, see xhci 4.17.4
+ * Specifically, this deals with cases where the EP has streams and the event
+ * TRB pointer is NULL or otherwise doesn't point into any known stream ring.
+ */
+static int handle_bad_stream_event(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 					u32 trb_comp_code)
 {
 	switch (trb_comp_code) {
 	case COMP_STALL_ERROR:
@@ -2581,10 +2585,8 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
 			xhci_handle_halted_endpoint(xhci, ep, NULL, EP_HARD_RESET);
 		else
 			xhci_handle_halted_endpoint(xhci, ep, NULL, EP_SOFT_RESET);
 		break;
-	case COMP_RING_UNDERRUN:
-	case COMP_RING_OVERRUN:
 	case COMP_STOPPED_LENGTH_INVALID:
 		break;
 	default:
 		xhci_err(xhci, "Transfer event %u for unknown stream ring slot %u ep %u\n",
@@ -2637,9 +2639,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 		goto err_out;
 	}
 
 	if (!ep_ring)
-		return handle_transferless_tx_event(xhci, ep, trb_comp_code);
+		return handle_bad_stream_event(xhci, ep, trb_comp_code);
 
 	/* Look for common error cases */
 	switch (trb_comp_code) {
 	/* Skip codes that require special handling depending on
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (8 preceding siblings ...)
  2024-08-27 17:55 ` [PATCH 9/9] usb: xhci: Clean up and rename bad stream event handler Michal Pecio
@ 2024-08-28 10:21 ` FPS
  2024-08-28 13:24   ` Michał Pecio
  2024-08-28 11:14 ` Mathias Nyman
  10 siblings, 1 reply; 16+ messages in thread
From: FPS @ 2024-08-28 10:21 UTC (permalink / raw)
  To: Michal Pecio, Mathias Nyman; +Cc: linux-usb



On 8/27/24 7:46 PM, Michal Pecio wrote:
> Hi,
>
> Here's a handful of bug fixes, cleanups and improvements for the xHCI
> driver.

[...]

> All patches were applied and tested on v6.11-rc5, all compiled cleanly
> and worked without regressions with some HID, storage, audio, video.
> Functional changes received additional functional testing described in
> their respective changelogs.

Hi!

I have a 6.11-rc5 tree here (tar.gz from kernel.org) and trying to apply this series I get:

$ for n in ~/Downloads/michal\ patches/20240827-\[PATCH\ *; do echo "$n"; patch --dry-run -p 1 < "$n"; done
/home/fps/Downloads/michal patches/20240827-[PATCH 1_9] usb_ xhci_ Fix double newline in a deb-148577.txt
checking file drivers/usb/host/xhci-ring.c
/home/fps/Downloads/michal patches/20240827-[PATCH 2_9] usb_ xhci_ Fix handling errors mid TD -148571.txt
checking file drivers/usb/host/xhci-ring.c
/home/fps/Downloads/michal patches/20240827-[PATCH 3_9] usb_ xhci_ Clean up the TD skipping lo-148572.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 FAILED at 2829.
1 out of 1 hunk FAILED
/home/fps/Downloads/michal patches/20240827-[PATCH 4_9] usb_ xhci_ Expedite processing missed -148579.txt
checking file drivers/usb/host/xhci-ring.c
/home/fps/Downloads/michal patches/20240827-[PATCH 5_9] usb_ xhci_ Simplify error_mid_td marki-148573.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 FAILED at 2410.
1 out of 1 hunk FAILED
/home/fps/Downloads/michal patches/20240827-[PATCH 6_9] usb_ xhci_ Sanity check _spurious succ-148574.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 succeeded at 2778 (offset -7 lines).
Hunk #2 FAILED at 2879.
1 out of 2 hunks FAILED
/home/fps/Downloads/michal patches/20240827-[PATCH 7_9] usb_ xhci_ Don't lie about trb_comp_co-148575.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 succeeded at 2610 (offset -3 lines).
Hunk #2 succeeded at 2643 (offset -3 lines).
Hunk #3 FAILED at 2898.
1 out of 3 hunks FAILED
/home/fps/Downloads/michal patches/20240827-[PATCH 8_9] usb_ xhci_ Print completion code in th-148576.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 FAILED at 2789.
1 out of 1 hunk FAILED
/home/fps/Downloads/michal patches/20240827-[PATCH 9_9] usb_ xhci_ Clean up and rename bad str-148580.txt
checking file drivers/usb/host/xhci-ring.c
Hunk #1 succeeded at 2563 (offset -3 lines).
Hunk #2 succeeded at 2582 (offset -3 lines).
Hunk #3 succeeded at 2635 (offset -4 lines).

Are you sure these are against 6.11-rc5? Or did I screw up on my side?

Kind regards,
FPS

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
                   ` (9 preceding siblings ...)
  2024-08-28 10:21 ` [PATCH 0/9] Various xhci fixes and improvements FPS
@ 2024-08-28 11:14 ` Mathias Nyman
  2024-08-28 15:02   ` Michał Pecio
  2024-08-30 11:53   ` Michał Pecio
  10 siblings, 2 replies; 16+ messages in thread
From: Mathias Nyman @ 2024-08-28 11:14 UTC (permalink / raw)
  To: Michal Pecio, Mathias Nyman; +Cc: linux-usb, Niklas Neronin

On 27.8.2024 20.46, Michal Pecio wrote:
> Hi,
> 
> Here's a handful of bug fixes, cleanups and improvements for the xHCI
> driver.
> 
> The first is a trivial fix, the second also fixes a bug albeit a less
> trivial one. These two maybe deserve to go into usb-linus, the latter
> perhaps to stable as well, but the patch doesn't apply as-is.
> 
> The rest are new functionality or code cleanups with low anticipated
> user impact.
> 
> All patches were applied and tested on v6.11-rc5, all compiled cleanly
> and worked without regressions with some HID, storage, audio, video.
> Functional changes received additional functional testing described in
> their respective changelogs.
> 
> 
> Notably missing is a solution to the "Underrun after Missed Service"
> problem. To recap, Underrun/Overrun typically has a NULL TRB pointer
> and ep_ring->td_list contains some missed TDs and possibly a few that
> have been added after the underrun occured but before we got the IRQ.
> No way to tell them apart, at least by the spec, as far as I see.
> 
> On USB 3.1+ hosts we can get out of it with "expedited skipping" - it
> ensures that the ring is never left with stale TDs in the first place.
> 
> On USB 3.0 hosts the underrun handler *will* sometimes face skipping
> and it needs to make a decision.
> 
> Currently, it skips everything. This may cause DMA-after-free - not
> great on IN endpoints - and "TRB DMA ptr not part of current TD" or
> "WARN Event TRB with no TDs queued" later.
> 
> The obvious alterantive is to never skip on empty ring events, but it
> can deadlock a driver which waits for its URBs to be given back when
> all of them were missed and we can't get a valid dequeue from the HC.
> 
> 
> I wonder if it would make sense to always skip the first queued TD
> when we get an MSE with NULL pointer? I think it's legal, and likely
> sufficient to avoid the stupid deadlock.
> 
> Just a last minute idea. I will think about it, but now I'd like to
> flush this patch queue before it grows to infinity ;)
> 

Thanks Michal for this series.

I need to go through the patches individually and pick fixes.

We are in the middle of reworking transfer event handling with Niklas as well.

I'd be grateful if you could take a quick look at our solution and give
your opinion on it as you have expertise in this area.

Main idea is simple.
If a transfer event points to a TRB between dequeue and enqueue we give
back all queued TDs passed up to this TRB.
And if the event points to a valid TD then handle that. (normal case)

Code is much simpler, we get rid of skip flag, and could probably merge
error_mid_td and urb_length_set flags into one flag.

The code isn't complete or tested yet. It doesn't handle under/overruns,
but that could probably be fixed by just turning "return 0" for those
cases into "break"

Code on top of 6.11-rc4 can be found in my feature_transfer_event_rework branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git feature_transfer_event_rework
https://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_transfer_event_rework

Thanks
Mathias


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-28 10:21 ` [PATCH 0/9] Various xhci fixes and improvements FPS
@ 2024-08-28 13:24   ` Michał Pecio
  2024-08-28 13:48     ` fps
  0 siblings, 1 reply; 16+ messages in thread
From: Michał Pecio @ 2024-08-28 13:24 UTC (permalink / raw)
  To: FPS; +Cc: Mathias Nyman, linux-usb

> Are you sure these are against 6.11-rc5? Or did I screw up on my side?
Absolutely for 6.11-rc5 (but -rc4 works too and -rc6 likely will).

Please try again without --dry-run, some of them will only apply on top
of earlier ones because they touch the same lines. If that doesn't work
then I may have screwed up mailing them.

BTW, #3 is logically dependent on #2, but the others could be reordered
at will IIRC.

As for your audio case, #4 is probably the only potentially interesting
one, but it didn't help on my sluggish ASMedia host. It should, however, 
at least eliminate the "WARN Event TRB with no TDs queued" on USB 3.1
and newer hosts.

But the root cause of those audio issues is likely elsewhere.

Regards,
Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-28 13:24   ` Michał Pecio
@ 2024-08-28 13:48     ` fps
  0 siblings, 0 replies; 16+ messages in thread
From: fps @ 2024-08-28 13:48 UTC (permalink / raw)
  To: Michał Pecio; +Cc: Mathias Nyman, linux-usb

On August 28, 2024 3:24:58 PM GMT+02:00, "Michał Pecio" <michal.pecio@gmail.com> wrote:
>> Are you sure these are against 6.11-rc5? Or did I screw up on my side?
>Absolutely for 6.11-rc5 (but -rc4 works too and -rc6 likely will).
>
>Please try again without --dry-run, some of them will only apply on top
>of earlier ones because they touch the same lines. If that doesn't work
>then I may have screwed up mailing them.

Oh, of course! My bad. Sorry for the noise!


>
>BTW, #3 is logically dependent on #2, but the others could be reordered
>at will IIRC.
>
>As for your audio case, #4 is probably the only potentially interesting
>one, but it didn't help on my sluggish ASMedia host. It should, however, 
>at least eliminate the "WARN Event TRB with no TDs queued" on USB 3.1
>and newer hosts.
>
>But the root cause of those audio issues is likely elsewhere.

Yeah, just giving it a shot :)

Kind regards,
FPS


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-28 11:14 ` Mathias Nyman
@ 2024-08-28 15:02   ` Michał Pecio
  2024-08-30 11:53   ` Michał Pecio
  1 sibling, 0 replies; 16+ messages in thread
From: Michał Pecio @ 2024-08-28 15:02 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Mathias Nyman, linux-usb, Niklas Neronin

> We are in the middle of reworking transfer event handling with Niklas
> as well.
> 
> I'd be grateful if you could take a quick look at our solution and
> give your opinion on it as you have expertise in this area.
Beware that I may only have enough of that to be dangerous ;)

But I also have a few of those PCIe cards with various chips and all
three public revisions of the spec, so I'm trying to stay in spec and
verify things on hardware.

BTW, I only really touched isochronous transfers so far, but it looks
like some things are not entirely right in others too. The handling of
short packets for instance - it looks like TRB_ISP is set on various
transfers, but process_xxx_td() functions don't care if the event is
mid-TD and give back the URB immediately.

I though about fixing this (I have a naive hope that some xHC crashes
that I see are "undefined behavior" caused by driver bugs, even though
in reality the hardware is probably simply FUBAR) but I'd need to read
more spec chapters to get there.

> Main idea is simple.
> If a transfer event points to a TRB between dequeue and enqueue we
> give back all queued TDs passed up to this TRB.
> And if the event points to a valid TD then handle that. (normal case)
> 
> Code is much simpler, we get rid of skip flag, and could probably
> merge error_mid_td and urb_length_set flags into one flag.
Yes, lots of things could be better if the code walked the ring from
the last known dequeue to the current one and analyzed what's there.
We could know if the event is for something that we have already freed
(a bug?), or for any known pending TD and which one exactly, or for a
non-transfer TD (like Forced Stop Event).

Many of those thing are currently detected with heuristics which only
work about right if everything goes as planned (no bugs in HW or SW).

And first find the TD, then worry how to handle it. I like that Niklas
managed to shorten this giant loop already, and my patches #2 and #3
shorten it further and move TD selection logic up (#2) and all else
down (#3). This way we can work on selection and handling separately,
with less worrying about how they influence each other.

> The code isn't complete or tested yet. It doesn't handle
> under/overruns, but that could probably be fixed by just turning
> "return 0" for those cases into "break"
Problem is that not all events give valid dequeue pointers. These two,
for example, are not guaranteed to. Then we have no idea where on the
ring the HC currently is. The EP doesn't stall and it may have already
been resumed by a simple doorbell ring.

Regards,
Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/9] Various xhci fixes and improvements
  2024-08-28 11:14 ` Mathias Nyman
  2024-08-28 15:02   ` Michał Pecio
@ 2024-08-30 11:53   ` Michał Pecio
  1 sibling, 0 replies; 16+ messages in thread
From: Michał Pecio @ 2024-08-30 11:53 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Mathias Nyman, linux-usb, Niklas Neronin

Hi Mathias,

> Code on top of 6.11-rc4 can be found in my
> feature_transfer_event_rework branch:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git
Thanks for pointing me to this.

It does look like a very promising approach. This is the principle of
"matching before handling" taken to the extreme, as nothing is changed
during the first pass through the TD list. It seems to work well.

I feel like the counting of distances from ring->dequeue is a little,
I don't know, unexpected maybe, but it does indeed produce surprisingly
compact and simple code thanks to those segment numbers.

That being said, I've always been a strong case of NIH syndrome, so
I implemented a similar search for passed TDs which relies solely on
walking the ring in order from dequeue to the event DMA pointer.

So I guess we will have two implementations to compare and choose from.

I will try to develop this approach into a complete implementation that
actually works and can be tested. I want to maximally preserve original
behavior, except for obvious bugs (I found an obscure one already) to
simplify validation and review.

So, for example, no removing the skip flag for now. I'm not sure if
it's a good idea at all, as it frankly is sweeping bugs under the rug.
I have one HC which sometimes silently misses TDs for no known reason
and it misses so many of them that auto-skipping wouldn't help anyway.
OTOH, when I get the TRB errors, at least I know the HC is screwed.

Regards,
Michal

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-08-30 11:53 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-27 17:46 [PATCH 0/9] Various xhci fixes and improvements Michal Pecio
2024-08-27 17:47 ` [PATCH 1/9] usb: xhci: Fix double newline in a debug message Michal Pecio
2024-08-27 17:48 ` [PATCH 2/9] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
2024-08-27 17:49 ` [PATCH 3/9] usb: xhci: Clean up the TD skipping loop Michal Pecio
2024-08-27 17:50 ` [PATCH 4/9] usb: xhci: Expedite processing missed isoch TDs on modern HCs Michal Pecio
2024-08-27 17:51 ` [PATCH 5/9] usb: xhci: Simplify error_mid_td marking Michal Pecio
2024-08-27 17:52 ` [PATCH 6/9] usb: xhci: Sanity check "spurious success" handling Michal Pecio
2024-08-27 17:53 ` [PATCH 7/9] usb: xhci: Don't lie about trb_comp_code in handle_tx_event() Michal Pecio
2024-08-27 17:54 ` [PATCH 8/9] usb: xhci: Print completion code in the empty ring warning Michal Pecio
2024-08-27 17:55 ` [PATCH 9/9] usb: xhci: Clean up and rename bad stream event handler Michal Pecio
2024-08-28 10:21 ` [PATCH 0/9] Various xhci fixes and improvements FPS
2024-08-28 13:24   ` Michał Pecio
2024-08-28 13:48     ` fps
2024-08-28 11:14 ` Mathias Nyman
2024-08-28 15:02   ` Michał Pecio
2024-08-30 11:53   ` Michał Pecio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).