* [PATCH 0/5] Quick and effective handle_tx_event() cleanup
@ 2024-09-10 11:12 Michal Pecio
2024-09-10 11:13 ` [PATCH 1/5] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:12 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Hi,
I mentioned that I came up with a simple way to clean up the messy
(and buggy) event handling loop. Here are the patches.
Total line count is reduced by 26 and the sole loop remaining has no
breaks or continues and 8 lines of code. Some functionally duplicate
code is merged into one path. No functional change besides bugfixes.
Six defects identified by code review are resolved along the way.
I successfully reproduced #1, and #4 was seen in the wild on linux-usb.
1. Error mid TD followed by Missed Service is misreported as missed.
2. If EP stops on the next TD after error-mid-TD, neither TD is handled.
3. Empty list quiety ignored after short TD on hosts without the quirk.
4. Emergency stall recovery not attempted after "TRB not part of TD".
5. A race could prematurely complete a TD after an isoch ring underrun.
6. Error-mid-TD transfer on buggy HC is stuck forever if it's the last.
Debugging of TD skipping is improved - we know if/how many TDs were
skipped, in addition to whether a match was found or not. This enables
quickly catching cases when suspiciously many TDs are skipped (I have
seen a case of 150 skipped TDs, turned out to be a HW bug.)
The event handling process becomes linear - check a condition, handle
something, check for another condition, handle it, and so on. This is
much easier to reason about and to modify.
To demonstrate this point, patch 5/5 fixes a pair of stupid issues by
inserting one check, which would be duplicated 3 times before cleanup.
This series should be a good base for future work to resolve remaining
bugs. For example, the skipping loop could change from (simplified):
while (td && !trb_in_td(td, ep_trb_dma))
to
while (td && trb_after_td(td, ep_trb))
subject to providing a working implementation of trb_after_td(). I have
tested three implementations, some based on pre-scanning the list and
some on direct comparison, but I'm not 100% happy with any so far.
Mathias had a clever idea to use ring segment numbers for this. I tried
and it compiled and worked flawlessly on the first go, but it requires
passing all those seg pointers and total ring size around. This happens
to complicate sharing implementation with trb_in_td(), because users of
the latter don't currently provide such information. And I would like
to share implementation of these functions, as they are very similar.
Regards,
Michal
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/5] usb: xhci: Fix handling errors mid TD followed by other errors
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
@ 2024-09-10 11:13 ` Michal Pecio
2024-09-10 11:14 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:13 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Some host controllers fail to produce the final completion event on an
isochronous TD which experienced an error mid TD. We deal with it by
flagging such TDs and checking if the next event points at the flagged
TD or at the next one, and giving back the flagged TD if the latter.
This is not enough, because the next TD may be missed by the xHC. Or
there may be no next TD but a ring underrun. We also need to get such
TD quickly out of the way, or errors on later TDs may be handled wrong.
If the next TD experiences a Missed Service Error, we will set the skip
flag on the endpoint and then attempt skipping TDs when yet another
event arrives. In such scenario, we ought to report the 'erorr mid TD'
transfer as such rather than skip it.
Another problem case are Stopped events. If we see one after an error
mid TD, we naively assume that it's a Force Stopped Event because it
doesn't match the pending TD, but in reality it might be an ordinary
Stopped event for the next TD, which we fail to recognize and handle.
Fix this by moving error mid TD handling before the whole TD skipping
loop. Remove unnecessary conditions, always give back the TD if the new
event points to any TRB outside it or if the pointer is NULL, as may be
the case in Ring Underrun and Overrun events on 1st gen hardware. Only
if the pending TD isn't flagged, consider other actions like skipping.
As a side effect of reordering with skip and FSE cases, error mid TD is
reordered with last_td_was_short check. This is harmless, because the
two cases are mutually exclusive - only one can happen in any given run
of handle_tx_event().
Tested on the NEC host and a USB camera with flaky cable. Dynamic debug
confirmed that Transaction Errors are sometimes seen, sometimes mid-TD,
sometimes followed by Missed Service. In such cases, they were finished
properly before skipping began.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 70 +++++++++++++++++-------------------
1 file changed, 32 insertions(+), 38 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 4ea2c3e072a9..0cab482b3f4e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2764,6 +2764,30 @@ static int handle_tx_event(struct xhci_hcd *xhci,
return 0;
}
+ /*
+ * xhci 4.10.2 states isoc endpoints should continue
+ * processing the next TD if there was an error mid TD.
+ * So host like NEC don't generate an event for the last
+ * isoc TRB even if the IOC flag is set.
+ * xhci 4.9.1 states that if there are errors in mult-TRB
+ * TDs xHC should generate an error for that TRB, and if xHC
+ * proceeds to the next TD it should genete an event for
+ * any TRB with IOC flag on the way. Other host follow this.
+ *
+ * We wait for the final IOC event, but if we get an event
+ * anywhere outside this TD, just give it back already.
+ */
+ td = list_first_entry_or_null(&ep_ring->td_list,
+ struct xhci_td, td_list);
+
+ if (td && td->error_mid_td && !trb_in_td(xhci, td, ep_trb_dma, false)) {
+ xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+ ep_ring->dequeue = td->last_trb;
+ ep_ring->deq_seg = td->last_trb_seg;
+ inc_deq(xhci, ep_ring);
+ xhci_td_cleanup(xhci, td, ep_ring, td->status);
+ }
+
do {
/* This TRB should be in the TD at the head of this ring's
* TD list.
@@ -2828,44 +2852,14 @@ static int handle_tx_event(struct xhci_hcd *xhci,
return 0;
}
- /*
- * xhci 4.10.2 states isoc endpoints should continue
- * processing the next TD if there was an error mid TD.
- * So host like NEC don't generate an event for the last
- * isoc TRB even if the IOC flag is set.
- * xhci 4.9.1 states that if there are errors in mult-TRB
- * TDs xHC should generate an error for that TRB, and if xHC
- * proceeds to the next TD it should genete an event for
- * any TRB with IOC flag on the way. Other host follow this.
- * So this event might be for the next TD.
- */
- if (td->error_mid_td &&
- !list_is_last(&td->td_list, &ep_ring->td_list)) {
- struct xhci_td *td_next = list_next_entry(td, td_list);
-
- ep_seg = trb_in_td(xhci, td_next, ep_trb_dma, false);
- if (ep_seg) {
- /* give back previous TD, start handling new */
- xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
- ep_ring->dequeue = td->last_trb;
- ep_ring->deq_seg = td->last_trb_seg;
- inc_deq(xhci, ep_ring);
- xhci_td_cleanup(xhci, td, ep_ring, td->status);
- td = td_next;
- }
- }
-
- if (!ep_seg) {
- /* HC is busted, give up! */
- xhci_err(xhci,
- "ERROR Transfer event TRB DMA ptr not "
- "part of current TD ep_index %d "
- "comp_code %u\n", ep_index,
- trb_comp_code);
- trb_in_td(xhci, td, ep_trb_dma, true);
-
- return -ESHUTDOWN;
- }
+ /* HC is busted, give up! */
+ xhci_err(xhci,
+ "ERROR Transfer event TRB DMA ptr not "
+ "part of current TD ep_index %d "
+ "comp_code %u\n", ep_index,
+ trb_comp_code);
+ trb_in_td(xhci, td, ep_trb_dma, true);
+ return -ESHUTDOWN;
}
if (ep->skip) {
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/5] usb: xhci: Clean up the TD skipping loop
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
2024-09-10 11:13 ` [PATCH 1/5] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
@ 2024-09-10 11:14 ` Michal Pecio
2024-09-10 11:15 ` [PATCH 3/5] usb: xhci: Unify event handler's 'empty list' and 'no match' cases Michal Pecio
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:14 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Recent reworks left this loop ending with:
if (ep->skip)
ep->skip = false;
while (ep->skip);
which obviously cannot ever repeat. Repetition is only possible by a
'continue' statement earlier in the loop body.
Move the "tail", which only executes once, out of the loop body.
Bring the cases of "Found TD" and "skip isoc td" closely together to
consolidate the skipping logic in one place and improve clarity. Now
this code properly controls the loop condition.
No functional change.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 100 ++++++++++++++++++-----------------
1 file changed, 51 insertions(+), 49 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 0cab482b3f4e..0eef7cd2f20a 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2823,60 +2823,62 @@ static int handle_tx_event(struct xhci_hcd *xhci,
/* Is this a TRB in the currently executing TD? */
ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
- if (!ep_seg) {
-
- if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
- skip_isoc_td(xhci, td, ep, status);
- continue;
- }
-
- /*
- * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
- * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
- * pointer still at the previous TRB of the current TD. The previous TRB
- * maybe a Link TD or the last TRB of the previous TD. The command
- * completion handle will take care the rest.
- */
- if (trb_comp_code == COMP_STOPPED ||
- trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
- return 0;
- }
-
- /*
- * Some hosts give a spurious success event after a short
- * transfer. Ignore it.
- */
- if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
- ep_ring->last_td_was_short) {
- ep_ring->last_td_was_short = false;
- return 0;
- }
-
- /* HC is busted, give up! */
- xhci_err(xhci,
- "ERROR Transfer event TRB DMA ptr not "
- "part of current TD ep_index %d "
- "comp_code %u\n", ep_index,
- trb_comp_code);
- trb_in_td(xhci, td, ep_trb_dma, true);
- return -ESHUTDOWN;
- }
-
+ /*
+ * If ep->skip is set, it means there are missed tds on the
+ * endpoint ring need to take care of.
+ * Process them as short transfer until reach the td pointed by
+ * the event.
+ */
if (ep->skip) {
- xhci_dbg(xhci,
- "Found td. Clear skip flag for slot %u ep %u.\n",
- slot_id, ep_index);
- ep->skip = false;
+ if (ep_seg) {
+ xhci_dbg(xhci,
+ "Found td. Clear skip flag for slot %u ep %u.\n",
+ slot_id, ep_index);
+ ep->skip = false;
+ } else {
+ if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
+ skip_isoc_td(xhci, td, ep, status);
+ else
+ break;
+ }
}
- /*
- * If ep->skip is set, it means there are missed tds on the
- * endpoint ring need to take care of.
- * Process them as short transfer until reach the td pointed by
- * the event.
- */
} while (ep->skip);
+ if (!ep_seg) {
+
+ /*
+ * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
+ * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
+ * pointer still at the previous TRB of the current TD. The previous TRB
+ * maybe a Link TD or the last TRB of the previous TD. The command
+ * completion handle will take care the rest.
+ */
+ if (trb_comp_code == COMP_STOPPED ||
+ trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
+ return 0;
+ }
+
+ /*
+ * Some hosts give a spurious success event after a short
+ * transfer. Ignore it.
+ */
+ if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+ ep_ring->last_td_was_short) {
+ ep_ring->last_td_was_short = false;
+ return 0;
+ }
+
+ /* HC is busted, give up! */
+ xhci_err(xhci,
+ "ERROR Transfer event TRB DMA ptr not "
+ "part of current TD ep_index %d "
+ "comp_code %u\n", ep_index,
+ trb_comp_code);
+ trb_in_td(xhci, td, ep_trb_dma, true);
+ return -ESHUTDOWN;
+ }
+
if (trb_comp_code == COMP_SHORT_PACKET)
ep_ring->last_td_was_short = true;
else
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 3/5] usb: xhci: Unify event handler's 'empty list' and 'no match' cases
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
2024-09-10 11:13 ` [PATCH 1/5] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
2024-09-10 11:14 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
@ 2024-09-10 11:15 ` Michal Pecio
2024-09-10 11:16 ` [PATCH 4/5] usb: xhci: Simplify the TD skipping loop further Michal Pecio
2024-09-10 11:16 ` [PATCH 5/5] usb: xhci: Fix Ring Underrun/Overrun handling Michal Pecio
4 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:15 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Sometimes handle_tx_event() gets an event which doesn't match any
pending TD. Or it may be that there are simply no pending TDs at all.
These two cases are hardly different, but they are handled by two
separate blocks of code. Some logic is pointlessly duplicated, some
logic is missing or buggy in one branch or in the other.
Reduce the 'empty list' case in the searching loop to a minimum and
merge the code removed from there with almost identical code after
the loop, which deals with the 'no match' case.
This fixes the "spurious success" check in the 'empty list' case not
verifying if the host actually has the quirk, and the lack of halted
endpoint recovery in the 'no match' case.
Remove an obsolete attempt at stall recovery. This code relied on a
bug which has been fixed earlier this year, and it was never really
fully effective because not all cancelled TDs get No-Op'ed.
Make the empty list warning print event completion code, like the
'no match' case does.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 68 ++++++++++++------------------------
1 file changed, 23 insertions(+), 45 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 0eef7cd2f20a..56b0c0e85293 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2606,7 +2606,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
int ep_index;
struct xhci_td *td = NULL;
dma_addr_t ep_trb_dma;
- struct xhci_segment *ep_seg;
+ struct xhci_segment *ep_seg = NULL;
union xhci_trb *ep_trb;
int status = -EINPROGRESS;
struct xhci_ep_ctx *ep_ctx;
@@ -2793,28 +2793,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* TD list.
*/
if (list_empty(&ep_ring->td_list)) {
- /*
- * Don't print wanings if it's due to a stopped endpoint
- * generating an extra completion event if the device
- * was suspended. Or, a event for the last TRB of a
- * short TD we already got a short event for.
- * The short TD is already removed from the TD list.
- */
-
- if (!(trb_comp_code == COMP_STOPPED ||
- trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
- ep_ring->last_td_was_short)) {
- xhci_warn(xhci, "WARN Event TRB for slot %u ep %d with no TDs queued?\n",
- slot_id, ep_index);
- }
if (ep->skip) {
ep->skip = false;
xhci_dbg(xhci, "td_list is empty while skip flag set. Clear skip flag for slot %u ep %u.\n",
slot_id, ep_index);
}
-
- td = NULL;
- goto check_endpoint_halted;
+ break;
}
td = list_first_entry(&ep_ring->td_list, struct xhci_td,
@@ -2848,11 +2832,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
if (!ep_seg) {
/*
- * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
- * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
- * pointer still at the previous TRB of the current TD. The previous TRB
- * maybe a Link TD or the last TRB of the previous TD. The command
- * completion handle will take care the rest.
+ * Ignore the Force Stopped Event. The endpoint may stop on
+ * some Link or No-Op TRB outside our TDs. We don't care.
*/
if (trb_comp_code == COMP_STOPPED ||
trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
@@ -2870,13 +2851,25 @@ static int handle_tx_event(struct xhci_hcd *xhci,
}
/* HC is busted, give up! */
- xhci_err(xhci,
- "ERROR Transfer event TRB DMA ptr not "
- "part of current TD ep_index %d "
- "comp_code %u\n", ep_index,
- trb_comp_code);
- trb_in_td(xhci, td, ep_trb_dma, true);
- return -ESHUTDOWN;
+ if (list_empty(&ep_ring->td_list)) {
+ xhci_warn(xhci, "WARN Event TRB for slot %u ep %d comp_code %u with no TDs queued?\n",
+ slot_id, ep_index, trb_comp_code);
+ } else {
+ xhci_err(xhci, "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
+ ep_index, trb_comp_code);
+ trb_in_td(xhci, td, ep_trb_dma, true);
+ }
+
+ /*
+ * Bugs (in HW or SW) may cause the xHC to execute transfers as
+ * they are being cancelled and forgotten about. Then we get some
+ * event and have no idea which transfer caused it. If the event
+ * indicates that the EP halted, try to fix that at least.
+ */
+ if (xhci_halted_host_endpoint(ep_ctx, trb_comp_code))
+ xhci_handle_halted_endpoint(xhci, ep, td, EP_HARD_RESET);
+
+ return 0;
}
if (trb_comp_code == COMP_SHORT_PACKET)
@@ -2887,16 +2880,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
ep_trb = &ep_seg->trbs[(ep_trb_dma - ep_seg->dma) / sizeof(*ep_trb)];
trace_xhci_handle_transfer(ep_ring, (struct xhci_generic_trb *) ep_trb);
- /*
- * No-op TRB could trigger interrupts in a case where a URB was killed
- * and a STALL_ERROR happens right after the endpoint ring stopped.
- * Reset the halted endpoint. Otherwise, the endpoint remains stalled
- * indefinitely.
- */
-
- if (trb_is_noop(ep_trb))
- goto check_endpoint_halted;
-
td->status = status;
/* update the urb's actual_length and give back to the core */
@@ -2906,11 +2889,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
process_isoc_td(xhci, ep, ep_ring, td, ep_trb, event);
else
process_bulk_intr_td(xhci, ep, ep_ring, td, ep_trb, event);
- return 0;
-
-check_endpoint_halted:
- if (xhci_halted_host_endpoint(ep_ctx, trb_comp_code))
- xhci_handle_halted_endpoint(xhci, ep, td, EP_HARD_RESET);
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 4/5] usb: xhci: Simplify the TD skipping loop further
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
` (2 preceding siblings ...)
2024-09-10 11:15 ` [PATCH 3/5] usb: xhci: Unify event handler's 'empty list' and 'no match' cases Michal Pecio
@ 2024-09-10 11:16 ` Michal Pecio
2024-09-10 11:16 ` [PATCH 5/5] usb: xhci: Fix Ring Underrun/Overrun handling Michal Pecio
4 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:16 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Now that everything not related to skipping has been removed from the
loop, its role becomes more clear. It repeatedly skips the first TD on
the ring, as long as one exists, doesn't match the event, and is isoc.
Rewrite this logic more succintly and unify the various exit paths.
The skip flag is always cleared, so do it only once (nitpick: it used
not to be cleared on non-isoc exit, no big deal). Print one universal
debug message too, and count how many TDs were skipped.
Note: the test for non-isoc URB is valuable because skip_isoc_td()
writes to URB fields which don't exist on non-isoc URBs. This avoids
memory corruption in the unlucky event of a severe HW or SW bug.
Add a helper function for getting the first TD from the ring's queue
to make the code less of an eyesore.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 61 +++++++++++++++++-------------------
1 file changed, 28 insertions(+), 33 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 56b0c0e85293..db9bc7db5aac 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2591,6 +2591,15 @@ static int handle_transferless_tx_event(struct xhci_hcd *xhci, struct xhci_virt_
return 0;
}
+/*
+ * Return the first TD pending on a transfer ring, or NULL if none.
+ */
+static struct xhci_td *xhci_ring_pending_td(struct xhci_ring *ring)
+{
+ return list_first_entry_or_null(&ring->td_list,
+ struct xhci_td, td_list);
+}
+
/*
* If this function returns an error condition, it means it got a Transfer
* event with a corrupted Slot ID, Endpoint ID, or TRB DMA address.
@@ -2777,8 +2786,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* We wait for the final IOC event, but if we get an event
* anywhere outside this TD, just give it back already.
*/
- td = list_first_entry_or_null(&ep_ring->td_list,
- struct xhci_td, td_list);
+ td = xhci_ring_pending_td(ep_ring);
if (td && td->error_mid_td && !trb_in_td(xhci, td, ep_trb_dma, false)) {
xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
@@ -2786,48 +2794,35 @@ static int handle_tx_event(struct xhci_hcd *xhci,
ep_ring->deq_seg = td->last_trb_seg;
inc_deq(xhci, ep_ring);
xhci_td_cleanup(xhci, td, ep_ring, td->status);
+
+ td = xhci_ring_pending_td(ep_ring);
}
- do {
- /* This TRB should be in the TD at the head of this ring's
- * TD list.
- */
- if (list_empty(&ep_ring->td_list)) {
- if (ep->skip) {
- ep->skip = false;
- xhci_dbg(xhci, "td_list is empty while skip flag set. Clear skip flag for slot %u ep %u.\n",
- slot_id, ep_index);
- }
- break;
- }
-
- td = list_first_entry(&ep_ring->td_list, struct xhci_td,
- td_list);
-
- /* Is this a TRB in the currently executing TD? */
+ /* Is this a TRB in the currently executing TD? */
+ if (td)
ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
+ if (ep->skip) {
+ int skipped = 0;
/*
* If ep->skip is set, it means there are missed tds on the
* endpoint ring need to take care of.
* Process them as short transfer until reach the td pointed by
* the event.
*/
- if (ep->skip) {
- if (ep_seg) {
- xhci_dbg(xhci,
- "Found td. Clear skip flag for slot %u ep %u.\n",
- slot_id, ep_index);
- ep->skip = false;
- } else {
- if (usb_endpoint_xfer_isoc(&td->urb->ep->desc))
- skip_isoc_td(xhci, td, ep, status);
- else
- break;
- }
- }
- } while (ep->skip);
+ while (td && !ep_seg &&
+ usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+ skip_isoc_td(xhci, td, ep, status);
+ td = xhci_ring_pending_td(ep_ring);
+ if (td)
+ ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
+ skipped++;
+ }
+ ep->skip = false;
+ xhci_dbg(xhci, "Skipped %d isoc TDs, TD %sfound, clear skip flag for slot %u ep %u\n",
+ skipped, td ? "":"not ", slot_id, ep_index);
+ }
if (!ep_seg) {
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 5/5] usb: xhci: Fix Ring Underrun/Overrun handling
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
` (3 preceding siblings ...)
2024-09-10 11:16 ` [PATCH 4/5] usb: xhci: Simplify the TD skipping loop further Michal Pecio
@ 2024-09-10 11:16 ` Michal Pecio
4 siblings, 0 replies; 6+ messages in thread
From: Michal Pecio @ 2024-09-10 11:16 UTC (permalink / raw)
To: Mathias Nyman, Niklas Neronin; +Cc: linux-usb
Depending on implemented spec revision, the TRB pointer of these events
may either be NULL or point at enqueue at the time of error occurrence.
By the time we handle the event, a new TD may be queued at this address.
Ensure that the new TD is not completed prematurely if the event handler
is entered due to the skip flag being set on the endpoint. Or, when the
TRB pointer is NULL, prevent the empty ring warning being printed.
Now that it is safe to enter the event handler, also enter it when the
skip flag is not set. This enables completion of any TD stuck in 'error
mid TD' state on buggy hosts. Such problem could, for example, happen
when a USBFS application knows in advance how many frames it needs and
submits the exact number of URBs, then an error occurs on the last TD.
One bug remains: when enqueue points at a Link TRB and a new TD appears
after that, skipping will remove the new TD prematurely. This should be
255 times less common that the 'matching TD' bug being fixed here, and
it will take a major improvement to the skipping loop to fix.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index db9bc7db5aac..475b4d69551b 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2726,14 +2726,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* Underrun Event for OUT Isoch endpoint.
*/
xhci_dbg(xhci, "Underrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ break;
case COMP_RING_OVERRUN:
xhci_dbg(xhci, "Overrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ break;
case COMP_MISSED_SERVICE_ERROR:
/*
* When encounter missed service error, one or more isoc tds
@@ -2824,6 +2820,15 @@ static int handle_tx_event(struct xhci_hcd *xhci,
skipped, td ? "":"not ", slot_id, ep_index);
}
+ /*
+ * In these events ep_trb_dma is NULL or points at enqueue from the time
+ * of error occurrence. If it matches a new TD queued since then, don't
+ * complete the TD now. And otherwise, don't print senseless warnings.
+ */
+ if (trb_comp_code == COMP_RING_UNDERRUN ||
+ trb_comp_code == COMP_RING_OVERRUN)
+ return 0;
+
if (!ep_seg) {
/*
--
2.43.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-09-10 11:16 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-10 11:12 [PATCH 0/5] Quick and effective handle_tx_event() cleanup Michal Pecio
2024-09-10 11:13 ` [PATCH 1/5] usb: xhci: Fix handling errors mid TD followed by other errors Michal Pecio
2024-09-10 11:14 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
2024-09-10 11:15 ` [PATCH 3/5] usb: xhci: Unify event handler's 'empty list' and 'no match' cases Michal Pecio
2024-09-10 11:16 ` [PATCH 4/5] usb: xhci: Simplify the TD skipping loop further Michal Pecio
2024-09-10 11:16 ` [PATCH 5/5] usb: xhci: Fix Ring Underrun/Overrun handling Michal Pecio
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).