* [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements
@ 2025-02-10 7:37 Michal Pecio
2025-02-10 7:38 ` [PATCH 1/5] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Michal Pecio
` (4 more replies)
0 siblings, 5 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:37 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
These patches reduce latency of error reporting in some cases related
to 'error mid TD' and Missed Service events and sometimes fix a failure
to give back such TDs altogether until they are cancelled.
Also included are fixes for potential packet loss or memory corruption
due to an obscure race. Whether it causes problems IRL is not known and
the worst case would be hard to reproduce, but exactly for this reason
if the worst case actually happens, it could be hard to debug too.
2/5 is a cleanup which makes 3/5 implementation shorter and simpler.
The first three should be safe. The fourth should also be safe, but it
relies on HC functionality Linux never relied on before, so I placed it
towards the end in case it would need some tweaks. I tested it on all
hardware I have and it worked just fine.
The last one is perhaps the most controversial, though it should be
OK with typical "complete -> resubmit" drivers. It's the only one here
which increases latency in some severe error cases. The intent is to
avoid potentially giving back URBs not yet executed by hardware.
Michal Pecio (5):
usb: xhci: Complete 'error mid TD' transfers when handling Missed
Service
usb: xhci: Clean up the TD skipping loop
usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
usb: xhci: Expedite skipping missed isoch TDs on modern HCs
usb: xhci: Skip only one TD on Ring Underrun/Overrun
drivers/usb/host/xhci-ring.c | 110 +++++++++++++++++++++--------------
1 file changed, 67 insertions(+), 43 deletions(-)
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/5] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
@ 2025-02-10 7:38 ` Michal Pecio
2025-02-10 7:39 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
` (3 subsequent siblings)
4 siblings, 0 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:38 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
Missed Service Error after an error mid TD means that the failed TD has
already been passed by the xHC without acknowledgment of the final TRB,
a known hardware bug. So don't wait any more and give back the TD.
Reproduced on NEC uPD720200 under conditions of ludicrously bad USB link
quality, confirmed to behave as expected using dynamic debug.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..af6c4c4cbe1c 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2789,7 +2789,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
xhci_dbg(xhci,
"Miss service interval error for slot %u ep %u, set skip flag\n",
slot_id, ep_index);
- return 0;
+ break;
case COMP_NO_PING_RESPONSE_ERROR:
ep->skip = true;
xhci_dbg(xhci,
@@ -2837,6 +2837,10 @@ static int handle_tx_event(struct xhci_hcd *xhci,
xhci_dequeue_td(xhci, td, ep_ring, td->status);
}
+ /* Missed TDs will be skipped on the next event */
+ if (trb_comp_code == COMP_MISSED_SERVICE_ERROR)
+ return 0;
+
if (list_empty(&ep_ring->td_list)) {
/*
* Don't print wanings if ring is empty due to a stopped endpoint generating an
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/5] usb: xhci: Clean up the TD skipping loop
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
2025-02-10 7:38 ` [PATCH 1/5] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Michal Pecio
@ 2025-02-10 7:39 ` Michal Pecio
2025-02-22 12:37 ` Neronin, Niklas
2025-02-10 7:40 ` [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
` (2 subsequent siblings)
4 siblings, 1 reply; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:39 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
Half of this loop is code which only executes once to deal with cases
where no TD matches the event and then immediately returns. This code
has no need to be in any kind of loop, so get it out.
Shuffle the remaining conditionals a little to improve readability.
No functional change.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 68 ++++++++++++++++++------------------
1 file changed, 34 insertions(+), 34 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index af6c4c4cbe1c..9b06a911a16e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2866,9 +2866,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
/* Is this a TRB in the currently executing TD? */
ep_seg = trb_in_td(xhci, td, ep_trb_dma, false);
- if (!ep_seg) {
+ if (ep->skip) {
- if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+ if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
skip_isoc_td(xhci, td, ep, status);
if (!list_empty(&ep_ring->td_list))
continue;
@@ -2880,38 +2880,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
goto check_endpoint_halted;
}
- /*
- * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
- * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
- * pointer still at the previous TRB of the current TD. The previous TRB
- * maybe a Link TD or the last TRB of the previous TD. The command
- * completion handle will take care the rest.
- */
- if (trb_comp_code == COMP_STOPPED ||
- trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
- return 0;
- }
-
- /*
- * Some hosts give a spurious success event after a short
- * transfer. Ignore it.
- */
- if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
- ep_ring->last_td_was_short) {
- ep_ring->last_td_was_short = false;
- return 0;
- }
-
- /* HC is busted, give up! */
- xhci_err(xhci,
- "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
- ep_index, trb_comp_code);
- trb_in_td(xhci, td, ep_trb_dma, true);
-
- return -ESHUTDOWN;
- }
-
- if (ep->skip) {
xhci_dbg(xhci,
"Found td. Clear skip flag for slot %u ep %u.\n",
slot_id, ep_index);
@@ -2926,6 +2894,38 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
} while (ep->skip);
+ if (!ep_seg) {
+ /*
+ * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
+ * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
+ * pointer still at the previous TRB of the current TD. The previous TRB
+ * maybe a Link TD or the last TRB of the previous TD. The command
+ * completion handle will take care the rest.
+ */
+ if (trb_comp_code == COMP_STOPPED ||
+ trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
+ return 0;
+ }
+
+ /*
+ * Some hosts give a spurious success event after a short
+ * transfer. Ignore it.
+ */
+ if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+ ep_ring->last_td_was_short) {
+ ep_ring->last_td_was_short = false;
+ return 0;
+ }
+
+ /* HC is busted, give up! */
+ xhci_err(xhci,
+ "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
+ ep_index, trb_comp_code);
+ trb_in_td(xhci, td, ep_trb_dma, true);
+
+ return -ESHUTDOWN;
+ }
+
if (trb_comp_code == COMP_SHORT_PACKET)
ep_ring->last_td_was_short = true;
else
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
2025-02-10 7:38 ` [PATCH 1/5] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Michal Pecio
2025-02-10 7:39 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
@ 2025-02-10 7:40 ` Michal Pecio
2025-02-10 7:41 ` [PATCH 4/5] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Michal Pecio
2025-02-10 7:42 ` [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
4 siblings, 0 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:40 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
The TRB pointer of these events points at enqueue at the time of error
occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we
are handling the event, a new TD may be queued at this ring position.
I can trigger this race by rising interrupt moderation to increase IRQ
handling delay. Similar delay may occur naturally due to system load.
If this ever happens after a Missed Service Error, missed TDs will be
skipped and the new TD processed as if it matched the event. It could
be given back prematurely, risking data loss or buffer UAF by the xHC.
Don't complete TDs on xrun events and don't warn if queued TDs don't
match the event's TRB pointer, which can be NULL or a link/no-op TRB.
Now that it's safe, also handle xrun events if the skip flag is clear.
This ensures completion of any TD stuck in 'error mid TD' state right
before the xrun event, which could happen if a driver submits a finite
number of URBs to a buggy HC and then an error occurs on the last TD.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9b06a911a16e..11a53e310826 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2664,6 +2664,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
int status = -EINPROGRESS;
struct xhci_ep_ctx *ep_ctx;
u32 trb_comp_code;
+ bool ring_xrun_event = false;
slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
@@ -2770,14 +2771,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* Underrun Event for OUT Isoch endpoint.
*/
xhci_dbg(xhci, "Underrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_RING_OVERRUN:
xhci_dbg(xhci, "Overrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_MISSED_SERVICE_ERROR:
/*
* When encounter missed service error, one or more isoc tds
@@ -2894,6 +2893,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
} while (ep->skip);
+ if (ring_xrun_event)
+ return 0; /* don't warn or complete any TDs */
+
if (!ep_seg) {
/*
* Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/5] usb: xhci: Expedite skipping missed isoch TDs on modern HCs
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
` (2 preceding siblings ...)
2025-02-10 7:40 ` [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
@ 2025-02-10 7:41 ` Michal Pecio
2025-02-10 7:42 ` [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
4 siblings, 0 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:41 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
xHCI spec rev. 1.0 allowed the TRB pointer of Missed Service events
to be NULL. Having no idea which of the queued TDs were missed and
which are waiting, we can only set a flag to skip missed TDs later.
But HCs are also allowed to give us pointer to the last missed TRB,
and this became mandatory in spec rev. 1.1 and later.
Use this pointer, if available, to immediately skip all missed TDs.
This reduces latency and risk of skipping-related bugs, because we
can now leave the skip flag cleared for future events.
Handle Missed Service Error events as 'error mid TD', if applicable,
because rev. 1.0 spec excplicitly says so in notes to 4.10.3.2 and
later revs in 4.10.3.2 and 4.11.2.5.2. Notes to 4.9.1 seem to apply.
Tested on ASM1142 and ASM3142 v1.1 xHCs which provide TRB pointers.
Tested on AMD, Etron, Renesas v1.0 xHCs which provide TRB pointers.
Tested on NEC v0.96 and VIA v1.0 xHCs which send a NULL pointer.
Change inspired by a discussion about realtime USB audio.
Link: https://lore.kernel.org/linux-usb/76e1a191-020d-4a76-97f6-237f9bd0ede0@gmx.net/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 11a53e310826..878abf5b745d 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2476,6 +2476,12 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
if (ep_trb != td->end_trb)
td->error_mid_td = true;
break;
+ case COMP_MISSED_SERVICE_ERROR:
+ frame->status = -EXDEV;
+ sum_trbs_for_length = true;
+ if (ep_trb != td->end_trb)
+ td->error_mid_td = true;
+ break;
case COMP_INCOMPATIBLE_DEVICE_ERROR:
case COMP_STALL_ERROR:
frame->status = -EPROTO;
@@ -2786,8 +2792,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
ep->skip = true;
xhci_dbg(xhci,
- "Miss service interval error for slot %u ep %u, set skip flag\n",
- slot_id, ep_index);
+ "Miss service interval error for slot %u ep %u, set skip flag%s\n",
+ slot_id, ep_index, ep_trb_dma ? ", skip now" : "");
break;
case COMP_NO_PING_RESPONSE_ERROR:
ep->skip = true;
@@ -2836,8 +2842,8 @@ static int handle_tx_event(struct xhci_hcd *xhci,
xhci_dequeue_td(xhci, td, ep_ring, td->status);
}
- /* Missed TDs will be skipped on the next event */
- if (trb_comp_code == COMP_MISSED_SERVICE_ERROR)
+ /* If the TRB pointer is NULL, missed TDs will be skipped on the next event */
+ if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb_dma)
return 0;
if (list_empty(&ep_ring->td_list)) {
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
` (3 preceding siblings ...)
2025-02-10 7:41 ` [PATCH 4/5] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Michal Pecio
@ 2025-02-10 7:42 ` Michal Pecio
2025-02-11 15:41 ` Mathias Nyman
4 siblings, 1 reply; 14+ messages in thread
From: Michal Pecio @ 2025-02-10 7:42 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: Niklas Neronin, linux-usb, linux-kernel
If skipping is deferred to events other than Missed Service Error itsef,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.
And in case of ring xrun, we can't know where the xrun happened either.
If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.
If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.
Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 878abf5b745d..049206a1db76 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
skip_isoc_td(xhci, td, ep, status);
+
+ if (ring_xrun_event) {
+ /*
+ * If we are here, we are on xHCI 1.0 host with no idea how
+ * many TDs were missed and where the xrun occurred. Don't
+ * skip more TDs, they may have been queued after the xrun.
+ */
+ xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+ slot_id, ep_index);
+ break;
+ }
+
if (!list_empty(&ep_ring->td_list))
continue;
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-10 7:42 ` [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
@ 2025-02-11 15:41 ` Mathias Nyman
2025-02-12 7:30 ` Michał Pecio
2025-02-21 1:17 ` Michał Pecio
0 siblings, 2 replies; 14+ messages in thread
From: Mathias Nyman @ 2025-02-11 15:41 UTC (permalink / raw)
To: Michal Pecio, Mathias Nyman, Greg Kroah-Hartman
Cc: Niklas Neronin, linux-usb, linux-kernel
On 10.2.2025 9.42, Michal Pecio wrote:
> If skipping is deferred to events other than Missed Service Error itsef,
> it means we are running on an xHCI 1.0 host and don't know how many TDs
> were missed until we reach some ordinary transfer completion event.
>
> And in case of ring xrun, we can't know where the xrun happened either.
>
> If we skip all pending TDs, we may prematurely give back TDs added after
> the xrun had occurred, risking data loss or buffer UAF by the xHC.
>
> If we skip none, a driver may become confused and stop working when all
> its URBs are missed and appear to be "in flight" forever.
>
> Skip exactly one TD on each xrun event - the first one that was missed,
> as we can now be sure that the HC has finished processing it. Provided
> that one more TD is queued before any subsequent doorbell ring, it will
> become safe to skip another TD by the time we get an xrun again.
>
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
> drivers/usb/host/xhci-ring.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 878abf5b745d..049206a1db76 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>
> if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
> skip_isoc_td(xhci, td, ep, status);
> +
> + if (ring_xrun_event) {
> + /*
> + * If we are here, we are on xHCI 1.0 host with no idea how
> + * many TDs were missed and where the xrun occurred. Don't
> + * skip more TDs, they may have been queued after the xrun.
> + */
> + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> + slot_id, ep_index);
> + break;
This would be the same as return 0; right?
Whole series looks good, I'll add it
-Mathias
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-11 15:41 ` Mathias Nyman
@ 2025-02-12 7:30 ` Michał Pecio
2025-02-21 1:17 ` Michał Pecio
1 sibling, 0 replies; 14+ messages in thread
From: Michał Pecio @ 2025-02-12 7:30 UTC (permalink / raw)
To: Mathias Nyman
Cc: Mathias Nyman, Greg Kroah-Hartman, Niklas Neronin, linux-usb,
linux-kernel
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> > + if (ring_xrun_event) {
> > + /*
> > + * If we are here, we are on xHCI 1.0 host with no idea how
> > + * many TDs were missed and where the xrun occurred. Don't
> > + * skip more TDs, they may have been queued after the xrun.
> > + */
> > + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > + slot_id, ep_index);
> > + break;
>
> This would be the same as return 0; right?
Currently, yes. I know it looks silly, but I thought it would be more
future proof than hardcoding 'return 0' into the loop. The point it to
simply stop iteration, what happens next is none of the loop's business.
I hope gcc is clever enough to do the right thing here.
Regards,
Michal
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-11 15:41 ` Mathias Nyman
2025-02-12 7:30 ` Michał Pecio
@ 2025-02-21 1:17 ` Michał Pecio
2025-02-21 1:18 ` [PATCH v2 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
` (2 more replies)
1 sibling, 3 replies; 14+ messages in thread
From: Michał Pecio @ 2025-02-21 1:17 UTC (permalink / raw)
To: Mathias Nyman
Cc: Mathias Nyman, Greg Kroah-Hartman, Niklas Neronin, linux-usb,
linux-kernel
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> On 10.2.2025 9.42, Michal Pecio wrote:
> > + if (ring_xrun_event) {
> > + /*
> > + * If we are here, we are on xHCI 1.0 host with no idea how
> > + * many TDs were missed and where the xrun occurred. Don't
> > + * skip more TDs, they may have been queued after the xrun.
> > + */
> > + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > + slot_id, ep_index);
> > + break;
>
> This would be the same as return 0; right?
>
> Whole series looks good, I'll add it
I hope you haven't sent it out yet because I found two minor issues.
Firstly,
[PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
increases the number of xrun events that we handle but doesn't suppress
the "Event TRB for slot %u ep %u with no TDs queued\n" warning, so the
warning started to show up sometimes for no good reason. The fix is to
add ring_xrun_event to the list of exception for this warning.
Secondly,
[PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
can be improved to clear the skip flag if skipped TD was the only one.
This eliminates any confusion and risk of skipping bugs in the future.
The change is a matter of moving that code to a different branch.
I also changed 'break' to 'return 0' because it gets hard to follow at
this level of indentation.
I'll send a v2 of those two patches. Sorry for any inconvenience.
Michal
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
2025-02-21 1:17 ` Michał Pecio
@ 2025-02-21 1:18 ` Michal Pecio
2025-02-21 1:20 ` [PATCH v2 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
2025-02-21 13:17 ` [PATCH " Mathias Nyman
2 siblings, 0 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-21 1:18 UTC (permalink / raw)
To: Mathias Nyman
Cc: Mathias Nyman, Greg Kroah-Hartman, Niklas Neronin, linux-usb,
linux-kernel
The TRB pointer of these events points at enqueue at the time of error
occurrence on xHCI 1.1+ HCs or it's NULL on older ones. By the time we
are handling the event, a new TD may be queued at this ring position.
I can trigger this race by rising interrupt moderation to increase IRQ
handling delay. Similar delay may occur naturally due to system load.
If this ever happens after a Missed Service Error, missed TDs will be
skipped and the new TD processed as if it matched the event. It could
be given back prematurely, risking data loss or buffer UAF by the xHC.
Don't complete TDs on xrun events and don't warn if queued TDs don't
match the event's TRB pointer, which can be NULL or a link/no-op TRB.
Now that it's safe, also handle xrun events if the skip flag is clear.
This ensures completion of any TD stuck in 'error mid TD' state right
before the xrun event, which could happen if a driver submits a finite
number of URBs to a buggy HC and then an error occurs on the last TD.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index a278f284f4c0..4cf17801a233 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2632,6 +2632,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
int status = -EINPROGRESS;
struct xhci_ep_ctx *ep_ctx;
u32 trb_comp_code;
+ bool ring_xrun_event = false;
slot_id = TRB_TO_SLOT_ID(le32_to_cpu(event->flags));
ep_index = TRB_TO_EP_ID(le32_to_cpu(event->flags)) - 1;
@@ -2738,14 +2739,12 @@ static int handle_tx_event(struct xhci_hcd *xhci,
* Underrun Event for OUT Isoch endpoint.
*/
xhci_dbg(xhci, "Underrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_RING_OVERRUN:
xhci_dbg(xhci, "Overrun event on slot %u ep %u\n", slot_id, ep_index);
- if (ep->skip)
- break;
- return 0;
+ ring_xrun_event = true;
+ break;
case COMP_MISSED_SERVICE_ERROR:
/*
* When encounter missed service error, one or more isoc tds
@@ -2818,6 +2817,7 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
if (trb_comp_code != COMP_STOPPED &&
trb_comp_code != COMP_STOPPED_LENGTH_INVALID &&
+ !ring_xrun_event &&
!ep_ring->last_td_was_short) {
xhci_warn(xhci, "Event TRB for slot %u ep %u with no TDs queued\n",
slot_id, ep_index);
@@ -2862,6 +2862,9 @@ static int handle_tx_event(struct xhci_hcd *xhci,
*/
} while (ep->skip);
+ if (ring_xrun_event)
+ return 0; /* don't warn or complete any TDs */
+
if (!ep_seg) {
/*
* Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-21 1:17 ` Michał Pecio
2025-02-21 1:18 ` [PATCH v2 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
@ 2025-02-21 1:20 ` Michal Pecio
2025-02-21 13:17 ` [PATCH " Mathias Nyman
2 siblings, 0 replies; 14+ messages in thread
From: Michal Pecio @ 2025-02-21 1:20 UTC (permalink / raw)
To: Mathias Nyman
Cc: Mathias Nyman, Greg Kroah-Hartman, Niklas Neronin, linux-usb,
linux-kernel
If skipping is deferred to events other than Missed Service Error itsef,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.
And in case of ring xrun, we can't know where the xrun happened either.
If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.
If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.
Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
drivers/usb/host/xhci-ring.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 167ae7dfca47..d78b7877f65f 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2844,8 +2844,21 @@ static int handle_tx_event(struct xhci_hcd *xhci,
if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
skip_isoc_td(xhci, td, ep, status);
- if (!list_empty(&ep_ring->td_list))
+
+ if (!list_empty(&ep_ring->td_list)) {
+ if (ring_xrun_event) {
+ /*
+ * If we are here, we are on xHCI 1.0 host with no
+ * idea how many TDs were missed or where the xrun
+ * occurred. New TDs may have been added after the
+ * xrun, so skip only one TD to be safe.
+ */
+ xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+ slot_id, ep_index);
+ return 0;
+ }
continue;
+ }
xhci_dbg(xhci, "All TDs skipped for slot %u ep %u. Clear skip flag.\n",
slot_id, ep_index);
--
2.48.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
2025-02-21 1:17 ` Michał Pecio
2025-02-21 1:18 ` [PATCH v2 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
2025-02-21 1:20 ` [PATCH v2 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
@ 2025-02-21 13:17 ` Mathias Nyman
2 siblings, 0 replies; 14+ messages in thread
From: Mathias Nyman @ 2025-02-21 13:17 UTC (permalink / raw)
To: Michał Pecio
Cc: Mathias Nyman, Greg Kroah-Hartman, Niklas Neronin, linux-usb,
linux-kernel
On 21.2.2025 3.17, Michał Pecio wrote:
> On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
>> On 10.2.2025 9.42, Michal Pecio wrote:
>>> + if (ring_xrun_event) {
>>> + /*
>>> + * If we are here, we are on xHCI 1.0 host with no idea how
>>> + * many TDs were missed and where the xrun occurred. Don't
>>> + * skip more TDs, they may have been queued after the xrun.
>>> + */
>>> + xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
>>> + slot_id, ep_index);
>>> + break;
>>
>> This would be the same as return 0; right?
>>
>> Whole series looks good, I'll add it
>
> I hope you haven't sent it out yet because I found two minor issues.
>
>
> Firstly,
> [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling
>
> increases the number of xrun events that we handle but doesn't suppress
> the "Event TRB for slot %u ep %u with no TDs queued\n" warning, so the
> warning started to show up sometimes for no good reason. The fix is to
> add ring_xrun_event to the list of exception for this warning.
>
>
> Secondly,
> [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun
>
> can be improved to clear the skip flag if skipped TD was the only one.
> This eliminates any confusion and risk of skipping bugs in the future.
> The change is a matter of moving that code to a different branch.
>
> I also changed 'break' to 'return 0' because it gets hard to follow at
> this level of indentation.
>
>
> I'll send a v2 of those two patches. Sorry for any inconvenience.
Patches updated, they are now in my for-usb-next branch
Thanks
Mathias
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] usb: xhci: Clean up the TD skipping loop
2025-02-10 7:39 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
@ 2025-02-22 12:37 ` Neronin, Niklas
2025-02-24 0:02 ` Michał Pecio
0 siblings, 1 reply; 14+ messages in thread
From: Neronin, Niklas @ 2025-02-22 12:37 UTC (permalink / raw)
To: Michal Pecio, Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel
On 10/02/2025 9.39, Michal Pecio wrote:
> Half of this loop is code which only executes once to deal with cases
> where no TD matches the event and then immediately returns. This code
> has no need to be in any kind of loop, so get it out.
>
> Shuffle the remaining conditionals a little to improve readability.
>
> No functional change.
>
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>
> - if (!ep_seg) {
> + if (ep->skip) {
>
> - if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
> + if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
> skip_isoc_td(xhci, td, ep, status);
> if (!list_empty(&ep_ring->td_list))
> continue;
> @@ -2880,38 +2880,6 @@ static int handle_tx_event(struct xhci_hcd *xhci,
> goto check_endpoint_halted;
> }
>
> - /*
> - * Skip the Force Stopped Event. The 'ep_trb' of FSE is not in the current
> - * TD pointed by 'ep_ring->dequeue' because that the hardware dequeue
> - * pointer still at the previous TRB of the current TD. The previous TRB
> - * maybe a Link TD or the last TRB of the previous TD. The command
> - * completion handle will take care the rest.
> - */
> - if (trb_comp_code == COMP_STOPPED ||
> - trb_comp_code == COMP_STOPPED_LENGTH_INVALID) {
> - return 0;
> - }
> -
> - /*
> - * Some hosts give a spurious success event after a short
> - * transfer. Ignore it.
> - */
> - if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
> - ep_ring->last_td_was_short) {
> - ep_ring->last_td_was_short = false;
> - return 0;
> - }
> -
> - /* HC is busted, give up! */
> - xhci_err(xhci,
> - "ERROR Transfer event TRB DMA ptr not part of current TD ep_index %d comp_code %u\n",
> - ep_index, trb_comp_code);
> - trb_in_td(xhci, td, ep_trb_dma, true);
> -
> - return -ESHUTDOWN;
> - }
> -
> - if (ep->skip) {
> xhci_dbg(xhci,
> "Found td. Clear skip flag for slot %u ep %u.\n",
> slot_id, ep_index);
This debug message is now misleading, the TD way or may not be found on non-isochronous.
Before:
if (ep_seg && ep->skip)
xhci_dbg(xhci, "Found td. ...
After:
if (ep->skip && (ep_seg || !isoc))
xhci_dbg(xhci, "Found td. ...
With Best Regards,
Niklas
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/5] usb: xhci: Clean up the TD skipping loop
2025-02-22 12:37 ` Neronin, Niklas
@ 2025-02-24 0:02 ` Michał Pecio
0 siblings, 0 replies; 14+ messages in thread
From: Michał Pecio @ 2025-02-24 0:02 UTC (permalink / raw)
To: Neronin, Niklas
Cc: Mathias Nyman, Greg Kroah-Hartman, linux-usb, linux-kernel
On Sat, 22 Feb 2025 14:37:58 +0200, Neronin, Niklas wrote:
> This debug message is now misleading, the TD way or may not be found
> on non-isochronous.
>
> Before:
> if (ep_seg && ep->skip)
> xhci_dbg(xhci, "Found td. ...
> After:
> if (ep->skip && (ep_seg || !isoc))
> xhci_dbg(xhci, "Found td. ...
Hmm, you're right, the whole block will now execute in this
pathological edge case and we will clear the flag too.
It can be fixed quite easily, but I think I may actually drop this
patch altogether. It will make the next patch slightly more verbose
(that's why I included this one), but it will also make it possible
to backport any of those patches to 6.12-lts if a need arises.
I also realized that one more skipping pathology is a recent (6.11)
regression and perhaps it too could be fixed without major rework,
basically by going back to something similar to pre-6.11 behavior.
I should have v3 ready in a day or a few.
Regards,
Michal
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-02-24 0:02 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-10 7:37 [PATCH 0/5] xHCI: Isochronous error handling fixes and improvements Michal Pecio
2025-02-10 7:38 ` [PATCH 1/5] usb: xhci: Complete 'error mid TD' transfers when handling Missed Service Michal Pecio
2025-02-10 7:39 ` [PATCH 2/5] usb: xhci: Clean up the TD skipping loop Michal Pecio
2025-02-22 12:37 ` Neronin, Niklas
2025-02-24 0:02 ` Michał Pecio
2025-02-10 7:40 ` [PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
2025-02-10 7:41 ` [PATCH 4/5] usb: xhci: Expedite skipping missed isoch TDs on modern HCs Michal Pecio
2025-02-10 7:42 ` [PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
2025-02-11 15:41 ` Mathias Nyman
2025-02-12 7:30 ` Michał Pecio
2025-02-21 1:17 ` Michał Pecio
2025-02-21 1:18 ` [PATCH v2 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling Michal Pecio
2025-02-21 1:20 ` [PATCH v2 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun Michal Pecio
2025-02-21 13:17 ` [PATCH " Mathias Nyman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox