public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/1] usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue
@ 2025-04-14  8:18 Michal Pecio
  2025-04-14  8:19 ` [PATCH 1/1] " Michal Pecio
  0 siblings, 1 reply; 2+ messages in thread
From: Michal Pecio @ 2025-04-14  8:18 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman
  Cc: Jonathan Bell, Oliver Neukum, linux-usb, linux-kernel

Hi,

I ran into an annoying bug in VIA hardware, where Endpoint Context
state is not updated when it's supposed to and Set TR Dequeue issued
by the driver sets wrong cycle state and the endpoint stops working.

This can be triggered by stalls, which are relatively common:
- some card readers without a card inserted
- disk I/O errors
- failing SMART commands, possibly issued by daemons like udisks2

I knew immediately what to do, because it's a known bug and Raspberry
Pi has a fix for it, which they submitted upstream but it got reverted
later. The revert is puzzling, because it was a boot issue, happening
before the changed code is supposed to run.

If the problem wasn't misdiagnozed and blamed on a wrong patch, I can
imagine the workaround could put the chip in a bad state which caused
problems after rebooting. The commit surely had a blatant endian bug
and VL805 is prone to locking up hard under some unclear conditions.

This is my attempt at a simpler, and hopefully correct, solution.
I tried to come up with something stable-friendly, but the patch is
not marked for stable becasue it wouldn't compile anyway. If there
is interest in having this in stable, we can trivially backport it
(a matter of end_trb/last_trb rename) and submit later.

Regards,
Michal

^ permalink raw reply	[flat|nested] 2+ messages in thread

* [PATCH 1/1] usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue
  2025-04-14  8:18 [PATCH 0/1] usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue Michal Pecio
@ 2025-04-14  8:19 ` Michal Pecio
  0 siblings, 0 replies; 2+ messages in thread
From: Michal Pecio @ 2025-04-14  8:19 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman
  Cc: Jonathan Bell, Oliver Neukum, linux-usb, linux-kernel

VIA VL805 doesn't bother updating the EP Context cycle bit when the
endpoint halts. This is seen by patching xhci_move_dequeue_past_td()
to print the cycle bits of the EP Context and the TRB at hw_dequeue
and then disconnecting a flash drive while reading it. Actual cycle
state is random as expected, but the EP Context bit is always 1.

This means that the cycle state produced by this function is wrong
half the time, and then the endpoint stops working.

Work around it by looking at the cycle bit of TD's end_trb instead
of believing the Endpoint or Stream Context. Specifically:

- rename cycle_found to hw_dequeue_found to avoid confusion
- initialize new_cycle from td->end_trb instead of hw_dequeue
- switch new_cycle toggling to happen after end_trb is found

Now a workload which regularly stalls the device works normally for
a few hours and clearly demonstrates the HW bug - the EP Context bit
is not updated in a new cycle until Set TR Dequeue overwrites it:

[  +0,000298] sd 10:0:0:0: [sdc] Attached SCSI disk
[  +0,011758] cycle bits: TRB 1 EP Ctx 1
[  +5,947138] cycle bits: TRB 1 EP Ctx 1
[  +0,065731] cycle bits: TRB 0 EP Ctx 1
[  +0,064022] cycle bits: TRB 0 EP Ctx 0
[  +0,063297] cycle bits: TRB 0 EP Ctx 0
[  +0,069823] cycle bits: TRB 0 EP Ctx 0
[  +0,063390] cycle bits: TRB 1 EP Ctx 0
[  +0,063064] cycle bits: TRB 1 EP Ctx 1
[  +0,062293] cycle bits: TRB 1 EP Ctx 1
[  +0,066087] cycle bits: TRB 0 EP Ctx 1
[  +0,063636] cycle bits: TRB 0 EP Ctx 0
[  +0,066360] cycle bits: TRB 0 EP Ctx 0

Also tested on the buggy ASM1042 which moves EP Context dequeue to
the next TRB after errors, one problem case addressed by the rework
that implemented this loop. In this case hw_dequeue can be enqueue,
so simply picking the cycle bit of TRB at hw_dequeue wouldn't work.

Commit 5255660b208a ("xhci: add quirk for host controllers that
don't update endpoint DCS") tried to solve the stale cycle problem,
but it was more complex and got reverted due to a reported issue.

Cc: Jonathan Bell <jonathan@raspberrypi.org>
Cc: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index b62ae7953979..9937e7c8be70 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -694,7 +694,7 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
 	int new_cycle;
 	dma_addr_t addr;
 	u64 hw_dequeue;
-	bool cycle_found = false;
+	bool hw_dequeue_found = false;
 	bool td_last_trb_found = false;
 	u32 trb_sct = 0;
 	int ret;
@@ -710,25 +710,24 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
 	hw_dequeue = xhci_get_hw_deq(xhci, dev, ep_index, stream_id);
 	new_seg = ep_ring->deq_seg;
 	new_deq = ep_ring->dequeue;
-	new_cycle = hw_dequeue & 0x1;
+	new_cycle = le32_to_cpu(td->end_trb->generic.field[3]) & TRB_CYCLE;
 
 	/*
-	 * We want to find the pointer, segment and cycle state of the new trb
-	 * (the one after current TD's end_trb). We know the cycle state at
-	 * hw_dequeue, so walk the ring until both hw_dequeue and end_trb are
-	 * found.
+	 * Walk the ring until both the next TRB and hw_dequeue are found (don't
+	 * move hw_dequeue back if it went forward due to a HW bug). Cycle state
+	 * is loaded from a known good TRB, track later toggles to maintain it.
 	 */
 	do {
-		if (!cycle_found && xhci_trb_virt_to_dma(new_seg, new_deq)
+		if (!hw_dequeue_found && xhci_trb_virt_to_dma(new_seg, new_deq)
 		    == (dma_addr_t)(hw_dequeue & ~0xf)) {
-			cycle_found = true;
+			hw_dequeue_found = true;
 			if (td_last_trb_found)
 				break;
 		}
 		if (new_deq == td->end_trb)
 			td_last_trb_found = true;
 
-		if (cycle_found && trb_is_link(new_deq) &&
+		if (td_last_trb_found && trb_is_link(new_deq) &&
 		    link_trb_toggles_cycle(new_deq))
 			new_cycle ^= 0x1;
 
@@ -740,7 +739,7 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
 			return -EINVAL;
 		}
 
-	} while (!cycle_found || !td_last_trb_found);
+	} while (!hw_dequeue_found || !td_last_trb_found);
 
 	/* Don't update the ring cycle state for the producer (us). */
 	addr = xhci_trb_virt_to_dma(new_seg, new_deq);
-- 
2.48.1

^ permalink raw reply related	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2025-04-14  8:20 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-14  8:18 [PATCH 0/1] usb: xhci: Don't trust the EP Context cycle bit when moving HW dequeue Michal Pecio
2025-04-14  8:19 ` [PATCH 1/1] " Michal Pecio

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox