public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk
@ 2025-02-25 11:57 Michal Pecio
  2025-02-25 11:58 ` [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Michal Pecio
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Michal Pecio @ 2025-02-25 11:57 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

I was looking at all uses of enqueue/dequeue pointers and I found two
rather complex loops which appear to be doing really simple things.

I don't understand why they were written this way, it seems wasteful
and I see nothing that should go wrong if they are replaced with much
simpler code.

I rewrote them and the driver still works. I exercised Set TR Dequeue
code by starting/stopping isoc streams, using usb-storage with crappy
cable (transaction errors, halts) and also the smartctl -x trick that
results in URB unlinks (both on usb-storage and uas) with some disks.

The third patch is a dedupe. BTW, that comment there about section
6.4.4.1 of the 0.95 spec seems to be wrong, I suspect it should say
that the chain bit cannot be *cleared* because that's how the code
works and what some commit messages say. But I don't have 0.95 spec.

New in v2:
- dropped the patch for obsolete update_ring_for_set_deq_completion()
- added a patch to enable the link chain quirk on one more HC
- don't touch the chain bit in inc_enq_past_link() on quirky HCs
- don't call inc_enq_past_link() unnecessarily

Michal Pecio (3):
  usb: xhci: Apply the link chain quirk on NEC isoc endpoints
  usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs
  usb: xhci: Unify duplicate inc_enq() code

 drivers/usb/host/xhci-ring.c | 198 +++++++++++++----------------------
 drivers/usb/host/xhci.h      |  13 ++-
 2 files changed, 86 insertions(+), 125 deletions(-)

-- 
2.48.1

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints
  2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
@ 2025-02-25 11:58 ` Michal Pecio
  2025-02-25 11:59 ` [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs Michal Pecio
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Pecio @ 2025-02-25 11:58 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

Two clearly different specimens of NEC uPD720200 (one with start/stop
bug, one without) were seen to cause IOMMU faults after some Missed
Service Errors. Faulting address is immediately after a transfer ring
segment and patched dynamic debug messages revealed that the MSE was
received when waiting for a TD near the end of that segment:

[ 5441.041954] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0
[ 5441.042120] xhci_hcd 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000]
[ 5441.042146] xhci_hcd 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000]

It gets even funnier if the next page is a ring segment accessible to
the HC. Below, it reports MSE in segment at ff1e8000, plows through a
zero-filled page at ff1e9000 and starts reporting events for TRBs in
page at ff1ea000 every microframe, instead of jumping to seg ff1e6000.

[ 4807.041671] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 4807.041999] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 4807.042011] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042028] xhci_hcd 0000:0c:00.0: All TDs skipped for slot 1 ep 2. Clear skip flag.
[ 4807.042134] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042138] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 4807.042144] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.042259] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042262] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 4807.042266] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0

At some point completion events change from Isoch Buffer Overrun to
Short Packet and the HC finally finds cycle bit mismatch in ff1ec000.

[ 4807.098130] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 4807.098132] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.098254] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 4807.098256] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.098379] xhci_hcd 0000:0c:00.0: Overrun event on slot 1 ep 2

It's possible that data from the isochronous device were written to
random buffers of pending TDs on other endpoints (either IN or OUT),
other devices or even other HCs in the same IOMMU domain.

Lastly, an error from a different USB device on another HC. Was it
caused by the above? I don't know, but it may have been. The disk
was working without any other issues and generated PCIe traffic to
starve the NEC of upstream BW and trigger those MSEs. The two HCs
shared one x1 slot by means of a commercial "PCIe splitter" board.

[ 4807.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 4807.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[ 4807.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00
[ 4807.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

Fortunately, it appears that this ridiculous bug is avoided by setting
the chain bit of Link TRBs on isochronous rings. Other ancient HCs are
known which also expect the bit to be set and they ignore Link TRBs if
it's not. Reportedly, 0.95 spec guaranteed that the bit is set.

The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports
tens of MSEs per second and runs into the bug within seconds. Chaining
Link TRBs allows the same workload to run for many minutes, many times.

No negative side effects seen in UVC recording and UAC playback with a
few devices at full speed, high speed and SuperSpeed.

The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202
and on old Etron EJ168 and VIA VL805 (but the VL805 has other bug).

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
---
 drivers/usb/host/xhci.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 377dad9cd639..2ad31e147d67 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1761,11 +1761,20 @@ static inline void xhci_write_64(struct xhci_hcd *xhci,
 }
 
 
-/* Link TRB chain should always be set on 0.95 hosts, and AMD 0.96 ISOC rings */
+/*
+ * Reportedly, some chapters of v0.95 spec said that Link TRB always has its chain bit set.
+ * Other chapters and later specs say that it should only be set if the link is inside a TD
+ * which continues from the end of one segment to the next segment.
+ *
+ * Some 0.95 hardware was found to misbehave if any link TRB doesn't have the chain bit set.
+ *
+ * 0.96 hardware from AMD and NEC was found to ignore unchained isochronous link TRBs when
+ * "resynchronizing the pipe" after a Missed Service Error.
+ */
 static inline bool xhci_link_chain_quirk(struct xhci_hcd *xhci, enum xhci_ring_type type)
 {
 	return (xhci->quirks & XHCI_LINK_TRB_QUIRK) ||
-	       (type == TYPE_ISOC && (xhci->quirks & XHCI_AMD_0x96_HOST));
+	       (type == TYPE_ISOC && (xhci->quirks & (XHCI_AMD_0x96_HOST | XHCI_NEC_HOST)));
 }
 
 /* xHCI debugging */
-- 
2.48.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs
  2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
  2025-02-25 11:58 ` [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Michal Pecio
@ 2025-02-25 11:59 ` Michal Pecio
  2025-02-25 14:55   ` Mathias Nyman
  2025-02-25 12:00 ` [PATCH v2 3/3] usb: xhci: Unify duplicate inc_enq() code Michal Pecio
  2025-02-26 12:39 ` [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Mathias Nyman
  3 siblings, 1 reply; 7+ messages in thread
From: Michal Pecio @ 2025-02-25 11:59 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

xhci_move_dequeue_past_td() uses a relatively complex and inefficient
procedure to find new dequeue position after the cancelled TD.

Replace it with a simpler function which moves dequeue immediately to
the first pending TD, or to enqueue if the ring is empty.

The outcome should be basically equivalent, because we only clear xHC
cache if it stopped or halted on some cancelled TD and moving past the
TD effectively means moving to the first remaining TD, if any.

If the cancelled TD is followed by more cancelled TDs turned into No-
Ops, we will now jump over them and save the xHC some work.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 64 ++++++++++--------------------------
 1 file changed, 18 insertions(+), 46 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..fd2d5b371483 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -630,9 +630,9 @@ static u64 xhci_get_hw_deq(struct xhci_hcd *xhci, struct xhci_virt_device *vdev,
 	return le64_to_cpu(ep_ctx->deq);
 }
 
-static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
-				unsigned int slot_id, unsigned int ep_index,
-				unsigned int stream_id, struct xhci_td *td)
+/* Move HW dequeue to the first pending TD or to our enqueue if there are no TDs */
+static int set_ring_dequeue(struct xhci_hcd *xhci, unsigned int slot_id,
+				unsigned int ep_index, unsigned int stream_id)
 {
 	struct xhci_virt_device *dev = xhci->devs[slot_id];
 	struct xhci_virt_ep *ep = &dev->eps[ep_index];
@@ -640,58 +640,31 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci,
 	struct xhci_command *cmd;
 	struct xhci_segment *new_seg;
 	union xhci_trb *new_deq;
+	struct xhci_td *td;
 	int new_cycle;
 	dma_addr_t addr;
-	u64 hw_dequeue;
-	bool cycle_found = false;
-	bool td_last_trb_found = false;
 	u32 trb_sct = 0;
 	int ret;
 
-	ep_ring = xhci_triad_to_transfer_ring(xhci, slot_id,
-			ep_index, stream_id);
+	ep_ring = xhci_triad_to_transfer_ring(xhci, slot_id, ep_index, stream_id);
+
 	if (!ep_ring) {
 		xhci_warn(xhci, "WARN can't find new dequeue, invalid stream ID %u\n",
 			  stream_id);
 		return -ENODEV;
 	}
 
-	hw_dequeue = xhci_get_hw_deq(xhci, dev, ep_index, stream_id);
-	new_seg = ep_ring->deq_seg;
-	new_deq = ep_ring->dequeue;
-	new_cycle = hw_dequeue & 0x1;
+	if (!list_empty(&ep_ring->td_list)) {
+		td = list_first_entry(&ep_ring->td_list, struct xhci_td, td_list);
+		new_seg = td->start_seg;
+		new_deq = td->start_trb;
+		new_cycle = le32_to_cpu(new_deq->generic.field[3]) & TRB_CYCLE;
+	} else {
+		new_seg = ep_ring->enq_seg;
+		new_deq = ep_ring->enqueue;
+		new_cycle = ep_ring->cycle_state;
+	}
 
-	/*
-	 * We want to find the pointer, segment and cycle state of the new trb
-	 * (the one after current TD's end_trb). We know the cycle state at
-	 * hw_dequeue, so walk the ring until both hw_dequeue and end_trb are
-	 * found.
-	 */
-	do {
-		if (!cycle_found && xhci_trb_virt_to_dma(new_seg, new_deq)
-		    == (dma_addr_t)(hw_dequeue & ~0xf)) {
-			cycle_found = true;
-			if (td_last_trb_found)
-				break;
-		}
-		if (new_deq == td->end_trb)
-			td_last_trb_found = true;
-
-		if (cycle_found && trb_is_link(new_deq) &&
-		    link_trb_toggles_cycle(new_deq))
-			new_cycle ^= 0x1;
-
-		next_trb(&new_seg, &new_deq);
-
-		/* Search wrapped around, bail out */
-		if (new_deq == ep->ring->dequeue) {
-			xhci_err(xhci, "Error: Failed finding new dequeue state\n");
-			return -EINVAL;
-		}
-
-	} while (!cycle_found || !td_last_trb_found);
-
-	/* Don't update the ring cycle state for the producer (us). */
 	addr = xhci_trb_virt_to_dma(new_seg, new_deq);
 	if (addr == 0) {
 		xhci_warn(xhci, "Can't find dma of new dequeue ptr\n");
@@ -1055,9 +1028,8 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep)
 	if (!cached_td)
 		return 0;
 
-	err = xhci_move_dequeue_past_td(xhci, slot_id, ep->ep_index,
-					cached_td->urb->stream_id,
-					cached_td);
+	err = set_ring_dequeue(xhci, slot_id, ep->ep_index, cached_td->urb->stream_id);
+
 	if (err) {
 		/* Failed to move past cached td, just set cached TDs to no-op */
 		list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) {
-- 
2.48.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v2 3/3] usb: xhci: Unify duplicate inc_enq() code
  2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
  2025-02-25 11:58 ` [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Michal Pecio
  2025-02-25 11:59 ` [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs Michal Pecio
@ 2025-02-25 12:00 ` Michal Pecio
  2025-02-26 12:39 ` [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Mathias Nyman
  3 siblings, 0 replies; 7+ messages in thread
From: Michal Pecio @ 2025-02-25 12:00 UTC (permalink / raw)
  To: Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

Extract a block of code copied from inc_enq() into a separate function
and call it from inc_enq() and the other function which used this code.
Remove the pointless 'next' variable which only aliases ring->enqueue.

Note: I don't know if any 0.95 xHC ever reached series production, but
"AMD 0.96 host" appears to be the "Llano" family APU. Example dmesg at
https://linux-hardware.org/?probe=79d5cfd4fd&log=dmesg

pci 0000:00:10.0: [1022:7812] type 00 class 0x0c0330
hcc params 0x014042c3 hci version 0x96 quirks 0x0000000000000608

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 134 +++++++++++++++--------------------
 1 file changed, 57 insertions(+), 77 deletions(-)

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index fd2d5b371483..f325b8959a5a 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -203,6 +203,50 @@ void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring)
 	return;
 }
 
+/*
+ * If enqueue points at a link TRB, follow links until an ordinary TRB is reached.
+ * Toggle the cycle bit of passed link TRBs and optionally chain them.
+ */
+static void inc_enq_past_link(struct xhci_hcd *xhci, struct xhci_ring *ring, u32 chain)
+{
+	unsigned int link_trb_count = 0;
+
+	while (trb_is_link(ring->enqueue)) {
+
+		/*
+		 * Section 6.4.4.1 of the 0.95 spec says link TRBs cannot have the chain bit
+		 * set, but other sections talk about dealing with the chain bit set. This was
+		 * fixed in the 0.96 specification errata, but we have to assume that all 0.95
+		 * xHCI hardware can't handle the chain bit being cleared on a link TRB.
+		 *
+		 * On 0.95 and some 0.96 HCs the chain bit is set once at segment initalization
+		 * and never changed here. On all others, modify it as requested by the caller.
+		 */
+		if (!xhci_link_chain_quirk(xhci, ring->type)) {
+			ring->enqueue->link.control &= cpu_to_le32(~TRB_CHAIN);
+			ring->enqueue->link.control |= cpu_to_le32(chain);
+		}
+
+		/* Give this link TRB to the hardware */
+		wmb();
+		ring->enqueue->link.control ^= cpu_to_le32(TRB_CYCLE);
+
+		/* Toggle the cycle bit after the last ring segment. */
+		if (link_trb_toggles_cycle(ring->enqueue))
+			ring->cycle_state ^= 1;
+
+		ring->enq_seg = ring->enq_seg->next;
+		ring->enqueue = ring->enq_seg->trbs;
+
+		trace_xhci_inc_enq(ring);
+
+		if (link_trb_count++ > ring->num_segs) {
+			xhci_warn(xhci, "Link TRB loop at enqueue\n");
+			break;
+		}
+	}
+}
+
 /*
  * See Cycle bit rules. SW is the consumer for the event ring only.
  *
@@ -211,11 +255,6 @@ void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring)
  * If we've enqueued the last TRB in a TD, make sure the following link TRBs
  * have their chain bit cleared (so that each Link TRB is a separate TD).
  *
- * Section 6.4.4.1 of the 0.95 spec says link TRBs cannot have the chain bit
- * set, but other sections talk about dealing with the chain bit set.  This was
- * fixed in the 0.96 specification errata, but we have to assume that all 0.95
- * xHCI hardware can't handle the chain bit being cleared on a link TRB.
- *
  * @more_trbs_coming:	Will you enqueue more TRBs before calling
  *			prepare_transfer()?
  */
@@ -223,8 +262,6 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
 			bool more_trbs_coming)
 {
 	u32 chain;
-	union xhci_trb *next;
-	unsigned int link_trb_count = 0;
 
 	chain = le32_to_cpu(ring->enqueue->generic.field[3]) & TRB_CHAIN;
 
@@ -233,48 +270,16 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring *ring,
 		return;
 	}
 
-	next = ++(ring->enqueue);
-
-	/* Update the dequeue pointer further if that was a link TRB */
-	while (trb_is_link(next)) {
-
-		/*
-		 * If the caller doesn't plan on enqueueing more TDs before
-		 * ringing the doorbell, then we don't want to give the link TRB
-		 * to the hardware just yet. We'll give the link TRB back in
-		 * prepare_ring() just before we enqueue the TD at the top of
-		 * the ring.
-		 */
-		if (!chain && !more_trbs_coming)
-			break;
-
-		/* If we're not dealing with 0.95 hardware or isoc rings on
-		 * AMD 0.96 host, carry over the chain bit of the previous TRB
-		 * (which may mean the chain bit is cleared).
-		 */
-		if (!xhci_link_chain_quirk(xhci, ring->type)) {
-			next->link.control &= cpu_to_le32(~TRB_CHAIN);
-			next->link.control |= cpu_to_le32(chain);
-		}
-		/* Give this link TRB to the hardware */
-		wmb();
-		next->link.control ^= cpu_to_le32(TRB_CYCLE);
-
-		/* Toggle the cycle bit after the last ring segment. */
-		if (link_trb_toggles_cycle(next))
-			ring->cycle_state ^= 1;
-
-		ring->enq_seg = ring->enq_seg->next;
-		ring->enqueue = ring->enq_seg->trbs;
-		next = ring->enqueue;
-
-		trace_xhci_inc_enq(ring);
-
-		if (link_trb_count++ > ring->num_segs) {
-			xhci_warn(xhci, "%s: Ring link TRB loop\n", __func__);
-			break;
-		}
-	}
+	ring->enqueue++;
+
+	/*
+	 * If we are in the middle of a TD or the caller plans to enqueue more
+	 * TDs as one transfer (eg. control), traverse any link TRBs right now.
+	 * Otherwise, enqueue can stay on a link until the next prepare_ring().
+	 * This avoids enqueue entering deq_seg and simplifies ring expansion.
+	 */
+	if (trb_is_link(ring->enqueue) && (chain || more_trbs_coming))
+		inc_enq_past_link(xhci, ring, chain);
 }
 
 /*
@@ -3188,7 +3193,6 @@ static void queue_trb(struct xhci_hcd *xhci, struct xhci_ring *ring,
 static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring,
 		u32 ep_state, unsigned int num_trbs, gfp_t mem_flags)
 {
-	unsigned int link_trb_count = 0;
 	unsigned int new_segs = 0;
 
 	/* Make sure the endpoint has been added to xHC schedule */
@@ -3236,33 +3240,9 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring,
 		}
 	}
 
-	while (trb_is_link(ep_ring->enqueue)) {
-		/* If we're not dealing with 0.95 hardware or isoc rings
-		 * on AMD 0.96 host, clear the chain bit.
-		 */
-		if (!xhci_link_chain_quirk(xhci, ep_ring->type))
-			ep_ring->enqueue->link.control &=
-				cpu_to_le32(~TRB_CHAIN);
-		else
-			ep_ring->enqueue->link.control |=
-				cpu_to_le32(TRB_CHAIN);
-
-		wmb();
-		ep_ring->enqueue->link.control ^= cpu_to_le32(TRB_CYCLE);
-
-		/* Toggle the cycle bit after the last ring segment. */
-		if (link_trb_toggles_cycle(ep_ring->enqueue))
-			ep_ring->cycle_state ^= 1;
-
-		ep_ring->enq_seg = ep_ring->enq_seg->next;
-		ep_ring->enqueue = ep_ring->enq_seg->trbs;
-
-		/* prevent infinite loop if all first trbs are link trbs */
-		if (link_trb_count++ > ep_ring->num_segs) {
-			xhci_warn(xhci, "Ring is an endless link TRB loop\n");
-			return -EINVAL;
-		}
-	}
+	/* Ensure that new TRBs won't overwrite a link */
+	if (trb_is_link(ep_ring->enqueue))
+		inc_enq_past_link(xhci, ep_ring, 0);
 
 	if (last_trb_on_seg(ep_ring->enq_seg, ep_ring->enqueue)) {
 		xhci_warn(xhci, "Missing link TRB at end of ring segment\n");
-- 
2.48.1

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs
  2025-02-25 11:59 ` [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs Michal Pecio
@ 2025-02-25 14:55   ` Mathias Nyman
  2025-02-25 22:17     ` Michał Pecio
  0 siblings, 1 reply; 7+ messages in thread
From: Mathias Nyman @ 2025-02-25 14:55 UTC (permalink / raw)
  To: Michal Pecio, Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

On 25.2.2025 13.59, Michal Pecio wrote:
> xhci_move_dequeue_past_td() uses a relatively complex and inefficient
> procedure to find new dequeue position after the cancelled TD.
> 
> Replace it with a simpler function which moves dequeue immediately to
> the first pending TD, or to enqueue if the ring is empty.
> 
> The outcome should be basically equivalent, because we only clear xHC
> cache if it stopped or halted on some cancelled TD and moving past the
> TD effectively means moving to the first remaining TD, if any.

This new way relies on td_list being in sync and up to date.
i.e. hardware dequeue can't be ahead of first TD in list.

One bad scenario could be something like:

class driver queues TD1
class driver queues TD2
Class driver cancels TD2, queue stop endpoint command
(Class driver cancels TD1) (optional)

xHC hardware just completed TD1 and stop endpoint command at the same time,
xHC hw may have advanced the hw dequeue to TD2, write event for stop endpoint command, and
then write transfer event for TD1 completion. (xHC hardware may do things in odd order)

Now we detect that hw dequeue is in the cancelled TD2 but with TD1 is till in the td_list.
This new solution would move dequeue back to TD1 beginning, and process it again.

Thanks
Mathias


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs
  2025-02-25 14:55   ` Mathias Nyman
@ 2025-02-25 22:17     ` Michał Pecio
  0 siblings, 0 replies; 7+ messages in thread
From: Michał Pecio @ 2025-02-25 22:17 UTC (permalink / raw)
  To: Mathias Nyman; +Cc: Mathias Nyman, Greg Kroah-Hartman, linux-usb, linux-kernel

On Tue, 25 Feb 2025 16:55:49 +0200, Mathias Nyman wrote:
> This new way relies on td_list being in sync and up to date.
> i.e. hardware dequeue can't be ahead of first TD in list.
> 
> One bad scenario could be something like:
> 
> class driver queues TD1
> class driver queues TD2
> Class driver cancels TD2, queue stop endpoint command
> (Class driver cancels TD1) (optional)
> 
> xHC hardware just completed TD1 and stop endpoint command at the same
> time, xHC hw may have advanced the hw dequeue to TD2, write event for
> stop endpoint command, and then write transfer event for TD1
> completion. (xHC hardware may do things in odd order)

I suppose this would be illegal; per 4.6.9 transfer events are posted
and EP Context updated before Stop EP cmd completion. HW Dequeue Ptr
is advanced on subseqent doorbell ring if the stopped TRB is complete.

But I can see how this could appear to work fine and then mysteriously
break on some weird buggy HC. I will abandon this patch for now.

> Now we detect that hw dequeue is in the cancelled TD2 but with TD1 is
> till in the td_list. This new solution would move dequeue back to TD1
> beginning, and process it again.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk
  2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
                   ` (2 preceding siblings ...)
  2025-02-25 12:00 ` [PATCH v2 3/3] usb: xhci: Unify duplicate inc_enq() code Michal Pecio
@ 2025-02-26 12:39 ` Mathias Nyman
  3 siblings, 0 replies; 7+ messages in thread
From: Mathias Nyman @ 2025-02-26 12:39 UTC (permalink / raw)
  To: Michal Pecio, Mathias Nyman, Greg Kroah-Hartman; +Cc: linux-usb, linux-kernel

On 25.2.2025 13.57, Michal Pecio wrote:
> I was looking at all uses of enqueue/dequeue pointers and I found two
> rather complex loops which appear to be doing really simple things.
> 
> I don't understand why they were written this way, it seems wasteful
> and I see nothing that should go wrong if they are replaced with much
> simpler code.
> 
> I rewrote them and the driver still works. I exercised Set TR Dequeue
> code by starting/stopping isoc streams, using usb-storage with crappy
> cable (transaction errors, halts) and also the smartctl -x trick that
> results in URB unlinks (both on usb-storage and uas) with some disks.
> 
> The third patch is a dedupe. BTW, that comment there about section
> 6.4.4.1 of the 0.95 spec seems to be wrong, I suspect it should say
> that the chain bit cannot be *cleared* because that's how the code
> works and what some commit messages say. But I don't have 0.95 spec.
> 
> New in v2:
> - dropped the patch for obsolete update_ring_for_set_deq_completion()
> - added a patch to enable the link chain quirk on one more HC
> - don't touch the chain bit in inc_enq_past_link() on quirky HCs
> - don't call inc_enq_past_link() unnecessarily
> 
> Michal Pecio (3):
>    usb: xhci: Apply the link chain quirk on NEC isoc endpoints
>    usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs
>    usb: xhci: Unify duplicate inc_enq() code
> 

Thanks, adding Patch 1/3 and 3/3, skipping 2/3 for now.

-Mathias

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-02-26 12:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
2025-02-25 11:58 ` [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints Michal Pecio
2025-02-25 11:59 ` [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs Michal Pecio
2025-02-25 14:55   ` Mathias Nyman
2025-02-25 22:17     ` Michał Pecio
2025-02-25 12:00 ` [PATCH v2 3/3] usb: xhci: Unify duplicate inc_enq() code Michal Pecio
2025-02-26 12:39 ` [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Mathias Nyman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox