public inbox for linux-usb@vger.kernel.org
 help / color / mirror / Atom feed
From: Michal Pecio <michal.pecio@gmail.com>
To: Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/3] usb: xhci: Apply the link chain quirk on NEC isoc endpoints
Date: Tue, 25 Feb 2025 12:58:54 +0100	[thread overview]
Message-ID: <20250225125854.622a6433@foxbook> (raw)
In-Reply-To: <20250225125750.1b345e2c@foxbook>

Two clearly different specimens of NEC uPD720200 (one with start/stop
bug, one without) were seen to cause IOMMU faults after some Missed
Service Errors. Faulting address is immediately after a transfer ring
segment and patched dynamic debug messages revealed that the MSE was
received when waiting for a TD near the end of that segment:

[ 5441.041954] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ffa08fe0
[ 5441.042120] xhci_hcd 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09000 flags=0x0000]
[ 5441.042146] xhci_hcd 0000:0c:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0005 address=0xffa09040 flags=0x0000]

It gets even funnier if the next page is a ring segment accessible to
the HC. Below, it reports MSE in segment at ff1e8000, plows through a
zero-filled page at ff1e9000 and starts reporting events for TRBs in
page at ff1ea000 every microframe, instead of jumping to seg ff1e6000.

[ 4807.041671] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 4807.041999] xhci_hcd 0000:0c:00.0: Miss service interval error for slot 1 ep 2 expected TD DMA ff1e8fe0
[ 4807.042011] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042028] xhci_hcd 0000:0c:00.0: All TDs skipped for slot 1 ep 2. Clear skip flag.
[ 4807.042134] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042138] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 4807.042144] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ea040 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.042259] xhci_hcd 0000:0c:00.0: WARN: buffer overrun event for slot 1 ep 2 on endpoint
[ 4807.042262] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 31
[ 4807.042266] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ea050 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0

At some point completion events change from Isoch Buffer Overrun to
Short Packet and the HC finally finds cycle bit mismatch in ff1ec000.

[ 4807.098130] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 4807.098132] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ecc50 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.098254] xhci_hcd 0000:0c:00.0: ERROR Transfer event TRB DMA ptr not part of current TD ep_index 2 comp_code 13
[ 4807.098256] xhci_hcd 0000:0c:00.0: Looking for event-dma 00000000ff1ecc60 trb-start 00000000ff1e6820 trb-end 00000000ff1e6820 seg-start 00000000ff1e6000 seg-end 00000000ff1e6ff0
[ 4807.098379] xhci_hcd 0000:0c:00.0: Overrun event on slot 1 ep 2

It's possible that data from the isochronous device were written to
random buffers of pending TDs on other endpoints (either IN or OUT),
other devices or even other HCs in the same IOMMU domain.

Lastly, an error from a different USB device on another HC. Was it
caused by the above? I don't know, but it may have been. The disk
was working without any other issues and generated PCIe traffic to
starve the NEC of upstream BW and trigger those MSEs. The two HCs
shared one x1 slot by means of a commercial "PCIe splitter" board.

[ 4807.162604] usb 10-2: reset SuperSpeed USB device number 3 using xhci_hcd
[ 4807.178990] sd 9:0:0:0: [sdb] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[ 4807.179001] sd 9:0:0:0: [sdb] tag#0 CDB: opcode=0x28 28 00 04 02 ae 00 00 02 00 00
[ 4807.179004] I/O error, dev sdb, sector 67284480 op 0x0:(READ) flags 0x80700 phys_seg 5 prio class 0

Fortunately, it appears that this ridiculous bug is avoided by setting
the chain bit of Link TRBs on isochronous rings. Other ancient HCs are
known which also expect the bit to be set and they ignore Link TRBs if
it's not. Reportedly, 0.95 spec guaranteed that the bit is set.

The bandwidth-starved NEC HC running a 32KB/uframe UVC endpoint reports
tens of MSEs per second and runs into the bug within seconds. Chaining
Link TRBs allows the same workload to run for many minutes, many times.

No negative side effects seen in UVC recording and UAC playback with a
few devices at full speed, high speed and SuperSpeed.

The problem doesn't reproduce on the newer Renesas uPD720201/uPD720202
and on old Etron EJ168 and VIA VL805 (but the VL805 has other bug).

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
Cc: stable@vger.kernel.org
---
 drivers/usb/host/xhci.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 377dad9cd639..2ad31e147d67 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1761,11 +1761,20 @@ static inline void xhci_write_64(struct xhci_hcd *xhci,
 }
 
 
-/* Link TRB chain should always be set on 0.95 hosts, and AMD 0.96 ISOC rings */
+/*
+ * Reportedly, some chapters of v0.95 spec said that Link TRB always has its chain bit set.
+ * Other chapters and later specs say that it should only be set if the link is inside a TD
+ * which continues from the end of one segment to the next segment.
+ *
+ * Some 0.95 hardware was found to misbehave if any link TRB doesn't have the chain bit set.
+ *
+ * 0.96 hardware from AMD and NEC was found to ignore unchained isochronous link TRBs when
+ * "resynchronizing the pipe" after a Missed Service Error.
+ */
 static inline bool xhci_link_chain_quirk(struct xhci_hcd *xhci, enum xhci_ring_type type)
 {
 	return (xhci->quirks & XHCI_LINK_TRB_QUIRK) ||
-	       (type == TYPE_ISOC && (xhci->quirks & XHCI_AMD_0x96_HOST));
+	       (type == TYPE_ISOC && (xhci->quirks & (XHCI_AMD_0x96_HOST | XHCI_NEC_HOST)));
 }
 
 /* xHCI debugging */
-- 
2.48.1

  reply	other threads:[~2025-02-25 11:58 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-25 11:57 [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Michal Pecio
2025-02-25 11:58 ` Michal Pecio [this message]
2025-02-25 11:59 ` [PATCH v2 2/3] usb: xhci: Simplify moving HW Dequeue Pointer past cancelled TDs Michal Pecio
2025-02-25 14:55   ` Mathias Nyman
2025-02-25 22:17     ` Michał Pecio
2025-02-25 12:00 ` [PATCH v2 3/3] usb: xhci: Unify duplicate inc_enq() code Michal Pecio
2025-02-26 12:39 ` [PATCH v2 0/3] xhci: ring queuing cleanups plus a quirk Mathias Nyman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250225125854.622a6433@foxbook \
    --to=michal.pecio@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox