My transfer ring grew to 740 segments

From: "Michał Pecio" <michal.pecio@gmail.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: linux-usb@vger.kernel.org
Subject: My transfer ring grew to 740 segments
Date: Tue, 11 Mar 2025 23:41:39 +0100
Message-ID: <20250311234139.0e73e138@foxbook>

Hi,

This happened under a simple test meant to check if AMD "Promontory"
chipset (from ASMedia) has the delayed restart bug (it does, rarely).

Two pl2303 serial dongles were connected to a hub, loops were opening
and closing /dev/ttyUSBn to enqueue/dequeue some IN URBs which would
never complete with any data (nothing was fed to UART RX).

The test was running unattended for a few hours and it seems that at
some point the hub stopped working and transfers to downstream devices
were all returning Transaction Error. dmesg was full of this:

[102711.994235] xhci_hcd 0000:02:00.0: Event dma 0x00000000ffef4a50 for ep 6 status 4 not part of TD at 00000000eb22b790 - 00000000eb22b790
[102711.994243] xhci_hcd 0000:02:00.0: Ring seg 0 dma 0x00000000ffef4000
[102711.994246] xhci_hcd 0000:02:00.0: Ring seg 1 dma 0x00000000ffeee000
[102711.994249] xhci_hcd 0000:02:00.0: Ring seg 2 dma 0x00000000ffc4e000

[ ... 735 lines omitted for brevity ... ]

[102711.995935] xhci_hcd 0000:02:00.0: Ring seg 738 dma 0x00000000eb2e2000
[102711.995937] xhci_hcd 0000:02:00.0: Ring seg 739 dma 0x00000000eb22b000

Looking through debugfs, ffef4a50 is indeed a normal TD, apparently no
longer on td_list for some reason and hence the errors. The rest of the
ring is No-Ops.
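
To make the addresses in the log easier to follow, here is a minimal sketch that maps a TRB DMA address to an index inside a ring segment. It assumes 4096-byte segments and 16-byte TRBs (consistent with the page-aligned segment addresses printed above); the helper name trb_index_in_seg() is hypothetical and not a driver function.

```c
#include <stdint.h>

#define SEG_SIZE 4096u  /* assumption: one segment is 4 KiB */
#define TRB_SIZE 16u    /* a TRB is 16 bytes */

/*
 * Return the TRB index of trb_dma inside the segment starting at
 * seg_dma, or -1 if the address does not fall inside that segment.
 */
static int trb_index_in_seg(uint64_t seg_dma, uint64_t trb_dma)
{
	if (trb_dma < seg_dma || trb_dma >= seg_dma + SEG_SIZE)
		return -1;
	return (int)((trb_dma - seg_dma) / TRB_SIZE);
}
```

With the values from the log, the event TRB 0xffef4a50 is index 165 of segment 0 (0xffef4000), while the TD the driver expected, 0xeb22b790, is index 121 of segment 739 (0xeb22b000).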

Class driver enqueues its URBs, rings the doorbell and triggers this
error message. The endpoint halts, but that is ignored. Serial device
is closed, URBs are unlinked, Stop EP sees Halted, resets. No Set Deq
because HW Dequeue doesn't match any known TD. Rinse, repeat.

At some point end of the segment is reached, new segment is allocated
because ep_ring->dequeue is still in the first segment.
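
The growth described above can be illustrated with a toy model. This is only a sketch, not the driver's ring code: struct toy_ring and toy_enqueue() are hypothetical, the 256-TRB segment size is an assumption, and the model simply never advances the dequeue position, mimicking a stuck TD in the first segment.

```c
#define TRBS_PER_SEG 256u  /* assumption: 4 KiB segment of 16-byte TRBs */

/* Hypothetical, simplified ring bookkeeping (segment indices only) */
struct toy_ring {
	unsigned int num_segs;  /* segments currently linked into the ring */
	unsigned int enq_seg;   /* segment holding the enqueue pointer */
	unsigned int enq_idx;   /* TRB index within enq_seg */
	unsigned int deq_seg;   /* segment holding the (stuck) dequeue pointer */
};

/* Queue n TRBs. Dequeue never moves because the stuck TD is never given back. */
static void toy_enqueue(struct toy_ring *r, unsigned int n)
{
	while (n--) {
		/* The last slot of every segment is reserved for the link TRB */
		if (r->enq_idx == TRBS_PER_SEG - 1) {
			unsigned int next = (r->enq_seg + 1) % r->num_segs;

			if (next == r->deq_seg) {
				/* Would run into dequeue: insert a new segment after enq_seg */
				r->num_segs++;
				if (r->deq_seg > r->enq_seg)
					r->deq_seg++;
				next = r->enq_seg + 1;
			}
			r->enq_seg = next;
			r->enq_idx = 0;
		}
		r->enq_idx++;
	}
}
```

Starting from a two-segment ring with everything at segment 0, the first 510 TRBs fit, and after that every further 255 TRBs add one more segment, because wrapping around would collide with the dequeue pointer that never leaves segment 0.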

So how does the driver enter this screwed up state? Apparently due to
a HW bug. More detailed debug log from a different run:

[39607.305224] xhci_hcd 0000:02:00.0: 2/6 (040/3) ring_ep_doorbell stream 0
[39607.305235] xhci_hcd 0000:02:00.0: 2/6 (040/3) ring_ep_doorbell stream 0
[39607.305413] xhci_hcd 0000:02:00.0: 2/6 (040/1) handle_tx_event comp_code 4 trb_dma 0x00000000ffa80050

The 1 in (040/1) is EP Ctx state, i.e. Running, despite Trans. Error.
It looks like finish_td() sees it, ignores the error and gives back
normally. EP Ctx is still wrong later when the next URB is unlinked:

[39607.398526] xhci_hcd 0000:02:00.0: 2/6 (040/1) xhci_urb_dequeue cancel TD at 0x00000000ffa80060 stream 0
[39607.398531] xhci_hcd 0000:02:00.0: 2/6 (044/1) queue_stop_endpoint suspend 0

But Stop EP fails and updates it properly to 2=Halted:

[39607.398655] xhci_hcd 0000:02:00.0: 2/6 (044/2) handle_cmd_completion cmd_type 15 comp_code 19

Then the EP is reset without Set Deq or clearing and ffa80050 becomes
"stuck and forgotten", initiating the above problem.
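
For anyone decoding the numbers in these excerpts, the following sketch names the values that appear in them. The numeric encodings are the ones defined by the xHCI specification; the function names are hypothetical, and the tables are deliberately incomplete (only the values seen here plus a few neighbours).

```c
/* Endpoint Context "EP State" field */
static const char *ep_state_name(unsigned int state)
{
	switch (state) {
	case 0: return "Disabled";
	case 1: return "Running";
	case 2: return "Halted";
	case 3: return "Stopped";
	case 4: return "Error";
	default: return "Reserved";
	}
}

/* Event TRB completion codes (subset) */
static const char *comp_code_name(unsigned int code)
{
	switch (code) {
	case 1:  return "Success";
	case 4:  return "USB Transaction Error";
	case 6:  return "Stall Error";
	case 13: return "Short Packet";
	case 19: return "Context State Error";
	default: return "(not listed in this sketch)";
	}
}

/* Command TRB types (subset) */
static const char *cmd_type_name(unsigned int type)
{
	switch (type) {
	case 14: return "Reset Endpoint";
	case 15: return "Stop Endpoint";
	case 16: return "Set TR Dequeue Pointer";
	default: return "(not listed in this sketch)";
	}
}
```

Read this way, the last log line above is a Stop Endpoint command (type 15) failing with Context State Error (19) while the endpoint context finally reports Halted (2).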

The fact that EP Ctx state is Running for >90ms suggests that it's
a bug. But a race could have similar effect, and I can't find any
guarantee in the spec that EP Ctx is updated before posting an error
transfer event. 4.8.3 guarantees that it becomes Running before normal
transfer events are posted, but suggests not to trust EP Ctx too much.

Maybe finish_td() should be more cautious?
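
One possible reading of "more cautious", as a standalone sketch rather than a patch: decide that the endpoint needs recovery from the completion code alone, and use the EP Ctx state only as an additional hint. Everything here is an assumption for illustration, including the function names and the choice of which completion codes are treated as halting (Babble 3, Transaction Error 4, Stall 6, Split Transaction Error 36); it does not reflect what finish_td() actually does.

```c
#include <stdbool.h>

enum {
	EP_STATE_RUNNING = 1,
	EP_STATE_HALTED  = 2,
};

/* Assumed set of completion codes after which the endpoint is expected to halt */
static bool comp_code_implies_halt(unsigned int comp_code)
{
	switch (comp_code) {
	case 3:   /* Babble Detected Error */
	case 4:   /* USB Transaction Error */
	case 6:   /* Stall Error */
	case 36:  /* Split Transaction Error */
		return true;
	default:
		return false;
	}
}

/*
 * Hypothetical check: recover if either the context says Halted or the
 * completion code says the endpoint must have halted, even when the
 * context still (wrongly or not yet) reads Running.
 */
static bool ep_needs_reset(unsigned int comp_code, unsigned int ctx_state)
{
	return ctx_state == EP_STATE_HALTED || comp_code_implies_halt(comp_code);
}
```

In the failing case from the log (comp_code 4 with EP Ctx state 1), ep_needs_reset(4, EP_STATE_RUNNING) returns true, so the stale Running state would not cause the error to be ignored.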

Michal
