Linux USB
From: Michal Pecio <michal.pecio@gmail.com>
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: raoxu <raoxu@uniontech.com>,
	gregkh@linuxfoundation.org, linux-kernel@vger.kernel.org,
	linux-usb@vger.kernel.org, mathias.nyman@intel.com,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] xhci: pci: Disable soft retry for Renesas uPD720201
Date: Tue, 23 Jun 2026 13:55:17 +0200
Message-ID: <20260623135517.2b1f0809.michal.pecio@gmail.com>
In-Reply-To: <c4ef0081-fbe9-47a4-b5d5-60665564ca02@linux.intel.com>

Replying a little out of order here.

On Mon, 22 Jun 2026 14:31:58 +0300, Mathias Nyman wrote:
> Cancel the realtek URB we tried to soft retry earlier.
> 
> > 2026-06-22T13:23:39.477082+08:00 uos-PC kernel: xhci_hcd 0000:04:00.0: 8/6 (000/3) [200cb341b0/200cb341b1/200cb341c0] xhci_urb_dequeue cancel TD at 200cb341b0 stream 0
> > 2026-06-22T13:23:39.477082+08:00 uos-PC kernel: xhci_hcd 0000:04:00.0: 8/6 (004/3) [200cb341b0/200cb341b1/200cb341c0] queue_stop_endpoint suspend 0  
> 
> queue stop endpoint to cancel URB for realtek device.
> Endpoint context still shows endpoint is in "stopped" state.
> Note that we restarted the endpoint 20ms earlier, endpoint context
> might not have updated yet.

This was business as usual on uPD720200. It seems that these chips
don't update the EP Context until the first scheduled service
opportunity (though no later than about 30ms; long-interval endpoints
must follow different rules), and they cannot execute Stop EP until
then either.

Some of them complete the command with Context State Error; others
delay completion until the scheduled restart. If we wait longer and
only then queue Stop Endpoint, it executes instantly (within a fraction
of a ms).

It seems that the uPD720201/uPD720202 chips still have the same
limitation.

> I think there are some steps we could do to avoid soft retry,
> restart, and stopping an endpoint we know is behind a disconnected
> parent.

Yes, the existing logic can be trivially extended to cover children
too. Of course, this does nothing if the device is disconnected from an
external hub, or if a transaction error occurs without any
disconnection.
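A minimal sketch of what "cover children too" could mean: walk up the
parent chain and skip soft retry, restart and Stop EP if any ancestor
is already known to be gone. The struct and function names here are
hypothetical, not the kernel's struct usb_device or any existing xhci
helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node: NOT the kernel's struct usb_device. */
struct dev_node {
	const struct dev_node *parent;	/* NULL for a device on a root port */
	bool disconnected;		/* set once its port reports a disconnect */
};

/*
 * Return true if this device or any hub above it is known to be gone,
 * in which case retrying, restarting or stopping its endpoints is
 * pointless.
 */
static bool behind_disconnected_parent(const struct dev_node *dev)
{
	for (; dev != NULL; dev = dev->parent)
		if (dev->disconnected)
			return true;
	return false;
}
```

As noted above, this only helps when the disconnect is seen on a root
port; a disconnect behind an external hub is reported later, through
the hub's own status endpoint.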

But further experiments indicate that disconnection from the root hub
is actually a necessary condition to trigger this bug.

If another SuperSpeed device (even one without periodic endpoints, like
a UAS drive) is connected to another port, the retry causes another
Transaction Error a few ms later, the pipe halts, and Stop EP completes
normally with Context State Error, as expected. Then we reset the
endpoint, remove the URB and never restart this endpoint again.

The same happens if I trigger the bug and then connect either the same
hub or any other device to any SuperSpeed port before the command times
out.

[  +0,000009] xhci_hcd 0000:06:00.0: 6/6 (000/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_reset_endpoint tsp 1
[  +0,000009] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[  +0,000504] xhci_hcd 0000:06:00.0: 6/6 (002/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_cmd_completion cmd_type 14 comp_code 1
[  +0,000025] xhci_hcd 0000:06:00.0: 6/6 (000/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] ring_ep_doorbell stream 0
[  +0,006627] usb 10-1: USB disconnect, device number 22
[  +0,000016] usb 10-1.4: USB disconnect, device number 23
[  +0,000005] r8152-cfgselector 10-1.4.4: USB disconnect, device number 24
[  +0,000190] xhci_hcd 0000:06:00.0: 6/6 (000/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] xhci_urb_dequeue cancel TD at ff8f0bd0 stream 0
[  +0,000011] xhci_hcd 0000:06:00.0: 6/6 (004/3) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_stop_endpoint suspend 0
[  +0,000009] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[  +0,000655] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_tx_event comp_code 4 trb_dma ff8f0bd0
[  +0,000023] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_tx_event stream_id 0 trb_len 2 missing 2
[  +0,000013] xhci_hcd 0000:06:00.0: 6/6 (004/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] queue_reset_endpoint tsp 1
[  +0,000008] xhci_hcd 0000:06:00.0: 0/-1 (fff/f) [ffffffff/ffffffff/ffffffff] xhci_ring_cmd_db cmd_ring_state 1
[  +0,000012] xhci_hcd 0000:06:00.0: 6/6 (006/2) [ff8f0bd0/ff8f0bd1/ff8f0be0] handle_cmd_completion cmd_type 15 comp_code 19
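For anyone decoding the log above: cmd_type and comp_code are raw xHCI
values. Per the xHCI specification, TRB type 14 is Reset Endpoint and
15 is Stop Endpoint; completion code 1 is Success, 4 is USB Transaction
Error and 19 is Context State Error. A small helper to translate them,
assuming only that the lines contain "cmd_type N" and "comp_code N" as
shown (the helper names are made up for illustration):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Command TRB type values from the xHCI specification. */
static const char *trb_type_name(int type)
{
	switch (type) {
	case 14: return "Reset Endpoint";
	case 15: return "Stop Endpoint";
	case 16: return "Set TR Dequeue Pointer";
	default: return "other";
	}
}

/* Completion code values from the xHCI specification. */
static const char *comp_code_name(int code)
{
	switch (code) {
	case 1:  return "Success";
	case 4:  return "USB Transaction Error";
	case 6:  return "Stall Error";
	case 19: return "Context State Error";
	case 26: return "Stopped";
	default: return "other";
	}
}

/* Return the integer following "key" in a log line, or -1 if absent. */
static int field(const char *line, const char *key)
{
	const char *p = strstr(line, key);

	return p ? atoi(p + strlen(key)) : -1;
}

/* Summarize a handle_cmd_completion line as "<command>: <result>". */
static void describe(const char *line, char *buf, size_t len)
{
	snprintf(buf, len, "%s: %s",
		 trb_type_name(field(line, "cmd_type ")),
		 comp_code_name(field(line, "comp_code ")));
}
```

Applied to the last line of the log this yields "Stop Endpoint: Context
State Error", and the line with cmd_type 14 comp_code 1 yields "Reset
Endpoint: Success".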

My guess is that disconnecting all SuperSpeed ports causes the chip to
turn off its SuperSpeed schedule altogether and wait for software to
stop all endpoints that aren't halted yet. But when a restart is
pending, Stop EP is scheduled to complete at the next service
opportunity, which never comes.
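To make the guess concrete, here is a toy model of it, purely
illustrative: it assumes a halted endpoint fails Stop EP at once with
Context State Error, that a pending restart makes Stop EP wait for the
next service opportunity, and that no service opportunity ever occurs
once the SuperSpeed schedule is off. The struct and enum names are
invented and do not correspond to chip registers or driver code.

```c
#include <stdbool.h>

/* Toy endpoint state for the guessed behavior; not real hardware state. */
struct toy_ep {
	bool restart_pending;	/* doorbell rung, first service not yet reached */
	bool halted;		/* e.g. after a second Transaction Error */
};

enum stop_result {
	STOP_DONE,		/* completes, possibly delayed to the next service */
	STOP_CTX_STATE_ERROR,	/* completes at once with Context State Error */
	STOP_NEVER_COMPLETES,	/* waits forever, so the command times out */
};

static enum stop_result toy_stop_ep(const struct toy_ep *ep, bool ss_schedule_on)
{
	if (ep->halted)
		return STOP_CTX_STATE_ERROR;
	if (ep->restart_pending && !ss_schedule_on)
		return STOP_NEVER_COMPLETES;
	return STOP_DONE;
}
```

With another SuperSpeed device attached, the schedule stays on and the
endpoint halts, so the model lands in STOP_CTX_STATE_ERROR; with all
SuperSpeed ports disconnected and a restart pending, it lands in
STOP_NEVER_COMPLETES, matching the hang.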

I also found that disconnecting a different affected NIC directly from
the root hub triggers this bug as well, but only if I disable the
protection against queuing Reset Endpoint (including with TSP) to
"inactive" devices.

The bug doesn't trigger every time, though: sometimes the unlink
happens while Reset Endpoint is pending, and then the Reset Endpoint
completion handler removes the URB without a Stop Endpoint.
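The race amounts to a decision made at unlink time. This sketch uses
hypothetical flag names loosely modeled on the driver's endpoint state
bits; it is an illustration, not the actual xhci_urb_dequeue() code.

```c
/* Hypothetical endpoint state bits, loosely modeled on the driver's. */
#define TOY_RESET_EP_PENDING	(1u << 0)
#define TOY_STOP_EP_PENDING	(1u << 1)

enum unlink_action {
	QUEUE_STOP_EP,		/* ring must be stopped before the TD is removed */
	LET_PENDING_CMD_FINISH,	/* that command's handler gives back the URB */
};

static enum unlink_action on_unlink(unsigned int ep_state)
{
	/*
	 * If Reset Endpoint (or an earlier Stop EP) is still in flight, the
	 * endpoint is not running, so the completion handler can remove the
	 * cancelled URB without a new Stop Endpoint. Only otherwise do we
	 * queue Stop EP, which is the path that hangs on these chips when a
	 * restart is pending.
	 */
	if (ep_state & (TOY_RESET_EP_PENDING | TOY_STOP_EP_PENDING))
		return LET_PENDING_CMD_FINISH;
	return QUEUE_STOP_EP;
}
```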

And a cable connection isn't actually necessary; I was mistaken due to
the randomness of the bug.

Regards,
Michal
