From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Michal Pecio <michal.pecio@gmail.com>
Cc: raoxu <raoxu@uniontech.com>,
mathias.nyman@intel.com, gregkh@linuxfoundation.org,
linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
stable@vger.kernel.org
Subject: Re: [PATCH v2] xhci: pci: Disable soft retry for Renesas uPD720201
Date: Mon, 22 Jun 2026 14:36:31 +0300
Message-ID: <9d87814e-baf0-4dad-aeb9-b34d28a4fc86@linux.intel.com>
In-Reply-To: <20260619124234.0a9e4670.michal.pecio@gmail.com>

On 6/19/26 13:42, Michal Pecio wrote:
>> On 6/17/26 13:09, raoxu wrote:
>>> From: Xu Rao <raoxu@uniontech.com>
>>>
>>> The Renesas uPD720201 xHCI controller can fail to complete
>>> a Stop Endpoint command after a transaction error on an interrupt
>>> endpoint when soft retry is used.
>>>
>>> This was reproduced with this setup:
>>>
>>> xHCI: Renesas uPD720201, PCI ID 1912:0014 rev 03
>>> dev: USB Ethernet device with an integrated Genesys Logic
>>> USB3.1 hub, USB ID 05e3:0626, and a Realtek RTL8153
>>> Ethernet function, USB ID 0bda:8153
>
> Same thing with uPD720202 (1912:0015) here.
>
> Is the hub even necessary? In my case I have one too, but I cannot
> separate it from the RTL8153 for testing.
>
>>> Reproducer:
>>>
>>> 1. Plug the integrated USB hub and Ethernet device into the
>>> 1912:0014 xHCI controller.
>>> 2. Let r8152 bind to the 0bda:8153 RTL8153 Ethernet function
>>> behind the integrated hub.
>>> 3. Bring the Ethernet device up.
>>> 4. Hot-unplug the device.
>
> In my case, necessary step 3.5: connect a cable and wait for the
> "r8152: carrier on" message. Otherwise it disconnects cleanly.
>
>>> The host reports a transaction error on the RTL8153 interrupt
>>> endpoint, queues a soft reset, and later times out the Stop
>>> Endpoint command while disconnecting the device:
>>>
>>> Transfer error for slot 8 ep 6 on endpoint
>>> Soft-reset ep 6, slot 8
>>> Ignoring reset ep completion code of 1
>>> xHCI host not responding to stop endpoint command
>>> xHCI host controller not responding, assume dead
>>> HC died; cleaning up
>
> There is other stuff too, like concurrent teardown of a separate bulk
> endpoint, not yet sure what exactly breaks these chips.
>
> Would you mind applying the attached debug patch, reproducing, and posting
> dmesg from your system for comparison?
>
>>> The Renesas 1912:0014 controller cannot safely use the xHCI soft
>>> retry path. Set XHCI_NO_SOFT_RETRY for this controller so
>>> transaction errors use the pre-soft-retry recovery path. With
>>> this quirk the same hot-unplug test no longer times out the Stop
>>> Endpoint command and the RTL8153 remains usable and stable.
>
> A bit heavy handed, but we might find no better way.
>
> On Thu, 18 Jun 2026 17:03:26 +0300, Mathias Nyman wrote:
>> I'd appreciate your opinion on a related issue.
>> I'm thinking about trying to recover from these stop endpoint command
>> timeouts.
>
> I can share a bit of mine. I tried aborting Stop EP on Etron and found
> the EP in some bogus state afterwards (e.g. Running but Stop EP fails
> with Context State Error, or Stopped but not responding to doorbells,
> something like that, I don't remember).
>
> Per xHCI 4.6.9 there isn't really a case when this command should time
> out, so it's always some internal bug/deadlock in the xHC and IMO there is
> a good chance that abort will leave at least this one EP or slot broken.
>
> Another case is ASMedia, which doesn't seem to implement abort at all -
> at least in my tests with Address Device and a dummy device that always
> NAKs, abort simply waits for the command to finish (these chips have
> internal 3 second timeout on Address Device). I would expect the same
> for Stop EP, except that it likely lacks internal timeout. And the
> driver will busy-wait for several seconds with IRQs disabled.
>
>> While debugging this, did xHC controller otherwise seem somewhat
>> functional? Did you for example see port status change events, or
>> transfer events between queuing the stop endpoint command and the
>> timeout?
>
> Mouse continues to work until we kill the HC. And I can even abort the
> command, but then some URB is never given back, so teardown of the USB
> device gets stuck and IDK what would happen later.
>
> Such recovery would be a bit of work, with potential chip-specific bugs, and
> frankly we can't be sure that the EP won't try to begin executing URBs.

Thanks, it sounds like simple recovery by just canceling the command and moving
on might not be the best approach.

If the root port is disconnected or its link is in an error state (link: Inactive),
we could avoid all soft retries and ring restarts for child devices.
This could also avoid queuing the problematic Stop Endpoint command.
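
A rough sketch of the check described above, as a standalone userspace C
model rather than driver code. Every name in it (struct root_port,
enum link_state, QUIRK_NO_SOFT_RETRY, may_soft_retry) is hypothetical and
only stands in for the corresponding driver state; QUIRK_NO_SOFT_RETRY
plays the role of the XHCI_NO_SOFT_RETRY quirk from the patch.

```c
#include <stdbool.h>

/* Hypothetical, simplified root port view; not the kernel's structures. */
enum link_state { LINK_U0, LINK_INACTIVE, LINK_OTHER };

struct root_port {
	bool connected;		/* device still attached to the root port */
	enum link_state link;	/* current link state of the root port */
};

/* Stand-in for the XHCI_NO_SOFT_RETRY quirk bit (value is an assumption). */
#define QUIRK_NO_SOFT_RETRY (1u << 0)

/* True when the root port is gone or its link is in an error state. */
static bool root_port_unusable(const struct root_port *port)
{
	return !port->connected || port->link == LINK_INACTIVE;
}

/*
 * Decide whether a transaction error on a child device may be soft-retried.
 * Skipping the retry on an unusable root port is the idea under discussion,
 * not existing driver behavior.
 */
static bool may_soft_retry(unsigned int quirks, const struct root_port *port)
{
	if (quirks & QUIRK_NO_SOFT_RETRY)
		return false;
	return !root_port_unusable(port);
}
```

With this model, a healthy port in U0 still allows a soft retry, while a
disconnected port or one in link:Inactive does not, regardless of quirks.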

Thanks
Mathias