From: Ladislav Michl <oss-lists@triops.cz>
To: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: linux-usb@vger.kernel.org
Subject: Re: xHCI host dies on device unplug
Date: Mon, 19 Dec 2022 19:31:02 +0100
Message-ID: <Y6Ct5s5fIoA9FsAt@lenoch>
In-Reply-To: <983a1eb1-4599-517b-6c88-63a0051ae261@linux.intel.com>
On Mon, Dec 19, 2022 at 02:25:46PM +0200, Mathias Nyman wrote:
> On 16.12.2022 23.32, Ladislav Michl wrote:
> > On Fri, Dec 16, 2022 at 12:13:23PM +0200, Mathias Nyman wrote:
> > > On 15.12.2022 18.12, Ladislav Michl wrote:
> > > > +Cc Mathias as he last touched this code path and may know more :)
> > > >
> > > > On Tue, Dec 06, 2022 at 02:17:08PM +0100, Ladislav Michl wrote:
> > > > > On Mon, Dec 05, 2022 at 10:27:57PM +0100, Ladislav Michl wrote:
> > > > > > I'm running current linux.git on a custom Marvell OCTEON III CN7020
> > > > > > based board. USB devices such as FTDI (idVendor=0403, idProduct=6001,
> > > > > > bcdDevice= 6.00) and a Realtek WiFi dongle (idVendor=0bda, idProduct=8179,
> > > > > > bcdDevice= 0.00) work without issues, while a Ralink WiFi dongle
> > > > > > (idVendor=148f, idProduct=5370, bcdDevice= 1.01) kills the host on
> > > > > > disconnect:
> > > > > > xhci-hcd xhci-hcd.0.auto: xHCI host not responding to stop endpoint command
> > > > > > xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
> > > > > > xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
> > > > > >
> > > > > > Unfortunately, I do not have a datasheet for the CN7020 SoC, so it is hard
> > > > > > to tell if there are any errata :/ In case anyone sees a clue in the debug
> > > > > > logs below, I'll happily give it a try.
> > > > >
> > > > > So I do have the datasheet now. As a wild guess, I tried dlmc_ref_clk0
> > > > > instead of dlmc_ref_clk1 as refclk-type-ss, and it fixed the death on unplug.
> > > > > I have no clue why, but anyway, sorry for the noise :) Perhaps Octeon's
> > > > > clock init is worth verifying...
> > > >
> > > > After all, whether xhci dies with "xHCI host not responding to stop endpoint
> > > > command" also depends on temperature, so there seems to be a race somewhere.
> > > >
> > > > As a quick and dirty check of whether xhci really died, the following patch
> > > > was tested, and it fixed the issue. It just treats the endpoint as if the stop
> > > > endpoint command had succeeded. Any clues? I'll happily provide more traces.
> > >
> > > It's possible the controller did complete the stop endpoint command but driver
> > > didn't get the interrupt for the event for some reason.
> > >
>
> It looks like the controller didn't complete the stop endpoint command.
>
> The event for the last completed command (before the cycle bit changes from "c" to "C") was:
> 0x00000000028f55a0: TRB 00000000035e81a0 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:c,
>
> This was for the command at 35e81a0, which in the command ring was:
> 0x00000000035e81a0: Reset Endpoint Command: ctx 0000000000000000 slot 1 ep 3 flags T:c
>
> The stop endpoint command was the next command queued, at 35e81b0:
> 0x00000000035e81b0: Stop Ring Command: slot 1 sp 0 ep 3 flags c
>
> There were a lot of URBs queued for this device, and they are cancelled one by one after disconnect.
>
> Was this the only device connected? If so, does connecting another USB device to another root port help?
> Just to test whether the host, for some reason, partially stops a while after the last device disconnects.
The device is connected directly to the SoC. When it is connected through a hub, the host doesn't die
(as noted in another email; sorry for not replying to my own message, so it got lost).
It looks like an intentional (power management?) optimization: if another device is
plugged in before the 5 second timeout expires, the host completes the stop endpoint command.
Unfortunately, I cannot find anything describing this behavior in the
documentation, so I'll ask the manufacturer's support.
Both workarounds, doing nothing or resetting the controller once the last device is
unplugged, work, but I doubt they are suitable for the mainline kernel without further
investigation.
> Another thing is that the stop endpoint command fails after three soft reset tries,
> does disabling soft reset help?
No, this does not cause any change.
ladis