* [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages
@ 2020-04-21 3:24 Matthew Ruffell
2020-04-21 7:37 ` Mathias Nyman
2020-04-21 9:30 ` Jan-Marek Glogowski
0 siblings, 2 replies; 4+ messages in thread
From: Matthew Ruffell @ 2020-04-21 3:24 UTC (permalink / raw)
To: linux-usb; +Cc: dann.frazier, heitor.de.siqueira, glogow
Hi,
I am struggling to debug a regression, and it would be great if we could work
this out together. I'm not very familiar with the USB subsystem, but I will try
my best.
We have had a few reports from users where their USB devices are extremely slow,
to the point where they are unusable. Their dmesg buffers are filled with:
[ 14.000130] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[ 18.092123] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[ 22.172116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
[ 26.252116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
This only seems to happen with specific hardware combinations. We did a git
bisect, and determined that the below commit was the root cause:
commit 4fdc1790e6a9ef22399c6bc6e63b80f4609f3b7e
Author: Jan-Marek Glogowski <glogow@fbihome.de>
Date: Fri Feb 1 13:52:31 2019 +0100
Subject: usb: handle warm-reset port requests on hub resume
We had to revert this commit from the Ubuntu kernels for the time being, but we
wish to re-apply it once we determine how to fix this regression.
The problem still exists on the latest 5.7-rc1 kernel, and if we revert the
above commit, the problem vanishes.
Logs (from an affected system, it has a LOT of USB devices):
System Info:
Ubuntu 18.04, 4.15.0-72-generic kernel.
lsusb: https://paste.ubuntu.com/p/Syh2StCyyT/
lsusb -t: https://paste.ubuntu.com/p/fG7DdXvh58/
filtered dmesg, with usbcore.dyndbg=+p: https://paste.ubuntu.com/p/DyY9SJRcdv/
filtered lspci: https://paste.ubuntu.com/p/Kr96PvRmH4/
My commentary on the problem:
First the hubs come onto the scene:
USB 2:
xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
usb usb1: Product: xHCI Host Controller
hub 1-0:1.0: USB hub found
USB 3:
xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
usb usb2: Product: xHCI Host Controller
hub 2-0:1.0: USB hub found
Next, the USB 2 and 3 port peering happens. Let's only look at usb1-port2 and
usb2-port2.
usb usb2-port2: peered to usb1-port2
The next interesting message is hub 1-0 reporting normal state, followed by
hub 2-0 reporting okay, then going into hub suspend.
hub 1-0:1.0: state 7 ports 16 chg 201c evt 0000
hub 2-0:1.0: state 7 ports 10 chg 0000 evt 0000
hub 2-0:1.0: hub_suspend
usb usb2: bus auto-suspend, wakeup 1
A new USB device is found on port 2. It gets registered under usb 1-2 and not
usb 2-2.
usb 1-2: new high-speed USB device number 2 using xhci_hcd
Following that, hub 2-0 resumes, noting a race event:
usb usb2: suspend raced with wakeup event
usb usb2: usb auto-resume
hub 2-0:1.0: hub_resume
Things keep on going with usb 1-2 though:
usb 1-2: New USB device found, idVendor=0424, idProduct=2807
usb 1-2: Product: USB2807 Hub
usb 1-2: Manufacturer: Microchip
hub 1-2:1.0: USB hub found
Its a Microchip USB hub. It becomes hub 1-2. The next thing to happen is where
the faults start coming in:
usb usb2-port2: do warm reset
A warm reset is issued to usb 2-2. But as we have previously gathered before,
2-2 is an "empty" port, peered to 1-2, and the device is under 1-2. This seems
to be linked with the hub 2-0 suspend and auto-resume event.
This of course matches the text in the problematic commit message:
> This just happens, if its the only device on the root hub, the hub
> therefore resumes and the HCDs status_urb isn't yet available.
> If a warm-reset request is detected, this sets the hubs event_bits,
> which will prevent any auto-suspend and allows the hubs workqueue
> to warm-reset the port later in port_event.
At the moment, usb 1-2 is the only device on the hub 1-0 root hub. Or so I think
anyway. I think I am reading correctly that hub 2-0 has no devices yet.
Anyway, the warm reset is tried again after the first timeout:
usb usb2-port2: not warm reset yet, waiting 50ms
usb usb2-port2: not warm reset yet, waiting 200ms
While this is happening, usb 1-2 is trying to set up the new hub 1-2.
usb 1-2-port1: status 0101 change 0001
usb 1-2-port2: status 0101 change 0001
...
A new warm reset is tried:
usb usb2-port2: not warm reset yet, waiting 200ms
hub 1-2 starts discovering new devices:
usb 1-2.1: New USB device found, idVendor=0403, idProduct=6011
usb 1-2.1: Product: Quad RS232-HS
usb 1-2.1: Manufacturer: FTDI
So things seem to be functioning normally at USB 2.0 speeds for usb 1-2. Back
on the usb 2-2 front:
usb usb2-port2: not warm reset yet, waiting 200ms
usb usb2-port2: not enabled, trying warm reset again...
From there on hub 1-2 keeps on discovering new devices and bringing them up,
and usb 2-2 keeps trying to warm reset.
Looking at the problematic code:
/* Make sure a warm-reset request is handled by port_event */
if (type == HUB_RESUME &&
hub_port_warm_reset_required(hub, port1, portstatus))
set_bit(port1, hub->event_bits);
It almost feels like we are missing a check to see if the port in question is
already in use in another bus. Or if the port is "empty".
In any case, reverting this commit on a mainline 5.7-rc1 kernel fixes the problem.
We have obtained access to an affected machine, and we can gather any debug data
that might be useful, or test any patches. Let me know what I can do, and I will
promptly fetch output.
Thanks for any help,
Matthew Ruffell
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages
2020-04-21 3:24 [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages Matthew Ruffell
@ 2020-04-21 7:37 ` Mathias Nyman
2020-04-23 5:47 ` Matthew Ruffell
2020-04-21 9:30 ` Jan-Marek Glogowski
1 sibling, 1 reply; 4+ messages in thread
From: Mathias Nyman @ 2020-04-21 7:37 UTC (permalink / raw)
To: Matthew Ruffell, linux-usb; +Cc: dann.frazier, heitor.de.siqueira, glogow
On 21.4.2020 6.24, Matthew Ruffell wrote:
> Hi,
>
> I am struggling to debug a regression, and it would be great if we could work
> this out together. I'm not very familiar with the USB subsystem, but I will try
> my best.
>
> We have had a few reports from users where their USB devices are extremely slow,
> to the point where they are unusable. Their dmesg buffers are filled with:
>
> [ 14.000130] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 18.092123] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 22.172116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
> [ 26.252116] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
>
> This only seems to happen with specific hardware combinations. We did a git
> bisect, and determined that the below commit was the root cause:
>
> commit 4fdc1790e6a9ef22399c6bc6e63b80f4609f3b7e
> Author: Jan-Marek Glogowski <glogow@fbihome.de>
> Date: Fri Feb 1 13:52:31 2019 +0100
> Subject: usb: handle warm-reset port requests on hub resume
>
> We had to revert this commit from the Ubuntu kernels for the time being, but we
> wish to re-apply it once we determine how to fix this regression.
>
> The problem still exists on the latest 5.7-rc1 kernel, and if we revert the
> above commit, the problem vanishes.
>
> Logs (from an affected system, it has a LOT of USB devices):
>
> System Info:
> Ubuntu 18.04, 4.15.0-72-generic kernel.
>
> lsusb: https://paste.ubuntu.com/p/Syh2StCyyT/
> lsusb -t: https://paste.ubuntu.com/p/fG7DdXvh58/
> filtered dmesg, with usbcore.dyndbg=+p: https://paste.ubuntu.com/p/DyY9SJRcdv/
> filtered lspci: https://paste.ubuntu.com/p/Kr96PvRmH4/
>
There are no USB3 devices enumerated.
Do any of the devices connected to the hub support USB 3 speeds (5Gbps or faster)?
If not, could you add one the hub hub?
> My commentary on the problem:
>
> First the hubs come onto the scene:
>
> USB 2:
>
> xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
> usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
> usb usb1: Product: xHCI Host Controller
> hub 1-0:1.0: USB hub found
>
> USB 3:
>
> xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 2
> xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
> usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
> usb usb2: Product: xHCI Host Controller
> hub 2-0:1.0: USB hub found
>
> Next, the USB 2 and 3 port peering happens. Let's only look at usb1-port2 and
> usb2-port2.
>
> usb usb2-port2: peered to usb1-port2
>
> The next interesting message is hub 1-0 reporting normal state, followed by
> hub 2-0 reporting okay, then going into hub suspend.
>
> hub 1-0:1.0: state 7 ports 16 chg 201c evt 0000
> hub 2-0:1.0: state 7 ports 10 chg 0000 evt 0000
> hub 2-0:1.0: hub_suspend
> usb usb2: bus auto-suspend, wakeup 1
>
> A new USB device is found on port 2. It gets registered under usb 1-2 and not
> usb 2-2.
>
> usb 1-2: new high-speed USB device number 2 using xhci_hcd
>
> Following that, hub 2-0 resumes, noting a race event:
>
> usb usb2: suspend raced with wakeup event
> usb usb2: usb auto-resume
> hub 2-0:1.0: hub_resume
>
> Things keep on going with usb 1-2 though:
>
> usb 1-2: New USB device found, idVendor=0424, idProduct=2807
> usb 1-2: Product: USB2807 Hub
> usb 1-2: Manufacturer: Microchip
> hub 1-2:1.0: USB hub found
>
> Its a Microchip USB hub. It becomes hub 1-2. The next thing to happen is where
> the faults start coming in:
>
> usb usb2-port2: do warm reset
>
> A warm reset is issued to usb 2-2. But as we have previously gathered before,
> 2-2 is an "empty" port, peered to 1-2, and the device is under 1-2. This seems
> to be linked with the hub 2-0 suspend and auto-resume event.
>
> This of course matches the text in the problematic commit message:
>
>> This just happens, if its the only device on the root hub, the hub
>> therefore resumes and the HCDs status_urb isn't yet available.
>> If a warm-reset request is detected, this sets the hubs event_bits,
>> which will prevent any auto-suspend and allows the hubs workqueue
>> to warm-reset the port later in port_event.
>
> At the moment, usb 1-2 is the only device on the hub 1-0 root hub. Or so I think
> anyway. I think I am reading correctly that hub 2-0 has no devices yet.
>
> Anyway, the warm reset is tried again after the first timeout:
>
> usb usb2-port2: not warm reset yet, waiting 50ms
> usb usb2-port2: not warm reset yet, waiting 200ms
>
> While this is happening, usb 1-2 is trying to set up the new hub 1-2.
>
> usb 1-2-port1: status 0101 change 0001
> usb 1-2-port2: status 0101 change 0001
> ...
>
> A new warm reset is tried:
>
> usb usb2-port2: not warm reset yet, waiting 200ms
>
> hub 1-2 starts discovering new devices:
>
> usb 1-2.1: New USB device found, idVendor=0403, idProduct=6011
> usb 1-2.1: Product: Quad RS232-HS
> usb 1-2.1: Manufacturer: FTDI
>
> So things seem to be functioning normally at USB 2.0 speeds for usb 1-2. Back
> on the usb 2-2 front:
>
> usb usb2-port2: not warm reset yet, waiting 200ms
> usb usb2-port2: not enabled, trying warm reset again...
>
> From there on hub 1-2 keeps on discovering new devices and bringing them up,
> and usb 2-2 keeps trying to warm reset.
>
> Looking at the problematic code:
>
> /* Make sure a warm-reset request is handled by port_event */
> if (type == HUB_RESUME &&
> hub_port_warm_reset_required(hub, port1, portstatus))
> set_bit(port1, hub->event_bits);
>
> It almost feels like we are missing a check to see if the port in question is
> already in use in another bus. Or if the port is "empty".
>
Unlike other devices USB 3 hubs will enumerate as both USB 2.0 and USB 3 devices.
Looks like the USB 3 part of the hub is not working correctly.
My guess is that without the patch the USB 3 part of the hub is not working either.
Patch probably helps hub driver discover there is a USB 3 device in a SS_INVALID link
state in the first place, and starts recovering it by warm resetting it.
> In any case, reverting this commit on a mainline 5.7-rc1 kernel fixes the problem.
>
> We have obtained access to an affected machine, and we can gather any debug data
> that might be useful, or test any patches. Let me know what I can do, and I will
> promptly fetch output.
If it's an external hub can you try connecting some other USB 3 device first to a
USB 3 roothub port, and then the hub to another USB 3 roothub port?
This way the USB 3 roothub (bus) should not be suspended when the USB 3 hub is connected.
Does that help, or change anything
Thanks
-Mathias
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages
2020-04-21 3:24 [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages Matthew Ruffell
2020-04-21 7:37 ` Mathias Nyman
@ 2020-04-21 9:30 ` Jan-Marek Glogowski
1 sibling, 0 replies; 4+ messages in thread
From: Jan-Marek Glogowski @ 2020-04-21 9:30 UTC (permalink / raw)
To: Matthew Ruffell, linux-usb; +Cc: dann.frazier, heitor.de.siqueira
Hi,
this was the only time I had to work on a kernel USB problem, so can't
really help with knowledge of the USB subsystem.
And unfortunately I don't have access to that hardware anymore. I've
pinged some people, who might be able to get a hand on it again and then
can verify a new patch to prevent the original bug from reappearing, but
that's not up to me to decide.
ATB
Jan-Marek
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages
2020-04-21 7:37 ` Mathias Nyman
@ 2020-04-23 5:47 ` Matthew Ruffell
0 siblings, 0 replies; 4+ messages in thread
From: Matthew Ruffell @ 2020-04-23 5:47 UTC (permalink / raw)
To: Mathias Nyman, linux-usb; +Cc: dann.frazier, heitor.de.siqueira
Hi Mathias,
Thanks for responding.
On 21/04/20 7:37 pm, Mathias Nyman wrote:
> There are no USB3 devices enumerated.
>
> Do any of the devices connected to the hub support USB 3 speeds (5Gbps or faster)?
> If not, could you add one the hub hub?
> Unlike other devices USB 3 hubs will enumerate as both USB 2.0 and USB 3 devices.
>
> Looks like the USB 3 part of the hub is not working correctly.
Interesting. Looking at the general overview of this system:
There is only one hub, an Intel PCH LBG-1, which has
-- 2x USB 3.0 ports connected to 2x USB 3.0 ports in the front of the chassis
-- 2x USB 3.0 ports connected to 2x USB 3.0 ports in the back of the chassis
-- 2x USB 3.0 ports connected to 2x USB 3.0 ports on the motherboard itself
-- 1x USB 2.0 port connected to the BMC (AST2500)
A USB 3 hub is connected to a USB 3 port on the back of the chassis with a USB 3 cable,
and this hub contains many devices.
There was a previous incident with the same hardware where a "bad" USB 3 cable
caused a similar warm reset loop. The symptoms were the same, with the infinite
warm resets on usb2-port2, and logs filled with "Cannot enable. Maybe the USB cable is bad?"
messages. One of my colleagues looked into a usbmon dump, and came up with the
following analysis:
The driver starts by inquiring the current port status with a GET_STATUS request:
ID TIME SOURCE DEST PROT LEN INFO
35 0.624266 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
36 0.624280 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2]
Frame 36 responds with [Port Status: 0x0340, PORT_POWER, PORT_LOW_SPEED]. When
taking bits 5-8 from 0x340, we get '0xA'. According to "Table 10-13. Port Status
Field, wPortStatus" from the USB specification, this corresponds to Compliance
Mode.
The driver then issues a Warm Reset. According to "7.5.5.2 Exit from
Compliance Mode", a Warm Reset shall transition the port to Rx.Detect.
ID TIME SOURCE DEST PROT LEN INFO
37 0.624299 host 2.1.0 USBHUB 64 SET_FEATURE Request [Port 2: BH_PORT_RESET]
38 0.624321 2.1.0 host USBHUB 64 SET_FEATURE Response [Port 2: BH_PORT_RESET]
What follows is the driver querying the port status for the C_PORT_RESET
flag. This flag is responsible for indicating the port is done with the Warm
Reset, and operation can proceed from the Rx.Detect state:
ID TIME SOURCE DEST PROT LEN INFO
39 0.683989 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
40 0.684015 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x350 | Port Change: 0x0)
41 0.743988 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
42 0.744012 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x2f0 | Port Change: 0x0)
43 0.951986 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
44 0.952011 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x2f0 | Port Change: 0x0)
45 1.159985 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
46 1.160010 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x2f0 | Port Change: 0x0)
49 1.367981 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
50 1.368010 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x340 | Port Change: 0x30)
In frame 40, the Port Status changes to 0x350 by setting PORT_RESET,
acknowledging the Warm Reset request. It then changes to 0x2f0 in frame 42,
which corresponds to the Polling state. The C_PORT_RESET flag is still not set,
so we shouldn't be transitioning to this state yet! (we need to end in Rx.Detect)
At last in frame 50, the Port Change flag is set to 0x30, corresponding to the
C_PORT_RESET bit. The Port Status is now back to 0x340, indicating the port is
back to Compliance Mode.
The driver then clears the Port Change flags, and starts issuing the Warm Reset
again. This repeats until PORT_RESET_TRIES, which is 5 for the Bionic
kernel. This process is also the cause of all the log spam, since even without
debug information this section of the driver has a lot of messages.
ID TIME SOURCE DEST PROT LEN INFO
51 1.368020 host 2.1.0 USBHUB 64 CLEAR_FEATURE Request [Port 2: C_PORT_RESET]
52 1.368031 2.1.0 host USBHUB 64 CLEAR_FEATURE Response [Port 2: C_PORT_RESET]
53 1.368035 host 2.1.0 USBHUB 64 CLEAR_FEATURE Request [Port 2: C_BH_PORT_RESET]
54 1.368045 2.1.0 host USBHUB 64 CLEAR_FEATURE Response [Port 2: C_BH_PORT_RESET]
55 1.368049 host 2.1.0 USBHUB 64 CLEAR_FEATURE Request [Port 2: C_PORT_LINK_STATE]
56 1.368058 2.1.0 host USBHUB 64 CLEAR_FEATURE Response [Port 2: C_PORT_LINK_STATE]
57 1.368062 host 2.1.0 USBHUB 64 CLEAR_FEATURE Request [Port 2: C_PORT_CONNECTION]
58 1.368071 2.1.0 host USBHUB 64 CLEAR_FEATURE Response [Port 2: C_PORT_CONNECTION]
59 1.368074 host 2.1.0 USBHUB 64 GET_STATUS Request [Port 2]
60 1.368085 2.1.0 host USBHUB 68 GET_STATUS Response [Port 2] (Port Status: 0x340 | Port Change: 0x0)
61 1.368093 host 2.1.0 USBHUB 64 SET_FEATURE Request [Port 2: BH_PORT_RESET]
62 1.368100 2.1.0 host USBHUB 64 SET_FEATURE Response [Port 2: BH_PORT_RESET]
The thing is, the technicians swapped out the USB 3 cable which connected one of the
rear USB 3 ports to the USB 3 hub with many devices for a different cable from a
different manufacturer. The problem vanished, and it was assumed to be a bad batch
of cables.
The problem returned again when the before mentioned patch was backported to the Ubuntu
kernel, and caused this "regression".
>
> My guess is that without the patch the USB 3 part of the hub is not working either.
> Patch probably helps hub driver discover there is a USB 3 device in a SS_INVALID link
> state in the first place, and starts recovering it by warm resetting it.
>
You may very well be right, considering our previous experience with the "bad" USB 3 cable.
How can we determine which device might be in this SS_INVALID link state? Should we do
another usbmon binary dump with a recent mainline kernel and go from there?
If the current warm reset regression is the same as what we previously saw when there was
a "bad" USB cable, does the non-compliant behaviour indicate that this is a hardware problem
and not a kernel problem?
> If it's an external hub can you try connecting some other USB 3 device first to a
> USB 3 roothub port, and then the hub to another USB 3 roothub port?
>
> This way the USB 3 roothub (bus) should not be suspended when the USB 3 hub is connected.
> Does that help, or change anything
Unfortunately no one has physical access to the server at the moment due to the lockdown,
and we only have remote ssh access. We will try this out once the technicians have access
to their offices again.
Again, thank you for your help.
Matthew
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2020-04-23 5:47 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-04-21 3:24 [PROBLEM]: Infinite warm reset loops resulting in "Cannot enable. Maybe the USB cable is bad?" messages Matthew Ruffell
2020-04-21 7:37 ` Mathias Nyman
2020-04-23 5:47 ` Matthew Ruffell
2020-04-21 9:30 ` Jan-Marek Glogowski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox