linux-usb.vger.kernel.org archive mirror
* [Bug 219824] New: [6.13 regression] USB controller just died
@ 2025-02-26 22:23 bugzilla-daemon
  2025-02-26 22:28 ` [Bug 219824] " bugzilla-daemon
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-26 22:23 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

            Bug ID: 219824
           Summary: [6.13 regression] USB controller just died
           Product: Drivers
           Version: 2.5
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: blocking
          Priority: P3
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: aros@gmx.com
        Regression: No

This is the first time it's happened to me.

My USB mouse just died.

The system was more or less idle, then I got these messages from the kernel:

[109773.985092] xhci_hcd 0000:c3:00.3: xHCI host not responding to stop endpoint command
[109773.998577] xhci_hcd 0000:c3:00.3: xHCI host controller not responding, assume dead
[109773.998622] xhci_hcd 0000:c3:00.3: HC died; cleaning up
[109773.998668] xhci_hcd 0000:c3:00.3: Timeout while waiting for stop endpoint command
[109773.998740] usb 1-2: USB disconnect, device number 2
[109774.032612] usb 1-3: USB disconnect, device number 3
[109774.033087] usb 1-4: USB disconnect, device number 4

This has never happened before with any of the previous kernels (6.9, 6.10,
6.11, 6.12).

Now on 6.13.4 this happened a few minutes after the system resumed.

That looks like a major regression.

The kernel didn't try to recover the controller.

Unbinding and binding the USB endpoint in /sys using this script has fixed the
mouse but I never had to do that before:

https://unix.stackexchange.com/a/704342
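
For reference, an unbind/bind cycle of this kind can be sketched as below. This
is only a minimal sketch and not the linked script: the function name
rebind_xhci and the XHCI_DRV_DIR override are hypothetical, the PCI address is
the controller from the log above, and it assumes the standard sysfs interface
of the xhci_hcd PCI driver. It has to run as root.

```shell
# Hypothetical helper: detach the xhci_hcd driver from one PCI device and
# attach it again. XHCI_DRV_DIR is an illustrative override for dry runs;
# the default is the sysfs directory of the xhci_hcd PCI driver.
rebind_xhci() {
    dev="$1"
    drv="${XHCI_DRV_DIR:-/sys/bus/pci/drivers/xhci_hcd}"
    # Unbind the dead controller from the driver
    printf '%s\n' "$dev" > "$drv/unbind" || return 1
    sleep 1
    # Bind it again so the driver re-initialises the controller
    printf '%s\n' "$dev" > "$drv/bind"
}

# Usage (as root): rebind_xhci 0000:c3:00.3
```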

My lspci:

c3:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller
c3:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
c3:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Phoenix CCP/PSP 3.0 Device
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Data Fabric; Function 7
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Host Bridge
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Phoenix IOMMU
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
c3:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] ACP/ACP3X/ACP6x Audio Coprocessor (rev 63)
01:00.0 Network controller: MEDIATEK Corp. MT7922 802.11ax PCI Express Wireless Network Adapter
c4:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Function
c5:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Phoenix Dummy Function
02:00.0 Non-Volatile memory controller: Micron Technology Inc 3400 NVMe SSD [Hendrix]
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:04.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix GPP Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
00:08.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Phoenix Internal GPP Bridge to Bus [C:A]
c4:00.1 Signal processing controller: Advanced Micro Devices, Inc. [AMD] AMD IPU Device
c3:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
c3:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15b9
c3:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15ba
c5:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c0
c5:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Device 15c1
c5:00.5 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #1
c5:00.6 USB controller: Advanced Micro Devices, Inc. [AMD] Pink Sardine USB4/Thunderbolt NHI controller #2
c3:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev d4)

My lsusb:

Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 0489:e0f2 Foxconn / Hon Hai Wireless_Device
Bus 001 Device 003: ID 06cb:00f0 Synaptics, Inc. 
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 0408:545f Quanta Computer, Inc. HP 5MP Camera
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

I'm not bisecting this issue because so far it's happened just once and I've no
idea how to trigger it. Yet it has never happened before with previous kernels.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
@ 2025-02-26 22:28 ` bugzilla-daemon
  2025-02-26 22:31 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-26 22:28 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #1 from Artem S. Tashkinov (aros@gmx.com) ---
I'm utterly confused as to why the kernel reported "xHCI host not responding
to stop endpoint command".

I didn't do anything at the time. Wasn't even using the mouse.

Something funky is going on with 6.13.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
  2025-02-26 22:28 ` [Bug 219824] " bugzilla-daemon
@ 2025-02-26 22:31 ` bugzilla-daemon
  2025-02-26 22:47 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-26 22:31 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #2 from Artem S. Tashkinov (aros@gmx.com) ---
This was reported in the SUSE bug tracker earlier:

https://bugzilla.suse.com/show_bug.cgi?id=1236992

I don't see it reported here, but the issue is evidently not new.

Yet I see no patches queued for 6.13.5.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
  2025-02-26 22:28 ` [Bug 219824] " bugzilla-daemon
  2025-02-26 22:31 ` bugzilla-daemon
@ 2025-02-26 22:47 ` bugzilla-daemon
  2025-02-27 12:58 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-26 22:47 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #3 from Artem S. Tashkinov (aros@gmx.com) ---
The SUSE issue is seemingly unrelated, please dismiss.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (2 preceding siblings ...)
  2025-02-26 22:47 ` bugzilla-daemon
@ 2025-02-27 12:58 ` bugzilla-daemon
  2025-02-27 15:12 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-27 12:58 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #4 from Artem S. Tashkinov (aros@gmx.com) ---
This just happened again:

[161470.836493] PM: resume devices took 0.547 seconds
[161470.836720] OOM killer enabled.
[161470.836721] Restarting tasks ... done.
[161470.839715] random: crng reseeded on system resumption
[161470.845090] PM: suspend exit
[161471.322491] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: Firmware: 400a4 vendor: 0x2 v0.43.1, 2 algorithms
[161471.324469] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: cirrus/cs35l41-dsp1-spk-prot-103c8b72.bin: v0.43.1
[161471.324480] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: DSP1: spk-prot: D:\Amp Tuning\HP\840\0930\103C8B45_220930.bin
[161471.403951] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Calibration applied: R0=10446
[161471.407392] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Calibration applied: R0=10526
[161471.432157] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.1: Firmware Loaded - Type: spk-prot, Gain: 17
[161471.433916] cs35l41-hda i2c-CSC3551:00-cs35l41-hda.0: Firmware Loaded - Type: spk-prot, Gain: 17
[161471.523827] hp_wmi: Unknown event_id - 131073 - 0x0
[162644.637587] xhci_hcd 0000:c3:00.3: xHCI host not responding to stop endpoint command
[162644.651068] xhci_hcd 0000:c3:00.3: xHCI host controller not responding, assume dead
[162644.651076] xhci_hcd 0000:c3:00.3: HC died; cleaning up
[162644.651099] xhci_hcd 0000:c3:00.3: Timeout while waiting for stop endpoint command
[162644.651102] usb 1-2: USB disconnect, device number 4
[162644.678374] usb 1-3: USB disconnect, device number 2
[162644.678748] usb 1-4: USB disconnect, device number 3

Shortly after resume all the USB ports are disabled.

I'm reverting back to Linux 6.11. I cannot use my device like this.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (3 preceding siblings ...)
  2025-02-27 12:58 ` bugzilla-daemon
@ 2025-02-27 15:12 ` bugzilla-daemon
  2025-02-27 16:05 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-27 15:12 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

Mathias Nyman (mathias.nyman@linux.intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mathias.nyman@linux.intel.com

--- Comment #5 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
6.13 has a lot of changes related to endpoint stopping:

e21ebe51af68 xhci: Turn NEC specific quirk for handling Stop Endpoint errors
generic
474538b8dd1c usb: xhci: Avoid queuing redundant Stop Endpoint commands
484c3bab2d5d usb: xhci: Fix TD invalidation under pending Set TR Dequeue
42b758137601 usb: xhci: Limit Stop Endpoint retries

Endpoints are stopped in order to cancel transfers, before suspend, and to soft
reset an endpoint after clearing a halt. 

I understand that bisecting an issue like this that triggers rarely isn't an
option, but can I ask you to try running 6.13 with xhci dynamic debug enabled.

mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
Then send the dmesg output after the issue is triggered.

It could reveal a bit more about what's going on.
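
The steps above could be wrapped in a small helper so that the debug output is
easy to switch off again afterwards. This is only a sketch: the function name
ddebug_set and the DDEBUG_CONTROL override are hypothetical, and it assumes
debugfs is already mounted at /sys/kernel/debug. The "=p" and "-p" flag
changes are the regular dynamic debug syntax for enabling and disabling
printing.

```shell
# Hypothetical helper: apply one dynamic debug flag change to several modules.
# DDEBUG_CONTROL is an illustrative override; the default is the debugfs path.
ddebug_set() {
    # $1 = flag change ("=p" enables printing, "-p" disables it)
    # remaining arguments = module names
    flags="$1"; shift
    ctl="${DDEBUG_CONTROL:-/sys/kernel/debug/dynamic_debug/control}"
    for mod in "$@"; do
        # Each query is written to the control file separately
        echo "module $mod $flags" > "$ctl" || return 1
    done
}

# Usage (as root):
#   ddebug_set =p xhci_hcd usbcore
#   ... wait for the issue to trigger ...
#   dmesg > xhci-debug.log
#   ddebug_set -p xhci_hcd usbcore
```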


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (4 preceding siblings ...)
  2025-02-27 15:12 ` bugzilla-daemon
@ 2025-02-27 16:05 ` bugzilla-daemon
  2025-02-27 17:14 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-27 16:05 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #6 from Artem S. Tashkinov (aros@gmx.com) ---
(In reply to Mathias Nyman from comment #5)
> I understand that bisecting an issue like this that triggers rarely isn't an
> option, but can I ask you to try running 6.13 with xhci dynamic debug
> enabled.

Will do as soon as possible. Thanks a lot!


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (5 preceding siblings ...)
  2025-02-27 16:05 ` bugzilla-daemon
@ 2025-02-27 17:14 ` bugzilla-daemon
  2025-02-27 21:07 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-27 17:14 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

Michał Pecio (michal.pecio@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michal.pecio@gmail.com

--- Comment #7 from Michał Pecio (michal.pecio@gmail.com) ---
Which exact versions were you running successfully and for how long?

These patches listed by Mathias are instant first suspects, but they were all
backported to v6.12.7 in December. Most of them also to v6.11.11 in early
December and later in January to some LTS series.

Any chance that hibernation is indeed a (delayed) trigger and you weren't doing
it as often in the past?

Did you come across similar reports from stable kernel branches this year?


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (6 preceding siblings ...)
  2025-02-27 17:14 ` bugzilla-daemon
@ 2025-02-27 21:07 ` bugzilla-daemon
  2025-02-28 19:49 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-27 21:07 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #8 from Artem S. Tashkinov (aros@gmx.com) ---
> Which exact versions were you running successfully and for how long?

Kernel 6.12.14 that I was running earlier didn't have this issue.

Used software suspend/resume multiple times successfully.

> Any chance that hibernation is indeed a (delayed) trigger and you weren't
> doing it as often in the past?

Not using hibernation, just software suspend.

I've not changed anything software-wise except installing a new kernel on this
laptop.

> Did you come across similar reports from stable kernel branches in this year?

I've Googled a couple of times already for this exact error message and nothing
turned up.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (7 preceding siblings ...)
  2025-02-27 21:07 ` bugzilla-daemon
@ 2025-02-28 19:49 ` bugzilla-daemon
  2025-02-28 20:10 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-28 19:49 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #9 from Artem S. Tashkinov (aros@gmx.com) ---
Created attachment 307725
  --> https://bugzilla.kernel.org/attachment.cgi?id=307725&action=edit
xhci_hcd and usb debug log

(In reply to Mathias Nyman from comment #5)
> 6.13 has a lot of changes related to endpoint stopping:
> 
> e21ebe51af68 xhci: Turn NEC specific quirk for handling Stop Endpoint errors
> generic
> 474538b8dd1c usb: xhci: Avoid queuing redundant Stop Endpoint commands
> 484c3bab2d5d usb: xhci: Fix TD invalidation under pending Set TR Dequeue
> 42b758137601 usb: xhci: Limit Stop Endpoint retries
> 
> Endpoints are stopped in order to cancel transfers, before suspend, and to
> soft reset an endpoint after clearing a halt. 
> 
> I understand that bisecting an issue like this that triggers rarely isn't an
> option, but can I ask you to try running 6.13 with xhci dynamic debug
> enabled.
> 
> mount -t debugfs none /sys/kernel/debug
> echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
> echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
> and send dmesg after issue is triggered.
> 
> It could reveal a bit more what's going on

I'm confused.

If I resume the laptop and don't run these three commands immediately, all the
USB ports eventually die (usually under 5 minutes).

If I resume the laptop and run these commands immediately, USB ports continue
working like they always did before. So, weirdly and unexpectedly, when
debugging is on ... it fixes the issue.

If I resume the laptop, don't run these commands, and then when the USB ports
die I run them, there are no further messages from the xhci_hcd module.

I'm attaching the debug log (again, no bug here) regardless. Maybe it contains
something that will let you understand what is going on.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (8 preceding siblings ...)
  2025-02-28 19:49 ` bugzilla-daemon
@ 2025-02-28 20:10 ` bugzilla-daemon
  2025-03-03 13:54 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-02-28 20:10 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #10 from Michał Pecio (michal.pecio@gmail.com) ---
What if you enable only this one thing? Does anything show up under normal use
or before the HC dies (if it still does)?

echo 'func xhci_handle_cmd_stop_ep +p' >/proc/dynamic_debug/control
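
To confirm that the setting took effect, the control file can be read back.
The sketch below assumes the line layout described in the kernel's dynamic
debug documentation (file:line, then [module]function, then flags, then the
format string), where "=_" means nothing is enabled. The function name
ddebug_enabled_sites and the DDEBUG_CONTROL override are hypothetical.

```shell
# Hypothetical helper: print the dynamic debug call sites of one function
# whose flags field is not "=_", i.e. where some output is enabled.
ddebug_enabled_sites() {
    func="$1"
    ctl="${DDEBUG_CONTROL:-/proc/dynamic_debug/control}"
    # Field 2 looks like "[module]function", field 3 holds the flags.
    awk -v f="]$func" \
        '$3 != "=_" && substr($2, length($2) - length(f) + 1) == f' "$ctl"
}

# Usage (as root): ddebug_enabled_sites xhci_handle_cmd_stop_ep
```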


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (9 preceding siblings ...)
  2025-02-28 20:10 ` bugzilla-daemon
@ 2025-03-03 13:54 ` bugzilla-daemon
  2025-03-03 15:42 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-03 13:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #11 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
(In reply to Artem S. Tashkinov from comment #9)
> Created attachment 307725 [details]
> xhci_hcd and usb debug log

Thanks. It does show some last-minute URB cancellation for endpoints
on the 1-3 device before suspend, and some more cancellation after resume.
There are also overflow/underflow messages after resume, indicating that an
endpoint might be started early.

> 
> I'm confused.
> 
> If I resume the laptop and don't run these three commands immediately, all
> the USB ports eventually die (usually under 5 minutes).
> 
> If I resume the laptop and run these commands immediately, USB ports
> continue working like they always did before. So, weirdly and unexpectedly,
> when debugging is on ... it fixes the issue.

Dynamic debug adds delays, and the code that starts and stops endpoints is
a bit timing sensitive. It could be that enabling debug hides the issue.

Can you still run one more try with xhci tracing instead of dynamic debug?
It does not affect timing as much:

mount -t debugfs none /sys/kernel/debug
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
echo 1 > /sys/kernel/debug/tracing/tracing_on
< Reproduce issue >
Send content of /sys/kernel/debug/tracing/trace

The trace file grows fast so copy it as soon as possible after issue is
triggered.
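
Since the trace has to be copied quickly, the steps above could be scripted,
for example as below. This is only a sketch: the function names and the
TRACING_DIR override are hypothetical, and debugfs is assumed to be mounted
already. Writing 0 to tracing_on before copying is an addition here, so that
no new events overwrite the interesting ones while the file is being read.

```shell
# Hypothetical helpers around the tracing files used above.
# TRACING_DIR is an illustrative override; the default is the debugfs path.
xhci_trace_start() {
    t="${TRACING_DIR:-/sys/kernel/debug/tracing}"
    echo 81920 > "$t/buffer_size_kb" &&
    echo 1 > "$t/events/xhci-hcd/enable" &&
    echo 1 > "$t/tracing_on"
}

xhci_trace_save() {
    # $1 = output file
    t="${TRACING_DIR:-/sys/kernel/debug/tracing}"
    # Stop recording first, then copy the buffer contents
    echo 0 > "$t/tracing_on" &&
    cat "$t/trace" > "$1"
}

# Usage (as root):
#   xhci_trace_start
#   ... reproduce the issue ...
#   xhci_trace_save /tmp/xhci-trace.txt
```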

Thanks


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (10 preceding siblings ...)
  2025-03-03 13:54 ` bugzilla-daemon
@ 2025-03-03 15:42 ` bugzilla-daemon
  2025-03-03 19:23 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-03 15:42 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #12 from Artem S. Tashkinov (aros@gmx.com) ---
(In reply to Michał Pecio from comment #10)
> What if you enable only this one thing? Does anything show up under normal
> use or before the HC dies (if it still does)?
> 
> echo 'func xhci_handle_cmd_stop_ep +p' >/proc/dynamic_debug/control

The same issue. Without debug the USB controller dies in a few minutes; with
just this line enabled it works fine.

> Can you still run one more try with xhci tracing instead of dynamic debug?
> It does not affect timing as much:

I will try this ASAP.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (11 preceding siblings ...)
  2025-03-03 15:42 ` bugzilla-daemon
@ 2025-03-03 19:23 ` bugzilla-daemon
  2025-03-03 22:38 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-03 19:23 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #13 from Michał Pecio (michal.pecio@gmail.com) ---
Hmm, can enabling a debug message which never gets printed really make a big
difference? Maybe I'm crazy, but I wonder if it's possible that your
interaction with USB devices while entering those commands somehow prevents (or
at least delays) the failure?

I suppose that both dynamic debug and tracing settings persist across a suspend
cycle, so if you still have no luck, maybe try setting everything up before
suspending and then do nothing after resuming?
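
A hands-off run could look like the sketch below: everything is started before
suspending, and a small watcher reacts to the "HC died" message by itself, so
no interaction is needed after resume. The function name on_hc_died is
hypothetical. The usage line assumes a dmesg from util-linux that supports
--follow-new (plain --follow also replays old messages and would react to an
earlier "HC died" still in the buffer).

```shell
# Hypothetical helper: read log lines from stdin and run the given command
# as soon as a line containing "HC died" shows up.
on_hc_died() {
    # grep -q stops reading at the first match
    grep -q 'HC died' && "$@"
}

# Usage (as root), started before suspending, with tracing already enabled:
#   dmesg --follow-new | on_hc_died cp /sys/kernel/debug/tracing/trace /tmp/xhci-trace.txt
```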


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (12 preceding siblings ...)
  2025-03-03 19:23 ` bugzilla-daemon
@ 2025-03-03 22:38 ` bugzilla-daemon
  2025-03-06 11:15 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-03 22:38 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #14 from Michał Pecio (michal.pecio@gmail.com) ---
I think I found it.

Does it help if you revert 36b972d4b7cef?


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (13 preceding siblings ...)
  2025-03-03 22:38 ` bugzilla-daemon
@ 2025-03-06 11:15 ` bugzilla-daemon
  2025-03-06 11:54 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-06 11:15 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

bugzilla@academo.me changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@academo.me

--- Comment #15 from bugzilla@academo.me ---
I found this issue happening after resuming from sleep. Downgrading to
Kernel 6.12.9 (6.12.9-arch1-1) removed the issue completely. The error message
was the same.

I am not very knowledgeable about reporting or helping with kernel issues, but
if you give me instructions on what I can do to test a patch and report back,
I can help.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (14 preceding siblings ...)
  2025-03-06 11:15 ` bugzilla-daemon
@ 2025-03-06 11:54 ` bugzilla-daemon
  2025-03-06 11:57 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-06 11:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #16 from Artem S. Tashkinov (aros@gmx.com) ---
(In reply to bugzilla from comment #15)
> I found this issue happening after resuming from sleep. Downgrading to
> Kernel 6.12.9 (6.12.9-arch1-1) removed the issue completely. The error
> message was the same.
> 
> I am not very knowledgeable about reporting or helping with kernel issues,
> but if you give me instructions on what I can do to test a patch and
> report back, I can help.

Please follow the instructions in
https://bugzilla.kernel.org/show_bug.cgi?id=219824#c11

I'm currently on vacation, I left my mouse at home, so I cannot debug this.

BTW, here's an interesting tidbit: with the debugging from comment 11 enabled I
wasn't able to get my USB ports to die, but (!) on a second resume they died
_immediately_. So maybe I just had to suspend/resume twice and then generate
the debug data. Sadly, I ran out of time.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (15 preceding siblings ...)
  2025-03-06 11:54 ` bugzilla-daemon
@ 2025-03-06 11:57 ` bugzilla-daemon
  2025-03-06 12:03 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-06 11:57 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #17 from Artem S. Tashkinov (aros@gmx.com) ---
Also, if you're able to compile your kernel, please try to revert commit
36b972d4b7cef on top of e.g. 6.13.5 and check if you can reproduce the issue.

The patch can be downloaded here:
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=36b972d4b7cef5d098de63fee8d00720c051f335

You'll need to `patch -R < patch` it.
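
If the kernel sources are a git checkout that already contains the commit,
git can do the revert directly. The sketch below assumes exactly that, and also
that the revert applies without conflicts; the function name revert_in_tree and
the example path are hypothetical.

```shell
# Hypothetical helper: revert one commit inside a git checkout of the kernel.
revert_in_tree() {
    tree="$1"     # path to the kernel git checkout
    commit="$2"   # commit to revert
    # --no-edit keeps the default revert message and skips the editor
    git -C "$tree" revert --no-edit "$commit"
}

# Usage:
#   revert_in_tree ~/src/linux 36b972d4b7cef5d098de63fee8d00720c051f335
# then rebuild and install the kernel as usual.
```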


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (16 preceding siblings ...)
  2025-03-06 11:57 ` bugzilla-daemon
@ 2025-03-06 12:03 ` bugzilla-daemon
  2025-03-07 23:28 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-06 12:03 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #18 from Artem S. Tashkinov (aros@gmx.com) ---
Looks like there are multiple people affected by this regression, reports have
started to trickle in:

https://bbs.archlinux.org/viewtopic.php?pid=2229550

BTW https://lore.kernel.org/lkml/20250304085139.4610e8ff@foxbook/ has been
queued for stable, looks like 6.13.6 will have the issue fixed.

Please check.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (17 preceding siblings ...)
  2025-03-06 12:03 ` bugzilla-daemon
@ 2025-03-07 23:28 ` bugzilla-daemon
  2025-03-08  6:56 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-07 23:28 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Bisected commit-id|                            |36b972d4b7cef5d098de63fee8d
                   |                            |00720c051f335
             Status|NEW                         |RESOLVED
         Resolution|---                         |CODE_FIX

--- Comment #19 from Artem S. Tashkinov (aros@gmx.com) ---
(In reply to Michał Pecio from comment #14)
> I think I found it.
> 
> Does it help if you revert 36b972d4b7cef?

People claim it fixes the issue.

Please actually submit the revert to stable: 6.13.6 has been released without
it, and no 6.14-rc seems to have it either.

And seeing a "Reported-by:" tag would be nice. Thanks.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (18 preceding siblings ...)
  2025-03-07 23:28 ` bugzilla-daemon
@ 2025-03-08  6:56 ` bugzilla-daemon
  2025-03-14  8:42 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-08  6:56 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #20 from Michał Pecio (michal.pecio@gmail.com) ---
Sorry about Reported-by, but if I added that then they would also want a Closes
and I wasn't sure if it really is this bug. It's something I produced on my
machine under my workload, using knowledge of how one commit is broken.

I guess I could have taken a gamble and tagged it anyway because it seemed
likely, but I had no confirmation and no idea if I would get any by the end of
the week. Indeed, to this day the best I've seen is reports that downgrading to
6.12 helps; I haven't heard from anyone previously affected who has tried
reverting the suspect patch.

Like Mathias, I was expecting some nightmare scenario and not that. I hope that
this is all there was to it, but we will really know next week.

My patch made it to Greg's usb-linus branch, which normally means it will land
in the next -rc tomorrow and then propagate to stable. No other -rc has it
because they were released before it existed.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (19 preceding siblings ...)
  2025-03-08  6:56 ` bugzilla-daemon
@ 2025-03-14  8:42 ` bugzilla-daemon
  2025-03-14  9:06 ` bugzilla-daemon
  2025-03-14 10:49 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-14  8:42 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #21 from christian.rohmann@frittentheke.de ---
Will this be backported to 6.13? Seems 6.13.7 doesn't have it.


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (20 preceding siblings ...)
  2025-03-14  8:42 ` bugzilla-daemon
@ 2025-03-14  9:06 ` bugzilla-daemon
  2025-03-14 10:49 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-14  9:06 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #22 from Artem S. Tashkinov (aros@gmx.com) ---
6.13.7 absolutely includes it:

https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.13.7
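
A quick way to check a release for a given fix, rather than reading the whole
ChangeLog: stable backports normally cite the mainline commit in their message
(e.g. "commit <id> upstream."), so the ChangeLog can be searched for the
mainline ID. Below is a minimal Python sketch, assuming the ChangeLog text has
already been downloaded. `find_backport` and `SAMPLE` are hypothetical names
used only for illustration; the sample text is copied from the 6.13.7 entry.

```python
import re

# A ChangeLog entry starts with an unindented "commit <40 hex digits>" line.
HEADER = re.compile(r"^commit ([0-9a-f]{40})\s*$")


def find_backport(changelog_text, upstream_id):
    """Return the stable commit ID whose message cites upstream_id, or None."""
    stable_id = None
    for line in changelog_text.splitlines():
        header = HEADER.match(line)
        if header:
            # Remember which stable commit the following body lines belong to.
            stable_id = header.group(1)
        elif stable_id and "upstream" in line.lower() and upstream_id in line:
            # Body line such as "    commit <id> upstream."
            return stable_id
    return None


# Excerpt of the relevant ChangeLog-6.13.7 entry.
SAMPLE = """\
commit 80cb8e694110dee4ac6fbf0956ba7439aeb0603d
Author: Michal Pecio <michal.pecio@gmail.com>
Date:   Tue Mar 4 13:31:47 2025 +0200

    usb: xhci: Fix host controllers "dying" after suspend and resume

    commit c7c1f3b05c67173f462d73d301d572b3f9e57e3b upstream.
"""

if __name__ == "__main__":
    print(find_backport(SAMPLE, "c7c1f3b05c67173f462d73d301d572b3f9e57e3b"))
```

Note that the fix is a separate commit, not a revert of 36b972d4b7cef, so
searching for the bisected commit ID as an "upstream" reference finds nothing.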

> commit 80cb8e694110dee4ac6fbf0956ba7439aeb0603d
> Author: Michal Pecio <michal.pecio@gmail.com>
> Date:   Tue Mar 4 13:31:47 2025 +0200
> 
>     usb: xhci: Fix host controllers "dying" after suspend and resume
>     
>     commit c7c1f3b05c67173f462d73d301d572b3f9e57e3b upstream.
>     
>     A recent cleanup went a bit too far and dropped clearing the cycle bit
>     of link TRBs, so it stays different from the rest of the ring half of
>     the time. Then a race occurs: if the xHC reaches such link TRB before
>     more commands are queued, the link's cycle bit unintentionally matches
>     the xHC's cycle so it follows the link and waits for further commands.
>     If more commands are queued before the xHC gets there, inc_enq() flips
>     the bit so the xHC later sees a mismatch and stops executing commands.
>     
>     This function is called before suspend and 50% of times after resuming
>     the xHC is doomed to get stuck sooner or later. Then some Stop Endpoint
>     command fails to complete in 5 seconds and this shows up
>     
>     xhci_hcd 0000:00:10.0: xHCI host not responding to stop endpoint command
>     xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
>     xhci_hcd 0000:00:10.0: HC died; cleaning up
>     
>     followed by loss of all USB devices on the affected bus. That's if you
>     are lucky, because if Set Deq gets stuck instead, the failure is silent.
>     
>     Likely responsible for kernel bug 219824. I found this while searching
>     for possible causes of that regression and reproduced it locally before
>     hearing back from the reporter. To repro, simply wait for link cycle to
>     become set (debugfs), then suspend, resume and wait. To accelerate the
>     failure I used a script which repeatedly starts and stops a UVC camera.
>     
>     Some HCs get fully reinitialized on resume and they are not affected.
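
The race described in the commit message can be illustrated with a toy model.
The Python sketch below is a simplification and not the kernel's code:
`CommandRing`, `queue_command`, `run_hardware` and `simulate` are hypothetical
names, the ring is assumed to have three command slots plus one link TRB, and
the driver is assumed to hand over the link TRB by flipping its cycle bit when
it wraps. `link_cycle` is the value the link TRB's cycle bit has after resume:
0 if it was cleared, 1 if it was left stale.

```python
class CommandRing:
    """Toy command ring: N command slots followed by one link TRB."""

    def __init__(self, slots, link_cycle):
        # Cycle bits of all TRBs; the last entry models the link TRB.
        self.cycle = [0] * slots + [link_cycle]
        self.link = slots            # index of the link TRB
        self.enq, self.pcs = 0, 1    # driver enqueue index, producer cycle state
        self.deq, self.ccs = 0, 1    # xHC dequeue index, consumer cycle state
        self.completed = 0

    def queue_command(self):
        """Driver side: write one command, wrapping through the link TRB."""
        if self.enq == self.link:
            self.cycle[self.link] ^= 1   # flip the link TRB's cycle bit
            self.pcs ^= 1                # producer cycle toggles on wrap
            self.enq = 0
        self.cycle[self.enq] = self.pcs
        self.enq += 1

    def run_hardware(self):
        """xHC side: consume TRBs while their cycle bit matches its own."""
        while self.cycle[self.deq] == self.ccs:
            if self.deq == self.link:
                self.ccs ^= 1            # following the link toggles the cycle
                self.deq = 0
            else:
                self.completed += 1
                self.deq += 1


def simulate(link_cycle, hw_reaches_link_first):
    """Queue four commands across one wrap; return how many complete."""
    ring = CommandRing(slots=3, link_cycle=link_cycle)
    ring.queue_command()
    ring.queue_command()
    ring.run_hardware()
    ring.queue_command()          # last slot filled, enqueue now at the link
    if hw_reaches_link_first:
        ring.run_hardware()       # xHC gets to the link before the next command
    ring.queue_command()          # driver wraps and queues the fourth command
    ring.run_hardware()
    return ring.completed


if __name__ == "__main__":
    for link_cycle in (0, 1):
        for hw_first in (True, False):
            print(link_cycle, hw_first, simulate(link_cycle, hw_first))
```

In this model all four commands complete when the link's cycle bit was cleared,
and also when it was stale but the xHC reached the link first. Only the stale
bit combined with the driver wrapping first leaves the fourth command
unprocessed, which corresponds to the stuck command ring described above.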


* [Bug 219824] [6.13 regression] USB controller just died
  2025-02-26 22:23 [Bug 219824] New: [6.13 regression] USB controller just died bugzilla-daemon
                   ` (21 preceding siblings ...)
  2025-03-14  9:06 ` bugzilla-daemon
@ 2025-03-14 10:49 ` bugzilla-daemon
  22 siblings, 0 replies; 24+ messages in thread
From: bugzilla-daemon @ 2025-03-14 10:49 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=219824

--- Comment #23 from christian.rohmann@frittentheke.de ---
(In reply to Artem S. Tashkinov from comment #22)
> 6.13.7 absolutely includes it:

> https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.13.7


Ah sorry, my bad. Thanks!


