[Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]

public inbox for linux-usb@vger.kernel.org

* [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
@ 2026-02-10 17:46 bugzilla-daemon
  2026-02-10 18:04 ` [Bug 221073] " bugzilla-daemon
                   ` (41 more replies)
  0 siblings, 42 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-10 17:46 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

            Bug ID: 221073
           Summary: xHCI host controller dies on resume from s2idle on AMD
                    Strix Halo [1022:1587]
           Product: Drivers
           Version: 2.5
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: mrh@frame.work
        Regression: No

Created attachment 309339
  --> https://bugzilla.kernel.org/attachment.cgi?id=309339&action=edit
dmesg from Feb 10 2026 reproduction - kernel 6.18.8-200.fc43.x86_64

Hardware: Framework Desktop (AMD Ryzen AI Max 300 Series)/FRANMFCP02
BIOS: 03.04
OS: Fedora 43 (kernel 6.18.8-200.fc43.x86_64)
Reporter: Matt H., Framework Computer

Affected controller:

c1:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Strix Halo USB 3.1 xHCI [1022:1587] (prog-if 30 [XHCI])
        Subsystem: Framework Computer Inc. Device 000a
        Flags: bus master, fast devsel, latency 0, IRQ 25, IOMMU group 19
        Memory at 90000000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, IntMsgNum 0
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=1 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [2a0] Access Control Services
        Kernel driver in use: xhci_hcd

PROBLEM:

The xHCI host controller at 0000:c1:00.4 dies on resume from s2idle.
All USB devices behind this controller are lost. Unbinding and rebinding
the driver restores functionality, proving this is a driver resume path
bug — the hardware is fine.

Reported by Framework customers across multiple distributions:
  - CachyOS (6.18.2, 6.18.8)
  - Debian 13 (6.12.63)
  - Bluefin / Fedora Atomic 43 (6.17.11)

Reproduced by reporter (this report):
  - Fedora 43 (6.18.8-200.fc43.x86_64)

It has also been reported on non-Framework AMD hardware:
  - Lenovo ThinkPad T14 Gen 6 AMD — identical xHCI timeout, identical
    unbind/rebind fix

REPRODUCTION — Feb 10 2026, kernel 6.18.8-200.fc43.x86_64, BIOS 03.04:

  08:10:41 — booted
  08:32:30 — suspended and resumed, controller dead after 22 minutes

  xhci_hcd 0000:c1:00.4: xHCI host not responding to stop endpoint command
  xhci_hcd 0000:c1:00.4: xHCI host controller not responding, assume dead
  xhci_hcd 0000:c1:00.4: HC died; cleaning up

Full dmesg attached.

REGRESSION DATA (cross-distro, same hardware):

  Customer-reported:
  - Kernel 6.12.63 (Debian 13): USB resume fails ~40% of the time
  - Kernel 6.18.2 (CachyOS): USB resume fails 100% of the time
  - Kernel 6.18.8 (CachyOS): USB resume fails 100% of the time

  Reporter-reproduced:
  - Kernel 6.17.1 (Fedora 43): USB resume fails
  - Kernel 6.18.8 (Fedora 43): USB resume fails 100% of the time

The bug exists on 6.12 but is intermittent. By 6.18 it is deterministic.
Something between 6.12 and 6.18 made it worse, but it was already present.

WORKAROUND:

Unbinding and rebinding the xHCI PCI device restores full functionality:

  echo -n "0000:c1:00.4" > /sys/bus/pci/drivers/xhci_hcd/unbind
  sleep 2
  echo -n "0000:c1:00.4" > /sys/bus/pci/drivers/xhci_hcd/bind

This works every time. If the hardware were in a broken state, a driver
rebind would not fix it. The bind path fully reinitializes the controller.
The resume path does not perform the same initialization. This is a kernel
driver bug.
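
The workaround above can be wrapped in a small function so it can be run
after a bad resume. This is only a sketch: rebind_xhci is a hypothetical
helper name, and the optional driver-directory and delay arguments are
assumptions added for convenience and testing, not part of the original
commands.

```shell
# Sketch: rebind a PCI xHCI controller, mirroring the manual workaround.
# rebind_xhci is a hypothetical helper, not an existing tool.
# Usage: rebind_xhci <pci-address> [driver-dir] [delay-seconds]
rebind_xhci() {
    addr="$1"
    drv="${2:-/sys/bus/pci/drivers/xhci_hcd}"
    delay="${3:-2}"

    if [ -z "$addr" ]; then
        echo "usage: rebind_xhci <pci-address> [driver-dir] [delay]" >&2
        return 2
    fi

    # Detach the driver from the device.
    printf '%s' "$addr" > "$drv/unbind" || return 1
    sleep "$delay"
    # Reattach the driver; the controller is probed again from scratch.
    printf '%s' "$addr" > "$drv/bind" || return 1
}
```

Run as root, for example `rebind_xhci 0000:c1:00.4`.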

NOT BIOS-SPECIFIC:
  Reproduced across BIOS 3.03, 3.04, and 3.05 on Framework Desktop.

NOT FRAMEWORK-SPECIFIC:
  Same failure on Lenovo ThinkPad T14 Gen 6 AMD with identical symptoms
  and identical workaround.

NOT THE SAME AS BUG #220702 OR #220812:
  Bug #220702 (Strix Halo sleep not working with 6.17 and later) is a VPE
  suspend regression with a specific fix (commit 3925683515e9). That fix
  does not resolve this issue. Bug #220812 (HP ZBook Ultra s2idle failure)
  is the same class, resolved. Our bug predates both — it reproduces on
  6.12 LTS which is unaffected by the VPE commit — and affects non-Strix
  Halo hardware.

NOT THE SAME AS BUG #219824:
  Bug #219824 (cycle bit on link TRBs, fixed in 6.13.7 via commit
  c7c1f3b05c67) is a different xHCI resume failure. That fix does not
  address this issue.

REFERENCES:
  https://github.com/FrameworkComputer/SoftwareFirmwareIssueTracker/issues/163
  https://community.frame.work/t/framework-desktop-wired-keyboard-and-mouse-dont-return-after-sleep-linux/76414

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
@ 2026-02-10 18:04 ` bugzilla-daemon
  2026-02-11  6:54 ` bugzilla-daemon
                   ` (40 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-10 18:04 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

mattwork (mrh@frame.work) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|https://github.com/Framewor |
                   |kComputer/SoftwareFirmwareI |
                   |ssueTracker/issues/163      |

--- Comment #1 from mattwork (mrh@frame.work) ---
Update: new finding from GitHub #163.

@davidhubbard isolated iGPU memory allocation as a variable:
- iGPU Memory Size = 96GB: 3/4 suspends = xHCI dead
- iGPU Memory Size = Auto (0.5GB): cannot reproduce

However, my system reproduces at 16GB iGPU allocation. Higher iGPU
memory makes it more likely, but does not eliminate it at lower
settings. This is a severity dial, not an on/off switch.

This makes sense if iGPU memory allocation changes how the PCI root
complex manages power states during s2idle — the xHCI controller
at c1:00.4 shares the same PCI root as the amdgpu at c1:00.0.

See:
https://github.com/FrameworkComputer/SoftwareFirmwareIssueTracker/issues/163#issuecomment-3874652761

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
  2026-02-10 18:04 ` [Bug 221073] " bugzilla-daemon
@ 2026-02-11  6:54 ` bugzilla-daemon
  2026-02-11 23:04 ` bugzilla-daemon
                   ` (39 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-11  6:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Artem S. Tashkinov (aros@gmx.com) ---
Looks like a regression, please bisect:

https://docs.kernel.org/admin-guide/bug-bisect.html

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
  2026-02-10 18:04 ` [Bug 221073] " bugzilla-daemon
  2026-02-11  6:54 ` bugzilla-daemon
@ 2026-02-11 23:04 ` bugzilla-daemon
  2026-02-12  8:27 ` bugzilla-daemon
                   ` (38 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-11 23:04 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #3 from mattwork (mrh@frame.work) ---
Thanks for looking at this. A few notes on the bisect request:

The regression data in the report already narrows the timeline: the bug exists
on 6.12 LTS (intermittent, ~40%) and is deterministic by 6.18. That's a wide
bisect range — thousands of commits across multiple merge windows — and the
result would land on whichever commit pushed the failure rate from "sometimes"
to "always," not the root cause.

The root cause evidence is already in the report:

- The resume path does not fully reinitialize the xHCI controller. The bind
path does. Unbind/rebind fixes it 100% of the time.

- The controller shares a PCI root complex with the amdgpu (c1:00.0). iGPU
memory allocation correlates with failure rate, suggesting PCI power state
management during s2idle is involved.

- This reproduces on non-Framework hardware (Lenovo ThinkPad T14 Gen 6 AMD)
with identical symptoms and identical workaround, ruling out platform-specific
firmware.

A bisect would confirm which commit worsened an already-broken resume path, but
the xHCI maintainers likely already know what changed in the s2idle resume
sequence for AMD xHCI between 6.12 and 6.18. If a bisect is still needed after
review of the above, I can arrange one, but I'd like to get USB/xHCI maintainer
eyes on the dmesg and the PCI topology first.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (2 preceding siblings ...)
  2026-02-11 23:04 ` bugzilla-daemon
@ 2026-02-12  8:27 ` bugzilla-daemon
  2026-02-12 10:02 ` bugzilla-daemon
                   ` (37 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-12  8:27 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Artem S. Tashkinov (aros@gmx.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mario.limonciello@amd.com

--- Comment #4 from Artem S. Tashkinov (aros@gmx.com) ---
Mario, please take a look.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (3 preceding siblings ...)
  2026-02-12  8:27 ` bugzilla-daemon
@ 2026-02-12 10:02 ` bugzilla-daemon
  2026-02-12 16:15 ` bugzilla-daemon
                   ` (36 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-12 10:02 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Michał Pecio (michal.pecio@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michal.pecio@gmail.com

--- Comment #5 from Michał Pecio (michal.pecio@gmail.com) ---
It's intriguing that failure rate varies with kernel version, but bisecting an
intermittent problem is not going to be easy and there is risk that the result
will be boring.

Per dmesg, the crash happens a few seconds, not 22 minutes after suspend. Looks
like you have a fast and reliable repro on 6.18.

There is a warning about resume taking a long time. I'm not sure whether it is
a result of the xhci problem or whether both issues have a common cause
elsewhere.

Is this unique to s2idle or can it also happen after suspend to RAM or
hibernation?

Do any particular devices need to be connected for this to happen? Stop
endpoint timeout implies that there exists some endpoint to be stopped in the
first place, unless the driver became completely confused.

Rebinding the driver almost always helps because it resets the HW and all
software state, so this doesn't say much about the cause. It may still be buggy
HW.

The RESET_ON_RESUME quirk may possibly work as bandaid, try
xhci_hcd.quirks=0x80 boot parameter or pass it to xhci_hcd module if using one.
All USB devices will be dropped and re-enumerated on resume.

Can you repro with dynamic debug enabled and collect debugfs data before
unbinding the driver?
mount -t debugfs debugfs /sys/kernel/debug # if not mounted already
echo "module xhci_hcd +p" >/proc/dynamic_debug/control
echo freeze >/sys/power/state # or whatever you do to trigger it
# use the PCI address of the dead chip:
tar czf debug.tgz /sys/kernel/debug/usb/xhci/0000:c1:00.4

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (4 preceding siblings ...)
  2026-02-12 10:02 ` bugzilla-daemon
@ 2026-02-12 16:15 ` bugzilla-daemon
  2026-02-25 11:10 ` bugzilla-daemon
                   ` (35 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-12 16:15 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #6 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
> Is this unique to s2idle or can it also happen after suspend to RAM or
> hibernation?

The hardware doesn't support s3, only s2idle.

> @davidhubbard isolated iGPU memory allocation as a variable:

It seems really odd to me that this would change things.  Are you 100% sure?
This issue feels like a race condition of some sort, especially if you can
unbind and rebind after the issue happens to recover.

> The RESET_ON_RESUME quirk may possibly work as bandaid, try
> xhci_hcd.quirks=0x80 boot parameter or pass it to xhci_hcd module if using
> one. All USB devices will be dropped and re-enumerated on resume.

This doesn't seem ideal to me if most suspends are fine.  Until we get to the
bottom of this maybe we want a RESET_ON_FAILURE type of behavior instead to use
in the resume routine?

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (5 preceding siblings ...)
  2026-02-12 16:15 ` bugzilla-daemon
@ 2026-02-25 11:10 ` bugzilla-daemon
  2026-02-26  8:48 ` bugzilla-daemon
                   ` (34 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-25 11:10 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #7 from Michał Pecio (michal.pecio@gmail.com) ---
Any updates? Has anyone got dynamic debug logs?

May or may not be related to bug 221103. Both issues seem to affect xHCI
controllers embedded in some generations of AMD IGPs and the controllers stop
working during or after resume from suspend, albeit with different symptoms.
The other bug has a quick and reliable repro confirmed by multiple people and
it completely hangs the machine, no recovery is known.

> @davidhubbard isolated iGPU memory allocation as a variable:
Could be due to sharing some resources with the GPU?

> This doesn't seem ideal to me if most suspends are fine.  Until we get to the
> bottom of this maybe we want a RESET_ON_FAILURE type of behavior instead to
> use in the resume routine?
Guess you're right, but I thought some people may find RESET_ON_RESUME
preferable to the status quo until the bug is solved.

The resume routine should already reset the HC if it encounters problems
(except if MMIO reads as 0xffffffff, as in bug 221103). Here resume seems to
complete successfully, but the HC doesn't function properly. Or driver state is
utterly screwed up, like suspending with a Stop Endpoint command still in
flight. Debug logs would help.

Automatic reset on Stop EP timeout or other anomalies is possible in principle
and would help in this case, but it's not implemented.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (6 preceding siblings ...)
  2026-02-25 11:10 ` bugzilla-daemon
@ 2026-02-26  8:48 ` bugzilla-daemon
  2026-02-26  8:50 ` bugzilla-daemon
                   ` (33 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26  8:48 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Alexander F (superveridical@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |superveridical@gmail.com

--- Comment #8 from Alexander F (superveridical@gmail.com) ---
Created attachment 309470
  --> https://bugzilla.kernel.org/attachment.cgi?id=309470&action=edit
Z13 dmesg with dynamic debug on 6.19.3

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (7 preceding siblings ...)
  2026-02-26  8:48 ` bugzilla-daemon
@ 2026-02-26  8:50 ` bugzilla-daemon
  2026-02-26  9:30 ` bugzilla-daemon
                   ` (32 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26  8:50 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #9 from Alexander F (superveridical@gmail.com) ---
Created attachment 309471
  --> https://bugzilla.kernel.org/attachment.cgi?id=309471&action=edit
Z13 dynamic debug folder after the issue

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (8 preceding siblings ...)
  2026-02-26  8:50 ` bugzilla-daemon
@ 2026-02-26  9:30 ` bugzilla-daemon
  2026-02-26  9:37 ` bugzilla-daemon
                   ` (31 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26  9:30 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #10 from Alexander F (superveridical@gmail.com) ---
I also experience this issue on an ASUS Z13 (2025) with Strix Halo.

BIOS: 311
OS: Gentoo Linux, sys-kernel/vanilla-kernel-6.19.3

The probability of it occurring on resume is between 1/10 and 1/3. A resume
that hits the issue takes 10 seconds, and after that all devices connected to
this particular bus (the internal cameras) disappear. Unbind/bind restores the
devices.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (9 preceding siblings ...)
  2026-02-26  9:30 ` bugzilla-daemon
@ 2026-02-26  9:37 ` bugzilla-daemon
  2026-02-26 12:16 ` bugzilla-daemon
                   ` (30 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26  9:37 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #11 from Michał Pecio (michal.pecio@gmail.com) ---
Thanks.
This looks like we aren't getting IRQs or there is some problem with the IRQ
handler.

In dmesg (grep 0000:c4:00.4) we have the following:

// resuming the host controller and some root hub port manipulation
[  +0.190938] xhci_hcd 0000:c4:00.4: Setting command ring address to 0xffffe001
[  +0.002557] xhci_hcd 0000:c4:00.4: xhci_resume: starting usb1 port polling.
[  +0.000192] xhci_hcd 0000:c4:00.4: xhci_hub_status_data: stopping usb2 port polling
[  +0.002367] xhci_hcd 0000:c4:00.4: xhci_hub_status_data: stopping usb1 port polling
[  +0.000194] xhci_hcd 0000:c4:00.4: Get port status 1-1 read: 0xe63, return 0x507
[  +0.000007] xhci_hcd 0000:c4:00.4: Get port status 1-1 read: 0xe63, return 0x507
[  +0.000005] xhci_hcd 0000:c4:00.4: clear USB_PORT_FEAT_SUSPEND
[  +0.000001] xhci_hcd 0000:c4:00.4: PORTSC 0e63
[  +0.000008] xhci_hcd 0000:c4:00.4: Set port 1-1 link state, portsc: 0xe63, write 0x10fe1
[  +0.009957] xhci_hcd 0000:c4:00.4: Get port status 2-1 read: 0x2a0, return 0x2a0
[  +0.006769] xhci_hcd 0000:c4:00.4: Set port 1-1 link state, portsc: 0xfe3, write 0x10e01
[  +0.013286] xhci_hcd 0000:c4:00.4: Get port status 1-1 read: 0x400e03, return 0x40503
[  +0.000040] xhci_hcd 0000:c4:00.4: clear port1 suspend/resume change, portsc: 0xe03

// About 5 seconds later, a control URB is unlinked.
// This usually means timeout.
// It was probably an attempt to resume some USB device.
[  +4.026739] xhci_hcd 0000:c4:00.4: Cancel URB 000000003faffdb9, dev 1, ep 0x0, starting at offset 0xffff5960
[  +0.000065] xhci_hcd 0000:c4:00.4: // Ding dong!

// We try to stop this control EP, but no confirmation in 5 seconds.
[  +5.114893] xhci_hcd 0000:c4:00.4: Command timeout, USBSTS: 0x00000018 EINT PCD
[  +0.000015] xhci_hcd 0000:c4:00.4: xHCI host not responding to stop endpoint command

The command ring shows a single command at 0xffffe000 as expected:

 0 0x00000000ffffe000: Stop Ring Command: slot 1 sp 0 ep 1 flags C

The event ring shows completion of a transfer at 0xffff5980, which is probably
the last TRB of the URB starting at 0xffff5960, and completion of a command at
0xffffe000.

 0 0x00000000ffffd5f0: TRB 00000000ffff5980 status 'Success' len 0 slot 1 ep 1 type 'Transfer Event' flags e:C
 0 0x00000000ffffd600: TRB 00000000ffff5990 status 'Stopped - Length Invalid' len 0 slot 1 ep 1 type 'Transfer Event' flags e:C
 0 0x00000000ffffd610: TRB 00000000ffffe000 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C

So it seems that the HW performed the control transfer and then stopped the
endpoint as requested, but we never learned about it.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (10 preceding siblings ...)
  2026-02-26  9:37 ` bugzilla-daemon
@ 2026-02-26 12:16 ` bugzilla-daemon
  2026-02-26 12:18 ` bugzilla-daemon
                   ` (29 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26 12:16 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #12 from Alexander F (superveridical@gmail.com) ---
Created attachment 309475
  --> https://bugzilla.kernel.org/attachment.cgi?id=309475&action=edit
Z13 dmesg with xhci_hcd.quirks=0x80

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (11 preceding siblings ...)
  2026-02-26 12:16 ` bugzilla-daemon
@ 2026-02-26 12:18 ` bugzilla-daemon
  2026-02-26 22:51 ` bugzilla-daemon
                   ` (28 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26 12:18 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #13 from Alexander F (superveridical@gmail.com) ---
Regarding the RESET_ON_RESUME / xhci_hcd.quirks=0x80 quirk -- it doesn't help.
It results in "PM: resume devices took 58.357 seconds" and devices don't appear
without unbind/bind either.

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (12 preceding siblings ...)
  2026-02-26 12:18 ` bugzilla-daemon
@ 2026-02-26 22:51 ` bugzilla-daemon
  2026-02-27 14:04 ` bugzilla-daemon
                   ` (27 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-26 22:51 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #14 from Michał Pecio (michal.pecio@gmail.com) ---
Interesting. Your new dmesg shows that the quirk works: all driver state is
discarded and the xHC is reset and configured again.

Also, with or without the quirk, USBSTS on command timeout has the EINT bit
set, which means that IRQ is pending according to the xHC and our handler
hasn't cleared the bit yet, which is the first thing it should do when it runs.
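
The USBSTS value from the timeout message can be decoded by hand. A minimal
sketch, assuming the bit positions given in the xHCI specification (HCH=0,
HSE=2, EINT=3, PCD=4, SSS=8, RSS=9, SRE=10, CNR=11, HCE=12); decode_usbsts is
a hypothetical helper name:

```shell
# decode_usbsts (hypothetical helper): print the names of the flags set in
# an xHCI USBSTS register value, using bit positions from the xHCI spec.
decode_usbsts() {
    val=$(( $1 ))
    out=""
    for entry in 0:HCH 2:HSE 3:EINT 4:PCD 8:SSS 9:RSS 10:SRE 11:CNR 12:HCE; do
        bit=${entry%%:*}     # part before the colon: bit number
        name=${entry#*:}     # part after the colon: flag name
        if [ $(( ($val >> $bit) & 1 )) -eq 1 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_usbsts 0x00000018   # prints "EINT PCD", matching the dmesg line
```

EINT being set while HCH (halted) and HCE (controller error) are clear fits
the reading that the controller is running and has raised an interrupt that
nobody has acknowledged.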

Obviously the HC could be broken and ignoring requests to clear the bit, while
we also have a bug which causes pending events to be missed by the handler
without even logging an error, but I think the most likely and straightforward
explanation is that IRQs really aren't being delivered for some reason.

Unbind/bind possibly helps because it frees and allocates IRQs again. Maybe
disabling MSIs (xhci_hcd.quirks=0x40) would make a difference.

I think at this point I would try emailing linux-pci to see what they think
(I'm not sure if that subsystem monitors bugzilla). Because it looks like the
IRQ simply isn't being reactivated on resume.
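
One way to check this from userspace is to compare the interrupt count for
the controller before and after a resume. A rough sketch, assuming the
controller's line in /proc/interrupts contains the string xhci_hcd (with
several xHCI controllers in the machine, a more specific pattern would be
needed); irq_count is a hypothetical helper:

```shell
# irq_count (hypothetical helper): sum the per-CPU counters on every line
# of /proc/interrupts (or another file) that contains <pattern>.
# Usage: irq_count <pattern> [file]
irq_count() {
    pattern="$1"
    file="${2:-/proc/interrupts}"
    awk -v pat="$pattern" '
        index($0, pat) {
            # Field 1 is "NN:"; the numeric fields that follow are per-CPU counts.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    ' "$file"
}
```

Run `irq_count xhci_hcd` before suspending and again after resume, once some
device on the affected bus has been used. A count that does not move while
USBSTS reports EINT would be consistent with interrupts not being delivered.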

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (13 preceding siblings ...)
  2026-02-26 22:51 ` bugzilla-daemon
@ 2026-02-27 14:04 ` bugzilla-daemon
  2026-03-02 16:45 ` bugzilla-daemon
                   ` (26 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-02-27 14:04 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Mathias Nyman (mathias.nyman@linux.intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mathias.nyman@linux.intel.c
                   |                            |om

--- Comment #15 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
So xhci interrupt handler does not get called when it should.

The debugfs dump and xhci dynamic debug dmesg show that the controller handled
all events, even completed the control transfer that the driver assumed timed
out, and thus cancelled. Event ring shows all the events:

0x00000000ffffd5d0: TRB 00000000ffffe020 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C
0x00000000ffffd5e0: TRB 0000000001000000 status 'Success' len 0 slot 0 ep 0 type 'Port Status Change Event' flags e:C
0x00000000ffffd5f0: TRB 00000000ffff5980 status 'Success' len 0 slot 1 ep 1 type 'Transfer Event' flags e:C
0x00000000ffffd600: TRB 00000000ffff5990 status 'Stopped - Length Invalid' len 0 slot 1 ep 1 type 'Transfer Event' flags e:C
0x00000000ffffd610: TRB 00000000ffffe000 status 'Success' len 0 slot 1 ep 0 type 'Command Completion Event' flags e:C
0x00000000ffffd620: type 'UNKNOWN' -> raw 00000000 00000000 00000000 00000000

At event ring address 0xffffd5f0 we see the successful completion of a transfer
ending at 0xffff5980. This would be the control transfer that dmesg shows the
driver cancelled due to timeout. Control transfers are 2-3 TRBs long, so this
one starts at 0xffff5960 and ends at 0xffff5980.
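
The address arithmetic can be checked quickly. A small sketch, assuming a
three-TRB control transfer (Setup, Data and Status stages) and the 16-byte
TRB size defined by the xHCI specification; trb_addr is a hypothetical
helper name:

```shell
# Each xHCI TRB is 16 bytes, so TRB n of a transfer sits 16*n bytes after
# the first one.
TRB_SIZE=16

# trb_addr (hypothetical helper): print the address of TRB <index> (0-based)
# in a transfer whose first TRB is at <start>.
trb_addr() {
    printf '0x%x\n' $(( $1 + $2 * $TRB_SIZE ))
}

# Setup, Data and Status TRBs of the cancelled control transfer.
for i in 0 1 2; do
    trb_addr 0xffff5960 "$i"
done
```

The last address printed is 0xffff5980, the TRB named in the 'Success'
transfer event. The next slot, 0xffff5990, is the one named in the
'Stopped - Length Invalid' event.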

[  +4.026739] xhci_hcd 0000:c4:00.4: Cancel URB 000000003faffdb9, dev 1, ep 0x0, starting at offset 0xffff5960

debugfs shows that the event ring dequeue pointer is at 0x00000000ffffd5e0,
meaning this is the last event the driver is aware of.

So the xhci interrupt handler was not called for some reason even though the
xHC handled all events, and the EINT bit in the USBSTS register indicates that
the xHC tried to signal the pending unhandled events.

The only xHCI-side issue I can think of is that the xhci driver for some odd
reason turned off interrupt generation by clearing the INTE bit in the USBCMD
register. Otherwise I would look at PCI MSI/MSI-X generation, or at whether
something is preventing interrupts on this CPU.
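
To rule out the INTE case, the USBCMD value can be tested directly. A
minimal sketch, assuming INTE is bit 2 of USBCMD as defined in the xHCI
specification. How the register value is obtained (for example from the
debugfs data collected earlier) is not shown in this thread, so the value is
simply passed in as an argument; inte_enabled is a hypothetical helper name:

```shell
# USBCMD bit 2 is INTE (Interrupter Enable) per the xHCI specification.
USBCMD_INTE=$(( 1 << 2 ))

# inte_enabled (hypothetical helper): succeed if INTE is set in <usbcmd>.
inte_enabled() {
    [ $(( $1 & $USBCMD_INTE )) -ne 0 ]
}

# Example with a made-up value (Run/Stop and INTE both set):
if inte_enabled 0x00000005; then
    echo "INTE set"
else
    echo "INTE cleared"
fi
```

If INTE turns out to be set on an affected system, the driver has not
disabled interrupt generation and the MSI/MSI-X path is the next thing to
look at.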

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (14 preceding siblings ...)
  2026-02-27 14:04 ` bugzilla-daemon
@ 2026-03-02 16:45 ` bugzilla-daemon
  2026-03-02 18:08 ` bugzilla-daemon
                   ` (25 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-02 16:45 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #16 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
I'm wondering if the patch suggested as part of bug 221103
(https://bugzilla.kernel.org/attachment.cgi?id=309444&action=diff) might help
this issue.


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (15 preceding siblings ...)
  2026-03-02 16:45 ` bugzilla-daemon
@ 2026-03-02 18:08 ` bugzilla-daemon
  2026-03-02 18:14 ` bugzilla-daemon
                   ` (24 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-02 18:08 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #17 from Michał Pecio (michal.pecio@gmail.com) ---
That code won't even run, because the HC resumes normally. And it appears to
function normally too, we just aren't getting interrupts so things time out and
the driver gives up trying.

Considering that we have two similar bugs affecting different generations of
the same hardware and (in this case) xHCI reset doesn't help but rebinding the
PCI driver does, I think a power management bug in HW or the PCI/x86 subsystems
is more likely.


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (16 preceding siblings ...)
  2026-03-02 18:08 ` bugzilla-daemon
@ 2026-03-02 18:14 ` bugzilla-daemon
  2026-03-02 19:05 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-02 18:14 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #18 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
Ah right, thanks.  I tend to think this feels like a race.


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (17 preceding siblings ...)
  2026-03-02 18:14 ` bugzilla-daemon
@ 2026-03-02 19:05 ` bugzilla-daemon
  2026-03-03 14:54 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-02 19:05 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #19 from Alexander F (superveridical@gmail.com) ---
Created attachment 309514
  --> https://bugzilla.kernel.org/attachment.cgi?id=309514&action=edit
Z13 dmesg with xhci_hcd.quirks=0x40

The xhci_hcd.quirks=0x40 quirk resolves the 10-second resume issue; the devices
on this bus also no longer disappear and work normally. Not sure if there are
side effects to this quirk. Since I don't know what to look for without "HC
died", I uploaded a dmesg with the number of suspend/resume cycles that usually
triggers the issue.

Regarding the brokenness of my device -- it's highly likely. I thought that it
was the improper interaction with / "fragility" of the hardware/firmware that
causes the issues described below. I assumed I should just quietly wait for the
next AGESA, but it seems that it could just as well be some kind of latent
hardware failure mode (a common type of failure, or some QC issue?). It's just
strange to my inexperienced self that it manifests itself so similarly for a
number of people on different devices in the same way. The debug data reports
from others are definitely required before making a decision on this, since my
sample might not be representative. (I think it would be even better if the
reporter from Framework sent the device on which it could be repeatedly
reproduced to AMD)

Here is circumstantial evidence for my device/SoC being broken, collected over
a month+ of semi-active use. The issues are seemingly adjacent to
sleep/restart/hibernate actions and IRQs (I apologize for the non-technical
nature, and if it's inappropriate, but since I'm the only one who has provided
debug data so far, and I'm not a hardware person, that's all I could do to
provide the context. I'm prepared to supply any data requested):

- No mass complaints about 10 second resumes on Linux here
https://www.reddit.com/r/FlowZ13/ and there is a good number of Linux users.
- I primarily used this device with linux.
- The first issues I encountered were audible latency issues on battery on
pre-installed windows -- LatencyMon showed 30000 to 50000 (i.e. several full
frames) interprocess stutters for me, during a 3 minute test, while simply
playing a video in a browser at about 8-10w soc power.  But since I wasn't the
only one
https://www.reddit.com/r/FlowZ13/comments/1jcgfn9/2025_395_audio_cracking/
(note the screenshots, these are not mine) I thought it was "normal".
Interestingly, a clean installation of 23h2 windows cleared this issue
completely. The upgrade to 25h2 brought back an increase in latencies on
battery power, from about 150 to 400, but the horrific 10k+ stutters didn't
return. If it's a hardware issue, it could be that it was "fixed" just due to
less bloat on a clean install.
- On Linux I only observed smaller stutters using
https://testufo.com/animation-time-graph#scale=20&measure=interval , I'm yet to
do it properly using a lightweight compositor that is not mutter + mangohud +
vkcube.  Several times on Linux liveusbs I encountered "hrtimer interrupt took
2million+ nanoseconds" messages in dmesg, but not on the up to date kernel and
firmware on Gentoo though, so far.
- The device could simply die (as in you have to power it on as if it was off)
on a suspend / restart action. This is rare, but it's at least a 1/100 - 1/200
chance. Detaching power / USB devices could be a factor, not sure.
- Two times it died while in use. One of those was shortly after hibernation
resumption on linux. The other one was on windows, but I don't remember the
context, probably also after hibernation, since ASUS software forces it. I'm
afraid to explore repeatability of this, since it's a tablet, and powercycling
through battery disconnection requires special tools.
- Haven't observed any instability outside of power related actions -- I
compiled an entire Gentoo on it (-j16), played heavy games for an hour+, left
it idle for days. So far my understanding is that if it survives a minute after
a power action it will be rock solid until the next power action.
- The device could accrue "brokenness" that is preserved between "soft" resets.
This is also rare. For example one time it was systematically doing very long
boots due to getting stuck on a btusb device, and it was cleared only by hard
reset. In this state it was similarly stuck on power on at the bios stage (logo
appeared with a significant delay) 

Feb 13 22:36:46 gentoo kernel: usb 3-3: device descriptor read/64, error -110
Feb 13 22:37:02 gentoo kernel: usb 3-3: device descriptor read/64, error -110
Feb 13 22:37:02 gentoo kernel: usb 3-3: new high-speed USB device number 3
using xhci_hcd
Feb 13 22:37:07 gentoo kernel: usb 3-3: device descriptor read/64, error -110
Feb 13 22:37:23 gentoo kernel: usb 3-3: device descriptor read/64, error -110
Feb 13 22:37:23 gentoo kernel: usb usb3-port3: attempt power cycle
Feb 13 22:37:24 gentoo kernel: usb 3-3: new high-speed USB device number 4
using xhci_hcd
Feb 13 22:37:29 gentoo kernel: xhci_hcd 0000:c6:00.0: Timeout while waiting for
setup device command
Feb 13 22:37:34 gentoo kernel: xhci_hcd 0000:c6:00.0: Timeout while waiting for
setup device command
Feb 13 22:37:34 gentoo kernel: usb 3-3: device not accepting address 4, error
-62
Feb 13 22:37:35 gentoo kernel: usb 3-3: new high-speed USB device number 5
using xhci_hcd
Feb 13 22:37:40 gentoo kernel: xhci_hcd 0000:c6:00.0: Timeout while waiting for
setup device command
Feb 13 22:37:42 gentoo systemd-udevd[595]: usb3: Worker [673] processing
SEQNUM=2836 is taking a long time.
Feb 13 22:37:45 gentoo kernel: xhci_hcd 0000:c6:00.0: Timeout while waiting for
setup device command
Feb 13 22:37:45 gentoo kernel: usb 3-3: device not accepting address 5, error
-62
Feb 13 22:37:45 gentoo kernel: usb usb3-port3: unable to enumerate USB device

- The other time it had caught a bout of systematically dying on restart. Was
also cleared by a hard reset (holding power button for some N>10 seconds).

While there are similar reports in /r/FlowZ13, if such state of the device was
prevalent it would be more notable. There is also a possibility that the stuff
above is independent of this particular bug.

Miscellaneous (likely unrelated):
- There is a back RGB backlight window usb hid device (3-5) and I observed it
in 3 states. Right after a hard reset it's turned off both on DC and on
battery, as it should be per the ArmoryCrate settings. A power off / power on
cycle turns it on for some reason, until the next hard reset. In that turned-on
state it could be responsive to systemd and echo commands (systemd turns it off
after a short flare up on resume), but it could also become unresponsive, and
in that case I have to do the following to turn it off.

echo "3-5" > /sys/bus/usb/drivers/usb/unbind; sleep 0.5
echo "3-5" > /sys/bus/usb/drivers/usb/bind; sleep 0.5
echo 0 > /sys/class/leds/asus\:\:kbd_backlight_1/brightness

I managed to reproduce it repeatedly by having it ON after a power on/off
cycle, and then resuming once on battery, and after that on every resume you
have to unbind/bind it. Other devices on this bus don't have resume issues. 

Other likely-real bugs I won't be filing/reporting/commenting on since my
device is likely broken (just putting it out here on the internet):
- cat /sys/kernel/debug/dri/0/amdgpu_pm_info and amdgpu_top report the wrong
soc power wattage while on battery -- when I disconnect DC, on idle, soc power
instantly jumps from 3-something w to 6w. upower --dump and powerstat show 1w+
lower settled actual battery discharge rate, which should be impossible.
- encountered https://gitlab.freedesktop.org/drm/amd/-/issues/5000 over HDMI to
LG 27GR93U-B at 4k 120hz, wasn't able to reproduce it a second time after
trying for an hour


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (18 preceding siblings ...)
  2026-03-02 19:05 ` bugzilla-daemon
@ 2026-03-03 14:54 ` bugzilla-daemon
  2026-03-03 14:55 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 14:54 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

David Hubbard (david.c.hubbard@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.c.hubbard@gmail.com

--- Comment #20 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309523
  --> https://bugzilla.kernel.org/attachment.cgi?id=309523&action=edit
comment16-cap01.tgz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (19 preceding siblings ...)
  2026-03-03 14:54 ` bugzilla-daemon
@ 2026-03-03 14:55 ` bugzilla-daemon
  2026-03-03 14:55 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 14:55 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #21 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309524
  --> https://bugzilla.kernel.org/attachment.cgi?id=309524&action=edit
comment16-cap02.tgz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (20 preceding siblings ...)
  2026-03-03 14:55 ` bugzilla-daemon
@ 2026-03-03 14:55 ` bugzilla-daemon
  2026-03-03 14:56 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 14:55 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #22 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309525
  --> https://bugzilla.kernel.org/attachment.cgi?id=309525&action=edit
comment16-dmesg01.gz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (21 preceding siblings ...)
  2026-03-03 14:55 ` bugzilla-daemon
@ 2026-03-03 14:56 ` bugzilla-daemon
  2026-03-03 15:05 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 14:56 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #23 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309526
  --> https://bugzilla.kernel.org/attachment.cgi?id=309526&action=edit
comment16-dmesg02.gz

I'm fascinated with diagnosing the xHC USBSTS with the EINT bit set, using the
instructions in comment #5. I'm zeroing in on that. Maybe the issues with
resume taking 10 seconds in comment #19 can be considered separately in a
different bug?

My kernel has hand-applied patches I backported from
https://github.com/mkopec/linux/tree/hdmi_frl to 6.17.7 (ROCm 6.4 is still 10%
faster than newer ROCm versions). I also applied the Mediatek driver patches in
https://zbowling.github.io/mt7925/.

Anyway, I took a few interesting captures:

Cap1: used power button to resume. Can anyone else confirm this is a
workaround? The bug did not appear for this attempt.

* For some reason, debugfs files were all 0 in size?

Cap2: used space bar on keyboard to resume. Bug appeared.

* For some reason, debugfs files were all 0 in size?

# free
               total        used        free      shared  buff/cache   available
Mem:        32636908     4008008    27282920       35664     1764500    28628900
Swap:              0           0           0

# (this shows 32GB free because the BIOS setting is iGPU = custom, iGPU memory
= 96GB)
# echo "module xhci_hcd +p" >/proc/dynamic_debug/control
# systemctl suspend
# (wait 2 minutes, press power button to resume, no bug)
# tar czf cap01.tgz /sys/kernel/debug/usb/xhci/0000\:c2\:00.4
# dmesg | gzip > dmesg01.gz
# systemctl poweroff

# (press power button, wait for system to boot)
# echo "module xhci_hcd +p" >/proc/dynamic_debug/control
# systemctl suspend
# (wait 2 minutes, press spacebar, get this in dmesg)
[  243.002269] xhci_hcd 0000:c2:00.4: xHCI host controller not responding,
assume dead
# (unplug usb hub from c2:00.4 and plug it into c4:00.0, keyboard and mouse
alive again)
# tar czf cap02.tgz /sys/kernel/debug/usb/xhci/0000\:c2\:00.4
# dmesg | gzip > dmesg02.gz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (22 preceding siblings ...)
  2026-03-03 14:56 ` bugzilla-daemon
@ 2026-03-03 15:05 ` bugzilla-daemon
  2026-03-03 15:47 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 15:05 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #24 from Alexander F (superveridical@gmail.com) ---
Debugfs files were zeroed because running tar on them directly doesn't work
(probably because stat reports a size of 0 for them, or something else). You
should cp them into a temp directory before archiving.
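
A minimal sketch of that workaround: read every file's contents with cat and
write ordinary files that tar can then archive. The function name and the
destination path in the usage comment are made up for illustration, and the
stat explanation above is an assumption rather than something verified here.

```shell
# Copy a debugfs (or sysfs/procfs) tree by reading each file's contents,
# instead of relying on the size that stat reports for it.
# Arguments: source directory, destination directory.
copy_by_reading() {
    src=$1
    dst=$2
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        mkdir -p "$dst/$(dirname "$f")"
        # Some files may fail to read (e.g. on a dead controller); keep going.
        cat "$src/$f" > "$dst/$f" 2>/dev/null || echo "unreadable: $f" >&2
    done
}

# Usage (as root), with a hypothetical destination directory:
#   copy_by_reading /sys/kernel/debug/usb/xhci/0000:c2:00.4 /tmp/xhci-dump
#   tar czf cap.tgz -C /tmp xhci-dump
```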


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (23 preceding siblings ...)
  2026-03-03 15:05 ` bugzilla-daemon
@ 2026-03-03 15:47 ` bugzilla-daemon
  2026-03-03 15:51 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 15:47 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #25 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309531
  --> https://bugzilla.kernel.org/attachment.cgi?id=309531&action=edit
comment26-dmesg03.gz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (24 preceding siblings ...)
  2026-03-03 15:47 ` bugzilla-daemon
@ 2026-03-03 15:51 ` bugzilla-daemon
  2026-03-03 16:59 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 15:51 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #26 from David Hubbard (david.c.hubbard@gmail.com) ---
Created attachment 309532
  --> https://bugzilla.kernel.org/attachment.cgi?id=309532&action=edit
comment25-cap03.tgz

Ok tried it again:

Cap3:

# free
               total        used        free      shared  buff/cache   available
Mem:        32636916     3709384    27486748       37088     1822492    28927532
Swap:              0           0           0
# (this shows 32GB free because the BIOS setting is iGPU = custom, iGPU memory
= 96GB)
# echo "module xhci_hcd +p" >/proc/dynamic_debug/control
# systemctl suspend
# (wait 2 minutes, press spacebar, get this in dmesg)
[  183.231723] xhci_hcd 0000:c2:00.4: xHCI host controller not responding,
assume dead
# (unplug usb hub from c2:00.4 and plug it into c4:00.0, keyboard and mouse
alive again)
# cd /; find sys/kernel/debug/usb/xhci/0000\:c2\:00.4 -type f | while read a; do
    echo "$a"; mkdir -p "/home/user/c/$(dirname "$a")"
    cp "$a" "/home/user/c/$a"
  done
sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/FS_BW
cp: error reading
'sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/FS_BW': Cannot send
after transport endpoint shutdown
sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/HS_BW
cp: error reading
'sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/HS_BW': Cannot send
after transport endpoint shutdown
sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/SS_BW
cp: error reading
'sys/kernel/debug/usb/xhci/0000:c2:00.4/port_bandwidth/SS_BW': Cannot send
after transport endpoint shutdown
sys/kernel/debug/usb/xhci/0000:c2:00.4/ports/port01/portsc
sys/kernel/debug/usb/xhci/0000:c2:00.4/ports/port02/portsc
sys/kernel/debug/usb/xhci/0000:c2:00.4/event-ring/trbs
sys/kernel/debug/usb/xhci/0000:c2:00.4/event-ring/cycle
sys/kernel/debug/usb/xhci/0000:c2:00.4/event-ring/dequeue
sys/kernel/debug/usb/xhci/0000:c2:00.4/event-ring/enqueue
sys/kernel/debug/usb/xhci/0000:c2:00.4/command-ring/trbs
sys/kernel/debug/usb/xhci/0000:c2:00.4/command-ring/cycle
sys/kernel/debug/usb/xhci/0000:c2:00.4/command-ring/dequeue
sys/kernel/debug/usb/xhci/0000:c2:00.4/command-ring/enqueue
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-ext-dbc:00
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-ext-protocol:01
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-ext-protocol:00
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-ext-legsup:00
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-runtime
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-op
sys/kernel/debug/usb/xhci/0000:c2:00.4/reg-cap
# dmesg | gzip > dmesg03.gz


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (25 preceding siblings ...)
  2026-03-03 15:51 ` bugzilla-daemon
@ 2026-03-03 16:59 ` bugzilla-daemon
  2026-03-03 17:05 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 16:59 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #27 from Alexander F (superveridical@gmail.com) ---
>Maybe the issues with resume taking 10 seconds in comment #19 can be
>considered separately in a different bug?

The initial bug reporter's dmesg has it. My dmesg has it. User uioped1's dmesg
on github has it. User Neilson_Soult on the Framework forum has it. So it looks
like it's your issue that could be a different bug. 

Have you tried the 0x40 quirk?

PS: I forgot to mention that I suspect the instability I mentioned in my
comment could be environmental in nature, and my device could be perfectly
fine, just prone to ESD / static charge build-up -- it's been cold here, and
the air is unusually dry, with the resulting static build-up. (I've been
zapping things)


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (26 preceding siblings ...)
  2026-03-03 16:59 ` bugzilla-daemon
@ 2026-03-03 17:05 ` bugzilla-daemon
  2026-03-03 22:57 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 17:05 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #28 from David Hubbard (david.c.hubbard@gmail.com) ---
>So it looks like it's your issue that could be a different bug. 
>
>Have you tried the 0x40 quirk?

I'd like to help with this bug, it's fine if we decide not to resolve the
message "xhci_hcd 0000:c2:00.4: xHCI host controller not responding, assume
dead"


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (27 preceding siblings ...)
  2026-03-03 17:05 ` bugzilla-daemon
@ 2026-03-03 22:57 ` bugzilla-daemon
  2026-03-04  0:20 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-03 22:57 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

Alexander F (superveridical@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #309514|0                           |1
        is obsolete|                            |

--- Comment #29 from Alexander F (superveridical@gmail.com) ---
Created attachment 309536
  --> https://bugzilla.kernel.org/attachment.cgi?id=309536&action=edit
Z13 dmesg with xhci_hcd.quirks=0x40 (fixed)

I apologize -- my previous attachment was truncated.


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (28 preceding siblings ...)
  2026-03-03 22:57 ` bugzilla-daemon
@ 2026-03-04  0:20 ` bugzilla-daemon
  2026-03-04  9:15 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-04  0:20 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #30 from Michał Pecio (michal.pecio@gmail.com) ---
(In reply to Alexander F from comment #19)
> The xhci_hcd.quirks=0x40 quirk resolves the 10 second resume issue, the
> devices on this bus also don't disappear and work nominally. Not sure if
> there are side effects to this quirk.
This should be harmless in principle, it just switches from MSI to (emulated)
PCI INTx interrupt.
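
One way to confirm which interrupt mode a controller ended up with is the
msi_irqs directory in its PCI sysfs node, which holds one file per allocated
vector containing "msi" or "msix". The sketch below relies on that sysfs
layout. The function name and the "intx" label for the no-vectors case are made
up for illustration.

```shell
# Print the interrupt mode of a PCI device: "msi" or "msix" if vectors are
# allocated, otherwise "intx" (no MSI/MSI-X vectors, presumably legacy INTx).
# Argument: the device's sysfs directory.
irq_mode() {
    dir=$1/msi_irqs
    mode=intx
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        mode=$(cat "$f")
    done
    echo "$mode"
}

# Usage, with the controller address from the original report:
#   irq_mode /sys/bus/pci/devices/0000:c1:00.4
```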

> - The device could accrue "brokenness" that is preserved between "soft"
> resets. This is also rare. For example one time it was systematically doing
> very long boots due to getting stuck on a btusb device, and it was cleared
> only by hard reset.
Unrelated. Onboard bluetooth chips are known to get flakey and fail to
enumerate until power cycled, which is tricky if they run off a standby rail.

(In reply to David Hubbard from comment #23)
> I'm fascinated with diagnosing the xHC USBSTS with the EINT bit set, using
> the instructions in comment #5. I'm zeroing in on that. Maybe the issues
> with resume taking 10 seconds in comment #19 can be considered separately in
> a different bug?
The delay is a consequence of missing interrupts *and* having some USB device
connected. Your case is similar to the other one.

Sorry about the wrong instructions. Indeed, tar doesn't work with debugfs.


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (29 preceding siblings ...)
  2026-03-04  0:20 ` bugzilla-daemon
@ 2026-03-04  9:15 ` bugzilla-daemon
  2026-03-06 11:11 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-04  9:15 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #31 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Your lspci output shows that MSI-X is used for this controller:
c1:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Strix Halo
USB 3.1 xHCI [1022:1587] (prog-if 30 [XHCI])
        ...
        Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=1 Masked-

Are the other xHCI hosts also using MSI-X, or just this one?

Are there any changes in any of the PCI MSI and MSI-X capability fields
after resume, such as the address field?
Check with lspci -vvv before and after resume.

Does moving the xHC irq to CPU0 before suspend help?
(check actual xhci irq number from /proc/interrupts, assume 25 in example)
echo 1 > /proc/irq/25/smp_affinity
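
The value written to smp_affinity is a hexadecimal CPU bitmask, so "1" selects
CPU0. A small sketch for computing the mask of another CPU follows. The
function name is hypothetical, and it is deliberately limited to CPUs 0-31 so
that the comma-separated format used for larger masks does not come into play.

```shell
# Print the smp_affinity hex mask that selects a single CPU.
# Argument: CPU number (0-31 only in this sketch).
cpu_affinity_mask() {
    cpu=$1
    if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then
        echo "cpu_affinity_mask: only CPUs 0-31 are handled here" >&2
        return 1
    fi
    printf '%x\n' $(( 1 << $cpu ))
}

# Usage (as root), with IRQ 25 assumed as in the example above:
#   cpu_affinity_mask 0 > /proc/irq/25/smp_affinity
```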

Another thing to try would be to force MSI interrupt instead of MSI-X.
Not sure if there is an easy way to do this, couldn't find a kernel parameter
for it.

One way to do it is to modify xhci driver:

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 585b2f3117b0..3acb6ad86f4e 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -167,7 +167,7 @@ static int xhci_try_enable_msi(struct usb_hcd *hcd)

        /* TODO: Check with MSI Soc for sysdev */
        xhci->nvecs = pci_alloc_irq_vectors(pdev, 1, xhci->nvecs,
-                                           PCI_IRQ_MSIX | PCI_IRQ_MSI);
+                                           PCI_IRQ_MSI);
        if (xhci->nvecs < 0) {
                xhci_dbg_trace(xhci, trace_xhci_dbg_init,
                               "failed to allocate IRQ vectors");


* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (30 preceding siblings ...)
  2026-03-04  9:15 ` bugzilla-daemon
@ 2026-03-06 11:11 ` bugzilla-daemon
  2026-03-06 11:40 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-06 11:11 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #32 from Alexander F (superveridical@gmail.com) ---
I tried setting smp_affinity to CPU0. It didn't help; the failure occurred on
the 6th resume. I verified the change by checking /proc/interrupts again: the
CPU0 column had accumulated 15 interrupts during resumes after the change
(before that, only CPU16 had any, with 79 interrupts).
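
A rough sketch of how the per-CPU counts for one IRQ can be pulled out of /proc/interrupts. The `irq_counts` helper is hypothetical, and the IRQ number has to be looked up first, since it differs between systems:

```shell
# Hypothetical helper: print "CPUn count" for every CPU that has handled at
# least one interrupt for the given IRQ number.
# $1 = IRQ number, $2 = interrupts file (defaults to /proc/interrupts).
irq_counts() {
    irq=$1
    file=${2:-/proc/interrupts}
    awk -v irq="$irq:" '
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; n = NF; next }
        $1 == irq { for (i = 1; i <= n; i++) if ($(i + 1) > 0) print cpu[i], $(i + 1) }
    ' "$file"
}

# Example: irq_counts 25
```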

I'll try modifying my kernel a bit later if the other person doesn't respond.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (31 preceding siblings ...)
  2026-03-06 11:11 ` bugzilla-daemon
@ 2026-03-06 11:40 ` bugzilla-daemon
  2026-03-09 10:31 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-06 11:40 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #33 from Alexander F (superveridical@gmail.com) ---
Created attachment 309553
  --> https://bugzilla.kernel.org/attachment.cgi?id=309553&action=edit
Z13 lspci -vvv

lspci-pre was taken right after boot, before any resumes; lspci-after was taken
after a 10-second resume on the 4th attempt. The offending HC, c4:00.4, doesn't
appear to have any changes in its MSI/MSI-X capabilities section.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (32 preceding siblings ...)
  2026-03-06 11:40 ` bugzilla-daemon
@ 2026-03-09 10:31 ` bugzilla-daemon
  2026-03-11 22:09 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-09 10:31 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #34 from Michał Pecio (michal.pecio@gmail.com) ---
Created attachment 309583
  --> https://bugzilla.kernel.org/attachment.cgi?id=309583&action=edit
debug suspend/resume

There are some error flags set on DevSta and changes to the MSI capabilities of
other functions on the c4:00 device, but I don't know if it means anything.
Though it would be suspicious if those changes only occur during problematic
resumes.

Mathias suggested that CMD_INTE may be getting cleared inadvertently. I don't
see how it could happen, but just in case, here's a patch which logs the xHCI
registers relevant to interrupts during suspend, resume and on command timeout.

It also checks for pending commands on suspend (shouldn't happen, but who
knows) and explicitly disables and reenables IRQ generation at the xHCI layer.
I gave it a quick test and it seems to be OK, but you could also try without
those two blocks which manipulate CMD_EIE in xhci->op_regs->command.

I also tried removing only the part which enables interrupts on resume and got
a failure similar to yours, as expected.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (33 preceding siblings ...)
  2026-03-09 10:31 ` bugzilla-daemon
@ 2026-03-11 22:09 ` bugzilla-daemon
  2026-03-12  0:04 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-11 22:09 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #35 from Alexander F (superveridical@gmail.com) ---
Created attachment 309621
  --> https://bugzilla.kernel.org/attachment.cgi?id=309621&action=edit
Z13 klogs with reg dump and MSI-only patches

Sorry for the delay.

I wanted to be succinct, but the circumstances make it impossible. My main OS
uses a ZFS root, which taints the kernel, so I've been reporting from a copy of
the system modified to run from a 32 GB USB stick to avoid the taint (and to
isolate all the testing modifications). For all prior reports I used the
6.19.3 kernel and the 20260110 firmware. To make my reports more relevant I
decided to update to the latest stable kernel, 6.19.6, and the 20260221
firmware. But that caused an unforeseen issue, with the system no longer able
to reach the proper sleep state:

    Mar 10 19:14:57 rescue-flow kernel: amd_pmc AMDI000B:00: Last suspend didn't reach deepest state

I isolated the issue to the firmware upgrade: after downgrading to the 20260110
firmware, behavior returned to what it was before. I don't know whether it's a
bug, a firmware-kernel version incompatibility, potential brokenness of my
device manifesting itself, or something else. I should mention that I used
`make localmodconfig` from a 6.19.3 system in order to avoid long build times
(from building all the modules) on the slow USB stick, and on the laptop in
general.

What's weird/interesting is that when I tried to make sure that the issue is
reproducible after updates and encountered "didn't reach deepest state", I
still managed to trigger the "HC died" issue -- I did it 2 times and both times
it took around 30 tries, which is much less frequent than the usual state,
where I need 3-7 tries. (klog-deepest-state file)

So I returned to vanilla-sources-6.19.6 manually built kernel and 20260110
firmware, patched the kernel with the register dump patch. Interestingly it was
a little harder to trigger than before, but I didn't do enough runs to say
definitively. The files are klog-pecio-patch and klog-pecio-patch2.

I then applied the patch that makes the HC use MSI interrupts (I had to remove
the MSI-X flag by hand, since the patch didn't apply to that kernel version).
I provided lspci files showing the effect. I was not able to reproduce the
issue with that patch. I automated the suspend/resume cycles with
`while true; do sleep 5; rtcwake -m freeze -s 7; if dmesg | grep -q "HC died"; then break; fi; done;`
and did about 70 cycles. I'll do at least 200 later to confirm.
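
For anyone repeating this, here is a sketch of the same loop with a cycle limit and a count of how many cycles ran. The `suspend_cycles` function and the `SUSPEND_CMD`, `LOG_CMD` and `SETTLE_SECS` variables are hypothetical additions so the loop can be dry-run; the rtcwake and dmesg commands are the ones from the one-liner above.

```shell
# Defaults match the one-liner above; override them for a dry run.
: "${SUSPEND_CMD:=rtcwake -m freeze -s 7}"
: "${LOG_CMD:=dmesg}"
: "${SETTLE_SECS:=5}"

# Run up to $1 suspend/resume cycles and stop early if the kernel log
# contains "HC died". Returns 0 if no failure was seen, 1 otherwise.
suspend_cycles() {
    max=$1
    i=1
    while [ "$i" -le "$max" ]; do
        sleep "$SETTLE_SECS"
        $SUSPEND_CMD || return 2
        if $LOG_CMD | grep -q "HC died"; then
            echo "HC died after $i cycle(s)"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max cycles"
    return 0
}

# Example (as root): suspend_cycles 200
```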

Since there are power issues involved I also provided 3 reports from amd-s2idle
tool. Two of them are from the newer and the older firmwares on the live usb,
and one of them is from the main zfs system with the 6.18.10 kernel, which I
included to show the following log entries in the 46th cycle:

ACPI: \_SB_.PCI0.GPP3: LPI: Constraint not met; min power state:D1 current power state:D0
ACPI: \_SB_.PCI0.GPP6: LPI: Constraint not met; min power state:D3hot current power state:D0
ACPI: \_SB_.PCI0.GP10: LPI: Device not power manageable
ACPI: \_SB_.PCI0.GPP0.SWUS: LPI: Constraint not met; min power state:D3hot current power state:D0
ACPI: \_SB_.PCI0.GPP1.SWUS: LPI: Constraint not met; min power state:D3hot current power state:D0
ACPI: \_SB_.PCI0.GPP5.WLAN: LPI: Device not power manageable

For some reason (missing modules due to localmodconfig, misconfiguration, the
newer vanilla kernel, I don't know) I wasn't able to make the tool produce
these entries on the live USB, in either test or report mode. The output above
was produced for what I think was a regular working suspend, and I think also
without any potentially broken state from "HC died". I don't remember if it was
with the 0x40 quirk or not. I'm not sure the sleep discharge rate is nominal;
it's a bit high. I can do more digging if any of that is important.

>There are some error flags set on DevSta 

These flags only appear after "HC died" occurs (that event also adds a
(warning) taint). I verified that by running
`lspci -vvv | grep DevSta: | grep +` before/after every resume; NonFatalError
on all c4:00 devices flips only after the event.
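
A slightly more targeted variant of that check, as a sketch: it reads `lspci -vvv` output on stdin and prints the address of each function whose DevSta line shows NonFatalErr+ (the spelling lspci uses). The `nonfatal_devices` name is hypothetical.

```shell
# Hypothetical helper: list PCI functions whose DevSta has NonFatalErr+ set.
# Expects `lspci -vvv` output on stdin.
nonfatal_devices() {
    awk '
        /^[0-9a-f]/ { dev = $1 }                      # device header line
        /DevSta:/ && /NonFatalErr\+/ { print dev }    # flag set on this one
    '
}

# Example (as root): lspci -vvv -s c4:00 | nonfatal_devices
```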

Files:
klog-deepest-state -- kernel log of an attempt to trigger the issue with the
  newer firmware; debug output was not enabled for that run
lspci-right-after-boot -- lspci for the older firmware right after boot
lspci-0221-firmware-after-boot -- lspci for the newer firmware
klog-pecio-patch, klog-pecio-patch2 -- debug output with the register dump
  patch; use either of the two
lspci-msi-patched-right-after-boot -- shows that MSI, not MSI-X, interrupts are
  enabled
klog-msi-patched -- kernel log with the two patches; I had to trim it, since
  the issue wasn't triggered

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (34 preceding siblings ...)
  2026-03-11 22:09 ` bugzilla-daemon
@ 2026-03-12  0:04 ` bugzilla-daemon
  2026-03-12  6:49 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-12  0:04 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #36 from Mario Limonciello (AMD) (mario.limonciello@amd.com) ---
> I isolated the issue to the firmware upgrade, downgraded the firmware to
> 20260110 version, and it became like it was before. I don't know whether it's
> a bug, firmware-kernel version compatibility discrepancy, potential
> brokenness of my device manifesting itself, or something else, but I have to
> mention that I used `make localmodconfig` from a 6.19.3 system in order to
> avoid long build times (due to building all the modules) on the slow usb
> stick, and on the laptop in general. 

There was a linux-firmware regression with amdxdna.  If you want to use that
firmware release, you can blacklist the amdxdna driver to avoid the regression.
Otherwise the release from the month before or the one just tagged will work
fine.

> What's weird/interesting is that when I tried to make sure that the issue is
> reproducible after updates and encountered "didn't reach deepest state", I
> still managed to trigger the "HC died" issue -- I did it 2 times and both
> times it took around 30 tries, which is much less frequent than the usual
> state, where I need 3-7 tries. (klog-deepest-state file)

This is actually an interesting data point because it means that hardware sleep
has no bearing on this issue.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (35 preceding siblings ...)
  2026-03-12  0:04 ` bugzilla-daemon
@ 2026-03-12  6:49 ` bugzilla-daemon
  2026-03-12 10:35 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-12  6:49 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #37 from Michał Pecio (michal.pecio@gmail.com) ---
Can you try this magic trick which solves all recent problems with AMD xHCI?

echo on > /sys/bus/pci/devices/0000:c4:00.4/power/control

Your log shows the PCI device being runtime autosuspended and resumed
immediately before suspending again for s2idle. The above disables autosuspend.
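
To confirm the setting took effect, the same sysfs directory can be read back. A minimal sketch, where the `runtime_pm_state` function and the `SYSFS_PCI` override are hypothetical; the `power/control` and `power/runtime_status` files are standard runtime PM attributes:

```shell
# SYSFS_PCI is an assumed override hook, useful only for dry runs.
: "${SYSFS_PCI:=/sys/bus/pci/devices}"

# Hypothetical helper: print the runtime PM policy and current status of a
# PCI device given its full address (for example 0000:c4:00.4).
runtime_pm_state() {
    dev=$1
    printf 'control=%s status=%s\n' \
        "$(cat "$SYSFS_PCI/$dev/power/control")" \
        "$(cat "$SYSFS_PCI/$dev/power/runtime_status")"
}

# Example: runtime_pm_state 0000:c4:00.4
# "control=on" means autosuspend is disabled; "control=auto" allows it.
```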

Other than that, nothing really stands out. Suspend clears the RUN bit and INTE
bit (my addition), on resume registers are at their defaults and get restored
to pre-suspend state, INTE and RUN are enabled again. At timeout they are still
the same, except for the sts/erdp flags indicating that an interrupt is pending.

(In reply to Alexander F from comment #35)
> So I returned to vanilla-sources-6.19.6 manually built kernel and 20260110
> firmware, patched the kernel with the register dump patch. Interestingly it
> was a little harder to trigger than before, but I didn't do enough runs to
> say definitively. The files are klog-pecio-patch and klog-pecio-patch2.
Maybe it's a race condition and logging slows things down. Bottom line, it's
not a matter of xHCI interrupt control bits being cleared.

> >There are some error flags set on DevSta 
> 
> These flags only appear after the "HC died" occurs. (That event also adds
> (warning) taint.)
The warning is unimportant, it's just USB resume taking unexpectedly long. We
know it already :)

I think all we got is just more evidence that it's a PCI or x86 architecture
problem, not USB. I would mail linux-pci, or at least reassign the bug to
drivers/PCI (but not sure if this subsystem pays attention to bugzilla).

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (36 preceding siblings ...)
  2026-03-12  6:49 ` bugzilla-daemon
@ 2026-03-12 10:35 ` bugzilla-daemon
  2026-03-14  4:29 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-12 10:35 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #38 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
(In reply to Alexander F from comment #35)

> >There are some error flags set on DevSta 
> 
> These flags only appear after the "HC died" occurs. (That event also adds
> (warning) taint.) I verified that by running `lspci -vvv | grep DevSta: |
> grep +` before/after every resume, and NonFatalError on all c4:00 devices
> flips only after the event.

Interesting, was PCI DevSta: NonFatalErr+ ever set with the 'Forced MSI only'
patch after resume?

i.e. does MSI-X usage on the xHC trigger DevSta: NonFatalErr+, causing the xHC
interrupt handler to not be called,

or

is there something else causing PCI DevSta: NonFatalErr+ on resume, which for
some reason only affects/omits the MSI-X handler, while MSI works and its
handler is called as it should?

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (37 preceding siblings ...)
  2026-03-12 10:35 ` bugzilla-daemon
@ 2026-03-14  4:29 ` bugzilla-daemon
  2026-03-16  0:39 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-14  4:29 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

bugzilla@logical.ink changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla@logical.ink

--- Comment #39 from bugzilla@logical.ink ---
Also experiencing this bug on Framework 13 AMD Ryzen AI 300 using Fedora. 

### System Information

| Field | Value |
|---|---|
| Machine | Framework Laptop 13 (AMD Ryzen AI 300 Series) |
| CPU | AMD Ryzen AI 9 HX 370 (Strix Point, 24 threads) |
| GPU | AMD Radeon 890M (integrated) |
| BIOS | FRANMGCP09 v03.05 (2025-10-30) |
| RAM | 64 GB DDR5 |
| Kernel (at time of event) | 6.18.12-100.fc42.x86_64 |
| Kernel (current) | 6.18.16-100.fc42.x86_64 |
| Distribution | Fedora 42 |
| xHCI controller | 0000:c1:00.4 |
| xHCI hcc params | 0x0118ffc5 |
| xHCI version | 0x120 (USB 3.1) |
| xHCI quirks at init | 0x0000000200000010 |

The affected USB device is `usb 1-1`: a full-speed internal device (device
number 2, bus 1) which corresponds to the Framework laptop's internal input
device hub (trackpad, keyboard controller, fingerprint reader).

Relevant Logs:
**Failing resume (44-minute sleep, lid closed):**
```
Mar 13 22:40:43 kernel: usb 1-1: reset full-speed USB device number 2 using xhci_hcd
Mar 13 22:40:43 kernel: PM: suspend entry (s2idle)

[~44 minutes elapse]

Mar 13 23:25:33 kernel: PM: suspend devices took 0.162 seconds
Mar 13 23:25:33 kernel: PM: resume devices took 0.285 seconds
Mar 13 23:25:33 systemd-logind: Lid opened.
Mar 13 23:25:33 kernel: PM: suspend exit
                                                          <-- usb 1-1 reset ABSENT
Mar 13 23:26:13 kernel: PM: suspend entry (s2idle)        <-- next suspend, still no usb 1-1 reset
Mar 13 23:26:18 kernel: PM: suspend exit
                                                          <-- usb 1-1 reset still ABSENT
Mar 13 23:27:09 kernel: PM: suspend entry (s2idle)
Mar 13 23:28:56 kernel: PM: suspend exit
                                                          <-- usb 1-1 reset still ABSENT
[reboot required to restore input]
```

**xHCI controller initialization (boot):**
```
kernel: xhci_hcd 0000:c1:00.4: xHCI Host Controller
kernel: xhci_hcd 0000:c1:00.4: new USB bus registered, assigned bus number 1
kernel: xhci_hcd 0000:c1:00.4: hcc params 0x0118ffc5 hci version 0x120 quirks 0x0000000200000010
```

Additional Context:
- The xHCI controller at `0000:c1:00.4` already has quirks `0x0000000200000010`
applied at init. The `XHCI_RESET_ON_RESUME` bit (0x80) is not present and may
need to be added to the AMD platform quirk table for this hardware.
- `bluetoothd` logs `Controller resume with wake event 0x0` on all resumes
including the failing one, indicating the Bluetooth/wakeup path is not the
differentiating factor.
- No kernel error or warning is emitted at the time of the failed USB reinit —
the failure is silent.
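
The quirk check in the first bullet is a plain bitwise AND. As a small sketch, using the quirks value from the boot log and the 0x80 bit value quoted above (the `quirk_set` helper name is hypothetical):

```shell
# Hypothetical helper: report whether bit mask $2 is set in quirks value $1.
quirk_set() {
    if [ $(( $1 & $2 )) -ne 0 ]; then
        echo "set"
    else
        echo "not set"
    fi
}

# 0x0000000200000010 & 0x80 is 0, so this prints "not set".
quirk_set 0x0000000200000010 0x80
```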

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (38 preceding siblings ...)
  2026-03-14  4:29 ` bugzilla-daemon
@ 2026-03-16  0:39 ` bugzilla-daemon
  2026-03-17  0:03 ` bugzilla-daemon
  2026-03-18 23:18 ` bugzilla-daemon
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-16  0:39 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #40 from Alexander F (superveridical@gmail.com) ---
>echo on > /sys/bus/pci/devices/0000:c4:00.4/power/control

no effect.

>The warning is unimportant,

Yeah, I understand. I (likely mistakenly) assumed that whatever sets taint on
the kernel also flips the NonFatalError. If "DevSta:" can only be set by the
hardware internally, then of course it's a different matter.

>was PCI DevSta: NonFatalErr+ ever set with the 'Forced MSI only' patch after
>resume?

Never. I did 300 cycles with MSI-only patch with no issues and it was never set
to +. The only concerning things in dmesg during these 300 cycles were multiple
(about 9) errors of this kind:

amdgpu 0000:c4:00.0: amdgpu: Register(1) [regVPEC_QUEUE_RESET_REQ_6_1_1] failed to reach value 0x00000000 != 0x00000001n
amdgpu 0000:c4:00.0: amdgpu: VPE queue reset failed

>i.e. Does MSI-X usage on xHC trigger the DevSta: NonFatalErr+, causing xHC
>interrupt handler to not be called
>is there something else causing PCI DevSta: NonFatalErr+ in resume which for
>some reason only affects/omits MSI-X handler while MSI work and handler is
>called as it should.

Unfortunately, I'm not equipped to find that out. I can imagine it's possible
to write a kernel module(or modify an existing one) that tests that, but that's
beyond me. My understanding ends at the system call boundary.

>I think all we got is just more evidence that it's a PCI or x86 architecture
>problem, not USB. I would mail linux-pci

I can probably do that, but I'm not really confident that my device is
functioning properly hardware-wise, or that I wouldn't be wasting everyone's
time. If I had access to another sample of the device, one that was not
self-selected, I would at least be able to tell whether it reproduces on a
randomly sampled device besides mine. Unfortunately the original bug reporter,
who has access to multiple samples, is MIA for some reason.

...

Meanwhile, I think I determined the source of the instability I had during
sleep/restart actions. I had a working hypothesis that it's static zaps, and I
recently happened to zap something in the device pretty severely through a
(rather thin) keyboard key, severely enough to force my desktop's monitor,
which shares only a mains connection with the Z13, to shut down momentarily,
likely due to power protection circuitry in its PSU. (There is also no
grounding wire in this house.) The device functioned nominally, but the moment
I tried to suspend it after that zap it died, and I had to long-press the power
button. It means I've delivered at least 5-7 zaps of a similar level, and that
could of course have damaged something. All of this could mean nothing, but it
makes me less confident that I have a properly functioning device.

There are 4-7 people complaining of this issue on Linux, so it means at least
100 users with their devices in a similar state. Not everyone reports issues,
of course -- absolutely real bugs get 1-2 reporters on drm/amd for example, so
the number could be greater. Could it be that this number of people also zapped
their devices, and did the same kind of latent damage to whatever machinery is
responsible for the MSI-X interrupt? That sounds rather implausible. So if it
doesn't manifest on all devices, the only other reason I can think of is
something to do with manufacturing.

I think we need more people supplying debug data to be sure before bothering
the other subsystems. But I would do as you recommend. And the issue looks like
something hardware/firmware related, i.e. beyond the level of the kernel.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (39 preceding siblings ...)
  2026-03-16  0:39 ` bugzilla-daemon
@ 2026-03-17  0:03 ` bugzilla-daemon
  2026-03-18 23:18 ` bugzilla-daemon
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-17  0:03 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #41 from David Hubbard (david.c.hubbard@gmail.com) ---
I have a Framework Desktop and can repro the issue. I'm willing to gather debug
data if Michał Pecio or Mathias Nyman can comment with what is needed.

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [Bug 221073] xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587]
  2026-02-10 17:46 [Bug 221073] New: xHCI host controller dies on resume from s2idle on AMD Strix Halo [1022:1587] bugzilla-daemon
                   ` (40 preceding siblings ...)
  2026-03-17  0:03 ` bugzilla-daemon
@ 2026-03-18 23:18 ` bugzilla-daemon
  41 siblings, 0 replies; 43+ messages in thread
From: bugzilla-daemon @ 2026-03-18 23:18 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221073

--- Comment #42 from Alexander F (superveridical@gmail.com) ---
>I'm willing to gather debug data

Perhaps you could test your system with the MSI-only patch (just erase MSI-X
manually if it doesn't apply) and confirm that it behaves nominally with that
alteration. Also confirm NonFatalError+ on the entire PCI device when the issue
occurs. That would be helpful for submitting it to the other subsystems, since
we would have evidence that it's not only my Z13 that behaves this way.

^ permalink raw reply	[flat|nested] 43+ messages in thread
