[Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms


From: bugzilla-daemon@kernel.org
To: linux-usb@vger.kernel.org
Subject: [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
Date: Tue, 24 Feb 2026 07:45:03 +0000
Message-ID: <bug-221103-208809-YI3kLaShqa@https.bugzilla.kernel.org/>
In-Reply-To: <bug-221103-208809@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #19 from Paul Alesius (paul@unnservice.com) ---
(In reply to Michał Pecio from comment #18)
> Could you enable dynamic debug and check if simply toggling power/control
> between 'on' and 'auto' produces the same xhci_suspend/xhci_resume messages?
> Would this be enough to hang the system
With dynamic debug enabled, rapidly toggling power/control between 'on' and
'auto' produces the same suspend/resume messages on all devices. Rapidly
toggling control between on and auto on 0000:7a:00.4 does not trigger the
freeze.
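
For anyone who wants to repeat the toggle test, here is a minimal sketch. The
function name toggle_power_control is made up for illustration, and the sysfs
path in the usage line assumes the control file of the PCI function is the one
being toggled; adjust it if you toggle the USB root hub instead.

```sh
#!/bin/sh
# Sketch only: flip a runtime-PM policy file between 'on' and 'auto'.
# toggle_power_control CONTROL_FILE ROUNDS [DELAY_SECONDS]
# (hypothetical helper name, not part of any existing tool)
toggle_power_control() {
    ctl=$1
    rounds=$2
    delay=${3:-0}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # 'on' forbids runtime suspend, 'auto' allows it again.
        echo on > "$ctl" || return 1
        [ "$delay" = 0 ] || sleep "$delay"
        echo auto > "$ctl" || return 1
        [ "$delay" = 0 ] || sleep "$delay"
        i=$((i + 1))
    done
}
```

Possible usage as root, after sourcing the file (fractional sleep values rely
on GNU sleep):
 # toggle_power_control /sys/bus/pci/devices/0000:7a:00.4/power/control 1000 0.05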

> What's the state of power/control for those HCs which aren't causing
> problems? Are they also getting resumed and suspended under your test, but
> without crashing? That would be at least one optimistic result in this whole
> mess :)
About half of them default to control=on and half to control=auto. Those that
default to control=auto do not trigger the freeze, except the known-bad
7a:00.4 (I did not stress the others as much before concluding that 7a:00.4
is the one triggering the freeze). Here are their default values and notes on
them:

 control=on 0000:0e:00.0 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=on 0000:0e:00.0 Bus 001 Device 002: ID 13d3:3588 IMC Networks Wireless_Device (Internal)
 control=on 0000:0e:00.0 Bus 001 Device 003: ID 0b05:19af ASUSTek Computer, Inc. AURA LED Controller (Internal)
 control=on 0000:0e:00.0 Bus 001 Device 004: ID 046d:c548 Logitech, Inc. Logi Bolt Receiver (Plugged in)
 control=on 0000:0e:00.0 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=on 0000:10:00.0 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=on 0000:10:00.0 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
The 78:00.0 controller has xhci_pci_suspend -110 errors during boot:
[   17.918387] xhci_hcd 0000:78:00.0: WARN: xHC CMD_RUN timeout
[   17.918508] xhci_hcd 0000:78:00.0: PM: suspend_common(): xhci_pci_suspend returns -110
[   17.918586] xhci_hcd 0000:78:00.0: can't suspend (hcd_pci_runtime_suspend returned -110)
 control=auto 0000:78:00.0 Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=auto 0000:78:00.0 Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7a:00.3 Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=auto 0000:7a:00.3 Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7a:00.4 Bus 009 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
This is the root hub that freezes during rapid polling. It sits on the same
PCI function as the root hub on the line above, which is unaffected:
 control=auto 0000:7a:00.4 Bus 010 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7b:00.0 Bus 011 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
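
A listing like the one above can be regenerated from sysfs. This is a rough
sketch, not the exact command used for the list: list_usb_power_control is a
hypothetical helper name, it assumes the values shown are the power/control
attributes of the USB devices, and it prints no product names (lsusb would be
needed for those).

```sh
#!/bin/sh
# Sketch only: print power/control for every USB device, with the PCI
# function of its host controller.
# list_usb_power_control [SYSFS_USB_DIR]   (hypothetical helper name)
list_usb_power_control() {
    root=${1:-/sys/bus/usb/devices}
    for dev in "$root"/*; do
        # Interface entries such as 1-0:1.0 have no idVendor; skip them.
        [ -r "$dev/idVendor" ] && [ -r "$dev/power/control" ] || continue
        # Resolve the symlink and take the PCI function directly above usbN.
        pci=$(readlink -f "$dev" | sed -n \
            's#.*/\([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-9a-f]\)/usb[0-9].*#\1#p')
        printf 'control=%s %s Bus %03d Device %03d: ID %s:%s\n' \
            "$(cat "$dev/power/control")" "${pci:-unknown}" \
            "$(cat "$dev/busnum")" "$(cat "$dev/devnum")" \
            "$(cat "$dev/idVendor")" "$(cat "$dev/idProduct")"
    done
}
```

Called without arguments it reads /sys/bus/usb/devices; the optional argument
exists only so the function can be pointed at a copy of the tree.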

I then enabled full dynamic debug + netconsole (printk=8):
 $ echo 'module xhci_hcd +p' | sudo tee /proc/dynamic_debug/control
 $ echo 'module usbcore +p' | sudo tee /proc/dynamic_debug/control
 $ echo 'module pci +p' | sudo tee /proc/dynamic_debug/control
 $ echo 8 | sudo tee /proc/sys/kernel/printk

Surprisingly, the system did not freeze for over 20 minutes with 3 instances
polling simultaneously and stress-ng --cpu 0 running. The moment I killed
stress-ng, which I happened to stop first, the system froze immediately.
Netconsole captured this up until the lockup:
...
[ 1766.915244] xhci_hcd 0000:7a:00.4: PME# disabled
[ 1766.915262] xhci_hcd 0000:7a:00.4: enabling bus mastering
... (normal suspend/resume cycle) ...
[ 1767.170769] xhci_hcd 0000:7a:00.4: PME# disabled
[ 1767.170774] xhci_hcd 0000:7a:00.4: enabling bus mastering
[ 1767.181194] xhci_hcd 0000:7a:00.4: Controller not ready at resume -19
[ 1767.181209] xhci_hcd 0000:7a:00.4: PCI post-resume error -19!
[ 1767.181213] xhci_hcd 0000:7a:00.4: HC died; cleaning up
[ 1767.181222] xhci_hcd 0000:7a:00.4: hcd_pci_runtime_resume: -19
[ 1767.181232] hub 9-0:1.0: state 0 ports 2 chg 0000 evt 0000
[ 1767.181238] hub 10-0:1.0: state 0 ports 2 chg 0000 evt 0000
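
To pick the failing resume out of a long netconsole capture, something like
the following may help. It is only a sketch: resume_failures is a hypothetical
helper name, and it assumes the log lines keep the "[timestamp] xhci_hcd
<pci address>: message" shape shown above. For each "Controller not ready at
resume" line it prints the device and the time elapsed since that device last
logged "enabling bus mastering". For reference, -19 is ENODEV and the -110
seen at boot is ETIMEDOUT.

```sh
#!/bin/sh
# Sketch only: report how long after "enabling bus mastering" a controller
# logged "Controller not ready at resume".
# resume_failures [LOGFILE...]   (hypothetical helper name; reads stdin if no file)
resume_failures() {
    awk '
    match($0, /xhci_hcd [0-9a-f:.]+: /) {
        # Strip the leading "xhci_hcd " and the trailing ": ".
        dev = substr($0, RSTART + 9, RLENGTH - 11)
        if (!match($0, /\[ *[0-9]+\.[0-9]+\]/)) next
        # Kernel timestamp in seconds, without the brackets.
        t = substr($0, RSTART + 1, RLENGTH - 2) + 0
        if ($0 ~ /enabling bus mastering/) {
            last[dev] = t
        } else if ($0 ~ /Controller not ready at resume/ && (dev in last)) {
            printf "%s %.2f ms\n", dev, (t - last[dev]) * 1000
        }
    }' "$@"
}
```

Run against the excerpt above, this should report roughly 10 ms between bus
mastering being enabled and the controller being declared not ready.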

> There is another bug 221073 about some AMD HCs dying on resume from system
> sleep, may be related. So far nobody knows why it happens.
I don't know enough to say whether they are the same root cause, but both
involve an AMD xHC dying on resume, so they may be related.
