[Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms

public inbox for linux-usb@vger.kernel.org

* [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms
@ 2026-02-18 14:52 bugzilla-daemon
  2026-02-20  7:30 ` [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms bugzilla-daemon
                   ` (24 more replies)
  0 siblings, 25 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-18 14:52 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

            Bug ID: 221103
           Summary: xhci_hcd: System lockup under CPU load during rapid
                    usbfs polling of SuperSpeed root hubs on AMD Ryzen
                    platforms
           Product: Drivers
           Version: 2.5
    Kernel Version: 6.12 - 6.19, (and mainline)
          Hardware: AMD
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P3
         Component: USB
          Assignee: drivers_usb@kernel-bugs.kernel.org
          Reporter: paul@unnservice.com
        Regression: No

Created attachment 309401
  --> https://bugzilla.kernel.org/attachment.cgi?id=309401&action=edit
Test program that polls USB nodes and triggers the system freeze

On multiple AMD Ryzen 7000/9000-series systems (X670E/X870 motherboards with
recent AGESA firmware) rapidly polling xHCI device nodes via usbfs can cause a
complete system freeze.

Reproduction:
1. Compile/run the attached usb_poll.c
2. Create CPU load (stress-ng --cpu 0 works reliably)
3. On affected machines the system freezes within a few minutes, without any
   log output

The trigger on my system is consistently one of the SuperSpeed (USB 3.x 10
Gbps) xHCI root hubs (1d6b:0003). I cannot confirm whether other affected users
freeze on exactly the same device node, but all reports point to the same class
of bug when polling xHCI device nodes.

Kernels affected:
6.12 - 6.18 (LTS + mainline) on openSUSE Tumbleweed, Manjaro, Ubuntu, Gentoo,
Bazzite, Fedora, etc.

I can provide any additional diagnostics needed:
- Full lsusb -t -v and lsusb -vvv
- lspci -vvv

My motherboard and CPU (others in the ADB bug report have different AMD systems):
- Motherboard: Asus ProArt X870-E CREATOR
- CPU: AMD Ryzen 9 9950X

Originally exposed by ADB’s USB backend (platform-tools ≥ 36.0.2):
https://issuetracker.google.com/issues/472398009 (full discussion with many
affected users)

Tests:

test: Poll all USB devices
method: usb_poll.c
iterations: 600
result: SYSTEM FREEZE opening Bus 010 Device 001: ID 1d6b:0003 Linux Foundation
3.0 root hub

test: Skip all root hubs (Device 001 on all buses)
method: usb_poll.c
iterations: 2400
result: SUCCESS

test: Only root hubs
method: usb_poll.c
iterations: 159
result: SYSTEM FREEZE opening Bus 010 Device 001: ID 1d6b:0003 Linux Foundation
3.0 root hub

test: All devices except Bus 010 Device 001
method: usb_poll.c
iterations: 1900
result: SUCCESS

test: Only the specific xHCI root hub
method: usb_poll.c
iterations: 209
result: SYSTEM FREEZE

test: Only the specific xHCI root hub
method: while true; do sudo lsusb -v -s 010:001 >/dev/null; sleep 0.1; done
iterations: 2400
result: SUCCESS

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
@ 2026-02-20  7:30 ` bugzilla-daemon
  2026-02-20  8:31 ` bugzilla-daemon
                   ` (23 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  7:30 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

Michał Pecio (michal.pecio@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michal.pecio@gmail.com

--- Comment #1 from Michał Pecio (michal.pecio@gmail.com) ---
I tried running this and all ioctl() calls fail with EPERM because the usbfs
file is opened read-only. If it really crashes your system, the bug may not be
in USB.

Chances are that it's a panic so I'd try to get the kernel log - serial cable,
netconsole or something like that. At the very least, connect a PS/2 keyboard
and see if the lights begin to blink.
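
For reference, a minimal sketch of the same control request with the result
checked instead of discarded. The helper name get_device_descriptor is made up
for illustration and is not part of the attached usb_poll.c; the sketch only
shows how a failure such as the EPERM mentioned above would surface to the
caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>

/*
 * Hypothetical helper: issue GET_DESCRIPTOR(Device) on an already-open
 * usbfs file descriptor. Returns the ioctl's non-negative result on
 * success, or -errno on failure.
 */
int get_device_descriptor(int fd, unsigned char *buf, size_t len)
{
    assert(buf != NULL && len <= 0xffff);

    struct usbdevfs_ctrltransfer ctrl = {
        .bRequestType = 0x80,    /* device-to-host, standard, device */
        .bRequest     = 6,       /* GET_DESCRIPTOR */
        .wValue       = 1 << 8,  /* descriptor type 1 (device), index 0 */
        .wIndex       = 0,
        .wLength      = (unsigned short)len,
        .timeout      = 1000,    /* milliseconds */
        .data         = buf,
    };

    int ret = ioctl(fd, USBDEVFS_CONTROL, &ctrl);
    return ret < 0 ? -errno : ret;
}
```

Logging the return value next to the "ioctl completed" message would show
whether the request actually went through or was rejected before reaching the
host controller driver.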

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
  2026-02-20  7:30 ` [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms bugzilla-daemon
@ 2026-02-20  8:31 ` bugzilla-daemon
  2026-02-20  9:17   ` Greg KH
  2026-02-20  9:16 ` bugzilla-daemon
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  8:31 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #2 from Paul Alesius (paul@unnservice.com) ---
Try this command: sudo ./usb_poll

It is definitely in USB.
This program triggers the system freeze reliably.
Many of us are affected, as referenced in the ADB bug.

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
  2026-02-20  7:30 ` [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms bugzilla-daemon
  2026-02-20  8:31 ` bugzilla-daemon
@ 2026-02-20  9:16 ` bugzilla-daemon
  2026-02-20  9:17 ` bugzilla-daemon
                   ` (21 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:16 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #3 from Paul Alesius (paul@unnservice.com) ---
Sorry, you may be right that the bug is elsewhere. I just triggered it again
without O_RDWR using a program that also logs the exact point of the freeze.

1. It's always on opening the same USB root hub on my machine.
2. The freeze always occurs at the "    Opening device..." step.
3. The ioctl seems to complete on the device where the freeze occurs.

PS/2 - We don't have PS/2 ports on these MBs
SysRq - I don't have that on my Logitech Mechanical keyboard

I just tried with "dmesg -W" (follow mode so it prints messages directly) and
it didn't print anything before the system freeze, and there's nothing in the
logs, reflecting the experience of the others in the ADB bug report.


Here are the last lines logged by the program before the freeze:


Iteration 189 complete — attempted 9 devices
------------------------------------------------
=== Starting iteration 190 ===
Processing bus 011
  Device: /dev/bus/usb/011/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 010
  Device: /dev/bus/usb/010/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 009
  Device: /dev/bus/usb/009/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 008
  Device: /dev/bus/usb/008/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 007
  Device: /dev/bus/usb/007/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 006
  Device: /dev/bus/usb/006/001
    Opening device...
    Open failed: Invalid argument
Processing bus 005
  Device: /dev/bus/usb/005/001
    Opening device...
    Open failed: Invalid argument
Processing bus 004
  Device: /dev/bus/usb/004/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 003
  Device: /dev/bus/usb/003/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 002
  Device: /dev/bus/usb/002/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 001
  Device: /dev/bus/usb/001/004
    Opening device...
    Skipping non-root hub device
  Device: /dev/bus/usb/001/003
    Opening device...
    Skipping non-root hub device
  Device: /dev/bus/usb/001/002
    Opening device...
    Skipping non-root hub device
  Device: /dev/bus/usb/001/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd

Iteration 190 complete — attempted 9 devices
------------------------------------------------
=== Starting iteration 191 ===
Processing bus 011
  Device: /dev/bus/usb/011/001
    Opening device...
    Opened fd=6
    Issuing USBDEVFS_CONTROL ioctl (GET_DESCRIPTOR)...
    ioctl completed
    Closed fd
Processing bus 010
  Device: /dev/bus/usb/010/001
    Opening device...


The C program used, which only tries root hubs, run while the system is under
load with "stress-ng --cpu 0":

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/usbdevice_fs.h>
#include <string.h>
#include <limits.h>
#include <errno.h>
#include <stdarg.h>

static FILE *logfile = NULL;

void log_and_sync(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    vprintf(format, args);
    va_end(args);
    fflush(stdout);

    if (logfile) {
        va_list args_copy;
        va_start(args, format);
        va_copy(args_copy, args);
        vfprintf(logfile, format, args_copy);
        va_end(args_copy);
        va_end(args);
        fflush(logfile);
        fsync(fileno(logfile));
    }
}

void enumerate_usb() {
    static int iteration = 0;
    int device_count = 0;

    log_and_sync("=== Starting iteration %d ===\n", iteration + 1);

    DIR *usb_dir = opendir("/dev/bus/usb");
    if (!usb_dir) {
        log_and_sync("Error: opendir /dev/bus/usb failed: %s\n",
                     strerror(errno));
        return;
    }

    struct dirent *bus_entry;
    while ((bus_entry = readdir(usb_dir)) != NULL) {
        if (bus_entry->d_type != DT_DIR || bus_entry->d_name[0] == '.')
            continue;

        char bus_path[PATH_MAX];
        snprintf(bus_path, sizeof(bus_path), "/dev/bus/usb/%s",
                 bus_entry->d_name);

        log_and_sync("Processing bus %s\n", bus_entry->d_name);

        DIR *bus_dir = opendir(bus_path);
        if (!bus_dir) {
            log_and_sync("  Warning: opendir %s failed: %s\n", bus_path,
                         strerror(errno));
            continue;
        }

        struct dirent *dev_entry;
        while ((dev_entry = readdir(bus_dir)) != NULL) {
            if (dev_entry->d_name[0] == '.') continue;

            char dev_path[PATH_MAX];
            snprintf(dev_path, sizeof(dev_path), "%s/%s", bus_path,
                     dev_entry->d_name);

            log_and_sync("  Device: %s\n", dev_path);
            log_and_sync("    Opening device...\n");

            // Only test root hubs (usually device 001); skip everything else
            if (strcmp(dev_entry->d_name, "001") != 0) {
                log_and_sync("    Skipping non-root hub device\n");
                continue;
            }

            int fd = open(dev_path, O_RDONLY);
            if (fd < 0) {
                log_and_sync("    Open failed: %s\n", strerror(errno));
                continue;
            }

            log_and_sync("    Opened fd=%d\n", fd);

            unsigned char desc[18];
            struct usbdevfs_ctrltransfer ctrl = {
                .bRequestType = 0x80,
                .bRequest     = 6,      // GET_DESCRIPTOR
                .wValue       = 1 << 8, // Device descriptor
                .wIndex       = 0,
                .wLength      = sizeof(desc),
                .data         = desc,
                .timeout      = 1000
            };

            log_and_sync("    Issuing USBDEVFS_CONTROL ioctl "
                         "(GET_DESCRIPTOR)...\n");

            // Errors ignored, as in original
            ioctl(fd, USBDEVFS_CONTROL, &ctrl);

            log_and_sync("    ioctl completed\n");

            close(fd);
            log_and_sync("    Closed fd\n");

            device_count++;
        }
        closedir(bus_dir);
    }
    closedir(usb_dir);

    iteration++;
    log_and_sync("\nIteration %d complete — attempted %d devices\n", iteration,
device_count);
    log_and_sync("------------------------------------------------\n");
}

int main(void) {
    logfile = fopen("usb_poll6_roothubs.log", "w");
    if (!logfile) {
        perror("Failed to open usb_poll6_roothubs.log for writing "
               "(will continue without file logging)");
    }

    log_and_sync("USB usbfs polling test started (0.5-second interval)\n");
    log_and_sync("You should now see device paths from all buses.\n");
    log_and_sync("On affected systems, freeze typically occurs within "
                 "minutes.\n");
    log_and_sync("Detailed logging with fsync() to 'usb_poll6_roothubs.log' "
                 "for post-freeze analysis.\n");
    log_and_sync("If freeze occurs during ioctl, you should see "
                 "'Issuing ...' but not 'ioctl completed' for that "
                 "device.\n\n");

    while (1) {
        enumerate_usb();
        //usleep(1000000);  // 1 second
        usleep(500000);     // 0.5 s
    }

    // Not reached: the loop above only ends when the process is killed
    // or the system freezes.
    if (logfile) fclose(logfile);
    return 0;
}

Is there anything else that you can suggest to get further diagnosis?

Thank you

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (2 preceding siblings ...)
  2026-02-20  9:16 ` bugzilla-daemon
@ 2026-02-20  9:17 ` bugzilla-daemon
  2026-02-20  9:24 ` bugzilla-daemon
                   ` (20 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:17 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #4 from Greg Kroah-Hartman (greg@kroah.com) ---
On Fri, Feb 20, 2026 at 08:31:37AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=221103
> 
> --- Comment #2 from Paul Alesius (paul@unnservice.com) ---
> Try this command: sudo ./usb_poll
> 
> It is definitely in USB.
> This program triggers the system freeze reliably.
> Many of us are affected, as referenced in the ADB bug.

Works just fine for me here, perhaps this is a bug in your USB device
(or hub) that can not handle USB config requests?

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (3 preceding siblings ...)
  2026-02-20  9:17 ` bugzilla-daemon
@ 2026-02-20  9:24 ` bugzilla-daemon
  2026-02-20  9:26 ` bugzilla-daemon
                   ` (19 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:24 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #5 from Michał Pecio (michal.pecio@gmail.com) ---
Actually, it may still be a problem with xhci_hcd because I found that simply
opening the USBFS file causes some xhci_hcd debug logs.

But this ioctl() looks like meaningless noise: the way you call it, it is
always supposed to fail, with or without root.

Does your system still crash if you remove the ioctl() and simply open and
close the usbfs file?
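
A minimal sketch of such a reduced test, assuming nothing beyond open() and
close() is needed. The function name open_close_loop and its parameters are
hypothetical, not taken from the attached program.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical reduced test: open and immediately close `path`
 * `iterations` times, sleeping `delay_us` microseconds after each
 * attempt. No ioctl is issued. Returns the number of successful opens.
 */
int open_close_loop(const char *path, int iterations, unsigned int delay_us)
{
    assert(path != NULL);

    int ok = 0;
    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY);
        if (fd >= 0) {
            close(fd);
            ok++;
        }
        if (delay_us > 0)
            usleep(delay_us);
    }
    return ok;
}
```

For the root hub named in this report, a call could look like
open_close_loop("/dev/bus/usb/010/001", 1000, 100000), run while the CPU load
is active.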

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (4 preceding siblings ...)
  2026-02-20  9:24 ` bugzilla-daemon
@ 2026-02-20  9:26 ` bugzilla-daemon
  2026-02-20  9:28 ` bugzilla-daemon
                   ` (18 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:26 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #6 from Paul Alesius (paul@unnservice.com) ---
> Works just fine for me here, perhaps this is a bug in your USB device
> (or hub) that can not handle USB config requests?

I don't know. It seems to be related to boards with X670 or X870 chipset and
Ryzen/Threadripper. From what I can gather from ADB report comments:

Asus Strix X670E-F Gaming BIOS 3402
Asus PRIME X670E-PRO WIFI
Asus ProArt X870-E (My board)
Dell Precision 7875 

But since it triggers so reliably on that specific root hub on my machine, it
made me suspect that it's in the kernel. The machine has been restarted many
times and it still freezes while opening that specific root hub, which I think
is created by xhci_hcd, if I'm not mistaken.

> Does your system still crash if you remove the ioctl() and simply open and
> close the usbfs file?

I will try removing the ioctl now

Thank you

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (5 preceding siblings ...)
  2026-02-20  9:26 ` bugzilla-daemon
@ 2026-02-20  9:28 ` bugzilla-daemon
  2026-02-20  9:40 ` bugzilla-daemon
                   ` (17 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:28 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #7 from Michał Pecio (michal.pecio@gmail.com) ---
(In reply to Paul Alesius from comment #3)
> I just tried with "dmesg -W" (follow mode so it prints messages directly)
> and it didn't print anything before the system freeze, and there's nothing
> in the logs, reflecting the experience of the others in the ADB bug report.
That's a waste of time, if the system locks up completely or panics then your
dmesg will never get a chance to run again and print anything. Neither will
Xorg or wayland.

This needs a serial cable or netconsole (never tried it myself but AFAIK people
use it for panics too) or DRM_PANIC or something like that.

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (6 preceding siblings ...)
  2026-02-20  9:28 ` bugzilla-daemon
@ 2026-02-20  9:40 ` bugzilla-daemon
  2026-02-20 10:07 ` bugzilla-daemon
                   ` (16 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20  9:40 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #8 from Michał Pecio (michal.pecio@gmail.com) ---
Or perhaps simply this would crash it?

while sleep .1 ; do true </dev/bus/usb/010/001 ; done

Chances are that sleep isn't necessary. This seems equivalent to what you are
doing and it triggers some code in xhci_hub.c, but no crash so far on my AM4
system.

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (7 preceding siblings ...)
  2026-02-20  9:40 ` bugzilla-daemon
@ 2026-02-20 10:07 ` bugzilla-daemon
  2026-02-20 10:17   ` Greg KH
  2026-02-20 10:17 ` bugzilla-daemon
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 10:07 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #9 from Paul Alesius (paul@unnservice.com) ---
> Or perhaps simply this would crash it?
> while sleep .1 ; do true </dev/bus/usb/010/001 ; done

Yes it froze the system on iteration 3000

A C program without ioctl also froze it on iteration 887. Doing ioctl seems to
trigger it much faster at iteration ~200.

I ordered a USB-to-ethernet adapter in the meantime to try to get more
diagnostics through netconsole, instead of serial cable, will that work?

Or do you have enough information about what may be wrong so that I can cancel
the USB adapter order? Otherwise, it will arrive in 4-5 days. (USB-to-Ethernet
because I don't have a laptop with an RJ-45 port to connect to my PC.)

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (8 preceding siblings ...)
  2026-02-20 10:07 ` bugzilla-daemon
@ 2026-02-20 10:17 ` bugzilla-daemon
  2026-02-20 10:21 ` bugzilla-daemon
                   ` (14 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 10:17 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #10 from Greg Kroah-Hartman (greg@kroah.com) ---
On Fri, Feb 20, 2026 at 10:07:57AM +0000, bugzilla-daemon@kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=221103
> 
> --- Comment #9 from Paul Alesius (paul@unnservice.com) ---
> > Or perhaps simply this would crash it?
> > while sleep .1 ; do true </dev/bus/usb/010/001 ; done
> 
> Yes it froze the system on iteration 3000
> 
> A C program without ioctl also froze it on iteration 887. Doing ioctl seems to
> trigger it much faster at iteration ~200.
> 
> I ordered a USB-to-ethernet adapter in the meantime to try to get more
> diagnostics through netconsole, instead of serial cable, will that work?

If the USB stack is locking up, dumping USB data out when that happens
would obviously not be possible :)

There is a special USB debug cable that will work instead, search around
on the web for one if you are curious about using it.

good luck!

greg k-h

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (9 preceding siblings ...)
  2026-02-20 10:17 ` bugzilla-daemon
@ 2026-02-20 10:21 ` bugzilla-daemon
  2026-02-20 11:19 ` bugzilla-daemon
                   ` (13 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 10:21 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #11 from Paul Alesius (paul@unnservice.com) ---
Oh sorry, I meant:

The machine that locks up is a PC, and it has Ethernet on the motherboard.

My mobile devices, like my laptop, only have USB, so I thought I'd try a cheap
Ethernet dongle on them instead of getting a specific Asus serial header, which
costs much more.

Thank you

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (10 preceding siblings ...)
  2026-02-20 10:21 ` bugzilla-daemon
@ 2026-02-20 11:19 ` bugzilla-daemon
  2026-02-20 14:07 ` bugzilla-daemon
                   ` (12 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 11:19 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #12 from Michał Pecio (michal.pecio@gmail.com) ---
Isn't the required serial cable identical as in 99% of motherboards? ASUS is
usually decent enough to show all details, including pinout, in the user
manual.

Maybe one thing worth trying is to enable dynamic debug and run an iteration or
a few to see what gets logged in dmesg, check if it changes or is always the
same and compare the "bad" root hub with good ones, particularly of the same
speed.

echo 'module xhci_hcd +p' >/proc/dynamic_debug/control
dmesg -W

The most interesting information may be missing if it actually crashes, but
maybe something will hint at any unusual patterns in your system.

One thing I see is that this code tries to check status of the ports, so your
crashes may (or may not) depend on some particular device being connected.

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (11 preceding siblings ...)
  2026-02-20 11:19 ` bugzilla-daemon
@ 2026-02-20 14:07 ` bugzilla-daemon
  2026-02-20 17:18 ` bugzilla-daemon
                   ` (11 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 14:07 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #13 from Paul Alesius (paul@unnservice.com) ---
(In reply to Michał Pecio from comment #12)
> Isn't the required serial cable identical as in 99% of motherboards? ASUS is
> usually decent enough to show all details, including pinout, in the user
> manual.
> 
> Maybe one thing worth trying is to enable dynamic debug and run an iteration
> or a few to see what gets logged in dmesg, check if it changes or is always
> the same and compare the "bad" root hub with good ones, particularly of the
> same speed.
> 
> echo 'module xhci_hcd +p' >/proc/dynamic_debug/control
> dmesg -W
> 
> The most interesting information may be missing if it actually crashes, but
> maybe something will hint at any unusual patterns in your system.
> 
> One thing I see is that this code tries to check status of the ports, so
> your crashes may (or may not) depend on some particular device being
> connected.

I just triggered the system freeze three times with dynamic debug and dmesg -W.
Each time, the last message shown in the console is:

"Setting command ring address to 0x10e427001"

dmesg logs "Setting command ring address" twice, and all three freezes occurred
after the first of the two "Setting command ring address" log records.

dmesg example below. The log did not flush to disk, so the "dmesg -W" output on
the terminal has newer records than the on-disk log; this is what I see in the
console:

------ dmesg -W

...
[ 1139.446048] xhci_hcd 0000:7a:00.4: xhci_suspend: stopping usb9 port polling.
[ 1139.446066] xhci_hcd 0000:7a:00.4: Setting command ring address to 0x10e427001
[ 1139.561526] xhci_hcd 0000:7a:00.4: Setting command ring address to 0x10e427001
[ 1139.562304] xhci_hcd 0000:7a:00.4: xhci_resume: starting usb9 port polling.
[ 1139.562306] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb10 port polling
[ 1139.562308] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb9 port polling
[ 1139.585922] xhci_hcd 0000:7a:00.4: Get port status 10-1 read: 0x2a0, return 0x2a0
[ 1139.585931] xhci_hcd 0000:7a:00.4: Get port status 10-2 read: 0x2a0, return 0x2a0
[ 1139.792600] xhci_hcd 0000:7a:00.4: set port remote wake mask, actual port 10-1 status  = 0xe0002a0
[ 1139.792619] xhci_hcd 0000:7a:00.4: set port remote wake mask, actual port 10-2 status  = 0xe0002a0
[ 1139.792662] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb10 port polling
[ 1139.792667] xhci_hcd 0000:7a:00.4: config port 10-1 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1139.792683] xhci_hcd 0000:7a:00.4: config port 10-2 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1139.792705] xhci_hcd 0000:7a:00.4: config port 9-1 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1139.792726] xhci_hcd 0000:7a:00.4: config port 9-2 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1139.792727] xhci_hcd 0000:7a:00.4: xhci_suspend: stopping usb9 port polling.
[ 1139.792746] xhci_hcd 0000:7a:00.4: Setting command ring address to 0x10e427001
[ 1139.821295] xhci_hcd 0000:7a:00.4: Setting command ring address to 0x10e427001
[ 1139.822070] xhci_hcd 0000:7a:00.4: xhci_resume: starting usb9 port polling.
[ 1139.822072] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb10 port polling
[ 1139.822074] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb9 port polling
[ 1139.845924] xhci_hcd 0000:7a:00.4: Get port status 10-1 read: 0x2a0, return 0x2a0
[ 1139.845931] xhci_hcd 0000:7a:00.4: Get port status 10-2 read: 0x2a0, return 0x2a0
[ 1140.052603] xhci_hcd 0000:7a:00.4: set port remote wake mask, actual port 10-1 status  = 0xe0002a0
[ 1140.052621] xhci_hcd 0000:7a:00.4: set port remote wake mask, actual port 10-2 status  = 0xe0002a0
[ 1140.052664] xhci_hcd 0000:7a:00.4: xhci_hub_status_data: stopping usb10 port polling
[ 1140.052670] xhci_hcd 0000:7a:00.4: config port 10-1 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1140.052685] xhci_hcd 0000:7a:00.4: config port 10-2 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1140.052706] xhci_hcd 0000:7a:00.4: config port 9-1 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1140.052727] xhci_hcd 0000:7a:00.4: config port 9-2 wake bits, portsc: 0xa0002a0, write: 0xa0202a0
[ 1140.052728] xhci_hcd 0000:7a:00.4: xhci_suspend: stopping usb9 port polling.
[ 1140.052747] xhci_hcd 0000:7a:00.4: Setting command ring address to 0x10e427001
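
For anyone reading the port status values in the log above, here is a minimal
sketch that decodes a few PORTSC fields from a hex value. The bit positions
(CCS bit 0, PED bit 1, PLS bits 5-8, PP bit 9, CSC bit 17, and the wake-enable
bits WCE/WDE/WOE at 25/26/27) are an assumption taken from my reading of the
xHCI specification, and decode_portsc is a made-up name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: decode selected xHCI PORTSC fields from a hex value.
# Assumed layout (xHCI spec): CCS=bit 0, PED=bit 1, PLS=bits 5-8, PP=bit 9,
# CSC=bit 17, WCE=bit 25, WDE=bit 26, WOE=bit 27.
decode_portsc() {
    v=$(( $1 ))
    printf 'ccs=%d ped=%d pls=%d pp=%d csc=%d wce=%d wde=%d woe=%d\n' \
        $(( v & 1 )) $(( (v >> 1) & 1 )) $(( (v >> 5) & 15 )) \
        $(( (v >> 9) & 1 )) $(( (v >> 17) & 1 )) $(( (v >> 25) & 1 )) \
        $(( (v >> 26) & 1 )) $(( (v >> 27) & 1 ))
}

# Values copied from the log above.
decode_portsc 0x2a0       # "Get port status" read
decode_portsc 0xe0002a0   # "set port remote wake mask" status
decode_portsc 0xa0202a0   # "config port ... wake bits" write value
```

Under that layout all three values show a powered port with nothing connected
(pp=1, ccs=0, pls=5, which the spec calls RxDetect). They differ only in the
wake-enable bits and, for the written value, the CSC bit.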

-- On plugged-in USB devices

There are 3 devices plugged in to USB, two are internal:

noname@devbox ~ $ sudo lsusb
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 13d3:3588 IMC Networks Wireless_Device (INTERNAL M.2 WiFi & Bluetooth)
Bus 001 Device 003: ID 0b05:19af ASUSTek Computer, Inc. AURA LED Controller (INTERNAL)
Bus 001 Device 004: ID 046d:c548 Logitech, Inc. Logi Bolt Receiver (I have this one plugged in for mouse and keyboard)
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 009 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 010 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 011 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

I changed the port of the Logitech Bolt Receiver during these three tests with
system freeze, and plugged it into a USB 2.0 port, but the system still freezes
on that root hub.

---------

If there's anything else to try to get more diagnostics, I can do that too,
otherwise we'll have to wait until I receive the adapter for netconsole.

I cannot remove the M.2 WiFi card at present: it sits behind shielding covers
that are not meant to be removed, so I would have to disassemble the entire
computer and take out the motherboard.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (12 preceding siblings ...)
  2026-02-20 14:07 ` bugzilla-daemon
@ 2026-02-20 17:18 ` bugzilla-daemon
  2026-02-21  1:12 ` bugzilla-daemon
                   ` (10 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-20 17:18 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #14 from Alan Stern (stern@rowland.harvard.edu) ---
Have you tried disabling runtime PM?  Run any of the test scenarios after
booting with "usbcore.autosuspend=-1" on the boot command line.
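
To confirm the boot parameter took effect, the current value can be read back
from sysfs. A minimal sketch, assuming the usual usbcore module parameter path;
show_autosuspend and its optional sysfs-root argument are hypothetical names
used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the current usbcore autosuspend delay.
# A value of -1 means autosuspend is disabled by default.
# The optional argument is an alternate sysfs root, for trying this offline.
show_autosuspend() {
    root=${1:-/sys}
    f="$root/module/usbcore/parameters/autosuspend"
    if [ -r "$f" ]; then
        echo "usbcore.autosuspend=$(cat "$f")"
    else
        echo "usbcore.autosuspend=unknown"
    fi
}

show_autosuspend
```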

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (13 preceding siblings ...)
  2026-02-20 17:18 ` bugzilla-daemon
@ 2026-02-21  1:12 ` bugzilla-daemon
  2026-02-23 13:05 ` bugzilla-daemon
                   ` (9 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-21  1:12 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #15 from Paul Alesius (paul@unnservice.com) ---
(In reply to Alan Stern from comment #14)
> Have you tried disabling runtime PM?  Run any of the test scenarios after
> booting with "usbcore.autosuspend=-1" on the boot command line.

Disabling runtime PM for USB has eliminated the freeze. I also tested by
hammering all devices for 40 minutes.

---- Other logs:

With runtime PM enabled, I see this in dmesg on every boot:
[   17.918387] xhci_hcd 0000:78:00.0: WARN: xHC CMD_RUN timeout
[   17.918508] xhci_hcd 0000:78:00.0: PM: suspend_common(): xhci_pci_suspend returns -110
[   17.918586] xhci_hcd 0000:78:00.0: can't suspend (hcd_pci_runtime_suspend returned -110)

But 0000:78:00.0 is a different controller than the root hub that triggers the
freeze. Here's lsusb with controller information:

0000:0e:00.0 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
0000:0e:00.0 Bus 001 Device 002: ID 13d3:3588 IMC Networks Wireless_Device (Internal)
0000:0e:00.0 Bus 001 Device 003: ID 0b05:19af ASUSTek Computer, Inc. AURA LED Controller (Internal)
0000:0e:00.0 Bus 001 Device 004: ID 046d:c548 Logitech, Inc. Logi Bolt Receiver (Plugged in)
0000:0e:00.0 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
0000:10:00.0 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
0000:10:00.0 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
0000:78:00.0 Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub (xhci_pci_suspend -110)
0000:78:00.0 Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub (xhci_pci_suspend -110)
0000:7a:00.3 Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
0000:7a:00.3 Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
0000:7a:00.4 Bus 009 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
0000:7a:00.4 Bus 010 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub (Triggers freeze with runtime PM)
0000:7b:00.0 Bus 011 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (14 preceding siblings ...)
  2026-02-21  1:12 ` bugzilla-daemon
@ 2026-02-23 13:05 ` bugzilla-daemon
  2026-02-23 17:52 ` bugzilla-daemon
                   ` (8 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-23 13:05 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #16 from Michał Pecio (michal.pecio@gmail.com) ---
Would this fix it as well?

echo 'on' > /sys/bus/pci/devices/<pci-id>/power/control

It seems that this must be set to 'auto' to trigger those suspend-resume cycles
of the host controller, and it is quite possible that the problem is caused by
suspending the HC, not any USB device.

That being said, I toggled it to 'auto' on a few HCs currently plugged into my
test system (was 'on' for all of them) and your test program (with failing
ioctl() included) still runs just fine. It's at 3000 iterations now.
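
To see which controllers are currently 'on' or 'auto', something like the
following can help. It is a minimal sketch, assuming the xHCI PCI driver
directory is /sys/bus/pci/drivers/xhci_hcd (the driver name matches the
"xhci_hcd" prefix in the logs). list_xhci_power_control and its optional
sysfs-root argument are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print "<pci-id> <power/control>" for every PCI device
# bound to the xhci_hcd driver. The optional argument is an alternate sysfs
# root, which makes the function easy to try against a fake directory tree.
list_xhci_power_control() {
    root=${1:-/sys}
    for dev in "$root"/bus/pci/drivers/xhci_hcd/*:*; do
        # Skip an unmatched glob or devices without runtime PM control.
        [ -r "$dev/power/control" ] || continue
        echo "${dev##*/} $(cat "$dev/power/control")"
    done
}

list_xhci_power_control
```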

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (15 preceding siblings ...)
  2026-02-23 13:05 ` bugzilla-daemon
@ 2026-02-23 17:52 ` bugzilla-daemon
  2026-02-23 22:33 ` bugzilla-daemon
                   ` (7 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-23 17:52 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #17 from Paul Alesius (paul@unnservice.com) ---
(In reply to Michał Pecio from comment #16)
> Would this fix it as well?
> 
> echo 'on' > /sys/bus/pci/devices/<pci-id>/power/control

I am now able to use netconsole to capture the logs during the freeze. This is
all that is logged when the bug occurs and the system freezes:

cat netconsole-2026-02-23-1819.log-output-during-bug
[  549.009867] xhci_hcd 0000:7a:00.4: Controller not ready at resume -19
[  549.009891] xhci_hcd 0000:7a:00.4: PCI post-resume error -19!
[  549.009894] xhci_hcd 0000:7a:00.4: HC died; cleaning up

I then tried control=on on the affected device
(/sys/bus/pci/devices/0000:7a:00.4/power/control), and it seems to have
eliminated the bug and the system freeze. Control defaults to auto on the
affected device, as you said.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (16 preceding siblings ...)
  2026-02-23 17:52 ` bugzilla-daemon
@ 2026-02-23 22:33 ` bugzilla-daemon
  2026-02-24  7:45 ` bugzilla-daemon
                   ` (6 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-23 22:33 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #18 from Michał Pecio (michal.pecio@gmail.com) ---
My test got to almost 30k iterations with suspend-resume cycles, no crashes. I
loaded the CPUs with simple "while (1);" on each core.

It seems the problem may be caused by a failure to resume the HC. The driver
should handle it, but something goes wrong. I hoped it would be a stupid null
pointer dereference and netconsole would give us stack trace, but no luck
apparently. I tried crashing my system with SysRq-C and netconsole (over a PCIe
Ethernet NIC) delivered the panic message just fine, so not sure what happens
here.

Could you enable dynamic debug and check if simply toggling power/control
between 'on' and 'auto' produces the same xhci_suspend/xhci_resume messages?
Would this be enough to hang the system, or is it necessary that somebody waits
for the HC to resume to perform some operations on the root hub? Maybe it's
just lack of necessary checks for dead HC somewhere.

What's the state of power/control for those HCs which aren't causing problems?
Are they also getting resumed and suspended under your test, but without
crashing? That would be at least one optimistic result in this whole mess :)

There is another bug 221073 about some AMD HCs dying on resume from system
sleep, may be related. So far nobody knows why it happens.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (17 preceding siblings ...)
  2026-02-23 22:33 ` bugzilla-daemon
@ 2026-02-24  7:45 ` bugzilla-daemon
  2026-02-24  8:52 ` bugzilla-daemon
                   ` (5 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24  7:45 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #19 from Paul Alesius (paul@unnservice.com) ---
(In reply to Michał Pecio from comment #18)
> Could you enable dynamic debug and check if simply toggling power/control
> between 'on' and 'auto' produces the same xhci_suspend/xhci_resume messages?
> Would this be enough to hang the system
Enabling dynamic debug and rapidly toggling power/control between on and auto
produces the same suspend/resume messages on all devices. Rapidly toggling
control between on and auto on 0000:7a:00.4 does not trigger the freeze.

> What's the state of power/control for those HCs which aren't causing
> problems? Are they also getting resumed and suspended under your test, but
> without crashing? That would be at least one optimistic result in this whole
> mess :)
About half of them default to on and the rest to auto. Those with control=auto
by default do not trigger the freeze, except the known-bad 7a:00.4 (and I did
not stress the others as much before concluding that it is 7a:00.4 that
triggers the freeze). Here are their default values, with notes:

 control=on 0000:0e:00.0 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=on 0000:0e:00.0 Bus 001 Device 002: ID 13d3:3588 IMC Networks Wireless_Device (Internal)
 control=on 0000:0e:00.0 Bus 001 Device 003: ID 0b05:19af ASUSTek Computer, Inc. AURA LED Controller (Internal)
 control=on 0000:0e:00.0 Bus 001 Device 004: ID 046d:c548 Logitech, Inc. Logi Bolt Receiver (Plugged in)
 control=on 0000:0e:00.0 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=on 0000:10:00.0 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=on 0000:10:00.0 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub

The 78:00.0 controller logs xhci_pci_suspend -110 errors during boot:
[   17.918387] xhci_hcd 0000:78:00.0: WARN: xHC CMD_RUN timeout
[   17.918508] xhci_hcd 0000:78:00.0: PM: suspend_common(): xhci_pci_suspend returns -110
[   17.918586] xhci_hcd 0000:78:00.0: can't suspend (hcd_pci_runtime_suspend returned -110)
 control=auto 0000:78:00.0 Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=auto 0000:78:00.0 Bus 006 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7a:00.3 Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 control=auto 0000:7a:00.3 Bus 008 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7a:00.4 Bus 009 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

This is the root hub that freezes during rapid polling. It is on the same PCI
device (0000:7a:00.4) as the unaffected root hub on the line above:
 control=auto 0000:7a:00.4 Bus 010 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 control=auto 0000:7b:00.0 Bus 011 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

I then enabled full dynamic debug + netconsole (printk=8):
 $ echo 'module xhci_hcd +p' | sudo tee /proc/dynamic_debug/control
 $ echo 'module usbcore +p' | sudo tee /proc/dynamic_debug/control
 $ echo 'module pci +p' | sudo tee /proc/dynamic_debug/control
 $ echo 8 | sudo tee /proc/sys/kernel/printk

Surprisingly, the system did not freeze for over 20 minutes with 3 instances
polling simultaneously and stress-ng --cpu 0 running. When I happened to kill
stress-ng first, the system froze immediately. Netconsole captured this up
until the lockup:
...
[ 1766.915244] xhci_hcd 0000:7a:00.4: PME# disabled
[ 1766.915262] xhci_hcd 0000:7a:00.4: enabling bus mastering
... (normal suspend/resume cycle) ...
[ 1767.170769] xhci_hcd 0000:7a:00.4: PME# disabled
[ 1767.170774] xhci_hcd 0000:7a:00.4: enabling bus mastering
[ 1767.181194] xhci_hcd 0000:7a:00.4: Controller not ready at resume -19
[ 1767.181209] xhci_hcd 0000:7a:00.4: PCI post-resume error -19!
[ 1767.181213] xhci_hcd 0000:7a:00.4: HC died; cleaning up
[ 1767.181222] xhci_hcd 0000:7a:00.4: hcd_pci_runtime_resume: -19
[ 1767.181232] hub 9-0:1.0: state 0 ports 2 chg 0000 evt 0000
[ 1767.181238] hub 10-0:1.0: state 0 ports 2 chg 0000 evt 0000

> There is another bug 221073 about some AMD HCs dying on resume from system
> sleep,
> may be related. So far nobody knows why it happens.
I don't know enough to say whether they are the same root cause, but both
involve an AMD xHC dying on resume, so they may be related.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (18 preceding siblings ...)
  2026-02-24  7:45 ` bugzilla-daemon
@ 2026-02-24  8:52 ` bugzilla-daemon
  2026-02-24 10:19 ` bugzilla-daemon
                   ` (4 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24  8:52 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #20 from Michał Pecio (michal.pecio@gmail.com) ---
Have you ever seen this "Controller not ready / post-resume error / HC died"
sequence without the machine hanging? It would probably be hard not to notice,
because the whole bus with all devices becomes unavailable.

This is coming from xhci_resume() calling xhci_handshake() and getting -ENODEV,
which means that the hardware malfunctions (or is misconfigured by PCI layer?)
and returns 0xffffffff on register read.
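
As a rough model of how the two error codes seen in this thread arise, the
sketch below classifies a single status register read. It is a simplification
and not the kernel code: the real xhci_handshake() polls until a timeout, while
this checks one value. The 0x800 mask used in the examples assumes that the
"Controller Not Ready" bit is bit 11 of USBSTS; handshake_result is a
hypothetical name.

```shell
#!/bin/sh
# Simplified, single-shot model of a register handshake (not the kernel code).
# Usage: handshake_result <value-read> <mask> <wanted-value-after-masking>
handshake_result() {
    val=$(( $1 )); mask=$(( $2 )); want=$(( $3 ))
    if [ "$val" -eq $(( 0xffffffff )) ]; then
        # All ones: the device did not answer the MMIO read at all.
        echo "-19 ENODEV"
    elif [ $(( val & mask )) -eq "$want" ]; then
        echo "0 ok"
    else
        # In the real code this is reported only after polling times out.
        echo "-110 ETIMEDOUT"
    fi
}

handshake_result 0xffffffff 0x800 0   # the failing resume in this bug
handshake_result 0x0 0x800 0          # controller ready
handshake_result 0x800 0x800 0        # controller still not ready
```

The -19 here corresponds to the "Controller not ready at resume -19" message,
and -110 to the xhci_pci_suspend timeout logged for 0000:78:00.0.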

I patched this code to pretend that it happened and I got the same log
messages, but still no crash or freeze.

Maybe I need to try it a few more times, but it seems that real users are
seeing a hang every time the HC dies while opening /dev/bus/usb/ files, because
otherwise we would get a separate bug about "HC died" and loss of all devices
when running ADB. Or maybe not, because the bus being suspended means that
nobody is actively using its devices, so nobody would notice if they
disappear...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (19 preceding siblings ...)
  2026-02-24  8:52 ` bugzilla-daemon
@ 2026-02-24 10:19 ` bugzilla-daemon
  2026-02-24 12:03 ` bugzilla-daemon
                   ` (3 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24 10:19 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #21 from Paul Alesius (paul@unnservice.com) ---
(In reply to Michał Pecio from comment #20)
No, I have never seen "Controller not ready at resume -19", "PCI post-resume
error -19!", or "HC died; cleaning up" in my weeks of logs without the entire
system freezing, after days of hammering the USB devices to diagnose the
freeze.

I ran a test holding one usbfs fd open on /dev/bus/usb/010/001
(`exec 3< /dev/bus/usb/010/001`) while rapidly cycling power/control between
'auto' and 'on' on 0000:7a:00.4. After the initial resume triggered by opening
the fd, there were no further suspend/resume messages at all in netconsole. The
device stayed powered on and the system remained stable while I hammered
control=on/auto with a 0.2s delay, with full dynamic debug enabled (xhci_hcd
+p, usbcore +p, pci +p, printk=8).

The "HC died" messages have only ever shown up in netconsole right at the
moment of a full system lockup, and only during tests that *repeatedly open and
close* the usbfs root-hub file, and during the HC-dead cleanup path.

Let me know if you want me to test a patch or run any other specific test.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (20 preceding siblings ...)
  2026-02-24 10:19 ` bugzilla-daemon
@ 2026-02-24 12:03 ` bugzilla-daemon
  2026-02-24 12:21 ` bugzilla-daemon
                   ` (2 subsequent siblings)
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24 12:03 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

Mathias Nyman (mathias.nyman@linux.intel.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mathias.nyman@linux.intel.c
                   |                            |om

--- Comment #22 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
As Michal stated above, this is due to reading 0xffffffff from the xHC MMIO
USBSTS 'status' register in xhci_resume(). This would happen if a PCI xHC
controller isn't properly powered up yet.

I'll add a hack that sleeps for a little while and then retries. Let's see if
it helps.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (21 preceding siblings ...)
  2026-02-24 12:03 ` bugzilla-daemon
@ 2026-02-24 12:21 ` bugzilla-daemon
  2026-02-24 15:42 ` bugzilla-daemon
  2026-03-08 17:56 ` bugzilla-daemon
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24 12:21 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #23 from Mathias Nyman (mathias.nyman@linux.intel.com) ---
Created attachment 309444
  --> https://bugzilla.kernel.org/attachment.cgi?id=309444&action=edit
testpatch that retries xhci resume

Test patch that retries xhci resume 50 ms later if the xHC status register read
returns 0xffffffff.
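
The idea can be modelled outside the kernel as a bounded retry. This is only an
illustration of the approach described above, not the attached patch:
retry_if_all_ones, the reader command passed to it, and the default of two
attempts are all assumptions, and the fractional "sleep 0.05" relies on a sleep
implementation that accepts fractions (GNU coreutils and busybox do).

```shell
#!/bin/sh
# Illustrative model only: re-read a value after 50 ms if it comes back as
# all ones. $1 is any command that prints the register value in hex;
# $2 is the maximum number of attempts (default 2).
retry_if_all_ones() {
    read_cmd=$1
    tries=${2:-2}
    while [ "$tries" -gt 0 ]; do
        val=$($read_cmd)
        if [ $(( $val )) -ne $(( 0xffffffff )) ]; then
            echo "$val"
            return 0
        fi
        tries=$(( tries - 1 ))
        # Wait 50 ms before the next attempt, if there is one.
        [ "$tries" -gt 0 ] && sleep 0.05
    done
    return 19   # mirrors ENODEV: still all ones after every attempt
}
```

Because the number of attempts is bounded, a controller that never comes back
still ends in the ENODEV path; the delay only helps if the hardware becomes
readable within the retry window.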

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (22 preceding siblings ...)
  2026-02-24 12:21 ` bugzilla-daemon
@ 2026-02-24 15:42 ` bugzilla-daemon
  2026-03-08 17:56 ` bugzilla-daemon
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-02-24 15:42 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

--- Comment #24 from Alan Stern (stern@rowland.harvard.edu) ---
In view of comment #17 plus the fact that the entire system freezes, this looks
more likely to be a PCI issue rather than a USB problem.  A broken USB host
controller isn't the sort of thing you expect to take down the whole computer,
but a broken PCI bus or controller could easily do so.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms
  2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
                   ` (23 preceding siblings ...)
  2026-02-24 15:42 ` bugzilla-daemon
@ 2026-03-08 17:56 ` bugzilla-daemon
  24 siblings, 0 replies; 28+ messages in thread
From: bugzilla-daemon @ 2026-03-08 17:56 UTC (permalink / raw)
  To: linux-usb

https://bugzilla.kernel.org/show_bug.cgi?id=221103

S. Davar (gtschemer@gmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |gtschemer@gmail.com

--- Comment #25 from S. Davar (gtschemer@gmail.com) ---
I have been supporting users on an atomic distribution, so no one has tried the
test patch.

However, 4+ users who had this freeze behavior on ASUS motherboards with Ryzen
5 / 7 / 9 CPUs seem to have 100% success manually setting all the xHCI
controllers to "on" instead of "auto".

All of those users froze very quickly with the usb_poll test, and never froze
after changing the xHCI power control behavior.  Most have a B650M or similar
motherboard.
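
For reference, the workaround can be applied to every xHCI controller at once
with a loop like the one below. This is a minimal sketch, assuming the driver
directory /sys/bus/pci/drivers/xhci_hcd; set_xhci_power_control and its
optional sysfs-root argument are hypothetical names. It needs root on a real
system, and the setting does not survive a reboot.

```shell
#!/bin/sh
# Hypothetical helper: write "on" or "auto" to power/control for every PCI
# device bound to xhci_hcd.
# Usage: set_xhci_power_control on|auto [sysfs-root]
set_xhci_power_control() {
    value=$1
    root=${2:-/sys}
    case $value in
        on|auto) ;;
        *) echo "usage: set_xhci_power_control on|auto [sysfs-root]" >&2
           return 2 ;;
    esac
    for dev in "$root"/bus/pci/drivers/xhci_hcd/*:*; do
        # Skip an unmatched glob or files we are not allowed to write.
        [ -w "$dev/power/control" ] || continue
        echo "$value" > "$dev/power/control"
    done
}
```

Running "set_xhci_power_control on" as root applies the workaround, and
"set_xhci_power_control auto" restores runtime PM for further testing.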

Was anyone with the freeze able to try the test patch?  Also, are there any
concerns that a small delay might still be unreliable, e.g. long enough in some
cases but not long enough in others?  Could it potentially cause a performance
hit in some cases, or is it expected to be called rarely enough that it doesn't
matter?

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2026-03-08 17:56 UTC | newest]

Thread overview: 28+ messages
2026-02-18 14:52 [Bug 221103] New: xhci_hcd: System lockup under CPU load during rapid usbfs polling of SuperSpeed root hubs on AMD Ryzen platforms bugzilla-daemon
2026-02-20  7:30 ` [Bug 221103] xhci_hcd: System lockup under CPU load during usbfs polling of USB devices on AMD platforms bugzilla-daemon
2026-02-20  8:31 ` bugzilla-daemon
2026-02-20  9:17   ` Greg KH
2026-02-20  9:16 ` bugzilla-daemon
2026-02-20  9:17 ` bugzilla-daemon
2026-02-20  9:24 ` bugzilla-daemon
2026-02-20  9:26 ` bugzilla-daemon
2026-02-20  9:28 ` bugzilla-daemon
2026-02-20  9:40 ` bugzilla-daemon
2026-02-20 10:07 ` bugzilla-daemon
2026-02-20 10:17   ` Greg KH
2026-02-20 10:17 ` bugzilla-daemon
2026-02-20 10:21 ` bugzilla-daemon
2026-02-20 11:19 ` bugzilla-daemon
2026-02-20 14:07 ` bugzilla-daemon
2026-02-20 17:18 ` bugzilla-daemon
2026-02-21  1:12 ` bugzilla-daemon
2026-02-23 13:05 ` bugzilla-daemon
2026-02-23 17:52 ` bugzilla-daemon
2026-02-23 22:33 ` bugzilla-daemon
2026-02-24  7:45 ` bugzilla-daemon
2026-02-24  8:52 ` bugzilla-daemon
2026-02-24 10:19 ` bugzilla-daemon
2026-02-24 12:03 ` bugzilla-daemon
2026-02-24 12:21 ` bugzilla-daemon
2026-02-24 15:42 ` bugzilla-daemon
2026-03-08 17:56 ` bugzilla-daemon
