xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend

public inbox for linux-usb@vger.kernel.org

* xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend
@ 2026-03-29 21:52 martinalderson
  2026-03-30  0:07 ` Michal Pecio
  0 siblings, 1 reply; 4+ messages in thread
From: martinalderson @ 2026-03-29 21:52 UTC
  To: linux-usb

[BUG] xhci_hcd 0000:0f:00.0: controller declared dead on resume from suspend

Hardware:
  CPU: AMD Ryzen 9 7900 12-Core Processor
  Board: ASUS PRIME B650-PLUS
  Controller: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8]
  Subsystem: ASUSTeK Computer Inc. [1043:8877]
  PCI: 0000:0f:00.0 (IOMMU group 30)

Software:
  Kernel: 7.0.0-rc5 (commit be762d8b, built 2026-03-28)
  Distro: Fedora 43 (Workstation)
  Desktop: GNOME on Wayland

Description:
  On the first suspend/resume cycle after boot, the xHCI controller at
  0000:0f:00.0 (AMD Raphael/Granite Ridge USB 2.0) fails to resume and
  is declared dead. A Logitech Unifying Receiver (046d:c52b) on this
  controller is disconnected and the mouse (Logitech M720 Triathlon)
  stops functioning.

  A second xHCI controller on the same system (0000:0c:00.0, AMD 600
  Series Chipset USB 3.2 [1022:43f7]) also errors on resume (USBSTS
  0x401) but successfully recovers via reinit. The 0f:00.0 controller
  does not recover.

  Regression from rc4: suspend/resume worked correctly on 7.0-rc4 and
  earlier kernels on the same hardware.

Reproduce:
  1. Boot with USB device attached to a port on the 0000:0f:00.0 controller
  2. Suspend (systemd suspend)
  3. Resume

dmesg on resume:
  xhci_hcd 0000:0f:00.0: xHCI host not responding to stop endpoint command
  xhci_hcd 0000:0f:00.0: xHCI host controller not responding, assume dead
  xhci_hcd 0000:0f:00.0: HC died; cleaning up
  xhci_hcd 0000:0c:00.0: xHC error in resume, USBSTS 0x401, Reinit
  usb usb1: root hub lost power or was reset
  usb usb2: root hub lost power or was reset
  usb 1-7: WARN: invalid context state for evaluate context command.
  usb 1-10: WARN: invalid context state for evaluate context command.
  usb 7-1: USB disconnect, device number 2

Workaround:
  PCI remove + rescan recovers the controller:
    echo 1 > /sys/bus/pci/devices/0000:0f:00.0/remove
    echo 1 > /sys/bus/pci/rescan

  A simple PCI device reset (echo 1 > .../reset) was insufficient -- the
  controller came back but did not re-enumerate the attached device.

Notes:
  - The 0f:00.0 controller is USB 2.0 only (USB3 root hub has no ports)
  - hci version 0x120, hcc params 0x0110ffc5, quirks 0x0000000200000010


* Re: xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend
  2026-03-29 21:52 xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend martinalderson
@ 2026-03-30  0:07 ` Michal Pecio
  2026-04-04 12:04   ` Martin Alderson
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Pecio @ 2026-03-30  0:07 UTC
  To: martinalderson; +Cc: linux-usb

On Sun, 29 Mar 2026 17:52:39 -0400, martinalderson@gmail.com wrote:
> [BUG] xhci_hcd 0000:0f:00.0: controller declared dead on resume from
> suspend
> 
> Hardware:
>   CPU: AMD Ryzen 9 7900 12-Core Processor
>   Board: ASUS PRIME B650-PLUS
>   Controller: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8]
>   Subsystem: ASUSTeK Computer Inc. [1043:8877]
>   PCI: 0000:0f:00.0 (IOMMU group 30)
> 
> Software:
>   Kernel: 7.0.0-rc5 (commit be762d8b, built 2026-03-28)
>   Distro: Fedora 43 (Workstation)
>   Desktop: GNOME on Wayland
> 
> Description:
>   On the first suspend/resume cycle after boot, the xHCI controller at
>   0000:0f:00.0 (AMD Raphael/Granite Ridge USB 2.0) fails to resume and
>   is declared dead. A Logitech Unifying Receiver (046d:c52b) on this
>   controller is disconnected and the mouse (Logitech M720 Triathlon)
>   stops functioning.
> 
>   A second xHCI controller on the same system (0000:0c:00.0, AMD 600
>   Series Chipset USB 3.2 [1022:43f7]) also errors on resume (USBSTS
>   0x401) but successfully recovers via reinit. The 0f:00.0 controller
>   does not recover.
> 
>   Regression from rc4: suspend/resume worked correctly on 7.0-rc4 and
>   earlier kernels on the same hardware.

That's interesting because there were no USB subsystem changes
between 7.0-rc4 and 7.0-rc5.

Any chance you could git-bisect this?
Are both kernels built with the same .config?

> Reproduce:
>   1. Boot with USB device attached to a port on the 0000:0f:00.0
>      controller
>   2. Suspend (systemd suspend)
>   3. Resume

By the way, are you using this affected controller to resume
(with a keyboard or something like that)?
 
> dmesg on resume:
>   xhci_hcd 0000:0f:00.0: xHCI host not responding to stop endpoint command
>   xhci_hcd 0000:0f:00.0: xHCI host controller not responding, assume dead
>   xhci_hcd 0000:0f:00.0: HC died; cleaning up
>   xhci_hcd 0000:0c:00.0: xHC error in resume, USBSTS 0x401, Reinit
>   usb usb1: root hub lost power or was reset
>   usb usb2: root hub lost power or was reset
>   usb 1-7: WARN: invalid context state for evaluate context command.
>   usb 1-10: WARN: invalid context state for evaluate context command.
>   usb 7-1: USB disconnect, device number 2
> 
> Workaround:
>   PCI remove + rescan recovers the controller:
>     echo 1 > /sys/bus/pci/devices/0000:0f:00.0/remove
>     echo 1 > /sys/bus/pci/rescan
> 
>   A simple PCI device reset (echo 1 > .../reset) was insufficient -- the
>   controller came back but did not re-enumerate the attached device.

What about the unbind/bind procedure described here?
https://bugzilla.kernel.org/show_bug.cgi?id=221073
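
For reference, a generic sysfs unbind/bind cycle could look like the
sketch below. This is an assumption about the general mechanism, not a
copy of the steps in that bug report; the function name rebind_pci_dev
and the SYSFS_ROOT override are hypothetical, and it needs root.

```shell
#!/bin/sh
# Sketch only: generic PCI driver unbind/bind through sysfs (run as root).
# rebind_pci_dev and SYSFS_ROOT are illustrative names, not from the thread.

rebind_pci_dev() {
    dev="$1"                                   # e.g. 0000:0f:00.0
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"

    # The "driver" symlink disappears on unbind, so resolve it first.
    [ -e "$link" ] || { echo "no driver bound to $dev" >&2; return 1; }
    drv=$(readlink -f "$link") || return 1

    echo "$dev" > "$drv/unbind" || return 1
    echo "$dev" > "$drv/bind"
}
```

It would be called as "rebind_pci_dev 0000:0f:00.0". Unlike PCI
remove + rescan, this leaves the PCI device itself in place and only
detaches and reattaches the driver.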

> Notes:
>   - The 0f:00.0 controller is USB 2.0 only (USB3 root hub has no ports)
>   - hci version 0x120, hcc params 0x0110ffc5, quirks 0x0000000200000010


* Re: xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend
  2026-03-30  0:07 ` Michal Pecio
@ 2026-04-04 12:04   ` Martin Alderson
  2026-04-04 13:24     ` Michal Pecio
  0 siblings, 1 reply; 4+ messages in thread
From: Martin Alderson @ 2026-04-04 12:04 UTC
  To: Michal Pecio; +Cc: linux-usb

Hi,

Just for clarity this never happened to me with the 6.19 kernel I was
on before (suspend/resumed many times on that kernel with no issues).
It's happened twice now (once with rc5, now with rc6) in a short space
of time. It may just be random luck rather than a specific regression;
sorry if I confused things there.

I'm not sure I can bisect this: it's very intermittent, so each step
would take an age to reproduce, sorry.

Previously I was on the Fedora 43 default kernel series, now I
switched to the COPR for 7.x (to try and fix something else).

Thanks for the bugzilla, I'll look at some of those workarounds.



* Re: xhci_hcd: AMD Raphael/Granite Ridge USB 2.0 xHCI [1022:15b8] dies on resume from suspend
  2026-04-04 12:04   ` Martin Alderson
@ 2026-04-04 13:24     ` Michal Pecio
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Pecio @ 2026-04-04 13:24 UTC
  To: Martin Alderson; +Cc: linux-usb

On Sat, 4 Apr 2026 13:04:02 +0100, Martin Alderson wrote:
> Just for clarity this never happened to me with the 6.19 kernel I was
> on before (suspend/resumed many times on that kernel with no issues).
> It's happened twice now (once with rc5, now with rc6) in a short space
> of time.

So apparently about once per week. That's not very easy to debug.
One trick I have seen people use to accelerate such tests is running
"rtcwake -s 5 -m freeze" in a loop. This puts the system in s2idle and
resumes automatically after 5 seconds.
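
A minimal sketch of such a loop, under some assumptions: the function
names, the 10-second settle time and the SUSPEND_CMD/LOG_CMD/SETTLE
overrides are hypothetical, and the "HC died" pattern is simply taken
from the dmesg in the original report. It needs root and should start
from a fresh boot so that stale messages do not match.

```shell
#!/bin/sh
# Sketch only: repeat s2idle suspend/resume cycles until the failure shows up.
# Only "rtcwake -s 5 -m freeze" comes from the thread; the rest is illustrative.

# Succeed if the kernel log text on stdin contains the failure message.
hc_died() {
    grep -q 'HC died; cleaning up'
}

# Run up to $1 cycles, stopping at the first failure.
# SUSPEND_CMD, LOG_CMD and SETTLE can be overridden for a dry run.
suspend_loop() {
    max="$1"
    i=1
    while [ "$i" -le "$max" ]; do
        ${SUSPEND_CMD:-rtcwake -s 5 -m freeze} || return 2
        sleep "${SETTLE:-10}"                  # let resume messages land
        if ${LOG_CMD:-dmesg} | hc_died; then
            echo "failure after cycle $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max cycles"
}
```

For example, "suspend_loop 200" would run at most 200 cycles and stop
at the first one after which the log shows a dead controller.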

Do you have a more complete dmesg from those failures, with timestamps?
From suspend up until everything has calmed down after resume, ideally
also including whatever you did later to restore operation.

> Previously I was on the Fedora 43 default kernel series, now I
> switched to the COPR for 7.x (to try and fix something else).

Not sure what COPR is, but I gather it went like this:
1. Fedora 6.19 kernel was OK for a long time
2. Some other kernel, possibly with a different config: 7.0-rc4 still
   worked, but was only used for a short time. What about 7.0-rc1 to -rc3?
3. After updating to -rc5 it's definitely broken.

> Thanks for the bugzilla, I'll look at some of those workarounds.

In particular, collecting dynamic debug output and debugfs state could
tell whether it's the same problem with a missing IRQ after resume or
something else.
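
One possible way to collect both is sketched below. It assumes root,
a mounted debugfs and a kernel built with CONFIG_DYNAMIC_DEBUG; the
function names and the DEBUGFS override are hypothetical.

```shell
#!/bin/sh
# Sketch only: needs root, debugfs mounted, and CONFIG_DYNAMIC_DEBUG.
# Function names and the DEBUGFS override are illustrative.

# Turn on all pr_debug/dev_dbg messages in the xhci_hcd module.
enable_xhci_debug() {
    echo 'module xhci_hcd +p' > "${DEBUGFS:-/sys/kernel/debug}/dynamic_debug/control"
}

# Dump every readable xHCI debugfs file for one controller as "path:line".
snapshot_xhci_debugfs() {
    dev="$1"                                   # e.g. 0000:0f:00.0
    out="$2"                                   # output file
    src="${DEBUGFS:-/sys/kernel/debug}/usb/xhci/$dev"
    [ -d "$src" ] || { echo "no debugfs entry for $dev" >&2; return 1; }
    # Unreadable files make grep return an error; ignore that and
    # judge success by whether anything was captured.
    grep -r . "$src" > "$out" 2>/dev/null || true
    [ -s "$out" ]
}
```

Running enable_xhci_debug before suspending, and then something like
"snapshot_xhci_debugfs 0000:0f:00.0 /tmp/xhci-after.txt" right after a
failed resume, would give a log and a state dump to attach.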

Regards,
Michal
