* USB: workqueues stuck in 'D' state?
@ 2024-12-06 23:07 Linus Torvalds
2024-12-07 2:02 ` Alan Stern
2024-12-08 15:51 ` Greg Kroah-Hartman
0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2024-12-06 23:07 UTC (permalink / raw)
To: Mathias Nyman, Greg Kroah-Hartman; +Cc: USB subsystem
So I'm not sure if this is new or not, but I *think* I would have
noticed it earlier.
On my Ampere Altra (128-core arm64 system), I started seeing 'top'
claiming a load average of roughly 2.3 even when idle, and it seems to
be all due to this:
$ ps ax | grep ' [DR] '
869 ? D 0:00 [kworker/24:1+usb_hub_wq]
1900 ? D 0:00 [kworker/24:7+pm]
where sometimes there are multiple of those 'pm' workers.
Doing a sysrq-w, I get this:
task:kworker/24:3 state:D stack:0 pid:1308 tgid:1308 ppid:2
flags:0x00000008
Workqueue: pm pm_runtime_work
Call trace:
__switch_to+0xf4/0x168 (T)
__schedule+0x248/0x648
schedule+0x3c/0xe0
usleep_range_state+0x118/0x150
xhci_hub_control+0xe80/0x1090
rh_call_control+0x274/0x7a0
usb_hcd_submit_urb+0x13c/0x3a0
usb_submit_urb+0x1c8/0x600
usb_start_wait_urb+0x7c/0x180
usb_control_msg+0xcc/0x150
usb_port_suspend+0x414/0x510
usb_generic_driver_suspend+0x68/0x90
usb_suspend_both+0x1c8/0x290
usb_runtime_suspend+0x3c/0xb0
__rpm_callback+0x50/0x1f0
rpm_callback+0x70/0x88
rpm_suspend+0xe8/0x5a8
__pm_runtime_suspend+0x4c/0x130
usb_runtime_idle+0x48/0x68
rpm_idle+0xa4/0x358
pm_runtime_work+0xb0/0xe0
task:kworker/24:7 state:D stack:0 pid:1900 tgid:1900 ppid:2
flags:0x00000208
Workqueue: pm pm_runtime_work
Call trace:
__switch_to+0xf4/0x168 (T)
__schedule+0x248/0x648
schedule+0x3c/0xe0
usleep_range_state+0x118/0x150
xhci_hub_control+0xe80/0x1090
rh_call_control+0x274/0x7a0
usb_hcd_submit_urb+0x13c/0x3a0
usb_submit_urb+0x1c8/0x600
usb_start_wait_urb+0x7c/0x180
usb_control_msg+0xcc/0x150
usb_port_suspend+0x414/0x510
usb_generic_driver_suspend+0x68/0x90
usb_suspend_both+0x1c8/0x290
usb_runtime_suspend+0x3c/0xb0
__rpm_callback+0x50/0x1f0
rpm_callback+0x70/0x88
rpm_suspend+0xe8/0x5a8
__pm_runtime_suspend+0x4c/0x130
so it seems to be all in that xhci_hub_control() path. I'm not seeing
anything that has changed in the xhci driver in this merge window, so
maybe this goes back further, and I just haven't noticed this odd load
average issue before.
The call trace for the usb_hub_wq seems a lot less stable, but I've
seen backtraces like
task:kworker/24:1 state:D stack:0 pid:869 tgid:869 ppid:2
flags:0x00000008
Workqueue: usb_hub_wq hub_event
Call trace:
__switch_to+0xf4/0x168 (T)
__schedule+0x248/0x648
schedule+0x3c/0xe0
schedule_preempt_disabled+0x2c/0x50
__mutex_lock.constprop.0+0x478/0x968
__mutex_lock_slowpath+0x1c/0x38
mutex_lock+0x6c/0x88
hub_event+0x144/0x4a0
process_one_work+0x170/0x408
worker_thread+0x2cc/0x400
kthread+0xf4/0x108
ret_from_fork+0x10/0x20
But also just
Workqueue: usb_hub_wq hub_event
Call trace:
__switch_to+0xf4/0x168 (T)
usb_control_msg+0xcc/0x150
or
Workqueue: usb_hub_wq hub_event
Call trace:
__switch_to+0xf4/0x168 (T)
__schedule+0x248/0x648
schedule+0x3c/0xe0
schedule_timeout+0x94/0x120
msleep+0x30/0x50
so at a guess it's just some interaction with that 'pm' workqueue.
I did a reboot just to verify that yes, it happened again after a
fresh boot. So it is at least *somewhat* consistently repeatable,
although I wouldn't be surprised if it's some kind of timing-dependent
race condition that just happens to trigger on this machine.
I could try to see if it's so consistent that I could bisect it, but
before I start doing that, maybe just the backtraces makes somebody go
"Ahh, that smells like XYZ"..
Hmm?
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: USB: workqueues stuck in 'D' state?
2024-12-06 23:07 USB: workqueues stuck in 'D' state? Linus Torvalds
@ 2024-12-07 2:02 ` Alan Stern
2024-12-08 15:51 ` Greg Kroah-Hartman
1 sibling, 0 replies; 6+ messages in thread
From: Alan Stern @ 2024-12-07 2:02 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mathias Nyman, Greg Kroah-Hartman, USB subsystem
On Fri, Dec 06, 2024 at 03:07:10PM -0800, Linus Torvalds wrote:
> So I'm not sure if this is new or not, but I *think* I would have
> noticed it earlier.
>
> On my Ampere Altra (128-core arm64 system), I started seeing 'top'
> claiming a load average of roughly 2.3 even when idle, and it seems to
> be all due to this:
>
> $ ps ax | grep ' [DR] '
> 869 ? D 0:00 [kworker/24:1+usb_hub_wq]
> 1900 ? D 0:00 [kworker/24:7+pm]
>
> where sometimes there are multiple of those 'pm' workers.
>
> Doing a sysrq-w, I get this:
>
> task:kworker/24:3 state:D stack:0 pid:1308 tgid:1308 ppid:2
> flags:0x00000008
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
> usb_runtime_idle+0x48/0x68
> rpm_idle+0xa4/0x358
> pm_runtime_work+0xb0/0xe0
>
> task:kworker/24:7 state:D stack:0 pid:1900 tgid:1900 ppid:2
> flags:0x00000208
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
>
> so it seems to be all in that xhci_hub_control() path. I'm not seeing
> anything that has changed in the xhci driver in this merge window, so
> maybe this goes back further, and I just haven't noticed this odd load
> average issue before.
>
> The call trace for the usb_hub_wq seems a lot less stable, but I've
> seen backtraces like
>
> task:kworker/24:1 state:D stack:0 pid:869 tgid:869 ppid:2
> flags:0x00000008
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_preempt_disabled+0x2c/0x50
> __mutex_lock.constprop.0+0x478/0x968
> __mutex_lock_slowpath+0x1c/0x38
> mutex_lock+0x6c/0x88
> hub_event+0x144/0x4a0
> process_one_work+0x170/0x408
> worker_thread+0x2cc/0x400
> kthread+0xf4/0x108
> ret_from_fork+0x10/0x20
>
> But also just
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> usb_control_msg+0xcc/0x150
>
> or
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_timeout+0x94/0x120
> msleep+0x30/0x50
>
> so at a guess it's just some interaction with that 'pm' workqueue.
>
> I did a reboot just to verify that yes, it happened again after a
> fresh boot. So it is at least *somewhat* consistently repeatable,
> although I wouldn't be surprised if it's some kind of timing-dependent
> race condition that just happens to trigger on this machine.
>
> I could try to see if it's so consistent that I could bisect it, but
> before I start doing that, maybe just the backtraces makes somebody go
> "Ahh, that smells like XYZ"..
>
> Hmm?
In the past (not recently) I have seen the system get into a loop,
repeatedly suspending and resuming some device over and over.
What do you get if you enable dynamic debugging for usbcore?
Alan Stern
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: USB: workqueues stuck in 'D' state?
2024-12-06 23:07 USB: workqueues stuck in 'D' state? Linus Torvalds
2024-12-07 2:02 ` Alan Stern
@ 2024-12-08 15:51 ` Greg Kroah-Hartman
2024-12-08 19:14 ` Linus Torvalds
1 sibling, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2024-12-08 15:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Mathias Nyman, USB subsystem
On Fri, Dec 06, 2024 at 03:07:10PM -0800, Linus Torvalds wrote:
> So I'm not sure if this is new or not, but I *think* I would have
> noticed it earlier.
>
> On my Ampere Altra (128-core arm64 system), I started seeing 'top'
> claiming a load average of roughly 2.3 even when idle, and it seems to
> be all due to this:
>
> $ ps ax | grep ' [DR] '
> 869 ? D 0:00 [kworker/24:1+usb_hub_wq]
> 1900 ? D 0:00 [kworker/24:7+pm]
>
> where sometimes there are multiple of those 'pm' workers.
>
> Doing a sysrq-w, I get this:
>
> task:kworker/24:3 state:D stack:0 pid:1308 tgid:1308 ppid:2
> flags:0x00000008
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
> usb_runtime_idle+0x48/0x68
> rpm_idle+0xa4/0x358
> pm_runtime_work+0xb0/0xe0
>
> task:kworker/24:7 state:D stack:0 pid:1900 tgid:1900 ppid:2
> flags:0x00000208
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
>
> so it seems to be all in that xhci_hub_control() path. I'm not seeing
> anything that has changed in the xhci driver in this merge window, so
> maybe this goes back further, and I just haven't noticed this odd load
> average issue before.
>
> The call trace for the usb_hub_wq seems a lot less stable, but I've
> seen backtraces like
>
> task:kworker/24:1 state:D stack:0 pid:869 tgid:869 ppid:2
> flags:0x00000008
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_preempt_disabled+0x2c/0x50
> __mutex_lock.constprop.0+0x478/0x968
> __mutex_lock_slowpath+0x1c/0x38
> mutex_lock+0x6c/0x88
> hub_event+0x144/0x4a0
> process_one_work+0x170/0x408
> worker_thread+0x2cc/0x400
> kthread+0xf4/0x108
> ret_from_fork+0x10/0x20
>
> But also just
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> usb_control_msg+0xcc/0x150
>
> or
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_timeout+0x94/0x120
> msleep+0x30/0x50
>
> so at a guess it's just some interaction with that 'pm' workqueue.
>
> I did a reboot just to verify that yes, it happened again after a
> fresh boot. So it is at least *somewhat* consistently repeatable,
> although I wouldn't be surprised if it's some kind of timing-dependent
> race condition that just happens to trigger on this machine.
>
> I could try to see if it's so consistent that I could bisect it, but
> before I start doing that, maybe just the backtraces makes somebody go
> "Ahh, that smells like XYZ"..
I can't duplicate this here running on your latest tree at all (it's a
"smaller" x86 box with only 64 cores.) So I don't know what to suggest.
Are you using any USB devices on this thing, or is that just for
internal connections?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: USB: workqueues stuck in 'D' state?
2024-12-08 15:51 ` Greg Kroah-Hartman
@ 2024-12-08 19:14 ` Linus Torvalds
2024-12-08 20:17 ` Michał Pecio
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2024-12-08 19:14 UTC (permalink / raw)
To: Greg Kroah-Hartman; +Cc: Mathias Nyman, USB subsystem
On Sun, 8 Dec 2024 at 07:51, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> I can't duplicate this here running on your latest tree at all (it's a
> "smaller" x86 box with only 64 cores.) So I don't know what to suggest.
Yeah, I suspect it's an older thing and triggered by some timing change.
I guess I will try to see if I can bisect it, but I'm not all that optimistic.
> Are you using any USB devices on this thing, or is that just for
> internal connections?
All internal motherboard devices. Mostly the usual random hubs, but
also some "ASRR USB Ethernet" and a "Multifunction Composite Gadget".
I assume that's for the whole management side.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: USB: workqueues stuck in 'D' state?
2024-12-08 19:14 ` Linus Torvalds
@ 2024-12-08 20:17 ` Michał Pecio
2024-12-09 23:27 ` Michał Pecio
0 siblings, 1 reply; 6+ messages in thread
From: Michał Pecio @ 2024-12-08 20:17 UTC (permalink / raw)
To: torvalds; +Cc: gregkh, linux-usb, mathias.nyman
Hi,
> > Are you using any USB devices on this thing, or is that just for
> > internal connections?
>
> All internal motherboard devices. Mostly the usual random hubs, but
> also some "ASRR USB Ethernet" and a "Multifunction Composite Gadget".
Dynamic debug has been suggested, it would tell which ports are being
suspended and how often. Either usbcore and/or this should do:
func xhci_set_link_state +p
Stacktrace looks like this function has just returned and the driver
is waiting for the HW to confirm new link state. (Which confirmation,
or lack thereof, seems to be subsequently ignored by the way.)
Regards,
Michal
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: USB: workqueues stuck in 'D' state?
2024-12-08 20:17 ` Michał Pecio
@ 2024-12-09 23:27 ` Michał Pecio
0 siblings, 0 replies; 6+ messages in thread
From: Michał Pecio @ 2024-12-09 23:27 UTC (permalink / raw)
To: torvalds; +Cc: gregkh, linux-usb, mathias.nyman
OK, I have no idea if it's the same bug or merely similar, but:
Linux v6.13-rc1. My hardware currently seems to insists that there is
overcurrent on all ports. It's an ASMedia chip, so nothing shocking.
This drives Linux nuts and it repeatedly hammers on hub_suspend() and
hub_resume(), apparently in a fairly tight loop, with no end in sight.
I have also seen short bursts of the same a few times while unplugging
some device from two ASMedia controllers (and not from one other HC),
but they stopped after a second. Maybe the HW came back to sanity?
It looks like something is very eager to retry things.
This shows up upon reloading xhci_hcd:
[613292.598841] usb usb8-port1: over-current condition
[613292.726843] usb usb8-port2: over-current condition
[613292.761832] usb usb9-port1: over-current condition
[613292.969830] usb usb9-port2: over-current condition
One core at 100% and junk like this repeats forever with dynamic debug:
[ +0.000001] hub 8-0:1.0: hub_resume
[ +0.000007] xhci_hcd 0000:02:00.0: Get port status 8-2 read: 0x288, return 0x108
[ +0.000005] hub 8-0:1.0: state 7 ports 2 chg 0006 evt 0000
[ +0.000006] usb usb8-port1: status 0108, change 0000, 12 Mb/s
[ +0.000003] xhci_hcd 0000:02:00.0: Get port status 8-2 read: 0x288, return 0x108
[ +0.000004] usb usb8-port2: status 0108, change 0000, 12 Mb/s
[ +0.000002] hub 8-0:1.0: hub_suspend
[ +0.000003] usb usb8: bus auto-suspend, wakeup 1
[ +0.000003] xhci_hcd 0000:02:00.0: Bus suspend bailout, port over-current detected
[ +0.000001] usb usb8: bus suspend fail, err -16
[ +0.000008] xhci_hcd 0000:02:00.0: Get port status 8-2 read: 0x288, return 0x108
[ +0.000004] hub 8-0:1.0: state 7 ports 2 chg 0006 evt 0000
[ +0.000003] xhci_hcd 0000:02:00.0: Get port status 8-1 read: 0x288, return 0x108
[ +0.000004] usb usb8-port1: status 0108, change 0000, 12 Mb/s
[ +0.000007] usb usb8-port2: status 0108, change 0000, 12 Mb/s
[ +0.000002] hub 8-0:1.0: hub_suspend
[ +0.000006] xhci_hcd 0000:02:00.0: Bus suspend bailout, port over-current detected
[ +0.000004] xhci_hcd 0000:02:00.0: Get port status 8-1 read: 0x288, return 0x108
[ +0.000009] hub 8-0:1.0: state 7 ports 2 chg 0006 evt 0000
[ +0.000003] xhci_hcd 0000:02:00.0: Get port status 8-1 read: 0x288, return 0x108
[ +0.000004] usb usb8-port1: status 0108, change 0000, 12 Mb/s
[ +0.000006] usb usb8-port2: status 0108, change 0000, 12 Mb/s
[ +0.000003] hub 8-0:1.0: hub_suspend
[ +0.000003] usb usb8: bus auto-suspend, wakeup 1
[ +0.000002] xhci_hcd 0000:02:00.0: Bus suspend bailout, port over-current detected
[ +0.000002] usb usb8: bus suspend fail, err -16
[ +0.000001] hub 8-0:1.0: hub_resume
[ +0.000007] xhci_hcd 0000:02:00.0: Get port status 8-2 read: 0x288, return 0x108
[ +0.000004] hub 8-0:1.0: state 7 ports 2 chg 0006 evt 0000
[ +0.000007] usb usb8-port1: status 0108, change 0000, 12 Mb/s
[ +0.000003] xhci_hcd 0000:02:00.0: Get port status 8-2 read: 0x288, return 0x108
[ +0.000003] usb usb8-port2: status 0108, change 0000, 12 Mb/s
[ +0.000003] hub 8-0:1.0: hub_suspend
Stack traces are similar: pm workqueue -> usbcore -> xhci.
Regards,
Michal
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-12-09 23:27 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-12-06 23:07 USB: workqueues stuck in 'D' state? Linus Torvalds
2024-12-07 2:02 ` Alan Stern
2024-12-08 15:51 ` Greg Kroah-Hartman
2024-12-08 19:14 ` Linus Torvalds
2024-12-08 20:17 ` Michał Pecio
2024-12-09 23:27 ` Michał Pecio
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).