From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathias Nyman <mathias.nyman@intel.com>,
    USB subsystem <linux-usb@vger.kernel.org>
Subject: Re: USB: workqueues stuck in 'D' state?
Date: Sun, 8 Dec 2024 16:51:04 +0100
Message-ID: <2024120802-unwashed-repackage-5eb3@gregkh>
In-Reply-To: <CAHk-=whPKnwZbbAp1MjogDP1aDYrCmQ63VC82+OnsLKy9M+gvg@mail.gmail.com>

On Fri, Dec 06, 2024 at 03:07:10PM -0800, Linus Torvalds wrote:
> So I'm not sure if this is new or not, but I *think* I would have
> noticed it earlier.
>
> On my Ampere Altra (128-core arm64 system), I started seeing 'top'
> claiming a load average of roughly 2.3 even when idle, and it seems to
> be all due to this:
>
> $ ps ax | grep ' [DR] '
> 869 ? D 0:00 [kworker/24:1+usb_hub_wq]
> 1900 ? D 0:00 [kworker/24:7+pm]
>
> where sometimes there are multiple of those 'pm' workers.
>
> Doing a sysrq-w, I get this:
>
> task:kworker/24:3 state:D stack:0 pid:1308 tgid:1308 ppid:2 flags:0x00000008
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
> usb_runtime_idle+0x48/0x68
> rpm_idle+0xa4/0x358
> pm_runtime_work+0xb0/0xe0
>
> task:kworker/24:7 state:D stack:0 pid:1900 tgid:1900 ppid:2 flags:0x00000208
> Workqueue: pm pm_runtime_work
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> usleep_range_state+0x118/0x150
> xhci_hub_control+0xe80/0x1090
> rh_call_control+0x274/0x7a0
> usb_hcd_submit_urb+0x13c/0x3a0
> usb_submit_urb+0x1c8/0x600
> usb_start_wait_urb+0x7c/0x180
> usb_control_msg+0xcc/0x150
> usb_port_suspend+0x414/0x510
> usb_generic_driver_suspend+0x68/0x90
> usb_suspend_both+0x1c8/0x290
> usb_runtime_suspend+0x3c/0xb0
> __rpm_callback+0x50/0x1f0
> rpm_callback+0x70/0x88
> rpm_suspend+0xe8/0x5a8
> __pm_runtime_suspend+0x4c/0x130
>
> so it seems to be all in that xhci_hub_control() path. I'm not seeing
> anything that has changed in the xhci driver in this merge window, so
> maybe this goes back further, and I just haven't noticed this odd load
> average issue before.
>
> The call trace for the usb_hub_wq seems a lot less stable, but I've
> seen backtraces like
>
> task:kworker/24:1 state:D stack:0 pid:869 tgid:869 ppid:2 flags:0x00000008
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_preempt_disabled+0x2c/0x50
> __mutex_lock.constprop.0+0x478/0x968
> __mutex_lock_slowpath+0x1c/0x38
> mutex_lock+0x6c/0x88
> hub_event+0x144/0x4a0
> process_one_work+0x170/0x408
> worker_thread+0x2cc/0x400
> kthread+0xf4/0x108
> ret_from_fork+0x10/0x20
>
> But also just
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> usb_control_msg+0xcc/0x150
>
> or
>
> Workqueue: usb_hub_wq hub_event
> Call trace:
> __switch_to+0xf4/0x168 (T)
> __schedule+0x248/0x648
> schedule+0x3c/0xe0
> schedule_timeout+0x94/0x120
> msleep+0x30/0x50
>
> so at a guess it's just some interaction with that 'pm' workqueue.
>
> I did a reboot just to verify that yes, it happened again after a
> fresh boot. So it is at least *somewhat* consistently repeatable,
> although I wouldn't be surprised if it's some kind of timing-dependent
> race condition that just happens to trigger on this machine.
>
> I could try to see if it's so consistent that I could bisect it, but
> before I start doing that, maybe just the backtraces make somebody go
> "Ahh, that smells like XYZ"..

I can't duplicate this here at all, running your latest tree (it's a
"smaller" x86 box with only 64 cores), so I don't know what to suggest.
Are you using any USB devices on this thing, or is USB only used for
internal connections there?

thanks,

greg k-h
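
For anyone trying to reproduce the symptom, the `ps ax | grep ' [DR] '`
check quoted above can be approximated in C. This is a minimal sketch that
is not part of the original message. It assumes the documented
/proc/<pid>/stat layout (the task name in parentheses, followed by a
one-letter state) and the Linux convention that 'R' and 'D' tasks count
toward the load average while idle kworkers report 'I'. The helper names
`stat_state()`, `counts_toward_load()` and `scan_load_tasks()` are
hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the one-letter state field of a /proc/<pid>/stat line, or '?'
 * if the line is malformed. The task name is the second field, wrapped
 * in parentheses, and may itself contain spaces or ')', so search for
 * the LAST ')' instead of splitting on whitespace.
 */
char stat_state(const char *line)
{
	const char *p;

	assert(line != NULL);
	p = strrchr(line, ')');
	if (p == NULL || p[1] != ' ' || p[2] == '\0')
		return '?';
	return p[2];
}

/*
 * Runnable ('R') and uninterruptible-sleep ('D') tasks are counted in
 * the Linux load average; idle kworkers report 'I' and are not.
 */
int counts_toward_load(char state)
{
	return state == 'R' || state == 'D';
}

/*
 * Print "pid state name" for each process in /proc whose main thread is
 * in 'R' or 'D', roughly like `ps ax | grep ' [DR] '`. Per-thread entries
 * under /proc/<pid>/task are not examined. Returns the number of lines
 * printed, or -1 if /proc cannot be opened.
 */
int scan_load_tasks(FILE *out)
{
	DIR *proc = opendir("/proc");
	struct dirent *de;
	int n = 0;

	if (proc == NULL)
		return -1;
	while ((de = readdir(proc)) != NULL) {
		char path[300], line[1024];
		FILE *f;

		if (!isdigit((unsigned char)de->d_name[0]))
			continue;	/* not a PID directory */
		snprintf(path, sizeof(path), "/proc/%s/stat", de->d_name);
		f = fopen(path, "r");
		if (f == NULL)
			continue;	/* e.g. the task exited during the scan */
		if (fgets(line, sizeof(line), f) != NULL) {
			char state = stat_state(line);
			const char *lp = strchr(line, '(');
			const char *rp = strrchr(line, ')');

			if (counts_toward_load(state) && lp != NULL && rp > lp) {
				/* print the task name between the parentheses */
				fprintf(out, "%s %c %.*s\n", de->d_name, state,
					(int)(rp - lp - 1), lp + 1);
				n++;
			}
		}
		fclose(f);
	}
	closedir(proc);
	return n;
}
```

Calling `scan_load_tasks(stdout)` from a small `main()` lists the matching
tasks. The scanning process itself will normally appear as 'R'; kworker
entries that stay in 'D' across repeated runs would correspond to the
load-average symptom described above.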