public inbox for linux-kernel@vger.kernel.org
From: Ben Greear <greearb@candelatech.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Hillf Danton <hdanton@sina.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-usb@vger.kernel.org
Subject: Re: Deadlock in usb subsystem on shutdown, 6.18.3+
Date: Wed, 14 Jan 2026 09:51:48 -0800
Message-ID: <a721a966-0a4b-cbc4-71ac-a482156ffa48@candelatech.com>
In-Reply-To: <c52546af-e39e-4096-ad11-9b38bb2d5f7e@rowland.harvard.edu>

On 1/14/26 07:13, Alan Stern wrote:
> On Wed, Jan 14, 2026 at 06:36:41AM -0800, Ben Greear wrote:
>> On 1/13/26 18:45, Hillf Danton wrote:
>>> On Tue, 13 Jan 2026 16:21:07 -0800 Ben Greear wrote:
>>>> Hello,
>>>>
>>>> We caught a deadlock that appears to be in the USB code during shutdown.
>>>> We do a lot of reboots and normally all goes well, so I don't think we
>>>> can reliably reproduce the problem.
>>>>
>>>> INFO: task systemd-shutdow:1 blocked for more than 180 seconds.
>>>>          Tainted: G S         O        6.18.3+ #33
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00080001
>>>> Call Trace:
>>>>     <TASK>
>>>>     __schedule+0x46b/0x1140
>>>>     schedule+0x23/0xc0
>>>>     schedule_preempt_disabled+0x11/0x20
>>>>     __mutex_lock.constprop.0+0x4f7/0x9a0
>>>>     device_shutdown+0xa0/0x220
>>>>     kernel_restart+0x36/0x90
>>>>     __do_sys_reboot+0x127/0x220
>>>>     ? do_writev+0x76/0x110
>>>>     ? do_writev+0x76/0x110
>>>>     do_syscall_64+0x50/0x6d0
>>>>     entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>> RIP: 0033:0x7fad03531087
>>>> RSP: 002b:00007ffe137cf918 EFLAGS: 00000246 ORIG_RAX: 00000000000000a9
>>>> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fad03531087
>>>> RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
>>>> RBP: 00007ffe137cfac0 R08: 0000000000000069 R09: 0000000000000000
>>>> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>>>> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>>     </TASK>
>>>> INFO: task systemd-shutdow:1 is blocked on a mutex likely owned by task kworker/4:1:16648.
>>>
>>> This explains why the shutdown stalled.
>>>
>>>> INFO: task kworker/4:2:1520 blocked for more than 360 seconds.
>>>>          Tainted: G S         O        6.18.3+ #33
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:kworker/4:2     state:D stack:0     pid:1520  tgid:1520  ppid:2      task_flags:0x4288060 flags:0x00080000
>>>> Workqueue: events __usb_queue_reset_device
>>>> Call Trace:
>>>>     <TASK>
>>>>     __schedule+0x46b/0x1140
>>>>     ? schedule_timeout+0x79/0xf0
>>>>     schedule+0x23/0xc0
>>>>     usb_kill_urb+0x7b/0xc0
>>>>     ? housekeeping_affine+0x30/0x30
>>>>     usb_start_wait_urb+0xd6/0x160
>>>>     usb_control_msg+0xe2/0x140
>>>>     hub_port_init+0x647/0xf70
>>>>     usb_reset_and_verify_device+0x191/0x4a0
>>>>     ? device_release_driver_internal+0x4a/0x200
>>>>     usb_reset_device+0x138/0x280
>>>>     __usb_queue_reset_device+0x35/0x50
>>>>     process_one_work+0x17e/0x390
>>>>     worker_thread+0x2c8/0x3e0
>>>>     ? process_one_work+0x390/0x390
>>>>     kthread+0xf7/0x1f0
>>>>     ? kthreads_online_cpu+0x100/0x100
>>>>     ? kthreads_online_cpu+0x100/0x100
>>>>     ret_from_fork+0x114/0x140
>>>>     ? kthreads_online_cpu+0x100/0x100
>>>>     ret_from_fork_asm+0x11/0x20
>>>>     </TASK>
>>>> INFO: task kworker/4:1:16648 blocked for more than 360 seconds.
>>>>          Tainted: G S         O        6.18.3+ #33
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> task:kworker/4:1     state:D stack:0     pid:16648 tgid:16648 ppid:2      task_flags:0x4288060 flags:0x00080000
>>>> Workqueue: events __usb_queue_reset_device
>>>> Call Trace:
>>>>     <TASK>
>>>>     __schedule+0x46b/0x1140
>>>>     schedule+0x23/0xc0
>>>>     usb_kill_urb+0x7b/0xc0
>>>
>>> Kworker failed to kill urb within 300 seconds, so we know the underlying usb
>>> hardware failed to respond within 300s.
>>>
>>> That said, "deadlock" in the subject line is incorrect; this is a task hung
>>> due to a hardware glitch.
> 
> In fact, we do not know whether this was a hardware glitch or a software
> bug.
> 
>> In the case where hardware is not responding, shouldn't we just consider it
>> dead and move on instead of deadlocking the whole OS?
>>
>> In this case, the system was un-plugged from a KVM (usb mouse & keyboard)
>> right around time of shutdown, so I guess that would explain why the USB device
>> didn't respond.
> 
> You misunderstand.  What's failing is the USB host controller on the
> computer, not the attached (or unplugged) USB device.  If the host
> controller really had a hardware glitch then the host controller driver
> should have realized it and moved on.  It seems to me at least as likely
> that the problem is caused by a bug in the host controller driver rather
> than anything wrong with the hardware.
> 
> (Of course, it could be a combination of things going wrong: a glitch in
> the hardware that the driver wasn't expecting and is unable to cope
> with.  But even in that case, the proper solution would be to fix the
> driver since we can't fix the hardware.)
> 
> Unfortunately, we have no way to tell from the log you collected which host
> controller driver encountered this problem.  Nor, unless you can
> replicate the problem, any way to track down exactly where in that
> driver the bug is -- or even any way to tell whether a proposed fix
> actually solves the problem.
> 
> Alan Stern

The system was in the final stage of shutdown, so all we have in this case is serial
console output.  If we set the console to a more verbose mode, would that provide what
you need?

Or maybe 'dmesg' output from when the system boots?
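
Boot-time dmesg would normally show which host controller drivers bind, and
the same information can be read from sysfs on a running system.  A minimal
sketch, assuming the usual sysfs layout where /sys/bus/usb/devices/usbN is a
symlink into the controller's device directory and that directory carries a
'driver' symlink.  The function name usb_host_controllers and its sysfs_root
parameter are made up for illustration:

```python
import os
from pathlib import Path


def usb_host_controllers(sysfs_root="/sys"):
    """Map each USB root hub (usb1, usb2, ...) to (controller, driver).

    Assumption: the root hub's parent directory in sysfs is the host
    controller device, and its 'driver' symlink names the bound driver.
    The driver is None when no 'driver' symlink is present.
    """
    result = {}
    devices = Path(sysfs_root) / "bus" / "usb" / "devices"
    # Root hubs are named usbN; other entries (1-1, 1-0:1.0, ...) are skipped.
    for entry in sorted(devices.glob("usb[0-9]*")):
        controller = entry.resolve().parent  # follow the symlink, go up one level
        driver_link = controller / "driver"
        driver = None
        if driver_link.is_symlink():
            driver = os.path.basename(os.readlink(driver_link))
        result[entry.name] = (controller.name, driver)
    return result
```

Saving the result of usb_host_controllers() alongside each reboot test would
record which driver sits behind each bus, so a later hang can at least be
matched against the set of controllers present on that machine.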

We can try to reproduce, but we need some clues about what to provide to make progress.
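
If the hang does recur, the serial console captures could be compared
mechanically.  The following is a rough sketch, not a definitive tool: it
assumes the hung-task report format quoted above ("INFO: task NAME:PID blocked
for more than N seconds", "Workqueue: ...", and "symbol+0xoff/0xlen" frames),
which may differ on other kernel versions.  The names summarize_hung_tasks and
SCHEDULER_FRAMES are hypothetical.

```python
import re

# Patterns modelled on the hung-task report quoted earlier in this thread.
BLOCKED = re.compile(
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds")
OWNER = re.compile(
    r"INFO: task \S+:(?P<wpid>\d+) is blocked on a mutex likely owned by "
    r"task (?P<owner>\S+):(?P<opid>\d+)")
WORKQUEUE = re.compile(r"Workqueue: \S+ (?P<fn>\S+)")
FRAME = re.compile(r"(?P<guess>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")

# Frames that only say "the task is sleeping"; skipped when picking the wait site.
SCHEDULER_FRAMES = {"__schedule", "schedule", "schedule_preempt_disabled",
                    "schedule_timeout"}


def summarize_hung_tasks(log_text):
    """Return one dict per 'blocked for more than' report found in log_text."""
    reports, owners, current = [], {}, None
    for line in log_text.splitlines():
        m = BLOCKED.search(line)
        if m:
            current = {"task": m["task"], "pid": int(m["pid"]),
                       "seconds": int(m["secs"]),
                       "workqueue": None, "waiting_in": None}
            reports.append(current)
            continue
        m = OWNER.search(line)
        if m:
            owners[int(m["wpid"])] = (m["owner"], int(m["opid"]))
            continue
        if current is None:
            continue
        m = WORKQUEUE.search(line)
        if m:
            current["workqueue"] = m["fn"]
            continue
        m = FRAME.search(line)
        # Keep the first reliable (no leading '?') frame outside the scheduler.
        if (m and not m["guess"] and current["waiting_in"] is None
                and m["sym"] not in SCHEDULER_FRAMES):
            current["waiting_in"] = m["sym"]
    for report in reports:
        report["mutex_owner"] = owners.get(report["pid"])
    return reports
```

Because the patterns are searched rather than anchored, the function tolerates
line prefixes such as console timestamps or mail quote markers.  It only
summarizes what the log already contains; it cannot identify the host
controller driver, which is the piece of information missing here.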

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


