From: Michal Pecio <michal.pecio@gmail.com>
To: Desnes Nunes <desnesn@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
gregkh@linuxfoundation.org, mathias.nyman@intel.com,
stable@vger.kernel.org
Subject: Re: [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock
Date: Thu, 30 Apr 2026 10:48:50 +0200
Message-ID: <20260430104850.352bd946.michal.pecio@gmail.com>
In-Reply-To: <20260430014817.2006885-1-desnesn@redhat.com>

On Wed, 29 Apr 2026 22:48:17 -0300, Desnes Nunes wrote:
> The following deadlock in the usb subsystem can be triggered during kdump:
>
> systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 is taking a long time
> dracut-initqueue[432]: Timed out while waiting for udev queue to empty.
> systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 killed
> systemd-udevd[402]: usb3: Worker [419] terminated by signal 9 (KILL).
> ...
> kdump[720]: saving vmcore complete
> ...
> systemd-shutdown[1]: Rebooting.
> INFO: task kworker/0:6:76 blocked for more than 122 seconds.
That's suspiciously long indeed.
> Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
Pretty old kernel, and a distribution kernel to boot.
Have you tried 7.x? Does the bug still exist there?
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:kworker/0:6 state:D stack:0 pid:76 tgid:76 ppid:2 task_flags:0x4208060 flags:0x00004000
> Workqueue: usb_hub_wq hub_event
> Call Trace:
> <TASK>
> __schedule+0x2a5/0x630
> schedule+0x27/0x80
> schedule_timeout+0xbf/0x100
> __wait_for_common+0x95/0x1b0
> ? __pfx_schedule_timeout+0x10/0x10
> xhci_alloc_dev+0x9e/0x290
> usb_alloc_dev+0x77/0x3a0
> hub_port_connect+0x293/0x9a0
> hub_port_connect_change+0x94/0x260
> port_event+0x4d1/0x7f0
> hub_event+0x16f/0x480
> process_one_work+0x174/0x330
> worker_thread+0x256/0x3a0
> ? __pfx_worker_thread+0x10/0x10
> kthread+0xfa/0x240
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x31/0x50
> ? __pfx_kthread+0x10/0x10
> ret_from_fork_asm+0x1a/0x30
> </TASK>
> INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
> Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:systemd-shutdow state:D stack:0 pid:1 tgid:1 ppid:0 task_flags:0x400100 flags:0x00000002
> Call Trace:
> <TASK>
> __schedule+0x2a5/0x630
> schedule+0x27/0x80
> schedule_preempt_disabled+0x15/0x30
> __mutex_lock.constprop.0+0x497/0x860
> device_shutdown+0xac/0x190
> kernel_restart+0x3a/0x70
> __do_sys_reboot+0x146/0x240
> do_syscall_64+0x7d/0x160
> ? devkmsg_write.cold+0x24/0x4a
> ? update_load_avg+0x7f/0x730
> ? __dequeue_entity+0x3ec/0x4a0
> ? update_load_avg+0x7f/0x730
> ? pick_next_task_fair+0x1e6/0x330
> ? finish_task_switch.isra.0+0x97/0x2a0
> ? rseq_get_rseq_cs+0x1d/0x220
> ? rseq_ip_fixup+0x8d/0x1d0
> ? arch_exit_to_user_mode_prepare.isra.0+0xa5/0xd0
> ? syscall_exit_to_user_mode+0x32/0x190
> ? do_syscall_64+0x89/0x160
> ? handle_mm_fault+0x110/0x370
> ? do_user_addr_fault+0x606/0x830
> ? exc_page_fault+0x7f/0x150
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> RIP: 0033:0x7f32517d9917
> RSP: 002b:00007ffc018d4fb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32517d9917
> RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
> RBP: 00007ffc018d5130 R08: 0000000000000069 R09: 00000000ffffffff
> R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
> R13: 0000000000000000 R14: 00007ffc018d5258 R15: 0000000000000000
> </TASK>
>
> During crashkernel's boot, hub_event() takes usb_lock_device(hdev) of the
> root hub and keeps it for the whole hub processing loop, since it calls
> hub_port_connect() -> usb_alloc_dev() -> xhci_alloc_dev(). If during kdump
> another device (e.g., a mis-initialized dGPU) hogs interrupts or DMAs, the
> TRB_ENABLE_SLOT command will be blocked from completion in time, moving
> the HC to an unstable condition (e.g., HSE in USBSTS).
What specifically have you seen?
If you have actually observed HSE (how?), maybe xhci-hcd could detect
it automatically by the same means and clean up immediately.
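To illustrate the kind of check meant here, a rough sketch in plain C outside the kernel. The bit positions follow the xHCI USBSTS register layout (HSE is bit 2, HCE is bit 12); the macro and function names are made up for this example and are not the driver's own, and treating an all-ones read as a vanished device is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration; bit positions per the xHCI USBSTS layout. */
#define SKETCH_STS_HSE (UINT32_C(1) << 2)   /* Host System Error */
#define SKETCH_STS_HCE (UINT32_C(1) << 12)  /* Host Controller Error */

/* Returns true if a USBSTS value read from the controller says it is unusable. */
static bool sketch_host_is_broken(uint32_t usbsts)
{
	/* Assumption: an all-ones read means the device is no longer reachable. */
	if (usbsts == UINT32_MAX)
		return true;

	return (usbsts & (SKETCH_STS_HSE | SKETCH_STS_HCE)) != 0;
}
```

A command timeout handler could run such a check first and skip straight to cleanup when it returns true, instead of attempting an abort on a controller that has already stopped.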
> After vmcore gets captured, init calls device_shutdown() trying to
> shut down the hub device, by also trying to take the same lock still
> held by the hub kworker task.
>
> Avoid the deadlock by adding a 2x timeout for command completion
nit: not a deadlock if X waits for Y and Y is just stuck by itself.
> before calling xhci_hc_died(). This gives enough time before marking
> the host unstable, dying and calling xhci_cleanup_command_queue();
> which unblocks the hub worker into releasing the lock, allowing
> device_shutdown() to proceed.
Many functions wait for command completion without timeouts.
Patch this one and you will get stuck in the next.
This shouldn't be happening in the first place. If a command doesn't
complete normally in time then xhci_handle_command_timeout() should
abort it, and if that times out too, then hc_died().
So I'm not sure why this hasn't happened here.
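The expected escalation, as a toy user-space model rather than driver code. Every name below is hypothetical, and the two booleans simply stand in for "the command completed before the first timeout" and "the abort completed before the second one". The point it tries to show is that every path ends with the waiter released, so a task sleeping on the completion should not stay blocked forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these names are the xhci-hcd API. */
enum cmd_outcome { CMD_COMPLETED, CMD_ABORTED, HOST_DIED };

struct fake_host {
	bool completes_in_time; /* command finishes before the first timeout */
	bool abort_succeeds;    /* abort finishes before the second timeout */
	bool dead;              /* host has been declared dead */
	int pending_waiters;    /* tasks sleeping on a command completion */
};

/* Declaring the host dead also completes every queued command. */
static void fake_hc_died(struct fake_host *h)
{
	h->dead = true;
	h->pending_waiters = 0;
}

/* Walk one command through: normal completion, then abort, then host death. */
static enum cmd_outcome fake_run_command(struct fake_host *h)
{
	assert(!h->dead);
	h->pending_waiters++;

	if (h->completes_in_time) {
		h->pending_waiters--;
		return CMD_COMPLETED;
	}
	if (h->abort_succeeds) {
		h->pending_waiters--;
		return CMD_ABORTED;
	}
	fake_hc_died(h);
	return HOST_DIED;
}
```

In the report above, the hub worker behaves as if none of the three exits was ever reached, which is why the debug logs matter.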
Is it reproducible? Can you try again with debug logs?
echo 'module xhci_hcd +p' >/proc/dynamic_debug/control
Regards,
Michal