* [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock
@ 2026-04-30  1:48 Desnes Nunes
From: Desnes Nunes @ 2026-04-30  1:48 UTC (permalink / raw)
  To: linux-kernel, linux-usb; +Cc: gregkh, mathias.nyman, Desnes Nunes, stable

The following deadlock in the USB subsystem can be triggered during kdump:

systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 is taking a long time
dracut-initqueue[432]: Timed out while waiting for udev queue to empty.
systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 killed
systemd-udevd[402]: usb3: Worker [419] terminated by signal 9 (KILL).
...
kdump[720]: saving vmcore complete
...
systemd-shutdown[1]: Rebooting.
INFO: task kworker/0:6:76 blocked for more than 122 seconds.
      Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:6     state:D stack:0     pid:76    tgid:76    ppid:2      task_flags:0x4208060 flags:0x00004000
Workqueue: usb_hub_wq hub_event
Call Trace:
 <TASK>
 __schedule+0x2a5/0x630
 schedule+0x27/0x80
 schedule_timeout+0xbf/0x100
 __wait_for_common+0x95/0x1b0
 ? __pfx_schedule_timeout+0x10/0x10
 xhci_alloc_dev+0x9e/0x290
 usb_alloc_dev+0x77/0x3a0
 hub_port_connect+0x293/0x9a0
 hub_port_connect_change+0x94/0x260
 port_event+0x4d1/0x7f0
 hub_event+0x16f/0x480
 process_one_work+0x174/0x330
 worker_thread+0x256/0x3a0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xfa/0x240
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
      Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x2a5/0x630
 schedule+0x27/0x80
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x497/0x860
 device_shutdown+0xac/0x190
 kernel_restart+0x3a/0x70
 __do_sys_reboot+0x146/0x240
 do_syscall_64+0x7d/0x160
 ? devkmsg_write.cold+0x24/0x4a
 ? update_load_avg+0x7f/0x730
 ? __dequeue_entity+0x3ec/0x4a0
 ? update_load_avg+0x7f/0x730
 ? pick_next_task_fair+0x1e6/0x330
 ? finish_task_switch.isra.0+0x97/0x2a0
 ? rseq_get_rseq_cs+0x1d/0x220
 ? rseq_ip_fixup+0x8d/0x1d0
 ? arch_exit_to_user_mode_prepare.isra.0+0xa5/0xd0
 ? syscall_exit_to_user_mode+0x32/0x190
 ? do_syscall_64+0x89/0x160
 ? handle_mm_fault+0x110/0x370
 ? do_user_addr_fault+0x606/0x830
 ? exc_page_fault+0x7f/0x150
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f32517d9917
RSP: 002b:00007ffc018d4fb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32517d9917
RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
RBP: 00007ffc018d5130 R08: 0000000000000069 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffc018d5258 R15: 0000000000000000
 </TASK>

During the crash kernel's boot, hub_event() takes usb_lock_device(hdev) on the
root hub and holds it for the whole hub processing loop, including the call
chain hub_port_connect() -> usb_alloc_dev() -> xhci_alloc_dev(). If, during
kdump, another device (e.g., a mis-initialized dGPU) hogs interrupts or DMA,
the TRB_ENABLE_SLOT command does not complete in time and the HC is left in
an unstable state (e.g., HSE set in USBSTS), so xhci_alloc_dev() keeps
waiting in wait_for_completion(). After the vmcore is captured, init calls
device_shutdown(), which tries to shut down the hub device and blocks on the
same lock still held by the hub kworker task.

Avoid the deadlock by bounding the wait for command completion to twice the
command timeout and calling xhci_hc_died() if it expires. This still leaves
time for the regular command timeout handling to act before the host is
marked as dying and xhci_cleanup_command_queue() aborts the pending command;
xhci_alloc_dev() then takes its error path, the hub worker releases the lock,
and device_shutdown() can proceed.

Fixes: c311e391a7efd ("xhci: rework command timeout and cancellation,")
Cc: stable@vger.kernel.org
Signed-off-by: Desnes Nunes <desnesn@redhat.com>
---
 drivers/usb/host/xhci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index a54f5b57f205..55250fe814c9 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4219,7 +4219,7 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
 	struct xhci_virt_device *vdev;
 	struct xhci_slot_ctx *slot_ctx;
-	unsigned long flags;
+	unsigned long flags, tflags;
 	int ret, slot_id;
 	struct xhci_command *command;
 
@@ -4238,9 +4238,14 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
-	wait_for_completion(command->completion);
-	slot_id = command->slot_id;
+	if (!wait_for_completion_timeout(command->completion,
+					 msecs_to_jiffies(2 * command->timeout_ms))) {
+		spin_lock_irqsave(&xhci->lock, tflags);
+		xhci_hc_died(xhci);
+		spin_unlock_irqrestore(&xhci->lock, tflags);
+	}
 
+	slot_id = command->slot_id;
 	if (!slot_id || command->status != COMP_SUCCESS) {
 		xhci_err(xhci, "Error while assigning device slot ID: %s\n",
 			 xhci_trb_comp_code_string(command->status));
-- 
2.51.0




Thread overview: 25+ messages
2026-04-30  1:48 [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock Desnes Nunes
2026-04-30  8:48 ` Michal Pecio
2026-04-30 17:27   ` Desnes Nunes
2026-04-30 21:54     ` Michal Pecio
2026-05-01 14:09       ` Desnes Nunes
2026-05-02  9:46         ` [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout Michal Pecio
2026-05-02 11:38           ` Desnes Nunes
2026-05-02 21:55             ` Michal Pecio
2026-05-03  3:36               ` Desnes Nunes
2026-05-03  5:17                 ` Michal Pecio
2026-05-03 16:20                   ` Desnes Nunes
2026-05-03 19:31                     ` Michal Pecio
2026-05-04  7:31                       ` Michal Pecio
2026-05-18  6:33                         ` Michal Pecio
2026-05-20  4:59                           ` Desnes Nunes
2026-05-22  9:03                             ` Michal Pecio
2026-05-22 20:45                               ` Desnes Nunes
2026-05-23  0:29                                 ` Michal Pecio
2026-05-23  3:47                                   ` Desnes Nunes
2026-05-23  8:28                                     ` Michal Pecio
2026-05-27  3:47                                       ` Desnes Nunes
2026-05-27  8:32                                         ` Michal Pecio
2026-06-10 15:32                                           ` Desnes Nunes
2026-06-18  0:57                                             ` Desnes Nunes
2026-06-18  4:46                                               ` Intel IOMMU bug: xHCI faults during crash kernel boot Michal Pecio
