[PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock

From: Desnes Nunes <desnesn@redhat.com>
To: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org
Cc: gregkh@linuxfoundation.org, mathias.nyman@intel.com,
	Desnes Nunes <desnesn@redhat.com>,
	stable@vger.kernel.org
Subject: [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock
Date: Wed, 29 Apr 2026 22:48:17 -0300
Message-ID: <20260430014817.2006885-1-desnesn@redhat.com>

The following deadlock in the USB subsystem can be triggered during kdump:

systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 is taking a long time
dracut-initqueue[432]: Timed out while waiting for udev queue to empty.
systemd-udevd[402]: usb3: Worker [419] processing SEQNUM=2194 killed
systemd-udevd[402]: usb3: Worker [419] terminated by signal 9 (KILL).
...
kdump[720]: saving vmcore complete
...
systemd-shutdown[1]: Rebooting.
INFO: task kworker/0:6:76 blocked for more than 122 seconds.
      Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/0:6     state:D stack:0     pid:76    tgid:76    ppid:2      task_flags:0x4208060 flags:0x00004000
Workqueue: usb_hub_wq hub_event
Call Trace:
 <TASK>
 __schedule+0x2a5/0x630
 schedule+0x27/0x80
 schedule_timeout+0xbf/0x100
 __wait_for_common+0x95/0x1b0
 ? __pfx_schedule_timeout+0x10/0x10
 xhci_alloc_dev+0x9e/0x290
 usb_alloc_dev+0x77/0x3a0
 hub_port_connect+0x293/0x9a0
 hub_port_connect_change+0x94/0x260
 port_event+0x4d1/0x7f0
 hub_event+0x16f/0x480
 process_one_work+0x174/0x330
 worker_thread+0x256/0x3a0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0xfa/0x240
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x31/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>
INFO: task systemd-shutdow:1 blocked for more than 122 seconds.
      Not tainted 6.12.0-223.2443_2475543665.el10.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:systemd-shutdow state:D stack:0     pid:1     tgid:1     ppid:0      task_flags:0x400100 flags:0x00000002
Call Trace:
 <TASK>
 __schedule+0x2a5/0x630
 schedule+0x27/0x80
 schedule_preempt_disabled+0x15/0x30
 __mutex_lock.constprop.0+0x497/0x860
 device_shutdown+0xac/0x190
 kernel_restart+0x3a/0x70
 __do_sys_reboot+0x146/0x240
 do_syscall_64+0x7d/0x160
 ? devkmsg_write.cold+0x24/0x4a
 ? update_load_avg+0x7f/0x730
 ? __dequeue_entity+0x3ec/0x4a0
 ? update_load_avg+0x7f/0x730
 ? pick_next_task_fair+0x1e6/0x330
 ? finish_task_switch.isra.0+0x97/0x2a0
 ? rseq_get_rseq_cs+0x1d/0x220
 ? rseq_ip_fixup+0x8d/0x1d0
 ? arch_exit_to_user_mode_prepare.isra.0+0xa5/0xd0
 ? syscall_exit_to_user_mode+0x32/0x190
 ? do_syscall_64+0x89/0x160
 ? handle_mm_fault+0x110/0x370
 ? do_user_addr_fault+0x606/0x830
 ? exc_page_fault+0x7f/0x150
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f32517d9917
RSP: 002b:00007ffc018d4fb8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a9
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f32517d9917
RDX: 0000000001234567 RSI: 0000000028121969 RDI: 00000000fee1dead
RBP: 00007ffc018d5130 R08: 0000000000000069 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 00007ffc018d5258 R15: 0000000000000000
 </TASK>

While the crash kernel boots, hub_event() takes usb_lock_device(hdev) on
the root hub and holds it for the whole hub processing loop, including
the call chain hub_port_connect() -> usb_alloc_dev() -> xhci_alloc_dev().
If another device (e.g., a mis-initialized dGPU) hogs interrupts or DMA
during kdump, the TRB_ENABLE_SLOT command does not complete in time and
the HC moves to an unstable condition (e.g., HSE in USBSTS), so
xhci_alloc_dev() keeps waiting with the lock held. After the vmcore is
captured, init calls device_shutdown(), which tries to shut down the hub
device and blocks on the same lock still held by the hub kworker task.

Avoid the deadlock by bounding the wait for command completion to twice
the command timeout and calling xhci_hc_died() if that wait expires. The
2x margin gives the command enough time to complete before the host is
marked as dying and xhci_cleanup_command_queue() is called, which unblocks
the hub worker so it releases the lock and device_shutdown() can proceed.
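
For illustration only, the bounded wait can be modeled in plain user-space
C. This is a rough sketch, not kernel code: struct cmd_model, struct
hc_model, wait_bounded(), hc_died() and alloc_dev_model() are hypothetical
names, and a simulated tick counter stands in for jiffies. It assumes that
the cleanup run on timeout completes the pending command with no slot
assigned.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a queued command; not struct xhci_command. */
struct cmd_model {
	unsigned int timeout_ticks;	/* per-command timeout, in ticks */
	long completes_at;		/* tick of the hardware reply, -1 = never */
	bool done;
	int slot_id;			/* 0 means no slot was assigned */
};

/* Hypothetical model of the host controller state. */
struct hc_model {
	bool dying;
};

/*
 * Stand-in for a timed completion wait: returns the remaining ticks
 * (at least 1) if the command completed, or 0 if the budget ran out.
 */
static unsigned long wait_bounded(struct cmd_model *cmd, unsigned long budget)
{
	for (unsigned long now = 0; now < budget; now++) {
		if (cmd->completes_at >= 0 &&
		    (unsigned long)cmd->completes_at <= now) {
			cmd->done = true;
			cmd->slot_id = 1;
			return budget - now;
		}
	}
	return 0;
}

/* Assumed cleanup: mark the host dying and abort the pending command. */
static void hc_died(struct hc_model *hc, struct cmd_model *cmd)
{
	hc->dying = true;
	cmd->done = true;
	cmd->slot_id = 0;
}

/* Returns the slot id or 0; never waits longer than 2x the timeout. */
static int alloc_dev_model(struct hc_model *hc, struct cmd_model *cmd)
{
	assert(cmd->timeout_ticks > 0);

	if (!wait_bounded(cmd, 2UL * cmd->timeout_ticks))
		hc_died(hc, cmd);

	return cmd->slot_id;
}
```

In this model a command that never completes makes alloc_dev_model() return
0 after 2 * timeout_ticks instead of blocking forever, so a caller that
holds a lock around it always gets the chance to release that lock.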

Fixes: c311e391a7efd ("xhci: rework command timeout and cancellation,")
Cc: stable@vger.kernel.org
Signed-off-by: Desnes Nunes <desnesn@redhat.com>
---
 drivers/usb/host/xhci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index a54f5b57f205..55250fe814c9 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -4219,7 +4219,7 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
 	struct xhci_virt_device *vdev;
 	struct xhci_slot_ctx *slot_ctx;
-	unsigned long flags;
+	unsigned long flags, tflags;
 	int ret, slot_id;
 	struct xhci_command *command;
 
@@ -4238,9 +4238,14 @@ int xhci_alloc_dev(struct usb_hcd *hcd, struct usb_device *udev)
 	xhci_ring_cmd_db(xhci);
 	spin_unlock_irqrestore(&xhci->lock, flags);
 
-	wait_for_completion(command->completion);
-	slot_id = command->slot_id;
+	if (!wait_for_completion_timeout(command->completion,
+					 msecs_to_jiffies(2 * command->timeout_ms))) {
+		spin_lock_irqsave(&xhci->lock, tflags);
+		xhci_hc_died(xhci);
+		spin_unlock_irqrestore(&xhci->lock, tflags);
+	}
 
+	slot_id = command->slot_id;
 	if (!slot_id || command->status != COMP_SUCCESS) {
 		xhci_err(xhci, "Error while assigning device slot ID: %s\n",
 			 xhci_trb_comp_code_string(command->status));
-- 
2.51.0
