Date: Thu, 30 Apr 2026 23:54:53 +0200
From: Michal Pecio
To: Desnes Nunes
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org, gregkh@linuxfoundation.org, mathias.nyman@intel.com, stable@vger.kernel.org
Subject: Re: [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock
Message-ID: <20260430235453.2288c973.michal.pecio@gmail.com>
References: <20260430014817.2006885-1-desnesn@redhat.com> <20260430104850.352bd946.michal.pecio@gmail.com>

On Thu, 30 Apr 2026 14:27:59 -0300, Desnes Nunes wrote:
> As for how I saw HSE, while testing the patch before submission, since
> I already had the xhci lock, I just added a read of the usbsts before
> calling xhci_hc_died(xhci):
>
> ...
> -	wait_for_completion(command->completion);
> -	slot_id = command->slot_id;
> +	if (!wait_for_completion_timeout(command->completion,
> +					 msecs_to_jiffies(2 * command->timeout_ms))) {
> +		spin_lock_irqsave(&xhci->lock, tflags);
> +		usbsts = readl(&xhci->op_regs->status);
> +		xhci_err(xhci,
> +			 "TRB_ENABLE_SLOT: no command completion after %lums, USBSTS:%s\n",
> +			 2 * command->timeout_ms,
> +			 xhci_decode_usbsts(ststr, usbsts));
> +		xhci_hc_died(xhci);
> +		spin_unlock_irqrestore(&xhci->lock, tflags);
> +	}
> ...
>
> This debug version of the patch printed:
>
> [ 17.481330] xhci_hcd 0000:80:14.0: TRB_ENABLE_SLOT: no command completion after 10000ms, USBSTS: 0x00000015 HCHalted HSE PCD

OK, so this chip is busted at that point.
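For readers decoding the register dump: USBSTS 0x00000015 has bits 0, 2 and 4 set, which the xHCI specification defines as HCHalted, Host System Error (HSE) and Port Change Detect (PCD). A halted controller reporting HSE has stopped executing and, in general, cannot make progress without a reset, so the pending command never completes. The sketch below is a hypothetical user-space decoder; the names `usbsts_bits` and `decode_usbsts()` are illustrative and not kernel API, and the output format only imitates what the driver's xhci_decode_usbsts() prints.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* USBSTS bit positions per the xHCI specification (USBSTS register).
 * Illustrative user-space table, not the kernel's definitions. */
static const struct {
	uint32_t mask;
	const char *name;
} usbsts_bits[] = {
	{ 1u << 0,  "HCHalted" }, /* controller halted */
	{ 1u << 2,  "HSE" },      /* Host System Error */
	{ 1u << 3,  "EINT" },     /* event interrupt */
	{ 1u << 4,  "PCD" },      /* Port Change Detect */
	{ 1u << 8,  "SSS" },      /* Save State Status */
	{ 1u << 9,  "RSS" },      /* Restore State Status */
	{ 1u << 10, "SRE" },      /* Save/Restore Error */
	{ 1u << 11, "CNR" },      /* Controller Not Ready */
	{ 1u << 12, "HCE" },      /* Host Controller Error */
};

/* Write "0x%08x" followed by the names of all set bits into buf.
 * Stops appending once buf is full; always returns buf. */
static const char *decode_usbsts(char *buf, size_t len, uint32_t usbsts)
{
	assert(buf != NULL && len > 0);
	int n = snprintf(buf, len, "0x%08x", (unsigned int)usbsts);

	for (size_t i = 0; i < sizeof(usbsts_bits) / sizeof(usbsts_bits[0]); i++) {
		if (!(usbsts & usbsts_bits[i].mask))
			continue;
		if (n < 0 || (size_t)n >= len)
			break; /* output already truncated */
		n += snprintf(buf + n, len - (size_t)n, " %s", usbsts_bits[i].name);
	}
	return buf;
}
```

With a 128-byte buffer, `decode_usbsts(buf, sizeof buf, 0x15)` yields "0x00000015 HCHalted HSE PCD", the same set of flags as the quoted log line.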
But it might still be better to improve xhci_handle_command_timeout()
to deal with this and complete the command, instead of patching here
and in other similar places.

> Actually, from the beginning of all my debugging I already had
> `usbcore.dyndbg=+p xhci_hcd.dyndbg=+p xhci_pci.dyndbg=+p` on the
> kernel cmdline, as well as on the crashkernel's
> KDUMP_COMMANDLINE_APPEND at /etc/sysconfig/kdump.
>
> On crashkernel's kexec-dmesg of the unpatched kernel I see multiple
> doorbell rings stating the HSE:
>
> ...
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout, USBSTS: 0x00000015 HCHalted HSE PCD
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout on stopped ring
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Turn aborted command 000000005921b827 to no-op
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: // Ding dong!
> ...

Hmm, the "Command timeout on stopped ring" case doesn't obviously lead to
any immediate command completion, and ringing the command doorbell under
HSE won't make any progress. Maybe that's the bug.

Could you post the full crash kernel dmesg up to that point? I'm not
sure how it got there.

When xhci_handle_command_timeout() logs USBSTS, does it help to add:

	if (usbsts & STS_FATAL) {
		xhci_halt(xhci);
		xhci_hc_died(xhci);
		goto time_out_completed;
	}

It may not be a perfect solution (race conditions?), but if it works,
it would hint that we are on the right track.

Regards,
Michal
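The branch order discussed in this thread can be modelled in a few lines. This is a hypothetical, heavily simplified user-space model of xhci_handle_command_timeout(), not the kernel code: it ignores the stop-endpoint special case, the driver's software cmd_ring_state and the host-removal path, and the enum values and `model_timeout()` are illustrative names. It assumes that a halted controller reports the Command Ring Running bit (CRR, bit 3 of CRCR) as clear, so without the proposed STS_FATAL check an HSE timeout falls through to the "stopped ring" branch, which only turns the command into a no-op and rings the doorbell, as in the quoted log. The early exit instead declares the controller dead; in current kernels xhci_hc_died() also completes the queued commands via xhci_cleanup_command_queue(), which is what would unblock the waiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register bits per the xHCI specification; names mirror the driver's. */
#define STS_FATAL        (1u << 2)   /* USBSTS.HSE: Host System Error */
#define CMD_RING_RUNNING (1ull << 3) /* CRCR.CRR: command ring running */

/* Hypothetical, simplified outcomes of a command timeout. */
enum timeout_action {
	ACT_HC_DIED,       /* halt the HC and declare it dead */
	ACT_ABORT_RING,    /* abort the running command ring */
	ACT_NOOP_AND_RING, /* no-op the command and ring the doorbell */
};

/*
 * Simplified model of the branch order in xhci_handle_command_timeout().
 * usbsts: value read from USBSTS; crcr: value read from CRCR;
 * check_fatal: whether the proposed STS_FATAL early exit is present.
 */
static enum timeout_action model_timeout(uint32_t usbsts, uint64_t crcr,
					 bool check_fatal)
{
	/* Proposed check, placed right after USBSTS is read and logged. */
	if (check_fatal && (usbsts & STS_FATAL))
		return ACT_HC_DIED;

	/* All-ones register reads usually mean the device has gone away. */
	if (crcr == ~(uint64_t)0)
		return ACT_HC_DIED;

	/* Ring still running: abort it so the command gets completed. */
	if (crcr & CMD_RING_RUNNING)
		return ACT_ABORT_RING;

	/* "Command timeout on stopped ring": nothing here completes the
	 * command if the controller is halted with HSE set. */
	return ACT_NOOP_AND_RING;
}
```

With the logged USBSTS value and a stopped ring, `model_timeout(0x15, 0, false)` returns ACT_NOOP_AND_RING, while `model_timeout(0x15, 0, true)` returns ACT_HC_DIED.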