public inbox for stable@vger.kernel.org
From: Michal Pecio <michal.pecio@gmail.com>
To: Desnes Nunes <desnesn@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
	gregkh@linuxfoundation.org, mathias.nyman@intel.com,
	stable@vger.kernel.org
Subject: Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
Date: Sun, 3 May 2026 21:31:11 +0200
Message-ID: <20260503213111.117db3a1.michal.pecio@gmail.com>
In-Reply-To: <CACaw+ew8uV5g1G-6qZGtVBEYZ3k+fvFrOq3XMyq-Nuhbq5mdnA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1710 bytes --]

On Sun, 3 May 2026 13:20:38 -0300, Desnes Nunes wrote:
> Yes, same patched binary on the main kernel and kdump kernel.

That's not great news, because it suggests that the same HSE could
occur on any kexec, not just kdump. It's unclear why it happens. It
seems that the HC works normally after the initial boot (does it?),
but then kexec-ing breaks it somehow.

I don't think this has anything to do with the Battlemage, because
in the particular case which you shared, the GPU began initializing
*after* HSE had already been logged.

My first wild guess would be that HSE is caused by resetting the IOMMU
while the xHC, unaware of the kexec, continues to DMA to old buffers.
The attached patch logs USBSTS at setup to check for this, and also
tries to explicitly clear HSE, although resetting the HC ought to
clear it too. But HW has bugs...

So it may not help, but with some luck it will, or at least it may
offer some hint about when things go wrong.

> So, I confirm that this patch, which checks for HSE or HCE, indeed
> fixes the bug without having to rely on a
> wait_for_completion_timeout():
> 
> # grep -i HSE -A5 kexec-dmesg.log
> [Sun May  3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout, USBSTS: 0x00000015 HCHalted HSE PCD
> [Sun May  3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing
> [Sun May  3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller not responding, assume dead
> [Sun May  3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up
> [Sun May  3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while assigning device slot ID: Command Aborted

Thanks for testing; that's what the patch was intended to do.
There is no lockup, but of course the chip doesn't work afterwards.

Regards,
Michal

[-- Attachment #2: xhci-clear-hse.patch --]
[-- Type: text/x-patch, Size: 1131 bytes --]

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 849a568d0e63..c0f3d04c6241 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5492,6 +5492,8 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 	struct device		*dev = hcd->self.sysdev;
 	int			retval;
 	u32			hcs_params1;
+	u32			usbsts;
+	char			str[XHCI_MSG_MAX];
 
 	/* Accept arbitrarily long scatter-gather lists */
 	hcd->self.sg_tablesize = ~0;
@@ -5550,11 +5552,19 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 		xhci->quirks |= XHCI_LINK_TRB_QUIRK;
 	}
 
+	usbsts = readl(&xhci->op_regs->status);
+	xhci_info(xhci, "gen_setup old USBSTS %s\n", xhci_decode_usbsts(str, usbsts));
 	/* Make sure the HC is halted. */
 	retval = xhci_halt(xhci);
 	if (retval)
 		return retval;
 
+	usbsts = readl(&xhci->op_regs->status);
+	if (usbsts & STS_FATAL)
+		writel(STS_FATAL, &xhci->op_regs->status);
+	usbsts = readl(&xhci->op_regs->status);
+	xhci_info(xhci, "gen_setup new USBSTS %s\n", xhci_decode_usbsts(str, usbsts));
+
 	xhci_zero_64b_regs(xhci);
 
 	xhci_dbg(xhci, "Resetting HCD\n");


Thread overview: 13+ messages
2026-04-30  1:48 [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock Desnes Nunes
2026-04-30  8:48 ` Michal Pecio
2026-04-30 17:27   ` Desnes Nunes
2026-04-30 21:54     ` Michal Pecio
2026-05-01 14:09       ` Desnes Nunes
2026-05-02  9:46         ` [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout Michal Pecio
2026-05-02 11:38           ` Desnes Nunes
2026-05-02 21:55             ` Michal Pecio
2026-05-03  3:36               ` Desnes Nunes
2026-05-03  5:17                 ` Michal Pecio
2026-05-03 16:20                   ` Desnes Nunes
2026-05-03 19:31                     ` Michal Pecio [this message]
2026-05-04  7:31                       ` Michal Pecio
