From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 3 May 2026 21:31:11 +0200
From: Michal Pecio
To: Desnes Nunes
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
 gregkh@linuxfoundation.org, mathias.nyman@intel.com,
 stable@vger.kernel.org
Subject: Re: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on
 command timeout
Message-ID: <20260503213111.117db3a1.michal.pecio@gmail.com>
References: <20260430014817.2006885-1-desnesn@redhat.com>
 <20260430104850.352bd946.michal.pecio@gmail.com>
 <20260430235453.2288c973.michal.pecio@gmail.com>
 <20260502114644.76e6b5a3.michal.pecio@gmail.com>
 <20260502235517.089ba5bf.michal.pecio@gmail.com>
 <20260503071749.6abda137.michal.pecio@gmail.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MP_/RJPkfCvaUEZTz8gq3rnPbGG"

--MP_/RJPkfCvaUEZTz8gq3rnPbGG
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

On Sun, 3 May 2026 13:20:38 -0300, Desnes Nunes wrote:
> Yes, same patched binary on the main kernel and kdump kernel.

That's not great news, because it seems that the same HSE could occur
on any kexec, not just kdump. It's unclear why it happens. It seems that
after initial boot the HC works normally (does it?), but then kexec-ing
breaks it somehow.

I don't think this has anything to do with the Battlemage, because in
the particular case which you shared, the GPU began initialization
*after* HSE had already been logged.

My first wild guess would be that HSE is caused by resetting the IOMMU
while the xHC is unaware of kexec and continues to DMA old buffers.
Attached patch checks for this and also tries to explicitly clear HSE,
although resetting ought to clear it too. But HW has bugs...

So it may not help, but maybe it will if we are lucky, or at least it
may offer some hint about when things go wrong.

> So, I confirm that this patch, which checks for HSE or HCE indeed
> fixes the bug, without having to rely to a
> wait_for_completion_timeout():
>
> # grep -i HSE -A5 kexec-dmesg.log
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Command timeout,
> USBSTS: 0x00000015 HCHalted HSE PCD
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: kill the damn thing
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: xHCI host controller
> not responding, assume dead
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: HC died; cleaning up
> [Sun May 3 11:37:36 2026] xhci_hcd 0000:80:14.0: Error while
> assigning device slot ID: Command Aborted

Thanks for testing, that's what the patch was intended to do.
There is no lockup, but of course the chip doesn't work afterwards.

Regards,
Michal

--MP_/RJPkfCvaUEZTz8gq3rnPbGG
Content-Type: text/x-patch
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=xhci-clear-hse.patch

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 849a568d0e63..c0f3d04c6241 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -5492,6 +5492,8 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 	struct device		*dev = hcd->self.sysdev;
 	int			retval;
 	u32			hcs_params1;
+	u32			usbsts;
+	char			str[XHCI_MSG_MAX];
 
 	/* Accept arbitrarily long scatter-gather lists */
 	hcd->self.sg_tablesize = ~0;
@@ -5550,11 +5552,19 @@ int xhci_gen_setup(struct usb_hcd *hcd, xhci_get_quirks_t get_quirks)
 		xhci->quirks |= XHCI_LINK_TRB_QUIRK;
 	}
 
+	usbsts = readl(&xhci->op_regs->status);
+	xhci_info(xhci, "gen_setup old USBSTS %s\n", xhci_decode_usbsts(str, usbsts));
 	/* Make sure the HC is halted. */
 	retval = xhci_halt(xhci);
 	if (retval)
 		return retval;
 
+	usbsts = readl(&xhci->op_regs->status);
+	if (usbsts & STS_FATAL)
+		writel(STS_FATAL, &xhci->op_regs->status);
+	usbsts = readl(&xhci->op_regs->status);
+	xhci_info(xhci, "gen_setup new USBSTS %s\n", xhci_decode_usbsts(str, usbsts));
+
 	xhci_zero_64b_regs(xhci);
 
 	xhci_dbg(xhci, "Resetting HCD\n");

--MP_/RJPkfCvaUEZTz8gq3rnPbGG--