From: Michal Pecio <michal.pecio@gmail.com>
To: Desnes Nunes <desnesn@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
gregkh@linuxfoundation.org, mathias.nyman@intel.com,
stable@vger.kernel.org
Subject: [PATCH RFT RFC] usb: xhci: Kill hosts with HCE or HSE on command timeout
Date: Sat, 2 May 2026 11:46:44 +0200
Message-ID: <20260502114644.76e6b5a3.michal.pecio@gmail.com>
In-Reply-To: <CACaw+ewwM_5eqyGW5=+THwHsYPs7u3NT096AFQdt6x4E6HcWtA@mail.gmail.com>

On Fri, 1 May 2026 11:09:27 -0300, Desnes Nunes wrote:
> On Thu, Apr 30, 2026 at 6:55 PM Michal Pecio <michal.pecio@gmail.com> wrote:
> > When xhci_handle_command_timeout() logs USBSTS, does it help to add:
> >
> > 	if (usbsts & STS_FATAL) {
> > 		xhci_halt(xhci);
> > 		xhci_hc_died(xhci);
> > 		goto time_out_completed;
> > 	}
> >
> > It may not be a perfect solution (race conditions?) but it could hint
> > that we are on the right track, if it works.
>
> This panicked the system as soon as I hit `echo c > /proc/sysrq-trigger`:
>
> [ 141.683476] sysrq: Trigger a crash
> [ 141.686970] Kernel panic - not syncing: sysrq triggered crash

Damn, that sucks. Any chance it's not a problem with my proposed change
but some sort of issue on your side?

Anyway, I think the patch below might cover it. It works for me in the
sense that the bus does get killed, without ill effect. I tested on
VL805, where HSE is easily triggered by disabling XHCI_TRB_OVERFETCH.

However, the patch isn't necessary here: VL805 doesn't clear CRCR.CRR
on HSE, so the normal abort path is taken and times out, which then
leads to hc_died().
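
To make the difference concrete, here is a rough userspace model of how
xhci_handle_command_timeout() picks a branch with and without the patch
below. It is only a sketch: the bit positions follow the USBSTS and CRCR
layouts in the xHCI spec, the enum and function names are made up for
illustration, and the real function has locking and further branches
that are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions per the xHCI spec; names mirror drivers/usb/host/xhci.h */
#define STS_FATAL        (1u << 2)   /* USBSTS.HSE, Host System Error */
#define STS_HCE          (1u << 12)  /* USBSTS.HCE, Host Controller Error */
#define CMD_RING_RUNNING (1u << 3)   /* CRCR.CRR, Command Ring Running */

/* Hypothetical outcome labels, not kernel identifiers */
enum timeout_action { ACT_HC_DIED, ACT_ABORT_RING, ACT_STOPPED_RING };

/*
 * Simplified model of the branch selection on a command timeout.
 * crcr is the raw CRCR read, usbsts the raw USBSTS read,
 * sw_ring_running stands in for CMD_RING_STATE_RUNNING, and
 * patched selects between the old and the proposed check.
 */
static enum timeout_action
model_timeout(uint64_t crcr, uint32_t usbsts, bool sw_ring_running,
	      bool patched)
{
	/* All ones means the register read failed: chip is gone */
	if (crcr == ~(uint64_t)0)
		return ACT_HC_DIED;

	/* Proposed addition: HSE or HCE also means the chip is dead */
	if (patched && (usbsts & (STS_FATAL | STS_HCE)))
		return ACT_HC_DIED;

	/* Ring still running on both sides: try to abort the command */
	if (sw_ring_running && (crcr & CMD_RING_RUNNING))
		return ACT_ABORT_RING;

	/* Otherwise we end up in the "stopped ring" handling */
	return ACT_STOPPED_RING;
}
```

With CRR still set and HSE raised (the VL805 case), the unpatched model
returns ACT_ABORT_RING and the patched one returns ACT_HC_DIED. With CRR
cleared and HSE raised, the unpatched model falls through to the
stopped-ring branch instead of killing the host.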

Can somebody serious confirm whether this issue actually exists in the
first place, and whether the patch solves it?

Hello Red Hat, anyone alive there? Or only stochastic parrots?

Mathias, do you remember what the point of the "Command timeout on
stopped ring" branch is? Can it happen in any case other than a dead chip?

I also wonder if it wouldn't make sense to just hc_died() on every
command timeout except Address Device. We rely on Stop Endpoint
timeouts to kill chips which go unresponsive without setting HCE/HSE,
because sooner or later somebody loses patience and unlinks an URB,
but this story (real or hallucinated, but plausible) shows that this
may not help when there are no devices created yet.
---
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index e5823650850a..3041deb67b57 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1761,13 +1761,15 @@ void xhci_handle_command_timeout(struct work_struct *work)
 	/* mark this command to be cancelled */
 	xhci->current_cmd->status = COMP_COMMAND_ABORTED;
 
-	/* Make sure command ring is running before aborting it */
+	/* check for crashed or disconnected chip */
 	hw_ring_state = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
-	if (hw_ring_state == ~(u64)0) {
+	if (hw_ring_state == ~(u64)0 || usbsts & (STS_FATAL | STS_HCE)) {
+		xhci_info(xhci, "kill the damn thing\n");
 		xhci_hc_died(xhci);
 		goto time_out_completed;
 	}
 
+	/* Make sure command ring is running before aborting it */
 	if ((xhci->cmd_ring_state & CMD_RING_STATE_RUNNING) &&
 	    (hw_ring_state & CMD_RING_RUNNING)) {
 		/* Prevent new doorbell, and start command abort */