public inbox for linux-kernel@vger.kernel.org
From: Michal Pecio <michal.pecio@gmail.com>
To: Desnes Nunes <desnesn@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-usb@vger.kernel.org,
	gregkh@linuxfoundation.org, mathias.nyman@intel.com,
	stable@vger.kernel.org
Subject: Re: [PATCH] usb: xhci: bound wait command completion to avoid kdump deadlock
Date: Thu, 30 Apr 2026 23:54:53 +0200
Message-ID: <20260430235453.2288c973.michal.pecio@gmail.com>
In-Reply-To: <CACaw+exdPSVSfdAob7+d-xH=JEjBbPpY_z1cPPU6rzXx4wUZpA@mail.gmail.com>

On Thu, 30 Apr 2026 14:27:59 -0300, Desnes Nunes wrote:
> As for how I saw HSE, while testing the patch before submission, since
> I already had the xhci lock, I just added a read of the usbsts before
> calling xhci_hc_died(xhci):
> 
> ...
> -       wait_for_completion(command->completion);
> -       slot_id = command->slot_id;
> +       if (!wait_for_completion_timeout(command->completion,
> +                                        msecs_to_jiffies(2 * command->timeout_ms))) {
> +        spin_lock_irqsave(&xhci->lock, tflags);
> +        usbsts = readl(&xhci->op_regs->status);
> +        xhci_err(xhci,
> +            "TRB_ENABLE_SLOT: no command completion after %lums, USBSTS:%s\n",
> +            2 * command->timeout_ms,
> +            xhci_decode_usbsts(ststr, usbsts));
> +        xhci_hc_died(xhci);
> +        spin_unlock_irqrestore(&xhci->lock, tflags);
> +    }
> ...
> 
> This debug version of the patch printed:
> 
> [   17.481330] xhci_hcd 0000:80:14.0: TRB_ENABLE_SLOT: no command completion after 10000ms, USBSTS: 0x00000015 HCHalted HSE PCD

OK, so this chip is busted at that point. But it might still be better
to improve xhci_handle_command_timeout() to deal with this and complete
the command, instead of patching here and in other similar places.
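For readers following along, the USBSTS value in that log can be decoded by hand. The sketch below is a standalone userspace illustration, not the driver's xhci_decode_usbsts(); the helper name decode_usbsts() and the table layout are made up for this example. The bit positions follow the xHCI specification's USBSTS register layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* USBSTS bit positions per the xHCI specification. */
#define STS_HALT  (1u << 0)   /* HCHalted: controller is not running */
#define STS_FATAL (1u << 2)   /* HSE: Host System Error */
#define STS_EINT  (1u << 3)   /* EINT: event interrupt pending */
#define STS_PORT  (1u << 4)   /* PCD: port change detect */
#define STS_HCE   (1u << 12)  /* HCE: Host Controller Error */

/*
 * Hypothetical helper for illustration only: writes the names of the
 * set status bits into buf, separated by spaces, and returns buf.
 */
static const char *decode_usbsts(uint32_t usbsts, char *buf, size_t len)
{
	static const struct {
		uint32_t bit;
		const char *name;
	} flags[] = {
		{ STS_HALT,  "HCHalted" },
		{ STS_FATAL, "HSE" },
		{ STS_EINT,  "EINT" },
		{ STS_PORT,  "PCD" },
		{ STS_HCE,   "HCE" },
	};
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (size_t i = 0; i < sizeof(flags) / sizeof(flags[0]); i++) {
		int n;

		if (!(usbsts & flags[i].bit))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " " : "", flags[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break;	/* output truncated, stop here */
		used += (size_t)n;
	}
	return buf;
}
```

Calling decode_usbsts(0x00000015, buf, sizeof(buf)) yields "HCHalted HSE PCD": bits 0, 2 and 4 are set, so the controller is halted and has latched a Host System Error.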

> Actually, from the beginning of all my debugging I already had
> `usbcore.dyndbg=+p xhci_hcd.dyndbg=+p xhci_pci.dyndbg=+p` on the
> kernel cmdline, as well as on the crashkernel's
> KDUMP_COMMANDLINE_APPEND at /etc/sysconfig/kdump.
> 
> On crashkernel's kexec-dmesg of the unpatched kernel I see multiple
> doorbell rings stating the HSE:
> 
> ...
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout, USBSTS: 0x00000015 HCHalted HSE PCD
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Command timeout on stopped ring
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: Turn aborted command 000000005921b827 to no-op
> [Thu Apr 30 12:28:22 2026] xhci_hcd 0000:80:14.0: // Ding dong!
> ...

Hmm, the "Command timeout on stopped ring" case doesn't obviously lead
to any immediate command completion, and ringing the command doorbell
under HSE won't achieve any progress. Maybe that's the bug.

Could you post the full crash kernel dmesg up to that point? I'm not
sure how it got to this place.

When xhci_handle_command_timeout() logs USBSTS, does it help to add:

if (usbsts & STS_FATAL) {
        xhci_halt(xhci);
        xhci_hc_died(xhci);
        goto time_out_completed;
}

It may not be a perfect solution (race conditions?) but if it works,
it would hint that we are on the right track.

Regards,
Michal

