From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Michal Pecio <michal.pecio@gmail.com>,
Mathias Nyman <mathias.nyman@intel.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] usb: xhci: Fix host controllers "dying" after suspend and resume
Date: Tue, 4 Mar 2025 13:20:55 +0200
Message-ID: <855ed817-03da-4e6f-9c4a-898edfcafedd@linux.intel.com>
In-Reply-To: <20250304085139.4610e8ff@foxbook>

On 4.3.2025 9.51, Michal Pecio wrote:
> A recent cleanup went a bit too far and dropped clearing the cycle bit
> of link TRBs, so it stays different from the rest of the ring half of
> the time. Then a race occurs: if the xHC reaches such a link TRB before
> more commands are queued, the link's cycle bit unintentionally matches
> the xHC's cycle so it follows the link and waits for further commands.
> If more commands are queued before the xHC gets there, inc_enq() flips
> the bit so the xHC later sees a mismatch and stops executing commands.
>
> This function is called before suspend, and 50% of the time the xHC is
> doomed to get stuck sooner or later after resuming. Then some Stop
> Endpoint command fails to complete in 5 seconds and this shows up:
>
> xhci_hcd 0000:00:10.0: xHCI host not responding to stop endpoint command
> xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
> xhci_hcd 0000:00:10.0: HC died; cleaning up
>
> followed by loss of all USB devices on the affected bus. That's if you
> are lucky, because if Set Deq gets stuck instead, the failure is silent.
>
> Likely responsible for kernel bug 219824. I found this while searching
> for possible causes of that regression and reproduced it locally before
> hearing back from the reporter. To repro, simply wait for link cycle to
> become set (debugfs), then suspend, resume and wait. To accelerate the
> failure I used a script which repeatedly starts and stops a UVC camera.
>
> Some HCs get fully reinitialized on resume and they are not affected.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219824
> Fixes: 36b972d4b7ce ("usb: xhci: improve xhci_clear_command_ring()")

Very nice debugging, did not suspect or consider that.
Thanks
Mathias
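
For readers following the thread, the race in the quoted commit message can be sketched as a toy user-space model in C. This is not the xhci driver code: the ring size, the `toy_*` names and the ordering of steps are assumptions made purely for illustration, and only the cycle-bit rules described in the quoted text are modeled.

```c
#include <stdbool.h>

#define RING_SIZE 4               /* hypothetical: 3 command slots + 1 link TRB */
#define LINK_IDX  (RING_SIZE - 1)

struct toy_ring {
    bool cycle[RING_SIZE];        /* cycle bit of each TRB */
    int enq, deq;                 /* driver enqueue / controller dequeue index */
    bool pcs, ccs;                /* producer / consumer cycle state */
};

/* Model of a ring clear: zero the command TRBs and restart both sides at
 * cycle 1. The link TRB keeps whatever cycle bit the caller passes in,
 * which stands in for "cleared" (false) or "left stale" (true). */
void toy_reset(struct toy_ring *r, bool link_cycle)
{
    for (int i = 0; i < LINK_IDX; i++)
        r->cycle[i] = false;
    r->cycle[LINK_IDX] = link_cycle;
    r->enq = r->deq = 0;
    r->pcs = r->ccs = true;
}

/* Driver side: queue one command. If enqueue is parked on the link TRB,
 * cross it first by flipping the link's cycle bit and the producer state. */
void toy_queue(struct toy_ring *r)
{
    if (r->enq == LINK_IDX) {
        r->cycle[LINK_IDX] = !r->cycle[LINK_IDX];
        r->pcs = !r->pcs;
        r->enq = 0;
    }
    r->cycle[r->enq++] = r->pcs;
}

/* Controller side: execute up to 'budget' commands, stopping early at the
 * first TRB whose cycle bit does not match the consumer cycle state.
 * A matching link TRB is followed and toggles the consumer state. */
int toy_run(struct toy_ring *r, int budget)
{
    int done = 0;

    while (done < budget && r->cycle[r->deq] == r->ccs) {
        if (r->deq == LINK_IDX) {
            r->ccs = !r->ccs;
            r->deq = 0;
        } else {
            r->deq++;
            done++;
        }
    }
    return done;
}

/* Queue 4 commands in total and return how many the controller executed.
 * queue_early: the 4th command is queued before the controller has
 * reached the link TRB; otherwise it is queued afterwards. */
int toy_scenario(bool stale_link_cycle, bool queue_early)
{
    struct toy_ring r;
    int executed;

    toy_reset(&r, stale_link_cycle);
    for (int i = 0; i < LINK_IDX; i++)
        toy_queue(&r);            /* fill the segment, enqueue parks on link */

    /* Either stop short of the link (2 commands) or run up to it. */
    executed = toy_run(&r, queue_early ? 2 : 100);
    toy_queue(&r);                /* 4th command crosses the link TRB */
    executed += toy_run(&r, 100);
    return executed;
}
```

Under these assumptions, only the combination of a stale link cycle bit and a command queued before the controller reaches the link leaves a command unexecuted: `toy_scenario(true, true)` returns 3, while the other three combinations return 4.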