From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Michal Pecio <michal.pecio@gmail.com>,
	Mathias Nyman <mathias.nyman@intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] usb: xhci: Fix host controllers "dying" after suspend and resume
Date: Tue, 4 Mar 2025 13:20:55 +0200
Message-ID: <855ed817-03da-4e6f-9c4a-898edfcafedd@linux.intel.com>
In-Reply-To: <20250304085139.4610e8ff@foxbook>

On 4.3.2025 9.51, Michal Pecio wrote:
> A recent cleanup went a bit too far and dropped clearing the cycle bit
> of link TRBs, so it stays different from the rest of the ring half of
> the time. Then a race occurs: if the xHC reaches such a link TRB before
> more commands are queued, the link's cycle bit unintentionally matches
> the xHC's cycle so it follows the link and waits for further commands.
> If more commands are queued before the xHC gets there, inc_enq() flips
> the bit so the xHC later sees a mismatch and stops executing commands.
> 
> This function is called before suspend, and after resuming there is a
> 50% chance that the xHC is doomed to get stuck sooner or later. Then some
> Stop Endpoint command fails to complete in 5 seconds and this shows up:
> 
> xhci_hcd 0000:00:10.0: xHCI host not responding to stop endpoint command
> xhci_hcd 0000:00:10.0: xHCI host controller not responding, assume dead
> xhci_hcd 0000:00:10.0: HC died; cleaning up
> 
> followed by loss of all USB devices on the affected bus. That's if you
> are lucky, because if Set Deq gets stuck instead, the failure is silent.
> 
> Likely responsible for kernel bug 219824. I found this while searching
> for possible causes of that regression and reproduced it locally before
> hearing back from the reporter. To repro, simply wait for link cycle to
> become set (debugfs), then suspend, resume and wait. To accelerate the
> failure I used a script which repeatedly starts and stops a UVC camera.
> 
> Some HCs get fully reinitialized on resume and they are not affected.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219824
> Fixes: 36b972d4b7ce ("usb: xhci: improve xhci_clear_command_ring()")

Very nice debugging, did not suspect or consider that.

Thanks
Mathias

