Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper"

public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed

From: Mathias Nyman <mathias.nyman@intel.com>
To: Roy Luo <royluo@google.com>,
	Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Udipto Goswami <udipto.goswami@oss.qualcomm.com>,
	quic_ugoswami@quicinc.com, Thinh.Nguyen@synopsys.com,
	gregkh@linuxfoundation.org, michal.pecio@gmail.com,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
Date: Thu, 22 May 2025 15:24:04 +0300	[thread overview]
Message-ID: <6bfee225-7519-41ab-8ae9-99267c5ce06e@intel.com> (raw)
In-Reply-To: <CA+zupgyU2czaczPcqavYBi=NrPqKqgp7SbrUocy0qbJ0m9np6g@mail.gmail.com>

On 22.5.2025 5.21, Roy Luo wrote:
>>>> Udipto Goswami, can you recall the platforms that needed this workaroud?
>>>> and do we have an easy way to detect those?
>>>
>>> Hi Mathias,
>>>
>>>   From what I recall, we saw this issue coming up on our QCOM mobile
>>> platforms but it was not consistent. It was only reported in long runs
>>> i believe. The most recent instance when I pushed this patch was with
>>> platform SM8650, it was a watchdog timeout issue where xhci_reset() ->
>>> xhci_handshake() polling read timeout upon xhci remove. Unfortunately
>>> I was not able to simulate the scenario for more granular testing and
>>> had validated it with long hours stress testing.
>>> The callstack was like so:
>>>
>>> Full call stack on core6:
>>> -000|readl([X19] addr = 0xFFFFFFC03CC08020)
>>> -001|xhci_handshake(inline)
>>> -001|xhci_reset([X19] xhci = 0xFFFFFF8942052250, [X20] timeout_us = 10000000)
>>> -002|xhci_resume([X20] xhci = 0xFFFFFF8942052250, [?] hibernated = ?)
>>> -003|xhci_plat_runtime_resume([locdesc] dev = ?)
>>> -004|pm_generic_runtime_resume([locdesc] dev = ?)
>>> -005|__rpm_callback([X23] cb = 0xFFFFFFE3F09307D8, [X22] dev =
>>> 0xFFFFFF890F619C10)
>>> -006|rpm_callback(inline)
>>> -006|rpm_resume([X19] dev = 0xFFFFFF890F619C10,
>>> [NSD:0xFFFFFFC041453AD4] rpmflags = 4)
>>> -007|__pm_runtime_resume([X20] dev = 0xFFFFFF890F619C10, [X19] rpmflags = 4)
>>> -008|pm_runtime_get_sync(inline)
>>> -008|xhci_plat_remove([X20] dev = 0xFFFFFF890F619C00)
>>
>> Thank you for clarifying this.
>>
>> So patch avoids the long timeout by always cutting xhci reinit path short in
>> xhci_resume() if resume was caused by pm_runtime_get_sync() call in
>> xhci_plat_remove()
>>
>> void xhci_plat_remove(struct platform_device *dev)
>> {
>>          xhci->xhc_state |= XHCI_STATE_REMOVING;
>>          pm_runtime_get_sync(&dev->dev);
>>          ...
>> }
>>
>> I think we can revert this patch, and just make sure that we don't reset the
>> host in the reinit path of xhci_resume() if XHCI_STATE_REMOVING is set.
>> Just return immediately instead.
>>
> 
> Just to be sure, are you proposing that we skip xhci_reset() within
> the reinit path
> of xhci_resume()? If we do that, could that lead to issues with
> subsequent operations
> in the reinit sequence, such as xhci_init() or xhci_run()?

I suggest to only skip xhci_reset in xhci_resume() if XHCI_STATE_REMOVING is set.

This should be similar to what is going on already.

xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is set, unless reset
completes extremely fast. xhci_resume() bails out if xhci_reset() returns error:

xhci_resume()
   ...
   if (power_lost) {
     ...
     retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
     spin_unlock_irq(&xhci->lock);
     if (retval)
       return retval;
> 
> Do you prefer to group the change to skip xhci_reset() within the
> reinit path together
> with this revert? or do you want it to be sent and reviewed separately?

First a patch that bails out from xhci_resume() if XHCI_STATE_REMOVING is set
and we are in the reinit (power_lost) path about to call xhci_reset();

Then a second patch that reverts 6ccb83d6c497 ("usb: xhci: Implement
xhci_handshake_check_state()

Does this sound reasonable?

should avoid the QCOM 10sec watchdog issue as next xhci_rest() called
in xhci_remove() path has a short 250ms timeout, and ensure the
SNPS DWC3 USB regression won't trigger.

Thanks
Mathias

next prev parent reply	other threads:[~2025-05-22 12:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-17  4:39 [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper" Roy Luo
2025-05-19 12:52 ` Mathias Nyman
2025-05-19 18:13   ` Udipto Goswami
2025-05-19 22:32     ` Michał Pecio
2025-05-20 12:30       ` Udipto Goswami
2025-05-20 16:18     ` Mathias Nyman
2025-05-22  2:21       ` Roy Luo
2025-05-22 12:24         ` Mathias Nyman [this message]
2025-05-22 19:19           ` Roy Luo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6bfee225-7519-41ab-8ae9-99267c5ce06e@intel.com \
    --to=mathias.nyman@intel.com \
    --cc=Thinh.Nguyen@synopsys.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=michal.pecio@gmail.com \
    --cc=quic_ugoswami@quicinc.com \
    --cc=royluo@google.com \
    --cc=stable@vger.kernel.org \
    --cc=udipto.goswami@oss.qualcomm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox