All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mathias Nyman <mathias.nyman@intel.com>
To: Roy Luo <royluo@google.com>,
	Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: Udipto Goswami <udipto.goswami@oss.qualcomm.com>,
	quic_ugoswami@quicinc.com, Thinh.Nguyen@synopsys.com,
	gregkh@linuxfoundation.org, michal.pecio@gmail.com,
	linux-usb@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
Date: Thu, 22 May 2025 15:24:04 +0300	[thread overview]
Message-ID: <6bfee225-7519-41ab-8ae9-99267c5ce06e@intel.com> (raw)
In-Reply-To: <CA+zupgyU2czaczPcqavYBi=NrPqKqgp7SbrUocy0qbJ0m9np6g@mail.gmail.com>

On 22.5.2025 5.21, Roy Luo wrote:
>>>> Udipto Goswami, can you recall the platforms that needed this workaroud?
>>>> and do we have an easy way to detect those?
>>>
>>> Hi Mathias,
>>>
>>>   From what I recall, we saw this issue coming up on our QCOM mobile
>>> platforms but it was not consistent. It was only reported in long runs
>>> i believe. The most recent instance when I pushed this patch was with
>>> platform SM8650, it was a watchdog timeout issue where xhci_reset() ->
>>> xhci_handshake() polling read timeout upon xhci remove. Unfortunately
>>> I was not able to simulate the scenario for more granular testing and
>>> had validated it with long hours stress testing.
>>> The callstack was like so:
>>>
>>> Full call stack on core6:
>>> -000|readl([X19] addr = 0xFFFFFFC03CC08020)
>>> -001|xhci_handshake(inline)
>>> -001|xhci_reset([X19] xhci = 0xFFFFFF8942052250, [X20] timeout_us = 10000000)
>>> -002|xhci_resume([X20] xhci = 0xFFFFFF8942052250, [?] hibernated = ?)
>>> -003|xhci_plat_runtime_resume([locdesc] dev = ?)
>>> -004|pm_generic_runtime_resume([locdesc] dev = ?)
>>> -005|__rpm_callback([X23] cb = 0xFFFFFFE3F09307D8, [X22] dev =
>>> 0xFFFFFF890F619C10)
>>> -006|rpm_callback(inline)
>>> -006|rpm_resume([X19] dev = 0xFFFFFF890F619C10,
>>> [NSD:0xFFFFFFC041453AD4] rpmflags = 4)
>>> -007|__pm_runtime_resume([X20] dev = 0xFFFFFF890F619C10, [X19] rpmflags = 4)
>>> -008|pm_runtime_get_sync(inline)
>>> -008|xhci_plat_remove([X20] dev = 0xFFFFFF890F619C00)
>>
>> Thank you for clarifying this.
>>
>> So patch avoids the long timeout by always cutting xhci reinit path short in
>> xhci_resume() if resume was caused by pm_runtime_get_sync() call in
>> xhci_plat_remove()
>>
>> void xhci_plat_remove(struct platform_device *dev)
>> {
>>          xhci->xhc_state |= XHCI_STATE_REMOVING;
>>          pm_runtime_get_sync(&dev->dev);
>>          ...
>> }
>>
>> I think we can revert this patch, and just make sure that we don't reset the
>> host in the reinit path of xhci_resume() if XHCI_STATE_REMOVING is set.
>> Just return immediately instead.
>>
> 
> Just to be sure, are you proposing that we skip xhci_reset() within
> the reinit path
> of xhci_resume()? If we do that, could that lead to issues with
> subsequent operations
> in the reinit sequence, such as xhci_init() or xhci_run()?

I suggest to only skip xhci_reset in xhci_resume() if XHCI_STATE_REMOVING is set.

This should be similar to what is going on already.

xhci_reset() currently returns -ENODEV if XHCI_STATE_REMOVING is set, unless reset
completes extremely fast. xhci_resume() bails out if xhci_reset() returns error:

xhci_resume()
   ...
   if (power_lost) {
     ...
     retval = xhci_reset(xhci, XHCI_RESET_LONG_USEC);
     spin_unlock_irq(&xhci->lock);
     if (retval)
       return retval;
> 
> Do you prefer to group the change to skip xhci_reset() within the
> reinit path together
> with this revert? or do you want it to be sent and reviewed separately?

First a patch that bails out from xhci_resume() if XHCI_STATE_REMOVING is set
and we are in the reinit (power_lost) path about to call xhci_reset();

Then a second patch that reverts 6ccb83d6c497 ("usb: xhci: Implement
xhci_handshake_check_state()

Does this sound reasonable?

should avoid the QCOM 10sec watchdog issue as next xhci_rest() called
in xhci_remove() path has a short 250ms timeout, and ensure the
SNPS DWC3 USB regression won't trigger.

Thanks
Mathias


  reply	other threads:[~2025-05-22 12:24 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-17  4:39 [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper" Roy Luo
2025-05-19 12:52 ` Mathias Nyman
2025-05-19 18:13   ` Udipto Goswami
2025-05-19 22:32     ` Michał Pecio
2025-05-20 12:30       ` Udipto Goswami
2025-05-20 16:18     ` Mathias Nyman
2025-05-22  2:21       ` Roy Luo
2025-05-22 12:24         ` Mathias Nyman [this message]
2025-05-22 19:19           ` Roy Luo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6bfee225-7519-41ab-8ae9-99267c5ce06e@intel.com \
    --to=mathias.nyman@intel.com \
    --cc=Thinh.Nguyen@synopsys.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-usb@vger.kernel.org \
    --cc=mathias.nyman@linux.intel.com \
    --cc=michal.pecio@gmail.com \
    --cc=quic_ugoswami@quicinc.com \
    --cc=royluo@google.com \
    --cc=stable@vger.kernel.org \
    --cc=udipto.goswami@oss.qualcomm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.