public inbox for stable@vger.kernel.org
From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: Udipto Goswami <udipto.goswami@oss.qualcomm.com>
Cc: Roy Luo <royluo@google.com>,
	mathias.nyman@intel.com, quic_ugoswami@quicinc.com,
	Thinh.Nguyen@synopsys.com, gregkh@linuxfoundation.org,
	michal.pecio@gmail.com, linux-usb@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH v1] Revert "usb: xhci: Implement xhci_handshake_check_state() helper"
Date: Tue, 20 May 2025 19:18:21 +0300
Message-ID: <fbf92981-6601-4ee9-a494-718e322ac1b9@linux.intel.com>
In-Reply-To: <CAMTwNXB0QLP-b=RmLPtRJo=T_efN_3H4dd5AiMNYrJDXddJkMA@mail.gmail.com>

On 19.5.2025 21.13, Udipto Goswami wrote:
> On Mon, May 19, 2025 at 6:23 PM Mathias Nyman
> <mathias.nyman@linux.intel.com> wrote:
>>
>> On 17.5.2025 7.39, Roy Luo wrote:
>>> This reverts commit 6ccb83d6c4972ebe6ae49de5eba051de3638362c.
>>>
>>> Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state()
>>> helper") was introduced to workaround watchdog timeout issues on some
>>> platforms, allowing xhci_reset() to bail out early without waiting
>>> for the reset to complete.
>>>
>>> Skipping the xhci handshake during a reset is a dangerous move. The
>>> xhci specification explicitly states that certain registers cannot
>>> be accessed during reset in section 5.4.1 USB Command Register (USBCMD),
>>> Host Controller Reset (HCRST) field:
>>> "This bit is cleared to '0' by the Host Controller when the reset
>>> process is complete. Software cannot terminate the reset process
>>> early by writing a '0' to this bit and shall not write any xHC
>>> Operational or Runtime registers while HCRST is '1'."
>>>
>>> This behavior causes a regression on SNPS DWC3 USB controller with
>>> dual-role capability. When the DWC3 controller exits host mode and
>>> removes xhci while a reset is still in progress, and then tries to
>>> configure its hardware for device mode, the ongoing reset leads to
>>> register access issues; specifically, all register reads return 0.
>>> These issues extend beyond the xhci register space (which is expected
>>> during a reset) and affect the entire DWC3 IP block, causing the DWC3
>>> device mode to malfunction.
>>
>> I agree with you and Thinh that waiting for the HCRST bit to clear during
>> reset is the right thing to do, especially now when we know skipping it
>> causes issues for SNPS DWC3, even if it's only during remove phase.
>>
>> But reverting this patch will re-introduce the issue originally worked
>> around by Udipto Goswami, causing regression.
>>
>> Best thing to do would be to wait for HCRST to clear for all other platforms
>> except the one with the issue.
>>
>> Udipto Goswami, can you recall the platforms that needed this workaround?
>> And do we have an easy way to detect those?
> 
> Hi Mathias,
> 
> From what I recall, we saw this issue coming up on our QCOM mobile
> platforms but it was not consistent. It was only reported in long runs
> I believe. The most recent instance when I pushed this patch was with
> platform SM8650, it was a watchdog timeout issue where xhci_reset() ->
> xhci_handshake() polling read timeout upon xhci remove. Unfortunately
> I was not able to simulate the scenario for more granular testing and
> had validated it with long hours stress testing.
> The callstack was like so:
> 
> Full call stack on core6:
> -000|readl([X19] addr = 0xFFFFFFC03CC08020)
> -001|xhci_handshake(inline)
> -001|xhci_reset([X19] xhci = 0xFFFFFF8942052250, [X20] timeout_us = 10000000)
> -002|xhci_resume([X20] xhci = 0xFFFFFF8942052250, [?] hibernated = ?)
> -003|xhci_plat_runtime_resume([locdesc] dev = ?)
> -004|pm_generic_runtime_resume([locdesc] dev = ?)
> -005|__rpm_callback([X23] cb = 0xFFFFFFE3F09307D8, [X22] dev = 0xFFFFFF890F619C10)
> -006|rpm_callback(inline)
> -006|rpm_resume([X19] dev = 0xFFFFFF890F619C10, [NSD:0xFFFFFFC041453AD4] rpmflags = 4)
> -007|__pm_runtime_resume([X20] dev = 0xFFFFFF890F619C10, [X19] rpmflags = 4)
> -008|pm_runtime_get_sync(inline)
> -008|xhci_plat_remove([X20] dev = 0xFFFFFF890F619C00)

Thank you for clarifying this.

So the patch avoids the long timeout by always cutting the xhci reinit path
short in xhci_resume() when the resume was caused by the
pm_runtime_get_sync() call in xhci_plat_remove():

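void xhci_plat_remove(struct platform_device *dev)
{
	struct usb_hcd *hcd = platform_get_drvdata(dev);
	struct xhci_hcd *xhci = hcd_to_xhci(hcd);

	/* The flag is set before the resume triggered below. */
	xhci->xhc_state |= XHCI_STATE_REMOVING;
	/* May run xhci_plat_runtime_resume() -> xhci_resume(). */
	pm_runtime_get_sync(&dev->dev);
	...
}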

I think we can revert this patch, and just make sure that we don't reset the
host in the reinit path of xhci_resume() if XHCI_STATE_REMOVING is set.
Just return immediately instead.
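
A minimal userspace sketch of that idea, not driver code. The struct,
the MODEL_STATE_REMOVING flag value and model_resume_reinit() are
hypothetical stand-ins for xhci_hcd, XHCI_STATE_REMOVING and the reinit
part of xhci_resume(); what the real function should return when it
bails out is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag only; stands in for XHCI_STATE_REMOVING. */
#define MODEL_STATE_REMOVING (1u << 2)

/* Hypothetical model of the few xhci_hcd fields that matter here. */
struct model_xhci {
	unsigned int xhc_state;	/* state flags, as in xhci->xhc_state */
	bool need_reinit;	/* resume decided the host must be reinitialized */
	int resets_issued;	/* counts long-timeout resets done in resume */
};

/*
 * Models the reinit decision in the resume path.
 * Returns true if a host reset was issued, false if it was skipped.
 */
static bool model_resume_reinit(struct model_xhci *xhci)
{
	assert(xhci != NULL);

	if (!xhci->need_reinit)
		return false;

	/* Proposed check: removal already started, so do not reset here. */
	if (xhci->xhc_state & MODEL_STATE_REMOVING)
		return false;

	xhci->resets_issued++;	/* stands in for the xhci_reset() call */
	return true;
}
```

With the flag clear the model issues one reset; with the flag set it
returns without touching the controller, leaving the reset to the
remove path.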

xhci_reset() will be called with a shorter timeout later in the remove path.

I'm not entirely sure the remove path needs to call pm_runtime_get_sync().
I think it just tries to prevent runtime suspend/resume from racing with
remove. The PCI code seems to call pm_runtime_get_noresume() in its remove
path instead.
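
To illustrate the difference between the two calls, here is a toy
userspace model, assuming only the documented behaviour:
pm_runtime_get_sync() takes a usage reference and resumes a suspended
device, while pm_runtime_get_noresume() only takes the reference. The
struct and the model_* functions are hypothetical, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device's runtime PM state. */
struct model_dev {
	int usage_count;	/* references that block runtime suspend */
	bool suspended;		/* true while runtime suspended */
	int resume_calls;	/* how often the resume callback ran */
};

/* Models pm_runtime_get_noresume(): take a reference, nothing else. */
static void model_get_noresume(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
}

/* Models pm_runtime_get_sync(): take a reference and resume if needed. */
static void model_get_sync(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
	if (dev->suspended) {
		dev->resume_calls++;	/* this is where xhci_resume() would run */
		dev->suspended = false;
	}
}
```

In this model only model_get_sync() reaches the resume callback, which
is the path that ends up in xhci_reset() in the call stack above.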

Thanks
Mathias
