From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org,
"Maciej W. Rozycki" <macro@orcam.me.uk>,
LKML <linux-kernel@vger.kernel.org>,
Mika Westerberg <mika.westerberg@linux.intel.com>
Subject: Re: [PATCH 1/2] PCI: Clear LBMS on resume to avoid Target Speed quirk
Date: Tue, 30 Jan 2024 13:53:04 +0200 (EET)
Message-ID: <aa2d1c4e-9961-d54a-00c7-ddf8e858a9b0@linux.intel.com>
In-Reply-To: <20240129184354.GA470131@bhelgaas>
On Mon, 29 Jan 2024, Bjorn Helgaas wrote:
> On Mon, Jan 29, 2024 at 01:27:09PM +0200, Ilpo Järvinen wrote:
> > While a device is runtime suspended along with its PCIe hierarchy, the
> > device could get disconnected. Because of the suspend, the device
> > disconnection cannot be detected until portdrv/hotplug have resumed. On
> > runtime resume, pcie_wait_for_link_delay() is called:
> >
> >   pci_pm_runtime_resume()
> >     pci_pm_bridge_power_up_actions()
> >       pci_bridge_wait_for_secondary_bus()
> >         pcie_wait_for_link_delay()
> >
> > Because the device is already disconnected, this results in cascading
> > failures:
> >
> > 1. pcie_wait_for_link_status() returns -ETIMEDOUT.
> >
> > 2. After the commit a89c82249c37 ("PCI: Work around PCIe link
> > training failures"),
>
> I think this also depends on the merge resolution in 1abb47390350
> ("Merge branch 'pci/enumeration'"). Just looking at a89c82249c37 in
> isolation suggests that pcie_wait_for_link_status() returning
> -ETIMEDOUT would not cause pcie_wait_for_link_delay() to call
> pcie_failed_link_retrain().

I was aware of the merge, but I seem to have misanalyzed the return
values earlier: I can no longer reach my earlier conclusion and now
agree with your analysis that 1abb47390350 broke it.

That would imply there is a logic error in 1abb47390350 in addition to
the LBMS-logic problem in a89c82249c37 that my patch is fixing... However,
I cannot pinpoint a single error because there seems to be more than one
in the code as a whole.

First of all, this is not true for pcie_failed_link_retrain():

 * Return TRUE if the link has been successfully retrained, otherwise FALSE.

If LBMS is not set, the Target Speed quirk is not applied but the function
still returns true. I think that should be changed to return false early
when LBMS is not set.
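
To make the mismatch between that comment and the behavior concrete,
here is a minimal userspace sketch. It is not the kernel code: struct
model_port, model_quirk_applies(), model_retrain_current() and
model_retrain_early_return() are hypothetical names, the quirk condition
is reduced to "LBMS is set" as described above, and the retraining itself
is reduced to a single flag. Only the LBMS bit value is taken from the
PCIe Link Status register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */

/* Hypothetical stand-in for a downstream port; not struct pci_dev. */
struct model_port {
	uint16_t lnksta;    /* snapshot of the Link Status register */
	bool retrain_works; /* assumed outcome if the quirk is attempted */
};

/* Simplified condition: the quirk is attempted only when LBMS is set. */
bool model_quirk_applies(const struct model_port *port)
{
	assert(port != NULL);
	return (port->lnksta & MODEL_LNKSTA_LBMS) != 0;
}

/*
 * Behavior as described above: when LBMS is clear nothing is retrained,
 * yet the function still falls through to "return true".
 */
bool model_retrain_current(const struct model_port *port)
{
	if (model_quirk_applies(port)) {
		if (!port->retrain_works)
			return false;
	}
	return true;
}

/* Suggested variant: return false early when the quirk does not apply. */
bool model_retrain_early_return(const struct model_port *port)
{
	if (!model_quirk_applies(port))
		return false;
	return port->retrain_works;
}
```

With LBMS clear, model_retrain_current() reports success although no
retraining took place, while model_retrain_early_return() matches the
documented "TRUE only if successfully retrained" contract. The two
variants agree whenever LBMS is set.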

But if I make that change, then pcie_wait_for_link_delay() will do
msleep() + return true, and pci_bridge_wait_for_secondary_bus() will call
pci_dev_wait(), which waits for a long time (~60s).

I'll try to come up with another patch to clean up all that return logic
so that it actually starts to make some sense.

> > pcie_failed_link_retrain() spuriously detects
> > this failure as a Link Retraining failure and attempts the Target
> > Speed trick, which also fails.
>
> Based on the comment below, I guess "Target Speed trick" probably
> refers to the "retrain at 2.5GT/s, then remove the speed restriction
> and retrain again" part of pcie_failed_link_retrain() (which I guess
> is basically the entire point of the function)?

Yes. I'll change the wording slightly to make it more obvious and add
"(Target Speed quirk)" in parentheses so I can refer to it below.

> > 3. pci_bridge_wait_for_secondary_bus() then calls pci_dev_wait() which
> > cannot succeed (but waits ~1 minute, delaying the resume).
> >
> > The Target Speed trick (in step 2) is only used if LBMS bit (PCIe r6.1
> > sec 7.5.3.8) is set. For links that have been operational before
> > suspend, it is well possible that LBMS has been set at the bridge and
> > remains on. Thus, after resume, LBMS does not indicate the link needs
> > the Target Speed quirk. Clear LBMS on resume for bridges to avoid the
> > issue.
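
As a side note on what "clear LBMS" means at the register level: LBMS is
a write-1-to-clear (RW1C) bit in the Link Status register, so it is
cleared by writing the bit itself rather than by masking it out in a
read-modify-write. Below is a minimal userspace model of that behavior.
It is not the patch: struct model_lnksta_reg, model_lnksta_write() and
model_clear_lbms() are hypothetical names, and only the bit values follow
the PCIe Link Status register layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active (RO) */
#define MODEL_LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status (RW1C) */
#define MODEL_LNKSTA_LABS  0x8000 /* Link Autonomous Bandwidth Status (RW1C) */
#define MODEL_LNKSTA_RW1C  (MODEL_LNKSTA_LBMS | MODEL_LNKSTA_LABS)

/* Hypothetical model of the Link Status register of one port. */
struct model_lnksta_reg {
	uint16_t value;
};

/*
 * Model of a config write: a 1 written to an RW1C bit clears it,
 * a 0 leaves it alone, and read-only bits ignore the write.
 */
void model_lnksta_write(struct model_lnksta_reg *reg, uint16_t val)
{
	assert(reg != NULL);
	reg->value &= (uint16_t)~(val & MODEL_LNKSTA_RW1C);
}

/* Clear a stale LBMS, e.g. one left over from before suspend. */
void model_clear_lbms(struct model_lnksta_reg *reg)
{
	model_lnksta_write(reg, MODEL_LNKSTA_LBMS);
}
```

In this model, writing 0 leaves a stale LBMS in place, and only writing
the LBMS bit clears it; the other status bits are unaffected either way.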
--
i.