From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Matthew W Carlis <mattc@purestorage.com>
Cc: ashishk@purestorage.com, bhelgaas@google.com,
linux-pci@vger.kernel.org, macro@orcam.me.uk,
msaggi@purestorage.com, sconnor@purestorage.com
Subject: Re: [PATCH v2 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
Date: Wed, 9 Jul 2025 12:45:35 +0300 (EEST)
Message-ID: <2b72378d-a8c1-56b1-3dbb-142eb4c7f302@linux.intel.com>
In-Reply-To: <20250708224917.7386-1-mattc@purestorage.com>
On Tue, 8 Jul 2025, Matthew W Carlis wrote:
> On Fri, 4 Jul 2025, Ilpo Järvinen wrote:
> > The other question still stands though, why is LBMS not reset? Perhaps
> > DPC should clear LBMS in some places (that is, call pcie_reset_lbms()).
> > Have you considered that?
>
> Initially we started to observe this when physically removing and
> reinserting devices in a kernel version with the quirk, but without the bandwidth
> controller driver. I think there is a problem with any place where the link
> would be expected to go down (dpc, hpc, etc) & then carrying forward LBMS
> into the next time the link comes up.
Are you saying there's still a problem in hpc? Since the introduction of
bwctrl, remove_board() in pciehp has had pcie_reset_lbms() (or its
equivalent).
As I already mentioned, for DPC I agree, it likely should reset LBMS
somewhere.
We also clear LBMS after retraining to not retain that LBMS beyond the
completion of the retraining.
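A minimal userspace sketch of what "resetting LBMS" amounts to, assuming
LBMS behaves as a write-1-to-clear bit in the Link Status register. The
mock_port struct and the mock_* helpers are hypothetical stand-ins for
illustration, not kernel APIs; only the 0x4000 bit position mirrors
PCI_EXP_LNKSTA_LBMS.

```c
#include <stdint.h>

#define PCI_EXP_LNKSTA_LBMS 0x4000 /* Link Bandwidth Management Status */

/* Hypothetical stand-in for a port's Link Status register. */
struct mock_port {
	uint16_t lnksta;
};

/*
 * Models a config write to Link Status for the LBMS bit only:
 * writing 1 clears the bit, writing 0 leaves it untouched.
 * All other bits are treated as read-only in this sketch.
 */
static void mock_lnksta_write(struct mock_port *port, uint16_t val)
{
	port->lnksta &= (uint16_t)~(val & PCI_EXP_LNKSTA_LBMS);
}

/* Clears a possibly stale LBMS by writing 1 to it. */
static void mock_reset_lbms(struct mock_port *port)
{
	mock_lnksta_write(port, PCI_EXP_LNKSTA_LBMS);
}
```

The point is that nothing clears the bit implicitly in this model: unless
some path writes 1 to LBMS, it stays set across a link down/up cycle.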
What other things are included in that "etc"?
> Should it not matter how long ago LBMS
> was asserted before we invoke a TLS modification?
To some extent, yes, which is why we call pcie_reset_lbms() in a few
places.
> It also looks like card
> presence is enough for the kernel to believe the link should train & enter
> the quirk function without ever having seen LNKSTA_DLLLA or LNKSTA_LT.
Without LBMS, the quirk won't do anything (except try to raise the
Link Speed if it's the particular device on the whitelist).
> I wonder if it shouldn't have to see some kind of actual link activity
> as a prereq to entering the quirk.
How would you observe that "link activity"? Doesn't LBMS itself imply
"link activity" occurred?
Any good suggestions on how to implement that check more precisely, to
tell whether there was some link activity or not?
> > (It sounds to me like you're having this occur in multiple scenarios, and
> > I have some trouble figuring out from your long descriptions what exactly
> > those are, so it's a bit challenging for me to suggest where it should be
> > done, but the surprise down certainly seems like a case where LBMS
> > information must have become stale, so it should be reset, which would
> > prevent the quirk from setting 2.5GT/s)
>
> Something I found recently that was interesting - when I power off
> a slot (triggering DPC via SDES) the LBMS becomes set on Intel Root Ports,
> but in another server with a PCIe switch LBMS does not become set on the
> switch DSP if I perform the same action. I don't have any explanation for
> this difference other than "vendor specific" behavior.
If you tried this on different generations of Intel RP, you'd likely see
variations there too; that's been my experience when testing bwctrl.
E.g., on some platforms, I see LBMS asserted twice from a single retraining
(after a TLS change): once while LT=1 is still set and once after LT=0.
(I don't have an explanation for that behavior.)
> One thing that honestly doesn't make any sense to me is the ID list in the
> quirk. If the link comes up after forcing to Gen1 then it would only restore
> TLS if the device is the ASMedia switch, but also ignoring what device is
> detected downstream. If we allow ASMedia to restore the speed for any downstream
> device when we only saw the initial issue with the Pericom switch then why
> do we exclude Intel Root Ports or AMD Root Ports or any other bridge from the
> list which did not have any issues reported.
I think it's because the restore has been tested on that device
(whitelist).
Your reasoning is based on the assumption that the TLS quirk setting Link
Speed to 2.5GT/s is part of "normal" operation. My view is that those
triggerings are caused by not clearing stale LBMS in the right places. If
LBMS is not wrongly kept, the quirk is a no-op on all but that ID-listed
device.
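As a rough sketch of that argument, the decision can be modelled as below.
This is a simplified, hypothetical model, not the code of
pcie_failed_link_retrain(): it assumes the quirk forces 2.5GT/s only when
LBMS is set while DLLLA is clear, and lifts the restriction only on an
active link already limited to 2.5GT/s on an ID-listed device. The enum,
the function name and its parameters are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */

/* Hypothetical outcomes of the modelled quirk. */
enum quirk_action {
	QUIRK_NOOP,
	QUIRK_FORCE_2_5GT,
	QUIRK_LIFT_RESTRICTION,
};

/*
 * Simplified model: lnksta is the Link Status value seen on entry,
 * tls_is_2_5gt says whether Target Link Speed is already 2.5GT/s,
 * id_listed says whether the device is on the quirk's ID list.
 */
static enum quirk_action model_quirk(uint16_t lnksta, bool tls_is_2_5gt,
				     bool id_listed)
{
	uint16_t mask = PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA;

	/* LBMS set with the link not active: taken as failed training. */
	if ((lnksta & mask) == PCI_EXP_LNKSTA_LBMS)
		return QUIRK_FORCE_2_5GT;

	/* Active link held at 2.5GT/s: only listed devices are restored. */
	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) && tls_is_2_5gt && id_listed)
		return QUIRK_LIFT_RESTRICTION;

	return QUIRK_NOOP;
}
```

With LBMS clear, the first branch can never be taken, so on a device that
is not ID-listed the model always returns QUIRK_NOOP; a stale LBMS on a
link that is down is the only thing that makes it force 2.5GT/s.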
--
i.