From: "Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>
To: Matthew W Carlis <mattc@purestorage.com>
Cc: ashishk@purestorage.com, bhelgaas@google.com,
	linux-pci@vger.kernel.org,  macro@orcam.me.uk,
	msaggi@purestorage.com, sconnor@purestorage.com
Subject: Re: [PATCH v2 0/1] PCI: pcie_failed_link_retrain() return if dev is not ASM2824
Date: Fri, 4 Jul 2025 13:20:24 +0300 (EEST)
Message-ID: <62c702a7-ce9b-21b8-c30e-a556771b987f@linux.intel.com>
In-Reply-To: <20250703235316.17920-1-mattc@purestorage.com>

On Thu, 3 Jul 2025, Matthew W Carlis wrote:

> On Thu, 3 Jul 2025, Ilpo Järvinen wrote:
> > Is this mainly related to some artificial test that rapidly fires event 
> > after another (which is known to confuse the quirk)? ...I mean, you say 
> > "extremely likely".
> 
> I wouldn't describe the test as "rapidly fires" of events because we have given
> conservative delays between injections (waiting for DLLA & being able to perform
> IO to the nvme block device before potentially injecting again).

Okay, I asked this because I saw one other test which did hotplug add & 
remove on millisecond timescales, which was way too fast for the hotplug 
driver to keep up (and thus it couldn't reset LBMS often enough).

The other question still stands though: why is LBMS not reset? Perhaps 
DPC should clear LBMS in some places (that is, call pcie_reset_lbms()). 
Have you considered that?

(It sounds to me like you're seeing this occur in multiple scenarios, and 
I have some trouble figuring out from your long descriptions what exactly 
those are, so it's a bit challenging for me to suggest where it should be 
done. But surprise down certainly seems like a case where the LBMS 
information must have become stale, so it should be reset, which would 
prevent the quirk from setting 2.5GT/s.)
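
To make the stale-LBMS argument concrete, here is a minimal Python model of
it. It is only a sketch: it assumes the quirk acts when LBMS is observed
while the data link layer is not active, which simplifies whatever the
kernel actually checks. The function names are made up for illustration;
the bit values are those of the PCIe Link Status register.

```python
# Bits of the PCIe Link Status register (PCI_EXP_LNKSTA).
LNKSTA_DLLLA = 0x2000  # Data Link Layer Link Active
LNKSTA_LBMS = 0x4000   # Link Bandwidth Management Status (write 1 to clear)


def quirk_would_trigger(lnksta: int) -> bool:
    """Assumed, simplified condition: LBMS seen while the link is down."""
    return bool(lnksta & LNKSTA_LBMS) and not (lnksta & LNKSTA_DLLLA)


def reset_lbms(lnksta: int) -> int:
    """Model the register value after LBMS has been cleared."""
    return lnksta & ~LNKSTA_LBMS


# LBMS left over from an earlier retrain; a surprise down then drops DLLLA.
stale = LNKSTA_LBMS
print(quirk_would_trigger(stale))              # stale LBMS: quirk acts
print(quirk_would_trigger(reset_lbms(stale)))  # LBMS reset: quirk stays out
```

In this model the first call reports that the quirk would act and the
second that it would not, which is the effect clearing LBMS at surprise
down is meant to have.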

> In any case
> the testing results are clearly worse when moving from a kernel that didn't
> have the quirk to a kernel that does which is a regression in my mind.
> 
> > I suppose when the problem occurs and the bridge remains at 2.5GT/s, is it 
> > possible to restore the higher speed using the pcie_cooling device 
> > associated with the bridge / bwctrl? You can find the correct cooling 
> > device with this:
> 
> Yes the problem is when a device is forced to 2.5GT/s and it should not have
> been. I did not test with the patches for CONFIG_PCIE_THERMAL because our drives
> would not need thermal management by the kernel,

Fine, but all it technically does is expose an interface to the bwctrl set 
speed API; the pcie_thermal driver itself is agnostic to whether userspace 
uses that for thermal management or some other purpose. There's no 
kernel-side thermal management in that driver. It was just more natural to 
expose it there than inside the PCI subsystem.

> but if I use "setpci" to
> restore TLS & then write the link retrain bit the link would arrive at the
> maximum speed (Gen3/Gen4/Gen5 depending).
>
> I have other vendor drives as well, but we design and build our own drives
> with our own firmware & therefore are able to determine from firmware logging
> in the drive when the link was most likely guided to 2.5GT/s by TLS. We are
> also able to see the 2.5GT/s value in the TLS register when it happens. I have
> less visibility into drives from other vendors in terms of ltssm transitions
> without hooking up an analyzer.
> 
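
For reference, the manual recovery described above (restore TLS, then set
the retrain bit) could be scripted roughly as below. This is a sketch, not
the exact commands used in the testing: the helper name and the device
address are placeholders, and it only builds the setpci command strings
rather than running them. The offsets and fields are the Link Control
(0x10) and Link Control 2 (0x30) registers of the PCIe capability.

```python
# PCIe capability register offsets and fields (setpci takes hex values).
LNKCTL = 0x10       # Link Control
LNKCTL_RL = 0x20    # Retrain Link bit
LNKCTL2 = 0x30      # Link Control 2
LNKCTL2_TLS = 0xF   # Target Link Speed field

# Target Link Speed encodings.
TLS_CODE = {"2.5GT/s": 1, "5GT/s": 2, "8GT/s": 3, "16GT/s": 4, "32GT/s": 5}


def restore_speed_cmds(bdf: str, speed: str) -> list[str]:
    """Return setpci commands that set TLS and then retrain the link."""
    tls = TLS_CODE[speed]
    return [
        # value:mask changes only the masked bits of the 16-bit register
        f"setpci -s {bdf} CAP_EXP+{LNKCTL2:x}.w={tls:x}:{LNKCTL2_TLS:x}",
        f"setpci -s {bdf} CAP_EXP+{LNKCTL:x}.w={LNKCTL_RL:x}:{LNKCTL_RL:x}",
    ]


# 0000:3a:00.0 is a placeholder for the port above the affected drive.
for cmd in restore_speed_cmds("0000:3a:00.0", "16GT/s"):
    print(cmd)
```

The value:mask form keeps the other bits of each register untouched, so
only the TLS field and the retrain bit are written.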

-- 
 i.
