From: Matthew W Carlis <mattc@purestorage.com>
To: macro@orcam.me.uk
Cc: alex.williamson@redhat.com, bhelgaas@google.com,
davem@davemloft.net, david.abdurachmanov@gmail.com,
edumazet@google.com, helgaas@kernel.org, kuba@kernel.org,
leon@kernel.org, linux-kernel@vger.kernel.org,
linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, lukas@wunner.de,
mahesh@linux.ibm.com, mattc@purestorage.com,
mika.westerberg@linux.intel.com, netdev@vger.kernel.org,
npiggin@gmail.com, oohall@gmail.com, pabeni@redhat.com,
pali@kernel.org, saeedm@nvidia.com, sr@denx.de,
wilson@tuliptree.org
Subject: PCI: Work around PCIe link training failures
Date: Thu, 15 Aug 2024 13:40:59 -0600
Message-ID: <20240815194059.28798-1-mattc@purestorage.com>
In-Reply-To: <alpine.DEB.2.21.2408091356190.61955@angie.orcam.me.uk>
Sorry for the delay in my responses here; I had some things get in my way.
On Fri, 9 Aug 2024 09:13:52 Oliver O'Halloran <oohall@gmail.com> wrote:
> Ok? If we have to check for DPC being enabled in addition to checking
> the surprise bit in the slot capabilities then that's fine, we can do
> that. The question to be answered here is: how should this feature
> work on ports where it's normal for a device to be removed without any
> notice?
I'm not sure if it's the correct thing to check, however. I assumed that ports
using the pciehp driver would usually consider it "normal" for a device to
be removed, but maybe I have the idea of hp reversed.
On Fri, 9 Aug 2024 14:34:04 Maciej W. Rozycki <macro@orcam.me.uk> wrote:
> Well, in principle in a setup with reliable links the LBMS bit may never
> be set, e.g. this system of mine has been in 24/7 operation since the last
> reboot 410 days ago and for the devices that support Link Active reporting
> it shows:
> ...
> so out of 11 devices 6 have the LBMS bit clear. But then 5 have it set,
> perhaps worryingly, so of course you're right, that it will get set in the
> field, though it's not enough by itself for your problem to trigger.
The way I look at it is that it's essentially a probability distribution over time,
but I try to avoid learning too much about the physical layer because I would find
myself debugging more hardware issues lol. I also don't think LBMS/LABS being set
by itself is very interesting without knowing the rate at which it is being set.
FWIW I have seen some devices in the past going into the recovery state many times a
second & still never downtrain, but at the same time they were setting the
LBMS/LABS bits, which may not be quite spec compliant.
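For reference, a minimal sketch of what "LBMS/LABS being set" means at the
register level. The bit masks match the Link Status definitions in Linux's
include/uapi/linux/pci_regs.h (PCI_EXP_LNKSTA_*); the helper names are
hypothetical and exist only for illustration. It decodes a single snapshot
only; estimating a rate would additionally need repeated sampling with the
bits cleared in between, which this does not attempt.

```c
#include <stdint.h>

/* Link Status register bits (same values as PCI_EXP_LNKSTA_* in Linux). */
#define LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */
#define LNKSTA_LABS  0x8000 /* Link Autonomous Bandwidth Status */

/* Hypothetical helper: 1 if either bandwidth status bit is set, else 0. */
static inline int lnksta_bw_event(uint16_t lnksta)
{
	return (lnksta & (LNKSTA_LBMS | LNKSTA_LABS)) != 0;
}

/* Hypothetical helper: 1 if the data link layer reports the link active. */
static inline int lnksta_link_active(uint16_t lnksta)
{
	return (lnksta & LNKSTA_DLLLA) != 0;
}
```

A snapshot with a bandwidth bit set and the link still active is the case I
described above: the link keeps recovering without ever going down.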
I would like to help test these changes, but I would like to avoid having to test
each mentioned change individually. Does anyone have any preference for how I batch
the patches for testing? Would it be ok if I just pulled them all together in one go?
- Matt
Thread overview: 44+ messages
2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 02/14] PCI: Export PCIe link retrain timeout Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 06/14] net/mlx5: " Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 09/14] PCI: Factor our waiting for link training end Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
2024-07-22 19:34 ` PCI: Work around PCIe link training failures Matthew W Carlis
2024-07-22 20:40 ` Maciej W. Rozycki
2024-07-24 19:18 ` Matthew W Carlis
2024-07-26 8:04 ` Matthew W Carlis
2024-07-29 10:27 ` Ilpo Järvinen
2024-07-29 14:51 ` Maciej W. Rozycki
2024-07-29 18:56 ` Matthew W Carlis
2023-06-11 17:20 ` [PATCH v9 13/14] PCI: Add failed link recovery for device reset events Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 14/14] PCI: Work around PCIe link training failures Maciej W. Rozycki
2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
2023-06-15 0:41 ` Maciej W. Rozycki
2023-06-15 18:37 ` Bjorn Helgaas
2023-06-16 12:27 ` Maciej W. Rozycki
2023-06-16 20:29 ` Bjorn Helgaas
2023-06-20 9:54 ` Maciej W. Rozycki
2024-08-06 0:06 ` PCI: Work around " Matthew W Carlis
2024-08-06 19:36 ` Bjorn Helgaas
2024-08-07 8:43 ` Matthew W Carlis
2024-08-07 11:14 ` Maciej W. Rozycki
2024-08-07 12:29 ` Oliver O'Halloran
2024-08-07 11:49 ` Maciej W. Rozycki
2024-08-08 2:07 ` Matthew W Carlis
2024-08-08 23:13 ` Oliver O'Halloran
2024-08-09 13:34 ` Maciej W. Rozycki
2024-08-15 19:40 ` Matthew W Carlis [this message]
2024-08-16 13:57 ` Maciej W. Rozycki
2024-10-01 21:04 ` Matthew W Carlis
2024-10-02 12:58 ` Maciej W. Rozycki
2024-10-02 20:55 ` Bjorn Helgaas
2024-10-03 10:39 ` Maciej W. Rozycki
2025-06-10 7:00 ` Matthew W Carlis