public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed
From: Matthew W Carlis <mattc@purestorage.com>
To: macro@orcam.me.uk
Cc: alex.williamson@redhat.com, bhelgaas@google.com,
	davem@davemloft.net, david.abdurachmanov@gmail.com,
	edumazet@google.com, helgaas@kernel.org, kuba@kernel.org,
	leon@kernel.org, linux-kernel@vger.kernel.org,
	linux-pci@vger.kernel.org, linux-rdma@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, lukas@wunner.de,
	mahesh@linux.ibm.com, mattc@purestorage.com,
	mika.westerberg@linux.intel.com, netdev@vger.kernel.org,
	npiggin@gmail.com, oohall@gmail.com, pabeni@redhat.com,
	pali@kernel.org, saeedm@nvidia.com, sr@denx.de,
	wilson@tuliptree.org
Subject: PCI: Work around PCIe link training failures
Date: Thu, 15 Aug 2024 13:40:59 -0600	[thread overview]
Message-ID: <20240815194059.28798-1-mattc@purestorage.com> (raw)
In-Reply-To: <alpine.DEB.2.21.2408091356190.61955@angie.orcam.me.uk>

Sorry for the delay in my responses here I had some things get in my way.

On Fri, 9 Aug 2024 09:13:52 Oliver O'Halloran <oohall@gmail.com> wrote:

> Ok? If we have to check for DPC being enabled in addition to checking
> the surprise bit in the slot capabilities then that's fine, we can do
> that. The question to be answered here is: how should this feature
> work on ports where it's normal for a device to be removed without any
> notice?

I'm not sure if its the correct thing to check however. I assumed that ports
using the pciehp driver would usually consider it "normal" for a device to
be removed actually, but maybe I have the idea of hp reversed.

On Fri, 9 Aug 2024 14:34:04 Maciej W. Rozycki <macro@orcam.me.uk> wrote:

> Well, in principle in a setup with reliable links the LBMS bit may never 
> be set, e.g. this system of mine has been in 24/7 operation since the last 
> reboot 410 days ago and for the devices that support Link Active reporting 
> it shows:
> ...
> so out of 11 devices 6 have the LBMS bit clear.  But then 5 have it set, 
> perhaps worryingly, so of course you're right, that it will get set in the 
> field, though it's not enough by itself for your problem to trigger.

The way I look at it is that its essentially a probability distribution with time,
but I try to avoid learning too much about the physical layer because I would find
myself debugging more hardware issues lol. I also don't think LBMS/LABS being set
by itself is very interesting without knowing the rate at which it is being set.
FWIW I have seen some devices in the past going into recovery state many times a
second & still never downtrain, but at the same time they were setting the
LBMS/LABS bits which maybe not quite spec compliant.

I would like to help test these changes, but I would like to avoid having to test
each mentioned change individually. Does anyone have any preferences in how I batch
the patches for testing? Would it be ok if I just pulled them all together on one go?

- Matt

  reply	other threads:[~2024-08-15 19:41 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-11 17:19 [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 02/14] PCI: Export PCIe link retrain timeout Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 06/14] net/mlx5: " Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 09/14] PCI: Factor our waiting for link training end Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' Maciej W. Rozycki
2023-06-11 17:19 ` [PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug Maciej W. Rozycki
2024-07-22 19:34   ` PCI: Work around PCIe link training failures Matthew W Carlis
2024-07-22 20:40     ` Maciej W. Rozycki
2024-07-24 19:18       ` Matthew W Carlis
2024-07-26  8:04         ` Matthew W Carlis
2024-07-29 10:27           ` Ilpo Järvinen
2024-07-29 14:51             ` Maciej W. Rozycki
2024-07-29 18:56               ` Matthew W Carlis
2023-06-11 17:20 ` [PATCH v9 13/14] PCI: Add failed link recovery for device reset events Maciej W. Rozycki
2023-06-11 17:20 ` [PATCH v9 14/14] PCI: Work around PCIe link training failures Maciej W. Rozycki
2023-06-14 23:12 ` [PATCH v9 00/14] pci: Work around ASMedia ASM2824 " Bjorn Helgaas
2023-06-15  0:41   ` Maciej W. Rozycki
2023-06-15 18:37     ` Bjorn Helgaas
2023-06-16 12:27       ` Maciej W. Rozycki
2023-06-16 20:29         ` Bjorn Helgaas
2023-06-20  9:54           ` Maciej W. Rozycki
2024-08-06  0:06             ` PCI: Work around " Matthew W Carlis
2024-08-06 19:36               ` Bjorn Helgaas
2024-08-07  8:43                 ` Matthew W Carlis
2024-08-07 11:14                   ` Maciej W. Rozycki
2024-08-07 12:29                     ` Oliver O'Halloran
2024-08-07 11:49               ` Maciej W. Rozycki
2024-08-08  2:07                 ` Matthew W Carlis
2024-08-08 23:13                   ` Oliver O'Halloran
2024-08-09 13:34                   ` Maciej W. Rozycki
2024-08-15 19:40                     ` Matthew W Carlis [this message]
2024-08-16 13:57                       ` Maciej W. Rozycki
2024-10-01 21:04                         ` Matthew W Carlis
2024-10-02 12:58                           ` Maciej W. Rozycki
2024-10-02 20:55                             ` Bjorn Helgaas
2024-10-03 10:39                               ` Maciej W. Rozycki
2025-06-10  7:00                                 ` Matthew W Carlis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240815194059.28798-1-mattc@purestorage.com \
    --to=mattc@purestorage.com \
    --cc=alex.williamson@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=davem@davemloft.net \
    --cc=david.abdurachmanov@gmail.com \
    --cc=edumazet@google.com \
    --cc=helgaas@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lukas@wunner.de \
    --cc=macro@orcam.me.uk \
    --cc=mahesh@linux.ibm.com \
    --cc=mika.westerberg@linux.intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=npiggin@gmail.com \
    --cc=oohall@gmail.com \
    --cc=pabeni@redhat.com \
    --cc=pali@kernel.org \
    --cc=saeedm@nvidia.com \
    --cc=sr@denx.de \
    --cc=wilson@tuliptree.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox