From: Bjorn Helgaas <helgaas@kernel.org>
To: Harshank Matkar <harshankmatkar1304@outlook.com>
Cc: "intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"tony.nguyen@intel.com" <tony.nguyen@intel.com>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"pabeni@redhat.com" <pabeni@redhat.com>,
	"edumazet@google.com" <edumazet@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] igc: Add PCIe link recovery for I225/I226
Date: Wed, 11 Feb 2026 12:29:09 -0600
Message-ID: <20260211182909.GA117627@bhelgaas>
In-Reply-To: <20260210203332.23200-1-harshankmatkar1304@outlook.com>

On Tue, Feb 10, 2026 at 08:34:02PM +0000, Harshank Matkar wrote:
> From: Harshank Matkar <harshankmatkar1304@outlook.com>
> 
> When ASPM L0s transitions occur on Intel I225/I226 controllers,
> transient PCIe link instability can cause register read failures
> (0xFFFFFFFF responses).

At the PCIe level, the failure is some uncorrectable PCIe error like a
Completion Timeout or Unsupported Request.  The 0xFFFFFFFF response is
implementation-specific behavior determined by the Root Complex
design.
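
To illustrate why an all-ones value alone is ambiguous, here is a
minimal userspace sketch.  The helper names are hypothetical and not
part of the patch; the kernel has a comparable check in
PCI_POSSIBLE_ERROR().  It assumes the caller can also read the Vendor
ID, where 0xFFFF is never a valid value:

```c
#include <stdbool.h>
#include <stdint.h>

/* All-ones data that many Root Complexes fabricate for a failed read. */
#define ERROR_RESPONSE_32 UINT32_C(0xFFFFFFFF)

/*
 * Hypothetical helper: true if the value *might* be a fabricated error
 * response.  A register can legitimately contain all ones, so this is
 * only a hint, not proof of a PCIe error.
 */
static bool read_possibly_failed(uint32_t val)
{
	return val == ERROR_RESPONSE_32;
}

/*
 * Hypothetical helper: treat the device as not responding only if the
 * register read was all ones AND the Vendor ID also reads as 0xFFFF,
 * which no real device uses.
 */
static bool device_seems_gone(uint32_t reg_val, uint16_t vendor_id)
{
	return read_possibly_failed(reg_val) && vendor_id == 0xFFFF;
}
```

Even with a check like this, the underlying Completion Timeout or
Unsupported Request has already happened by the time software sees the
data.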

> Implement a multi-layer recovery strategy:
> 1. Immediate retries: 3 attempts with 100-200μs delays
> 2. Link retraining: Trigger PCIe link retraining via capabilities
> 3. Device detachment: Only as last resort after max attempts
> 
> The recovery mechanism includes rate limiting, maximum attempt
> tracking, and device presence validation to prevent false detaches
> on transient ASPM glitches while maintaining safety through
> bounded retry limits.

I assume the glitch is a hardware erratum and should be documented as
such by Intel, although it's possible ASPM L0s isn't configured
correctly.

If it's a hardware erratum, I think you should use a quirk to disable
L0s on these devices, e.g., pci_disable_link_state(pdev,
PCIE_LINK_STATE_L0S).  Even if this patch allows recovery, the PCIe
errors will be logged and reported via AER, which will be confusing to
users.
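
A rough, untested sketch of what such a quirk could look like.  The
function name and the QUIRK_DEV_ID_I225_LM value are assumptions and
must be checked against the igc device ID table; a real quirk would
need one entry per affected device.  The #else branch only provides
userspace stand-ins so the sketch compiles outside a kernel tree:

```c
#ifdef __KERNEL__
#include <linux/pci.h>
#else
/* Userspace stand-ins for the kernel definitions used below. */
struct pci_dev {
	unsigned short vendor;
	unsigned short device;
	int disabled_states;	/* stand-in bookkeeping, not a kernel field */
};
#define PCI_VENDOR_ID_INTEL	0x8086
#define PCIE_LINK_STATE_L0S	1	/* stand-in value */
static int pci_disable_link_state(struct pci_dev *pdev, int state)
{
	pdev->disabled_states |= state;	/* record the request only */
	return 0;
}
#define DECLARE_PCI_FIXUP_FINAL(vendor, device, hook)
#endif

/* Assumed device ID for I225-LM; verify before use. */
#define QUIRK_DEV_ID_I225_LM	0x15F2

/*
 * Hypothetical quirk: ask the PCI core to keep ASPM L0s disabled on the
 * link to this device so the L0s transitions never happen.
 */
static void quirk_igc_disable_l0s(struct pci_dev *pdev)
{
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S);
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, QUIRK_DEV_ID_I225_LM,
			quirk_igc_disable_l0s);
```

With L0s disabled, the link never enters the state that triggers the
glitch, so no errors occur and nothing is reported via AER.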

Bjorn
