From: Jesse Brandeburg <jesse.brandeburg@intel.com>
To: "Andreas K. Huettel" <andreas.huettel@ur.de>,
	Paul Menzel <pmenzel@molgen.mpg.de>
Cc: <netdev@vger.kernel.org>, <intel-wired-lan@lists.osuosl.org>,
	"Jakub Kicinski" <kubakici@wp.pl>
Subject: Re: [EXT] Re: [Intel-wired-lan] Intel I350 regression 5.10 -> 5.14 ("The NVM Checksum Is Not Valid") [8086:1521]
Date: Tue, 5 Oct 2021 15:27:51 -0700
Message-ID: <c75203e9-0ef4-20bd-87a5-ad0846863886@intel.com>
In-Reply-To: <2944777.ktpJ11cQ8Q@pinacolada>

On 10/5/2021 6:43 AM, Andreas K. Huettel wrote:
>>
>> What messages are new compared to the working Linux 5.10.59?
>>
> 
> I've uploaded the full boot logs to https://dev.gentoo.org/~dilfridge/igb/
> (both in a version with and without timestamps, for easy diff).
> 
> * I can't see anything that immediately points to the igb device (like a PCI id etc.) before the module is loaded. 
> * The main difference between the logs is many unrelated (?) i915 warnings in 5.10.59 because of the nonfunctional graphics.
> 
> The messages easily identifiable are:
> 
> huettel@pinacolada ~/tmp $ cat kernel-messages-5.10.59.txt |grep igb
> Oct  5 15:11:18 dilfridge kernel: [    2.090675] igb: Intel(R) Gigabit Ethernet Network Driver
> Oct  5 15:11:18 dilfridge kernel: [    2.090676] igb: Copyright (c) 2007-2014 Intel Corporation.
> Oct  5 15:11:18 dilfridge kernel: [    2.090728] igb 0000:01:00.0: enabling device (0000 -> 0002)

This line is missing from the 5.14.9 log below, which indicates that the
kernel couldn't or didn't power up the PCIe device for some reason. We're
looking for something like ACPI or PCI patches (possibly PCI power
management) to be the culprit here.
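
One way to spot such a missing line mechanically is to strip the
timestamps and compare the igb messages of the two logs. The following is
only a minimal sketch: the helper names (igb_messages, only_in_first) are
made up for illustration, and the prefix pattern assumes the syslog format
shown in the quoted logs.

```python
import re

# Matches the syslog prefix plus kernel timestamp, e.g.
# "Oct  5 15:11:18 dilfridge kernel: [    2.090728] "
PREFIX = re.compile(r"^.*?kernel: \[\s*\d+\.\d+\]\s*")


def igb_messages(log_text):
    """Return the igb messages in order, with the timestamp prefix removed."""
    messages = []
    for line in log_text.splitlines():
        # Drop email quote markers ("> ") and trailing whitespace.
        line = line.lstrip("> ").rstrip()
        msg = PREFIX.sub("", line)
        if msg.startswith("igb"):
            messages.append(msg)
    return messages


def only_in_first(good_log, bad_log):
    """Messages that appear in the good log but nowhere in the bad log."""
    bad = set(igb_messages(bad_log))
    return [m for m in igb_messages(good_log) if m not in bad]
```

Feeding it the contents of kernel-messages-5.10.59.txt and
kernel-messages-5.14.9.txt should list the "enabling device" lines among
the messages that exist only in the working boot.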


> Oct  5 15:11:18 dilfridge kernel: [    2.094438] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs
> Oct  5 15:11:18 dilfridge kernel: [    2.097287] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs
> Oct  5 15:11:18 dilfridge kernel: [    2.098492] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs
> Oct  5 15:11:18 dilfridge kernel: [    2.098787] Modules linked in: igb(+) i915(+) iosf_mbi acpi_pad efivarfs
> Oct  5 15:11:18 dilfridge kernel: [    2.173386] igb 0000:01:00.0: added PHC on eth0
> Oct  5 15:11:18 dilfridge kernel: [    2.173391] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
> Oct  5 15:11:18 dilfridge kernel: [    2.173395] igb 0000:01:00.0: eth0: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4c
> Oct  5 15:11:18 dilfridge kernel: [    2.173991] igb 0000:01:00.0: eth0: PBA No: H47819-001
> Oct  5 15:11:18 dilfridge kernel: [    2.173994] igb 0000:01:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> Oct  5 15:11:18 dilfridge kernel: [    2.174199] igb 0000:01:00.1: enabling device (0000 -> 0002)
> Oct  5 15:11:18 dilfridge kernel: [    2.261029] igb 0000:01:00.1: added PHC on eth1
> Oct  5 15:11:18 dilfridge kernel: [    2.261034] igb 0000:01:00.1: Intel(R) Gigabit Ethernet Network Connection
> Oct  5 15:11:18 dilfridge kernel: [    2.261038] igb 0000:01:00.1: eth1: (PCIe:5.0Gb/s:Width x4) 6c:b3:11:23:d4:4d
> Oct  5 15:11:18 dilfridge kernel: [    2.261772] igb 0000:01:00.1: eth1: PBA No: H47819-001
> Oct  5 15:11:18 dilfridge kernel: [    2.261776] igb 0000:01:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
> Oct  5 15:11:18 dilfridge kernel: [    2.265376] igb 0000:01:00.1 enp1s0f1: renamed from eth1
> Oct  5 15:11:18 dilfridge kernel: [    2.282514] igb 0000:01:00.0 enp1s0f0: renamed from eth0
> Oct  5 15:11:31 dilfridge kernel: [   17.585202] igb 0000:01:00.0 enp1s0f0: igb: enp1s0f0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
> 
> huettel@pinacolada ~/tmp $ cat kernel-messages-5.14.9.txt |grep igb
> Oct  5 02:38:31 dilfridge kernel: [    2.108606] igb: Intel(R) Gigabit Ethernet Network Driver
> Oct  5 02:38:31 dilfridge kernel: [    2.108608] igb: Copyright (c) 2007-2014 Intel Corporation.
> Oct  5 02:38:31 dilfridge kernel: [    2.108622] igb 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)

This is really the only message that matters. It indicates that the
config space is inaccessible: from the system/kernel's perspective, the
device is unplugged, not responding, or stuck in a low-power PCIe state.
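
A device that does not answer config reads typically returns all ones, so
a vendor ID of 0xffff is a quick hint that config space is inaccessible.
Here is a rough sketch of that check from userspace. The function names
are hypothetical, the sysfs path uses the 0000:01:00.0 address from the
log, and I am assuming that reading the sysfs "config" file reflects what
the kernel sees; the read itself may behave differently for a device in
D3cold.

```python
import struct


def config_space_accessible(header: bytes) -> bool:
    """True if the first config dword holds a real vendor ID."""
    if len(header) < 4:
        return False
    # Vendor ID and device ID are the first two little-endian 16-bit words.
    vendor, _device = struct.unpack_from("<HH", header)
    # An unanswered config read yields all ones, i.e. vendor 0xffff.
    return vendor != 0xFFFF


def read_config_header(bdf="0000:01:00.0"):
    """Read the first 4 bytes of config space via sysfs (path is an assumption)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read(4)
```

For a healthy I350 [8086:1521] the header bytes would be 86 80 21 15, and
config_space_accessible(read_config_header()) would return True.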


> Oct  5 02:38:31 dilfridge kernel: [    2.108918] igb 0000:01:00.0 0000:01:00.0 (uninitialized): PCIe link lost
> Oct  5 02:38:31 dilfridge kernel: [    2.418724] igb 0000:01:00.0: PHY reset is blocked due to SOL/IDER session.
> Oct  5 02:38:31 dilfridge kernel: [    4.148163] igb 0000:01:00.0: The NVM Checksum Is Not Valid
> Oct  5 02:38:31 dilfridge kernel: [    4.154891] igb: probe of 0000:01:00.0 failed with error -5
> Oct  5 02:38:31 dilfridge kernel: [    4.154904] igb 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> Oct  5 02:38:31 dilfridge kernel: [    4.155146] igb 0000:01:00.1 0000:01:00.1 (uninitialized): PCIe link lost
> Oct  5 02:38:31 dilfridge kernel: [    4.466904] igb 0000:01:00.1: PHY reset is blocked due to SOL/IDER session.
> Oct  5 02:38:31 dilfridge kernel: [    6.195528] igb 0000:01:00.1: The NVM Checksum Is Not Valid
> Oct  5 02:38:31 dilfridge kernel: [    6.200863] igb: probe of 0000:01:00.1 failed with error -5
> 
> 
>>>> Any advice on how to proceed? Willing to test patches and provide additional debug info.
>>
>> Without any ideas about the issue, please bisect the issue to find the 
>> commit introducing the regression, so it can be reverted/fixed to not 
>> violate Linux’ no-regression policy.
> 
> I'll start going through kernel versions (and later revisions) end of the week.

Thank you for helping the community figure out what is up here. I don't
believe that a driver bug/change broke things, but anything is
possible. :-) Given what I saw above, I wonder if you should try to
boot with pcie_aspm=off.
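
The kernel documentation spells the ASPM boot parameter pcie_aspm=off.
After rebooting, it is worth confirming that the option actually reached
the kernel. The sketch below is illustrative only: the function names are
invented, and it assumes the usual /proc/cmdline and
/sys/module/pcie_aspm/parameters/policy files are present on the test
system.

```python
def cmdline_disables_aspm(cmdline: str) -> bool:
    """True if the kernel command line contains pcie_aspm=off."""
    return "pcie_aspm=off" in cmdline.split()


def active_aspm_policy(policy_text: str):
    """Return the bracketed entry, e.g. 'default' from '[default] performance'."""
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_text(path):
    """Small helper to read a procfs/sysfs file as text."""
    with open(path) as f:
        return f.read()
```

Typical use would be cmdline_disables_aspm(read_text("/proc/cmdline")) and
active_aspm_policy(read_text("/sys/module/pcie_aspm/parameters/policy")).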

The best option is a bisect using git, but if testing a handful of
released kernel versions is the only option you have, narrowing the
regression down to a couple of versions will still help.
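
To keep the good/bad decision consistent across bisect steps, the boot log
of each test kernel can be classified by the messages quoted above. This
is just a sketch under assumptions: the marker strings are taken from the
two logs in this thread, the names (verdict, EXIT_CODES) are hypothetical,
and the exit codes follow the git bisect run convention (0 good, 125 skip,
other small values bad).

```python
# Messages seen only in the failing 5.14.9 boot.
BAD_MARKERS = (
    "config space inaccessible",
    "The NVM Checksum Is Not Valid",
    "failed with error -5",
)
# Printed per port in the working 5.10.59 boot after a successful probe.
GOOD_MARKER = "Intel(R) Gigabit Ethernet Network Connection"

# Exit codes as understood by "git bisect run".
EXIT_CODES = {"good": 0, "bad": 1, "skip": 125}


def verdict(log_text):
    """Classify a boot log as 'good', 'bad', or 'skip' (igb not seen)."""
    igb_lines = [line for line in log_text.splitlines() if "igb" in line]
    if any(marker in line for line in igb_lines for marker in BAD_MARKERS):
        return "bad"
    if any(GOOD_MARKER in line for line in igb_lines):
        return "good"
    return "skip"
```

Since every step needs a reboot into the freshly built kernel, the verdict
would normally be fed by hand into "git bisect good" or "git bisect bad"
rather than through a fully automated run.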
