From: Bjorn Helgaas <helgaas@kernel.org>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] RX CRC errors on I219-V (6) 8086:15be
Date: Tue, 2 Jul 2019 13:01:12 -0500
Message-ID: <20190702180112.GB128603@google.com>
In-Reply-To: <E29A2CD2-1632-4575-9910-0808BD15F4D3@canonical.com>

On Tue, Jul 02, 2019 at 04:25:59PM +0800, Kai Heng Feng wrote:
> +linux-pci
>
> Hi Sasha,
>
> at 6:49 PM, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
>
> > at 14:26, Neftin, Sasha <sasha.neftin@intel.com> wrote:
> >
> > > On 6/26/2019 09:14, Kai Heng Feng wrote:
> > > > Hi Sasha
> > > > at 5:09 PM, Kai-Heng Feng <kai.heng.feng@canonical.com> wrote:
> > > > > Hi Jeffrey,
> > > > >
> > > > > > We’ve encountered another issue, which causes multiple CRC
> > > > > > errors and renders ethernet completely useless, here’s the
> > > > > network stats:
> > > > I also tried ignore_ltr for this issue, seems like it alleviates
> > > > the symptom a bit for a while, then the network still becomes
> > > > useless after some usage.
> > > > > And yes, it’s also a Whiskey Lake platform. What’s the next step
> > > > to debug this problem?
> > > > Kai-Heng
> > > CRC errors not related to the LTR. Please, try to disable the ME on
> > > your platform. Hope you have this option in BIOS. Another way is to
> > > contact your PC vendor and ask to provide NVM without ME. Let's
> > > start debugging with these steps.
> >
> > According to ODM, the ME can be physically disabled by a jumper.
> > But after disabling the ME the same issue can still be observed.
>
> We’ve found that this issue doesn’t happen to SATA SSD, it only happens when
> NVMe SSD is in use.
>
> Here are the steps:
> - Disable NVMe ASPM, issue persists
> - modprobe -r e1000e && modprobe e1000e, issue doesn’t happen
> - Enabling NVMe ASPM, issue doesn’t happen
>
> As long as NVMe ASPM gets enabled after e1000e gets loaded, the issue
> doesn’t happen.

IIUC the problem happens with the mainline and dev-queue e1000e
driver, but not with the out-of-tree Intel driver. Since there is a
working driver and there's the potential (at least in principle) for
unifying them or bisecting between them, I have limited interest in
debugging it from scratch.

If it turns out to be a PCI core problem, I would want to know: What's
the PCI topology? "lspci -vv" output for the system? Does it make a
difference if you boot with "pcie_aspm=off"? Collect complete dmesg,
maybe attach it to a kernel.org bugzilla?

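To record which ASPM configuration a given test run used, something like the following could be saved alongside the "lspci -vv" and dmesg output. This is only a minimal sketch, not part of the original report: the helper names are hypothetical, and it assumes the kernel exposes the ASPM policy at /sys/module/pcie_aspm/parameters/policy with the active entry in square brackets.

```python
from pathlib import Path


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry of the ASPM policy string, or None."""
    for token in policy_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def booted_with_aspm_off(cmdline_text):
    """True if the kernel command line contains the exact token pcie_aspm=off."""
    return "pcie_aspm=off" in cmdline_text.split()


def read_text(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed locations; either file may be absent on some systems.
    policy = read_text("/sys/module/pcie_aspm/parameters/policy")
    cmdline = read_text("/proc/cmdline")
    if policy is not None:
        print("ASPM policy:", active_aspm_policy(policy))
    if cmdline is not None:
        print("pcie_aspm=off on cmdline:", booted_with_aspm_off(cmdline))
```

Running it once per boot (with and without "pcie_aspm=off") gives a small record to attach next to each set of network statistics.
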
> > > > > /sys/class/net/eno1/statistics$ grep . *
> > > > > collisions:0
> > > > > multicast:95
> > > > > rx_bytes:1499851
> > > > > rx_compressed:0
> > > > > rx_crc_errors:1165
> > > > > rx_dropped:0
> > > > > rx_errors:2330
> > > > > rx_fifo_errors:0
> > > > > rx_frame_errors:0
> > > > > rx_length_errors:0
> > > > > rx_missed_errors:0
> > > > > rx_nohandler:0
> > > > > rx_over_errors:0
> > > > > rx_packets:4789
> > > > > tx_aborted_errors:0
> > > > > tx_bytes:864312
> > > > > tx_carrier_errors:0
> > > > > tx_compressed:0
> > > > > tx_dropped:0
> > > > > tx_errors:0
> > > > > tx_fifo_errors:0
> > > > > tx_heartbeat_errors:0
> > > > > tx_packets:7370
> > > > > tx_window_errors:0
> > > > >
> > > > > Same behavior can be observed on both mainline kernel and on
> > > > > your dev-queue branch.
> > > > > > OTOH, the same issue can’t be observed on out-of-tree e1000e.
> > > > >
> > > > > Is there any plan to close the gap between upstream and
> > > > > out-of-tree version?
> > > > >
> > > > > Kai-Heng
>
>
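For comparing runs (for example mainline versus the out-of-tree driver), the "grep . *" output quoted above can be reduced to a single number. The sketch below is an illustration only: the function names are hypothetical, and the ratio is simply rx_crc_errors divided by rx_packets, without any claim about whether the driver counts errored frames in rx_packets.

```python
def parse_stats(text):
    """Parse 'name:value' lines (optionally '>'-quoted) into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()  # drop email quote markers
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


def crc_error_ratio(stats):
    """Ratio of rx_crc_errors to rx_packets; None if rx_packets is zero or missing."""
    packets = stats.get("rx_packets", 0)
    if packets == 0:
        return None
    return stats.get("rx_crc_errors", 0) / packets


if __name__ == "__main__":
    # Counters copied from the report quoted above.
    sample = """\
> > > > > > rx_crc_errors:1165
> > > > > > rx_errors:2330
> > > > > > rx_packets:4789
"""
    stats = parse_stats(sample)
    print(f"CRC error ratio: {crc_error_ratio(stats):.3f}")
```

With the counters from the report this works out to roughly a quarter of the received-packet count.
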