From: Conor Dooley <conor@kernel.org>
To: "Neftin, Sasha" <sasha.neftin@intel.com>
Cc: "Fuxbrumer, Devora" <devora.fuxbrumer@intel.com>,
helgaas@kernel.org, regressions@lists.linux.dev,
"Meir, NaamaX" <naamax.meir@intel.com>,
Ivan Smirnov <isgsmirnov@gmail.com>,
intel-wired-lan@lists.osuosl.org,
Jakub Kicinski <kuba@kernel.org>,
"Ruinskiy, Dima" <dima.ruinskiy@intel.com>,
"Avivi, Amir" <amir.avivi@intel.com>
Subject: Re: [Intel-wired-lan] igc kernel module crashes on new hardware (Intel Ethernet I225-V)
Date: Sat, 31 Dec 2022 15:02:57 +0000
Message-ID: <Y7BPIYsBXN0ivoLE@spud>
In-Reply-To: <Y6NCzgXZzv+oJKV1@spud>
On Wed, Dec 21, 2022 at 05:30:54PM +0000, Conor Dooley wrote:
> On Sun, Nov 20, 2022 at 07:55:09PM +0000, Conor Dooley wrote:
> > On Sat, Nov 19, 2022 at 08:06:05PM +0200, Neftin, Sasha wrote:
> > > On 11/19/2022 01:21, Conor Dooley wrote:
> > > > On Fri, Nov 18, 2022 at 02:54:43PM -0800, Jakub Kicinski wrote:
> > > > > On Fri, 18 Nov 2022 22:43:29 +0000 Conor Dooley wrote:
> > > > > > > Is there any update for the community? More and more folks are asking. We
> > > > > > > are all techies and happy to help debug.
> > > > > >
> > > > > > Vested interest since I am suffering from the same issue (X670E-F
> > > > > > Gaming), but is it okay to add this to regzbot? Not sure whether it
> > > > > > counts as a regression or not since it's new hw with the existing driver,
> > > > > > but this seems to be falling through the cracks without a response for
> > > > > > several weeks.
> > > > >
> > > > > Dunno, Thorsten will decide. The line has to be drawn somewhere
> > > > > on "vendor doesn't care about Linux support" vs "we broke uAPI".
> > > > > This is the kind of situation I was alluding to in my line of
> > > > > questioning at the maintainer summit: https://lwn.net/Articles/908324/
> > > >
> > > > Yeah & it is /regression/ tracking, which I don't (or rather didn't)
> > > > consider this situation to be. I'm a little unsure as to when I
> > > > should trigger regzbot in general:
> > > > - immediately when I find something?
> > > > - only if it goes a while with nothing constructive?
> > > > - is it okay to use it outside of "this used to work and now doesn't"?
> > > >
> > > > Either way, I did some more googling and found this reddit thread:
> > > > https://www.reddit.com/r/intel/comments/lqb4km/for_people_having_i225v_connection_issues/
> > > >
> > > > That's being reported against windows & I dunno if the dude is using
> > > > firmware and driver interchangeably etc. But the disabling power saving
> > > > etc sounds oddly like the issue we have here, since that was a proposed
> > > > workaround in Ivan's 2022 reddit thread.
> > > >
> > > > Supposedly I am on firmware-version 1082:8770, but /I/ have no idea
> > > > how that corresponds to windows versioning. That may lend some credence
> > > > to your assertion about firmware being the source of many issues.
> > > >
> > > > > Finding a kernel release which does not suffer from the problem
> > > > > would certainly strengthen your case.
> > > >
> > > > Aye, likely to be a little difficult to do a meaningful bisection for
> > > > me at least, since the motherboard I have with the problem is an AM5
> > > > one for the new Zen4 stuff. I'm not an x86 person, so not entirely
> > > > sure when that support landed. I may do some poking tomorrow..
> > > >
> > > I do not think we can resolve this problem on this forum.
> > > Ivan's earlier report to netdev included the error "PCIe link lost,
> > > device now detached". Since the PCIe link drops unexpectedly, it could
> > > lead to many problems (not only crashes).
> >
> > Hmm, I'll take a look at what mine spits out next time it dies, but I
> > would imagine that you're correct and I see it too.
>
> It does in fact say that, but interestingly only this peripheral has any
> issues. My GPUs etc have no problem at all.
>
> > > Before you go to SW/FW bisection (changing the FW (NVM), going back a
> > > kernel version) - please contact your board vendor (ASUS). Why does the
> > > PCIe link drop?
> >
> > I dunno, I suppose it just entered a lower power state!
> >
> > > Possibly a circuit problem on the board, or the system performs power
> > > management flows and does not stop the driver.
> >
> > My GPU and other PCI devices are returning from lower power modes properly.
> > I wonder what's different about this specific device. As I said, not too
> > familiar with x86 stuff - is there someone from AMD worth poking? The
> > output from lspci is a wall of AMD bridges w/ endpoints mixed in.
> >
> > Doing a cursory look at other x670 stuff - the non-asus ones that I
> > looked at are not using Intel ethernet.
> >
> > > "failed to read reg 0xc030" (just a symptom) happens after the PCIe link is lost.
> >
> > Per 47e16692b26b ("igb/igc: warn when fatal read failure happens"), it
> > looks as though this is not a *new* problem though as you guys have seen
> > this while testing.
> >
> > I've got a 1 G NIC, I like my dev machine to "just work" so I'll probably
> > throw that in and see how far that gets me. IIRC it's an igb one so will
> > at least make for a datapoint.
>
> FWIW I gave up on the igc driver and am using my 1 G NIC, couldn't be
> bothered with the disruption. I'll give the bios stuff mentioned
> elsewhere a go over Christmas now that v6.1.1 exists and see if that
> helps. Hopefully it does!

Hallo, me again...

I didn't actually give the bios stuff a go in the end. I figured that
changing everything at once would likely not be a good idea - but what I
did do was try v6.1.1 & have now been running for 50-something hours
without any issues while using the igc iface.

Wholly unscientific of course, but I had noticed this thread:
https://lore.kernel.org/all/20221226225045.GA400369@bhelgaas/
and that commit c01163dbd1b8 ("PCI/PM: Always disable PTM for all devices
during suspend") was not part of the v6.0.y kernels I was running but
*is* in v6.1.y, which was my impetus for trying the kernel upgrade.

I checked v6.0.16-rc2 and that commit does not appear to have been
backported yet.

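In case anyone wants to repeat that check, below is a rough Python
sketch rather than anything authoritative. It assumes a local clone of
a stable tree, and it assumes a backport would carry the usual
"commit <sha> upstream." or "[ Upstream commit <sha> ]" line in its
message. A backport gets a new hash of its own, so a plain ancestry
check on c01163dbd1b8 would not find it. The function names, the repo
path and the revision range are placeholders of mine.

```python
import re
import subprocess

# Short hash quoted in this mail; the full 40-character hash is not given here.
UPSTREAM_SHA = "c01163dbd1b8"


def references_upstream(commit_message: str, sha_prefix: str = UPSTREAM_SHA) -> bool:
    """Return True if a stable commit message cites the given mainline commit.

    Assumption: the message uses one of the two customary stable markers,
    "commit <sha> upstream." or "[ Upstream commit <sha> ]".
    """
    pattern = re.compile(
        r"(?:\[\s*upstream commit\s+|^\s*commit\s+)"
        + re.escape(sha_prefix)
        + r"[0-9a-f]*",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.search(commit_message) is not None


def backported(repo: str, rev_range: str, sha_prefix: str = UPSTREAM_SHA) -> bool:
    """Scan every commit message in rev_range of a local clone (hypothetical path)."""
    log = subprocess.run(
        # %B is the raw commit message, %x00 separates commits with a NUL byte.
        ["git", "-C", repo, "log", "--format=%B%x00", rev_range],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return any(references_upstream(msg, sha_prefix) for msg in log.split("\x00"))
```

Something like backported("/path/to/stable-clone", "v6.0..v6.0.16-rc2")
would be the idea, with the path and range depending on which tree and
tags you actually have fetched.
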
Perhaps some of the other "victims" in this thread who have not yet
tried changing BIOS etc. could give v6.1.y a go & see if they still have
issues.

I may backport the aforementioned patch myself and see how it does, but
someone else trying v6.1.y & not seeing the iface dying would certainly
help with motivation :)
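
If it helps anyone reporting back, here is a small Python sketch (an
illustration only, with names I made up) that checks whether a
`uname -r` style release string is at least v6.1 and picks out kernel
log lines containing the two messages quoted earlier in this thread. It
assumes the messages appear verbatim in the log.

```python
import re

# Message fragments quoted earlier in this thread.
SYMPTOMS = ("PCIe link lost, device now detached", "failed to read reg")


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Compare a `uname -r` style string such as "6.1.1" against major.minor."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return (int(m.group(1)), int(m.group(2))) >= (major, minor)


def igc_symptoms(log_lines):
    """Return the kernel log lines that contain one of the known symptoms."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(symptom in line for symptom in SYMPTOMS)
    ]
```

Feeding kernel_at_least() the value of platform.release() and
igc_symptoms() the lines from `dmesg` or `journalctl -k` should be
enough for a quick "still dying / not dying" datapoint.
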
Thanks,
Conor.