From mboxrd@z Thu Jan 1 00:00:00 1970 From: Neftin, Sasha Date: Sun, 10 Apr 2022 12:26:32 +0300 Subject: [Intel-wired-lan] Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards In-Reply-To: <46696877-3dc9-0600-9a8f-eda42d029cd7@leemhuis.info> References: <6801d0ef-9620-0f61-c107-c2c5524952ef@leemhuis.info> <1e0558eb-b1f1-edb5-e554-a41f2c160401@intel.com> <46696877-3dc9-0600-9a8f-eda42d029cd7@leemhuis.info> Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit To: intel-wired-lan@osuosl.org List-ID: On 4/10/2022 11:21, Thorsten Leemhuis wrote: > Hi, this is your Linux kernel regression tracker. Top-posting for once, > to make this easily accessible to everyone. > > Hey Sasha and e1000e developers, what's up there? Two and a half weeks > ago it seemed the root cause for this regression was found and a > proposed patch to fix it was added to the bugzilla ticket and even > tested by the reporter. But since then nothing happened afaics. What's > up here? Or did I miss something? Hello Thorsten, Patch submitted to the IWL: https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git/commit/?h=dev-queue&id=7dd121b8d5735780b6a70db735d44b3e5b856130 > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > > P.S.: As the Linux kernel's regression tracker I'm getting a lot of > reports on my table. I can only look briefly into most of them and lack > knowledge about most of the areas they concern. I thus unfortunately > will sometimes get things wrong or miss something important. I hope > that's not the case here; if you think it is, don't hesitate to tell me > in a public reply, it's in everyone's interest to set the public record > straight. > > #regzbot poke > > On 24.03.22 20:36, Neftin, Sasha wrote: >> On 3/24/2022 17:09, Neftin, Sasha wrote: >>> On 3/24/2022 12:37, Thorsten Leemhuis wrote: >>>> Hi, this is your Linux kernel regression tracker. >>>> >>>> I noticed a regression report in bugzilla.kernel.org that afaics nobody >>>> acted upon since it was reported about a week ago, that's why I decided >>>> to forward it to the lists and a few relevant people to the CC. To quote >>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 : >>>> >>>>> [reply] [?] Description James 2022-03-15 13:45:38 UTC >>>>> >>>>> I run Arch linux on an Intel NUC 8i3BEH which has the following >>>>> network card: >>>>> >>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection >>>>> (6) I219-V (rev 30) >>>>> ???????? DeviceName:? LAN >>>>> ???????? Subsystem: Intel Corporation Device 2074 >>>>> ???????? Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- >>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+ >>>>> ???????? Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >>>>>> TAbort- SERR- >>>> ???????? Latency: 0 >>>>> ???????? Interrupt: pin A routed to IRQ 135 >>>>> ???????? Region 0: Memory at c0b00000 (32-bit, non-prefetchable) >>>>> [size=128K] >>>>> ???????? Capabilities: [c8] Power Management version 3 >>>>> ???????????????? Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA >>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+) >>>>> ???????????????? Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME- >>>>> ???????? Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+ >>>>> ???????????????? Address: 00000000fee003d8? Data: 0000 >>>>> ???????? Kernel driver in use: e1000e >>>>> ???????? Kernel modules: e1000e >>>>> >>>>> I found a major regression since the previous few kernel versions >>>>> which causes several odd issues, most noteably I use the machine to >>>>> stream live tv via TVheadend and was finding this to be unusable >>>>> (picture freezes and sound breaks up very badly with continuity >>>>> errors in the TVheadend logfile). >>>>> >>>>> I found the issue was introduced since the 5.14 kernel, and have >>>>> eventually got round to performing a git bisect, which landed upon >>>>> the following commit: >>>>> >>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M >>>>> >>>>> Indeed, if I revert this single commit then the problem is resolved. >>>>> >>>>> To help diagnose the issue I applied the following patch to capture >>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d >>>>> variables: >>>>> >>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c >>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c >>>>> index d60e2016d..f4e5ffbcd 100644 >>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c >>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c >>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct >>>>> e1000_hw *hw, bool link) >>>>> ???????? u16 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */ >>>>> ???????? u16 lat_enc_d = 0;????? /* latency decoded */ >>>>> ???????? u16 lat_enc = 0;??????? /* latency encoded */ >>>>> +?????? struct e1000_adapter *adapter = hw->adapter; >>>>> >>>>> ???????? if (link) { >>>>> ???????????????? u16 speed, duplex, scale = 0; >>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct >>>>> e1000_hw *hw, bool link) >>>>> ????????????????????????????????? ((max_ltr_enc & >>>>> E1000_LTRV_SCALE_MASK) >>>>> ????????????????????????????????? >> E1000_LTRV_SCALE_SHIFT))); >>>>> >>>>> +?????????????? e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc, >>>>> max_ltr_enc); >>>>> +?????????????? e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u", >>>>> lat_enc_d, max_ltr_enc_d); >>>>> + >>>>> ???????????????? if (lat_enc_d > max_ltr_enc_d) >>>>> ???????????????????????? lat_enc = max_ltr_enc; >>>>> ???????? } >>>>> >>>>> With this in place I see the following in dmesg: >>>>> >>>>> [??? 3.241215] e1000e: Intel(R) PRO/1000 Network Driver >>>>> [??? 3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation. >>>>> [??? 3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate >>>>> (ints/sec) set to dynamic conservative mode >>>>> [??? 3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): >>>>> registered PHC clock >>>>> [??? 3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width >>>>> x1) 94:c6:91:ae:b3:7b >>>>> [??? 3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network >>>>> Connection >>>>> [??? 3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: >>>>> FFFFFF-0FF >>>>> [??? 6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233 >>>>> max_ltr_enc=4099 >>>>> [??? 6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368 >>>>> max_ltr_enc_d=0 >>>>> [??? 6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps >>>>> Full Duplex, Flow Control: Rx/Tx >>>>> >>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 ! >>>>> >>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop >>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously >>>>> been set to 2233 in this particular example. This seems to be where >>>>> the problem lies. >>>>> >>>>> Prior to commit 44a13a5: >>>>> >>>>> if (lat_enc > max_ltr_enc) >>>>> ?? lat_enc = max_ltr_enc; >>>>> >>>>> After commit 44a13a5: >>>>> >>>>> if (lat_enc_d > max_ltr_enc_d) >>>>> ?? lat_enc = max_ltr_enc; >>>>> >>>>> >>>>> I'm not sure whether it was intended for this new code to take >>>>> effect for an I219 since the commit message on 44a13a5 indicates it >>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting >>>>> set to 0? >>>>> >>>> >>>> BTW, that commit is from Sasha Neftin. >>> Hello Thorsten, >>> I've expected follow decoded values (link 1G) >>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185) >>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3) >>> >>> scale 0 - 1 >>> scale 1 - 32 >>> scale 2 - 1024 >>> scale 3 - 32768 >>> scale 4 - 1048576 (nano s) >>> >>> I've separated calculate: >>> e_info("e1000e: 1* max_ltr_enc_d: %d\n", >>> ??????? max_ltr_enc & E1000_LTRV_VALUE_MASK); >>> e_info("e1000e: 2* max_ltr_enc_d: %d\n", >>> ??????? (1U << (E1000_LTRV_SCALE_FACTOR * >>> ??????? ((max_ltr_enc & E1000_LTRV_SCALE_MASK) >>> ??????? >> E1000_LTRV_SCALE_SHIFT)))); >>> I would expect: >>> 1* max_ltr_enc_d (value): 3 >>> 2* max_ltr_enc_d (scale): 1048576 >>> and so: value * scale >>> 1048576*3 = 3145728ns >>> >>> Please, let's check it. (I am wondering if over-calculate it) >>> Thanks, >>> Sasha >> ok. Overflow... Instead of >> +?????? u16 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */ >> +?????? u16 lat_enc_d = 0;????? /* latency decoded */ >> >> Should be: >> +?????? u32 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */ >> +?????? u32 lat_enc_d = 0;????? /* latency decoded */ >> I will process the patch address this overflow and some e_dbg to >> eliminate calculation. >> >> sudo cat /sys/kernel/debug/pmc_core/ltr_show >> SOUTHPORT_A???????????????????????? LTR: RAW: 0x0 Non-Snoop(ns): >> 0?????????????????? Snoop(ns): 0 >> SOUTHPORT_B???????????????????????? LTR: RAW: 0x0 Non-Snoop(ns): >> 0?????????????????? Snoop(ns): 0 >> SATA??????????????????????????????? LTR: RAW: 0x900f Non-Snoop(ns): >> 0?????????????????? Snoop(ns): 15728640 >> GIGABIT_ETHERNET??????????????????? LTR: RAW: 0x88b988b9 Non-Snoop(ns): >> 189440????????????? Snoop(ns): 189440 >> XHCI??????????????????????????????? LTR: RAW: 0x891a Non-Snoop(ns): >> 0?????????????????? Snoop(ns): 288768 >> >>>> >>>> Could somebody take a look into this? Or was this discussed somewhere >>>> else already? Or even fixed? >>>> >>>> Anyway, to get this tracked: >>>> >>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73 >>>> #regzbot from: James >>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14 >>>> onwards >>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689 >>>> >>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) >>>> >>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of >>>> reports on my table. I can only look briefly into most of them and lack >>>> knowledge about most of the areas they concern. I thus unfortunately >>>> will sometimes get things wrong or miss something important. I hope >>>> that's not the case here; if you think it is, don't hesitate to tell me >>>> in a public reply, it's in everyone's interest to set the public record >>>> straight. >>>> >>> >> >> >> Sasha