From: Neftin, Sasha <sasha.neftin@intel.com>
To: intel-wired-lan@osuosl.org
Subject: [Intel-wired-lan] Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards
Date: Sun, 10 Apr 2022 12:26:32 +0300 [thread overview]
Message-ID: <b63584c0-aa7c-9252-f3ef-7ddc857ad5ae@intel.com> (raw)
In-Reply-To: <46696877-3dc9-0600-9a8f-eda42d029cd7@leemhuis.info>
On 4/10/2022 11:21, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker. Top-posting for once,
> to make this easily accessible to everyone.
>
> Hey Sasha and e1000e developers, what's up there? Two and a half weeks
> ago it seemed the root cause for this regression was found and a
> proposed patch to fix it was added to the bugzilla ticket and even
> tested by the reporter. But since then nothing happened afaics. What's
> up here? Or did I miss something?
Hello Thorsten,
Patch submitted to the IWL:
https://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue.git/commit/?h=dev-queue&id=7dd121b8d5735780b6a70db735d44b3e5b856130
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>
> #regzbot poke
>
> On 24.03.22 20:36, Neftin, Sasha wrote:
>> On 3/24/2022 17:09, Neftin, Sasha wrote:
>>> On 3/24/2022 12:37, Thorsten Leemhuis wrote:
>>>> Hi, this is your Linux kernel regression tracker.
>>>>
>>>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>>>> acted upon since it was reported about a week ago, that's why I decided
>>>> to forward it to the lists and a few relevant people to the CC. To quote
>>>> from https://bugzilla.kernel.org/show_bug.cgi?id=215689 :
>>>>
>>>>> [reply] [?] Description James 2022-03-15 13:45:38 UTC
>>>>>
>>>>> I run Arch linux on an Intel NUC 8i3BEH which has the following
>>>>> network card:
>>>>>
>>>>> 00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection
>>>>> (6) I219-V (rev 30)
>>>>> ???????? DeviceName:? LAN
>>>>> ???????? Subsystem: Intel Corporation Device 2074
>>>>> ???????? Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
>>>>> ParErr- Stepping- SERR- FastB2B- DisINTx+
>>>>> ???????? Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>>>> TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>> ???????? Latency: 0
>>>>> ???????? Interrupt: pin A routed to IRQ 135
>>>>> ???????? Region 0: Memory at c0b00000 (32-bit, non-prefetchable)
>>>>> [size=128K]
>>>>> ???????? Capabilities: [c8] Power Management version 3
>>>>> ???????????????? Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>>> PME(D0+,D1-,D2-,D3hot+,D3cold+)
>>>>> ???????????????? Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
>>>>> ???????? Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>>> ???????????????? Address: 00000000fee003d8? Data: 0000
>>>>> ???????? Kernel driver in use: e1000e
>>>>> ???????? Kernel modules: e1000e
>>>>>
>>>>> I found a major regression since the previous few kernel versions
>>>>> which causes several odd issues, most noteably I use the machine to
>>>>> stream live tv via TVheadend and was finding this to be unusable
>>>>> (picture freezes and sound breaks up very badly with continuity
>>>>> errors in the TVheadend logfile).
>>>>>
>>>>> I found the issue was introduced since the 5.14 kernel, and have
>>>>> eventually got round to performing a git bisect, which landed upon
>>>>> the following commit:
>>>>>
>>>>> 44a13a5: e1000e: Fix the max snoop/no-snoop latency for 10M
>>>>>
>>>>> Indeed, if I revert this single commit then the problem is resolved.
>>>>>
>>>>> To help diagnose the issue I applied the following patch to capture
>>>>> the values of the lat_enc, max_ltr_enc vs lat_enc_d, max_ltr_enc_d
>>>>> variables:
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> index d60e2016d..f4e5ffbcd 100644
>>>>> --- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
>>>>> @@ -1012,6 +1012,7 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>> e1000_hw *hw, bool link)
>>>>> ???????? u16 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */
>>>>> ???????? u16 lat_enc_d = 0;????? /* latency decoded */
>>>>> ???????? u16 lat_enc = 0;??????? /* latency encoded */
>>>>> +?????? struct e1000_adapter *adapter = hw->adapter;
>>>>>
>>>>> ???????? if (link) {
>>>>> ???????????????? u16 speed, duplex, scale = 0;
>>>>> @@ -1074,6 +1075,9 @@ static s32 e1000_platform_pm_pch_lpt(struct
>>>>> e1000_hw *hw, bool link)
>>>>> ????????????????????????????????? ((max_ltr_enc &
>>>>> E1000_LTRV_SCALE_MASK)
>>>>> ????????????????????????????????? >> E1000_LTRV_SCALE_SHIFT)));
>>>>>
>>>>> +?????????????? e_info("e1000e: lat_enc=%d max_ltr_enc=%d", lat_enc,
>>>>> max_ltr_enc);
>>>>> +?????????????? e_info("e1000e: lat_enc_d=%u max_ltr_enc_d=%u",
>>>>> lat_enc_d, max_ltr_enc_d);
>>>>> +
>>>>> ???????????????? if (lat_enc_d > max_ltr_enc_d)
>>>>> ???????????????????????? lat_enc = max_ltr_enc;
>>>>> ???????? }
>>>>>
>>>>> With this in place I see the following in dmesg:
>>>>>
>>>>> [??? 3.241215] e1000e: Intel(R) PRO/1000 Network Driver
>>>>> [??? 3.241217] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
>>>>> [??? 3.243382] e1000e 0000:00:1f.6: Interrupt Throttling Rate
>>>>> (ints/sec) set to dynamic conservative mode
>>>>> [??? 3.749009] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized):
>>>>> registered PHC clock
>>>>> [??? 3.824751] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width
>>>>> x1) 94:c6:91:ae:b3:7b
>>>>> [??? 3.824765] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network
>>>>> Connection
>>>>> [??? 3.824849] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No:
>>>>> FFFFFF-0FF
>>>>> [??? 6.949327] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc=2233
>>>>> max_ltr_enc=4099
>>>>> [??? 6.949331] e1000e 0000:00:1f.6 eth0: e1000e: lat_enc_d=58368
>>>>> max_ltr_enc_d=0
>>>>> [??? 6.951165] e1000e 0000:00:1f.6 eth0: NIC Link is Up 1000 Mbps
>>>>> Full Duplex, Flow Control: Rx/Tx
>>>>>
>>>>> Notice that lat_enc_d=58368 and max_ltr_enc_d=0 !
>>>>>
>>>>> lat_enc_d is greater than max_ltr_enc_d so it's setting snoop
>>>>> latency to max_ltr_enc (i.e. 4099) where it would have previously
>>>>> been set to 2233 in this particular example. This seems to be where
>>>>> the problem lies.
>>>>>
>>>>> Prior to commit 44a13a5:
>>>>>
>>>>> if (lat_enc > max_ltr_enc)
>>>>> ?? lat_enc = max_ltr_enc;
>>>>>
>>>>> After commit 44a13a5:
>>>>>
>>>>> if (lat_enc_d > max_ltr_enc_d)
>>>>> ?? lat_enc = max_ltr_enc;
>>>>>
>>>>>
>>>>> I'm not sure whether it was intended for this new code to take
>>>>> effect for an I219 since the commit message on 44a13a5 indicates it
>>>>> was aimed at I217/I218. Seems strange that max_ltr_enc_d is getting
>>>>> set to 0?
>>>>>
>>>>
>>>> BTW, that commit is from Sasha Neftin.
>>> Hello Thorsten,
>>> I've expected follow decoded values (link 1G)
>>> lat_enc: 0x000008b9 => lat_enc_d: 189440 (1024*185)
>>> max_ltr_enc: 0x00001003 => max_ltr_enc_d: 3145728 (1048576*3)
>>>
>>> scale 0 - 1
>>> scale 1 - 32
>>> scale 2 - 1024
>>> scale 3 - 32768
>>> scale 4 - 1048576 (nano s)
>>>
>>> I've separated calculate:
>>> e_info("e1000e: 1* max_ltr_enc_d: %d\n",
>>> ??????? max_ltr_enc & E1000_LTRV_VALUE_MASK);
>>> e_info("e1000e: 2* max_ltr_enc_d: %d\n",
>>> ??????? (1U << (E1000_LTRV_SCALE_FACTOR *
>>> ??????? ((max_ltr_enc & E1000_LTRV_SCALE_MASK)
>>> ??????? >> E1000_LTRV_SCALE_SHIFT))));
>>> I would expect:
>>> 1* max_ltr_enc_d (value): 3
>>> 2* max_ltr_enc_d (scale): 1048576
>>> and so: value * scale
>>> 1048576*3 = 3145728ns
>>>
>>> Please, let's check it. (I am wondering if over-calculate it)
>>> Thanks,
>>> Sasha
>> ok. Overflow... Instead of
>> +?????? u16 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */
>> +?????? u16 lat_enc_d = 0;????? /* latency decoded */
>>
>> Should be:
>> +?????? u32 max_ltr_enc_d = 0;? /* maximum LTR decoded by platform */
>> +?????? u32 lat_enc_d = 0;????? /* latency decoded */
>> I will process the patch address this overflow and some e_dbg to
>> eliminate calculation.
>>
>> sudo cat /sys/kernel/debug/pmc_core/ltr_show
>> SOUTHPORT_A???????????????????????? LTR: RAW: 0x0 Non-Snoop(ns):
>> 0?????????????????? Snoop(ns): 0
>> SOUTHPORT_B???????????????????????? LTR: RAW: 0x0 Non-Snoop(ns):
>> 0?????????????????? Snoop(ns): 0
>> SATA??????????????????????????????? LTR: RAW: 0x900f Non-Snoop(ns):
>> 0?????????????????? Snoop(ns): 15728640
>> GIGABIT_ETHERNET??????????????????? LTR: RAW: 0x88b988b9 Non-Snoop(ns):
>> 189440????????????? Snoop(ns): 189440
>> XHCI??????????????????????????????? LTR: RAW: 0x891a Non-Snoop(ns):
>> 0?????????????????? Snoop(ns): 288768
>>
>>>>
>>>> Could somebody take a look into this? Or was this discussed somewhere
>>>> else already? Or even fixed?
>>>>
>>>> Anyway, to get this tracked:
>>>>
>>>> #regzbot introduced: 44a13a5d99c71bf9e1676d9e51679daf4d7b3d73
>>>> #regzbot from: James <jahutchinson99@googlemail.com>
>>>> #regzbot title: net: e1000e: instabilities on I219-V for kernel 5.14
>>>> onwards
>>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215689
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>>
>>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>>> reports on my table. I can only look briefly into most of them and lack
>>>> knowledge about most of the areas they concern. I thus unfortunately
>>>> will sometimes get things wrong or miss something important. I hope
>>>> that's not the case here; if you think it is, don't hesitate to tell me
>>>> in a public reply, it's in everyone's interest to set the public record
>>>> straight.
>>>>
>>>
>>
>>
>>
Sasha
next prev parent reply other threads:[~2022-04-10 9:26 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-24 10:37 [Intel-wired-lan] Bug 215689 - e1000e: regression for I219-V for kernel 5.14 onwards Thorsten Leemhuis
2022-03-24 15:09 ` Neftin, Sasha
2022-03-24 19:36 ` Neftin, Sasha
2022-04-10 8:21 ` Thorsten Leemhuis
2022-04-10 9:26 ` Neftin, Sasha [this message]
2022-04-10 9:47 ` Thorsten Leemhuis
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b63584c0-aa7c-9252-f3ef-7ddc857ad5ae@intel.com \
--to=sasha.neftin@intel.com \
--cc=intel-wired-lan@osuosl.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox