From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Timo Teras <timo.teras@iki.fi>
Cc: "Lifshits, Vitaly" <vitaly.lifshits@intel.com>,
	intel-wired-lan@osuosl.org, en-wei.wu@canonical.com
Subject: Re: [Intel-wired-lan] [PATCH iwl-next v1 1/1] e1000e: Introduce private flag to disable K1
Date: Wed, 20 Aug 2025 15:51:28 +0200
Message-ID: <aKXS4IVLImmevNv7@mail-itl>
In-Reply-To: <20250820162614.43226d39@onyx.my.domain>

On Wed, Aug 20, 2025 at 04:26:14PM +0300, Timo Teras wrote:
> On Wed, 20 Aug 2025 15:38:12 +0300
> "Lifshits, Vitaly" <vitaly.lifshits@intel.com> wrote:
> 
> > On 8/20/2025 9:57 AM, Timo Teras wrote:
> > 
> > >>>
> > >>> Thanks for adding this!
> > >>>
> > >>> However, as a user, I find it inconvenient if the default setting
> > >>> results in a subtly broken system on a device I just bought from
> > >>> a store.
> > >>>
> > >>> Since this affects devices from multiple large vendors, would it
> > >>> be possible to add some kind of quirk mechanism to automatically
> > >>> enable this on known "bad" systems. Perhaps something based on
> > >>> the DMI or other system specific information. Could something
> > >>> like this be implemented?
> > >>>
> > >>> At least in my use case, I have multiple laptops using e1000e
> > >>> working on the same link partner, and only one broken device,
> > >>> for which I reported this issue. So, at least in my experience,
> > >>> the issue relates primarily to the specific system (perhaps also
> > >>> requiring a specific link partner for the issue to show up).
> > >>
> > >> Unfortunately, there is no visible configuration that allows the
> > >> driver to reliably identify problematic systems.
> > >> If in the future we find such data, then we can improve the
> > >> workaround and make it automatic.
> > >>
> > >> At present, the user-controlled interface is the best we have.  
> > > 
> > > Could you look at:
> > >   - drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c
> > >   - drivers/soundwire/dmi-quirks.c
> > > 
> > > These use dmi_first_match() to match the DMI information of the
> > > system and then apply quirks based on the matching per-system data.
> > > 
> > > > Having a similar mechanism in e1000e should be possible, right?
> > > 
> > > I am happy to provide the needed DMI information from my system if
> > > this works out.
> > > 
> > > Timo  
> > 
> > Hi Timo,
> > 
> > At the moment, we have no clear knowledge as to which systems may be 
> > affected, and what common characteristics they share.
> > We are working with vendors to try to narrow it down.
> > You are most welcome to share DMI information from your system. It
> > can help with further investigation.
> > 
> > However, maintaining a DMI quirk for every single system for which an 
> > issue has been reported is not feasible. Trying to deduce a pattern
> > from a handful of data points can lead to it being too broad or too
> > narrow. Furthermore, it may set up expectations of updating the quirk
> > every time another user comes and says 'your default setting does not
> > work for me'. This can quickly escalate out of control, and generally
> > seems like the wrong approach.
> > 
> > Ultimately, vendors are best positioned to manage this, as they know 
> > which of their systems require this parameter. If a list were to be 
> > maintained, I’d suggest something similar to what Mario proposed for 
> > Dell platforms a few years ago for a different issue:
> > https://patchwork.ozlabs.org/project/netdev/patch/20201202161748.128938-4-mario.limonciello@dell.com/
> > 
> > For now, I prefer not to delay the current patch, acknowledging that 
> > finding a better solution may take time.
> 
> Thank you for the continued investigation on the issue!
> 
> But I find that this commit does not fix the reported regression.
> Nothing changes without additional admin/user action. Things used to
> work, and the added/modified K1 configuration is causing a regression.
> 
> Ubuntu has already reverted the offending patch due to complaints in
> some flavors:
>  https://patchwork.ozlabs.org/project/ubuntu-kernel/patch/20250805071341.41797-2-en-wei.wu@canonical.com/
>  https://bugs.launchpad.net/bugs/2115393
>  https://www.mail-archive.com/kernel-packages@lists.launchpad.net/msg551129.html

Qubes OS also has this change reverted in its default kernel, for the
same reason:
https://github.com/QubesOS/qubes-issues/issues/9896
https://github.com/QubesOS/qubes-linux-kernel/commit/4fb8c96dd7bd73dda00a89d026b6ebefff939a67

We've received several reports of the regression caused by "e1000e:
change k1 configuration on MTP and later platforms", and _no_
complaints since reverting it. And we do have many users on MTL or newer.

> This is what I ended up also doing as it reliably fixes things on every
> model I have, and has not caused any of them to have any other issues
> (including packet loss).
> 
> At least mainstream Dell Pro and HP Zbook laptops have been reported to
> be broken. See:
>  https://lists.openwall.net/netdev/2025/07/01/57
>  https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20250623/048860.html
> 
> This seems to be the same issue:
>  https://bugzilla.kernel.org/show_bug.cgi?id=218642
> 
> So some questions at this point:
> 
> If the added K1 configuration does not work and causes regressions,
> could it be reverted and added back when a k1 configuration change that
> can determine the affected systems is ready?
> 
> Could you explain the commit "e1000e: change k1 configuration on MTP
> and later platforms" more? What does it fix? My understanding is that
> it fixes "minor packet loss that may affect some machines"?
> 
> How many machines / what kind of scenario is affected? Is it fixing a
> more serious issue than the regression it is causing?
> The regression is a completely defunct Ethernet link after unplugging
> the cable.
> 
> My understanding is that the K1 change affects only power consumption.
> Is this right? How much is the consumption difference? Would it rather
> make sense to disable K1 by default on the potentially affected mac/phy
> versions until a good common denominator is found?

Given the severity of the regression, I'd suggest something like the
above: have a functional configuration by default, and an option to
potentially improve power consumption. Once the criteria for when it can
be safely enabled are figured out, it's fine to apply the improvement by
default. But I'd rather have users with functional Ethernet than a
slight power (or performance?) improvement at the cost of completely
breaking it for others...

> On the other hand, do you think that asking for a list of the few
> currently known affected machines (until a simpler common denominator
> can be found) is too unreasonable? If the list grows much, it would be
> an indication that the default setting is wrong and changing the
> defaults might be a good idea.

Let me know what info you'd need for such a list.
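
To make the list idea concrete, here is a minimal userspace sketch of
per-system matching on DMI strings. It is only an illustration under
stated assumptions: the table entries are placeholders, not real
affected systems, and `struct k1_quirk`, `k1_quirk_lookup()` and
`read_dmi_field()` are hypothetical names, not anything from e1000e. It
substring-matches the strings exposed under /sys/class/dmi/id/, roughly
in the spirit of a dmi_first_match() table, which is what an in-driver
version would presumably use instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical quirk entry. Both strings are matched as substrings,
 * roughly how a non-exact DMI match behaves in the kernel. */
struct k1_quirk {
	const char *sys_vendor;
	const char *product_name;
};

/* Placeholder entries only, NOT real affected systems. */
static const struct k1_quirk k1_quirks[] = {
	{ "Example Vendor", "Example Laptop 14" },
	{ "Another Vendor", "Workstation X" },
};

/* Return the first matching entry, or NULL if the system is not listed. */
const struct k1_quirk *k1_quirk_lookup(const char *vendor,
				       const char *product)
{
	size_t n = sizeof(k1_quirks) / sizeof(k1_quirks[0]);

	for (size_t i = 0; i < n; i++) {
		if (strstr(vendor, k1_quirks[i].sys_vendor) &&
		    strstr(product, k1_quirks[i].product_name))
			return &k1_quirks[i];
	}
	return NULL;
}

/* Read one field from /sys/class/dmi/id/ into buf and strip the
 * trailing newline. Returns false if the field cannot be read. */
bool read_dmi_field(const char *name, char *buf, size_t len)
{
	char path[128];
	FILE *f;

	snprintf(path, sizeof(path), "/sys/class/dmi/id/%s", name);
	f = fopen(path, "r");
	if (!f)
		return false;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return false;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return true;
}
```

Reading "sys_vendor" and "product_name" with read_dmi_field() and
passing both to k1_quirk_lookup() tells whether the running machine is
on the list; those two strings are also the kind of info a reporter
could send along.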

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab


Thread overview: 10+ messages
2025-08-19 12:43 [Intel-wired-lan] [PATCH iwl-next v1 1/1] e1000e: Introduce private flag to disable K1 Vitaly Lifshits
2025-08-19 14:19 ` Loktionov, Aleksandr
2025-08-19 15:53   ` Lifshits, Vitaly
2025-08-19 17:10 ` Timo Teras
2025-08-20  6:43   ` Lifshits, Vitaly
2025-08-20  6:57     ` Timo Teras
2025-08-20 12:38       ` Lifshits, Vitaly
2025-08-20 13:26         ` Timo Teras
2025-08-20 13:51           ` Marek Marczykowski-Górecki [this message]
2025-08-21 17:11             ` Lifshits, Vitaly
