On 8/21/2024 8:09 AM, Jiri Slaby wrote: > On 20. 08. 24, 23:30, Bjorn Helgaas wrote: >> On Tue, Aug 20, 2024 at 11:13:54PM +0200, Petr Valenta wrote: >>> Dne 20. 08. 24 v 20:09 Bjorn Helgaas napsal(a): >>>> On Mon, Aug 19, 2024 at 07:23:42AM +0200, Jiri Slaby wrote: >>>>> On 19. 08. 24, 6:50, Jiri Slaby wrote: >>>>>> CC e1000e guys + Jesse (due to 75a3f93b5383) + Bjorn (due to >>>>>> b2c289415b2b) >>>>> >>>>> Bjorn, >>>>> >>>>> I am confused by these changes: >>>>> ========================================== >>>>> @@ -291,16 +288,13 @@ static int e1000_set_link_ksettings(struct >>>>> net_device >>>>> *net >>>>> dev, >>>>>            * duplex is forced. >>>>>            */ >>>>>           if (cmd->base.eth_tp_mdix_ctrl) { >>>>> -               if (hw->phy.media_type != e1000_media_type_copper) { >>>>> -                       ret_val = -EOPNOTSUPP; >>>>> -                       goto out; >>>>> -               } >>>>> +               if (hw->phy.media_type != e1000_media_type_copper) >>>>> +                       return -EOPNOTSUPP; >>>>> >>>>>                   if ((cmd->base.eth_tp_mdix_ctrl != >>>>> ETH_TP_MDI_AUTO) && >>>>>                       (cmd->base.autoneg != AUTONEG_ENABLE)) { >>>>>                           e_err("forcing MDI/MDI-X state is not >>>>> supported when >>>>> lin >>>>> k speed and/or duplex are forced\n"); >>>>> -                       ret_val = -EINVAL; >>>>> -                       goto out; >>>>> +                       return -EINVAL; >>>>>                   } >>>>>           } >>>>> >>>>> @@ -347,7 +341,6 @@ static int e1000_set_link_ksettings(struct >>>>> net_device >>>>> *netde >>>>> v, >>>>>           } >>>>> >>>>>    out: >>>>> -       pm_runtime_put_sync(netdev->dev.parent); >>>>>           clear_bit(__E1000_RESETTING, &adapter->state); >>>>>           return ret_val; >>>>>    } >>>>> ========================================== >>>>> >>>>> So no more clear_bit(__E1000_RESETTING in the above fail paths. Is >>>>> that >>>>> intentional? >>>> >>>> Not intentional.  Petr, do you have the ability to test the patch >>>> below?  I'm not sure it's the correct fix, but it reverts the pieces >>>> of b2c289415b2b that Jiri pointed out. >>> >>> I tested the patch below but it didn't help. After the first boot >>> with new >>> kernel it looked promising as the irq storm only appeared for a few >>> seconds, >>> but with subsequent reboots it was the same as without the patch. >> >> Thank you very much for testing that! > > >>> To be sure, I also send the md5sum of ethtool.c after applying the >>> patch: >>> >>> a25c003257538f16994b4d7c4260e894 ethtool.c >> >> Thanks, that matches what I get when applying the patch on v6.10. >> >> I'm at a loss.  You could try reverting the entire b2c289415b2b commit >> (patch for that is below). > > FWIW he already tested with b2c289415b2b reverted (I provided him with > a built kernel). It behaves the same. So you are not the breaker. > >> If that doesn't help, I guess you could try reverting the other >> commits Jiri mentioned: >> >>    76a0a3f9cc2f e1000e: fix force smbus during suspend flow >>    c93a6f62cb1b e1000e: Fix S0ix residency on corporate systems >>    bfd546a552e1 e1000e: move force SMBUS near the end of enable_ulp >> function >>    6918107e2540 net: e1000e & ixgbe: Remove PCI_HEADER_TYPE_MFD >> duplicates >>    1eb2cded45b3 net: annotate writes on dev->mtu from ndo_change_mtu() >>    b2c289415b2b e1000e: Remove redundant runtime resume for ethtool_ops >>    75a3f93b5383 net: intel: implement modern PM ops declarations >> >> If you do this, I would revert 76a0a3f9cc2f, test, then revert >> c93a6f62cb1b in addition, test, then revert bfd546a552e1 in addition, >> etc. > > Or perhaps easier to do: >   git bisect v6.10 v6.9 -- drivers/net/ethernet/intel/e1000e/ > directly. But that assumes one of the above commits broke it. If they > did not, as a last resort, you can still do full bisect (without the > "-- drivers" part). > > I would take v6.10 suses config. > Would boot 6.10. > do lsmod > /tmp/lsmod > make LSMOD=/tmp/lsmod localyesconfig > make bzImage > and use that bzImage. > > Note that graphics, wireless and other stuff will be defunct unless > you build in firmwares for them (EXTRA_FIRMWARE config). Alternatively > use localmodconfig and build and install also modules (now limited to > your machine). > > thanks, From all the data in the Suse Bugzilla I understood that the issue happens with a cable disconnected. Does it reproduce with a connected cable? Also, I suspect that it might be related to wake up interrupts that the I219 device gets which might cause the IRQ storm. I would like to ask you to enable the debug prints in the e1000e driver and share the dmesg log. I would like to see if we can get the values of WUFC and WUS on on resume. For getting these values please comment out the following in __e1000_resume function in netdev.c file:         /* report the system wakeup cause from S3/S4 */ //      if (adapter->flags2 & FLAG2_HAS_PHY_WAKEUP) {                 u16 phy_data;                 e1e_rphy(&adapter->hw, BM_WUS, &phy_data);                 if (phy_data) {                         e_info("PHY Wakeup cause - %s\n",                                phy_data & E1000_WUS_EX ? "Unicast Packet" :                                phy_data & E1000_WUS_MC ? "Multicast Packet" :                                phy_data & E1000_WUS_BC ? "Broadcast Packet" :                                phy_data & E1000_WUS_MAG ? "Magic Packet" :                                phy_data & E1000_WUS_LNKC ?                                "Link Status Change" : "other");                 }                 e1e_wphy(&adapter->hw, BM_WUS, ~0); //      } else {                 u32 wus = er32(WUS);                 if (wus) {                         e_info("MAC Wakeup cause - %s\n",                                wus & E1000_WUS_EX ? "Unicast Packet" :                                wus & E1000_WUS_MC ? "Multicast Packet" :                                wus & E1000_WUS_BC ? "Broadcast Packet" :                                wus & E1000_WUS_MAG ? "Magic Packet" :                                wus & E1000_WUS_LNKC ? "Link Status Change" :                                "other"); //              }                 ew32(WUS, ~0); To enable the debug prints you can run: echo "module e1000e +p" | sudo tee /sys/kernel/debug/dynamic_debug/control