* pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-18 9:27 UTC
To: linux, andrew, hkallweit1, netdev; +Cc: 'Jonas Jelonek', jan

Hi,

I'm currently analyzing an issue where a pre-boot-plugged SFP module
comes up with autoneg=no advertisement during boot. After an
unplug/replug, autoneg=yes advertisement is chosen.

The following addition in phylink_start(), just before the call to
phylink_mac_initial_config(), mitigates this.

+	/* If an SFP module was already present before phylink_start() was
+	 * called, phylink_sfp_set_config() was unable to call
+	 * phylink_mac_initial_config() as phylink was not yet started.
+	 * Ensure the SFP capabilities are reflected in advertising.
+	 */
+	if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
+		linkmode_copy(pl->link_config.advertising, pl->sfp_support);

Remark! This is about the OpenWrt Realtek Switch ecosystem with kernel
6.18, where we are working hard to get hardware up and running. We still
rely heavily on downstream pcs/dsa drivers, so I'm unsure whether my
observation/idea regarding upstream phylink is right.

Thanks in advance for your feedback.

Markus

* Re: pre-boot plugged SFP autoneg advertisement
From: Andrew Lunn @ 2026-04-18 15:25 UTC
To: markus.stockhausen
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan

On Sat, Apr 18, 2026 at 11:27:40AM +0200, markus.stockhausen@gmx.de wrote:
> Hi,
>
> I'm currently analyzing an issue where a pre-boot-plugged SFP module
> comes up with autoneg=no advertisement during boot. After an
> unplug/replug, autoneg=yes advertisement is chosen.
>
> The following addition in phylink_start(), just before the call to
> phylink_mac_initial_config(), mitigates this.
>
> +	/* If an SFP module was already present before phylink_start() was
> +	 * called, phylink_sfp_set_config() was unable to call
> +	 * phylink_mac_initial_config() as phylink was not yet started.
> +	 * Ensure the SFP capabilities are reflected in advertising.
> +	 */
> +	if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
> +		linkmode_copy(pl->link_config.advertising, pl->sfp_support);

Let me see if I have the call chain correct. This is net-next/main
from today.

phylink_sfp_connect_phy() ->
  phylink_sfp_config_phy

	if (changed && !test_bit(PHYLINK_DISABLE_STOPPED,
				 &pl->phylink_disable_state))
		phylink_mac_initial_config(pl, false);

You are saying PHYLINK_DISABLE_STOPPED is set, so
phylink_mac_initial_config() is not called.

What I don't see is how phylink_mac_initial_config() does the
linkmode_copy() you are adding.

	Andrew

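The guard quoted in the call chain above can be modelled in a few lines of standalone C. This is only a sketch of the condition as quoted, not kernel code: MODEL_DISABLE_STOPPED and model_runs_initial_config() are hypothetical names, and the bit number is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit number standing in for PHYLINK_DISABLE_STOPPED. */
#define MODEL_DISABLE_STOPPED 0

/*
 * Model of the quoted condition: the initial MAC configuration is only
 * triggered when the SFP configuration changed AND the "stopped" bit is
 * clear in the disable state.
 */
bool model_runs_initial_config(bool changed, unsigned long disable_state)
{
	/* Sanity check: the modelled bit must fit into the state word. */
	assert(MODEL_DISABLE_STOPPED < 8 * sizeof(disable_state));

	bool stopped = (disable_state >> MODEL_DISABLE_STOPPED) & 1UL;

	return changed && !stopped;
}
```

With the stopped bit set, which is the state assumed for a module that was plugged in before the interface was first started, the function returns false even when the configuration changed.
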
* Re: pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-19 8:49 UTC
To: 'Andrew Lunn'
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd

Hi Andrew,

> From: Andrew Lunn <andrew@lunn.ch>
> Subject: Re: pre-boot plugged SFP autoneg advertisement
>
> On Sat, Apr 18, 2026 at 11:27:40AM +0200, markus.stockhausen@gmx.de wrote:
> > Hi,
> >
> > I'm currently analyzing an issue where a pre-boot-plugged SFP module
> > comes up with autoneg=no advertisement during boot. After an
> > unplug/replug, autoneg=yes advertisement is chosen.
> >
> > The following addition in phylink_start(), just before the call to
> > phylink_mac_initial_config(), mitigates this.
> >
> > +	/* If an SFP module was already present before phylink_start() was
> > +	 * called, phylink_sfp_set_config() was unable to call
> > +	 * phylink_mac_initial_config() as phylink was not yet started.
> > +	 * Ensure the SFP capabilities are reflected in advertising.
> > +	 */
> > +	if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
> > +		linkmode_copy(pl->link_config.advertising, pl->sfp_support);
>
> Let me see if I have the call chain correct. This is net-next/main
> from today.
>
> phylink_sfp_connect_phy() ->
>   phylink_sfp_config_phy
>
> 	if (changed && !test_bit(PHYLINK_DISABLE_STOPPED,
> 				 &pl->phylink_disable_state))
> 		phylink_mac_initial_config(pl, false);
>
> You are saying PHYLINK_DISABLE_STOPPED is set, so
> phylink_mac_initial_config() is not called.
>
> What I don't see is how phylink_mac_initial_config() does the
> linkmode_copy() you are adding.

I took that hint/question and dug deeper, adding further debug output
to each and every linkmode_copy(). I think I found the culprit in a
userspace ethtool call. For now I assume it is OpenWrt's netifd. My
latest trace is below, including the original (wrong) idea.

Thank you very much for taking the time and for your assistance.

Markus

[ 3.301299] XXXX phylink_create lan12 set pl->link_config.advertising (autoneg = 1)
[ 3.309954] XXXX phylink_parse_mode lan12 set pl->link_config.advertising (autoneg = 1)
[ 3.318964] XXX sfp_module_insert lan12 called
[ 3.323935] XXXX phylink_sfp_config_optical lan12 set config.advertising (autoneg = 1)
[ 3.332815] XXXX phylink_validate_one lan12 set tmp_supported (autoneg = 1)
[ 3.340629] XXXX phylink_validate_mask lan12 set supported (autoneg = 1)
[ 3.348165] XXXX phylink_validate_mask lan12 set state->advertising (autoneg = 1)
[ 3.356527] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12 (uninitialized): XXX phylink_sfp_set_config requesting link mode inband/1000base-x with support 0000000,00000000,00000200,00006440
--- ETHTOOL CALL HERE ---
[ 81.213726] XXXX phylink_ethtool_ksettings_set lan12 start got config.advertising (autoneg = 1)
[ 81.223542] XXXX phylink_ethtool_ksettings_set lan12 set accoring to kset->base.autoneg (autoneg = 0)
[ 81.233961] CPU: 0 UID: 0 PID: 1470 Comm: netifd Tainted: G O 6.18.21 #0 NONE
[ 81.234010] Tainted: [O]=OOT_MODULE
[ 81.234017] Hardware name: Zyxel XGS1210-12 A1 Switch
[ 81.234026] Stack : 823a3bbc 80139d20 00000000 00000001 00000000 00000000 00000000 00000000
[ 81.234094] 00000000 00000000 00000000 00000000 00000000 00000001 823a3b78 82040d00
[ 81.234152] 00000000 00000000 80992870 823a3a10 00000000 ffffefff 00000001 00000224
[ 81.234213] 00000226 823a39d4 00000226 000019c8 00000001 00000000 80992870 80a00000
[ 81.234273] 82764648 00000016 00000016 82764628 00000000 80a913b0 00000000 81990000
[ 81.234334] ...
[ 81.234350] Call Trace:
[ 81.234356] [<80115e48>] show_stack+0x28/0xf0
[ 81.234407] [<8010fc78>] dump_stack_lvl+0x70/0xb0
[ 81.234432] [<80592718>] phylink_ethtool_ksettings_set+0x58c/0x6a4
[ 81.234479] [<806bb890>] ethtool_set_link_ksettings+0xbc/0x198
[ 81.234516] [<806be01c>] __dev_ethtool+0xfe0/0x1a1c
[ 81.234550] [<806beb24>] dev_ethtool+0xcc/0x24c
[ 81.234575] [<80678770>] dev_ioctl+0x30c/0x5f4
[ 81.234616] [<80603ebc>] sock_ioctl+0x2bc/0x470
[ 81.234642] [<8036e530>] sys_ioctl+0xb4/0x120
[ 81.234683] [<8011edec>] syscall_common+0x34/0x58
[ 81.234715]
[ 81.234723] XXXX 2 phylink_ethtool_ksettings_set lan12 set pl->link_config.advertising (autoneg = 0)
-----------
[ 81.375070] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX phylink_start configuring for inband/1000base-x link mode
--- INITIAL FIX/IDEA HERE ---
[ 81.388408] XXX phylink_start lan12 sfp_bus set and linkmode not empty -> would run linkmode_copy()
-----------
[ 81.398688] XXX phylink_mac_initial_config lan12 called with force_restart = 1
[ 81.406752] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX major config, requested inband/1000base-x
[ 81.418459] XXXX major_config_entry lan12: autoneg_adv=0 autoneg_sfp=1 sfp_may_have_phy=0
[ 81.427612] XXXX phylink_pcs_neg_mode ENTRY lan12: pl->pcs_neg_mode=0x0
[ 81.435102] XXXX phylink_pcs_neg_mode lan12 advertising autoneg=0
[ 81.442034] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX interface 1000base-x inband modes: pcs=03 phy=00
[ 81.454495] XXXX phylink_pcs_neg_mode lan12 base-x without phy
[ 81.461134] XXXX phylink_pcs_neg_mode EXIT lan12 pl->pcs_neg_mode = 0x40 pl->act_link_an_mode = 0x2
[ 82.085493] XXXX phylink_mac_pcs_get_state lan12 set state->advertising (autoneg = 0)
[ 82.094391] XXXX phylink_mac_pcs_get_state lan12 autoneg is 0

* Re: pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-20 16:16 UTC
To: 'Andrew Lunn'
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd, 'Daniel Golle'

> From: markus.stockhausen@gmx.de <markus.stockhausen@gmx.de>
> Sent: Sunday, 19 April 2026 10:49
> To: 'Andrew Lunn' <andrew@lunn.ch>
> Subject: Re: pre-boot plugged SFP autoneg advertisement
>
> I took that hint/question and dug deeper, adding further debug output
> to each and every linkmode_copy(). I think I found the culprit in a
> userspace ethtool call. For now I assume it is OpenWrt's netifd.

Hi Andrew,

once again, thanks for your help. After further investigation I can
hopefully add more details. I think I have the whole picture now, so
here is some additional background information about the environment.

- Realtek RTL930x devices with SFP+ module slots
- These are driven directly by a SerDes (controlled by a downstream PCS
  driver)
- The DTS reads

	port11: port@11 {
		reg = <11>;
		label = "lan12";
		pcs-handle = <&serdes8>;
		phy-mode = "1000base-x";
		sfp = <&sfp1>;
		managed = "in-band-status";
	};

The sequence of events during boot is as follows:

- The SFP module is already inserted (in my case 1G)
- phylink_sfp_config_phy() runs long before any network config starts
- The OpenWrt netifd daemon starts and wants to configure the network
  interfaces
- It reads the current settings via ethtool ioctl and gets autoneg=off
- It writes basic config values via ethtool ioctl, including autoneg=off
- Later on it starts the interface and phylink_start() is issued

With my limited knowledge I would patch phylink_ethtool_ksettings_get().

 		/* The MAC is reporting the link results from its own PCS
 		 * layer via in-band status. Report these as the current
 		 * link settings.
 		 */
 		phylink_get_ksettings(&link_state, kset);
 		break;
+	case MLO_AN_PHY:
+		/* SFP module present at boot but phylink not yet started.
+		 * Return autonegotiation as set by phylink_sfp_config_phy().
+		 */
+		if (pl->sfp_bus && !pl->phydev)
+			kset->base.autoneg =
+				linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+						  pl->link_config.advertising)
+				? AUTONEG_ENABLE : AUTONEG_DISABLE;
+		break;
 	}

 	return 0;
 }

This comes from the observation that

- pl->link_config.advertising is filled by phylink_sfp_set_config()
- state MLO_AN_PHY is reported before phylink_start()
- state MLO_AN_INBAND is reported after phylink_start()

Is this reasonable or am I totally off?

Markus

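The boot sequence listed above can be replayed in a small standalone C model. It is a deliberately simplified sketch, not phylink code: struct model_link, MODEL_AUTONEG_BIT and all model_* functions are hypothetical, the bit position is arbitrary, and the "reports autoneg off before start" behaviour is hard-coded from the observation in this thread rather than derived from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the Autoneg link-mode bit. */
#define MODEL_AUTONEG_BIT (1u << 6)

struct model_link {
	uint32_t advertising;	/* stands in for pl->link_config.advertising */
	bool started;		/* true once the modelled start() has run */
};

/* Observed behaviour: before start, "get" reports autoneg off. */
bool model_get_autoneg(const struct model_link *l)
{
	assert(l);
	if (!l->started)
		return false;
	return (l->advertising & MODEL_AUTONEG_BIT) != 0;
}

/* "set" applies the requested autoneg value to the advertising mask. */
void model_set_autoneg(struct model_link *l, bool autoneg)
{
	assert(l);
	if (autoneg)
		l->advertising |= MODEL_AUTONEG_BIT;
	else
		l->advertising &= ~MODEL_AUTONEG_BIT;
}

/* Replays the boot sequence and returns the autoneg state after start. */
bool model_boot(bool userspace_writes_back)
{
	struct model_link l = { .advertising = 0, .started = false };

	/* SFP module present at boot: advertising gains Autoneg. */
	l.advertising |= MODEL_AUTONEG_BIT;

	/* Userspace read-modify-write while the link is not started. */
	if (userspace_writes_back)
		model_set_autoneg(&l, model_get_autoneg(&l));

	l.started = true;
	return model_get_autoneg(&l);
}
```

In this model the write-back of a value that was read before start is what clears the bit: model_boot(true) ends with autoneg off, model_boot(false) ends with autoneg on.
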
* Re: pre-boot plugged SFP autoneg advertisement
From: Andrew Lunn @ 2026-04-20 17:57 UTC
To: markus.stockhausen
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd, 'Daniel Golle'

On Mon, Apr 20, 2026 at 06:16:34PM +0200, markus.stockhausen@gmx.de wrote:
> > From: markus.stockhausen@gmx.de <markus.stockhausen@gmx.de>
> > Sent: Sunday, 19 April 2026 10:49
> > To: 'Andrew Lunn' <andrew@lunn.ch>
> > Subject: Re: pre-boot plugged SFP autoneg advertisement
> >
> > I took that hint/question and dug deeper, adding further debug output
> > to each and every linkmode_copy(). I think I found the culprit in a
> > userspace ethtool call. For now I assume it is OpenWrt's netifd.
>
> Hi Andrew,
>
> once again, thanks for your help. After further investigation I can
> hopefully add more details. I think I have the whole picture now, so
> here is some additional background information about the environment.
>
> - Realtek RTL930x devices with SFP+ module slots
> - These are driven directly by a SerDes (controlled by a downstream PCS
>   driver)
> - The DTS reads
>
> 	port11: port@11 {
> 		reg = <11>;
> 		label = "lan12";
> 		pcs-handle = <&serdes8>;
> 		phy-mode = "1000base-x";
> 		sfp = <&sfp1>;
> 		managed = "in-band-status";
> 	};
>
> The sequence of events during boot is as follows:
>
> - The SFP module is already inserted (in my case 1G)
> - phylink_sfp_config_phy() runs long before any network config starts
> - The OpenWrt netifd daemon starts and wants to configure the network
>   interfaces
> - It reads the current settings via ethtool ioctl and gets autoneg=off
> - It writes basic config values via ethtool ioctl, including autoneg=off
> - Later on it starts the interface and phylink_start() is issued

I would say netifd is not optimal. I'm not sure we ever agreed to
return the full ksettings on an interface which is admin down. Many
drivers don't even connect to the PHY until open is called, and so are
likely to return -ENODEV. See phy_ethtool_set_link_ksettings().

Could you look into the behaviour of netifd, especially if it gets
-ENODEV during the first read? Does it try again after setting the
interface up?

Could you disable netifd and manually configure the interface up? Does
it get autoneg correct then?

Now, I think it is useful to be able to configure an interface when it
is admin down. So if ksettings_get does not return -ENODEV, it probably
should return the full and correct information. However, I'm not sure
your change is sufficient to do that, since what an interface can
actually do is the common subset of what the MAC, PCS and SFP can do.
So just taking the value from the SFP does not feel correct to me, at
least not without having a deeper understanding of what phylink is
doing. And Russell King is busy with other things at the moment.

So I think we are looking at multiple problems/solutions here:

netifd should do a second ksettings_get after setting the interface
admin up, and reevaluate how the interface should be configured.

If we know phylink is going to return a subset of the correct
information when the interface is admin down, maybe it should return
-ENODEV?

Is it possible in general to make phylink return the full correct
ksettings when start() has not been called? We need to think about
multiple use cases here, not just an SFP, but also a PHY, a fixed link
and a BASE-T PHY inside an SFP module. Maybe it needs to sometimes
return -ENODEV, other times it can return correct information?

	Andrew

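The "common subset" point above can be illustrated with plain bitmasks. This is a minimal sketch under stated assumptions: the CAP_* bits and common_caps() are hypothetical and only stand in for link-mode bitmaps. In the kernel the corresponding operation on such bitmaps is linkmode_and(), and what phylink actually intersects, and when, is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_1000BASEX (1u << 0)
#define CAP_10GBASER  (1u << 1)
#define CAP_AUTONEG   (1u << 2)

/*
 * What the interface as a whole can do is the intersection of what
 * each layer (MAC, PCS, SFP) can do.
 */
uint32_t common_caps(uint32_t mac, uint32_t pcs, uint32_t sfp)
{
	uint32_t common = mac & pcs & sfp;

	/* The result never contains a bit that one of the layers lacks. */
	assert((common & ~mac) == 0);
	assert((common & ~pcs) == 0);
	assert((common & ~sfp) == 0);

	return common;
}
```

For example, if the SFP mask includes CAP_AUTONEG but the PCS mask does not, the result has no CAP_AUTONEG. Copying the SFP mask alone would report a capability that the path as a whole does not have.
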
* Re: pre-boot plugged SFP autoneg advertisement
From: markus.stockhausen @ 2026-04-20 19:10 UTC
To: 'Andrew Lunn'
Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd, 'Daniel Golle'

> From: Andrew Lunn <andrew@lunn.ch>
> Sent: Monday, 20 April 2026 19:58
> To: markus.stockhausen@gmx.de
>
> > The sequence of events during boot is as follows:
> >
> > - The SFP module is already inserted (in my case 1G)
> > - phylink_sfp_config_phy() runs long before any network config starts
> > - The OpenWrt netifd daemon starts and wants to configure the network
> >   interfaces
> > - It reads the current settings via ethtool ioctl and gets autoneg=off
> > - It writes basic config values via ethtool ioctl, including autoneg=off
> > - Later on it starts the interface and phylink_start() is issued
>
> I would say netifd is not optimal. I'm not sure we ever agreed to
> return the full ksettings on an interface which is admin down. Many
> drivers don't even connect to the PHY until open is called, and so are
> likely to return -ENODEV. See phy_ethtool_set_link_ksettings().
>
> Could you look into the behaviour of netifd, especially if it gets
> -ENODEV during the first read? Does it try again after setting the
> interface up?

Netifd has no issues with reading/writing link settings in admin-down
state. On rc=0 it assumes that all values are filled in, changes the
needed attributes and writes them back. I retested and think there
might be a solution to avoid unneeded ioctl access (see [1]).

> If we know phylink is going to return a subset of the correct
> information when the interface is admin down, maybe it should return
> -ENODEV?

Or (stupid idea) phylink_ethtool_ksettings_set() should not accept
all settings in this state.

Thanks for your valuable input.

Markus

[1] https://github.com/openwrt/netifd/issues/76#issuecomment-4283478081
