public inbox for netdev@vger.kernel.org
* pre-boot plugged SFP autoneg advertisement
@ 2026-04-18  9:27 markus.stockhausen
  2026-04-18 15:25 ` Andrew Lunn
  0 siblings, 1 reply; 6+ messages in thread
From: markus.stockhausen @ 2026-04-18  9:27 UTC (permalink / raw)
  To: linux, andrew, hkallweit1, netdev; +Cc: 'Jonas Jelonek', jan

Hi,

I'm currently analyzing an issue where a pre-boot-plugged SFP module 
comes up with autoneg=no advertisement during boot. After an
unplug/replug autoneg=yes advertisement is chosen. 

The following addition in phylink_start() just before the call to
phylink_mac_initial_config() mitigates this.

+  /* If an SFP module was already present before phylink_start() was
+   * called, phylink_sfp_set_config() was unable to call
+   * phylink_mac_initial_config() as phylink was not yet started.
+   * Ensure the SFP capabilities are reflected in advertising.
+   */
+  if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
+    linkmode_copy(pl->link_config.advertising, pl->sfp_support);

Remark! This is about the OpenWrt Realtek Switch ecosystem with 
kernel 6.18 where we are working hard to get hardware up and 
running. We still rely heavily on pcs/dsa downstream drivers. So 
I'm unsure if my observation/idea regarding upstream phylink is 
right.

Thanks for your feedback in advance.

Markus



* Re: pre-boot plugged SFP autoneg advertisement
  2026-04-18  9:27 pre-boot plugged SFP autoneg advertisement markus.stockhausen
@ 2026-04-18 15:25 ` Andrew Lunn
  2026-04-19  8:49   ` AW: " markus.stockhausen
  2026-04-20 16:16   ` markus.stockhausen
  0 siblings, 2 replies; 6+ messages in thread
From: Andrew Lunn @ 2026-04-18 15:25 UTC (permalink / raw)
  To: markus.stockhausen
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan

On Sat, Apr 18, 2026 at 11:27:40AM +0200, markus.stockhausen@gmx.de wrote:
> Hi,
> 
> I'm currently analyzing an issue where a pre-boot-plugged SFP module 
> comes up with autoneg=no advertisement during boot. After an
> unplug/replug autoneg=yes advertisement is chosen. 
> 
> The following addition in phylink_start() just before the call to
> phylink_mac_initial_config() mitigates this.
> 
> +  /* If an SFP module was already present before phylink_start() was
> +   * called, phylink_sfp_set_config() was unable to call
> +   * phylink_mac_initial_config() as phylink was not yet started.
> +   * Ensure the SFP capabilities are reflected in advertising.
> +   */
> +  if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
> +    linkmode_copy(pl->link_config.advertising, pl->sfp_support);

Let me see if i have the call chain correct. This is net-next/main
from today.

phylink_sfp_connect_phy() ->
  phylink_sfp_config_phy

        if (changed && !test_bit(PHYLINK_DISABLE_STOPPED,
                                 &pl->phylink_disable_state))
                phylink_mac_initial_config(pl, false);

You are saying PHYLINK_DISABLE_STOPPED is set, so
phylink_mac_initial_config() is not called.

What i don't see is how phylink_mac_initial_config() does the
linkmode_copy() you are adding.
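
To make the gating in that condition concrete, here is a minimal userspace sketch. It is not the kernel code: struct model_phylink, MODEL_DISABLE_STOPPED and both functions are hypothetical stand-ins that only mirror the shape of the quoted check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PHYLINK_DISABLE_STOPPED bit number. */
enum { MODEL_DISABLE_STOPPED = 0 };

/* Hypothetical model of the two pieces of state the check depends on. */
struct model_phylink {
	unsigned long disable_state;	/* bit MODEL_DISABLE_STOPPED set: not started */
	int initial_config_calls;	/* counts calls to the modelled initial config */
};

static void model_mac_initial_config(struct model_phylink *pl)
{
	assert(pl);
	pl->initial_config_calls++;
}

/* Same shape as the quoted condition: only configure when something
 * changed and the instance is not in the stopped state.
 */
static void model_sfp_config(struct model_phylink *pl, bool changed)
{
	assert(pl);
	if (changed && !(pl->disable_state & (1UL << MODEL_DISABLE_STOPPED)))
		model_mac_initial_config(pl);
}
```

With the stopped bit set, model_sfp_config() returns without calling the initial config. Nothing in this path touches the advertising mask either way.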

	Andrew


* AW: pre-boot plugged SFP autoneg advertisement
  2026-04-18 15:25 ` Andrew Lunn
@ 2026-04-19  8:49   ` markus.stockhausen
  2026-04-20 16:16   ` markus.stockhausen
  1 sibling, 0 replies; 6+ messages in thread
From: markus.stockhausen @ 2026-04-19  8:49 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd

Hi Andrew,

> From: Andrew Lunn <andrew@lunn.ch> 
> Subject: Re: pre-boot plugged SFP autoneg advertisement
> 
> > On Sat, Apr 18, 2026 at 11:27:40AM +0200, markus.stockhausen@gmx.de wrote:
> > Hi,
> > 
> > I'm currently analyzing an issue where a pre-boot-plugged SFP module 
> > comes up with autoneg=no advertisement during boot. After an
> > unplug/replug autoneg=yes advertisement is chosen. 
> > 
> > The following addition in phylink_start() just before the call to
> > phylink_mac_initial_config() mitigates this.
> > 
> > +  /* If an SFP module was already present before phylink_start() was
> > +   * called, phylink_sfp_set_config() was unable to call
> > +   * phylink_mac_initial_config() as phylink was not yet started.
> > +   * Ensure the SFP capabilities are reflected in advertising.
> > +   */
> > +  if (pl->sfp_bus && !linkmode_empty(pl->sfp_support))
> > +    linkmode_copy(pl->link_config.advertising, pl->sfp_support);
>
> Let me see if i have the call chain correct. This is net-next/main
> from today.
>
> phylink_sfp_connect_phy() ->
>   phylink_sfp_config_phy
>
>         if (changed && !test_bit(PHYLINK_DISABLE_STOPPED,
>                                  &pl->phylink_disable_state))
>                 phylink_mac_initial_config(pl, false);
>
> You are saying PHYLINK_DISABLE_STOPPED is set, so
> phylink_mac_initial_config() is not called.
>
> What i don't see is how phylink_mac_initial_config() does the
> linkmode_copy() you are adding.

Took that hint/question and dug deeper. Added further debug
to each and every linkmode_copy. I think I found the culprit in 
a userspace ethtool call. For now I assume OpenWrt netifd.
Adding my last trace below including the original (wrong) idea. 

Thank you very much for taking the time and your assistance.

Markus

[    3.301299] XXXX phylink_create lan12 set pl->link_config.advertising (autoneg = 1)
[    3.309954] XXXX phylink_parse_mode lan12 set pl->link_config.advertising (autoneg = 1)
[    3.318964] XXX sfp_module_insert lan12 called
[    3.323935] XXXX phylink_sfp_config_optical lan12 set config.advertising (autoneg = 1)
[    3.332815] XXXX phylink_validate_one lan12 set tmp_supported (autoneg = 1)
[    3.340629] XXXX phylink_validate_mask lan12 set supported (autoneg = 1)
[    3.348165] XXXX phylink_validate_mask lan12 set state->advertising (autoneg = 1)
[    3.356527] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12 (uninitialized): XXX phylink_sfp_set_config requesting link mode inband/1000base-x with support 0000000,00000000,00000200,00006440
--- ETHTOOL CALL HERE ---
[   81.213726] XXXX phylink_ethtool_ksettings_set lan12 start got config.advertising (autoneg = 1)
[   81.223542] XXXX phylink_ethtool_ksettings_set lan12 set accoring to kset->base.autoneg (autoneg = 0)
[   81.233961] CPU: 0 UID: 0 PID: 1470 Comm: netifd Tainted: G           O 6.18.21 #0 NONE
[   81.234010] Tainted: [O]=OOT_MODULE
[   81.234017] Hardware name: Zyxel XGS1210-12 A1 Switch
[   81.234026] Stack : 823a3bbc 80139d20 00000000 00000001 00000000 00000000 00000000 00000000
[   81.234094]         00000000 00000000 00000000 00000000 00000000 00000001 823a3b78 82040d00
[   81.234152]         00000000 00000000 80992870 823a3a10 00000000 ffffefff 00000001 00000224
[   81.234213]         00000226 823a39d4 00000226 000019c8 00000001 00000000 80992870 80a00000
[   81.234273]         82764648 00000016 00000016 82764628 00000000 80a913b0 00000000 81990000
[   81.234334]         ...
[   81.234350] Call Trace:
[   81.234356] [<80115e48>] show_stack+0x28/0xf0
[   81.234407] [<8010fc78>] dump_stack_lvl+0x70/0xb0
[   81.234432] [<80592718>] phylink_ethtool_ksettings_set+0x58c/0x6a4
[   81.234479] [<806bb890>] ethtool_set_link_ksettings+0xbc/0x198
[   81.234516] [<806be01c>] __dev_ethtool+0xfe0/0x1a1c
[   81.234550] [<806beb24>] dev_ethtool+0xcc/0x24c
[   81.234575] [<80678770>] dev_ioctl+0x30c/0x5f4
[   81.234616] [<80603ebc>] sock_ioctl+0x2bc/0x470
[   81.234642] [<8036e530>] sys_ioctl+0xb4/0x120
[   81.234683] [<8011edec>] syscall_common+0x34/0x58
[   81.234715]
[   81.234723] XXXX 2 phylink_ethtool_ksettings_set lan12 set pl->link_config.advertising (autoneg = 0)
-----------
[   81.375070] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX phylink_start configuring for inband/1000base-x link mode
--- INITIAL FIX/IDEA HERE ---
[   81.388408] XXX phylink_start lan12 sfp_bus set and linkmode not empty -> would run linkmode_copy()
-----------
[   81.398688] XXX phylink_mac_initial_config lan12 called with force_restart = 1
[   81.406752] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX major config, requested inband/1000base-x
[   81.418459] XXXX major_config_entry lan12: autoneg_adv=0 autoneg_sfp=1 sfp_may_have_phy=0
[   81.427612] XXXX phylink_pcs_neg_mode ENTRY lan12: pl->pcs_neg_mode=0x0
[   81.435102] XXXX phylink_pcs_neg_mode lan12 advertising autoneg=0
[   81.442034] rtl83xx-switch 1b000000.switchcore:ethernet-switch lan12: XXX interface 1000base-x inband modes: pcs=03 phy=00
[   81.454495] XXXX phylink_pcs_neg_mode lan12 base-x without phy
[   81.461134] XXXX phylink_pcs_neg_mode EXIT lan12 pl->pcs_neg_mode = 0x40 pl->act_link_an_mode = 0x2
[   82.085493] XXXX phylink_mac_pcs_get_state lan12 set state->advertising (autoneg = 0)
[   82.094391] XXXX phylink_mac_pcs_get_state lan12 autoneg is 0
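
The "support" mask printed by phylink_sfp_set_config in the trace can be decoded by hand. Below is a small userspace sketch for doing so. The bit numbers are the ones I recall from include/uapi/linux/ethtool.h and are an assumption that should be checked against the tree in use. The enum names and mask_test_bit() are hypothetical helpers, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed link mode bit numbers (verify against ethtool.h before use). */
enum {
	BIT_AUTONEG        = 6,
	BIT_FIBRE          = 10,
	BIT_PAUSE          = 13,
	BIT_ASYM_PAUSE     = 14,
	BIT_1000BASEX_FULL = 41,
};

/* The kernel prints the most significant 32-bit group first, so the
 * rightmost group of the log line goes into words[0] (bits 0..31),
 * the next one into words[1] (bits 32..63), and so on.
 */
static bool mask_test_bit(const uint32_t *words, size_t nwords, unsigned int bit)
{
	assert(words);
	if (bit / 32 >= nwords)
		return false;
	return (words[bit / 32] >> (bit % 32)) & 1u;
}
```

Feeding in the groups from the trace as { 0x00006440, 0x00000200, 0, 0 } and testing the bits above should show Autoneg set at that point, which matches the "autoneg = 1" debug lines before the ethtool call.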



* AW: pre-boot plugged SFP autoneg advertisement
  2026-04-18 15:25 ` Andrew Lunn
  2026-04-19  8:49   ` AW: " markus.stockhausen
@ 2026-04-20 16:16   ` markus.stockhausen
  2026-04-20 17:57     ` Andrew Lunn
  1 sibling, 1 reply; 6+ messages in thread
From: markus.stockhausen @ 2026-04-20 16:16 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd,
	'Daniel Golle'

> From: markus.stockhausen@gmx.de <markus.stockhausen@gmx.de> 
> Sent: Sunday, 19 April 2026 10:49
> To: 'Andrew Lunn' <andrew@lunn.ch>
> Subject: AW: pre-boot plugged SFP autoneg advertisement
>
> Took that hint/question and dug deeper. Added further debug
> to each and every linkmode_copy. I think I found the culprit in 
> a userspace ethtool call. For now I assume OpenWrt netifd.

Hi Andrew,

once again thanks for your help. After further investigation I can hopefully
add more details. I think I have the whole picture now. Here is some
additional background information about the environment.

- Realtek RTL930x devices with SFP+ module slots
- These are driven directly by a SerDes (controlled by a downstream PCS driver)
- The DTS reads

	port11: port@11 { 
		reg = <11>;
		label = "lan12" ;
		pcs-handle = <&serdes8>;
		phy-mode = "1000base-x";
		sfp = <&sfp1>;
		managed = "in-band-status";
	};

Sequence of events during boot is as follows:

- SFP module is already inserted (in my case 1G)
- phylink_sfp_config_phy() runs long before any network config starts
- OpenWrt netifd daemon starts and wants to configure the network interfaces
- It reads current settings via ethtool ioctl and gets autoneg=off
- It writes basic config values via ethtool ioctl including autoneg=off
- Later on it starts the interface and phylink_start() is issued
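
The sequence above can be modelled in a few lines of userspace C. This is only a sketch of the read-modify-write round trip as described here. All names (struct boot_port, boot_ksettings_get() and so on) are hypothetical, and the "reads back as off before start" behaviour is an assumption taken from the observation, not from the phylink source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port state: has start been called, and is the
 * Autoneg bit set in the advertising mask?
 */
struct boot_port {
	bool started;
	bool adv_autoneg;
};

struct boot_ksettings {
	bool autoneg;
};

/* Assumption from the report: before start, autoneg is reported as off
 * even though the advertising mask has it set. The call still returns 0.
 */
static int boot_ksettings_get(const struct boot_port *p, struct boot_ksettings *k)
{
	assert(p && k);
	k->autoneg = p->started && p->adv_autoneg;
	return 0;
}

/* The write path takes the value at face value. */
static int boot_ksettings_set(struct boot_port *p, const struct boot_ksettings *k)
{
	assert(p && k);
	p->adv_autoneg = k->autoneg;
	return 0;
}

/* Daemon side: read, leave autoneg untouched, write back, then bring up. */
static void boot_daemon_configure(struct boot_port *p)
{
	struct boot_ksettings k;

	assert(p);
	if (boot_ksettings_get(p, &k) == 0)
		boot_ksettings_set(p, &k);
	p->started = true;
}
```

Starting from adv_autoneg = true, one pass through boot_daemon_configure() leaves the port started with adv_autoneg = false. The daemon never asked for autoneg to be disabled. It only wrote back what it had read.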

With my limited knowledge I would patch phylink_ethtool_ksettings_get().

		/* The MAC is reporting the link results from its own PCS
		 * layer via in-band status. Report these as the current
		 * link settings.
		 */
		phylink_get_ksettings(&link_state, kset);
		break;

+	case MLO_AN_PHY:
+		/* SFP module present at boot but phylink not yet started.
+		 * Return autonegotiation as set by phylink_sfp_config_phy().
+		 */
+		if (pl->sfp_bus && !pl->phydev)
+			kset->base.autoneg =
+				linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
+						  pl->link_config.advertising)
+				? AUTONEG_ENABLE : AUTONEG_DISABLE;
+		break;
	}

	return 0;
}

This comes from the observation that

- pl->link_config.advertising is filled by phylink_sfp_set_config()
- state MLO_AN_PHY is reported before phylink_start()
- state MLO_AN_INBAND is reported after phylink_start()

Is this reasonable or am I totally off?

Markus




* Re: pre-boot plugged SFP autoneg advertisement
  2026-04-20 16:16   ` markus.stockhausen
@ 2026-04-20 17:57     ` Andrew Lunn
  2026-04-20 19:10       ` AW: " markus.stockhausen
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Lunn @ 2026-04-20 17:57 UTC (permalink / raw)
  To: markus.stockhausen
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd,
	'Daniel Golle'

On Mon, Apr 20, 2026 at 06:16:34PM +0200, markus.stockhausen@gmx.de wrote:
> > From: markus.stockhausen@gmx.de <markus.stockhausen@gmx.de> 
> > Sent: Sunday, 19 April 2026 10:49
> > To: 'Andrew Lunn' <andrew@lunn.ch>
> > Subject: AW: pre-boot plugged SFP autoneg advertisement
> >
> > Took that hint/question and dug deeper. Added further debug
> > to each and every linkmode_copy. I think I found the culprit in 
> > a userspace ethtool call. For now I assume OpenWrt netifd.
> 
> Hi Andrew,
> 
> once again thanks for your help. After further investigation I can hopefully
> add more details. I think I have the whole picture now. Here is some
> additional background information about the environment.
> 
> - Realtek RTL930x devices with SFP+ module slots
> - These are driven directly by a SerDes (controlled by a downstream PCS driver)
> - The DTS reads
> 
> 	port11: port@11 { 
> 		reg = <11>;
> 		label = "lan12" ;
> 		pcs-handle = <&serdes8>;
> 		phy-mode = "1000base-x";
> 		sfp = <&sfp1>;
> 		managed = "in-band-status";
> 	};
> 
> Sequence of events during boot is as follows:
> 
> - SFP module is already inserted (in my case 1G)
> - phylink_sfp_config_phy() runs long before any network config starts
> - OpenWrt netifd daemon starts and wants to configure the network interfaces
> - It reads current settings via ethtool ioctl and gets autoneg=off
> - It writes basic config values via ethtool ioctl including autoneg=off
> - Later on it starts the interface and phylink_start() is issued

I would say netifd is not optimal. I'm not sure we ever agreed to
return the full ksettings on an interface which is admin down. Many
drivers don't even connect to the PHY until open is called, and so are
likely to return -ENODEV. See phy_ethtool_set_link_ksettings().

Could you look into the behaviour of netifd, especially if it gets
-ENODEV during the first read. Does it try again after setting the
interface up?

Could you disable netifd and manually configure the interface up. Does
it get autoneg correct then?

Now, i think it is useful to be able to configure an interface when it
is admin down. So if ksettings_get does not return -ENODEV it probably
should return the full and correct information. However, i'm not sure
your change is sufficient to do that, since what an interface can
actually do is the common subset of what the MAC, PCS and SFP can
do. So just taking the value from the SFP does not feel correct to me,
at least not without having a deeper understanding of what phylink is
doing. And Russell King is busy with other things at the moment.
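
As a rough illustration of the "common subset" point, here is a userspace sketch using plain bit masks. The CAP_* bits and common_caps() are hypothetical and far simpler than the real link mode bitmaps and validation that phylink performs.

```c
#include <stdint.h>

/* Hypothetical capability bits, for illustration only. */
#define CAP_AUTONEG	(UINT64_C(1) << 0)
#define CAP_1000BASEX	(UINT64_C(1) << 1)
#define CAP_10GBASER	(UINT64_C(1) << 2)

/* What the interface can actually do is the intersection of what each
 * component in the path supports.
 */
static uint64_t common_caps(uint64_t mac, uint64_t pcs, uint64_t sfp)
{
	return mac & pcs & sfp;
}
```

In this model, an SFP that offers CAP_AUTONEG does not make autoneg usable unless the MAC and PCS masks also carry that bit, which is why copying the SFP value alone may report more than the path supports.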

So i think we are looking at multiple problems/solutions here:

netifd should do a second ksettings_get after setting the interface
admin up, and reevaluate how the interface should be configured.

If we know phylink is going to return a subset of the correct
information when the interface is admin down, maybe it should return
-ENODEV?

Is it possible in general to make phylink return the full correct
ksettings when start() has not been called? We need to think about
multiple use cases here, not just an SFP, but also a PHY, a fixed link
and a BASE-T PHY inside an SFP module. Maybe it needs to sometimes
return -ENODEV, other times it can return correct information?
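
A sketch of how the -ENODEV variant and a retry in userspace could fit together, as a userspace model only. struct model_if, model_settings_get(), model_settings_set() and model_configure() are hypothetical names. Whether phylink should behave like this is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical interface state and settings. */
struct model_if {
	bool up;
	bool autoneg;
};

struct model_settings {
	bool autoneg;
};

/* Refuse to answer while admin down instead of returning partial data. */
static int model_settings_get(const struct model_if *ifc, struct model_settings *s)
{
	assert(ifc && s);
	if (!ifc->up)
		return -ENODEV;
	s->autoneg = ifc->autoneg;
	return 0;
}

static int model_settings_set(struct model_if *ifc, const struct model_settings *s)
{
	assert(ifc && s);
	if (!ifc->up)
		return -ENODEV;
	ifc->autoneg = s->autoneg;
	return 0;
}

/* Userspace side: if the first read fails with -ENODEV, set the
 * interface up and read again before writing anything back.
 */
static int model_configure(struct model_if *ifc)
{
	struct model_settings s;
	int ret;

	assert(ifc);
	ret = model_settings_get(ifc, &s);
	if (ret == -ENODEV) {
		ifc->up = true;
		ret = model_settings_get(ifc, &s);
	}
	if (ret)
		return ret;
	return model_settings_set(ifc, &s);
}
```

In this model the round trip can no longer write back a value that was never valid, because the first read either fails or is complete.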

       Andrew


* AW: pre-boot plugged SFP autoneg advertisement
  2026-04-20 17:57     ` Andrew Lunn
@ 2026-04-20 19:10       ` markus.stockhausen
  0 siblings, 0 replies; 6+ messages in thread
From: markus.stockhausen @ 2026-04-20 19:10 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: linux, hkallweit1, netdev, 'Jonas Jelonek', jan, nbd,
	'Daniel Golle'

> From: Andrew Lunn <andrew@lunn.ch> 
> Sent: Monday, 20 April 2026 19:58
> To: markus.stockhausen@gmx.de
> 
> > Sequence of events during boot is as follows:
> > 
> > - SFP module is already inserted (in my case 1G)
> > - phylink_sfp_config_phy() runs long before any network config starts
> > - OpenWrt netifd daemon starts and wants to configure the network interfaces
> > - It reads current settings via ethtool ioctl and gets autoneg=off
> > - It writes basic config values via ethtool ioctl including autoneg=off
> > - Later on it starts the interface and phylink_start() is issued
>
> I would say netifd is not optimal. I'm not sure we ever agreed to
> return the full ksettings on an interface which is admin down. Many
> drivers don't even connect to the PHY until open is called, and so are
> likely to return -ENODEV. See phy_ethtool_set_link_ksettings().
>
> Could you look into the behaviour of netifd, especially if it gets
> -ENODEV during the first read. Does it try again after setting the
> interface up?

Netifd has no issues reading/writing link settings while the interface is
admin down. When it gets rc=0 it assumes that all values are filled in,
changes the needed attributes and writes them back. I retested and think
there might be a solution that avoids the unneeded ioctl access (see [1]).

> If we know phylink is going to return a subset of the correct
> information when the interface is admin down, maybe it should return
> -ENODEV?

Or (stupid idea) phylink_ethtool_ksettings_set() should not accept all 
settings in this state. 

Thanks for your valuable input.

Markus

[1] https://github.com/openwrt/netifd/issues/76#issuecomment-4283478081


