* [net PATCH 0/2] Fixes for net/phy/phylink.c
@ 2025-04-01 21:29 Alexander Duyck
2025-04-01 21:30 ` [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting Alexander Duyck
2025-04-01 21:30 ` [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set Alexander Duyck
0 siblings, 2 replies; 35+ messages in thread
From: Alexander Duyck @ 2025-04-01 21:29 UTC (permalink / raw)
To: netdev; +Cc: linux, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier
This series consists of two minor fixes for phylink.
The first cleans up the handling of fixed-link for cases where the driver
may not be using a twisted pair connection. In such a case we essentially
leave it up to the driver to validate the supported connection types itself
likely via the pcs_validate function.
The second addresses an issue where the SFP refused to change speeds when
requested via the ksettings_set call due to the fact that the
config.advertising field wasn't updated when not using autonegotiation.
---
Alexander Duyck (2):
net: phylink: Cleanup handling of recent changes to phy_lookup_setting
net: phylink: Set advertising based on phy_lookup_setting in ksettings_set
drivers/net/phy/phylink.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
--
^ permalink raw reply [flat|nested] 35+ messages in thread* [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-01 21:29 [net PATCH 0/2] Fixes for net/phy/phylink.c Alexander Duyck @ 2025-04-01 21:30 ` Alexander Duyck 2025-04-02 7:00 ` Maxime Chevallier 2025-04-03 14:55 ` Russell King (Oracle) 2025-04-01 21:30 ` [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set Alexander Duyck 1 sibling, 2 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-01 21:30 UTC (permalink / raw) To: netdev; +Cc: linux, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier From: Alexander Duyck <alexanderduyck@fb.com> The blamed commit introduced an issue where it was limiting the link configuration so that we couldn't use fixed-link mode for any settings other than twisted pair modes 10G or less. As a result this was causing the driver to lose any advertised/lp_advertised/supported modes when setup as a fixed link. To correct this we can add a check to identify if the user is in fact enabling a TP mode and then apply the mask to select only 1 of each speed for twisted pair instead of applying this before we know the number of bits set. Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> --- drivers/net/phy/phylink.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 16a1f31f0091..380e51c5bdaa 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -713,17 +713,24 @@ static int phylink_parse_fixedlink(struct phylink *pl, phylink_warn(pl, "fixed link specifies half duplex for %dMbps link?\n", pl->link_config.speed); - linkmode_zero(pl->supported); - phylink_fill_fixedlink_supported(pl->supported); - + linkmode_fill(pl->supported); linkmode_copy(pl->link_config.advertising, pl->supported); phylink_validate(pl, pl->supported, &pl->link_config); c = phy_caps_lookup(pl->link_config.speed, pl->link_config.duplex, pl->supported, true); - if (c) + if (c) { linkmode_and(match, pl->supported, c->linkmodes); + /* Compatbility with the legacy behaviour: + * Report one single BaseT mode. + */ + phylink_fill_fixedlink_supported(mask); + if (linkmode_intersects(match, mask)) + linkmode_and(match, match, mask); + linkmode_zero(mask); + } + linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, mask); linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, mask); linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, mask); ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-01 21:30 ` [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting Alexander Duyck @ 2025-04-02 7:00 ` Maxime Chevallier 2025-04-02 14:21 ` Alexander H Duyck 2025-04-03 14:55 ` Russell King (Oracle) 1 sibling, 1 reply; 35+ messages in thread From: Maxime Chevallier @ 2025-04-02 7:00 UTC (permalink / raw) To: Alexander Duyck; +Cc: netdev, linux, andrew, hkallweit1, davem, kuba, pabeni Hi Alexander, On Tue, 01 Apr 2025 14:30:06 -0700 Alexander Duyck <alexander.duyck@gmail.com> wrote: > From: Alexander Duyck <alexanderduyck@fb.com> > > The blamed commit introduced an issue where it was limiting the link > configuration so that we couldn't use fixed-link mode for any settings > other than twisted pair modes 10G or less. As a result this was causing the > driver to lose any advertised/lp_advertised/supported modes when setup as a > fixed link. > > To correct this we can add a check to identify if the user is in fact > enabling a TP mode and then apply the mask to select only 1 of each speed > for twisted pair instead of applying this before we know the number of bits > set. The commit title should be : net: phylink: ... > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> With that, Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Maxime ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-02 7:00 ` Maxime Chevallier @ 2025-04-02 14:21 ` Alexander H Duyck 2025-04-02 17:17 ` Jakub Kicinski 0 siblings, 1 reply; 35+ messages in thread From: Alexander H Duyck @ 2025-04-02 14:21 UTC (permalink / raw) To: Maxime Chevallier; +Cc: netdev, linux, andrew, hkallweit1, davem, kuba, pabeni On Wed, 2025-04-02 at 09:00 +0200, Maxime Chevallier wrote: > Hi Alexander, > > On Tue, 01 Apr 2025 14:30:06 -0700 > Alexander Duyck <alexander.duyck@gmail.com> wrote: > > > From: Alexander Duyck <alexanderduyck@fb.com> > > > > The blamed commit introduced an issue where it was limiting the link > > configuration so that we couldn't use fixed-link mode for any settings > > other than twisted pair modes 10G or less. As a result this was causing the > > driver to lose any advertised/lp_advertised/supported modes when setup as a > > fixed link. > > > > To correct this we can add a check to identify if the user is in fact > > enabling a TP mode and then apply the mask to select only 1 of each speed > > for twisted pair instead of applying this before we know the number of bits > > set. > > The commit title should be : > > net: phylink: ... > > > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > > With that, > > Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> > > Maxime If needed I can resubmit. I had realized it when I went to send the cover letter and edited it in another terminal. I completely overlooked that a change like that would have changed the commit ID so it stuck to the original. Thanks, - Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-02 14:21 ` Alexander H Duyck @ 2025-04-02 17:17 ` Jakub Kicinski 2025-04-02 17:30 ` Russell King (Oracle) 0 siblings, 1 reply; 35+ messages in thread From: Jakub Kicinski @ 2025-04-02 17:17 UTC (permalink / raw) To: Alexander H Duyck Cc: Maxime Chevallier, netdev, linux, andrew, hkallweit1, davem, pabeni On Wed, 02 Apr 2025 07:21:05 -0700 Alexander H Duyck wrote: > If needed I can resubmit. Let's get a repost, the pw-bots are sensitive to subject changes so I shouldn't edit.. -- pw-bot: cr ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-02 17:17 ` Jakub Kicinski @ 2025-04-02 17:30 ` Russell King (Oracle) 2025-04-02 22:37 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-02 17:30 UTC (permalink / raw) To: Jakub Kicinski Cc: Alexander H Duyck, Maxime Chevallier, netdev, andrew, hkallweit1, davem, pabeni On Wed, Apr 02, 2025 at 10:17:20AM -0700, Jakub Kicinski wrote: > On Wed, 02 Apr 2025 07:21:05 -0700 Alexander H Duyck wrote: > > If needed I can resubmit. > > Let's get a repost, the pw-bots are sensitive to subject changes > so I shouldn't edit.. Note that I've yet to review this. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-02 17:30 ` Russell King (Oracle) @ 2025-04-02 22:37 ` Alexander Duyck 0 siblings, 0 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-02 22:37 UTC (permalink / raw) To: Russell King (Oracle) Cc: Jakub Kicinski, Maxime Chevallier, netdev, andrew, hkallweit1, davem, pabeni On Wed, Apr 2, 2025 at 10:30 AM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Wed, Apr 02, 2025 at 10:17:20AM -0700, Jakub Kicinski wrote: > > On Wed, 02 Apr 2025 07:21:05 -0700 Alexander H Duyck wrote: > > > If needed I can resubmit. > > > > Let's get a repost, the pw-bots are sensitive to subject changes > > so I shouldn't edit.. > > Note that I've yet to review this. I will wait for your review and then resubmit with the second patch dropped. Thanks, - Alex ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-01 21:30 ` [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting Alexander Duyck 2025-04-02 7:00 ` Maxime Chevallier @ 2025-04-03 14:55 ` Russell King (Oracle) 2025-04-03 15:29 ` Maxime Chevallier 1 sibling, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-03 14:55 UTC (permalink / raw) To: Alexander Duyck Cc: netdev, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier On Tue, Apr 01, 2025 at 02:30:06PM -0700, Alexander Duyck wrote: > From: Alexander Duyck <alexanderduyck@fb.com> > > The blamed commit introduced an issue where it was limiting the link > configuration so that we couldn't use fixed-link mode for any settings > other than twisted pair modes 10G or less. As a result this was causing the > driver to lose any advertised/lp_advertised/supported modes when setup as a > fixed link. > > To correct this we can add a check to identify if the user is in fact > enabling a TP mode and then apply the mask to select only 1 of each speed > for twisted pair instead of applying this before we know the number of bits > set. > > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > --- > drivers/net/phy/phylink.c | 15 +++++++++++---- > 1 file changed, 11 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 16a1f31f0091..380e51c5bdaa 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -713,17 +713,24 @@ static int phylink_parse_fixedlink(struct phylink *pl, > phylink_warn(pl, "fixed link specifies half duplex for %dMbps link?\n", > pl->link_config.speed); > > - linkmode_zero(pl->supported); > - phylink_fill_fixedlink_supported(pl->supported); > - > + linkmode_fill(pl->supported); > linkmode_copy(pl->link_config.advertising, pl->supported); > phylink_validate(pl, pl->supported, &pl->link_config); > > c = phy_caps_lookup(pl->link_config.speed, pl->link_config.duplex, > pl->supported, true); > - if (c) > + if (c) { > linkmode_and(match, pl->supported, c->linkmodes); > > + /* Compatbility with the legacy behaviour: > + * Report one single BaseT mode. > + */ > + phylink_fill_fixedlink_supported(mask); > + if (linkmode_intersects(match, mask)) > + linkmode_and(match, match, mask); > + linkmode_zero(mask); > + } > + I'm still wondering about the wiseness of exposing more than one link mode for something that's supposed to be fixed-link. For gigabit fixed links, even if we have: phy-mode = "1000base-x"; speed = <1000>; full-duplex; in DT, we still state to ethtool: Supported link modes: 1000baseT/Full Advertised link modes: 1000baseT/Full Link partner advertised link modes: 1000baseT/Full Link partner advertised auto-negotiation: No Speed: 1000Mb/s Duplex: Full Auto-negotiation: on despite it being a 1000base-X link. This is perfectly reasonable, because of the origins of fixed-links - these existed as a software emulated baseT PHY no matter what the underlying link was. So, is getting the right link mode for the underlying link important for fixed-links? I don't think it is. Does it make sense to publish multiple link modes for a fixed-link? I don't think it does, because if multiple link modes are published, it means that it isn't fixed. As for arguments about the number of lanes, that's a property of the PHY_INTERFACE_MODE_xxx. There's a long history of this, e.g. MII/RMII is effectively a very early illustration of reducing the number of lanes, yet we don't have separate link modes for these. So, I'm still uneasy about this approach. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 14:55 ` Russell King (Oracle) @ 2025-04-03 15:29 ` Maxime Chevallier 2025-04-03 16:34 ` Andrew Lunn 0 siblings, 1 reply; 35+ messages in thread From: Maxime Chevallier @ 2025-04-03 15:29 UTC (permalink / raw) To: Russell King (Oracle) Cc: Alexander Duyck, netdev, andrew, hkallweit1, davem, kuba, pabeni On Thu, 3 Apr 2025 15:55:45 +0100 "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > On Tue, Apr 01, 2025 at 02:30:06PM -0700, Alexander Duyck wrote: > > From: Alexander Duyck <alexanderduyck@fb.com> > > > > The blamed commit introduced an issue where it was limiting the link > > configuration so that we couldn't use fixed-link mode for any settings > > other than twisted pair modes 10G or less. As a result this was causing the > > driver to lose any advertised/lp_advertised/supported modes when setup as a > > fixed link. > > > > To correct this we can add a check to identify if the user is in fact > > enabling a TP mode and then apply the mask to select only 1 of each speed > > for twisted pair instead of applying this before we know the number of bits > > set. > > > > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > > --- > > drivers/net/phy/phylink.c | 15 +++++++++++---- > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > > index 16a1f31f0091..380e51c5bdaa 100644 > > --- a/drivers/net/phy/phylink.c > > +++ b/drivers/net/phy/phylink.c > > @@ -713,17 +713,24 @@ static int phylink_parse_fixedlink(struct phylink *pl, > > phylink_warn(pl, "fixed link specifies half duplex for %dMbps link?\n", > > pl->link_config.speed); > > > > - linkmode_zero(pl->supported); > > - phylink_fill_fixedlink_supported(pl->supported); > > - > > + linkmode_fill(pl->supported); > > linkmode_copy(pl->link_config.advertising, pl->supported); > > phylink_validate(pl, pl->supported, &pl->link_config); > > > > c = phy_caps_lookup(pl->link_config.speed, pl->link_config.duplex, > > pl->supported, true); > > - if (c) > > + if (c) { > > linkmode_and(match, pl->supported, c->linkmodes); > > > > + /* Compatbility with the legacy behaviour: > > + * Report one single BaseT mode. > > + */ > > + phylink_fill_fixedlink_supported(mask); > > + if (linkmode_intersects(match, mask)) > > + linkmode_and(match, match, mask); > > + linkmode_zero(mask); > > + } > > + > > I'm still wondering about the wiseness of exposing more than one link > mode for something that's supposed to be fixed-link. > > For gigabit fixed links, even if we have: > > phy-mode = "1000base-x"; > speed = <1000>; > full-duplex; > > in DT, we still state to ethtool: > > Supported link modes: 1000baseT/Full > Advertised link modes: 1000baseT/Full > Link partner advertised link modes: 1000baseT/Full > Link partner advertised auto-negotiation: No > Speed: 1000Mb/s > Duplex: Full > Auto-negotiation: on > > despite it being a 1000base-X link. This is perfectly reasonable, > because of the origins of fixed-links - these existed as a software > emulated baseT PHY no matter what the underlying link was. > > So, is getting the right link mode for the underlying link important > for fixed-links? I don't think it is. Does it make sense to publish > multiple link modes for a fixed-link? I don't think it does, because > if multiple link modes are published, it means that it isn't fixed. That's a good point. The way I saw that was : "we report all the modes because, being fixed-link, it can be any of these modes." But I agree with you in that this doesn't show that "this is fixed, don't try to change that, this won't work". So, I do agree with you now. > As for arguments about the number of lanes, that's a property of the > PHY_INTERFACE_MODE_xxx. There's a long history of this, e.g. MII/RMII > is effectively a very early illustration of reducing the number of > lanes, yet we don't have separate link modes for these. > > So, I'm still uneasy about this approach. So, how about extending the compat list of "first link of each speed" to all the modes, then once the "mediums" addition from the phy_port lands, we simplify it down the following way : Looking at the current list of elegible fixed-link linkmodes, we have (I'm taking this from one of your mails) : speed duplex linkmode 10M Half 10baseT_Half 10M Full 10baseT_Full 100M Half 100baseT_Half 100M Full 100baseT_Full 1G Half 1000baseT_Half 1G Full 1000baseT_Full (this changed over time) 2.5G Full 2500baseT_Full 5G Full 5000baseT_Full 10G Full 10000baseCR_Full (used to be 10000baseKR_Full) 20G Full 20000baseKR2_Full => there's no 20GBaseCR* 25G Full 25000baseCR_Full 40G Full 40000baseCR4_Full 50G Full 50000baseCR2_Full 56G Full 56000baseCR4_Full 100G Full 100000baseCR4_Full To avoid maintaining a hardcoded list, we could clearly specifying what we report in fixed-link : 1 : Any BaseT mode for the given speed duplex (BaseT and not BaseT1) 2 : If there's none, Any BaseK mode for that speed/duplex 3 : If there's none, Any BaseC mode for that speed/duplex That's totally arbitrary of course, and if one day someone adds, say, 25GBaseT, fixed-link linkmode will change. Another issue us 10G, 10GBaseT exists, but wasn't the first choice. Another idea could be to add a Fixed linkmode BIT, like we have for aneg, pause, asym_pause, and report 2 linkmodes : Supported link modes: 1000baseT/Full Fixed Advertised link modes: 1000baseT/Full Fixed Link partner advertised link modes: 1000baseT/Full Fixed The first "legacy" linkmode will still be reported for compat, we add a second one to tell userspace that this is Fixed, don't try to make any sense out of it ? But that may just overcomplicate the whole thing and leave yet another way for the linkmodes to be abused in drivers. Maxime ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 15:29 ` Maxime Chevallier @ 2025-04-03 16:34 ` Andrew Lunn 2025-04-03 21:53 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-03 16:34 UTC (permalink / raw) To: Maxime Chevallier Cc: Russell King (Oracle), Alexander Duyck, netdev, hkallweit1, davem, kuba, pabeni On Thu, Apr 03, 2025 at 05:29:53PM +0200, Maxime Chevallier wrote: > On Thu, 3 Apr 2025 15:55:45 +0100 > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > > > On Tue, Apr 01, 2025 at 02:30:06PM -0700, Alexander Duyck wrote: > > > From: Alexander Duyck <alexanderduyck@fb.com> > > > > > > The blamed commit introduced an issue where it was limiting the link > > > configuration so that we couldn't use fixed-link mode for any settings > > > other than twisted pair modes 10G or less. As a result this was causing the > > > driver to lose any advertised/lp_advertised/supported modes when setup as a > > > fixed link. > > > > > > To correct this we can add a check to identify if the user is in fact > > > enabling a TP mode and then apply the mask to select only 1 of each speed > > > for twisted pair instead of applying this before we know the number of bits > > > set. > > > > > > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > > > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > > > --- > > > drivers/net/phy/phylink.c | 15 +++++++++++---- > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > > > index 16a1f31f0091..380e51c5bdaa 100644 > > > --- a/drivers/net/phy/phylink.c > > > +++ b/drivers/net/phy/phylink.c > > > @@ -713,17 +713,24 @@ static int phylink_parse_fixedlink(struct phylink *pl, > > > phylink_warn(pl, "fixed link specifies half duplex for %dMbps link?\n", > > > pl->link_config.speed); > > > > > > - linkmode_zero(pl->supported); > > > - phylink_fill_fixedlink_supported(pl->supported); > > > - > > > + linkmode_fill(pl->supported); > > > linkmode_copy(pl->link_config.advertising, pl->supported); > > > phylink_validate(pl, pl->supported, &pl->link_config); > > > > > > c = phy_caps_lookup(pl->link_config.speed, pl->link_config.duplex, > > > pl->supported, true); > > > - if (c) > > > + if (c) { > > > linkmode_and(match, pl->supported, c->linkmodes); > > > > > > + /* Compatbility with the legacy behaviour: > > > + * Report one single BaseT mode. > > > + */ > > > + phylink_fill_fixedlink_supported(mask); > > > + if (linkmode_intersects(match, mask)) > > > + linkmode_and(match, match, mask); > > > + linkmode_zero(mask); > > > + } > > > + > > > > I'm still wondering about the wiseness of exposing more than one link > > mode for something that's supposed to be fixed-link. > > > > For gigabit fixed links, even if we have: > > > > phy-mode = "1000base-x"; > > speed = <1000>; > > full-duplex; > > > > in DT, we still state to ethtool: > > > > Supported link modes: 1000baseT/Full > > Advertised link modes: 1000baseT/Full > > Link partner advertised link modes: 1000baseT/Full > > Link partner advertised auto-negotiation: No > > Speed: 1000Mb/s > > Duplex: Full > > Auto-negotiation: on > > > > despite it being a 1000base-X link. This is perfectly reasonable, > > because of the origins of fixed-links - these existed as a software > > emulated baseT PHY no matter what the underlying link was. > > > > So, is getting the right link mode for the underlying link important > > for fixed-links? I don't think it is. Does it make sense to publish > > multiple link modes for a fixed-link? I don't think it does, because > > if multiple link modes are published, it means that it isn't fixed. > > That's a good point. The way I saw that was : > > "we report all the modes because, being fixed-link, it can be > any of these modes." > > But I agree with you in that this doesn't show that "this is fixed, > don't try to change that, this won't work". So, I do agree with you now. > > > As for arguments about the number of lanes, that's a property of the > > PHY_INTERFACE_MODE_xxx. There's a long history of this, e.g. MII/RMII > > is effectively a very early illustration of reducing the number of > > lanes, yet we don't have separate link modes for these. > > > > So, I'm still uneasy about this approach. > > So, how about extending the compat list of "first link of each speed" > to all the modes, then once the "mediums" addition from the phy_port > lands, we simplify it down the following way : > > Looking at the current list of elegible fixed-link linkmodes, we have > (I'm taking this from one of your mails) : > > speed duplex linkmode > 10M Half 10baseT_Half > 10M Full 10baseT_Full > 100M Half 100baseT_Half > 100M Full 100baseT_Full > 1G Half 1000baseT_Half > 1G Full 1000baseT_Full (this changed over time) > 2.5G Full 2500baseT_Full > 5G Full 5000baseT_Full > 10G Full 10000baseCR_Full (used to be 10000baseKR_Full) > 20G Full 20000baseKR2_Full => there's no 20GBaseCR* > 25G Full 25000baseCR_Full > 40G Full 40000baseCR4_Full > 50G Full 50000baseCR2_Full > 56G Full 56000baseCR4_Full > 100G Full 100000baseCR4_Full > > To avoid maintaining a hardcoded list, we could clearly specifying > what we report in fixed-link : > > 1 : Any BaseT mode for the given speed duplex (BaseT and not BaseT1) > 2 : If there's none, Any BaseK mode for that speed/duplex > 3 : If there's none, Any BaseC mode for that speed/duplex > > That's totally arbitrary of course, and if one day someone adds, say, > 25GBaseT, fixed-link linkmode will change. Another issue us 10G, > 10GBaseT exists, but wasn't the first choice. Maybe go back to why fixed-link exists? It is basically a hack to make MAC configuration easier. It was originally used for MAC to MAC connections, e.g. a NIC connected to a switch, without PHYs in the middle. By faking a PHY, there was no need to add any special configuration API to the MAC, the phylib adjust_link callback would be sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, or SGMII, that is all you need to know. The phy-mode told you to configure the MAC to MII, GMII, SGMII. But things evolved since then. We started having PHYs which change their host side depending on their media side. SGMII for <= 1G, 2500BaseX, 5GBaseX, 10GBaseX. It became necessary for the adjust_link callback to look at more than just the speed/duplex, it also needed to look at the phy_interface_t. phy-mode looses its meaning, it might be considered the default until we know better. But consider the use case, a hack to allow configuration of a MAC to MAC connection. The link mode does not change depending on the media, there is no media. The switch will not be changing its port configuration. The link really is fixed. phy-mode tells you the basic configuration, and then adjust_link/mac_link_up tells you the speed/dupex if there are multiple speeds/duplex supported, e.g. RGMII/SGMII. What Alex is trying to do is abuse fixed link for something which is not a MAC-MAC connection, something which is not fixed. Do we want to support that? Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 16:34 ` Andrew Lunn @ 2025-04-03 21:53 ` Alexander Duyck 2025-04-03 23:19 ` Russell King (Oracle) 2025-04-03 23:26 ` Russell King (Oracle) 0 siblings, 2 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-03 21:53 UTC (permalink / raw) To: Andrew Lunn Cc: Maxime Chevallier, Russell King (Oracle), netdev, hkallweit1, davem, kuba, pabeni On Thu, Apr 3, 2025 at 9:34 AM Andrew Lunn <andrew@lunn.ch> wrote: > > On Thu, Apr 03, 2025 at 05:29:53PM +0200, Maxime Chevallier wrote: > > On Thu, 3 Apr 2025 15:55:45 +0100 > > "Russell King (Oracle)" <linux@armlinux.org.uk> wrote: > > > > > On Tue, Apr 01, 2025 at 02:30:06PM -0700, Alexander Duyck wrote: > > > > From: Alexander Duyck <alexanderduyck@fb.com> > > > > > > > > The blamed commit introduced an issue where it was limiting the link > > > > configuration so that we couldn't use fixed-link mode for any settings > > > > other than twisted pair modes 10G or less. As a result this was causing the > > > > driver to lose any advertised/lp_advertised/supported modes when setup as a > > > > fixed link. > > > > > > > > To correct this we can add a check to identify if the user is in fact > > > > enabling a TP mode and then apply the mask to select only 1 of each speed > > > > for twisted pair instead of applying this before we know the number of bits > > > > set. > > > > > > > > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration") > > > > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > > > > --- > > > > drivers/net/phy/phylink.c | 15 +++++++++++---- > > > > 1 file changed, 11 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > > > > index 16a1f31f0091..380e51c5bdaa 100644 > > > > --- a/drivers/net/phy/phylink.c > > > > +++ b/drivers/net/phy/phylink.c > > > > @@ -713,17 +713,24 @@ static int phylink_parse_fixedlink(struct phylink *pl, > > > > phylink_warn(pl, "fixed link specifies half duplex for %dMbps link?\n", > > > > pl->link_config.speed); > > > > > > > > - linkmode_zero(pl->supported); > > > > - phylink_fill_fixedlink_supported(pl->supported); > > > > - > > > > + linkmode_fill(pl->supported); > > > > linkmode_copy(pl->link_config.advertising, pl->supported); > > > > phylink_validate(pl, pl->supported, &pl->link_config); > > > > > > > > c = phy_caps_lookup(pl->link_config.speed, pl->link_config.duplex, > > > > pl->supported, true); > > > > - if (c) > > > > + if (c) { > > > > linkmode_and(match, pl->supported, c->linkmodes); > > > > > > > > + /* Compatbility with the legacy behaviour: > > > > + * Report one single BaseT mode. > > > > + */ > > > > + phylink_fill_fixedlink_supported(mask); > > > > + if (linkmode_intersects(match, mask)) > > > > + linkmode_and(match, match, mask); > > > > + linkmode_zero(mask); > > > > + } > > > > + > > > > > > I'm still wondering about the wiseness of exposing more than one link > > > mode for something that's supposed to be fixed-link. > > > > > > For gigabit fixed links, even if we have: > > > > > > phy-mode = "1000base-x"; > > > speed = <1000>; > > > full-duplex; > > > > > > in DT, we still state to ethtool: > > > > > > Supported link modes: 1000baseT/Full > > > Advertised link modes: 1000baseT/Full > > > Link partner advertised link modes: 1000baseT/Full > > > Link partner advertised auto-negotiation: No > > > Speed: 1000Mb/s > > > Duplex: Full > > > Auto-negotiation: on > > > > > > despite it being a 1000base-X link. This is perfectly reasonable, > > > because of the origins of fixed-links - these existed as a software > > > emulated baseT PHY no matter what the underlying link was. > > > > > > So, is getting the right link mode for the underlying link important > > > for fixed-links? I don't think it is. Does it make sense to publish > > > multiple link modes for a fixed-link? I don't think it does, because > > > if multiple link modes are published, it means that it isn't fixed. > > > > That's a good point. The way I saw that was : > > > > "we report all the modes because, being fixed-link, it can be > > any of these modes." > > > > But I agree with you in that this doesn't show that "this is fixed, > > don't try to change that, this won't work". So, I do agree with you now. > > > > > As for arguments about the number of lanes, that's a property of the > > > PHY_INTERFACE_MODE_xxx. There's a long history of this, e.g. MII/RMII > > > is effectively a very early illustration of reducing the number of > > > lanes, yet we don't have separate link modes for these. > > > > > > So, I'm still uneasy about this approach. > > > > So, how about extending the compat list of "first link of each speed" > > to all the modes, then once the "mediums" addition from the phy_port > > lands, we simplify it down the following way : > > > > Looking at the current list of elegible fixed-link linkmodes, we have > > (I'm taking this from one of your mails) : > > > > speed duplex linkmode > > 10M Half 10baseT_Half > > 10M Full 10baseT_Full > > 100M Half 100baseT_Half > > 100M Full 100baseT_Full > > 1G Half 1000baseT_Half > > 1G Full 1000baseT_Full (this changed over time) > > 2.5G Full 2500baseT_Full > > 5G Full 5000baseT_Full > > 10G Full 10000baseCR_Full (used to be 10000baseKR_Full) > > 20G Full 20000baseKR2_Full => there's no 20GBaseCR* > > 25G Full 25000baseCR_Full > > 40G Full 40000baseCR4_Full > > 50G Full 50000baseCR2_Full > > 56G Full 56000baseCR4_Full > > 100G Full 100000baseCR4_Full > > > > To avoid maintaining a hardcoded list, we could clearly specifying > > what we report in fixed-link : > > > > 1 : Any BaseT mode for the given speed duplex (BaseT and not BaseT1) > > 2 : If there's none, Any BaseK mode for that speed/duplex > > 3 : If there's none, Any BaseC mode for that speed/duplex > > > > That's totally arbitrary of course, and if one day someone adds, say, > > 25GBaseT, fixed-link linkmode will change. Another issue us 10G, > > 10GBaseT exists, but wasn't the first choice. > > Maybe go back to why fixed-link exists? It is basically a hack to make > MAC configuration easier. It was originally used for MAC to MAC > connections, e.g. a NIC connected to a switch, without PHYs in the > middle. By faking a PHY, there was no need to add any special > configuration API to the MAC, the phylib adjust_link callback would be > sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, > or SGMII, that is all you need to know. The phy-mode told you to > configure the MAC to MII, GMII, SGMII. Another issue is that how you would define the connection between the two endpoints is changing. Maxime is basing his data off of speed/duplex however to source that he is pulling data from link_mode_params that is starting to broaden including things like lanes. I really think going forward lanes is going to start playing a role as we get into the higher speeds and it is already becoming a standard config item to use to strip out unsupported modes when configuring the interface via autoneg. > But things evolved since then. We started having PHYs which change > their host side depending on their media side. SGMII for <= 1G, > 2500BaseX, 5GBaseX, 10GBaseX. It became necessary for the adjust_link > callback to look at more than just the speed/duplex, it also needed to > look at the phy_interface_t. phy-mode looses its meaning, it might be > considered the default until we know better. I am wondering about that. I know I specified we were XLGMII for fbnic but that has proven problematic since we aren't actually 40G. So we are still essentially just reporting link up/down using that. That is why I was looking at going with a fixed mode as I can at least specify the correct speed duplex for the one speed I am using if I want to use ethtool_ksettings_get. I have a patch to add the correct phy_interface_t modes for 50, and 100G links. However one thing I am seeing is that after I set the initial interface type I cannot change the interface type without the SFP code added. One thing I was wondering. Should I just ignore the phy_interface_t on the pcs_config call and use the link mode mask flags in autoneg and the speed/duplex/lanes in non-autoneg to configure the link? It seems like that is what the SFP code itself is doing based on my patch 2 in the set. > But consider the use case, a hack to allow configuration of a MAC to > MAC connection. The link mode does not change depending on the media, > there is no media. The switch will not be changing its port > configuration. The link really is fixed. phy-mode tells you the basic > configuration, and then adjust_link/mac_link_up tells you the > speed/dupex if there are multiple speeds/duplex supported, > e.g. RGMII/SGMII. > > What Alex is trying to do is abuse fixed link for something which is > not a MAC-MAC connection, something which is not fixed. Do we want to > support that? How is it not a fixed link? If anything it was going to be more fixed than what you described above. In our case the connection type is indicated by the FW and we aren't meant to change it unless we want to be without a link. The link goes up and down and that would be about it. So we essentially advertise one link mode bit and are locked on the interface configured at creation. I was basically taking that config from the FW, creating a SW node with it, and then using that to set up the link w/o permitting changes. The general idea was that I would use that to limp along until I could get the QSFP support added and then add support for configuring the link with the ethtool_ksettings_set call. Mainly we just need that functionality for our own testing as the production case is non-autoneg fixed links only. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 21:53 ` Alexander Duyck @ 2025-04-03 23:19 ` Russell King (Oracle) 2025-04-04 15:56 ` Alexander Duyck 2025-04-03 23:26 ` Russell King (Oracle) 1 sibling, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-03 23:19 UTC (permalink / raw) To: Alexander Duyck Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > On Thu, Apr 3, 2025 at 9:34 AM Andrew Lunn <andrew@lunn.ch> wrote: > > Maybe go back to why fixed-link exists? It is basically a hack to make > > MAC configuration easier. It was originally used for MAC to MAC > > connections, e.g. a NIC connected to a switch, without PHYs in the > > middle. By faking a PHY, there was no need to add any special > > configuration API to the MAC, the phylib adjust_link callback would be > > sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, > > or SGMII, that is all you need to know. The phy-mode told you to > > configure the MAC to MII, GMII, SGMII. > > Another issue is that how you would define the connection between the > two endpoints is changing. Maxime is basing his data off of > speed/duplex however to source that he is pulling data from > link_mode_params that is starting to broaden including things like > lanes. Just a quick correction - this is not entirely correct. It's speed, duplex and "lanes" is defined by interface mode. For example, 10GBASER is a single lane, as is SGMII, 1000BASE-X, 2500BASE-X. XLGMII and CGMII are defined by 802.3 as 8 lanes (clause 81.) speed and duplex just define the speed operated over the link defined by the PHY interface mode. (I've previously described why we don't go to that depth with fixed links, but to briefly state it, it's what we've done in the past and it's visible to the user, and we try to avoid breaking userspace.) > I really think going forward lanes is going to start playing a > role as we get into the higher speeds and it is already becoming a > standard config item to use to strip out unsupported modes when > configuring the interface via autoneg. Don't vendors already implement downshift for cases where there are problems with lanes/cabling? > I am wondering about that. I know I specified we were XLGMII for fbnic > but that has proven problematic since we aren't actually 40G. If you aren't actually 40G, then you aren't actually XLGMII as defined by 802.3... so that begs the question - what are you! > So we > are still essentially just reporting link up/down using that. That is > why I was looking at going with a fixed mode as I can at least specify > the correct speed duplex for the one speed I am using if I want to use > ethtool_ksettings_get. > > I have a patch to add the correct phy_interface_t modes for 50, and > 100G links. However one thing I am seeing is that after I set the > initial interface type I cannot change the interface type without the > SFP code added. One thing I was wondering. Should I just ignore the > phy_interface_t on the pcs_config call and use the link mode mask > flags in autoneg and the speed/duplex/lanes in non-autoneg to > configure the link? It seems like that is what the SFP code itself is > doing based on my patch 2 in the set. That is most certainly *not* what the SFP code is doing. As things stand today, everything respects the PHY interface mode, if it says SGMII then we get SGMII. If it says 1000BASE-X, we get 1000BASE-X. If it says 2500BASE-X, then that's what we get... and so on. The SFP code has added support to switch between 2500BASE-X and 1000BASE-X because this is a use case with optical SFPs that can operate at either speed. That's why this happens for the SFP case. For PHYs, modern PHYs switch their host facing interface, so we support the interface mode changing there - under the control of the PHY and most certainly not under user control (the user doesn't know how the PHY has been configured and whether the PHY does switch or rate adapt.) For everything else, we're in fixed host interface mode, because that is how we've been - and to do anything else is new development. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 23:19 ` Russell King (Oracle) @ 2025-04-04 15:56 ` Alexander Duyck 2025-04-04 16:33 ` Andrew Lunn 2025-04-05 9:10 ` Russell King (Oracle) 0 siblings, 2 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-04 15:56 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Thu, Apr 3, 2025 at 4:19 PM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > On Thu, Apr 3, 2025 at 9:34 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > Maybe go back to why fixed-link exists? It is basically a hack to make > > > MAC configuration easier. It was originally used for MAC to MAC > > > connections, e.g. a NIC connected to a switch, without PHYs in the > > > middle. By faking a PHY, there was no need to add any special > > > configuration API to the MAC, the phylib adjust_link callback would be > > > sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, > > > or SGMII, that is all you need to know. The phy-mode told you to > > > configure the MAC to MII, GMII, SGMII. > > > > Another issue is that how you would define the connection between the > > two endpoints is changing. Maxime is basing his data off of > > speed/duplex however to source that he is pulling data from > > link_mode_params that is starting to broaden including things like > > lanes. > > Just a quick correction - this is not entirely correct. It's speed, > duplex and "lanes" is defined by interface mode. > > For example, 10GBASER is a single lane, as is SGMII, 1000BASE-X, > 2500BASE-X. XLGMII and CGMII are defined by 802.3 as 8 lanes (clause > 81.) > > speed and duplex just define the speed operated over the link defined > by the PHY interface mode. > > (I've previously described why we don't go to that depth with fixed > links, but to briefly state it, it's what we've done in the past and > it's visible to the user, and we try to avoid breaking userspace.) Part of the issue is that I think I may be mixing terms up and I have to be careful of that. If I am not mistaken you refer to the PHY as the full setup from the MII down to the other side of the PMA or maybe PMD. I also have to resist calling our PMA a PHY as it is a SerDes PHY, not an Ethernet PHY. Am I understanding that correctly? The PMD is where we get into the media types, but the reason why I am focused on lanes is because my interfaces are essentially defined by the combination of an MII on the top, and coming out the bottom at the PMA we have either one or two lanes operating in NRZ or PAM4 giving us at least 4 total combinations that I am concerned with excluding the media type since I am essentially running chip to module. > > I really think going forward lanes is going to start playing a > > role as we get into the higher speeds and it is already becoming a > > standard config item to use to strip out unsupported modes when > > configuring the interface via autoneg. > > Don't vendors already implement downshift for cases where there are > problems with lanes/cabling? One issue with doing any sort of downshift is that RSFEC and the alignment markers are different for each configuration. As such I don't think downshifting is an option without changing interface modes. If the other end is 50R2 and one of the lanes is dead then the link is dead. No optional downshift to 25R. > > I am wondering about that. I know I specified we were XLGMII for fbnic > > but that has proven problematic since we aren't actually 40G. > > If you aren't actually 40G, then you aren't actually XLGMII as > defined by 802.3... so that begs the question - what are you! Confused.. Sadly confused.. So I am still trying to grok all this but I think I am getting there. The issue is that XLGMII was essentially overclocked to get to the 50GMII. That is what we do have. Our hardware supports a 50GMII with open loop rate matching to 25G, and CGMII, but they refer to the 50GMII as XLGMII throughout the documentation which led to my initial confusion when I implemented the limited support we have upstream now. On the PMA end of things like I mentioned we support NRZ (25.78125) or PAM4 (26.5625*2) and 1 or 2 lanes. To complicate things 50G is a mess in and of itself. There are two specifications for the 50R2 w/ RS setup. One is the IEEE which uses RS544 and the other is the Ethernet Consortium which uses RS528. If I reference LAUI or LAUI-2 that is the the setup the IEEE referred to in their 50G setup w/o FEC. Since that was the common overlap between the two setups I figured I would use that to refer to this mode. I am also overloading the meaning of it to reference 50G with RS528 or BASER. > > So we > > are still essentially just reporting link up/down using that. That is > > why I was looking at going with a fixed mode as I can at least specify > > the correct speed duplex for the one speed I am using if I want to use > > ethtool_ksettings_get. > > > > I have a patch to add the correct phy_interface_t modes for 50, and > > 100G links. However one thing I am seeing is that after I set the > > initial interface type I cannot change the interface type without the > > SFP code added. One thing I was wondering. Should I just ignore the > > phy_interface_t on the pcs_config call and use the link mode mask > > flags in autoneg and the speed/duplex/lanes in non-autoneg to > > configure the link? It seems like that is what the SFP code itself is > > doing based on my patch 2 in the set. > > That is most certainly *not* what the SFP code is doing. As things stand > today, everything respects the PHY interface mode, if it says SGMII then > we get SGMII. If it says 1000BASE-X, we get 1000BASE-X. If it says > 2500BASE-X, then that's what we get... and so on. I think I may be starting to understand where my confusing came from. While the SFP code may be correctly behaved I don't think other PCS drivers are, either that or they were implemented for much more than what they support. For example looking at the xpcs (https://elixir.bootlin.com/linux/v6.14-rc6/C/ident/xpcs_get_max_xlgmii_speed) I don't see how you are getting 100G out of an XLGMII interface. I'm guessing somebody is checking for bits that will never be set. So then if I am understanding correctly the expectation right now is that once an interface mode is set, it is set. Do I have that right? Is it acceptable for pcs_get_state to return an interface value other than what is currently set? Based on the code I would assume that is the case, but currently that won't result in a change without a phydev. > The SFP code has added support to switch between 2500BASE-X and > 1000BASE-X because this is a use case with optical SFPs that can operate > at either speed. That's why this happens for the SFP case. Okay, so it looks like I will have to add code for new use cases then as essentially there are standards in place for how to autoneg between one or two lane modes as long as we stick to either NRZ or PAM4. > For PHYs, modern PHYs switch their host facing interface, so we support > the interface mode changing there - under the control of the PHY and > most certainly not under user control (the user doesn't know how the > PHY has been configured and whether the PHY does switch or rate > adapt.) I think I see what I am missing. The issue I have is that, assuming I can ever get autoneg code added, we can essentially get autoneg that tells us to jump between 50R or 100R2, or 25R and 50R2. If I am not mistaken based on the current model each of those would be a different interface mode. Now there are 2 ways we can get there. The first would be to have the user specify it. With the SFP code as it is I think that solution should be mostly addressed. The issue is what happens if I do get autoneg up and running. That is the piece it sounds like we will need more code for. > For everything else, we're in fixed host interface mode, because that > is how we've been - and to do anything else is new development. Yeah, that is the part I need to dig into I guess. As it stands I have patches in the works to add the interface modes and support for QSFP+/28. Looks like I will need to add support for allowing the PCS to provide enough info to switch interface modes. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 15:56 ` Alexander Duyck @ 2025-04-04 16:33 ` Andrew Lunn 2025-04-04 22:46 ` Alexander Duyck 2025-04-05 9:10 ` Russell King (Oracle) 1 sibling, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-04 16:33 UTC (permalink / raw) To: Alexander Duyck Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni > Part of the issue is that I think I may be mixing terms up and I have > to be careful of that. If I am not mistaken you refer to the PHY as > the full setup from the MII down to the other side of the PMA or maybe > PMD. I also have to resist calling our PMA a PHY as it is a SerDes > PHY, not an Ethernet PHY. For us, a PHY is something like: https://www.marvell.com/content/dam/marvell/en/public-collateral/phys-transceivers/marvell-phys-transceivers-alaska-88e151x-datasheet.pdf https://www.ti.com/lit/ds/symlink/dp83867ir.pdf It has an MII interface one side, and a Base-T interface the other, where you connect magnetics and an RJ-45. It gets a bit more complex with devices like the Marvell 10G PHY, which can be used as an MII to MII converter typically used to convert the MII output from the MAC to something you can connect to an SFP cage. PHYs are not necessarily soldered to the board, they can also be inside SFPs. Copper 1G SFPs typically use a Marvell m88e1111 PHY. Just to add more confusion, Linux also has generic PHYs, which came later than PHYs, drivers/phy. A SERDES interface can be pretty generic, and can be used for multiple protocols. Some SoCs have SERDES interfaces which you can configure for USB, SATA, or networking. Generic PHYs hand this protocol switch. For some, you even need to configure the subprotocol SGMII, 1000BaseX, 2500BaseX. All this terminology has been driven mostly from SoCs, because x86 systems either hid all this in firmware, or like the intel drivers, wrote there own MDIO and PHY drivers inside there MAC drivers. So for a long time we talked about MII, GMII, RGMII, which are relatively simple MII interfaces. Things got more complex with SGMII, 1000BaseX, 2500BaseX, 10GBaseX since you then have a PCS, and the PCS is an active part, performing signalling, negotiation, except when vendors broke it because why run SGMII at 2.5 times the clock speed and call it 2500BaseX, which is it not... We needed something to represent that negotiation, so drivers/net/pcs was born, with a lot of helpers for devices which follow 802.3 registers. So for us, we have: MAC - PHY MAC - PCS - PHY MAC - PCS - SFP cage MAC - PCS - PHY - SFP cage This is why i keep saying you are pushing the envelope. SoC currently top out at 10GbaseX. There might be 4 lanes to implement that 10G, or 1 lane, but we don't care, they all get connected to a PHY, and BaseT comes out the other side. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 16:33 ` Andrew Lunn @ 2025-04-04 22:46 ` Alexander Duyck 2025-04-05 9:43 ` Russell King (Oracle) 2025-04-05 14:51 ` Andrew Lunn 0 siblings, 2 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-04 22:46 UTC (permalink / raw) To: Andrew Lunn Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 4, 2025 at 9:33 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > Part of the issue is that I think I may be mixing terms up and I have > > to be careful of that. If I am not mistaken you refer to the PHY as > > the full setup from the MII down to the other side of the PMA or maybe > > PMD. I also have to resist calling our PMA a PHY as it is a SerDes > > PHY, not an Ethernet PHY. > > For us, a PHY is something like: > > https://www.marvell.com/content/dam/marvell/en/public-collateral/phys-transceivers/marvell-phys-transceivers-alaska-88e151x-datasheet.pdf > > https://www.ti.com/lit/ds/symlink/dp83867ir.pdf > > It has an MII interface one side, and a Base-T interface the other, > where you connect magnetics and an RJ-45. I'm familiar with that stuff, although I was dealing with it closer to 20 years ago back when I was working on e1000/e1000e/igb. > It gets a bit more complex with devices like the Marvell 10G PHY, > which can be used as an MII to MII converter typically used to convert > the MII output from the MAC to something you can connect to an SFP > cage. > > PHYs are not necessarily soldered to the board, they can also be > inside SFPs. Copper 1G SFPs typically use a Marvell m88e1111 PHY. > > Just to add more confusion, Linux also has generic PHYs, which came > later than PHYs, drivers/phy. A SERDES interface can be pretty > generic, and can be used for multiple protocols. Some SoCs have SERDES > interfaces which you can configure for USB, SATA, or > networking. Generic PHYs hand this protocol switch. For some, you even > need to configure the subprotocol SGMII, 1000BaseX, 2500BaseX. Yeah, we heard about that a month ago at netdev. We are likely going to have to convert our PMA over to that, but it looks like there has already been some precedent for that. > All this terminology has been driven mostly from SoCs, because x86 > systems either hid all this in firmware, or like the intel drivers, > wrote there own MDIO and PHY drivers inside there MAC drivers. Admittedly with my background being Intel my first preference was to essentially place all of that code in the MAC driver, thus why everything for now is ending up in fbnic_mac.c. I might have to think about splitting some of that up before I submit the patches. > So for a long time we talked about MII, GMII, RGMII, which are > relatively simple MII interfaces. Things got more complex with SGMII, > 1000BaseX, 2500BaseX, 10GBaseX since you then have a PCS, and the PCS > is an active part, performing signalling, negotiation, except when > vendors broke it because why run SGMII at 2.5 times the clock speed > and call it 2500BaseX, which is it not... That is some of my confusion about XLGMII. I'm wondering if I should call it that or not since the documentation refers to our PCS as using XLGMII but everything seems to indicate it is clocked at 1.25x to get it to 50G. Then in addition the PCS is referring to RXLAUI for the lower end of the link which I opted to just call LAUI since in our case we are running at the 50G speeds anyway so that is probably the correct term for it. > We needed something to represent that negotiation, so drivers/net/pcs > was born, with a lot of helpers for devices which follow 802.3 > registers. > > So for us, we have: > > MAC - PHY > MAC - PCS - PHY > MAC - PCS - SFP cage > MAC - PCS - PHY - SFP cage Is this last one correct? I would have thought it would be MAC - PCS - SFP cage - PHY. At least that is how I remember it being with some of the igb setups I worked on back in the day. > This is why i keep saying you are pushing the envelope. SoC currently > top out at 10GbaseX. There might be 4 lanes to implement that 10G, or > 1 lane, but we don't care, they all get connected to a PHY, and BaseT > comes out the other side. I know we are pushing the envelope. That was one of the complaints we had when you insisted that we switch this over to phylink. If anything 50G sounds like it will give the 2500BaseX a run for its money in terms of being even more confusing and complicated. If anything we most closely resemble the setup with just the SFP cage and no PHY. So I suspect we will probably need that whole set in place in order for things to function as expected. That was one of the reasons why I was thinking of using fixed-link as a crutch to get the driver up initially while we enable the 3 new interface modes. The only spot where they seem like they will matter is just to the PCS since the link_up function is essentially just passed the speed, and as far as the MAC config function it is mostly just setting up pause frames, MTUs, and fairly mundane items unrelated to the lower levels of the link. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 22:46 ` Alexander Duyck @ 2025-04-05 9:43 ` Russell King (Oracle) 2025-04-05 14:51 ` Andrew Lunn 1 sibling, 0 replies; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-05 9:43 UTC (permalink / raw) To: Alexander Duyck Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 04, 2025 at 03:46:38PM -0700, Alexander Duyck wrote: > On Fri, Apr 4, 2025 at 9:33 AM Andrew Lunn <andrew@lunn.ch> wrote: > > So for us, we have: > > > > MAC - PHY > > MAC - PCS - PHY > > MAC - PCS - SFP cage > > MAC - PCS - PHY - SFP cage > > Is this last one correct? I would have thought it would be MAC - PCS - > SFP cage - PHY. At least that is how I remember it being with some of > the igb setups I worked on back in the day. Yes. Macchiatobin: -Part of SoC -| /----- RJ45 MAC - PCS --------- 88X3310 Ethernet PHY --------------| \----- SFP cage Things get more interesting when one plugs in a SFP which itself contains a PHY, where we effectively end up with: MAC - PCS - PHY - PHY on SFP module - media which we don't support very well, partly because it doesn't fit into the higher levels of the networking model (that's being worked on), and the 88X3310 doesn't support SGMII on its "fibre" port. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 22:46 ` Alexander Duyck 2025-04-05 9:43 ` Russell King (Oracle) @ 2025-04-05 14:51 ` Andrew Lunn 2025-04-05 20:41 ` Alexander Duyck 1 sibling, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-05 14:51 UTC (permalink / raw) To: Alexander Duyck Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni > > So for us, we have: > > > > MAC - PHY > > MAC - PCS - PHY > > MAC - PCS - SFP cage > > MAC - PCS - PHY - SFP cage > > Is this last one correct? I would have thought it would be MAC - PCS - > SFP cage - PHY. At least that is how I remember it being with some of > the igb setups I worked on back in the day. This PHY is acting as an MII converter. What comes out of the PCS cannot be directly connected to the SFP cage, it needs a translation. The Marvell 10G PHY can do this, you see this with some of the Marvell reference designs. There could also be a PHY inside the SFP cage, if the media is Base-T. Linux is not great at describing that situation, multiple PHYs for one link, but it is getting better at that, thanks to the work Bootlin is doing. > > > This is why i keep saying you are pushing the envelope. SoC currently > > top out at 10GbaseX. There might be 4 lanes to implement that 10G, or > > 1 lane, but we don't care, they all get connected to a PHY, and BaseT > > comes out the other side. > > I know we are pushing the envelope. That was one of the complaints we > had when you insisted that we switch this over to phylink. If anything > 50G sounds like it will give the 2500BaseX a run for its money in > terms of being even more confusing and complicated. Well, 2500BaseX itself it straight forward. It is the vendors that make it complex by having broken implementations. Does your 50G mode follow the standard? SoC vendors tend to follow the standard, which is why there is so much code sharing possible. They often just purchase IP to implement the boring parts like the PCS, there is no magic sauce there, all the vendor differentiation is in the MAC, if they try to differentiate at all in networking. The current market is SoCs have 10G. Microchip does have a 25G link in its switches, which uses phylink. We might see more 25G, or we might see a jump to 40G. I know your register layout does not follow the standard, but i hope the registers themselves do. So i guess what will happen is when somebody else has a 40G PCS, maybe even the same licensed IP, they will write a translation layer on top of yours to make your registers standards compliment, and then reuse your driver. This assumes you are following the standard, plus/minus some integration quirks. If you have thrown the standard out the window, and nothing is going to be reusable then maybe you should hide it away in the MAC driver. > If anything we most closely resemble the setup with just the SFP cage > and no PHY. So I suspect we will probably need that whole set in place > in order for things to function as expected. That is how we have seen new link modes added. Going from 2.5G to 5G to 10G is not that big, so the patchsets are reasonably small. But the jump from 10G to 40G is probably bigger. If you internally use fixed-link as a development crutch, that is not a problem. If however you want it in mainline, then we need to look at the big picture, does it fit with what fixed-link is meant to be? What is also going to make things complex is the BMC. SoCs and switches don't have BMCs, Linux via phylink and all the other pieces of the puzzle are in complete control of driving the hardware. We don't have a good story for when Linux is only partially in control of the hardware, because the BMC is also controlling some of it. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 14:51 ` Andrew Lunn @ 2025-04-05 20:41 ` Alexander Duyck 2025-04-05 20:53 ` Andrew Lunn 0 siblings, 1 reply; 35+ messages in thread From: Alexander Duyck @ 2025-04-05 20:41 UTC (permalink / raw) To: Andrew Lunn Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Sat, Apr 5, 2025 at 7:51 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > > So for us, we have: > > > > > > MAC - PHY > > > MAC - PCS - PHY > > > MAC - PCS - SFP cage > > > MAC - PCS - PHY - SFP cage > > > > Is this last one correct? I would have thought it would be MAC - PCS - > > SFP cage - PHY. At least that is how I remember it being with some of > > the igb setups I worked on back in the day. > > This PHY is acting as an MII converter. What comes out of the PCS > cannot be directly connected to the SFP cage, it needs a > translation. The Marvell 10G PHY can do this, you see this with some > of the Marvell reference designs. > > There could also be a PHY inside the SFP cage, if the media is > Base-T. Linux is not great at describing that situation, multiple PHYs > for one link, but it is getting better at that, thanks to the work > Bootlin is doing. > > > > > > This is why i keep saying you are pushing the envelope. SoC currently > > > top out at 10GbaseX. There might be 4 lanes to implement that 10G, or > > > 1 lane, but we don't care, they all get connected to a PHY, and BaseT > > > comes out the other side. > > > > I know we are pushing the envelope. That was one of the complaints we > > had when you insisted that we switch this over to phylink. If anything > > 50G sounds like it will give the 2500BaseX a run for its money in > > terms of being even more confusing and complicated. > > Well, 2500BaseX itself it straight forward. It is the vendors that > make it complex by having broken implementations. > > Does your 50G mode follow the standard? From what I can tell the 50GbaseR portion of it follows the standard. The LAUI stuff is another story. It looks like it mostly compiles but I am having to blur some definitions as the IEEE version had no FEC and with ours we have the options for RS528 or BASER which more closely matches up with 25GbaseR > SoC vendors tend to follow the standard, which is why there is so much > code sharing possible. They often just purchase IP to implement the > boring parts like the PCS, there is no magic sauce there, all the > vendor differentiation is in the MAC, if they try to differentiate at > all in networking. > > The current market is SoCs have 10G. Microchip does have a 25G link in > its switches, which uses phylink. We might see more 25G, or we might > see a jump to 40G. > > I know your register layout does not follow the standard, but i hope > the registers themselves do. So i guess what will happen is when > somebody else has a 40G PCS, maybe even the same licensed IP, they > will write a translation layer on top of yours to make your registers > standards compliment, and then reuse your driver. This assumes you are > following the standard, plus/minus some integration quirks. > > If you have thrown the standard out the window, and nothing is going > to be reusable then maybe you should hide it away in the MAC > driver. So the ugly bit for us is that there are no MII interfaces to the PCS or PMA. It is all MMIO accesses a register map and a number of signals were just routed to registers in another section of the part for us to read to or write from. > > If anything we most closely resemble the setup with just the SFP cage > > and no PHY. So I suspect we will probably need that whole set in place > > in order for things to function as expected. > > That is how we have seen new link modes added. Going from 2.5G to 5G > to 10G is not that big, so the patchsets are reasonably small. But the > jump from 10G to 40G is probably bigger. > > If you internally use fixed-link as a development crutch, that is not > a problem. If however you want it in mainline, then we need to look at > the big picture, does it fit with what fixed-link is meant to be? It just impacts the order in which I do things. By going with a fixed link I could add the phylink functionality to the driver as I went. I can go the other way around, it just means I can't test the functionality as I add it. Instead it will be adding all the code and then suddenly it all just works. At this point I have it mostly working with the few items I have already pointed out so I can probably just re-order things to push the functionality changes first, and then enable the driver to use them bypassing the fixed-link step. > What is also going to make things complex is the BMC. SoCs and > switches don't have BMCs, Linux via phylink and all the other pieces > of the puzzle are in complete control of driving the hardware. We > don't have a good story for when Linux is only partially in control of > the hardware, because the BMC is also controlling some of it. Fortunately the BMC isn't much of an issue as I think I figured out the one problem I had on Thursday. One of the first things we did is establish a lockout/tagout procedure for the link and TCAM configuration. Essentially the FW/BMC is in control when the driver isn't loaded. When we call open we send a message to the FW indicating we are locking it out and taking ownership. At that point it shouldn't modify anything unless we ask it to, or we don't send a heartbeat message for 2 minutes. If anything we were the problem child is that the code as it is currently written defaults to taking down the link and re-configuring everything on driver load. This was causing a bunch of heartburn because it was causing the BMC to lose link for a few seconds. However as of Thursday I realized we can essentially just use our pcs_get_state call at the start of our configure routine to identify if we actually need to reconfigure things or if the link is already up with the configuration we want. Doing that the only thing that causes any link issues is the initial phylink_link_down in phylink_resume, however that is much less significant as it doesn't actually trigger any link down events on the FW and the time it is down is only a fraction of a second versus the several seconds it takes for a PCS reset and for the PMA to complete link training. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 20:41 ` Alexander Duyck @ 2025-04-05 20:53 ` Andrew Lunn 0 siblings, 0 replies; 35+ messages in thread From: Andrew Lunn @ 2025-04-05 20:53 UTC (permalink / raw) To: Alexander Duyck Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni > So the ugly bit for us is that there are no MII interfaces to the PCS > or PMA. It is all MMIO accesses a register map and a number of signals > were just routed to registers in another section of the part for us to > read to or write from. The API is pretty forgiving, so you might be able to make it look like a normal device. You need to implement: /** @read_c45: Perform a C45 read transfer on the bus */ int (*read_c45)(struct mii_bus *bus, int addr, int devnum, int regnum); /** @write_c45: Perform a C45 write transfer on the bus */ int (*write_c45)(struct mii_bus *bus, int addr, int devnum, int regnum, u16 val); It could be you can implement a lookup table to map (devnum, regnum) to an MMIO address. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 15:56 ` Alexander Duyck 2025-04-04 16:33 ` Andrew Lunn @ 2025-04-05 9:10 ` Russell King (Oracle) 2025-04-05 15:43 ` Andrew Lunn ` (2 more replies) 1 sibling, 3 replies; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-05 9:10 UTC (permalink / raw) To: Alexander Duyck Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 04, 2025 at 08:56:19AM -0700, Alexander Duyck wrote: > On Thu, Apr 3, 2025 at 4:19 PM Russell King (Oracle) > <linux@armlinux.org.uk> wrote: > > > > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > > On Thu, Apr 3, 2025 at 9:34 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > > Maybe go back to why fixed-link exists? It is basically a hack to make > > > > MAC configuration easier. It was originally used for MAC to MAC > > > > connections, e.g. a NIC connected to a switch, without PHYs in the > > > > middle. By faking a PHY, there was no need to add any special > > > > configuration API to the MAC, the phylib adjust_link callback would be > > > > sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, > > > > or SGMII, that is all you need to know. The phy-mode told you to > > > > configure the MAC to MII, GMII, SGMII. > > > > > > Another issue is that how you would define the connection between the > > > two endpoints is changing. Maxime is basing his data off of > > > speed/duplex however to source that he is pulling data from > > > link_mode_params that is starting to broaden including things like > > > lanes. > > > > Just a quick correction - this is not entirely correct. It's speed, > > duplex and "lanes" is defined by interface mode. > > > > For example, 10GBASER is a single lane, as is SGMII, 1000BASE-X, > > 2500BASE-X. XLGMII and CGMII are defined by 802.3 as 8 lanes (clause > > 81.) > > > > speed and duplex just define the speed operated over the link defined > > by the PHY interface mode. > > > > (I've previously described why we don't go to that depth with fixed > > links, but to briefly state it, it's what we've done in the past and > > it's visible to the user, and we try to avoid breaking userspace.) > > Part of the issue is that I think I may be mixing terms up and I have > to be careful of that. If I am not mistaken you refer to the PHY as > the full setup from the MII down to the other side of the PMA or maybe > PMD. I also have to resist calling our PMA a PHY as it is a SerDes > PHY, not an Ethernet PHY. The model that we have in the kernel is: '------- Optional ----------` MAC --- PCS --- Optional Serdes -------- Optional PHY --- Media `---------------------------' ^ PHY_INTERFACE_MODE_xxx For example, in PHY_INTERFACE_MODE_MII (defined by clause 22), this is typically used for setups that only support 100M/10M speeds between the MAC and ethernet PHY to connect to the media. In the case of PHY_INTERFACE_MODE_1000BASEX (clause 36), then there will generally be a PCS block, sometimes there's a separate serdes block, and sometimes there may be a PHY. Other times, there may not be a PHY and the Serdes directly connects to the media. To simplify things, we continue to use the PHY_INTERFACE_MODE_xxx to define the properties of the interface on the right hand side of the MAC/PCS/Serdes, even when there is no PHY. As an illustration of the kind of properties, consider Cisco SGMII vs 1000BASE-X. These are physically the same interface, but the protocol differs slightly. Then take 2500BASE-X (ignore 802.3's definition, they were years late to that party and so we have a long history of vendor implementations, and thus practically their definition is irrelevant.) 2500BASE-X is generally implemented by taking the 802.3 1000BASE-X implementation, and up-clocking it by 2.5x. 1000BASE-X, 2500BASE-X and SGMII are all single lane. XGMII, for 2.5G, 5G and 10G speeds, is divided into four lanes of 8 data lines and one control signal. In terms of the layers that 802.3 uses, this is where things can get confusing. Consider MAC-PCS-Serdes-Media vs MAC-PCS-Serdes-PHY-Media. The former does conform to the 802.3 layering: MAC RS GMII PCS PMA PMD media The latter case, however, is effectively: MAC \ RS | GMII | Host PCS | PMA | PMD / PMD \ PMA | PCS | External Ethernet PHY, either on a PCB, module or PCS | pluggable module PMA | PMD / The effective presence of three PCS, PMA and PMDs make talking about these things more complex. Throw into this that Ethernet PHYs may not split out these functions to software, especially the host-facing PCS/PMA, except maybe through some control bits in overall control registers or through pinstrapping. Nevertheless, PHY_INTERFACE_MODE_xxx defines the operating mode of the PCS/PMA/PMD in the host, and also the host facing PMD/PMA/PCS of any following PHY. When considering ethtool link modes, negotiated speed/duplex, and configured speed/ duplex are about the ultimate media that is presented to the user. So, in the former case, where the media is connected directly to the serdes, the PHY interface mode needs to agree with the ethtool link modes. In the case of PHY_INTERFACE_MODE_1000BASEX, the media we can only support that can be directly connected are the 1000BASE-X family, and ethtool doesn't distinguish between each of those medias, so we only have ETHTOOL_LINK_MODE_1000baseX_Full_BIT. For PHY_INTERFACE_MODE_10GBASER, things get a bit more complicated, because we have each media type as a separate ethtool link mode, but it can support these connected directly: ETHTOOL_LINK_MODE_10000baseCR_Full_BIT ETHTOOL_LINK_MODE_10000baseSR_Full_BIT ETHTOOL_LINK_MODE_10000baseLR_Full_BIT ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT ETHTOOL_LINK_MODE_10000baseER_Full_BIT PHY interface modes tend to lag behind the ethtool link modes, so it's entirely possible that definitions to support interfaces that support the faster link speeds are missing. > Am I understanding that correctly? The PMD > is where we get into the media types, but the reason why I am focused > on lanes is because my interfaces are essentially defined by the > combination of an MII on the top, and coming out the bottom at the PMA > we have either one or two lanes operating in NRZ or PAM4 giving us at > least 4 total combinations that I am concerned with excluding the > media type since I am essentially running chip to module. It seems to me that what you are describing conforms with the former case above, so PHY_INTERFACE_MODE_xxx describes the interface presented to the media, and speed/duplex describe the data speed over that link. The ethtool link modes describe the properties of the media itself. > > If you aren't actually 40G, then you aren't actually XLGMII as > > defined by 802.3... so that begs the question - what are you! > > Confused.. Sadly confused.. > > So I am still trying to grok all this but I think I am getting there. > The issue is that XLGMII was essentially overclocked to get to the > 50GMII. This is what happened with 2500BASE-X. Vendors went off and did their own thing to create that higher speed interface. Many just up-clocked their 1000BASE-X implementation to provide the faster data rate. > That is what we do have. Our hardware supports a 50GMII with > open loop rate matching to 25G, and CGMII, but they refer to the > 50GMII as XLGMII throughout the documentation which led to my initial > confusion when I implemented the limited support we have upstream now. > On the PMA end of things like I mentioned we support NRZ (25.78125) or > PAM4 (26.5625*2) and 1 or 2 lanes. Great. This reminds me of the confusion caused by vendors overloading the "SGMII" term, which means we're now endlessly trying to disambiguate between SGMII being used for "we have a single lane serial gigabit media independent interface" and "we have an interface that supports the Cisco SGMII modifications of the IEEE 802.3 1000BASE-X specification." I don't think vendors just don't have any clue of the impacts of their stupid naming abuse. It causes *major* problems, so to hear that it continues with other interface modes (a) is not surprising, (b) is very disappointing. To get to the bottom of these interface types you are talking about, it sounds like each lane is a balanced pair of conductors in each direction? Do you know what kind of protocol is run over those lanes? Could it be essentially 25GBASE-R (e.g. it would be if operating with one lane NRZ.) This would make sense, because 25GBASE-R is 25Gb/s with 64B/66B, which gives a data rate of 25.78125Mbd. To get to the 50Gb/s NRZ, I'm guessing the hardware effectively runs 25GBASE-R over two lanes, doubling the data rate. For PAM4, there doesn't seem to be anything defined in 802.3 for 25G data rates, so I guess we have another pair of vendor specific interface modes. > To complicate things 50G is a mess in and of itself. There are two > specifications for the 50R2 w/ RS setup. One is the IEEE which uses > RS544 and the other is the Ethernet Consortium which uses RS528. If I > reference LAUI or LAUI-2 that is the the setup the IEEE referred to in > their 50G setup w/o FEC. Since that was the common overlap between the > two setups I figured I would use that to refer to this mode. I am also > overloading the meaning of it to reference 50G with RS528 or BASER. Hmm, I'm guessing 50G is defined in 802.3 after the 2018 edition (which is the latest I have at the moment.) > > That is most certainly *not* what the SFP code is doing. As things stand > > today, everything respects the PHY interface mode, if it says SGMII then > > we get SGMII. If it says 1000BASE-X, we get 1000BASE-X. If it says > > 2500BASE-X, then that's what we get... and so on. > > I think I may be starting to understand where my confusing came from. > While the SFP code may be correctly behaved I don't think other PCS > drivers are, either that or they were implemented for much more than > what they support. For example looking at the xpcs > (https://elixir.bootlin.com/linux/v6.14-rc6/C/ident/xpcs_get_max_xlgmii_speed) > I don't see how you are getting 100G out of an XLGMII interface. I'm > guessing somebody is checking for bits that will never be set. I don't think anyone reviewed the XLGMII code there - it was added by 7c6dbd29a73e ("net: phy: xpcs: Add XLGMII support"). I don't have XPCS documentation to be able to comment, but as XLGMII as defined by 802.3 supports up to 40G, it does raise questions - is this another case of overloading 802.3's XLGMII... no idea without further research. As you've found, these kinds of vendor inventions and overloading of terms thing just creates confusion. The problem for the kernel is if one of those vendor inventions becomes established and its part of a kernel-external API, then fixing it later becomes rather difficult. > So then if I am understanding correctly the expectation right now is > that once an interface mode is set, it is set. Do I have that right? That certainly used to be the case before phylink. Phylink exists to address several issues: 1. How do we hot-plug ethernet PHYs (drivers/net/phy) from a network interface without needing to tear down the interface. 2. How do we reconfigure the network device as a whole while it is up to switch between different interface modes. The former came from me being sent a SolidRun Clearfog (the original Armada 388) based platform, which had a SFP cage on, and the need to support copper SFP modules which have a PHY integrated on them. The latter also came from me when I was sent the Macchiatobin board, which had Marvell 88X3310 PHYs on which switch their host facing interface between 10GBASE-R, 5GBASE-R, 2500BASE-X and SGMII depending on what was negotiated on the PHYs media side. So, ethernet drivers such as mvneta and mvpp2 (used on these two platforms) are coded to allow interface modes to be switched between. The stmmac driver, however, is very much not, and I'm not sure that the hardware supports switching modes - but I'm working on the stmmac driver, trying to improve it. The phylink integration there is what I would best describe as "botched". It works but I'd say that it's not correct. E.g. The hardware can have an integrated PCS, but the code entirely bypasses phylink, messing with network device state that phylink relies upon, which can cause phylink to malfunction. > Is it acceptable for pcs_get_state to return an interface value other > than what is currently set? Based on the code I would assume that is > the case, but currently that won't result in a change without a > phydev. As mentioned previously, we've supported this for PHYs. Currently the code in phylink only allows interface changes when a PHY is present: /* Only update if the PHY link is up */ if (pl->phydev && pl->phy_state.link) { /* If the interface has changed, force a link down * event if the link isn't already down, and re-resolve. */ if (link_state.interface != pl->phy_state.interface) { retrigger = true; link_state.link = false; } ... mac_config = true; } ... if (mac_config) { if (link_state.interface != pl->link_config.interface) { /* The interface has changed, force the link down and * then reconfigure. */ if (cur_link_state) { phylink_link_down(pl); cur_link_state = false; } phylink_major_config(pl, false, &link_state); pl->link_config.interface = link_state.interface; } } but there is no fundamental reason why we couldn't allow that without a PHY, but with a few caveats: 1. provided that the change in interface mode doesn't result in a different PCS being selected by mac_select_pcs() (which we should probably enforce in this case). 2. provided that the change in interface mode doesn't cause the PCS's pcs_get_state() method to report differently as a result of that interface change. (The reason is, some PCS behave differently depending on interface mode.) These two caveats should be obvious, because they can lead to the interface mode/link endlessly toggling if either are false. > > The SFP code has added support to switch between 2500BASE-X and > > 1000BASE-X because this is a use case with optical SFPs that can operate > > at either speed. That's why this happens for the SFP case. > > Okay, so it looks like I will have to add code for new use cases then > as essentially there are standards in place for how to autoneg between > one or two lane modes as long as we stick to either NRZ or PAM4. As I think I've mentioned previously, I would like to find a different solution in the ksettings_set() code that is functionally similar to userspace (so we don't break use cases there) but without being tied to calling into the SFP code. The original code that handled optical interfaces did the same as ksettings_set() - going from ethtool link modes to a PHY interface mode. I revamped that path to use a bitmap of interface modes having also introduced the interface modes that the network device supports. This has turned out to be much easier to understand. I don't like that ksettings_set() still uses the old way, because that's a recipe for fragility with things going weird. We have phylink_choose_sfp_interface() now which chooses the interface based on preference, which the ksettings_set() code doesn't do, but it should... something that needs addressing. It sounds like your issue needs to be rolled into this as well. What I'm thinking is that we need to do is limit the bitmap of PHY interface modes supported by the MAC and optional SFP by the maximum data of the advertised link modes, and then do a preference based selection. In the case of fixed-link mode, I've proposed a solution there in one of my previous emails that bases it off the maximum speed of supported PHY interface modes. > > For PHYs, modern PHYs switch their host facing interface, so we support > > the interface mode changing there - under the control of the PHY and > > most certainly not under user control (the user doesn't know how the > > PHY has been configured and whether the PHY does switch or rate > > adapt.) > > I think I see what I am missing. The issue I have is that, assuming I > can ever get autoneg code added, we can essentially get autoneg that > tells us to jump between 50R or 100R2, or 25R and 50R2. If I am not > mistaken based on the current model each of those would be a different > interface mode. Yes, I agree. > Now there are 2 ways we can get there. The first would be to have the > user specify it. With the SFP code as it is I think that solution > should be mostly addressed. The issue is what happens if I do get > autoneg up and running. That is the piece it sounds like we will need > more code for. > > > For everything else, we're in fixed host interface mode, because that > > is how we've been - and to do anything else is new development. > > Yeah, that is the part I need to dig into I guess. As it stands I have > patches in the works to add the interface modes and support for > QSFP+/28. Looks like I will need to add support for allowing the PCS > to provide enough info to switch interface modes. Before I agree/disagree, let's see what the code ends up looking like. I don't think it's going to be too bad based on what we've discussed here. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 9:10 ` Russell King (Oracle) @ 2025-04-05 15:43 ` Andrew Lunn 2025-04-05 15:52 ` Andrew Lunn 2025-04-05 20:23 ` Alexander Duyck 2 siblings, 0 replies; 35+ messages in thread From: Andrew Lunn @ 2025-04-05 15:43 UTC (permalink / raw) To: Russell King (Oracle) Cc: Alexander Duyck, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni > This is what happened with 2500BASE-X. Vendors went off and did their > own thing to create that higher speed interface. Many just up-clocked > their 1000BASE-X implementation to provide the faster data rate. > > > That is what we do have. Our hardware supports a 50GMII with > > open loop rate matching to 25G, and CGMII, but they refer to the > > 50GMII as XLGMII throughout the documentation which led to my initial > > confusion when I implemented the limited support we have upstream now. > > On the PMA end of things like I mentioned we support NRZ (25.78125) or > > PAM4 (26.5625*2) and 1 or 2 lanes. > > Great. This reminds me of the confusion caused by vendors overloading > the "SGMII" term, which means we're now endlessly trying to > disambiguate between SGMII being used for "we have a single lane > serial gigabit media independent interface" and "we have an interface > that supports the Cisco SGMII modifications of the IEEE 802.3 > 1000BASE-X specification." > > I don't think vendors just don't have any clue of the impacts of their > stupid naming abuse. It causes *major* problems, so to hear that it > continues with other interface modes (a) is not surprising, (b) is > very disappointing. Hi Alex This should be a major takeaway from this email. Please try to strictly use 802.3 terms and meanings. You can make references to clauses in 802.3, it is an open document which we all should have some version of. And clearly call out where the hardware does not follow 802.3. Given that you are going to be pushing phylink forward, we should have a good idea what is 802.3, and what is a vendor extension/bug workaround. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 9:10 ` Russell King (Oracle) 2025-04-05 15:43 ` Andrew Lunn @ 2025-04-05 15:52 ` Andrew Lunn 2025-04-05 16:19 ` Alexander Duyck 2025-04-05 20:23 ` Alexander Duyck 2 siblings, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-05 15:52 UTC (permalink / raw) To: Russell King (Oracle) Cc: Alexander Duyck, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni > Hmm, I'm guessing 50G is defined in 802.3 after the 2018 edition (which > is the latest I have at the moment.) I have 2022 edition to hand. Clause 131 is Introduction to 50 GB/s networks. Clause 132 is the RS sublayer. Clause 133 is the PCS. Clause 134 RS-FEC, etc. IEEE makes 802.3 free to download, along with all the other 802 standard, although it always takes me a while to find where to download it from. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 15:52 ` Andrew Lunn @ 2025-04-05 16:19 ` Alexander Duyck 0 siblings, 0 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-05 16:19 UTC (permalink / raw) To: Andrew Lunn Cc: Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Sat, Apr 5, 2025 at 8:52 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > Hmm, I'm guessing 50G is defined in 802.3 after the 2018 edition (which > > is the latest I have at the moment.) If you don't want the full document I think the 50G stuff is 802.3cd as I recall. I will try to get to your replies later today. Will need some time to read and process them. > I have 2022 edition to hand. Clause 131 is Introduction to 50 GB/s > networks. Clause 132 is the RS sublayer. Clause 133 is the PCS. Clause > 134 RS-FEC, etc. > > IEEE makes 802.3 free to download, along with all the other 802 > standard, although it always takes me a while to find where to > download it from. > > Andrew If you have access to the 2022 edition figure 135-2 is a good diagram showing essentially what I am working with. Essentially everything below the AUI would be the SFP. Our MAC essentially just does 50GMII or CGMII and coming out the bottom is one or two lanes w/ NRZ or PAM4 depending on the PCS/PMA configuration. The LAUI-2 portion is only a partial fit in that what we have is more 2 lanes of 25G than LAUI-2, but I am having to overlap that with the 25G/50G consortium implementation and they do match in the 50G non-FEC case. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-05 9:10 ` Russell King (Oracle) 2025-04-05 15:43 ` Andrew Lunn 2025-04-05 15:52 ` Andrew Lunn @ 2025-04-05 20:23 ` Alexander Duyck 2 siblings, 0 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-05 20:23 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Sat, Apr 5, 2025 at 2:10 AM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Fri, Apr 04, 2025 at 08:56:19AM -0700, Alexander Duyck wrote: > > On Thu, Apr 3, 2025 at 4:19 PM Russell King (Oracle) > > <linux@armlinux.org.uk> wrote: > > > > > > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > > > On Thu, Apr 3, 2025 at 9:34 AM Andrew Lunn <andrew@lunn.ch> wrote: > > > > > Maybe go back to why fixed-link exists? It is basically a hack to make > > > > > MAC configuration easier. It was originally used for MAC to MAC > > > > > connections, e.g. a NIC connected to a switch, without PHYs in the > > > > > middle. By faking a PHY, there was no need to add any special > > > > > configuration API to the MAC, the phylib adjust_link callback would be > > > > > sufficient to tell the MAC to speed and duplex to use. For {R}{G}MII, > > > > > or SGMII, that is all you need to know. The phy-mode told you to > > > > > configure the MAC to MII, GMII, SGMII. > > > > > > > > Another issue is that how you would define the connection between the > > > > two endpoints is changing. Maxime is basing his data off of > > > > speed/duplex however to source that he is pulling data from > > > > link_mode_params that is starting to broaden including things like > > > > lanes. > > > > > > Just a quick correction - this is not entirely correct. It's speed, > > > duplex and "lanes" is defined by interface mode. > > > > > > For example, 10GBASER is a single lane, as is SGMII, 1000BASE-X, > > > 2500BASE-X. XLGMII and CGMII are defined by 802.3 as 8 lanes (clause > > > 81.) > > > > > > speed and duplex just define the speed operated over the link defined > > > by the PHY interface mode. > > > > > > (I've previously described why we don't go to that depth with fixed > > > links, but to briefly state it, it's what we've done in the past and > > > it's visible to the user, and we try to avoid breaking userspace.) > > > > Part of the issue is that I think I may be mixing terms up and I have > > to be careful of that. If I am not mistaken you refer to the PHY as > > the full setup from the MII down to the other side of the PMA or maybe > > PMD. I also have to resist calling our PMA a PHY as it is a SerDes > > PHY, not an Ethernet PHY. > > The model that we have in the kernel is: > > '------- Optional ----------` > MAC --- PCS --- Optional Serdes -------- Optional PHY --- Media > `---------------------------' ^ > PHY_INTERFACE_MODE_xxx Okay that more-or-less matches with what I was thinking. > For example, in PHY_INTERFACE_MODE_MII (defined by clause 22), this is > typically used for setups that only support 100M/10M speeds between the > MAC and ethernet PHY to connect to the media. > > In the case of PHY_INTERFACE_MODE_1000BASEX (clause 36), then there will > generally be a PCS block, sometimes there's a separate serdes block, and > sometimes there may be a PHY. Other times, there may not be a PHY and > the Serdes directly connects to the media. > > To simplify things, we continue to use the PHY_INTERFACE_MODE_xxx to > define the properties of the interface on the right hand side of the > MAC/PCS/Serdes, even when there is no PHY. > > As an illustration of the kind of properties, consider Cisco SGMII vs > 1000BASE-X. These are physically the same interface, but the protocol > differs slightly. Then take 2500BASE-X (ignore 802.3's definition, > they were years late to that party and so we have a long history of > vendor implementations, and thus practically their definition is > irrelevant.) 2500BASE-X is generally implemented by taking the 802.3 > 1000BASE-X implementation, and up-clocking it by 2.5x. > > 1000BASE-X, 2500BASE-X and SGMII are all single lane. XGMII, for 2.5G, > 5G and 10G speeds, is divided into four lanes of 8 data lines and one > control signal. > > > In terms of the layers that 802.3 uses, this is where things can get > confusing. Consider MAC-PCS-Serdes-Media vs MAC-PCS-Serdes-PHY-Media. > > The former does conform to the 802.3 layering: > > MAC > RS > GMII > PCS > PMA > PMD > media > > The latter case, however, is effectively: > > MAC \ > RS | > GMII | Host > PCS | > PMA | > PMD / > > PMD \ > PMA | > PCS | External Ethernet PHY, either on a PCB, module or > PCS | pluggable module > PMA | > PMD / > > The effective presence of three PCS, PMA and PMDs make talking about > these things more complex. Throw into this that Ethernet PHYs may not > split out these functions to software, especially the host-facing > PCS/PMA, except maybe through some control bits in overall control > registers or through pinstrapping. > > Nevertheless, PHY_INTERFACE_MODE_xxx defines the operating mode of > the PCS/PMA/PMD in the host, and also the host facing PMD/PMA/PCS > of any following PHY. > > When considering ethtool link modes, negotiated speed/duplex, and > configured speed/ duplex are about the ultimate media that is > presented to the user. I am following so far. Fortunately in our case the setup will be pretty simple as we mostly conform to the 802.3 setup with only a few minor issues at 50G. I think I might also realize part of my issue in terms of the stuff about "lanes". It essentially comes down to really an issue about media type. I think making sure we have the QSFP module code in place will probably take care of most of it as it should limit us to either 25R/50R2 or 50R/100R2 so we can avoid the confusion on the to 50G modes. > So, in the former case, where the media is connected directly to the > serdes, the PHY interface mode needs to agree with the ethtool link > modes. In the case of PHY_INTERFACE_MODE_1000BASEX, the media we can > only support that can be directly connected are the 1000BASE-X family, > and ethtool doesn't distinguish between each of those medias, so we > only have ETHTOOL_LINK_MODE_1000baseX_Full_BIT. > > For PHY_INTERFACE_MODE_10GBASER, things get a bit more complicated, > because we have each media type as a separate ethtool link mode, but > it can support these connected directly: > ETHTOOL_LINK_MODE_10000baseCR_Full_BIT > ETHTOOL_LINK_MODE_10000baseSR_Full_BIT > ETHTOOL_LINK_MODE_10000baseLR_Full_BIT > ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT > ETHTOOL_LINK_MODE_10000baseER_Full_BIT > > PHY interface modes tend to lag behind the ethtool link modes, so it's > entirely possible that definitions to support interfaces that support > the faster link speeds are missing. Yeah, this is where I was essentially hacking in "fixed-link" to establish an initial connection while I was working to prop up the PHY interface support in the kernel. The speeds have been there in ethtool for some time, but the only interface that has been there was the XLGMII which was puzzling me as the implementations for it currently seem to be very misleading. > > Am I understanding that correctly? The PMD > > is where we get into the media types, but the reason why I am focused > > on lanes is because my interfaces are essentially defined by the > > combination of an MII on the top, and coming out the bottom at the PMA > > we have either one or two lanes operating in NRZ or PAM4 giving us at > > least 4 total combinations that I am concerned with excluding the > > media type since I am essentially running chip to module. > > It seems to me that what you are describing conforms with the former > case above, so PHY_INTERFACE_MODE_xxx describes the interface presented > to the media, and speed/duplex describe the data speed over that link. > The ethtool link modes describe the properties of the media itself. > > > > If you aren't actually 40G, then you aren't actually XLGMII as > > > defined by 802.3... so that begs the question - what are you! > > > > Confused.. Sadly confused.. > > > > So I am still trying to grok all this but I think I am getting there. > > The issue is that XLGMII was essentially overclocked to get to the > > 50GMII. > > This is what happened with 2500BASE-X. Vendors went off and did their > own thing to create that higher speed interface. Many just up-clocked > their 1000BASE-X implementation to provide the faster data rate. Yeah, I think this is what we have on our hands. Essentially the XLGMII 50G is likely based on the Ethernet Consortium approach to it so it is just a 40G that was up-clocked to get to 50 and then make use of half the lanes from a 100R4. That is why I was referring to it as LAUI as it definitely isn't the 50BaseR. > > That is what we do have. Our hardware supports a 50GMII with > > open loop rate matching to 25G, and CGMII, but they refer to the > > 50GMII as XLGMII throughout the documentation which led to my initial > > confusion when I implemented the limited support we have upstream now. > > On the PMA end of things like I mentioned we support NRZ (25.78125) or > > PAM4 (26.5625*2) and 1 or 2 lanes. > > Great. This reminds me of the confusion caused by vendors overloading > the "SGMII" term, which means we're now endlessly trying to > disambiguate between SGMII being used for "we have a single lane > serial gigabit media independent interface" and "we have an interface > that supports the Cisco SGMII modifications of the IEEE 802.3 > 1000BASE-X specification." > > I don't think vendors just don't have any clue of the impacts of their > stupid naming abuse. It causes *major* problems, so to hear that it > continues with other interface modes (a) is not surprising, (b) is > very disappointing. > > To get to the bottom of these interface types you are talking about, > it sounds like each lane is a balanced pair of conductors in each > direction? Do you know what kind of protocol is run over those lanes? > Could it be essentially 25GBASE-R (e.g. it would be if operating with > one lane NRZ.) This would make sense, because 25GBASE-R is 25Gb/s with > 64B/66B, which gives a data rate of 25.78125Mbd. > > To get to the 50Gb/s NRZ, I'm guessing the hardware effectively runs > 25GBASE-R over two lanes, doubling the data rate. Yeah. The part has two PCS blocks that are running in parallel for 50R2 (Ethernet Consortium Version). There is an IEEE version of it, but nobody runs it as it would require new cabling since it requires 26.5625Mbd per lane to run it, so they are all sticking with the non-IEEE version. > For PAM4, there doesn't seem to be anything defined in 802.3 for 25G > data rates, so I guess we have another pair of vendor specific > interface modes. This is covered in the 802.3cd spec. The two types they define are 50BaseR and 100BaseP. They bump the clock rate to 26.5625Mbd, make RS544 mandatory, and add PAM4 as a requirement for 50R1 and 100R2. > > To complicate things 50G is a mess in and of itself. There are two > > specifications for the 50R2 w/ RS setup. One is the IEEE which uses > > RS544 and the other is the Ethernet Consortium which uses RS528. If I > > reference LAUI or LAUI-2 that is the the setup the IEEE referred to in > > their 50G setup w/o FEC. Since that was the common overlap between the > > two setups I figured I would use that to refer to this mode. I am also > > overloading the meaning of it to reference 50G with RS528 or BASER. > > Hmm, I'm guessing 50G is defined in 802.3 after the 2018 edition (which > is the latest I have at the moment.) Correct. As mentioned it is in the 2022 edition. > > > That is most certainly *not* what the SFP code is doing. As things stand > > > today, everything respects the PHY interface mode, if it says SGMII then > > > we get SGMII. If it says 1000BASE-X, we get 1000BASE-X. If it says > > > 2500BASE-X, then that's what we get... and so on. > > > > I think I may be starting to understand where my confusing came from. > > While the SFP code may be correctly behaved I don't think other PCS > > drivers are, either that or they were implemented for much more than > > what they support. For example looking at the xpcs > > (https://elixir.bootlin.com/linux/v6.14-rc6/C/ident/xpcs_get_max_xlgmii_speed) > > I don't see how you are getting 100G out of an XLGMII interface. I'm > > guessing somebody is checking for bits that will never be set. > > I don't think anyone reviewed the XLGMII code there - it was added by > 7c6dbd29a73e ("net: phy: xpcs: Add XLGMII support"). I don't have XPCS > documentation to be able to comment, but as XLGMII as defined by 802.3 > supports up to 40G, it does raise questions - is this another case > of overloading 802.3's XLGMII... no idea without further research. > > As you've found, these kinds of vendor inventions and overloading of > terms thing just creates confusion. > > The problem for the kernel is if one of those vendor inventions becomes > established and its part of a kernel-external API, then fixing it later > becomes rather difficult. > > > So then if I am understanding correctly the expectation right now is > > that once an interface mode is set, it is set. Do I have that right? > > That certainly used to be the case before phylink. Phylink exists to > address several issues: > > 1. How do we hot-plug ethernet PHYs (drivers/net/phy) from a network > interface without needing to tear down the interface. > > 2. How do we reconfigure the network device as a whole while it is up > to switch between different interface modes. > > The former came from me being sent a SolidRun Clearfog (the original > Armada 388) based platform, which had a SFP cage on, and the need to > support copper SFP modules which have a PHY integrated on them. > > The latter also came from me when I was sent the Macchiatobin board, > which had Marvell 88X3310 PHYs on which switch their host facing > interface between 10GBASE-R, 5GBASE-R, 2500BASE-X and SGMII depending > on what was negotiated on the PHYs media side. > > So, ethernet drivers such as mvneta and mvpp2 (used on these two > platforms) are coded to allow interface modes to be switched between. Okay, I might look at those two then. I'm usually the type to follow an existing example then adapt/refactor it once I have it up and running. > The stmmac driver, however, is very much not, and I'm not sure that the > hardware supports switching modes - but I'm working on the stmmac > driver, trying to improve it. The phylink integration there is what I > would best describe as "botched". It works but I'd say that it's not > correct. E.g. The hardware can have an integrated PCS, but the code > entirely bypasses phylink, messing with network device state that > phylink relies upon, which can cause phylink to malfunction. > > > Is it acceptable for pcs_get_state to return an interface value other > > than what is currently set? Based on the code I would assume that is > > the case, but currently that won't result in a change without a > > phydev. > > As mentioned previously, we've supported this for PHYs. Currently the > code in phylink only allows interface changes when a PHY is present: > > /* Only update if the PHY link is up */ > if (pl->phydev && pl->phy_state.link) { > /* If the interface has changed, force a link down > * event if the link isn't already down, and re-resolve. > */ > if (link_state.interface != pl->phy_state.interface) { > retrigger = true; > link_state.link = false; > } > ... > mac_config = true; > } > > ... > if (mac_config) { > if (link_state.interface != pl->link_config.interface) { > /* The interface has changed, force the link down and > * then reconfigure. > */ > if (cur_link_state) { > phylink_link_down(pl); > cur_link_state = false; > } > phylink_major_config(pl, false, &link_state); > pl->link_config.interface = link_state.interface; > } > } > > but there is no fundamental reason why we couldn't allow that without > a PHY, but with a few caveats: > > 1. provided that the change in interface mode doesn't result in a > different PCS being selected by mac_select_pcs() (which we should > probably enforce in this case). > > 2. provided that the change in interface mode doesn't cause the PCS's > pcs_get_state() method to report differently as a result of that > interface change. (The reason is, some PCS behave differently > depending on interface mode.) > > These two caveats should be obvious, because they can lead to the > interface mode/link endlessly toggling if either are false. These should be pretty easy for us to stick to. Currently we just have the one PCS so we shouldn't be changing it. As far as the pcs_get_state call, the general setup for now is that it is just doing link checking. What we will likely be adding is some logic to handle delaying the link_up a bit to address some link bouncing due to the PMA doing link training, and then other then that it would be autoneg which is kind of low on the totem pole for us since we don't have any link partners that will currently be using it, but is something we may end up supporting. > > > The SFP code has added support to switch between 2500BASE-X and > > > 1000BASE-X because this is a use case with optical SFPs that can operate > > > at either speed. That's why this happens for the SFP case. > > > > Okay, so it looks like I will have to add code for new use cases then > > as essentially there are standards in place for how to autoneg between > > one or two lane modes as long as we stick to either NRZ or PAM4. > > As I think I've mentioned previously, I would like to find a different > solution in the ksettings_set() code that is functionally similar to > userspace (so we don't break use cases there) but without being tied > to calling into the SFP code. It seems like part of the issue is that the use of the advertising field if autoneg is disabled is kind-of undefined. From what I can tell ethtool code will cleanup the advertised mode if autoneg is enabled (https://elixir.bootlin.com/linux/v6.14-rc6/C/ident/ethnl_auto_linkmodes), however if we are in fixed mode it isn't touching it. I'm honestly wondering if the advertising field should be ignored when in a non-autoneg setup. As far as deriving the media type, the SFP seems to be a good way for me to figure that out, at least for the QSFP+/28 cables I have seen so far. They at least lock me down into either 25CR/50CR2 or 50CR/100CR2 depending on the cable. So if I do get around to supporting clause 73 autoneg I think I should already be half way there as the supported fields are locked down based on that already. One thing I may do in the meantime is use the firmware provided link info as the lp_advertising value until I can get true c73 up and running. > The original code that handled optical interfaces did the same as > ksettings_set() - going from ethtool link modes to a PHY interface > mode. I revamped that path to use a bitmap of interface modes having > also introduced the interface modes that the network device supports. > This has turned out to be much easier to understand. > > I don't like that ksettings_set() still uses the old way, because > that's a recipe for fragility with things going weird. We have > phylink_choose_sfp_interface() now which chooses the interface based > on preference, which the ksettings_set() code doesn't do, but it > should... something that needs addressing. > > It sounds like your issue needs to be rolled into this as well. I think the "fixed-link" and ksettings_set issues are similar. Basically we just need a good way to define how we get from a fixed mode to an interface type without autoneg. > What I'm thinking is that we need to do is limit the bitmap of > PHY interface modes supported by the MAC and optional SFP by the > maximum data of the advertised link modes, and then do a preference > based selection. In the case of fixed-link mode, I've proposed a > solution there in one of my previous emails that bases it off the > maximum speed of supported PHY interface modes. Well for my setup there are essentially 3 components that come into play. The MII interface selection is based purely on the speed (and a little bit fec), so we are either 50GMII w/ rate matching, 50GMII, or CGMII. However I don't need it to determine if we can get a link, I can configure it at link_up time. The SFP cage selects the signal rate of the individual lanes of the link based on the cable which when combined with the speed defines what my AUI selection has to be. So the PCS and the SFP are the two components I need to be in agreement when it comes to determining the interface mode in order to get a link, the MAC or MII interface can be a much later selection/configuration. So if anything I wonder if the PCS shouldn't select the interface type instead of the other way around. Actually looking at the code it seems like the phylink_validate_mask and phylink_choose_sfp_interface would make the most sense. We could essentially just pass the speed and duplex as part of the link_state and validate using those instead of the advertising mask. Then the PCS could essentially own the decision to mask itself down to the 1 bit it wants for that speed, then after that it would be up to the phylink code to decide which interface it wants to select. > > > For PHYs, modern PHYs switch their host facing interface, so we support > > > the interface mode changing there - under the control of the PHY and > > > most certainly not under user control (the user doesn't know how the > > > PHY has been configured and whether the PHY does switch or rate > > > adapt.) > > > > I think I see what I am missing. The issue I have is that, assuming I > > can ever get autoneg code added, we can essentially get autoneg that > > tells us to jump between 50R or 100R2, or 25R and 50R2. If I am not > > mistaken based on the current model each of those would be a different > > interface mode. > > Yes, I agree. > > > Now there are 2 ways we can get there. The first would be to have the > > user specify it. With the SFP code as it is I think that solution > > should be mostly addressed. The issue is what happens if I do get > > autoneg up and running. That is the piece it sounds like we will need > > more code for. > > > > > For everything else, we're in fixed host interface mode, because that > > > is how we've been - and to do anything else is new development. > > > > Yeah, that is the part I need to dig into I guess. As it stands I have > > patches in the works to add the interface modes and support for > > QSFP+/28. Looks like I will need to add support for allowing the PCS > > to provide enough info to switch interface modes. > > Before I agree/disagree, let's see what the code ends up looking like. > I don't think it's going to be too bad based on what we've discussed > here. Yes, as always code talks.. I'll submit an RFC next week. For the most part the only real issue getting QSFP support going is that I have to support 2 different base pages, one for SFF8472 and one for SFF8636, so I am having to change the eeprom structure and adding logic to skip SFF8472 specific accesses. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 21:53 ` Alexander Duyck 2025-04-03 23:19 ` Russell King (Oracle) @ 2025-04-03 23:26 ` Russell King (Oracle) 2025-04-04 1:46 ` Andrew Lunn 1 sibling, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-03 23:26 UTC (permalink / raw) To: Alexander Duyck Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > How is it not a fixed link? If anything it was going to be more fixed > than what you described above. I had to laugh at this. Really. I don't think you quite understand the case that Andrew was referring to. While he said MAC to MAC, he's not just referring to two MACs that software can program to operate at any speed, thus achieving a link without autoneg. He is also talking about cases where one end is fixed e.g. by pinstrapping to a specific configuration including speed, and using anything different on the host MAC side results in no link. In that latter case, I don't think you could come up with something that is "more fixed" than that, because using anything other than the specified parameters means no link! -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-03 23:26 ` Russell King (Oracle) @ 2025-04-04 1:46 ` Andrew Lunn 2025-04-04 7:16 ` Russell King (Oracle) 0 siblings, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-04 1:46 UTC (permalink / raw) To: Russell King (Oracle) Cc: Alexander Duyck, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 04, 2025 at 12:26:15AM +0100, Russell King (Oracle) wrote: > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > How is it not a fixed link? If anything it was going to be more fixed > > than what you described above. > > I had to laugh at this. Really. I don't think you quite understand the > case that Andrew was referring to. > > While he said MAC to MAC, he's not just referring to two MACs that > software can program to operate at any speed, thus achieving a link > without autoneg. He is also talking about cases where one end is > fixed e.g. by pinstrapping to a specific configuration including > speed, and using anything different on the host MAC side results in > no link. Yep, this is pretty typical of SOHO switches, you use strapping to set the port, and it never changes, at least not without a soldering iron to take off/add resistors. There are also some SOHO switches which have a dedicated 'cpu port' and there is no configuration options at all. The CPU MAC must conform to what the switch MAC is doing. > In that latter case, I don't think you could come up with something > that is "more fixed" than that, because using anything other than > the specified parameters means no link! Well, you could maybe use the wrong duplex, and get link with all the collision problems we had back in what the 90s? 00s? Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 1:46 ` Andrew Lunn @ 2025-04-04 7:16 ` Russell King (Oracle) 2025-04-04 16:18 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-04 7:16 UTC (permalink / raw) To: Andrew Lunn Cc: Alexander Duyck, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 04, 2025 at 03:46:01AM +0200, Andrew Lunn wrote: > On Fri, Apr 04, 2025 at 12:26:15AM +0100, Russell King (Oracle) wrote: > > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > > How is it not a fixed link? If anything it was going to be more fixed > > > than what you described above. > > > > I had to laugh at this. Really. I don't think you quite understand the > > case that Andrew was referring to. > > > > While he said MAC to MAC, he's not just referring to two MACs that > > software can program to operate at any speed, thus achieving a link > > without autoneg. He is also talking about cases where one end is > > fixed e.g. by pinstrapping to a specific configuration including > > speed, and using anything different on the host MAC side results in > > no link. > > Yep, this is pretty typical of SOHO switches, you use strapping to set > the port, and it never changes, at least not without a soldering iron > to take off/add resistors. There are also some SOHO switches which > have a dedicated 'cpu port' and there is no configuration options at > all. The CPU MAC must conform to what the switch MAC is doing. From the sounds of it, Alexander seems to want to use fixed-link differently - take parameters from firmware, build swnodes, and set the MAC up for a fixed link with variable "media" type. So it's not really fixed, but "do what the firmware says". I can't see that being much different to other platforms which have firmware that gives us the link parameters. Presumably in Alexander's case, the firmware also gives link up/down notifications as well, so it seems to me that this isn't really a fixed link at all. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 7:16 ` Russell King (Oracle) @ 2025-04-04 16:18 ` Alexander Duyck 2025-04-07 17:01 ` Jakub Kicinski 0 siblings, 1 reply; 35+ messages in thread From: Alexander Duyck @ 2025-04-04 16:18 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, kuba, pabeni On Fri, Apr 4, 2025 at 12:16 AM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Fri, Apr 04, 2025 at 03:46:01AM +0200, Andrew Lunn wrote: > > On Fri, Apr 04, 2025 at 12:26:15AM +0100, Russell King (Oracle) wrote: > > > On Thu, Apr 03, 2025 at 02:53:22PM -0700, Alexander Duyck wrote: > > > > How is it not a fixed link? If anything it was going to be more fixed > > > > than what you described above. > > > > > > I had to laugh at this. Really. I don't think you quite understand the > > > case that Andrew was referring to. > > > > > > While he said MAC to MAC, he's not just referring to two MACs that > > > software can program to operate at any speed, thus achieving a link > > > without autoneg. He is also talking about cases where one end is > > > fixed e.g. by pinstrapping to a specific configuration including > > > speed, and using anything different on the host MAC side results in > > > no link. > > > > Yep, this is pretty typical of SOHO switches, you use strapping to set > > the port, and it never changes, at least not without a soldering iron > > to take off/add resistors. There are also some SOHO switches which > > have a dedicated 'cpu port' and there is no configuration options at > > all. The CPU MAC must conform to what the switch MAC is doing. I don't think you guys understand my case well either. You seem to think I am more flexible than I actually am in this setup. While I do have the firmware I can ask about settings all it provides me with is fixed link info. I can't talk to the link partner on the other end as it is just configured for whatever the one mode is it has and there is no changing it. Now yes, it isn't physically locked down. However most of the silicon in the "fixed-link" configs likely aren't either until they are put in their embedded setups. I would argue that "fixed-link" is there for setups where the kernel cannot reasonably expect to be able to understand what the link is supposed to be through other means. It is provided usually through device tree or ACPI because it is hard coded into the platform configuration. In our case the configuration for that is stored in the EEPROM and provided to us through the firmware. For any production system we have that is fixed and locked so there is no deviating from it unless you want to lose the link and RMA a server. I think the part you guys might be getting confused by is that we have 2 use cases, production and development. In the production case we will likely just want to use fixed-link or something like it. Basically our platforms are put together one way and connected to one switch and there is no deviating from it. The FW will configure the PCS and PMA beforehand as they have to do so to enable the BMC. They are as essentially locked down in terms of config as many of the embedded systems you work on. If we break the link the BMC goes offline we essentially bricked the platform. In the development case we want to be able to test all the bits and pieces and for that we need to be able to change the configuration and such. What I am trying to do is have one driver that can support both instead of doing what every other vendor does to avoid this pain which is to do one release driver and one internal development/test driver that never sees the light of day. > From the sounds of it, Alexander seems to want to use fixed-link > differently - take parameters from firmware, build swnodes, and > set the MAC up for a fixed link with variable "media" type. So > it's not really fixed, but "do what the firmware says". I can't see > that being much different to other platforms which have firmware > that gives us the link parameters. Yeah, the use case is a bit different. Instead of asking ACPI I am asking my device firmware what the link config is and then have to set things up based on that. So the main difference is what FW is having to be asked. Once I add the interface modes I can go that route instead of fixed-link I suppose. Essentially what I am doing is using fixed-link as a crutch to handle the fact that the kernel wouldn't be able to understand the config data I am presenting as it doesn't have the phy_interface_t to support it yet by swapping in PHY_INTERFACE_MODE_INTERNAL and relying on the configuration that was done by the FW to setup the link. The driver code as I have it now would probably only be fixed-link for the first half dozen patches or so until all the other code to enable the correct handling of the interfaces is up. > Presumably in Alexander's case, the firmware also gives link up/down > notifications as well, so it seems to me that this isn't really a > fixed link at all. The FW doesn't provide the link up/down. It just tells us if the link is there or not. Part of the issue is that the module is abstracted away by the firmware. So it knows what is plugged in and we have to support it. Fortunately for us though there is nothing for us to config on the QSFP. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-04 16:18 ` Alexander Duyck @ 2025-04-07 17:01 ` Jakub Kicinski 2025-04-07 18:20 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Jakub Kicinski @ 2025-04-07 17:01 UTC (permalink / raw) To: Alexander Duyck Cc: Russell King (Oracle), Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, pabeni On Fri, 4 Apr 2025 09:18:30 -0700 Alexander Duyck wrote: > > > Yep, this is pretty typical of SOHO switches, you use strapping to set > > > the port, and it never changes, at least not without a soldering iron > > > to take off/add resistors. There are also some SOHO switches which > > > have a dedicated 'cpu port' and there is no configuration options at > > > all. The CPU MAC must conform to what the switch MAC is doing. > > I don't think you guys understand my case well either. You seem to > think I am more flexible than I actually am in this setup. While I do > have the firmware I can ask about settings all it provides me with is > fixed link info. I can't talk to the link partner on the other end as > it is just configured for whatever the one mode is it has and there is > no changing it. Now yes, it isn't physically locked down. However most > of the silicon in the "fixed-link" configs likely aren't either until > they are put in their embedded setups. I understand this code the least of all of you, obviously, but FWIW in my mind the datacenter use case is more like trying to feed set_link_ksettings() from the EEPROM. Rather than a OS config script. Maybe call it "stored link" ? Not sure how fruitful arguing whether the term "fixed-link" can be extended to cover this is going to be :S ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-07 17:01 ` Jakub Kicinski @ 2025-04-07 18:20 ` Alexander Duyck 2025-04-07 19:34 ` Andrew Lunn 0 siblings, 1 reply; 35+ messages in thread From: Alexander Duyck @ 2025-04-07 18:20 UTC (permalink / raw) To: Jakub Kicinski Cc: Russell King (Oracle), Andrew Lunn, Maxime Chevallier, netdev, hkallweit1, davem, pabeni On Mon, Apr 7, 2025 at 10:01 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 4 Apr 2025 09:18:30 -0700 Alexander Duyck wrote: > > > > Yep, this is pretty typical of SOHO switches, you use strapping to set > > > > the port, and it never changes, at least not without a soldering iron > > > > to take off/add resistors. There are also some SOHO switches which > > > > have a dedicated 'cpu port' and there is no configuration options at > > > > all. The CPU MAC must conform to what the switch MAC is doing. > > > > I don't think you guys understand my case well either. You seem to > > think I am more flexible than I actually am in this setup. While I do > > have the firmware I can ask about settings all it provides me with is > > fixed link info. I can't talk to the link partner on the other end as > > it is just configured for whatever the one mode is it has and there is > > no changing it. Now yes, it isn't physically locked down. However most > > of the silicon in the "fixed-link" configs likely aren't either until > > they are put in their embedded setups. > > I understand this code the least of all of you, obviously, but FWIW > in my mind the datacenter use case is more like trying to feed > set_link_ksettings() from the EEPROM. Rather than a OS config script. > Maybe call it "stored link" ? Not sure how fruitful arguing whether > the term "fixed-link" can be extended to cover this is going to be :S Arguably understanding this code, both phylink and our own MAC/PCS/PMD code, has been a real grind but I think I am getting there. If I am not mistaken the argument is that we aren't "fixed-link" as we have control over what the PCS/PMA configuration is. We are essentially managing things top down, whereas the expectation for "fixed-link" is more of a bottom up. Where that gets murky for us is that the firmware in our case did the top down config and is just notifying us of what it did. One thing I wasn't getting was that pcs_config doesn't actually set up the PCS. If I am understanding things correctly now that will be handled in the mac_config function. It will need to adjust the signals running to the PCS IP in order for it to get into the correct PCS/FEC mode, and then the PCS driver itself is only using the mii interface to access the registers on the device to tweak things. It isn't actually responsible for changing the PCS interface mode, just taking care of any remaining configuration setting things such as autoneg advertising. The other thing I realized is that it looks like we might actually end up using the XPCS driver. I also think I understand somewhat how they may have things working as for the 25, 40, and 50 speeds at least most of the management can be driven by the external pins so you can say whatever you want about the config, but it will set itself up based on what the external signals are telling it is. The only complication is that we have a PMA/PMD on the end of the link handling the role of CR PMD and it is behind the firmware so we will have to see what I can do to make this all work. In the meantime I think I can build things up slowly as sort of a reverse onion working from the outside in starting with the MAC and working down toward the PMD. It just means it will be a while before the upstream driver can see a ksettings_set call as we will be fixed with the XLGMII interface at the start until I can pull in the XPCS driver and then start building out both at the same time. ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-07 18:20 ` Alexander Duyck @ 2025-04-07 19:34 ` Andrew Lunn 2025-04-07 23:01 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Andrew Lunn @ 2025-04-07 19:34 UTC (permalink / raw) To: Alexander Duyck Cc: Jakub Kicinski, Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, pabeni > Arguably understanding this code, both phylink and our own MAC/PCS/PMD > code, has been a real grind but I think I am getting there. If I am > not mistaken the argument is that we aren't "fixed-link" as we have > control over what the PCS/PMA configuration is. We are essentially > managing things top down, whereas the expectation for "fixed-link" is > more of a bottom up. Fixed link is there to emulate something which does not exist. Typically you have a MAC connected to a PHY, or a MAC connected to an SFP. The PHY or the SFP tell you how to configure the MAC so the chain MAC/PHY/Media works. However, there are use cases where you connect one MAC directly to another MAC. E.G you connect a NIC MAC to the MAC port of a switch. Fixed link allows you to emulate the missing PHY, so that the MAC has no idea the PHY is missing. The end result is that the MAC gets configured to make the MAC-MAC link work, without the MAC even knowing it is connected to another MAC. When talking about top down vs bottom up, i think you mean MAC towards the media, vs media towards the MAC. Because of Base-T autoneg, etc, phylink configures stuff upwards from the media. It needs to wait for the PHY to determine what the media is going to do. The PHY will then decided what its host side needs, SGMII, 2500BaseX, 10Gbase etc. That then allows phylink to configure the PCS to output that. And then you need to configure the MAC to feed the PCSs at the correct speed. I don't think SFPs are so different. The SFP will probably indicate what its maximum bandwidth is. You configure that in the PCS and let is negotiate with the link partner PCS to determine the speed, pause etc. You can then configure the MAC with the results of that negotiation. So fixed-link is not really bottom up, the whole configuration chain is media towards the MAC, and it makes no difference if the PHY is being emulated via a fixed-link because it does not exist. Andrew ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting 2025-04-07 19:34 ` Andrew Lunn @ 2025-04-07 23:01 ` Alexander Duyck 0 siblings, 0 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-07 23:01 UTC (permalink / raw) To: Andrew Lunn Cc: Jakub Kicinski, Russell King (Oracle), Maxime Chevallier, netdev, hkallweit1, davem, pabeni On Mon, Apr 7, 2025 at 12:34 PM Andrew Lunn <andrew@lunn.ch> wrote: > > > Arguably understanding this code, both phylink and our own MAC/PCS/PMD > > code, has been a real grind but I think I am getting there. If I am > > not mistaken the argument is that we aren't "fixed-link" as we have > > control over what the PCS/PMA configuration is. We are essentially > > managing things top down, whereas the expectation for "fixed-link" is > > more of a bottom up. > > Fixed link is there to emulate something which does not > exist. Typically you have a MAC connected to a PHY, or a MAC connected > to an SFP. The PHY or the SFP tell you how to configure the MAC so the > chain MAC/PHY/Media works. > > However, there are use cases where you connect one MAC directly to > another MAC. E.G you connect a NIC MAC to the MAC port of a switch. > > Fixed link allows you to emulate the missing PHY, so that the MAC has > no idea the PHY is missing. The end result is that the MAC gets > configured to make the MAC-MAC link work, without the MAC even knowing > it is connected to another MAC. > > When talking about top down vs bottom up, i think you mean MAC towards > the media, vs media towards the MAC. Because of Base-T autoneg, etc, > phylink configures stuff upwards from the media. It needs to wait for > the PHY to determine what the media is going to do. The PHY will then > decided what its host side needs, SGMII, 2500BaseX, 10Gbase etc. That > then allows phylink to configure the PCS to output that. And then you > need to configure the MAC to feed the PCSs at the correct speed. I > don't think SFPs are so different. The SFP will probably indicate what > its maximum bandwidth is. You configure that in the PCS and let is > negotiate with the link partner PCS to determine the speed, pause > etc. You can then configure the MAC with the results of that > negotiation. That isn't really true for phylink though. We have to specify a phy_interface_t before autoneg is even an option. In our case that will probably be doable since we just have to press the FW to give us the value. If we don't have an SFP cage you aren't changing that interface type either. > So fixed-link is not really bottom up, the whole configuration chain > is media towards the MAC, and it makes no difference if the PHY is > being emulated via a fixed-link because it does not exist. I'm not sure what you are talking about. When I refer to the bottom I am referring to the media. That is how it is depicted in all the IEEE drawings. I think we are actually agreeing, but I am not sure. What I was saying is that in our case we end up with the userspace or FW providing a link setting and we push it down to the PMD from the MAC side of things. Whereas for something like fixed-link you are essentially faking the autoneg and making it work from the bottom up. ^ permalink raw reply [flat|nested] 35+ messages in thread
* [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set 2025-04-01 21:29 [net PATCH 0/2] Fixes for net/phy/phylink.c Alexander Duyck 2025-04-01 21:30 ` [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting Alexander Duyck @ 2025-04-01 21:30 ` Alexander Duyck 2025-04-02 18:02 ` Russell King (Oracle) 1 sibling, 1 reply; 35+ messages in thread From: Alexander Duyck @ 2025-04-01 21:30 UTC (permalink / raw) To: netdev; +Cc: linux, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier From: Alexander Duyck <alexanderduyck@fb.com> While testing a driver that supports mulitple speeds on the same SFP module I noticed I wasn't able to change them when I was not using autonegotiation. I would attempt to update the speed, but it had no effect. A bit of digging led me to the fact that we weren't updating the advertised link mask and as a result the interface wasn't being updated when I requested an updated speed. This change makes it so that we apply the speed from the phy settings to the config.advertised following a behavior similar to what we already do when setting up a fixed-link. Fixes: ea269a6f7207 ("net: phylink: Update SFP selected interface on advertising changes") Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> --- drivers/net/phy/phylink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 380e51c5bdaa..f561a803e5ce 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -2763,6 +2763,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl, config.speed = c->speed; config.duplex = c->duplex; + linkmode_and(config.advertising, c->linkmodes, pl->supported); break; case AUTONEG_ENABLE: ^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set 2025-04-01 21:30 ` [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set Alexander Duyck @ 2025-04-02 18:02 ` Russell King (Oracle) 2025-04-02 22:34 ` Alexander Duyck 0 siblings, 1 reply; 35+ messages in thread From: Russell King (Oracle) @ 2025-04-02 18:02 UTC (permalink / raw) To: Alexander Duyck Cc: netdev, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier On Tue, Apr 01, 2025 at 02:30:13PM -0700, Alexander Duyck wrote: > From: Alexander Duyck <alexanderduyck@fb.com> > > While testing a driver that supports mulitple speeds on the same SFP module > I noticed I wasn't able to change them when I was not using > autonegotiation. I would attempt to update the speed, but it had no effect. > > A bit of digging led me to the fact that we weren't updating the advertised > link mask and as a result the interface wasn't being updated when I > requested an updated speed. This change makes it so that we apply the speed > from the phy settings to the config.advertised following a behavior similar > to what we already do when setting up a fixed-link. > > Fixes: ea269a6f7207 ("net: phylink: Update SFP selected interface on advertising changes") > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > --- > drivers/net/phy/phylink.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > index 380e51c5bdaa..f561a803e5ce 100644 > --- a/drivers/net/phy/phylink.c > +++ b/drivers/net/phy/phylink.c > @@ -2763,6 +2763,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl, > > config.speed = c->speed; > config.duplex = c->duplex; > + linkmode_and(config.advertising, c->linkmodes, pl->supported); I had thought that ethtool provided an appropriate advertising mask when aneg is disabled, but it just preserves the old mask, which seems to be the intended behaviour (if one looks at phylib, that's also what happens there.) We should not deviate from that with a user API. So, I would like to change how this works somewhat to avoid a user visible change. Also, interface mode changing on AUTONEG_DISABLED was never intended to work. Indeed, mvneta and mvpp2 don't support AUTONEG_DISABLED for 1000BASE-X nor 2500BASE-X which is where this interface switching was implemented (for switching between these two.) I've already got rid of the phylink_sfp_select_interface() usage when a module is inserted (see phylink_sfp_config_optical(), where we base the interface selection off interface support masks there rather than advertisements - it used to be advertisements.) We now have phylink_interface_max_speed() which gives the speed of the interface, which gives us the possibility of doing something like this for the AUTONEG_DISABLE state: phy_interface_and(interfaces, pl->config->supported_interfaces, pl->sfp_interfaces); best_speed = SPEED_UNKNOWN; best_interface = PHY_INTERFACE_MODE_NA; for_each_set_bit(interface, interfaces, __ETHTOOL_LINK_MODE_MASK_NBITS) { max_speed = phylink_interface_max_speed(interface); if (max_speed < config.speed) continue; if (max_speed == config.speed) return interface; if (best_speed == SPEED_UNKNOWN || max_speed < best_speed) { best_speed = max_speed; best_interface = interface; } } return best_interface; to select the interface from aneg-disabled state. Do you think that would work for you? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set 2025-04-02 18:02 ` Russell King (Oracle) @ 2025-04-02 22:34 ` Alexander Duyck 0 siblings, 0 replies; 35+ messages in thread From: Alexander Duyck @ 2025-04-02 22:34 UTC (permalink / raw) To: Russell King (Oracle) Cc: netdev, andrew, hkallweit1, davem, kuba, pabeni, maxime.chevallier On Wed, Apr 2, 2025 at 11:02 AM Russell King (Oracle) <linux@armlinux.org.uk> wrote: > > On Tue, Apr 01, 2025 at 02:30:13PM -0700, Alexander Duyck wrote: > > From: Alexander Duyck <alexanderduyck@fb.com> > > > > While testing a driver that supports mulitple speeds on the same SFP module > > I noticed I wasn't able to change them when I was not using > > autonegotiation. I would attempt to update the speed, but it had no effect. > > > > A bit of digging led me to the fact that we weren't updating the advertised > > link mask and as a result the interface wasn't being updated when I > > requested an updated speed. This change makes it so that we apply the speed > > from the phy settings to the config.advertised following a behavior similar > > to what we already do when setting up a fixed-link. > > > > Fixes: ea269a6f7207 ("net: phylink: Update SFP selected interface on advertising changes") > > Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> > > --- > > drivers/net/phy/phylink.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c > > index 380e51c5bdaa..f561a803e5ce 100644 > > --- a/drivers/net/phy/phylink.c > > +++ b/drivers/net/phy/phylink.c > > @@ -2763,6 +2763,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl, > > > > config.speed = c->speed; > > config.duplex = c->duplex; > > + linkmode_and(config.advertising, c->linkmodes, pl->supported); > > I had thought that ethtool provided an appropriate advertising mask > when aneg is disabled, but it just preserves the old mask, which seems > to be the intended behaviour (if one looks at phylib, that's also what > happens there.) We should not deviate from that with a user API. > > So, I would like to change how this works somewhat to avoid a user > visible change. Also, interface mode changing on AUTONEG_DISABLED was > never intended to work. Indeed, mvneta and mvpp2 don't support > AUTONEG_DISABLED for 1000BASE-X nor 2500BASE-X which is where this > interface switching was implemented (for switching between these two.) > > I've already got rid of the phylink_sfp_select_interface() usage when > a module is inserted (see phylink_sfp_config_optical(), where we base > the interface selection off interface support masks there rather than > advertisements - it used to be advertisements.) > > We now have phylink_interface_max_speed() which gives the speed of > the interface, which gives us the possibility of doing something > like this for the AUTONEG_DISABLE state: > > phy_interface_and(interfaces, pl->config->supported_interfaces, > pl->sfp_interfaces); > best_speed = SPEED_UNKNOWN; > best_interface = PHY_INTERFACE_MODE_NA; > > for_each_set_bit(interface, interfaces, __ETHTOOL_LINK_MODE_MASK_NBITS) { > max_speed = phylink_interface_max_speed(interface); > if (max_speed < config.speed) > continue; > if (max_speed == config.speed) > return interface; > if (best_speed == SPEED_UNKNOWN || > max_speed < best_speed) { > best_speed = max_speed; > best_interface = interface; > } > } > > return best_interface; > > to select the interface from aneg-disabled state. > > Do you think that would work for you? That should work. The only case where it might get iffy would be a QSFP-DD cable that supported both NRZ and PAM4. In that case we might get a 50R1 when we are expecting a 50R2. However that is kind of a problem throughout with all the pure speed/duplex checks. The only way to get around that would be to add a new check for lanes to kind of take the place of duplex as we would need to also have the lanes match. ^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2025-04-07 23:02 UTC | newest] Thread overview: 35+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-01 21:29 [net PATCH 0/2] Fixes for net/phy/phylink.c Alexander Duyck 2025-04-01 21:30 ` [net PATCH 1/2] net: phy: Cleanup handling of recent changes to phy_lookup_setting Alexander Duyck 2025-04-02 7:00 ` Maxime Chevallier 2025-04-02 14:21 ` Alexander H Duyck 2025-04-02 17:17 ` Jakub Kicinski 2025-04-02 17:30 ` Russell King (Oracle) 2025-04-02 22:37 ` Alexander Duyck 2025-04-03 14:55 ` Russell King (Oracle) 2025-04-03 15:29 ` Maxime Chevallier 2025-04-03 16:34 ` Andrew Lunn 2025-04-03 21:53 ` Alexander Duyck 2025-04-03 23:19 ` Russell King (Oracle) 2025-04-04 15:56 ` Alexander Duyck 2025-04-04 16:33 ` Andrew Lunn 2025-04-04 22:46 ` Alexander Duyck 2025-04-05 9:43 ` Russell King (Oracle) 2025-04-05 14:51 ` Andrew Lunn 2025-04-05 20:41 ` Alexander Duyck 2025-04-05 20:53 ` Andrew Lunn 2025-04-05 9:10 ` Russell King (Oracle) 2025-04-05 15:43 ` Andrew Lunn 2025-04-05 15:52 ` Andrew Lunn 2025-04-05 16:19 ` Alexander Duyck 2025-04-05 20:23 ` Alexander Duyck 2025-04-03 23:26 ` Russell King (Oracle) 2025-04-04 1:46 ` Andrew Lunn 2025-04-04 7:16 ` Russell King (Oracle) 2025-04-04 16:18 ` Alexander Duyck 2025-04-07 17:01 ` Jakub Kicinski 2025-04-07 18:20 ` Alexander Duyck 2025-04-07 19:34 ` Andrew Lunn 2025-04-07 23:01 ` Alexander Duyck 2025-04-01 21:30 ` [net PATCH 2/2] net: phylink: Set advertising based on phy_lookup_setting in ksettings_set Alexander Duyck 2025-04-02 18:02 ` Russell King (Oracle) 2025-04-02 22:34 ` Alexander Duyck
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).