* mv88e6240 configuration broken for B850v3 @ 2021-12-03 9:06 Martyn Welch 2021-12-03 16:25 ` Andrew Lunn 0 siblings, 1 reply; 34+ messages in thread From: Martyn Welch @ 2021-12-03 9:06 UTC (permalink / raw) To: Andrew Lunn Cc: Vivien Didelot, Florian Fainelli, Vladimir Oltean, netdev, kernel Hi Andrew, I'm currently in the process of updating the GE B850v3 [1] to run a newer kernel than the one it's currently running. This device (and others in the same family) use a mv88e6240 switch to provide a number of their ethernet ports. The CPU link on the switch is connected via a PHY, as the network port on the SoM used is exposed via a PHY. The ports of the B850v3 stopped working when I upgraded, bisecting resulted in me finding that this commit was the root cause: 3be98b2d5fbc (refs/bisect/bad) net: dsa: Down cpu/dsa ports phylink will control I think this is causing the PHY on the mv88e6240 side of the CPU link to be forced down in our use case. I assume an extra check is needed here to stop that in cases like ours, though I'm not sure what at this point. Any ideas? Thanks in advance, Martyn [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/imx6q-b850v3.dts ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-03 9:06 mv88e6240 configuration broken for B850v3 Martyn Welch @ 2021-12-03 16:25 ` Andrew Lunn 2021-12-06 17:44 ` Martyn Welch 0 siblings, 1 reply; 34+ messages in thread From: Andrew Lunn @ 2021-12-03 16:25 UTC (permalink / raw) To: Martyn Welch Cc: Vivien Didelot, Florian Fainelli, Vladimir Oltean, netdev, kernel, Russell King > Hi Andrew, Adding Russell to Cc: > I'm currently in the process of updating the GE B850v3 [1] to run a > newer kernel than the one it's currently running. Which kernel exactly. We like bug reports against net-next, or at least the last -rc. > This device (and others in the same family) use a mv88e6240 switch to > provide a number of their ethernet ports. The CPU link on the switch is > connected via a PHY, as the network port on the SoM used is exposed via > a PHY. > > The ports of the B850v3 stopped working when I upgraded, bisecting > resulted in me finding that this commit was the root cause: > > 3be98b2d5fbc (refs/bisect/bad) net: dsa: Down cpu/dsa ports phylink > will control > > I think this is causing the PHY on the mv88e6240 side of the CPU link > to be forced down in our use case. > > I assume an extra check is needed here to stop that in cases like ours, > though I'm not sure what at this point. Any ideas? From the commit message. DSA and CPU ports can be configured in two ways. By default, the driver should configure such ports to there maximum bandwidth. For most use cases, this is sufficient. When this default is insufficient, a phylink instance can be bound to such ports, and phylink will configure the port, You have a phy-handle in your node: port@4 { reg = <4>; label = "cpu"; ethernet = <&switch_nic>; phy-handle = <&switchphy4>; }; so i would expect there to be a phylink instance. The commit message continues to say: and phylink will configure the port, e.g. based on fixed-link properties. So i think you are asking the wrong question. It is not an extra check is needed here, we need to understand why phylink is not configuring the MAC. Or is that configuration wrong. I suggest you add #define DEBUG 1 to the very top of drivers/net/phy/phylink.c so we can see what phylink is doing. Andrew ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-03 16:25 ` Andrew Lunn @ 2021-12-06 17:44 ` Martyn Welch 2021-12-06 18:26 ` Martyn Welch 0 siblings, 1 reply; 34+ messages in thread From: Martyn Welch @ 2021-12-06 17:44 UTC (permalink / raw) To: Andrew Lunn Cc: Vivien Didelot, Florian Fainelli, Vladimir Oltean, netdev, kernel, Russell King On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > Hi Andrew, > > Adding Russell to Cc: > > > I'm currently in the process of updating the GE B850v3 [1] to run a > > newer kernel than the one it's currently running. > > Which kernel exactly. We like bug reports against net-next, or at > least the last -rc. > I tested using v5.15-rc3 and that was also affected. > > > > This device (and others in the same family) use a mv88e6240 switch to > > provide a number of their ethernet ports. The CPU link on the switch > > is > > connected via a PHY, as the network port on the SoM used is exposed > > via > > a PHY. > > > > The ports of the B850v3 stopped working when I upgraded, bisecting > > resulted in me finding that this commit was the root cause: > > > > 3be98b2d5fbc (refs/bisect/bad) net: dsa: Down cpu/dsa ports phylink > > will control > > > > I think this is causing the PHY on the mv88e6240 side of the CPU link > > to be forced down in our use case. > > > > I assume an extra check is needed here to stop that in cases like > > ours, > > though I'm not sure what at this point. Any ideas? > > From the commit message. > > DSA and CPU ports can be configured in two ways. By default, the > driver should configure such ports to there maximum bandwidth. For > most use cases, this is sufficient. When this default is > insufficient, > a phylink instance can be bound to such ports, and phylink will > configure the port, > > You have a phy-handle in your node: > > port@4 { > reg = <4>; > label = "cpu"; > ethernet = <&switch_nic>; > phy-handle = <&switchphy4>; > }; > > so i would expect there to be a phylink instance. The commit message > continues to say: > > and phylink will > configure the port, e.g. based on fixed-link properties. > > So i think you are asking the wrong question. It is not an extra check > is needed here, we need to understand why phylink is not configuring > the MAC. Or is that configuration wrong. > > I suggest you add #define DEBUG 1 to the very top of > drivers/net/phy/phylink.c so we can see what phylink is doing. > The pertinent parts of the logs (from v5.16-rc3, with the above debug added) appear to be: # dmesg | grep -e mv88e6085 -e DSA -e libphy [ 2.119282] libphy: Fixed MDIO Bus: probed [ 2.124547] libphy: GPIO Bitbanged MDIO: probed [ 2.129795] mv88e6085 gpio-0:00: switch 0x2400 detected: Marvell 88E6240, revision 1 [ 2.150568] libphy: mdio: probed [ 2.224233] libphy: fec_enet_mii_bus: probed [ 3.064455] mv88e6085 gpio-0:00: switch 0x2400 detected: Marvell 88E6240, revision 1 [ 3.083930] libphy: mdio: probed [ 3.844882] mv88e6085 gpio-0:00 eneport1 (uninitialized): PHY [!mdio-gpio!switch@0!mdio:00] driver [Marvell 88E1540] (irq=362) [ 3.856335] mv88e6085 gpio-0:00 eneport1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef [ 3.962649] mv88e6085 gpio-0:00 eneport2 (uninitialized): PHY [!mdio-gpio!switch@0!mdio:01] driver [Marvell 88E1540] (irq=363) [ 3.974091] mv88e6085 gpio-0:00 eneport2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef [ 4.080405] mv88e6085 gpio-0:00 enix (uninitialized): PHY [!mdio-gpio!switch@0!mdio:02] driver [Marvell 88E1540] (irq=364) [ 4.091510] mv88e6085 gpio-0:00 enix (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef [ 4.202683] mv88e6085 gpio-0:00 enid (uninitialized): PHY [!mdio-gpio!switch@0!mdio:03] driver [Marvell 88E1540] (irq=365) [ 4.213774] mv88e6085 gpio-0:00 enid (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef [ 4.298209] mv88e6085 gpio-0:00: PHY [!mdio-gpio!switch@0!mdio:04] driver [Marvell 88E1540] (irq=366) [ 4.307475] mv88e6085 gpio-0:00: phy: setting supported 0000000,00000000,000022ef advertising 0000000,00000000,000022ef [ 4.314285] mv88e6085 gpio-0:00: configuring for phy/ link mode [ 4.320251] mv88e6085 gpio-0:00: major config [ 4.320262] mv88e6085 gpio-0:00: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 4.324265] DSA: tree 0 setup [ 4.399423] mv88e6085 gpio-0:00: phy link down /Unknown/Unknown/off [ 15.600977] mv88e6085 gpio-0:00 enix: configuring for phy/ link mode [ 15.607417] mv88e6085 gpio-0:00 enix: major config [ 15.607443] mv88e6085 gpio-0:00 enix: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 15.678811] mv88e6085 gpio-0:00 enix: phy link down /Unknown/Unknown/off [ 15.961559] mv88e6085 gpio-0:00 enid: configuring for phy/ link mode [ 15.967945] mv88e6085 gpio-0:00 enid: major config [ 15.967958] mv88e6085 gpio-0:00 enid: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 15.986370] mv88e6085 gpio-0:00 enix: configuring for phy/ link mode [ 15.992829] mv88e6085 gpio-0:00 enix: major config [ 15.992843] mv88e6085 gpio-0:00 enix: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 16.019628] mv88e6085 gpio-0:00 eneport2: configuring for phy/ link mode [ 16.026370] mv88e6085 gpio-0:00 eneport2: major config [ 16.026382] mv88e6085 gpio-0:00 eneport2: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 16.057585] mv88e6085 gpio-0:00 eneport1: configuring for phy/ link mode [ 16.064367] mv88e6085 gpio-0:00 eneport1: major config [ 16.064381] mv88e6085 gpio-0:00 eneport1: phylink_mac_config: mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0 [ 16.132700] mv88e6085 gpio-0:00 enid: phy link up /10Mbps/Full/off [ 16.134210] mv88e6085 gpio-0:00 enid: phylink_mac_config: mode=phy//10Mbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0 [ 16.141177] mv88e6085 gpio-0:00 enid: Link is Up - 10Mbps/Full - flow control off [ 16.195201] mv88e6085 gpio-0:00 enix: phy link down /Unknown/Unknown/off [ 16.215131] mv88e6085 gpio-0:00 eneport2: phy link down /Unknown/Unknown/off [ 16.254004] mv88e6085 gpio-0:00 eneport1: phy link up /10Mbps/Full/off [ 16.254027] mv88e6085 gpio-0:00 eneport1: phylink_mac_config: mode=phy//10Mbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0 [ 16.254056] mv88e6085 gpio-0:00 eneport1: Link is Up - 10Mbps/Full - flow control off [ 18.706700] mv88e6085 gpio-0:00: phy link up /1Gbps/Full/rx/tx [ 18.708976] mv88e6085 gpio-0:00: phylink_mac_config: mode=phy//1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0 [ 18.709002] mv88e6085 gpio-0:00: Link is Up - 1Gbps/Full - flow control rx/tx Despite the last few lines suggesting to me the phy link is up, I'm unable to access the network I'd expect to be able to access. > Andrew > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 17:44 ` Martyn Welch @ 2021-12-06 18:26 ` Martyn Welch 2021-12-06 18:31 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Martyn Welch @ 2021-12-06 18:26 UTC (permalink / raw) To: Andrew Lunn Cc: Vivien Didelot, Florian Fainelli, Vladimir Oltean, netdev, kernel, Russell King On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > Hi Andrew, > > > > Adding Russell to Cc: > > > > > I'm currently in the process of updating the GE B850v3 [1] to run > > > a > > > newer kernel than the one it's currently running. > > > > Which kernel exactly. We like bug reports against net-next, or at > > least the last -rc. > > > > I tested using v5.15-rc3 and that was also affected. > I've just tested v5.16-rc4 (sorry - just realised I previously wrote v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. > > > > > > This device (and others in the same family) use a mv88e6240 > > > switch to > > > provide a number of their ethernet ports. The CPU link on the > > > switch > > > is > > > connected via a PHY, as the network port on the SoM used is > > > exposed > > > via > > > a PHY. > > > > > > The ports of the B850v3 stopped working when I upgraded, > > > bisecting > > > resulted in me finding that this commit was the root cause: > > > > > > 3be98b2d5fbc (refs/bisect/bad) net: dsa: Down cpu/dsa ports > > > phylink > > > will control > > > > > > I think this is causing the PHY on the mv88e6240 side of the CPU > > > link > > > to be forced down in our use case. > > > > > > I assume an extra check is needed here to stop that in cases like > > > ours, > > > though I'm not sure what at this point. Any ideas? > > > > From the commit message. > > > > DSA and CPU ports can be configured in two ways. By default, > > the > > driver should configure such ports to there maximum bandwidth. > > For > > most use cases, this is sufficient. When this default is > > insufficient, > > a phylink instance can be bound to such ports, and phylink will > > configure the port, > > > > You have a phy-handle in your node: > > > > port@4 { > > reg = <4>; > > label = "cpu"; > > ethernet = <&switch_nic>; > > phy-handle = <&switchphy4>; > > }; > > > > so i would expect there to be a phylink instance. The commit > > message > > continues to say: > > > > and phylink will > > configure the port, e.g. based on fixed-link properties. > > > > So i think you are asking the wrong question. It is not an extra > > check > > is needed here, we need to understand why phylink is not > > configuring > > the MAC. Or is that configuration wrong. > > > > I suggest you add #define DEBUG 1 to the very top of > > drivers/net/phy/phylink.c so we can see what phylink is doing. > > > > The pertinent parts of the logs (from v5.16-rc3, with the above debug > added) appear to be: > > # dmesg | grep -e mv88e6085 -e DSA -e libphy > [ 2.119282] libphy: Fixed MDIO Bus: probed > [ 2.124547] libphy: GPIO Bitbanged MDIO: probed > [ 2.129795] mv88e6085 gpio-0:00: switch 0x2400 detected: Marvell > 88E6240, revision 1 > [ 2.150568] libphy: mdio: probed > [ 2.224233] libphy: fec_enet_mii_bus: probed > [ 3.064455] mv88e6085 gpio-0:00: switch 0x2400 detected: Marvell > 88E6240, revision 1 > [ 3.083930] libphy: mdio: probed > [ 3.844882] mv88e6085 gpio-0:00 eneport1 (uninitialized): PHY > [!mdio-gpio!switch@0!mdio:00] driver [Marvell 88E1540] (irq=362) > [ 3.856335] mv88e6085 gpio-0:00 eneport1 (uninitialized): phy: > setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > [ 3.962649] mv88e6085 gpio-0:00 eneport2 (uninitialized): PHY > [!mdio-gpio!switch@0!mdio:01] driver [Marvell 88E1540] (irq=363) > [ 3.974091] mv88e6085 gpio-0:00 eneport2 (uninitialized): phy: > setting supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > [ 4.080405] mv88e6085 gpio-0:00 enix (uninitialized): PHY > [!mdio-gpio!switch@0!mdio:02] driver [Marvell 88E1540] (irq=364) > [ 4.091510] mv88e6085 gpio-0:00 enix (uninitialized): phy: setting > supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > [ 4.202683] mv88e6085 gpio-0:00 enid (uninitialized): PHY > [!mdio-gpio!switch@0!mdio:03] driver [Marvell 88E1540] (irq=365) > [ 4.213774] mv88e6085 gpio-0:00 enid (uninitialized): phy: setting > supported 0000000,00000000,000022ef advertising > 0000000,00000000,000022ef > [ 4.298209] mv88e6085 gpio-0:00: PHY [!mdio-gpio!switch@0!mdio:04] > driver [Marvell 88E1540] (irq=366) > [ 4.307475] mv88e6085 gpio-0:00: phy: setting supported > 0000000,00000000,000022ef advertising 0000000,00000000,000022ef > [ 4.314285] mv88e6085 gpio-0:00: configuring for phy/ link mode > [ 4.320251] mv88e6085 gpio-0:00: major config > [ 4.320262] mv88e6085 gpio-0:00: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 4.324265] DSA: tree 0 setup > [ 4.399423] mv88e6085 gpio-0:00: phy link down > /Unknown/Unknown/off > [ 15.600977] mv88e6085 gpio-0:00 enix: configuring for phy/ link > mode > [ 15.607417] mv88e6085 gpio-0:00 enix: major config > [ 15.607443] mv88e6085 gpio-0:00 enix: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 15.678811] mv88e6085 gpio-0:00 enix: phy link down > /Unknown/Unknown/off > [ 15.961559] mv88e6085 gpio-0:00 enid: configuring for phy/ link > mode > [ 15.967945] mv88e6085 gpio-0:00 enid: major config > [ 15.967958] mv88e6085 gpio-0:00 enid: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 15.986370] mv88e6085 gpio-0:00 enix: configuring for phy/ link > mode > [ 15.992829] mv88e6085 gpio-0:00 enix: major config > [ 15.992843] mv88e6085 gpio-0:00 enix: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 16.019628] mv88e6085 gpio-0:00 eneport2: configuring for phy/ > link > mode > [ 16.026370] mv88e6085 gpio-0:00 eneport2: major config > [ 16.026382] mv88e6085 gpio-0:00 eneport2: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 16.057585] mv88e6085 gpio-0:00 eneport1: configuring for phy/ > link > mode > [ 16.064367] mv88e6085 gpio-0:00 eneport1: major config > [ 16.064381] mv88e6085 gpio-0:00 eneport1: phylink_mac_config: > mode=phy//Unknown/Unknown adv=0000000,00000000,00000000 pause=00 > link=0 > an=0 > [ 16.132700] mv88e6085 gpio-0:00 enid: phy link up /10Mbps/Full/off > [ 16.134210] mv88e6085 gpio-0:00 enid: phylink_mac_config: > mode=phy//10Mbps/Full adv=0000000,00000000,00000000 pause=00 link=1 > an=0 > [ 16.141177] mv88e6085 gpio-0:00 enid: Link is Up - 10Mbps/Full - > flow control off > [ 16.195201] mv88e6085 gpio-0:00 enix: phy link down > /Unknown/Unknown/off > [ 16.215131] mv88e6085 gpio-0:00 eneport2: phy link down > /Unknown/Unknown/off > [ 16.254004] mv88e6085 gpio-0:00 eneport1: phy link up > /10Mbps/Full/off > [ 16.254027] mv88e6085 gpio-0:00 eneport1: phylink_mac_config: > mode=phy//10Mbps/Full adv=0000000,00000000,00000000 pause=00 link=1 > an=0 > [ 16.254056] mv88e6085 gpio-0:00 eneport1: Link is Up - 10Mbps/Full > - > flow control off > [ 18.706700] mv88e6085 gpio-0:00: phy link up /1Gbps/Full/rx/tx > [ 18.708976] mv88e6085 gpio-0:00: phylink_mac_config: > mode=phy//1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 > an=0 > [ 18.709002] mv88e6085 gpio-0:00: Link is Up - 1Gbps/Full - flow > control rx/tx > > Despite the last few lines suggesting to me the phy link is up, I'm > unable to access the network I'd expect to be able to access. > > > Andrew > > > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 18:26 ` Martyn Welch @ 2021-12-06 18:31 ` Vladimir Oltean 2021-12-06 18:37 ` Martyn Welch 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 18:31 UTC (permalink / raw) To: Martyn Welch Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, Dec 06, 2021 at 06:26:25PM +0000, Martyn Welch wrote: > On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > > Hi Andrew, > > > > > > Adding Russell to Cc: > > > > > > > I'm currently in the process of updating the GE B850v3 [1] to run > > > > a > > > > newer kernel than the one it's currently running. > > > > > > Which kernel exactly. We like bug reports against net-next, or at > > > least the last -rc. > > > > > > > I tested using v5.15-rc3 and that was also affected. > > > > I've just tested v5.16-rc4 (sorry - just realised I previously wrote > v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. Just to clarify: you're saying that you're on v5.16-rc4 and that if you revert commit 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink will control"), the link works again? It is a bit strange that the external ports negotiate at 10Mbps/Full, is that the link speed you intend the ports to work at? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 18:31 ` Vladimir Oltean @ 2021-12-06 18:37 ` Martyn Welch 2021-12-06 18:50 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Martyn Welch @ 2021-12-06 18:37 UTC (permalink / raw) To: Vladimir Oltean Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, 2021-12-06 at 20:31 +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 06:26:25PM +0000, Martyn Welch wrote: > > On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > > > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > > > Hi Andrew, > > > > > > > > Adding Russell to Cc: > > > > > > > > > I'm currently in the process of updating the GE B850v3 [1] to > > > > > run > > > > > a > > > > > newer kernel than the one it's currently running. > > > > > > > > Which kernel exactly. We like bug reports against net-next, or > > > > at > > > > least the last -rc. > > > > > > > > > > I tested using v5.15-rc3 and that was also affected. > > > > > > > I've just tested v5.16-rc4 (sorry - just realised I previously > > wrote > > v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. > > Just to clarify: you're saying that you're on v5.16-rc4 and that if > you > revert commit 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink > will > control"), the link works again? > Correct > It is a bit strange that the external ports negotiate at 10Mbps/Full, > is that the link speed you intend the ports to work at? Yes, that's 100% intentional due to what's connected to to those ports and the environment it works in. Martyn ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 18:37 ` Martyn Welch @ 2021-12-06 18:50 ` Vladimir Oltean 2021-12-06 19:24 ` Martyn Welch 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 18:50 UTC (permalink / raw) To: Martyn Welch Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, Dec 06, 2021 at 06:37:44PM +0000, Martyn Welch wrote: > On Mon, 2021-12-06 at 20:31 +0200, Vladimir Oltean wrote: > > On Mon, Dec 06, 2021 at 06:26:25PM +0000, Martyn Welch wrote: > > > On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > > > > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > > > > Hi Andrew, > > > > > > > > > > Adding Russell to Cc: > > > > > > > > > > > I'm currently in the process of updating the GE B850v3 [1] to > > > > > > run > > > > > > a > > > > > > newer kernel than the one it's currently running. > > > > > > > > > > Which kernel exactly. We like bug reports against net-next, or > > > > > at > > > > > least the last -rc. > > > > > > > > > > > > > I tested using v5.15-rc3 and that was also affected. > > > > > > > > > > I've just tested v5.16-rc4 (sorry - just realised I previously > > > wrote > > > v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. > > > > Just to clarify: you're saying that you're on v5.16-rc4 and that if > > you > > revert commit 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink > > will > > control"), the link works again? > > > > Correct > > > It is a bit strange that the external ports negotiate at 10Mbps/Full, > > is that the link speed you intend the ports to work at? > > Yes, that's 100% intentional due to what's connected to to those ports > and the environment it works in. > > Martyn Just out of curiosity, can you try this change? It looks like a simple case of mismatched conditions inside the mv88e6xxx driver between what is supposed to force the link down and what is supposed to force it up: diff --git a/net/dsa/port.c b/net/dsa/port.c index 20f183213cbc..d235270babf7 100644 --- a/net/dsa/port.c +++ b/net/dsa/port.c @@ -1221,7 +1221,7 @@ int dsa_port_link_register_of(struct dsa_port *dp) if (of_phy_is_fixed_link(dp->dn) || phy_np) { if (ds->ops->phylink_mac_link_down) ds->ops->phylink_mac_link_down(ds, port, - MLO_AN_FIXED, PHY_INTERFACE_MODE_NA); + MLO_AN_PHY, PHY_INTERFACE_MODE_NA); return dsa_port_phylink_register(dp); } return 0; -- 2.25.1 ^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 18:50 ` Vladimir Oltean @ 2021-12-06 19:24 ` Martyn Welch 2021-12-06 19:37 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Martyn Welch @ 2021-12-06 19:24 UTC (permalink / raw) To: Vladimir Oltean Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, 2021-12-06 at 20:50 +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 06:37:44PM +0000, Martyn Welch wrote: > > On Mon, 2021-12-06 at 20:31 +0200, Vladimir Oltean wrote: > > > On Mon, Dec 06, 2021 at 06:26:25PM +0000, Martyn Welch wrote: > > > > On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > > > > > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > > > > > Hi Andrew, > > > > > > > > > > > > Adding Russell to Cc: > > > > > > > > > > > > > I'm currently in the process of updating the GE B850v3 > > > > > > > [1] to > > > > > > > run > > > > > > > a > > > > > > > newer kernel than the one it's currently running. > > > > > > > > > > > > Which kernel exactly. We like bug reports against net-next, > > > > > > or > > > > > > at > > > > > > least the last -rc. > > > > > > > > > > > > > > > > I tested using v5.15-rc3 and that was also affected. > > > > > > > > > > > > > I've just tested v5.16-rc4 (sorry - just realised I previously > > > > wrote > > > > v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. > > > > > > Just to clarify: you're saying that you're on v5.16-rc4 and that > > > if > > > you > > > revert commit 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink > > > will > > > control"), the link works again? > > > > > > > Correct > > > > > It is a bit strange that the external ports negotiate at > > > 10Mbps/Full, > > > is that the link speed you intend the ports to work at? > > > > Yes, that's 100% intentional due to what's connected to to those > > ports > > and the environment it works in. > > > > Martyn > > Just out of curiosity, can you try this change? It looks like a > simple > case of mismatched conditions inside the mv88e6xxx driver between > what > is supposed to force the link down and what is supposed to force it > up: > > diff --git a/net/dsa/port.c b/net/dsa/port.c > index 20f183213cbc..d235270babf7 100644 > --- a/net/dsa/port.c > +++ b/net/dsa/port.c > @@ -1221,7 +1221,7 @@ int dsa_port_link_register_of(struct dsa_port > *dp) > if (of_phy_is_fixed_link(dp->dn) || phy_np) { > if (ds->ops->phylink_mac_link_down) > ds->ops->phylink_mac_link_down(ds, > port, > - MLO_AN_FIXED, > PHY_INTERFACE_MODE_NA); > + MLO_AN_PHY, > PHY_INTERFACE_MODE_NA); > return dsa_port_phylink_register(dp); > } > return 0; Yes, that appears to also make it work. Martyn ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 19:24 ` Martyn Welch @ 2021-12-06 19:37 ` Vladimir Oltean 2021-12-06 19:53 ` Andrew Lunn 2021-12-06 20:07 ` Russell King (Oracle) 0 siblings, 2 replies; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 19:37 UTC (permalink / raw) To: Martyn Welch Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, Dec 06, 2021 at 07:24:56PM +0000, Martyn Welch wrote: > On Mon, 2021-12-06 at 20:50 +0200, Vladimir Oltean wrote: > > On Mon, Dec 06, 2021 at 06:37:44PM +0000, Martyn Welch wrote: > > > On Mon, 2021-12-06 at 20:31 +0200, Vladimir Oltean wrote: > > > > On Mon, Dec 06, 2021 at 06:26:25PM +0000, Martyn Welch wrote: > > > > > On Mon, 2021-12-06 at 17:44 +0000, Martyn Welch wrote: > > > > > > On Fri, 2021-12-03 at 17:25 +0100, Andrew Lunn wrote: > > > > > > > > Hi Andrew, > > > > > > > > > > > > > > Adding Russell to Cc: > > > > > > > > > > > > > > > I'm currently in the process of updating the GE B850v3 > > > > > > > > [1] to > > > > > > > > run > > > > > > > > a > > > > > > > > newer kernel than the one it's currently running. > > > > > > > > > > > > > > Which kernel exactly. We like bug reports against net-next, > > > > > > > or > > > > > > > at > > > > > > > least the last -rc. > > > > > > > > > > > > > > > > > > > I tested using v5.15-rc3 and that was also affected. > > > > > > > > > > > > > > > > I've just tested v5.16-rc4 (sorry - just realised I previously > > > > > wrote > > > > > v5.15-rc3, it was v5.16-rc3...) and that was exactly the same. > > > > > > > > Just to clarify: you're saying that you're on v5.16-rc4 and that > > > > if > > > > you > > > > revert commit 3be98b2d5fbc ("net: dsa: Down cpu/dsa ports phylink > > > > will > > > > control"), the link works again? > > > > > > > > > > Correct > > > > > > > It is a bit strange that the external ports negotiate at > > > > 10Mbps/Full, > > > > is that the link speed you intend the ports to work at? > > > > > > Yes, that's 100% intentional due to what's connected to to those > > > ports > > > and the environment it works in. > > > > > > Martyn > > > > Just out of curiosity, can you try this change? It looks like a > > simple > > case of mismatched conditions inside the mv88e6xxx driver between > > what > > is supposed to force the link down and what is supposed to force it > > up: > > > > diff --git a/net/dsa/port.c b/net/dsa/port.c > > index 20f183213cbc..d235270babf7 100644 > > --- a/net/dsa/port.c > > +++ b/net/dsa/port.c > > @@ -1221,7 +1221,7 @@ int dsa_port_link_register_of(struct dsa_port > > *dp) > > if (of_phy_is_fixed_link(dp->dn) || phy_np) { > > if (ds->ops->phylink_mac_link_down) > > ds->ops->phylink_mac_link_down(ds, port, > > - MLO_AN_FIXED, PHY_INTERFACE_MODE_NA); > > + MLO_AN_PHY, PHY_INTERFACE_MODE_NA); > > return dsa_port_phylink_register(dp); > > } > > return 0; > > Yes, that appears to also make it work. > > Martyn Well, I just pointed out what the problem is, I don't know how to solve it, honest! :) It's clear that the code is wrong, because it's in an "if" block that checks for "of_phy_is_fixed_link(dp->dn) || phy_np" but then it omits the "phy_np" part of it. On the other hand we can't just go ahead and say "if (phy_np) mode = MLO_AN_PHY; else mode = MLO_AN_FIXED;" because MLO_AN_INBAND is also a valid option that we may be omitting. So we'd have to duplicate part of the logic from phylink_parse_mode(), which does not appear ideal at all. What would be ideal is if this fabricated phylink call would not be done at all, but I don't know enough about the systems that need it, I expect Andrew knows more. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 19:37 ` Vladimir Oltean @ 2021-12-06 19:53 ` Andrew Lunn 2021-12-06 20:01 ` Vladimir Oltean 2021-12-06 20:07 ` Russell King (Oracle) 1 sibling, 1 reply; 34+ messages in thread From: Andrew Lunn @ 2021-12-06 19:53 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King > > > Just out of curiosity, can you try this change? It looks like a > > > simple > > > case of mismatched conditions inside the mv88e6xxx driver between > > > what > > > is supposed to force the link down and what is supposed to force it > > > up: > > > > > > diff --git a/net/dsa/port.c b/net/dsa/port.c > > > index 20f183213cbc..d235270babf7 100644 > > > --- a/net/dsa/port.c > > > +++ b/net/dsa/port.c > > > @@ -1221,7 +1221,7 @@ int dsa_port_link_register_of(struct dsa_port > > > *dp) > > > if (of_phy_is_fixed_link(dp->dn) || phy_np) { > > > if (ds->ops->phylink_mac_link_down) > > > ds->ops->phylink_mac_link_down(ds, port, > > > - MLO_AN_FIXED, PHY_INTERFACE_MODE_NA); > > > + MLO_AN_PHY, PHY_INTERFACE_MODE_NA); > > > return dsa_port_phylink_register(dp); > > > } > > > return 0; > > > > Yes, that appears to also make it work. > > > > Martyn > > Well, I just pointed out what the problem is, I don't know how to solve > it, honest! :) > > It's clear that the code is wrong, because it's in an "if" block that > checks for "of_phy_is_fixed_link(dp->dn) || phy_np" but then it omits > the "phy_np" part of it. On the other hand we can't just go ahead and > say "if (phy_np) mode = MLO_AN_PHY; else mode = MLO_AN_FIXED;" because > MLO_AN_INBAND is also a valid option that we may be omitting. So we'd > have to duplicate part of the logic from phylink_parse_mode(), which > does not appear ideal at all. What would be ideal is if this fabricated > phylink call would not be done at all, but I don't know enough about the > systems that need it, I expect Andrew knows more. phylink assumes interfaces start in the down state. It will then configure them and bring them up as needed. This is not always true with mv88e6xxx, the interface can already by up, and then the hardware and phylink have different state information. So this call was added to force the link down before phylink took control of it. So i suspect something is missing when phylink sometime later does bring the interface up. It is not fully undoing what this down does. Maybe enable the dev_dbg() in mv88e6xxx_port_set_link() and see what value it has in both the good and bad case? Andrew ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 19:53 ` Andrew Lunn @ 2021-12-06 20:01 ` Vladimir Oltean 2021-12-06 20:18 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 20:01 UTC (permalink / raw) To: Andrew Lunn Cc: Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel, Russell King On Mon, Dec 06, 2021 at 08:53:46PM +0100, Andrew Lunn wrote: > > > > Just out of curiosity, can you try this change? It looks like a > > > > simple > > > > case of mismatched conditions inside the mv88e6xxx driver between > > > > what > > > > is supposed to force the link down and what is supposed to force it > > > > up: > > > > > > > > diff --git a/net/dsa/port.c b/net/dsa/port.c > > > > index 20f183213cbc..d235270babf7 100644 > > > > --- a/net/dsa/port.c > > > > +++ b/net/dsa/port.c > > > > @@ -1221,7 +1221,7 @@ int dsa_port_link_register_of(struct dsa_port > > > > *dp) > > > > if (of_phy_is_fixed_link(dp->dn) || phy_np) { > > > > if (ds->ops->phylink_mac_link_down) > > > > ds->ops->phylink_mac_link_down(ds, port, > > > > - MLO_AN_FIXED, PHY_INTERFACE_MODE_NA); > > > > + MLO_AN_PHY, PHY_INTERFACE_MODE_NA); > > > > return dsa_port_phylink_register(dp); > > > > } > > > > return 0; > > > > > > Yes, that appears to also make it work. > > > > > > Martyn > > > > Well, I just pointed out what the problem is, I don't know how to solve > > it, honest! :) > > > > It's clear that the code is wrong, because it's in an "if" block that > > checks for "of_phy_is_fixed_link(dp->dn) || phy_np" but then it omits > > the "phy_np" part of it. On the other hand we can't just go ahead and > > say "if (phy_np) mode = MLO_AN_PHY; else mode = MLO_AN_FIXED;" because > > MLO_AN_INBAND is also a valid option that we may be omitting. So we'd > > have to duplicate part of the logic from phylink_parse_mode(), which > > does not appear ideal at all. What would be ideal is if this fabricated > > phylink call would not be done at all, but I don't know enough about the > > systems that need it, I expect Andrew knows more. > > phylink assumes interfaces start in the down state. It will then > configure them and bring them up as needed. This is not always true > with mv88e6xxx, the interface can already by up, and then the hardware > and phylink have different state information. So this call was added > to force the link down before phylink took control of it. > > So i suspect something is missing when phylink sometime later does > bring the interface up. It is not fully undoing what this down > does. Maybe enable the dev_dbg() in mv88e6xxx_port_set_link() and see > what value it has in both the good and bad case? Andrew, here is mv88e6xxx_mac_link_down(): if (((!mv88e6xxx_phy_is_internal(ds, port) && !mv88e6xxx_port_ppu_updates(chip, port)) || mode == MLO_AN_FIXED) && ops->port_sync_link) err = ops->port_sync_link(chip, port, mode, false); and here is mv88e6xxx_mac_link_up(): if ((!mv88e6xxx_phy_is_internal(ds, port) && !mv88e6xxx_port_ppu_updates(chip, port)) || mode == MLO_AN_FIXED) { (...) if (ops->port_sync_link) err = ops->port_sync_link(chip, port, mode, true); This is the CPU port from Martyn's device tree: port@4 { reg = <4>; label = "cpu"; ethernet = <&switch_nic>; phy-handle = <&switchphy4>; }; It has an internal PHY, so mv88e6xxx_phy_is_internal() will return true. True negated is false, so the AND with the other PPU condition is always false. BUT: the logic is: "force the link IF it doesn't have an internal PHY OR it is a fixed link". DSA fabricates a mv88e6xxx_mac_link_down call with MLO_AN_FIXED. So ->port_sync_link is called with false even if the PHY is internal, due to the right hand operand to the || operator. Then the real phylink, not the impersonator, comes along and calls mv88e6xxx_mac_link_up with MLO_AN_PHY. The same check is now not satisfied, because the input data has changed! If we're going to impersonate phylink we could at least provide the same arguments as phylink will. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:01 ` Vladimir Oltean @ 2021-12-06 20:18 ` Russell King (Oracle) 2021-12-06 20:29 ` Vladimir Oltean 2021-12-06 21:44 ` Vladimir Oltean 0 siblings, 2 replies; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 20:18 UTC (permalink / raw) To: Vladimir Oltean Cc: Andrew Lunn, Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 10:01:11PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 08:53:46PM +0100, Andrew Lunn wrote: > > So i suspect something is missing when phylink sometime later does > > bring the interface up. It is not fully undoing what this down > > does. Maybe enable the dev_dbg() in mv88e6xxx_port_set_link() and see > > what value it has in both the good and bad case? > > Andrew, here is mv88e6xxx_mac_link_down(): > > if (((!mv88e6xxx_phy_is_internal(ds, port) && > !mv88e6xxx_port_ppu_updates(chip, port)) || > mode == MLO_AN_FIXED) && ops->port_sync_link) > err = ops->port_sync_link(chip, port, mode, false); > > and here is mv88e6xxx_mac_link_up(): > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > !mv88e6xxx_port_ppu_updates(chip, port)) || > mode == MLO_AN_FIXED) { > (...) > if (ops->port_sync_link) > err = ops->port_sync_link(chip, port, mode, true); > > This is the CPU port from Martyn's device tree: > > port@4 { > reg = <4>; > label = "cpu"; > ethernet = <&switch_nic>; > phy-handle = <&switchphy4>; > }; > > It has an internal PHY, so mv88e6xxx_phy_is_internal() will return true. > True negated is false, so the AND with the other PPU condition is always > false. BUT: the logic is: "force the link IF it doesn't have an internal > PHY OR it is a fixed link". > > DSA fabricates a mv88e6xxx_mac_link_down call with MLO_AN_FIXED. So > ->port_sync_link is called with false even if the PHY is internal, due > to the right hand operand to the || operator. > > Then the real phylink, not the impersonator, comes along and calls > mv88e6xxx_mac_link_up with MLO_AN_PHY. The same check is now not > satisfied, because the input data has changed! > > If we're going to impersonate phylink we could at least provide the same > arguments as phylink will. What is going on here in terms of impersonation is entirely reasonable. The only things in this respect that phylink guarantees are: 1) The MAC/PCS configuration will not be substantially reconfigured unless a call to mac_link_down() was made if a call to mac_link_up() was previously made. 2) The arguments to mac_link_down() will be the same as the preceeding mac_link_up() call - in other words, the "mode" and "interface". Phylink does *not* guarantee that a call to mac_link_up() or mac_config() will have the same "mode" as a preceeding call to mac_link_down(), in the same way that "interface" is not guaranteed. This has been true for as long as we've had SFPs that need to switch between MLO_AN_INBAND and MLO_AN_PHY - e.g. because the PHY doesn't supply in-band information. So, this has uncovered a latent bug in the Marvell DSA code - and that is that mac_config() needs to take care of the forcing state after completing its configuration as I suggested in my previous reply. There is also the question whether the automatic fetching of PHY status information by the hardware should be regarded as a form of in-band by phylink, even though it isn't true in-band - but from the software point of view, the PPU's automatic fetching is not materially different from what happens with SGMII. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:18 ` Russell King (Oracle) @ 2021-12-06 20:29 ` Vladimir Oltean 2021-12-07 14:09 ` Andrew Lunn 2021-12-06 21:44 ` Vladimir Oltean 1 sibling, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 20:29 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 08:18:30PM +0000, Russell King (Oracle) wrote: > Phylink does *not* guarantee that a call to mac_link_up() or > mac_config() will have the same "mode" as a preceeding call to > mac_link_down(), in the same way that "interface" is not guaranteed. > This has been true for as long as we've had SFPs that need to switch > between MLO_AN_INBAND and MLO_AN_PHY - e.g. because the PHY doesn't > supply in-band information. > > So, this has uncovered a latent bug in the Marvell DSA code - and > that is that mac_config() needs to take care of the forcing state > after completing its configuration as I suggested in my previous > reply. I can understand between MLO_AN_INBAND and MLO_AN_PHY, but isn't it reasonable that a "fixed" link is "fixed" and doesn't change? After all, it's in the name. The mv88e6xxx code doesn't appear necessarily problematic. This issue could crop up again and again with other drivers. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:29 ` Vladimir Oltean @ 2021-12-07 14:09 ` Andrew Lunn 0 siblings, 0 replies; 34+ messages in thread From: Andrew Lunn @ 2021-12-07 14:09 UTC (permalink / raw) To: Vladimir Oltean Cc: Russell King (Oracle), Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel > I can understand between MLO_AN_INBAND and MLO_AN_PHY, but isn't it > reasonable that a "fixed" link is "fixed" and doesn't change? Actually, it can. You can register a callback with fixed_phy_set_link_update() and it gets called on each mdio read. It is mainly there to change the link up/down status, but you can also change the speed. Andrew ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:18 ` Russell King (Oracle) 2021-12-06 20:29 ` Vladimir Oltean @ 2021-12-06 21:44 ` Vladimir Oltean 2021-12-06 22:13 ` Russell King (Oracle) 1 sibling, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 21:44 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 08:18:30PM +0000, Russell King (Oracle) wrote: > > If we're going to impersonate phylink we could at least provide the same > > arguments as phylink will. > > What is going on here in terms of impersonation is entirely reasonable. > > The only things in this respect that phylink guarantees are: > > 1) The MAC/PCS configuration will not be substantially reconfigured > unless a call to mac_link_down() was made if a call to mac_link_up() > was previously made. The wording here is unclear. Did you mean "When the MAC/PCS configuration is substantially reconfigured and the last call was a mac_link_up(), a follow-up call to mac_link_down() will also be made"? And what do you mean by "substantially reconfigured"? phylink_major_config called from the paths that aren't phylink_mac_initial_config (because that happens with no preceding call to either mac_link_down or mac_link_up), right? > 2) The arguments to mac_link_down() will be the same as the preceeding > mac_link_up() call - in other words, the "mode" and "interface". Does this imply that "there will always be a preceding mac_link_up to every mac_link_down call"? Because if it does imply that, DSA violates it. > Phylink does *not* guarantee that a call to mac_link_up() or > mac_config() will have the same "mode" as a preceeding call to > mac_link_down(), in the same way that "interface" is not guaranteed. > This has been true for as long as we've had SFPs that need to switch > between MLO_AN_INBAND and MLO_AN_PHY - e.g. because the PHY doesn't > supply in-band information. > > So, this has uncovered a latent bug in the Marvell DSA code - and > that is that mac_config() needs to take care of the forcing state > after completing its configuration as I suggested in my previous > reply. > > There is also the question whether the automatic fetching of PHY > status information by the hardware should be regarded as a form of > in-band by phylink, even though it isn't true in-band - but from > the software point of view, the PPU's automatic fetching is not > materially different from what happens with SGMII. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:44 ` Vladimir Oltean @ 2021-12-06 22:13 ` Russell King (Oracle) 0 siblings, 0 replies; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 22:13 UTC (permalink / raw) To: Vladimir Oltean Cc: Andrew Lunn, Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 11:44:28PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 08:18:30PM +0000, Russell King (Oracle) wrote: > > > If we're going to impersonate phylink we could at least provide the same > > > arguments as phylink will. > > > > What is going on here in terms of impersonation is entirely reasonable. > > > > The only things in this respect that phylink guarantees are: > > > > 1) The MAC/PCS configuration will not be substantially reconfigured > > unless a call to mac_link_down() was made if a call to mac_link_up() > > was previously made. > > The wording here is unclear. Did you mean "When the MAC/PCS configuration > is substantially reconfigured and the last call was a mac_link_up(), a > follow-up call to mac_link_down() will also be made"? > And what do you mean by "substantially reconfigured"? I mean what the documentation refers to as a "full initialisation" of the link, which happens if we have to change the interface mode or MLO_AN mode. Only minor changes (advertisements or pause modes if under manual control) will be changed without a prior call to mac_link_down(). For example, phylink will _never_ do: mac_link_up() mac_prepare() mac_config() pcs_config() mac_finish() mac_link_up() However, with legacy (pre-March 2020) users, it _may_ do: mac_link_up() mac_config() (e.g. changing the in-band advertisement) mac_an_restart() The problem here is the legacy stuff which clouds the picture by making extra calls to mac_config() when we're only changing things like the in-band advert. > phylink_major_config called from the paths that aren't phylink_mac_initial_config > (because that happens with no preceding call to either mac_link_down or > mac_link_up), right? Where we call phylink_major_config(), we ensure that if we know the link was previously up, we make a call to mac_link_down() first. > > 2) The arguments to mac_link_down() will be the same as the preceeding > > mac_link_up() call - in other words, the "mode" and "interface". > > Does this imply that "there will always be a preceding mac_link_up to > every mac_link_down call"? From the design of phylink, yes it does, since phylink assumes that the link is initially down. > Because if it does imply that, DSA violates it. Yes, DSA violates that, but DSA is free to do that if it makes sense, and from what I understand of DSA, this is a special case. From what I've seen with Marvell DSA, they start off auto-configuring the inter-switch and CPU ports in link-up mode, whereas phylink assumes that the link is down. When DSA sets stuff up and brings up the CPU port, the very first thing that phylink does is reconfigure the port - but the port is in link-up mode, and that can mess up the port. So Andrew introduced this to fix it. See commit 3be98b2d5fbc. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 19:37 ` Vladimir Oltean 2021-12-06 19:53 ` Andrew Lunn @ 2021-12-06 20:07 ` Russell King (Oracle) 2021-12-06 20:23 ` Vladimir Oltean 1 sibling, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 20:07 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 09:37:30PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 07:24:56PM +0000, Martyn Welch wrote: > > Yes, that appears to also make it work. > > > > Martyn > > Well, I just pointed out what the problem is, I don't know how to solve > it, honest! :) > > It's clear that the code is wrong, because it's in an "if" block that > checks for "of_phy_is_fixed_link(dp->dn) || phy_np" but then it omits > the "phy_np" part of it. On the other hand we can't just go ahead and > say "if (phy_np) mode = MLO_AN_PHY; else mode = MLO_AN_FIXED;" because > MLO_AN_INBAND is also a valid option that we may be omitting. So we'd > have to duplicate part of the logic from phylink_parse_mode(), which > does not appear ideal at all. What would be ideal is if this fabricated > phylink call would not be done at all, but I don't know enough about the > systems that need it, I expect Andrew knows more. It's needed because otherwise we end up reconfiguring a CPU link while it is still "up". Phylink's initial mode is "link down" and the phylink design is such that it guarantees that we will not call mac_link_down() unless the link was previously up. This is true of all network drivers, but was found to be false for some DSA drivers, and some DSA broke as a result. My conclusion from having read this thread is the CPU port is using PPU polling, meaning that in mac_link_up(): if ((!mv88e6xxx_phy_is_internal(ds, port) && !mv88e6xxx_port_ppu_updates(chip, port)) || mode == MLO_AN_FIXED) { is false - because mv88e6xxx_port_ppu_updates() returns true, and consequently we never undo this force-down. On Marvell hardware, the PPU updating effectively makes it appear as an in-band link as the hardware fetches the PHY status to the MAC by polling the PHY. What we do elsewhere is we handle the link-up-down-forcing in mac_config appropriate to the mode, and we already do that in Marvell DSA: /* Undo the forced down state above after completing configuration * irrespective of its state on entry, which allows the link to come up. */ if (mode == MLO_AN_INBAND && p->interface != state->interface && chip->info->ops->port_set_link) chip->info->ops->port_set_link(chip, port, LINK_UNFORCED); I'm thinking this needs set LINK_UNFORCED here if the PPU is polling and we're in MLO_AN_PHY mode: if (((mode == MLO_AN_INBAND && p->interface != state->interface) || (mode == MLO_AN_PHY && mv88e6xxx_port_ppu_updates(chip, port))) && chip->info->ops->port_set_link) chip->info->ops->port_set_link(chip, port, LINK_UNFORCED); What I don't know is whether that mv88e6xxx_port_ppu_updates() should be mv88e6xxx_phy_is_internal() || mv88e6xxx_port_ppu_updates(), at which point introducing a helper for that would probably be a good idea, or it might be simpler to use the new phylink_get_caps() ability to set phylink_config.ovr_an_inband for the internal/ppu polled ports if they're not operating in fixed link mode. This is essentially what is going on here - the PPU polling is just a way of replicating SGMII's propagation of the PHY status to the MAC. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:07 ` Russell King (Oracle) @ 2021-12-06 20:23 ` Vladimir Oltean 2021-12-06 20:51 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 20:23 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 08:07:45PM +0000, Russell King (Oracle) wrote: > My conclusion from having read this thread is the CPU port is using PPU > polling, meaning that in mac_link_up(): > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > !mv88e6xxx_port_ppu_updates(chip, port)) || > mode == MLO_AN_FIXED) { > > is false - because mv88e6xxx_port_ppu_updates() returns true, and > consequently we never undo this force-down. We know that 1. A == mv88e6xxx_phy_is_internal(ds, port), B == mv88e6xxx_port_ppu_updates(chip, port), C == mode == MLO_AN_FIXED 2. (!A && !B) || C == false. This is due to the effect we observe: link is not forced up 2. C == false. This is due to the device tree. 3. !A && !B == false. This is due to statement (2), plus the rule that if X || Y == false and Y == false, then X must also be false. 4. We know that A is true, again due to device tree: port 4 < .num_internal_phys for MV88E6240 which is 5. 5. !A is false, due to 4. So we have: false && !B == false. Therefore "!B" is "don't care". In other words we don't know whether mv88e6xxx_port_ppu_updates() is true or not. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:23 ` Vladimir Oltean @ 2021-12-06 20:51 ` Russell King (Oracle) 2021-12-06 21:13 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 20:51 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 10:23:08PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 08:07:45PM +0000, Russell King (Oracle) wrote: > > My conclusion from having read this thread is the CPU port is using PPU > > polling, meaning that in mac_link_up(): > > > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > > !mv88e6xxx_port_ppu_updates(chip, port)) || > > mode == MLO_AN_FIXED) { > > > > is false - because mv88e6xxx_port_ppu_updates() returns true, and > > consequently we never undo this force-down. > > We know that > 1. A == mv88e6xxx_phy_is_internal(ds, port), B == mv88e6xxx_port_ppu_updates(chip, port), C == mode == MLO_AN_FIXED > 2. (!A && !B) || C == false. This is due to the effect we observe: link is not forced up > 2. C == false. This is due to the device tree. > 3. !A && !B == false. This is due to statement (2), plus the rule that if X || Y == false and Y == false, then X must also be false. > 4. We know that A is true, again due to device tree: port 4 < .num_internal_phys for MV88E6240 which is 5. > 5. !A is false, due to 4. > > So we have: > > false && !B == false. > > Therefore "!B" is "don't care". In other words we don't know whether > mv88e6xxx_port_ppu_updates() is true or not. With a bit of knowledge of how Marvell DSA switches work... The "ppu" is the PHY polling unit. When the switch comes out of reset, the PPU probes the MDIO bus, and sets the bit in the port status register depending on whether it detects a PHY at the port address by way of the PHY ID values. This bit is used to enable polling of the PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be set for all internal PHYs unless we explicitly turn it off (we don't.) Therefore, this is a reasonable assumption to make. So, given that mv88e6xxx_port_ppu_updates() is most likely true as I stated, it is also true that mv88e6xxx_phy_is_internal() is "don't care". -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 20:51 ` Russell King (Oracle) @ 2021-12-06 21:13 ` Vladimir Oltean 2021-12-06 21:27 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 21:13 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote: > On Mon, Dec 06, 2021 at 10:23:08PM +0200, Vladimir Oltean wrote: > > On Mon, Dec 06, 2021 at 08:07:45PM +0000, Russell King (Oracle) wrote: > > > My conclusion from having read this thread is the CPU port is using PPU > > > polling, meaning that in mac_link_up(): > > > > > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > > > !mv88e6xxx_port_ppu_updates(chip, port)) || > > > mode == MLO_AN_FIXED) { > > > > > > is false - because mv88e6xxx_port_ppu_updates() returns true, and > > > consequently we never undo this force-down. > > > > We know that > > 1. A == mv88e6xxx_phy_is_internal(ds, port), B == mv88e6xxx_port_ppu_updates(chip, port), C == mode == MLO_AN_FIXED > > 2. (!A && !B) || C == false. This is due to the effect we observe: link is not forced up > > 2. C == false. This is due to the device tree. > > 3. !A && !B == false. This is due to statement (2), plus the rule that if X || Y == false and Y == false, then X must also be false. > > 4. We know that A is true, again due to device tree: port 4 < .num_internal_phys for MV88E6240 which is 5. > > 5. !A is false, due to 4. > > > > So we have: > > > > false && !B == false. > > > > Therefore "!B" is "don't care". In other words we don't know whether > > mv88e6xxx_port_ppu_updates() is true or not. > > With a bit of knowledge of how Marvell DSA switches work... > > The "ppu" is the PHY polling unit. When the switch comes out of reset, > the PPU probes the MDIO bus, and sets the bit in the port status > register depending on whether it detects a PHY at the port address by > way of the PHY ID values. This bit is used to enable polling of the > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be > set for all internal PHYs unless we explicitly turn it off (we don't.) > Therefore, this is a reasonable assumption to make. > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as > I stated, it is also true that mv88e6xxx_phy_is_internal() is > "don't care". And the reason why you bring the PPU into the discussion is because? If the issue manifests itself with or without it, and you come up with a proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is used, doesn't that, logically speaking, still leave the issue unsolved if the PPU is _not_ used for whatever reason? The bug has nothing to do with the PPU. It can be solved by checking for PPU in-band status as you say. Maybe. But I've got no idea why we don't address the elephant in the room, which is in dsa_port_link_register_of()? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:13 ` Vladimir Oltean @ 2021-12-06 21:27 ` Russell King (Oracle) 2021-12-06 21:49 ` Russell King (Oracle) 2021-12-06 21:51 ` Vladimir Oltean 0 siblings, 2 replies; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 21:27 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote: > > With a bit of knowledge of how Marvell DSA switches work... > > > > The "ppu" is the PHY polling unit. When the switch comes out of reset, > > the PPU probes the MDIO bus, and sets the bit in the port status > > register depending on whether it detects a PHY at the port address by > > way of the PHY ID values. This bit is used to enable polling of the > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be > > set for all internal PHYs unless we explicitly turn it off (we don't.) > > Therefore, this is a reasonable assumption to make. > > > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as > > I stated, it is also true that mv88e6xxx_phy_is_internal() is > > "don't care". > > And the reason why you bring the PPU into the discussion is because? > If the issue manifests itself with or without it, and you come up with a > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is > used, doesn't that, logically speaking, still leave the issue unsolved > if the PPU is _not_ used for whatever reason? > The bug has nothing to do with the PPU. It can be solved by checking for > PPU in-band status as you say. Maybe. But I've got no idea why we don't > address the elephant in the room, which is in dsa_port_link_register_of()? I think I've covered that in the other sub-thread. It could be that a previous configuration left the port forced down. For example, if one were to kexec from one kernel that uses a fixed-link that forced the link down, into the same kernel with a different DT that uses PHY mode. The old kernel may have called mac_link_down(MLO_AN_FIXED), and the new kernel wouldn't know that. It comes along, and goes through the configuration process and calls mac_link_up(MLO_AN_PHY)... and from what you're suggesting, because these two calls use different MLO_AN_xxx constants that's a bug. An alternative: the hardware boots up with the link forced down. The boot loader doesn't touch it. The kernel boots and calls mac_link_up(MLO_AN_PHY). This all works as expected with e.g. mvneta. It doesn't work with Marvell DSA because we have all these additional extra exceptional cases to deal with the PPU (which is what _actually_ transfers the PHY status to the port registers for all PHYs.) We used to just rely on the PPU bit for making the decision, but when I introduced that helper, I forgot that the PPU bit doesn't exist on the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at 4a3e0aeddf09, I now believe the fix there to be wrong. It should have made mv88e6xxx_port_ppu_updates() follow mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family that has the link status bit in that position, especially as one can disable the PPU bit in DSA switches such as 6390, which for some ports stops the PHY being used and switches the port to serdes mode. "Internal" ports aren't always internal on these switches. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:27 ` Russell King (Oracle) @ 2021-12-06 21:49 ` Russell King (Oracle) 2021-12-06 23:27 ` Vladimir Oltean 2021-12-06 21:51 ` Vladimir Oltean 1 sibling, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 21:49 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > We used to just rely on the PPU bit for making the decision, but when > I introduced that helper, I forgot that the PPU bit doesn't exist on > the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at > 4a3e0aeddf09, I now believe the fix there to be wrong. It should > have made mv88e6xxx_port_ppu_updates() follow > mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family > that has the link status bit in that position, especially as one can > disable the PPU bit in DSA switches such as 6390, which for some ports > stops the PHY being used and switches the port to serdes mode. > "Internal" ports aren't always internal on these switches. Here's the situation I'm concerned about. The 88E6390X has two serdes each with four lanes. Let's just think about one serdes. Lane 0 is assigned to port 9 and lane 1 to port 4. We don't need to consider any others. If the PHY_DETECT bit (effectively PPU poll enable) is set for port 4, which is an "internal" port, then the port is in auto-media mode, and the PPU will poll the internal PHY and the serdes, and configure according to which has link. If the PPU bit is clear, then the port is forced to serdes mode. However, in this configuration, we end up with: mv88e6xxx_phy_is_internal(ds, port) = true mv88e6xxx_port_ppu_updates(chip, port) = false which results in: if ((!mv88e6xxx_phy_is_internal(ds, port) && !mv88e6xxx_port_ppu_updates(chip, port)) || mode == MLO_AN_FIXED) { being false since we have (!true && !false) || false. So, in actual fact, when we have a PHY_DETECT bit, we _do_ need to take note of it whether the port is "internal" or not. Essentially, that means that for DSA switches that are not part of the 6250, we should be using the PHY_DETECT bit. For the 6250 family, the problem is that there's no PHY_DETECT bit, and that's the link status. So I've started a separate discussion with Maarten to find out which Marvell switch is being used and whether an alterative approach would work for him. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:49 ` Russell King (Oracle) @ 2021-12-06 23:27 ` Vladimir Oltean 2021-12-07 0:58 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 23:27 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 09:49:37PM +0000, Russell King (Oracle) wrote: > On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > > We used to just rely on the PPU bit for making the decision, but when > > I introduced that helper, I forgot that the PPU bit doesn't exist on > > the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at > > 4a3e0aeddf09, I now believe the fix there to be wrong. It should > > have made mv88e6xxx_port_ppu_updates() follow > > mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family > > that has the link status bit in that position, especially as one can > > disable the PPU bit in DSA switches such as 6390, which for some ports > > stops the PHY being used and switches the port to serdes mode. > > "Internal" ports aren't always internal on these switches. > > Here's the situation I'm concerned about. The 88E6390X has two serdes > each with four lanes. Let's just think about one serdes. Lane 0 is > assigned to port 9 and lane 1 to port 4. We don't need to consider > any others. > > If the PHY_DETECT bit (effectively PPU poll enable) is set for port 4, > which is an "internal" port, then the port is in auto-media mode, and > the PPU will poll the internal PHY and the serdes, and configure > according to which has link. > > If the PPU bit is clear, then the port is forced to serdes mode. > However, in this configuration, we end up with: > > mv88e6xxx_phy_is_internal(ds, port) = true > mv88e6xxx_port_ppu_updates(chip, port) = false > > which results in: > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > !mv88e6xxx_port_ppu_updates(chip, port)) || > mode == MLO_AN_FIXED) { > > being false since we have (!true && !false) || false. So, in actual > fact, when we have a PHY_DETECT bit, we _do_ need to take note of it > whether the port is "internal" or not. Essentially, that means that > for DSA switches that are not part of the 6250, we should be using > the PHY_DETECT bit. > > For the 6250 family, the problem is that there's no PHY_DETECT bit, > and that's the link status. So I've started a separate discussion > with Maarten to find out which Marvell switch is being used and > whether an alterative approach would work for him. I hope that you understand that it is getting very difficult for me to follow, especially when faced with contradictory information about hardware that I am not familiar with, and which may well be very different from one family to another for all I know. Commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") says nothing about the 6250, and I have nothing through which I can cross-check your statement about the change only being applicable/needed for 6250. The documentation I happen to have access to, which is for the 6097, says about "Forcing Link, Speed, Duplex in the MAC": | These bits change the port's MAC mode only! It does not change the mode | of the PHY for ports where a PHY is connected. These bits are intended | to be used for the following situations only: | | - When the PHY Polling Unit (PPU) is disabled on the port (PHYDetect | equal to zero in Port offset 0x00) and software needs to copy the | PHY’s Link, Speed and Duplex values to the port’s MAC (this is not | required for internal PHYs as this information is communicated between | the PHY and MAC even if the PPU is disabled on the port). | | - When no PHY is connected to the port. This includes ports that connect | to a CPU (typically using a digital interface like MII or GMII) and | ports connected to another switch device (typically using a SERDES | interface). SERDES ports connected to a fiber module will get their | Link from the port’s SDET pin and its Speed and Duplex is set to | 1000BASE full-duplex (assuming the Px_MODE has been set correctly – | see the C_Mode bits, Port offset 0x00). So the first paragraph is pretty clear to me: the PPU could be disabled (PHY_DETECT bit unset) and yet there would still be no reason to force the link for internal PHY ports. So the change still makes some level of sense to me, even if not for the cited reason from the commit message. As for your auto-media comment, you say that on 6390X, port 4 is an auto-media port only if the PPU is enabled - otherwise it falls back to SERDES mode and not to internal PHY mode. Again, no way to cross-check, but so be it. The problem that you cite here is that we are in SERDES mode with PPU disabled, and that we should* (should we?) force the link, yet we don't, because the mv88e6xxx_phy_is_internal() function only conveys static information that doesn't properly reflect the current state of an auto-media port. The question is: did this use to work properly before the commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on internal PHY's") that you mention? This is the same as asking: if the PPU is disabled on an auto media port, the old code (via your commit 5d5b231da7ac ("net: dsa: mv88e6xxx: use PHY_DETECT in mac_link_up/mac_link_down")) would always force the link. Is that ok for an internal PHY port? Is it ok for a fiber port with clause 37 in-band autoneg? More below. * would it not matter whether this SERDES port is used in SGMII vs 1000base-X mode? According to the documentation I have access to, only SGMII would need forcing without a PPU - see second quoted paragraph. Anyway, so be it. Essentially, what is the most frustrating is that I haven't been doing anything else for the past 4 hours, and I still haven't understood enough of what you're saying to even understand why it is _relevant_ to Martyn's report. All I understand is that you're also looking in that area of the code for a completely unrelated reason which you've found adequate to mention here and now. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 23:27 ` Vladimir Oltean @ 2021-12-07 0:58 ` Russell King (Oracle) 2021-12-07 13:24 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-07 0:58 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 01:27:35AM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 09:49:37PM +0000, Russell King (Oracle) wrote: > > On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > > > We used to just rely on the PPU bit for making the decision, but when > > > I introduced that helper, I forgot that the PPU bit doesn't exist on > > > the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at > > > 4a3e0aeddf09, I now believe the fix there to be wrong. It should > > > have made mv88e6xxx_port_ppu_updates() follow > > > mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family > > > that has the link status bit in that position, especially as one can > > > disable the PPU bit in DSA switches such as 6390, which for some ports > > > stops the PHY being used and switches the port to serdes mode. > > > "Internal" ports aren't always internal on these switches. > > > > Here's the situation I'm concerned about. The 88E6390X has two serdes > > each with four lanes. Let's just think about one serdes. Lane 0 is > > assigned to port 9 and lane 1 to port 4. We don't need to consider > > any others. > > > > If the PHY_DETECT bit (effectively PPU poll enable) is set for port 4, > > which is an "internal" port, then the port is in auto-media mode, and > > the PPU will poll the internal PHY and the serdes, and configure > > according to which has link. > > > > If the PPU bit is clear, then the port is forced to serdes mode. > > However, in this configuration, we end up with: > > > > mv88e6xxx_phy_is_internal(ds, port) = true > > mv88e6xxx_port_ppu_updates(chip, port) = false > > > > which results in: > > > > if ((!mv88e6xxx_phy_is_internal(ds, port) && > > !mv88e6xxx_port_ppu_updates(chip, port)) || > > mode == MLO_AN_FIXED) { > > > > being false since we have (!true && !false) || false. So, in actual > > fact, when we have a PHY_DETECT bit, we _do_ need to take note of it > > whether the port is "internal" or not. Essentially, that means that > > for DSA switches that are not part of the 6250, we should be using > > the PHY_DETECT bit. > > > > For the 6250 family, the problem is that there's no PHY_DETECT bit, > > and that's the link status. So I've started a separate discussion > > with Maarten to find out which Marvell switch is being used and > > whether an alterative approach would work for him. > > I hope that you understand that it is getting very difficult for me to > follow, especially when faced with contradictory information about > hardware that I am not familiar with, and which may well be very > different from one family to another for all I know. > > Commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use PHY_DETECT on > internal PHY's") says nothing about the 6250, and I have nothing through > which I can cross-check your statement about the change only being > applicable/needed for 6250. > > The documentation I happen to have access to, which is for the 6097, > says about "Forcing Link, Speed, Duplex in the MAC": > > | These bits change the port's MAC mode only! It does not change the mode > | of the PHY for ports where a PHY is connected. These bits are intended > | to be used for the following situations only: > | > | - When the PHY Polling Unit (PPU) is disabled on the port (PHYDetect > | equal to zero in Port offset 0x00) and software needs to copy the > | PHY’s Link, Speed and Duplex values to the port’s MAC (this is not > | required for internal PHYs as this information is communicated between > | the PHY and MAC even if the PPU is disabled on the port). > | > | - When no PHY is connected to the port. This includes ports that connect > | to a CPU (typically using a digital interface like MII or GMII) and > | ports connected to another switch device (typically using a SERDES > | interface). SERDES ports connected to a fiber module will get their > | Link from the port’s SDET pin and its Speed and Duplex is set to > | 1000BASE full-duplex (assuming the Px_MODE has been set correctly – > | see the C_Mode bits, Port offset 0x00). > > So the first paragraph is pretty clear to me: the PPU could be disabled > (PHY_DETECT bit unset) and yet there would still be no reason to force > the link for internal PHY ports. So the change still makes some level of > sense to me, even if not for the cited reason from the commit message. > > As for your auto-media comment, you say that on 6390X, port 4 is an > auto-media port only if the PPU is enabled - otherwise it falls back to > SERDES mode and not to internal PHY mode. Again, no way to cross-check, > but so be it. The problem that you cite here is that we are in SERDES > mode with PPU disabled, and that we should* (should we?) force the link, > yet we don't, because the mv88e6xxx_phy_is_internal() function only > conveys static information that doesn't properly reflect the current > state of an auto-media port. The question is: did this use to work > properly before the commit 4a3e0aeddf09 ("net: dsa: mv88e6xxx: don't use > PHY_DETECT on internal PHY's") that you mention? This is the same as > asking: if the PPU is disabled on an auto media port, the old code (via > your commit 5d5b231da7ac ("net: dsa: mv88e6xxx: use PHY_DETECT in > mac_link_up/mac_link_down")) would always force the link. Is that ok for > an internal PHY port? Is it ok for a fiber port with clause 37 in-band > autoneg? More below. > > * would it not matter whether this SERDES port is used in SGMII vs > 1000base-X mode? According to the documentation I have access to, only > SGMII would need forcing without a PPU - see second quoted paragraph. > > Anyway, so be it. Essentially, what is the most frustrating is that I > haven't been doing anything else for the past 4 hours, and I still > haven't understood enough of what you're saying to even understand why > it is _relevant_ to Martyn's report. All I understand is that you're > also looking in that area of the code for a completely unrelated reason > which you've found adequate to mention here and now. I hope you realise that _your_ attitude here is frustrating and inflamatory. This is _not_ a "completely unrelated reason" - this is about getting the right solution to Martyn's problem. I thought about doing another detailed reply, but when I got to the part about you wanting to check 6390X, I discarded that reply and wrote this one instead. You clearly have a total distrust for everything that I write in any email, so I just won't bother. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-07 0:58 ` Russell King (Oracle) @ 2021-12-07 13:24 ` Vladimir Oltean 2021-12-07 13:59 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-07 13:24 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 12:58:20AM +0000, Russell King (Oracle) wrote: > > Anyway, so be it. Essentially, what is the most frustrating is that I > > haven't been doing anything else for the past 4 hours, and I still > > haven't understood enough of what you're saying to even understand why > > it is _relevant_ to Martyn's report. All I understand is that you're > > also looking in that area of the code for a completely unrelated reason > > which you've found adequate to mention here and now. > > I hope you realise that _your_ attitude here is frustrating and > inflamatory. This is _not_ a "completely unrelated reason" - this > is about getting the right solution to Martyn's problem. I thought > about doing another detailed reply, but when I got to the part > about you wanting to check 6390X, I discarded that reply and > wrote this one instead. You clearly have a total distrust for > everything that I write in any email, so I just won't bother. I think getting the right solution to Martyn's problem is the most important part. I'd be very happy if I understood it too, although that isn't mandatory, since it may be beyond me but still correct, and I hope I'm not standing in the way if that is the case. I've explained my current state of understanding, which is that I saw in the manual for 6097 that forcing the link is unnecessary for internal PHY ports regardless of whether the PPU is enabled or not. Yet, DSA will still force these ports down with your proposed change (because DSA lies that it is a MLO_AN_FIXED mode despite what's in the device tree), and it relies on unforcing them in mv88e6xxx_mac_config(), which in itself makes sense considering the other discussion about kexec and such. But it appears that it may not unforce them again - because that condition depends on mv88e6xxx_port_ppu_updates() which may be false. So if it is false, things are still broken. It isn't a matter of distrust and I'm sorry that you take it personal, but your proposed solution doesn't appear to me to have a clear correlation to the patch that introduced a regression. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-07 13:24 ` Vladimir Oltean @ 2021-12-07 13:59 ` Russell King (Oracle) 2021-12-07 14:37 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-07 13:59 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 03:24:13PM +0200, Vladimir Oltean wrote: > On Tue, Dec 07, 2021 at 12:58:20AM +0000, Russell King (Oracle) wrote: > > > Anyway, so be it. Essentially, what is the most frustrating is that I > > > haven't been doing anything else for the past 4 hours, and I still > > > haven't understood enough of what you're saying to even understand why > > > it is _relevant_ to Martyn's report. All I understand is that you're > > > also looking in that area of the code for a completely unrelated reason > > > which you've found adequate to mention here and now. > > > > I hope you realise that _your_ attitude here is frustrating and > > inflamatory. This is _not_ a "completely unrelated reason" - this > > is about getting the right solution to Martyn's problem. I thought > > about doing another detailed reply, but when I got to the part > > about you wanting to check 6390X, I discarded that reply and > > wrote this one instead. You clearly have a total distrust for > > everything that I write in any email, so I just won't bother. > > I think getting the right solution to Martyn's problem is the most > important part. I'd be very happy if I understood it too, although that > isn't mandatory, since it may be beyond me but still correct, and I hope > I'm not standing in the way if that is the case. > I've explained my current state of understanding, which is that I saw in > the manual for 6097 that forcing the link is unnecessary for internal > PHY ports regardless of whether the PPU is enabled or not. Yet, DSA will > still force these ports down with your proposed change (because DSA lies > that it is a MLO_AN_FIXED mode despite what's in the device tree), and > it relies on unforcing them in mv88e6xxx_mac_config(), which in itself > makes sense considering the other discussion about kexec and such. But > it appears that it may not unforce them again - because that condition > depends on mv88e6xxx_port_ppu_updates() which may be false. So if it is > false, things are still broken. > It isn't a matter of distrust and I'm sorry that you take it personal, > but your proposed solution doesn't appear to me to have a clear > correlation to the patch that introduced a regression. The change that I have proposed is based on two patches: 1) Change mv88e6xxx_port_ppu_updates() to behave sensibly for the 6250 family of DSA switches. 2) Un-force the link-down in mv88e6xxx_mac_config() This gives us more consistency. The checks become: a) mac_link_down(): if ((!mv88e6xxx_port_ppu_updates(chip, port) || mode == MLO_AN_FIXED) && ops->port_sync_link) err = ops->port_sync_link(chip, port, mode, false); b) mac_link_up(): if (!mv88e6xxx_port_ppu_updates(chip, port) || mode == MLO_AN_FIXED) { ... if (ops->port_sync_link) err = ops->port_sync_link(chip, port, mode, true); } c) mac_config(): if (chip->info->ops->port_set_link && ((mode == MLO_AN_INBAND && p->interface != state->interface) || (mode == MLO_AN_PHY && mv88e6xxx_port_ppu_updates(chip, port)))) chip->info->ops->port_set_link(chip, port, LINK_UNFORCED); The "(mode == MLO_AN_INBAND && p->interface != state->interface)" expression comes from the existing code, so isn't something that needs discussion. We want to preserve the state of the link-force for MLO_AN_FIXED, so the only case for discussion is the new MLO_AN_PHY. It can be seen that all three methods above end up using the same decision, and essentially if mv88e6xxx_port_ppu_updates() is true, meaning there is a non-software method to handle the status updates, then we (a) don't do any forcing in the mac_link_*() methods and turn off the forcing in mac_config(). -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-07 13:59 ` Russell King (Oracle) @ 2021-12-07 14:37 ` Vladimir Oltean 2021-12-07 14:53 ` Russell King (Oracle) 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-07 14:37 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 01:59:03PM +0000, Russell King (Oracle) wrote: > The change that I have proposed is based on two patches: > > 1) Change mv88e6xxx_port_ppu_updates() to behave sensibly for the 6250 > family of DSA switches. > > 2) Un-force the link-down in mv88e6xxx_mac_config() > > This gives us more consistency. The checks become: > > a) mac_link_down(): > > if ((!mv88e6xxx_port_ppu_updates(chip, port) || > mode == MLO_AN_FIXED) && ops->port_sync_link) > err = ops->port_sync_link(chip, port, mode, false); > > b) mac_link_up(): > > if (!mv88e6xxx_port_ppu_updates(chip, port) || > mode == MLO_AN_FIXED) { > ... > if (ops->port_sync_link) > err = ops->port_sync_link(chip, port, mode, true); > } > > c) mac_config(): > > if (chip->info->ops->port_set_link && > ((mode == MLO_AN_INBAND && p->interface != state->interface) || > (mode == MLO_AN_PHY && mv88e6xxx_port_ppu_updates(chip, port)))) > chip->info->ops->port_set_link(chip, port, LINK_UNFORCED); > > The "(mode == MLO_AN_INBAND && p->interface != state->interface)" > expression comes from the existing code, so isn't something that > needs discussion. > > We want to preserve the state of the link-force for MLO_AN_FIXED, > so the only case for discussion is the new MLO_AN_PHY. It can be > seen that all three methods above end up using the same decision, > and essentially if mv88e6xxx_port_ppu_updates() is true, meaning > there is a non-software method to handle the status updates, then > we (a) don't do any forcing in the mac_link_*() methods and turn > off the forcing in mac_config(). Ok, so if we are in: - MLO_AN_PHY (internal or external) and PPU enabled, we're letting the link unforced in mac_config - MLO_AN_PHY (internal or external) and PPU disabled, we're forcing the link up in mac_link_up, via the "!mv88e6xxx_port_ppu_updates()" check. - MLO_AN_FIXED, we're also forcing the link up in mac_link_up, via the "mode == MLO_AN_FIXED" check. So in Martyn's case, a CPU port internal PHY link which is forced down in mac_link_down would either be forced up or unforced, but never be left the way in which mac_link_down put it. That is good and appears robust. Although it isn't strictly true that "[ if ] there is a non-software method to handle the status updates, then we don't do any forcing in the mac_link_*() methods", because technically we still do in mac_link_down, due to the " || mode == MLO_AN_FIXED" condition plus the way in which we are called, it's just that we undo it later. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-07 14:37 ` Vladimir Oltean @ 2021-12-07 14:53 ` Russell King (Oracle) 0 siblings, 0 replies; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-07 14:53 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 04:37:05PM +0200, Vladimir Oltean wrote: > Although it isn't strictly true that "[ if ] there is a non-software > method to handle the status updates, then we don't do any forcing in > the mac_link_*() methods", because technically we still do in > mac_link_down, due to the " || mode == MLO_AN_FIXED" condition plus the > way in which we are called, it's just that we undo it later. Fixed links are a "software method" as Andrew has already pointed out - there is a software hook that can be used to retrieve the link status, potentially also updating the other link parameters as well. Fixed links have never been "these are the only parameters you will ever get and the link will always be forced up". That is not new. Have a look at drivers/net/phy/fixed_phy.c and its link_update() method, which can change any parameter of the fixed-link it desires. There is also fixed_phy_change_carrier() which allows one to force the link state of a fixed-link. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:27 ` Russell King (Oracle) 2021-12-06 21:49 ` Russell King (Oracle) @ 2021-12-06 21:51 ` Vladimir Oltean 2021-12-06 22:17 ` Andrew Lunn 2021-12-06 22:22 ` Russell King (Oracle) 1 sibling, 2 replies; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 21:51 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote: > > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote: > > > With a bit of knowledge of how Marvell DSA switches work... > > > > > > The "ppu" is the PHY polling unit. When the switch comes out of reset, > > > the PPU probes the MDIO bus, and sets the bit in the port status > > > register depending on whether it detects a PHY at the port address by > > > way of the PHY ID values. This bit is used to enable polling of the > > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be > > > set for all internal PHYs unless we explicitly turn it off (we don't.) > > > Therefore, this is a reasonable assumption to make. > > > > > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as > > > I stated, it is also true that mv88e6xxx_phy_is_internal() is > > > "don't care". > > > > And the reason why you bring the PPU into the discussion is because? > > If the issue manifests itself with or without it, and you come up with a > > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is > > used, doesn't that, logically speaking, still leave the issue unsolved > > if the PPU is _not_ used for whatever reason? > > The bug has nothing to do with the PPU. It can be solved by checking for > > PPU in-band status as you say. Maybe. But I've got no idea why we don't > > address the elephant in the room, which is in dsa_port_link_register_of()? > > I think I've covered that in the other sub-thread. > > It could be that a previous configuration left the port forced down. > For example, if one were to kexec from one kernel that uses a > fixed-link that forced the link down, into the same kernel with a > different DT that uses PHY mode. > > The old kernel may have called mac_link_down(MLO_AN_FIXED), and the > new kernel wouldn't know that. It comes along, and goes through the > configuration process and calls mac_link_up(MLO_AN_PHY)... and from > what you're suggesting, because these two calls use different MLO_AN_xxx > constants that's a bug. Indeed I don't have detailed knowledge of Marvell hardware, but I'm surprised to see kexec being mentioned here as a potential source of configurations which the driver does not expect to handle. My belief was that kexec's requirements would be just to silence the device sufficiently such that it doesn't cause any surprises when things such interrupts are enabled (DMA isn't relevant for DSA switches). It wouldn't be responsible for leaving the hardware in any other state otherwise. I see this logic in the driver, does it not take care of bringing the ports to a known state, regardless of what a previous boot stage may have done? static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip) { int err; err = mv88e6xxx_disable_ports(chip); if (err) return err; mv88e6xxx_hardware_reset(chip); return mv88e6xxx_software_reset(chip); } So unless I'm fooled by mentally putting an equality sign between mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel may have done, I don't think at all that the two cases are comparable: kexec and a previous call to mv88e6xxx_mac_link_down() initiated by dsa_port_link_register_of() from this kernel. > > An alternative: the hardware boots up with the link forced down. The > boot loader doesn't touch it. The kernel boots and calls > mac_link_up(MLO_AN_PHY). Again, in my simplistic view, the switch reset deals with this too. Maybe I'm wrong. > This all works as expected with e.g. mvneta. It doesn't work with > Marvell DSA because we have all these additional extra exceptional > cases to deal with the PPU (which is what _actually_ transfers the > PHY status to the port registers for all PHYs.) > > We used to just rely on the PPU bit for making the decision, but when > I introduced that helper, I forgot that the PPU bit doesn't exist on > the 6250 family, which resulted in commit 4a3e0aeddf09. Looking at > 4a3e0aeddf09, I now believe the fix there to be wrong. It should > have made mv88e6xxx_port_ppu_updates() follow > mv88e6xxx_phy_is_internal() for internal ports only for the 6250 family > that has the link status bit in that position, especially as one can > disable the PPU bit in DSA switches such as 6390, which for some ports > stops the PHY being used and switches the port to serdes mode. > "Internal" ports aren't always internal on these switches. > > -- > RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:51 ` Vladimir Oltean @ 2021-12-06 22:17 ` Andrew Lunn 2021-12-06 22:22 ` Russell King (Oracle) 1 sibling, 0 replies; 34+ messages in thread From: Andrew Lunn @ 2021-12-06 22:17 UTC (permalink / raw) To: Vladimir Oltean Cc: Russell King (Oracle), Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel > static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip) > { > int err; > > err = mv88e6xxx_disable_ports(chip); > if (err) > return err; > > mv88e6xxx_hardware_reset(chip); > > return mv88e6xxx_software_reset(chip); > } > > So unless I'm fooled by mentally putting an equality sign between > mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel > may have done A software reset resets the queue controllers and a few other things. It does not touch the contents of most registers. A hardware reset, using a GPIO to the reset pin, will reset all registers. The switch will then read the optional PROM, and that could set some registers. But hardware reset is not supported by most boards. Andrew ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 21:51 ` Vladimir Oltean 2021-12-06 22:17 ` Andrew Lunn @ 2021-12-06 22:22 ` Russell King (Oracle) 2021-12-06 23:44 ` Vladimir Oltean 1 sibling, 1 reply; 34+ messages in thread From: Russell King (Oracle) @ 2021-12-06 22:22 UTC (permalink / raw) To: Vladimir Oltean Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 11:51:39PM +0200, Vladimir Oltean wrote: > On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > > On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote: > > > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote: > > > > With a bit of knowledge of how Marvell DSA switches work... > > > > > > > > The "ppu" is the PHY polling unit. When the switch comes out of reset, > > > > the PPU probes the MDIO bus, and sets the bit in the port status > > > > register depending on whether it detects a PHY at the port address by > > > > way of the PHY ID values. This bit is used to enable polling of the > > > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be > > > > set for all internal PHYs unless we explicitly turn it off (we don't.) > > > > Therefore, this is a reasonable assumption to make. > > > > > > > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as > > > > I stated, it is also true that mv88e6xxx_phy_is_internal() is > > > > "don't care". > > > > > > And the reason why you bring the PPU into the discussion is because? > > > If the issue manifests itself with or without it, and you come up with a > > > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is > > > used, doesn't that, logically speaking, still leave the issue unsolved > > > if the PPU is _not_ used for whatever reason? > > > The bug has nothing to do with the PPU. It can be solved by checking for > > > PPU in-band status as you say. Maybe. But I've got no idea why we don't > > > address the elephant in the room, which is in dsa_port_link_register_of()? > > > > I think I've covered that in the other sub-thread. > > > > It could be that a previous configuration left the port forced down. > > For example, if one were to kexec from one kernel that uses a > > fixed-link that forced the link down, into the same kernel with a > > different DT that uses PHY mode. > > > > The old kernel may have called mac_link_down(MLO_AN_FIXED), and the > > new kernel wouldn't know that. It comes along, and goes through the > > configuration process and calls mac_link_up(MLO_AN_PHY)... and from > > what you're suggesting, because these two calls use different MLO_AN_xxx > > constants that's a bug. > > Indeed I don't have detailed knowledge of Marvell hardware, but I'm > surprised to see kexec being mentioned here as a potential source of > configurations which the driver does not expect to handle. My belief was > that kexec's requirements would be just to silence the device > sufficiently such that it doesn't cause any surprises when things such > interrupts are enabled (DMA isn't relevant for DSA switches). > It wouldn't be responsible for leaving the hardware in any other state > otherwise. > > I see this logic in the driver, does it not take care of bringing the > ports to a known state, regardless of what a previous boot stage may > have done? > > static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip) > { > int err; > > err = mv88e6xxx_disable_ports(chip); > if (err) > return err; > > mv88e6xxx_hardware_reset(chip); > > return mv88e6xxx_software_reset(chip); > } > > So unless I'm fooled by mentally putting an equality sign between > mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel > may have done, I don't think at all that the two cases are comparable: > kexec and a previous call to mv88e6xxx_mac_link_down() initiated by > dsa_port_link_register_of() from this kernel. If the hardware reset is not wired to be under software control or is not specified, then mv88e6xxx_hardware_reset() is a no-op. mv88e6xxx_software_reset() does not fully reinitialise the switch. To quote one switch manual for the SWReset bit "Register values are not modified." That means if the link was forced down previously by writing to the port control register, the port remains forced down until software changes that register to unforce the link, or to force the link up. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 22:22 ` Russell King (Oracle) @ 2021-12-06 23:44 ` Vladimir Oltean 2021-12-07 2:06 ` Andrew Lunn 0 siblings, 1 reply; 34+ messages in thread From: Vladimir Oltean @ 2021-12-06 23:44 UTC (permalink / raw) To: Russell King (Oracle) Cc: Martyn Welch, Andrew Lunn, Vivien Didelot, Florian Fainelli, netdev, kernel On Mon, Dec 06, 2021 at 10:22:15PM +0000, Russell King (Oracle) wrote: > On Mon, Dec 06, 2021 at 11:51:39PM +0200, Vladimir Oltean wrote: > > On Mon, Dec 06, 2021 at 09:27:33PM +0000, Russell King (Oracle) wrote: > > > On Mon, Dec 06, 2021 at 11:13:41PM +0200, Vladimir Oltean wrote: > > > > On Mon, Dec 06, 2021 at 08:51:09PM +0000, Russell King (Oracle) wrote: > > > > > With a bit of knowledge of how Marvell DSA switches work... > > > > > > > > > > The "ppu" is the PHY polling unit. When the switch comes out of reset, > > > > > the PPU probes the MDIO bus, and sets the bit in the port status > > > > > register depending on whether it detects a PHY at the port address by > > > > > way of the PHY ID values. This bit is used to enable polling of the > > > > > PHY and is what mv88e6xxx_port_ppu_updates() reports. This bit will be > > > > > set for all internal PHYs unless we explicitly turn it off (we don't.) > > > > > Therefore, this is a reasonable assumption to make. > > > > > > > > > > So, given that mv88e6xxx_port_ppu_updates() is most likely true as > > > > > I stated, it is also true that mv88e6xxx_phy_is_internal() is > > > > > "don't care". > > > > > > > > And the reason why you bring the PPU into the discussion is because? > > > > If the issue manifests itself with or without it, and you come up with a > > > > proposal to set LINK_UNFORCED in mv88e6xxx_mac_config if the PPU is > > > > used, doesn't that, logically speaking, still leave the issue unsolved > > > > if the PPU is _not_ used for whatever reason? > > > > The bug has nothing to do with the PPU. It can be solved by checking for > > > > PPU in-band status as you say. Maybe. But I've got no idea why we don't > > > > address the elephant in the room, which is in dsa_port_link_register_of()? > > > > > > I think I've covered that in the other sub-thread. > > > > > > It could be that a previous configuration left the port forced down. > > > For example, if one were to kexec from one kernel that uses a > > > fixed-link that forced the link down, into the same kernel with a > > > different DT that uses PHY mode. > > > > > > The old kernel may have called mac_link_down(MLO_AN_FIXED), and the > > > new kernel wouldn't know that. It comes along, and goes through the > > > configuration process and calls mac_link_up(MLO_AN_PHY)... and from > > > what you're suggesting, because these two calls use different MLO_AN_xxx > > > constants that's a bug. > > > > Indeed I don't have detailed knowledge of Marvell hardware, but I'm > > surprised to see kexec being mentioned here as a potential source of > > configurations which the driver does not expect to handle. My belief was > > that kexec's requirements would be just to silence the device > > sufficiently such that it doesn't cause any surprises when things such > > interrupts are enabled (DMA isn't relevant for DSA switches). > > It wouldn't be responsible for leaving the hardware in any other state > > otherwise. > > > > I see this logic in the driver, does it not take care of bringing the > > ports to a known state, regardless of what a previous boot stage may > > have done? > > > > static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip) > > { > > int err; > > > > err = mv88e6xxx_disable_ports(chip); > > if (err) > > return err; > > > > mv88e6xxx_hardware_reset(chip); > > > > return mv88e6xxx_software_reset(chip); > > } > > > > So unless I'm fooled by mentally putting an equality sign between > > mv88e6xxx_switch_reset() and getting rid of whatever a previous kernel > > may have done, I don't think at all that the two cases are comparable: > > kexec and a previous call to mv88e6xxx_mac_link_down() initiated by > > dsa_port_link_register_of() from this kernel. > > If the hardware reset is not wired to be under software control or is > not specified, then mv88e6xxx_hardware_reset() is a no-op. > > mv88e6xxx_software_reset() does not fully reinitialise the switch. > To quote one switch manual for the SWReset bit "Register values are not > modified." That means if the link was forced down previously by writing > to the port control register, the port remains forced down until > software changes that register to unforce the link, or to force the > link up. Ouch, this is pretty unfortunate if true. But please allow me to suggest that not all DSA switches are like this, and that this is a pretty weak justification for the placement of a phylink_mac_link_down call in no other place than dsa_port_link_register_of. If this is an indication of anything, the two DSA drivers that I maintain have worked just fine in the time frame between the DSA conversion to forcing the link in mac_link_up and the DSA change to force a mac_link_down before connecting to phylink, therefore do not need that change. Therefore, I believe that it isn't fair to create avoidable baggage for other drivers, that may end up depending without even realizing on this non-standard arrangement of phylink calls. If the mac_link_down would have been in phylink I wouldn't have had any problem. Same if the same call would have been initiated by mv88e6xxx itself. Is there any technical reason why the mv88e6xxx driver (or others if others exist) cannot turn off its ports by itself and needs to be driven by an external phylink_mac_link_down call to do that (with extra care taken that the port is able to be turned back on again by phylink if needed)? It can't be that can't compute the arguments to call the function with - because they aren't correct in the current form of the code either. It also can't be due to the timing, because we are here: static int dsa_tree_setup_switches(struct dsa_switch_tree *dst) { struct dsa_port *dp; int err; list_for_each_entry(dp, &dst->ports, list) { err = dsa_switch_setup(dp->ds); -> calls ->setup() if (err) goto teardown; } list_for_each_entry(dp, &dst->ports, list) { err = dsa_port_setup(dp); -> calls dsa_port_link_register_of() -> calls phylink_mac_link_down() if (err) { err = dsa_port_reinit_as_unused(dp); if (err) goto teardown; } } So since we are positioned where we are in the DSA initialization sequence, forcing the CPU ports down at the end of ->setup() should be close enough temporally to where it is currently done now? ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-06 23:44 ` Vladimir Oltean @ 2021-12-07 2:06 ` Andrew Lunn 2021-12-07 12:48 ` Vladimir Oltean 0 siblings, 1 reply; 34+ messages in thread From: Andrew Lunn @ 2021-12-07 2:06 UTC (permalink / raw) To: Vladimir Oltean Cc: Russell King (Oracle), Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel > > mv88e6xxx_software_reset() does not fully reinitialise the switch. > > To quote one switch manual for the SWReset bit "Register values are not > > modified." That means if the link was forced down previously by writing > > to the port control register, the port remains forced down until > > software changes that register to unforce the link, or to force the > > link up. > > Ouch, this is pretty unfortunate if true. Come on. Do you really think Russell is making this up? Andrew ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: mv88e6240 configuration broken for B850v3 2021-12-07 2:06 ` Andrew Lunn @ 2021-12-07 12:48 ` Vladimir Oltean 0 siblings, 0 replies; 34+ messages in thread From: Vladimir Oltean @ 2021-12-07 12:48 UTC (permalink / raw) To: Andrew Lunn Cc: Russell King (Oracle), Martyn Welch, Vivien Didelot, Florian Fainelli, netdev, kernel On Tue, Dec 07, 2021 at 03:06:55AM +0100, Andrew Lunn wrote: > > > mv88e6xxx_software_reset() does not fully reinitialise the switch. > > > To quote one switch manual for the SWReset bit "Register values are not > > > modified." That means if the link was forced down previously by writing > > > to the port control register, the port remains forced down until > > > software changes that register to unforce the link, or to force the > > > link up. > > > > Ouch, this is pretty unfortunate if true. > > Come on. Do you really think Russell is making this up? I didn't say it isn't true, I just lacked the energy to research this too last night in the documentation I happened to have. I did find that quote about the SWReset bit now. But I did also write a full email after that phrase, making an argument that still did hold if what was said about the mv88e6xxx resets was true. ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2021-12-07 14:53 UTC | newest] Thread overview: 34+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-12-03 9:06 mv88e6240 configuration broken for B850v3 Martyn Welch 2021-12-03 16:25 ` Andrew Lunn 2021-12-06 17:44 ` Martyn Welch 2021-12-06 18:26 ` Martyn Welch 2021-12-06 18:31 ` Vladimir Oltean 2021-12-06 18:37 ` Martyn Welch 2021-12-06 18:50 ` Vladimir Oltean 2021-12-06 19:24 ` Martyn Welch 2021-12-06 19:37 ` Vladimir Oltean 2021-12-06 19:53 ` Andrew Lunn 2021-12-06 20:01 ` Vladimir Oltean 2021-12-06 20:18 ` Russell King (Oracle) 2021-12-06 20:29 ` Vladimir Oltean 2021-12-07 14:09 ` Andrew Lunn 2021-12-06 21:44 ` Vladimir Oltean 2021-12-06 22:13 ` Russell King (Oracle) 2021-12-06 20:07 ` Russell King (Oracle) 2021-12-06 20:23 ` Vladimir Oltean 2021-12-06 20:51 ` Russell King (Oracle) 2021-12-06 21:13 ` Vladimir Oltean 2021-12-06 21:27 ` Russell King (Oracle) 2021-12-06 21:49 ` Russell King (Oracle) 2021-12-06 23:27 ` Vladimir Oltean 2021-12-07 0:58 ` Russell King (Oracle) 2021-12-07 13:24 ` Vladimir Oltean 2021-12-07 13:59 ` Russell King (Oracle) 2021-12-07 14:37 ` Vladimir Oltean 2021-12-07 14:53 ` Russell King (Oracle) 2021-12-06 21:51 ` Vladimir Oltean 2021-12-06 22:17 ` Andrew Lunn 2021-12-06 22:22 ` Russell King (Oracle) 2021-12-06 23:44 ` Vladimir Oltean 2021-12-07 2:06 ` Andrew Lunn 2021-12-07 12:48 ` Vladimir Oltean
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).