* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support [not found] ` <E1tYAEG-0014QH-9O@rmk-PC.armlinux.org.uk> @ 2025-02-13 11:05 ` Jon Hunter 2025-02-13 11:37 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-13 11:05 UTC (permalink / raw) To: Russell King (Oracle), Andrew Lunn, Heiner Kallweit Cc: Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org Hi Russell, On 15/01/2025 20:43, Russell King (Oracle) wrote: > Convert stmmac to use phylink managed EEE support rather than delving > into phylib: > > 1. Move the stmmac_eee_init() calls out of mac_link_down() and > mac_link_up() methods into the new mac_{enable,disable}_lpi() > methods. We leave the calls to stmmac_set_eee_pls() in place as > these change bits which tell the EEE hardware when the link came > up or down, and is used for a separate hardware timer. However, > symmetrically conditionalise this with priv->dma_cap.eee. > > 2. Update the current LPI timer each time LPI is enabled - which we > need for software-timed LPI. > > 3. With phylink managed EEE, phylink manages the receive clock stop > configuration via phylink_config.eee_rx_clk_stop_enable. Set this > appropriately which makes the call to phy_eee_rx_clock_stop() > redundant. > > 4. From what I can work out, all supported interfaces support LPI > signalling on stmmac (there's no restriction implemented.) It > also appears to support LPI at all full duplex speeds at or over > 100M. Set these capabilities. > > 5. The default timer appears to be derived from a module parameter. > Set this the same, although we keep code that reconfigures the > timer in stmmac_init_phy(). > > 6. Remove the direct call to phy_support_eee(), which phylink will do > on the drivers behalf if phylink_config.eee_enabled_default is set. > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > --- > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 57 +++++++++++++++---- > 1 file changed, 45 insertions(+), 12 deletions(-) > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index acd6994c1764..c5d293be8ab9 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -988,8 +988,8 @@ static void stmmac_mac_link_down(struct phylink_config *config, > struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > stmmac_mac_set(priv, priv->ioaddr, false); > - stmmac_eee_init(priv, false); > - stmmac_set_eee_pls(priv, priv->hw, false); > + if (priv->dma_cap.eee) > + stmmac_set_eee_pls(priv, priv->hw, false); > > if (stmmac_fpe_supported(priv)) > stmmac_fpe_link_state_handle(priv, false); > @@ -1096,13 +1096,8 @@ static void stmmac_mac_link_up(struct phylink_config *config, > writel(ctrl, priv->ioaddr + MAC_CTRL_REG); > > stmmac_mac_set(priv, priv->ioaddr, true); > - if (phy && priv->dma_cap.eee) { > - phy_eee_rx_clock_stop(phy, !(priv->plat->flags & > - STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)); > - priv->tx_lpi_timer = phy->eee_cfg.tx_lpi_timer; > - stmmac_eee_init(priv, phy->enable_tx_lpi); > + if (priv->dma_cap.eee) > stmmac_set_eee_pls(priv, priv->hw, true); > - } > > if (stmmac_fpe_supported(priv)) > stmmac_fpe_link_state_handle(priv, true); > @@ -1111,12 +1106,32 @@ static void stmmac_mac_link_up(struct phylink_config *config, > stmmac_hwtstamp_correct_latency(priv, priv); > } > > +static void stmmac_mac_disable_tx_lpi(struct phylink_config *config) > +{ > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > + > + stmmac_eee_init(priv, false); > +} > + > +static int stmmac_mac_enable_tx_lpi(struct phylink_config *config, u32 timer, > + bool tx_clk_stop) > +{ > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > + > + priv->tx_lpi_timer = timer; > + stmmac_eee_init(priv, true); > + > + return 0; > +} > + > static const struct phylink_mac_ops stmmac_phylink_mac_ops = { > .mac_get_caps = stmmac_mac_get_caps, > .mac_select_pcs = stmmac_mac_select_pcs, > .mac_config = stmmac_mac_config, > .mac_link_down = stmmac_mac_link_down, > .mac_link_up = stmmac_mac_link_up, > + .mac_disable_tx_lpi = stmmac_mac_disable_tx_lpi, > + .mac_enable_tx_lpi = stmmac_mac_enable_tx_lpi, > }; > > /** > @@ -1189,9 +1204,6 @@ static int stmmac_init_phy(struct net_device *dev) > return -ENODEV; > } > > - if (priv->dma_cap.eee) > - phy_support_eee(phydev); > - > ret = phylink_connect_phy(priv->phylink, phydev); > } else { > fwnode_handle_put(phy_fwnode); > @@ -1201,7 +1213,12 @@ static int stmmac_init_phy(struct net_device *dev) > if (ret == 0) { > struct ethtool_keee eee; > > - /* Configure phylib's copy of the LPI timer */ > + /* Configure phylib's copy of the LPI timer. Normally, > + * phylink_config.lpi_timer_default would do this, but there is > + * a chance that userspace could change the eee_timer setting > + * via sysfs before the first open. Thus, preserve existing > + * behaviour. > + */ > if (!phylink_ethtool_get_eee(priv->phylink, &eee)) { > eee.tx_lpi_timer = priv->tx_lpi_timer; > phylink_ethtool_set_eee(priv->phylink, &eee); > @@ -1234,6 +1251,9 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > /* Stmmac always requires an RX clock for hardware initialization */ > priv->phylink_config.mac_requires_rxc = true; > > + if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)) > + priv->phylink_config.eee_rx_clk_stop_enable = true; > + > mdio_bus_data = priv->plat->mdio_bus_data; > if (mdio_bus_data) > priv->phylink_config.default_an_inband = > @@ -1255,6 +1275,19 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > priv->phylink_config.supported_interfaces, > pcs->supported_interfaces); > > + if (priv->dma_cap.eee) { > + /* Assume all supported interfaces also support LPI */ > + memcpy(priv->phylink_config.lpi_interfaces, > + priv->phylink_config.supported_interfaces, > + sizeof(priv->phylink_config.lpi_interfaces)); > + > + /* All full duplex speeds above 100Mbps are supported */ > + priv->phylink_config.lpi_capabilities = ~(MAC_1000FD - 1) | > + MAC_100FD; > + priv->phylink_config.lpi_timer_default = eee_timer * 1000; > + priv->phylink_config.eee_enabled_default = true; > + } > + > fwnode = priv->plat->port_node; > if (!fwnode) > fwnode = dev_fwnode(priv->device); I have been tracking down a suspend regression on Tegra186 and bisect is pointing to this change. If I revert this on top of v6.14-rc2 then suspend is working again. This is observed on the Jetson TX2 board (specifically tegra186-p2771-0000.dts). This device is using NFS for testing. So it appears that for this board networking does not restart and the board hangs. Looking at the logs I do see this on resume ... [ 64.129079] dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma [ 64.133125] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed My first thought was if 'dma_cap.eee' is not supported for this device, but from what I can see it is and 'dma_cap.eee' is true. Here are some more details on this device regarding the ethernet controller. [ 4.221837] dwc-eth-dwmac 2490000.ethernet: Adding to iommu group 3 [ 4.239289] dwc-eth-dwmac 2490000.ethernet: User ID: 0x10, Synopsys ID: 0x41 [ 4.244020] dwc-eth-dwmac 2490000.ethernet: DWMAC4/5 [ 4.249042] dwc-eth-dwmac 2490000.ethernet: DMA HW capability register supported [ 4.256406] dwc-eth-dwmac 2490000.ethernet: RX Checksum Offload Engine supported [ 4.263768] dwc-eth-dwmac 2490000.ethernet: TX Checksum insertion supported [ 4.270700] dwc-eth-dwmac 2490000.ethernet: Wake-Up On Lan supported [ 4.277063] dwc-eth-dwmac 2490000.ethernet: TSO supported [ 4.282401] dwc-eth-dwmac 2490000.ethernet: Enable RX Mitigation via HW Watchdog Timer [ 4.290293] dwc-eth-dwmac 2490000.ethernet: Enabled L3L4 Flow TC (entries=8) [ 4.297309] dwc-eth-dwmac 2490000.ethernet: Enabled RFS Flow TC (entries=10) [ 4.304327] dwc-eth-dwmac 2490000.ethernet: TSO feature enabled [ 4.310220] dwc-eth-dwmac 2490000.ethernet: Using 40/40 bits DMA host/device width Let me know if you have any thoughts. Thanks! Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-13 11:05 ` [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support Jon Hunter @ 2025-02-13 11:37 ` Russell King (Oracle) 2025-02-13 12:00 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-13 11:37 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Thu, Feb 13, 2025 at 11:05:01AM +0000, Jon Hunter wrote: > Hi Russell, > > On 15/01/2025 20:43, Russell King (Oracle) wrote: > > Convert stmmac to use phylink managed EEE support rather than delving > > into phylib: > > > > 1. Move the stmmac_eee_init() calls out of mac_link_down() and > > mac_link_up() methods into the new mac_{enable,disable}_lpi() > > methods. We leave the calls to stmmac_set_eee_pls() in place as > > these change bits which tell the EEE hardware when the link came > > up or down, and is used for a separate hardware timer. However, > > symmetrically conditionalise this with priv->dma_cap.eee. > > > > 2. Update the current LPI timer each time LPI is enabled - which we > > need for software-timed LPI. > > > > 3. With phylink managed EEE, phylink manages the receive clock stop > > configuration via phylink_config.eee_rx_clk_stop_enable. Set this > > appropriately which makes the call to phy_eee_rx_clock_stop() > > redundant. > > > > 4. From what I can work out, all supported interfaces support LPI > > signalling on stmmac (there's no restriction implemented.) It > > also appears to support LPI at all full duplex speeds at or over > > 100M. Set these capabilities. > > > > 5. The default timer appears to be derived from a module parameter. > > Set this the same, although we keep code that reconfigures the > > timer in stmmac_init_phy(). > > > > 6. Remove the direct call to phy_support_eee(), which phylink will do > > on the drivers behalf if phylink_config.eee_enabled_default is set. > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > > --- > > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 57 +++++++++++++++---- > > 1 file changed, 45 insertions(+), 12 deletions(-) > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index acd6994c1764..c5d293be8ab9 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -988,8 +988,8 @@ static void stmmac_mac_link_down(struct phylink_config *config, > > struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > stmmac_mac_set(priv, priv->ioaddr, false); > > - stmmac_eee_init(priv, false); > > - stmmac_set_eee_pls(priv, priv->hw, false); > > + if (priv->dma_cap.eee) > > + stmmac_set_eee_pls(priv, priv->hw, false); > > if (stmmac_fpe_supported(priv)) > > stmmac_fpe_link_state_handle(priv, false); > > @@ -1096,13 +1096,8 @@ static void stmmac_mac_link_up(struct phylink_config *config, > > writel(ctrl, priv->ioaddr + MAC_CTRL_REG); > > stmmac_mac_set(priv, priv->ioaddr, true); > > - if (phy && priv->dma_cap.eee) { > > - phy_eee_rx_clock_stop(phy, !(priv->plat->flags & > > - STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)); > > - priv->tx_lpi_timer = phy->eee_cfg.tx_lpi_timer; > > - stmmac_eee_init(priv, phy->enable_tx_lpi); > > + if (priv->dma_cap.eee) > > stmmac_set_eee_pls(priv, priv->hw, true); > > - } > > if (stmmac_fpe_supported(priv)) > > stmmac_fpe_link_state_handle(priv, true); > > @@ -1111,12 +1106,32 @@ static void stmmac_mac_link_up(struct phylink_config *config, > > stmmac_hwtstamp_correct_latency(priv, priv); > > } > > +static void stmmac_mac_disable_tx_lpi(struct phylink_config *config) > > +{ > > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > + > > + stmmac_eee_init(priv, false); > > +} > > + > > +static int stmmac_mac_enable_tx_lpi(struct phylink_config *config, u32 timer, > > + bool tx_clk_stop) > > +{ > > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > + > > + priv->tx_lpi_timer = timer; > > + stmmac_eee_init(priv, true); > > + > > + return 0; > > +} > > + > > static const struct phylink_mac_ops stmmac_phylink_mac_ops = { > > .mac_get_caps = stmmac_mac_get_caps, > > .mac_select_pcs = stmmac_mac_select_pcs, > > .mac_config = stmmac_mac_config, > > .mac_link_down = stmmac_mac_link_down, > > .mac_link_up = stmmac_mac_link_up, > > + .mac_disable_tx_lpi = stmmac_mac_disable_tx_lpi, > > + .mac_enable_tx_lpi = stmmac_mac_enable_tx_lpi, > > }; > > /** > > @@ -1189,9 +1204,6 @@ static int stmmac_init_phy(struct net_device *dev) > > return -ENODEV; > > } > > - if (priv->dma_cap.eee) > > - phy_support_eee(phydev); > > - > > ret = phylink_connect_phy(priv->phylink, phydev); > > } else { > > fwnode_handle_put(phy_fwnode); > > @@ -1201,7 +1213,12 @@ static int stmmac_init_phy(struct net_device *dev) > > if (ret == 0) { > > struct ethtool_keee eee; > > - /* Configure phylib's copy of the LPI timer */ > > + /* Configure phylib's copy of the LPI timer. Normally, > > + * phylink_config.lpi_timer_default would do this, but there is > > + * a chance that userspace could change the eee_timer setting > > + * via sysfs before the first open. Thus, preserve existing > > + * behaviour. > > + */ > > if (!phylink_ethtool_get_eee(priv->phylink, &eee)) { > > eee.tx_lpi_timer = priv->tx_lpi_timer; > > phylink_ethtool_set_eee(priv->phylink, &eee); > > @@ -1234,6 +1251,9 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > > /* Stmmac always requires an RX clock for hardware initialization */ > > priv->phylink_config.mac_requires_rxc = true; > > + if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)) > > + priv->phylink_config.eee_rx_clk_stop_enable = true; > > + > > mdio_bus_data = priv->plat->mdio_bus_data; > > if (mdio_bus_data) > > priv->phylink_config.default_an_inband = > > @@ -1255,6 +1275,19 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > > priv->phylink_config.supported_interfaces, > > pcs->supported_interfaces); > > + if (priv->dma_cap.eee) { > > + /* Assume all supported interfaces also support LPI */ > > + memcpy(priv->phylink_config.lpi_interfaces, > > + priv->phylink_config.supported_interfaces, > > + sizeof(priv->phylink_config.lpi_interfaces)); > > + > > + /* All full duplex speeds above 100Mbps are supported */ > > + priv->phylink_config.lpi_capabilities = ~(MAC_1000FD - 1) | > > + MAC_100FD; > > + priv->phylink_config.lpi_timer_default = eee_timer * 1000; > > + priv->phylink_config.eee_enabled_default = true; > > + } > > + > > fwnode = priv->plat->port_node; > > if (!fwnode) > > fwnode = dev_fwnode(priv->device); > > > I have been tracking down a suspend regression on Tegra186 and bisect is > pointing to this change. If I revert this on top of v6.14-rc2 then > suspend is working again. This is observed on the Jetson TX2 board > (specifically tegra186-p2771-0000.dts). Thanks for the report. > This device is using NFS for testing. So it appears that for this board > networking does not restart and the board hangs. Looking at the logs I > do see this on resume ... > > [ 64.129079] dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma > [ 64.133125] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed > > My first thought was if 'dma_cap.eee' is not supported for this device, > but from what I can see it is and 'dma_cap.eee' is true. Here are some > more details on this device regarding the ethernet controller. Could you see whether disabling EEE through ethtool (maybe first try turning tx-lpi off before using the "eee off") to see whether that makes any difference please? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-13 11:37 ` Russell King (Oracle) @ 2025-02-13 12:00 ` Russell King (Oracle) 2025-02-14 10:58 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-13 12:00 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Thu, Feb 13, 2025 at 11:37:17AM +0000, Russell King (Oracle) wrote: > On Thu, Feb 13, 2025 at 11:05:01AM +0000, Jon Hunter wrote: > > Hi Russell, > > > > On 15/01/2025 20:43, Russell King (Oracle) wrote: > > > Convert stmmac to use phylink managed EEE support rather than delving > > > into phylib: > > > > > > 1. Move the stmmac_eee_init() calls out of mac_link_down() and > > > mac_link_up() methods into the new mac_{enable,disable}_lpi() > > > methods. We leave the calls to stmmac_set_eee_pls() in place as > > > these change bits which tell the EEE hardware when the link came > > > up or down, and is used for a separate hardware timer. However, > > > symmetrically conditionalise this with priv->dma_cap.eee. > > > > > > 2. Update the current LPI timer each time LPI is enabled - which we > > > need for software-timed LPI. > > > > > > 3. With phylink managed EEE, phylink manages the receive clock stop > > > configuration via phylink_config.eee_rx_clk_stop_enable. Set this > > > appropriately which makes the call to phy_eee_rx_clock_stop() > > > redundant. > > > > > > 4. From what I can work out, all supported interfaces support LPI > > > signalling on stmmac (there's no restriction implemented.) It > > > also appears to support LPI at all full duplex speeds at or over > > > 100M. Set these capabilities. > > > > > > 5. The default timer appears to be derived from a module parameter. > > > Set this the same, although we keep code that reconfigures the > > > timer in stmmac_init_phy(). > > > > > > 6. Remove the direct call to phy_support_eee(), which phylink will do > > > on the drivers behalf if phylink_config.eee_enabled_default is set. > > > > > > Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> > > > --- > > > .../net/ethernet/stmicro/stmmac/stmmac_main.c | 57 +++++++++++++++---- > > > 1 file changed, 45 insertions(+), 12 deletions(-) > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > index acd6994c1764..c5d293be8ab9 100644 > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > @@ -988,8 +988,8 @@ static void stmmac_mac_link_down(struct phylink_config *config, > > > struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > > stmmac_mac_set(priv, priv->ioaddr, false); > > > - stmmac_eee_init(priv, false); > > > - stmmac_set_eee_pls(priv, priv->hw, false); > > > + if (priv->dma_cap.eee) > > > + stmmac_set_eee_pls(priv, priv->hw, false); > > > if (stmmac_fpe_supported(priv)) > > > stmmac_fpe_link_state_handle(priv, false); > > > @@ -1096,13 +1096,8 @@ static void stmmac_mac_link_up(struct phylink_config *config, > > > writel(ctrl, priv->ioaddr + MAC_CTRL_REG); > > > stmmac_mac_set(priv, priv->ioaddr, true); > > > - if (phy && priv->dma_cap.eee) { > > > - phy_eee_rx_clock_stop(phy, !(priv->plat->flags & > > > - STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)); > > > - priv->tx_lpi_timer = phy->eee_cfg.tx_lpi_timer; > > > - stmmac_eee_init(priv, phy->enable_tx_lpi); > > > + if (priv->dma_cap.eee) > > > stmmac_set_eee_pls(priv, priv->hw, true); > > > - } > > > if (stmmac_fpe_supported(priv)) > > > stmmac_fpe_link_state_handle(priv, true); > > > @@ -1111,12 +1106,32 @@ static void stmmac_mac_link_up(struct phylink_config *config, > > > stmmac_hwtstamp_correct_latency(priv, priv); > > > } > > > +static void stmmac_mac_disable_tx_lpi(struct phylink_config *config) > > > +{ > > > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > > + > > > + stmmac_eee_init(priv, false); > > > +} > > > + > > > +static int stmmac_mac_enable_tx_lpi(struct phylink_config *config, u32 timer, > > > + bool tx_clk_stop) > > > +{ > > > + struct stmmac_priv *priv = netdev_priv(to_net_dev(config->dev)); > > > + > > > + priv->tx_lpi_timer = timer; > > > + stmmac_eee_init(priv, true); > > > + > > > + return 0; > > > +} > > > + > > > static const struct phylink_mac_ops stmmac_phylink_mac_ops = { > > > .mac_get_caps = stmmac_mac_get_caps, > > > .mac_select_pcs = stmmac_mac_select_pcs, > > > .mac_config = stmmac_mac_config, > > > .mac_link_down = stmmac_mac_link_down, > > > .mac_link_up = stmmac_mac_link_up, > > > + .mac_disable_tx_lpi = stmmac_mac_disable_tx_lpi, > > > + .mac_enable_tx_lpi = stmmac_mac_enable_tx_lpi, > > > }; > > > /** > > > @@ -1189,9 +1204,6 @@ static int stmmac_init_phy(struct net_device *dev) > > > return -ENODEV; > > > } > > > - if (priv->dma_cap.eee) > > > - phy_support_eee(phydev); > > > - > > > ret = phylink_connect_phy(priv->phylink, phydev); > > > } else { > > > fwnode_handle_put(phy_fwnode); > > > @@ -1201,7 +1213,12 @@ static int stmmac_init_phy(struct net_device *dev) > > > if (ret == 0) { > > > struct ethtool_keee eee; > > > - /* Configure phylib's copy of the LPI timer */ > > > + /* Configure phylib's copy of the LPI timer. Normally, > > > + * phylink_config.lpi_timer_default would do this, but there is > > > + * a chance that userspace could change the eee_timer setting > > > + * via sysfs before the first open. Thus, preserve existing > > > + * behaviour. > > > + */ > > > if (!phylink_ethtool_get_eee(priv->phylink, &eee)) { > > > eee.tx_lpi_timer = priv->tx_lpi_timer; > > > phylink_ethtool_set_eee(priv->phylink, &eee); > > > @@ -1234,6 +1251,9 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > > > /* Stmmac always requires an RX clock for hardware initialization */ > > > priv->phylink_config.mac_requires_rxc = true; > > > + if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)) > > > + priv->phylink_config.eee_rx_clk_stop_enable = true; > > > + > > > mdio_bus_data = priv->plat->mdio_bus_data; > > > if (mdio_bus_data) > > > priv->phylink_config.default_an_inband = > > > @@ -1255,6 +1275,19 @@ static int stmmac_phy_setup(struct stmmac_priv *priv) > > > priv->phylink_config.supported_interfaces, > > > pcs->supported_interfaces); > > > + if (priv->dma_cap.eee) { > > > + /* Assume all supported interfaces also support LPI */ > > > + memcpy(priv->phylink_config.lpi_interfaces, > > > + priv->phylink_config.supported_interfaces, > > > + sizeof(priv->phylink_config.lpi_interfaces)); > > > + > > > + /* All full duplex speeds above 100Mbps are supported */ > > > + priv->phylink_config.lpi_capabilities = ~(MAC_1000FD - 1) | > > > + MAC_100FD; > > > + priv->phylink_config.lpi_timer_default = eee_timer * 1000; > > > + priv->phylink_config.eee_enabled_default = true; > > > + } > > > + > > > fwnode = priv->plat->port_node; > > > if (!fwnode) > > > fwnode = dev_fwnode(priv->device); > > > > > > I have been tracking down a suspend regression on Tegra186 and bisect is > > pointing to this change. If I revert this on top of v6.14-rc2 then > > suspend is working again. This is observed on the Jetson TX2 board > > (specifically tegra186-p2771-0000.dts). > > Thanks for the report. > > > This device is using NFS for testing. So it appears that for this board > > networking does not restart and the board hangs. Looking at the logs I > > do see this on resume ... > > > > [ 64.129079] dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma > > [ 64.133125] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed > > > > My first thought was if 'dma_cap.eee' is not supported for this device, > > but from what I can see it is and 'dma_cap.eee' is true. Here are some > > more details on this device regarding the ethernet controller. > > Could you see whether disabling EEE through ethtool (maybe first try > turning tx-lpi off before using the "eee off") to see whether that > makes any difference please? One thing that I'm wondering is - old code used to do: - phy_eee_rx_clock_stop(phy, !(priv->plat->flags & - STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)); The new code sets: + if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)) + priv->phylink_config.eee_rx_clk_stop_enable = true; which does the same thing in phylink - phylink_bringup_phy() will call phy_eee_rx_clock_stop() when the PHY is attahed. So this happens at a different time. We know that stmmac_reset() can fail when the PHY receive clock is stopped - at least with some cores. So, I'm wondering whether I've inadvertently fixed another bug in stmmac which has uncovered a different bug - maybe the PHY clock must never be stopped even in LPI - or maybe we need to have a way of temporarily disabling the PHY's clock-stop ability during stmmac_reset(). In addition to what I asked previously, could you also intrument phy_eee_rx_clock_stop() and test before/after this patch to see (a) whether it gets called at all before this patch and (b) confirm the enable/disable state before and after. Thanks. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-13 12:00 ` Russell King (Oracle) @ 2025-02-14 10:58 ` Jon Hunter 2025-02-14 11:21 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-14 10:58 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 13/02/2025 12:00, Russell King (Oracle) wrote: ... >>> I have been tracking down a suspend regression on Tegra186 and bisect is >>> pointing to this change. If I revert this on top of v6.14-rc2 then >>> suspend is working again. This is observed on the Jetson TX2 board >>> (specifically tegra186-p2771-0000.dts). >> >> Thanks for the report. >> >>> This device is using NFS for testing. So it appears that for this board >>> networking does not restart and the board hangs. Looking at the logs I >>> do see this on resume ... >>> >>> [ 64.129079] dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma >>> [ 64.133125] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed >>> >>> My first thought was if 'dma_cap.eee' is not supported for this device, >>> but from what I can see it is and 'dma_cap.eee' is true. Here are some >>> more details on this device regarding the ethernet controller. >> >> Could you see whether disabling EEE through ethtool (maybe first try >> turning tx-lpi off before using the "eee off") to see whether that >> makes any difference please? > > One thing that I'm wondering is - old code used to do: > > - phy_eee_rx_clock_stop(phy, !(priv->plat->flags & > - STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)); > > The new code sets: > > + if (!(priv->plat->flags & STMMAC_FLAG_RX_CLK_RUNS_IN_LPI)) > + priv->phylink_config.eee_rx_clk_stop_enable = true; > > which does the same thing in phylink - phylink_bringup_phy() will call > phy_eee_rx_clock_stop() when the PHY is attahed. So this happens at a > different time. > > We know that stmmac_reset() can fail when the PHY receive clock is > stopped - at least with some cores. > > So, I'm wondering whether I've inadvertently fixed another bug in stmmac > which has uncovered a different bug - maybe the PHY clock must never be > stopped even in LPI - or maybe we need to have a way of temporarily > disabling the PHY's clock-stop ability during stmmac_reset(). > > In addition to what I asked previously, could you also intrument > phy_eee_rx_clock_stop() and test before/after this patch to see > (a) whether it gets called at all before this patch and (b) confirm > the enable/disable state before and after. Thanks for the feedback. So ... 1. I can confirm that suspend works if I disable EEE via ethtool 2. Prior to this change I do see phy_eee_rx_clock_stop being called to enable the clock resuming from suspend, but after this change it is not. Prior to this change I see (note the prints around 389-392 are when we resume from suspend) ... [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 After this change I see ... [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode So yes definitely related to the PHY clock. Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-14 10:58 ` Jon Hunter @ 2025-02-14 11:21 ` Russell King (Oracle) 2025-02-14 17:03 ` Jon Hunter 2025-02-19 14:01 ` Jon Hunter 0 siblings, 2 replies; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-14 11:21 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Fri, Feb 14, 2025 at 10:58:55AM +0000, Jon Hunter wrote: > Thanks for the feedback. So ... > > 1. I can confirm that suspend works if I disable EEE via ethtool > 2. Prior to this change I do see phy_eee_rx_clock_stop being called > to enable the clock resuming from suspend, but after this change > it is not. > > Prior to this change I see (note the prints around 389-392 are when > we resume from suspend) ... > > [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 This is a bug in phylink - it shouldn't have been calling phy_eee_rx_clock_stop() where a MAC doesn't support phylink managed EEE. > [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 Presumably, this is when the link comes up before suspend, so the PHY has been configured to allow the RX clock to be stopped prior to suspend > [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 Presumably, as this is after resume, this is again when the link comes up (that's the only time that stmmac calls phy_eee_rx_clock_stop().) > After this change I see ... > > [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode To me, this looks no different - the PHY was configured for clock stop before suspending in both cases. However, something else to verify with the old code - after boot and the link comes up (so you get the second phy_eee_rx_clock_stop() at 7s), try unplugging the link and re-plugging it. Then try suspending. The point of this test is to verify whether the PHY ignores changes to the RX clock stop configuration while the link is up. The next stage is to instrument dwmac4_set_eee_mode(), dwmac4_reset_eee_mode() and dwmac4_set_eee_lpi_entry_timer() to print the final register values in each function vs dwmac4_set_lpi_mode() in the new code. Also, I think instrumenting stmmac_common_interrupt() to print a message when we get either CORE_IRQ_TX_PATH_IN_LPI_MODE or CORE_IRQ_TX_PATH_EXIT_LPI_MODE indicating a change in LPI state would be a good idea. I'd like to see how this all ties up with suspend, resume, link up and down events, so please don't trim the log so much. Thanks. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-14 11:21 ` Russell King (Oracle) @ 2025-02-14 17:03 ` Jon Hunter 2025-02-19 14:01 ` Jon Hunter 1 sibling, 0 replies; 21+ messages in thread From: Jon Hunter @ 2025-02-14 17:03 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 14/02/2025 11:21, Russell King (Oracle) wrote: > On Fri, Feb 14, 2025 at 10:58:55AM +0000, Jon Hunter wrote: >> Thanks for the feedback. So ... >> >> 1. I can confirm that suspend works if I disable EEE via ethtool >> 2. Prior to this change I do see phy_eee_rx_clock_stop being called >> to enable the clock resuming from suspend, but after this change >> it is not. >> >> Prior to this change I see (note the prints around 389-392 are when >> we resume from suspend) ... >> >> [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 > > This is a bug in phylink - it shouldn't have been calling > phy_eee_rx_clock_stop() where a MAC doesn't support phylink managed EEE. > >> [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > Presumably, this is when the link comes up before suspend, so the PHY > has been configured to allow the RX clock to be stopped prior to suspend > >> [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > Presumably, as this is after resume, this is again when the link comes > up (that's the only time that stmmac calls phy_eee_rx_clock_stop().) > >> After this change I see ... >> >> [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 >> [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > To me, this looks no different - the PHY was configured for clock stop > before suspending in both cases. > > However, something else to verify with the old code - after boot and the > link comes up (so you get the second phy_eee_rx_clock_stop() at 7s), > try unplugging the link and re-plugging it. Then try suspending. OK will do. I will have to try that when I am back in the office next week. > The point of this test is to verify whether the PHY ignores changes to > the RX clock stop configuration while the link is up. > > > > The next stage is to instrument dwmac4_set_eee_mode(), > dwmac4_reset_eee_mode() and dwmac4_set_eee_lpi_entry_timer() to print > the final register values in each function vs dwmac4_set_lpi_mode() in > the new code. Also, I think instrumenting stmmac_common_interrupt() to > print a message when we get either CORE_IRQ_TX_PATH_IN_LPI_MODE or > CORE_IRQ_TX_PATH_EXIT_LPI_MODE indicating a change in LPI state would > be a good idea. > > I'd like to see how this all ties up with suspend, resume, link up > and down events, so please don't trim the log so much. OK thanks. I will instrument these too and get some more logs. Cheers! Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-14 11:21 ` Russell King (Oracle) 2025-02-14 17:03 ` Jon Hunter @ 2025-02-19 14:01 ` Jon Hunter 2025-02-19 15:36 ` Russell King (Oracle) 1 sibling, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-19 14:01 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 14/02/2025 11:21, Russell King (Oracle) wrote: > On Fri, Feb 14, 2025 at 10:58:55AM +0000, Jon Hunter wrote: >> Thanks for the feedback. So ... >> >> 1. I can confirm that suspend works if I disable EEE via ethtool >> 2. Prior to this change I do see phy_eee_rx_clock_stop being called >> to enable the clock resuming from suspend, but after this change >> it is not. >> >> Prior to this change I see (note the prints around 389-392 are when >> we resume from suspend) ... >> >> [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 > > This is a bug in phylink - it shouldn't have been calling > phy_eee_rx_clock_stop() where a MAC doesn't support phylink managed EEE. > >> [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > Presumably, this is when the link comes up before suspend, so the PHY > has been configured to allow the RX clock to be stopped prior to suspend > >> [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > Presumably, as this is after resume, this is again when the link comes > up (that's the only time that stmmac calls phy_eee_rx_clock_stop().) > >> After this change I see ... >> >> [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 >> [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > To me, this looks no different - the PHY was configured for clock stop > before suspending in both cases. > > However, something else to verify with the old code - after boot and the > link comes up (so you get the second phy_eee_rx_clock_stop() at 7s), > try unplugging the link and re-plugging it. Then try suspending. I still need to try this but I am still not back to the office to get to this. > The point of this test is to verify whether the PHY ignores changes to > the RX clock stop configuration while the link is up. > > > > The next stage is to instrument dwmac4_set_eee_mode(), > dwmac4_reset_eee_mode() and dwmac4_set_eee_lpi_entry_timer() to print > the final register values in each function vs dwmac4_set_lpi_mode() in > the new code. Also, I think instrumenting stmmac_common_interrupt() to > print a message when we get either CORE_IRQ_TX_PATH_IN_LPI_MODE or > CORE_IRQ_TX_PATH_EXIT_LPI_MODE indicating a change in LPI state would > be a good idea. > > I'd like to see how this all ties up with suspend, resume, link up > and down events, so please don't trim the log so much. I have been testing on top of v6.14-rc2 which does not have dwmac4_set_lpi_mode(). However, instrumenting the other functions, for a bad case I see ... [ 477.494226] PM: suspend entry (deep) [ 477.501869] Filesystems sync: 0.006 seconds [ 477.504518] Freezing user space processes [ 477.509067] Freezing user space processes completed (elapsed 0.001 seconds) [ 477.514770] OOM killer disabled. [ 477.517940] Freezing remaining freezable tasks [ 477.523449] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 477.566870] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 477.586423] dwc-eth-dwmac 2490000.ethernet eth0: disable EEE [ 477.592052] dwmac4_set_eee_lpi_entry_timer: entered [ 477.596997] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0x0 [ 477.680193] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down [ 477.723771] Disabling non-boot CPUs ... [ 477.726898] psci: CPU5 killed (polled 0 ms) [ 477.731364] psci: CPU4 killed (polled 0 ms) [ 477.735439] psci: CPU3 killed (polled 0 ms) [ 477.740198] psci: CPU2 killed (polled 0 ms) [ 477.744220] psci: CPU1 killed (polled 0 ms) [ 477.748546] Enabling non-boot CPUs ... [ 477.751996] Detected PIPT I-cache on CPU1 [ 477.754800] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004 [ 477.766211] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116 [ 477.778379] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066 [ 477.790231] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030] [ 477.797726] CPU1 is up [ 477.799388] Detected PIPT I-cache on CPU2 [ 477.802952] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004 [ 477.814415] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116 [ 477.826537] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066 [ 477.838440] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030] [ 477.845865] CPU2 is up [ 477.847325] Detected PIPT I-cache on CPU3 [ 477.851136] CPU3: Booted secondary processor 0x0000000101 [0x411fd073] [ 477.857958] CPU3 is up [ 477.860108] Detected PIPT I-cache on CPU4 [ 477.863949] CPU4: Booted secondary processor 0x0000000102 [0x411fd073] [ 477.870714] CPU4 is up [ 477.872933] Detected PIPT I-cache on CPU5 [ 477.876778] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] [ 477.883556] CPU5 is up [ 477.985628] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 477.993771] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 [ 478.171396] dwmac4: Master AXI performs any burst length [ 478.174480] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found [ 478.181934] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported [ 478.202977] dwmac4_set_eee_lpi_entry_timer: entered [ 478.207918] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 [ 478.287646] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized [ 478.295538] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 478.372819] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device [ 478.382118] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 478.410458] OOM killer enabled. [ 478.411350] Restarting tasks ... done. [ 478.415459] VDDIO_SDMMC3_AP: voltage operation not allowed [ 478.417763] random: crng reseeded on system resumption [ 478.425747] PM: suspend exit [ 478.474698] VDDIO_SDMMC3_AP: voltage operation not allowed [ 478.533428] VDDIO_SDMMC3_AP: voltage operation not allowed [ 478.600368] VDDIO_SDMMC3_AP: voltage operation not allowed For a good case I see ... [ 28.548472] PM: suspend entry (deep) [ 28.560503] Filesystems sync: 0.010 seconds [ 28.563622] Freezing user space processes [ 28.567838] Freezing user space processes completed (elapsed 0.001 seconds) [ 28.573380] OOM killer disabled. [ 28.576563] Freezing remaining freezable tasks [ 28.582100] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 28.627180] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 28.646770] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down [ 28.690432] Disabling non-boot CPUs ... [ 28.693397] psci: CPU5 killed (polled 4 ms) [ 28.697401] psci: CPU4 killed (polled 4 ms) [ 28.702649] psci: CPU3 killed (polled 0 ms) [ 28.707369] psci: CPU2 killed (polled 0 ms) [ 28.711482] psci: CPU1 killed (polled 0 ms) [ 28.970011] Enabling non-boot CPUs ... [ 28.974751] Detected PIPT I-cache on CPU1 [ 28.977635] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004 [ 28.989014] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116 [ 29.001163] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066 [ 29.013015] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030] [ 29.020526] CPU1 is up [ 29.022168] Detected PIPT I-cache on CPU2 [ 29.025747] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004 [ 29.037182] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116 [ 29.049328] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066 [ 29.061196] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030] [ 29.068779] CPU2 is up [ 29.070095] Detected PIPT I-cache on CPU3 [ 29.073916] CPU3: Booted secondary processor 0x0000000101 [0x411fd073] [ 29.080749] CPU3 is up [ 29.082898] Detected PIPT I-cache on CPU4 [ 29.086729] CPU4: Booted secondary processor 0x0000000102 [0x411fd073] [ 29.093496] CPU4 is up [ 29.095715] Detected PIPT I-cache on CPU5 [ 29.099556] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] [ 29.106351] CPU5 is up [ 29.218549] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 29.234190] dwmac4: Master AXI performs any burst length [ 29.237263] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found [ 29.244732] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported [ 29.267679] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device [ 29.270504] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 29.306091] OOM killer enabled. [ 29.306981] Restarting tasks ... done. [ 29.311423] VDDIO_SDMMC3_AP: voltage operation not allowed [ 29.314095] random: crng reseeded on system resumption [ 29.321404] PM: suspend exit [ 29.370286] VDDIO_SDMMC3_AP: voltage operation not allowed [ 29.429655] VDDIO_SDMMC3_AP: voltage operation not allowed [ 29.496567] VDDIO_SDMMC3_AP: voltage operation not allowed [ 32.968855] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 [ 32.974779] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_link_up: tx_lpi_timer 1000000 [ 32.988755] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx The more I have been testing, the more I feel that this is timing related. In good cases, I see the MAC link coming up well after the PHY. Even with your change I did see suspend work on occassion and when it does I see ... [ 79.775977] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode [ 79.784196] dwmac4: Master AXI performs any burst length [ 79.787280] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found [ 79.794736] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported [ 79.816642] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device [ 79.820437] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC [ 79.854481] OOM killer enabled. [ 79.855372] Restarting tasks ... done. [ 79.859460] VDDIO_SDMMC3_AP: voltage operation not allowed [ 79.861297] random: crng reseeded on system resumption [ 79.869773] PM: suspend exit [ 79.914909] VDDIO_SDMMC3_AP: voltage operation not allowed [ 79.974322] VDDIO_SDMMC3_AP: voltage operation not allowed [ 80.041236] VDDIO_SDMMC3_AP: voltage operation not allowed [ 83.547730] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 [ 83.566859] dwmac4_set_eee_lpi_entry_timer: entered [ 83.571782] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 [ 83.651520] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized [ 83.659425] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx On a good case, the stmmac_mac_enable_tx_lpi call always happens much later. It seems that after this change it is more often that the link is coming up sooner and I guess probably too soon. May be we were getting lucky before? Anyway, I made the following change for testing and this is working ... diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b34ebb916b89..44187e230a1e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7906,16 +7906,6 @@ int stmmac_resume(struct device *dev) return ret; } - rtnl_lock(); - if (device_may_wakeup(priv->device) && priv->plat->pmt) { - phylink_resume(priv->phylink); - } else { - phylink_resume(priv->phylink); - if (device_may_wakeup(priv->device)) - phylink_speed_up(priv->phylink); - } - rtnl_unlock(); - rtnl_lock(); mutex_lock(&priv->lock); @@ -7930,6 +7920,13 @@ int stmmac_resume(struct device *dev) stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); + if (device_may_wakeup(priv->device) && priv->plat->pmt) { + phylink_resume(priv->phylink); + } else { + phylink_resume(priv->phylink); + if (device_may_wakeup(priv->device)) + phylink_speed_up(priv->phylink); + } stmmac_enable_all_queues(priv); stmmac_enable_all_dma_irq(priv); I noticed that in __stmmac_open() the phylink_start() is called after stmmac_hw_setup and stmmac_init_coalesce, where as in stmmac_resume, phylink_resume() is called before these. I am not saying that this is correct in any way, but seems to indicate that the PHY is coming up too soon (at least for this device). I have ran 100 suspend iterations with the above and I have not seen any failures. Let me know if you have any thoughts on this. Thanks! Jon -- nvpublic ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 14:01 ` Jon Hunter @ 2025-02-19 15:36 ` Russell King (Oracle) 2025-02-19 17:52 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-19 15:36 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Wed, Feb 19, 2025 at 02:01:34PM +0000, Jon Hunter wrote: > > On 14/02/2025 11:21, Russell King (Oracle) wrote: > > On Fri, Feb 14, 2025 at 10:58:55AM +0000, Jon Hunter wrote: > > > Thanks for the feedback. So ... > > > > > > 1. I can confirm that suspend works if I disable EEE via ethtool > > > 2. Prior to this change I do see phy_eee_rx_clock_stop being called > > > to enable the clock resuming from suspend, but after this change > > > it is not. > > > > > > Prior to this change I see (note the prints around 389-392 are when > > > we resume from suspend) ... > > > > > > [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 > > > > This is a bug in phylink - it shouldn't have been calling > > phy_eee_rx_clock_stop() where a MAC doesn't support phylink managed EEE. > > > > > [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > > [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > > > Presumably, this is when the link comes up before suspend, so the PHY > > has been configured to allow the RX clock to be stopped prior to suspend > > > > > [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > > [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > > > Presumably, as this is after resume, this is again when the link comes > > up (that's the only time that stmmac calls phy_eee_rx_clock_stop().) > > > > > After this change I see ... > > > > > > [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > > [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > > [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > > > To me, this looks no different - the PHY was configured for clock stop > > before suspending in both cases. > > > > However, something else to verify with the old code - after boot and the > > link comes up (so you get the second phy_eee_rx_clock_stop() at 7s), > > try unplugging the link and re-plugging it. Then try suspending. > > I still need to try this but I am still not back to the office to get to this. > > The point of this test is to verify whether the PHY ignores changes to > > the RX clock stop configuration while the link is up. > > > > > > > > The next stage is to instrument dwmac4_set_eee_mode(), > > dwmac4_reset_eee_mode() and dwmac4_set_eee_lpi_entry_timer() to print > > the final register values in each function vs dwmac4_set_lpi_mode() in > > the new code. Also, I think instrumenting stmmac_common_interrupt() to > > print a message when we get either CORE_IRQ_TX_PATH_IN_LPI_MODE or > > CORE_IRQ_TX_PATH_EXIT_LPI_MODE indicating a change in LPI state would > > be a good idea. > > > > I'd like to see how this all ties up with suspend, resume, link up > > and down events, so please don't trim the log so much. > > I have been testing on top of v6.14-rc2 which does not have > dwmac4_set_lpi_mode(). However, instrumenting the other functions, > for a bad case I see ... > > [ 477.494226] PM: suspend entry (deep) > [ 477.501869] Filesystems sync: 0.006 seconds > [ 477.504518] Freezing user space processes > [ 477.509067] Freezing user space processes completed (elapsed 0.001 seconds) > [ 477.514770] OOM killer disabled. > [ 477.517940] Freezing remaining freezable tasks > [ 477.523449] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) > [ 477.566870] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC > [ 477.586423] dwc-eth-dwmac 2490000.ethernet eth0: disable EEE > [ 477.592052] dwmac4_set_eee_lpi_entry_timer: entered > [ 477.596997] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0x0 > [ 477.680193] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down This tells me WoL is not enabled, and thus phylink_suspend() did a phylink_stop() which took the link administratively down and disabled LPI at the MAC. The actual physical link on the media may still be up at this point, and the remote end may still signal LPI to the local PHY. ...system suspends and resumes... > [ 477.876778] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] > [ 477.883556] CPU5 is up stmmac_resume() gets called here, which will call into phylink_resume() and, because WoL wasn't used at suspend time, will call phylink_start() which immediately prints: > [ 477.985628] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode and then it allows the phylink resolver to run in a separate workqueue. The output from the phylink resolver thread, I'll label as "^WQ". Messages from the thread that called stmmac_resume() I'll labell with "^RES". > [ 477.993771] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 ^WQ At this point, the workqueue has called mac_link_up() and this indicates that that method has completed and it's now calling mac_enable_tx_lpi(). In other words, the transmitter and receiver have been enabled here! This is key... > [ 478.171396] dwmac4: Master AXI performs any burst length ^RES dwmac4_dma_axi(), which is called from stmmac_init_dma_engine() which then goes on to call stmmac_reset(). As noted above, however, the MAC has had its transmitter and receiver enabled at this point, so hitting the hardware with a reset probably undoes all that. stmmac_init_dma_engine() is called from stmmac_hw_setup() and stmmac_resume() _after_ calling phylink_resume(). > [ 478.174480] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found ^RES Printed by stmmac_safety_feat_configuration() which is called from stmmac_hw_setup(). > [ 478.181934] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported ^RES Printed by stmmac_init_ptp() called from stmmac_hw_setup(). > [ 478.202977] dwmac4_set_eee_lpi_entry_timer: entered ^WQ > [ 478.207918] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 ^WQ > [ 478.287646] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized ^WQ > [ 478.295538] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx ^WQ So clearly the phylink resolver is racing with the rest of the stmmac resume path - which doesn't surprise me in the least. I believe I raised the fact that calling phylink_resume() before the hardware was ready to handle link-up is a bad idea precisely because of races like this. The reason stmmac does this is because of it's quirk that it needs the receive clock from the PHY in order for stmmac_reset() to work. > For a good case I see ... > > [ 28.548472] PM: suspend entry (deep) > [ 28.560503] Filesystems sync: 0.010 seconds > [ 28.563622] Freezing user space processes > [ 28.567838] Freezing user space processes completed (elapsed 0.001 seconds) > [ 28.573380] OOM killer disabled. > [ 28.576563] Freezing remaining freezable tasks > [ 28.582100] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) > [ 28.627180] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC > [ 28.646770] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down Same as above... ...system suspends and resumes... > [ 29.099556] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] > [ 29.106351] CPU5 is up > [ 29.218549] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode ^RES > [ 29.234190] dwmac4: Master AXI performs any burst length ^RES > [ 29.237263] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found ^RES > [ 29.244732] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported ^RES > [ 29.306981] Restarting tasks ... done. > [ 29.311423] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 29.314095] random: crng reseeded on system resumption > [ 29.321404] PM: suspend exit > [ 29.370286] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 29.429655] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 29.496567] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 32.968855] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 ^WQ > [ 32.974779] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_link_up: tx_lpi_timer 1000000 ^WQ > [ 32.988755] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx ^WQ So here, phylink_resolve() runs later. I think if you run this same test with an earlier kernel, you'll get much the same random behaviour, maybe with different weightings on "success" and "failure" because of course the code has changed - but only because that's caused a change in timings of the already present race. > The more I have been testing, the more I feel that this is timing > related. In good cases, I see the MAC link coming up well after the > PHY. Even with your change I did see suspend work on occassion and > when it does I see ... > > [ 79.775977] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > [ 79.784196] dwmac4: Master AXI performs any burst length > [ 79.787280] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found > [ 79.794736] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported > [ 79.816642] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device > [ 79.820437] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC > [ 79.854481] OOM killer enabled. > [ 79.855372] Restarting tasks ... done. > [ 79.859460] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 79.861297] random: crng reseeded on system resumption > [ 79.869773] PM: suspend exit > [ 79.914909] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 79.974322] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 80.041236] VDDIO_SDMMC3_AP: voltage operation not allowed > [ 83.547730] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 > [ 83.566859] dwmac4_set_eee_lpi_entry_timer: entered > [ 83.571782] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 > [ 83.651520] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized > [ 83.659425] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > > On a good case, the stmmac_mac_enable_tx_lpi call always happens > much later. It seems that after this change it is more often > that the link is coming up sooner and I guess probably too soon. > May be we were getting lucky before? I think this is pure lottery. > Anyway, I made the following change for testing and this is > working ... > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index b34ebb916b89..44187e230a1e 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -7906,16 +7906,6 @@ int stmmac_resume(struct device *dev) > return ret; > } > - rtnl_lock(); > - if (device_may_wakeup(priv->device) && priv->plat->pmt) { > - phylink_resume(priv->phylink); > - } else { > - phylink_resume(priv->phylink); > - if (device_may_wakeup(priv->device)) > - phylink_speed_up(priv->phylink); > - } > - rtnl_unlock(); > - > rtnl_lock(); > mutex_lock(&priv->lock); > @@ -7930,6 +7920,13 @@ int stmmac_resume(struct device *dev) > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > + if (device_may_wakeup(priv->device) && priv->plat->pmt) { > + phylink_resume(priv->phylink); > + } else { > + phylink_resume(priv->phylink); > + if (device_may_wakeup(priv->device)) > + phylink_speed_up(priv->phylink); > + } > stmmac_enable_all_queues(priv); > stmmac_enable_all_dma_irq(priv); > > I noticed that in __stmmac_open() the phylink_start() is > called after stmmac_hw_setup and stmmac_init_coalesce, where > as in stmmac_resume, phylink_resume() is called before these. > I am not saying that this is correct in any way, but seems > to indicate that the PHY is coming up too soon (at least for > this device). I have ran 100 suspend iterations with the above > and I have not seen any failures. > > Let me know if you have any thoughts on this. With my phylink-maintainer hat on, this is definitely the correct solution - maybe even moving the phylink_resume() call later. phylink_resume() should only be called when the driver is prepared to handle and cope with an immediate call to the mac_link_up() method, and it's clear that its current placement is such that the driver isn't prepared for that. However... see: 36d18b5664ef ("net: stmmac: start phylink instance before stmmac_hw_setup()") but I also questioned this in: https://lore.kernel.org/netdev/20210903080147.GS22278@shell.armlinux.org.uk/ see the bottom of that email starting "While reading stmmac_resume(), I have to question the placement of this code block:". The response was: "There is a story here, SNPS EQOS IP need PHY provides RXC clock for MAC's receive logic, so we need phylink_start() to bring PHY link up, that make PHY resume back, PHY could stop RXC clock when in suspended state. This is the reason why calling phylink_start() before re-config MAC." However, in 21d9ba5bc551 ("net: phylink: add PHY_F_RXC_ALWAYS_ON to PHY dev flags") and associated patches, I added a way that phylib can be told that the MAC requires the RXC at all times. Romain Gantois arranged for this flag to always be set for stmmac in commit 58329b03a595 ("net: stmmac: Signal to PHY/PCS drivers to keep RX clock on"), which will pass PHY_F_RXC_ALWAYS_ON to the PHY driver. Whether the PHY driver honours this flag or not depends on which driver is used. So, my preference would be to move phylink_resume() later, removing the race condition. If there's any regressions, then we need to _properly_ solve them by ensuring that the PHY keeps the RX clock running by honouring PHY_F_RXC_ALWAYS_ON. That's going to need everyone to test their stmmac platforms to find all the cases that need fixing... -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 15:36 ` Russell King (Oracle) @ 2025-02-19 17:52 ` Jon Hunter 2025-02-19 19:13 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-19 17:52 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 19/02/2025 15:36, Russell King (Oracle) wrote: > On Wed, Feb 19, 2025 at 02:01:34PM +0000, Jon Hunter wrote: >> >> On 14/02/2025 11:21, Russell King (Oracle) wrote: >>> On Fri, Feb 14, 2025 at 10:58:55AM +0000, Jon Hunter wrote: >>>> Thanks for the feedback. So ... >>>> >>>> 1. I can confirm that suspend works if I disable EEE via ethtool >>>> 2. Prior to this change I do see phy_eee_rx_clock_stop being called >>>> to enable the clock resuming from suspend, but after this change >>>> it is not. >>>> >>>> Prior to this change I see (note the prints around 389-392 are when >>>> we resume from suspend) ... >>>> >>>> [ 4.654454] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 0 >>> >>> This is a bug in phylink - it shouldn't have been calling >>> phy_eee_rx_clock_stop() where a MAC doesn't support phylink managed EEE. >>> >>>> [ 4.723123] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >>>> [ 7.629652] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 >>> >>> Presumably, this is when the link comes up before suspend, so the PHY >>> has been configured to allow the RX clock to be stopped prior to suspend >>> >>>> [ 389.086185] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >>>> [ 392.863744] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 >>> >>> Presumably, as this is after resume, this is again when the link comes >>> up (that's the only time that stmmac calls phy_eee_rx_clock_stop().) >>> >>>> After this change I see ... >>>> >>>> [ 4.644614] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 >>>> [ 4.679224] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >>>> [ 191.219828] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >>> >>> To me, this looks no different - the PHY was configured for clock stop >>> before suspending in both cases. >>> >>> However, something else to verify with the old code - after boot and the >>> link comes up (so you get the second phy_eee_rx_clock_stop() at 7s), >>> try unplugging the link and re-plugging it. Then try suspending. >> >> I still need to try this but I am still not back to the office to get to this. >> > The point of this test is to verify whether the PHY ignores changes to >>> the RX clock stop configuration while the link is up. >>> >>> >>> >>> The next stage is to instrument dwmac4_set_eee_mode(), >>> dwmac4_reset_eee_mode() and dwmac4_set_eee_lpi_entry_timer() to print >>> the final register values in each function vs dwmac4_set_lpi_mode() in >>> the new code. Also, I think instrumenting stmmac_common_interrupt() to >>> print a message when we get either CORE_IRQ_TX_PATH_IN_LPI_MODE or >>> CORE_IRQ_TX_PATH_EXIT_LPI_MODE indicating a change in LPI state would >>> be a good idea. >>> >>> I'd like to see how this all ties up with suspend, resume, link up >>> and down events, so please don't trim the log so much. >> >> I have been testing on top of v6.14-rc2 which does not have >> dwmac4_set_lpi_mode(). However, instrumenting the other functions, >> for a bad case I see ... >> >> [ 477.494226] PM: suspend entry (deep) >> [ 477.501869] Filesystems sync: 0.006 seconds >> [ 477.504518] Freezing user space processes >> [ 477.509067] Freezing user space processes completed (elapsed 0.001 seconds) >> [ 477.514770] OOM killer disabled. >> [ 477.517940] Freezing remaining freezable tasks >> [ 477.523449] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) >> [ 477.566870] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC >> [ 477.586423] dwc-eth-dwmac 2490000.ethernet eth0: disable EEE >> [ 477.592052] dwmac4_set_eee_lpi_entry_timer: entered >> [ 477.596997] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0x0 >> [ 477.680193] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down > > This tells me WoL is not enabled, and thus phylink_suspend() did a > phylink_stop() which took the link administratively down and disabled > LPI at the MAC. The actual physical link on the media may still be up > at this point, and the remote end may still signal LPI to the local > PHY. > > ...system suspends and resumes... >> [ 477.876778] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] >> [ 477.883556] CPU5 is up > > stmmac_resume() gets called here, which will call into phylink_resume() > and, because WoL wasn't used at suspend time, will call phylink_start() > which immediately prints: > >> [ 477.985628] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > > and then it allows the phylink resolver to run in a separate workqueue. > The output from the phylink resolver thread, I'll label as "^WQ". > Messages from the thread that called stmmac_resume() I'll labell with > "^RES". > >> [ 477.993771] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 > ^WQ > > At this point, the workqueue has called mac_link_up() and this indicates > that that method has completed and it's now calling mac_enable_tx_lpi(). > In other words, the transmitter and receiver have been enabled here! > This is key... > >> [ 478.171396] dwmac4: Master AXI performs any burst length > ^RES > > dwmac4_dma_axi(), which is called from stmmac_init_dma_engine() which > then goes on to call stmmac_reset(). As noted above, however, the > MAC has had its transmitter and receiver enabled at this point, so > hitting the hardware with a reset probably undoes all that. > stmmac_init_dma_engine() is called from stmmac_hw_setup() and > stmmac_resume() _after_ calling phylink_resume(). > >> [ 478.174480] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found > ^RES > > Printed by stmmac_safety_feat_configuration() which is called from > stmmac_hw_setup(). > >> [ 478.181934] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported > ^RES > > Printed by stmmac_init_ptp() called from stmmac_hw_setup(). > >> [ 478.202977] dwmac4_set_eee_lpi_entry_timer: entered > ^WQ >> [ 478.207918] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 > ^WQ >> [ 478.287646] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized > ^WQ >> [ 478.295538] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > ^WQ > > So clearly the phylink resolver is racing with the rest of the stmmac > resume path - which doesn't surprise me in the least. I believe I raised > the fact that calling phylink_resume() before the hardware was ready to > handle link-up is a bad idea precisely because of races like this. > > The reason stmmac does this is because of it's quirk that it needs the > receive clock from the PHY in order for stmmac_reset() to work. I do see the reset fail infrequently on previous kernels with this device and when it does I see these messages ... dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine initialization failed >> For a good case I see ... >> >> [ 28.548472] PM: suspend entry (deep) >> [ 28.560503] Filesystems sync: 0.010 seconds >> [ 28.563622] Freezing user space processes >> [ 28.567838] Freezing user space processes completed (elapsed 0.001 seconds) >> [ 28.573380] OOM killer disabled. >> [ 28.576563] Freezing remaining freezable tasks >> [ 28.582100] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) >> [ 28.627180] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC >> [ 28.646770] dwc-eth-dwmac 2490000.ethernet eth0: Link is Down > > Same as above... > > ...system suspends and resumes... >> [ 29.099556] CPU5: Booted secondary processor 0x0000000103 [0x411fd073] >> [ 29.106351] CPU5 is up >> [ 29.218549] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode > ^RES >> [ 29.234190] dwmac4: Master AXI performs any burst length > ^RES >> [ 29.237263] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found > ^RES >> [ 29.244732] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported > ^RES >> [ 29.306981] Restarting tasks ... done. >> [ 29.311423] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 29.314095] random: crng reseeded on system resumption >> [ 29.321404] PM: suspend exit >> [ 29.370286] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 29.429655] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 29.496567] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 32.968855] Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > ^WQ >> [ 32.974779] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_link_up: tx_lpi_timer 1000000 > ^WQ >> [ 32.988755] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx > ^WQ > > So here, phylink_resolve() runs later. > > I think if you run this same test with an earlier kernel, you'll get > much the same random behaviour, maybe with different weightings on > "success" and "failure" because of course the code has changed - but > only because that's caused a change in timings of the already present > race. > >> The more I have been testing, the more I feel that this is timing >> related. In good cases, I see the MAC link coming up well after the >> PHY. Even with your change I did see suspend work on occassion and >> when it does I see ... >> >> [ 79.775977] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode >> [ 79.784196] dwmac4: Master AXI performs any burst length >> [ 79.787280] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found >> [ 79.794736] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported >> [ 79.816642] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device >> [ 79.820437] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC >> [ 79.854481] OOM killer enabled. >> [ 79.855372] Restarting tasks ... done. >> [ 79.859460] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 79.861297] random: crng reseeded on system resumption >> [ 79.869773] PM: suspend exit >> [ 79.914909] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 79.974322] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 80.041236] VDDIO_SDMMC3_AP: voltage operation not allowed >> [ 83.547730] dwc-eth-dwmac 2490000.ethernet eth0: stmmac_mac_enable_tx_lpi: tx_lpi_timer 1000000 >> [ 83.566859] dwmac4_set_eee_lpi_entry_timer: entered >> [ 83.571782] dwmac4_set_eee_lpi_entry_timer: GMAC4_LPI_CTRL_STATUS 0xf4240 >> [ 83.651520] dwc-eth-dwmac 2490000.ethernet eth0: Energy-Efficient Ethernet initialized >> [ 83.659425] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx >> >> On a good case, the stmmac_mac_enable_tx_lpi call always happens >> much later. It seems that after this change it is more often >> that the link is coming up sooner and I guess probably too soon. >> May be we were getting lucky before? > > I think this is pure lottery. Yes it does appear to be. >> Anyway, I made the following change for testing and this is >> working ... >> >> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> index b34ebb916b89..44187e230a1e 100644 >> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >> @@ -7906,16 +7906,6 @@ int stmmac_resume(struct device *dev) >> return ret; >> } >> - rtnl_lock(); >> - if (device_may_wakeup(priv->device) && priv->plat->pmt) { >> - phylink_resume(priv->phylink); >> - } else { >> - phylink_resume(priv->phylink); >> - if (device_may_wakeup(priv->device)) >> - phylink_speed_up(priv->phylink); >> - } >> - rtnl_unlock(); >> - >> rtnl_lock(); >> mutex_lock(&priv->lock); >> @@ -7930,6 +7920,13 @@ int stmmac_resume(struct device *dev) >> stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); >> + if (device_may_wakeup(priv->device) && priv->plat->pmt) { >> + phylink_resume(priv->phylink); >> + } else { >> + phylink_resume(priv->phylink); >> + if (device_may_wakeup(priv->device)) >> + phylink_speed_up(priv->phylink); >> + } >> stmmac_enable_all_queues(priv); >> stmmac_enable_all_dma_irq(priv); >> >> I noticed that in __stmmac_open() the phylink_start() is >> called after stmmac_hw_setup and stmmac_init_coalesce, where >> as in stmmac_resume, phylink_resume() is called before these. >> I am not saying that this is correct in any way, but seems >> to indicate that the PHY is coming up too soon (at least for >> this device). I have ran 100 suspend iterations with the above >> and I have not seen any failures. >> >> Let me know if you have any thoughts on this. > > With my phylink-maintainer hat on, this is definitely the correct > solution - maybe even moving the phylink_resume() call later. > phylink_resume() should only be called when the driver is prepared > to handle and cope with an immediate call to the mac_link_up() > method, and it's clear that its current placement is such that the > driver isn't prepared for that. > > However... see: > > 36d18b5664ef ("net: stmmac: start phylink instance before stmmac_hw_setup()") > > but I also questioned this in: > > https://lore.kernel.org/netdev/20210903080147.GS22278@shell.armlinux.org.uk/ > > see the bottom of that email starting "While reading stmmac_resume(), I > have to question the placement of this code block:". The response was: > > "There is a story here, SNPS EQOS IP need PHY provides RXC clock for > MAC's receive logic, so we need phylink_start() to bring PHY link up, > that make PHY resume back, PHY could stop RXC clock when in suspended > state. This is the reason why calling phylink_start() before re-config > MAC." > > However, in 21d9ba5bc551 ("net: phylink: add PHY_F_RXC_ALWAYS_ON to PHY > dev flags") and associated patches, I added a way that phylib can be > told that the MAC requires the RXC at all times. > > Romain Gantois arranged for this flag to always be set for stmmac in > commit 58329b03a595 ("net: stmmac: Signal to PHY/PCS drivers to keep RX > clock on"), which will pass PHY_F_RXC_ALWAYS_ON to the PHY driver. > Whether the PHY driver honours this flag or not depends on which > driver is used. > > So, my preference would be to move phylink_resume() later, removing > the race condition. If there's any regressions, then we need to > _properly_ solve them by ensuring that the PHY keeps the RX clock > running by honouring PHY_F_RXC_ALWAYS_ON. That's going to need > everyone to test their stmmac platforms to find all the cases that > need fixing... Thanks for the in-depth analysis and feedback. We have 3 SoCs that use this driver and so I will do some testing with this change on all of them. Thanks again. Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 17:52 ` Jon Hunter @ 2025-02-19 19:13 ` Russell King (Oracle) 2025-02-19 20:05 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-19 19:13 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote: > On 19/02/2025 15:36, Russell King (Oracle) wrote: > > So clearly the phylink resolver is racing with the rest of the stmmac > > resume path - which doesn't surprise me in the least. I believe I raised > > the fact that calling phylink_resume() before the hardware was ready to > > handle link-up is a bad idea precisely because of races like this. > > > > The reason stmmac does this is because of it's quirk that it needs the > > receive clock from the PHY in order for stmmac_reset() to work. > > I do see the reset fail infrequently on previous kernels with this device > and when it does I see these messages ... > > dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma > dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine > initialization failed I wonder whether it's also racing with phylib, but phylink_resume() calling phylink_start() going in to call phy_start() is all synchronous. That causes __phy_resume() to be called. Which PHY device/driver is being used? > > So, my preference would be to move phylink_resume() later, removing > > the race condition. If there's any regressions, then we need to > > _properly_ solve them by ensuring that the PHY keeps the RX clock > > running by honouring PHY_F_RXC_ALWAYS_ON. That's going to need > > everyone to test their stmmac platforms to find all the cases that > > need fixing... > > Thanks for the in-depth analysis and feedback. We have 3 SoCs that use this > driver and so I will do some testing with this change on all of them. Thanks! -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 19:13 ` Russell King (Oracle) @ 2025-02-19 20:05 ` Jon Hunter 2025-02-19 20:57 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-19 20:05 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 19/02/2025 19:13, Russell King (Oracle) wrote: > On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote: >> On 19/02/2025 15:36, Russell King (Oracle) wrote: >>> So clearly the phylink resolver is racing with the rest of the stmmac >>> resume path - which doesn't surprise me in the least. I believe I raised >>> the fact that calling phylink_resume() before the hardware was ready to >>> handle link-up is a bad idea precisely because of races like this. >>> >>> The reason stmmac does this is because of it's quirk that it needs the >>> receive clock from the PHY in order for stmmac_reset() to work. >> >> I do see the reset fail infrequently on previous kernels with this device >> and when it does I see these messages ... >> >> dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma >> dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine >> initialization failed > > I wonder whether it's also racing with phylib, but phylink_resume() > calling phylink_start() going in to call phy_start() is all synchronous. > That causes __phy_resume() to be called. > > Which PHY device/driver is being used? Looks like it is this Broadcom driver ... Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 20:05 ` Jon Hunter @ 2025-02-19 20:57 ` Russell King (Oracle) 2025-02-25 14:21 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-19 20:57 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Wed, Feb 19, 2025 at 08:05:57PM +0000, Jon Hunter wrote: > On 19/02/2025 19:13, Russell King (Oracle) wrote: > > On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote: > > > On 19/02/2025 15:36, Russell King (Oracle) wrote: > > > > So clearly the phylink resolver is racing with the rest of the stmmac > > > > resume path - which doesn't surprise me in the least. I believe I raised > > > > the fact that calling phylink_resume() before the hardware was ready to > > > > handle link-up is a bad idea precisely because of races like this. > > > > > > > > The reason stmmac does this is because of it's quirk that it needs the > > > > receive clock from the PHY in order for stmmac_reset() to work. > > > > > > I do see the reset fail infrequently on previous kernels with this device > > > and when it does I see these messages ... > > > > > > dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma > > > dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine > > > initialization failed > > > > I wonder whether it's also racing with phylib, but phylink_resume() > > calling phylink_start() going in to call phy_start() is all synchronous. > > That causes __phy_resume() to be called. > > > > Which PHY device/driver is being used? > > > Looks like it is this Broadcom driver ... > > Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 I don't see anything special happening in the PHY driver - it doesn't implement suspend/resume/config_aneg methods, so there's nothing going on with clocks in that driver beyond generic stuff. So, let's try something (I haven't tested this, and its likely you will need to work it in to your other change.) Essentially, this disables the receive clock stop around the reset, something the stmmac driver has never done in the past. diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 1cbea627b216..8e975863a2e3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) rtnl_lock(); mutex_lock(&priv->lock); + phy_eee_rx_clock_stop(priv->dev->phydev, false); + stmmac_reset_queues_param(priv); stmmac_free_tx_skbufs(priv); @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); + phy_eee_rx_clock_stop(priv->dev->phydev, + priv->phylink_config.eee_rx_clk_stop_enable); + stmmac_enable_all_queues(priv); stmmac_enable_all_dma_irq(priv); -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-19 20:57 ` Russell King (Oracle) @ 2025-02-25 14:21 ` Jon Hunter 2025-02-26 10:02 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-25 14:21 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org Hi Russell, On 19/02/2025 20:57, Russell King (Oracle) wrote: > On Wed, Feb 19, 2025 at 08:05:57PM +0000, Jon Hunter wrote: >> On 19/02/2025 19:13, Russell King (Oracle) wrote: >>> On Wed, Feb 19, 2025 at 05:52:34PM +0000, Jon Hunter wrote: >>>> On 19/02/2025 15:36, Russell King (Oracle) wrote: >>>>> So clearly the phylink resolver is racing with the rest of the stmmac >>>>> resume path - which doesn't surprise me in the least. I believe I raised >>>>> the fact that calling phylink_resume() before the hardware was ready to >>>>> handle link-up is a bad idea precisely because of races like this. >>>>> >>>>> The reason stmmac does this is because of it's quirk that it needs the >>>>> receive clock from the PHY in order for stmmac_reset() to work. >>>> >>>> I do see the reset fail infrequently on previous kernels with this device >>>> and when it does I see these messages ... >>>> >>>> dwc-eth-dwmac 2490000.ethernet: Failed to reset the dma >>>> dwc-eth-dwmac 2490000.ethernet eth0: stmmac_hw_setup: DMA engine >>>> initialization failed >>> >>> I wonder whether it's also racing with phylib, but phylink_resume() >>> calling phylink_start() going in to call phy_start() is all synchronous. >>> That causes __phy_resume() to be called. >>> >>> Which PHY device/driver is being used? >> >> >> Looks like it is this Broadcom driver ... >> >> Broadcom BCM89610 stmmac-0:00: phy_eee_rx_clock_stop: clk_stop_enable 1 > > I don't see anything special happening in the PHY driver - it doesn't > implement suspend/resume/config_aneg methods, so there's nothing going > on with clocks in that driver beyond generic stuff. > > So, let's try something (I haven't tested this, and its likely you > will need to work it in to your other change.) > > Essentially, this disables the receive clock stop around the reset, > something the stmmac driver has never done in the past. > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > index 1cbea627b216..8e975863a2e3 100644 > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) > rtnl_lock(); > mutex_lock(&priv->lock); > > + phy_eee_rx_clock_stop(priv->dev->phydev, false); > + > stmmac_reset_queues_param(priv); > > stmmac_free_tx_skbufs(priv); > @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) > > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > > + phy_eee_rx_clock_stop(priv->dev->phydev, > + priv->phylink_config.eee_rx_clk_stop_enable); > + > stmmac_enable_all_queues(priv); > stmmac_enable_all_dma_irq(priv); > Sorry for the delay, I have been testing various issues recently and needed a bit more time to test this. It turns out that what I had proposed last week does not work. I believe that with all the various debug/instrumentation I had added, I was again getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, it did not work :-( However, what you are suggesting above, all by itself, is working. I have tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working reliably. I have also tested on some other boards that use the same stmmac driver (but use the Aquantia PHY) and I have not seen any issues. So this does fix the issue I am seeing. I know we are getting quite late in the rc for v6.14, but not sure if we could add this as a fix? Thanks! Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-25 14:21 ` Jon Hunter @ 2025-02-26 10:02 ` Russell King (Oracle) 2025-02-26 10:11 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-26 10:02 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: > Hi Russell, > > On 19/02/2025 20:57, Russell King (Oracle) wrote: > > So, let's try something (I haven't tested this, and its likely you > > will need to work it in to your other change.) > > > > Essentially, this disables the receive clock stop around the reset, > > something the stmmac driver has never done in the past. > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > index 1cbea627b216..8e975863a2e3 100644 > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) > > rtnl_lock(); > > mutex_lock(&priv->lock); > > + phy_eee_rx_clock_stop(priv->dev->phydev, false); > > + > > stmmac_reset_queues_param(priv); > > stmmac_free_tx_skbufs(priv); > > @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) > > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > > + phy_eee_rx_clock_stop(priv->dev->phydev, > > + priv->phylink_config.eee_rx_clk_stop_enable); > > + > > stmmac_enable_all_queues(priv); > > stmmac_enable_all_dma_irq(priv); > > > Sorry for the delay, I have been testing various issues recently and needed > a bit more time to test this. > > It turns out that what I had proposed last week does not work. I believe > that with all the various debug/instrumentation I had added, I was again > getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, > it did not work :-( > > However, what you are suggesting above, all by itself, is working. I have > tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working > reliably. I have also tested on some other boards that use the same stmmac > driver (but use the Aquantia PHY) and I have not seen any issues. So this > does fix the issue I am seeing. > > I know we are getting quite late in the rc for v6.14, but not sure if we > could add this as a fix? The patch above was something of a hack, bypassing the layering, so I would like to consider how this should be done properly. I'm still wondering whether the early call to phylink_resume() is symptomatic of this same issue, or whether there is a PHY that needs phy_start() to be called to output its clock even with link down that we don't know about. The phylink_resume() call is relevant to this because I'd like to put: phy_eee_rx_clock_stop(priv->dev->phydev, priv->phylink_config.eee_rx_clk_stop_enable); in there to ensure that the PHY is correctly configured for clock-stop, but given stmmac's placement that wouldn't work. I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop at the PHY. I think the only thing we could do is try solving this problem as per above and see what the fall-out from it is. I don't get the impression that stmmac users are particularly active at testing patches though, so it may take months to get breakage reports. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 10:02 ` Russell King (Oracle) @ 2025-02-26 10:11 ` Jon Hunter 2025-02-26 10:59 ` Russell King (Oracle) 2025-02-26 11:37 ` Russell King (Oracle) 0 siblings, 2 replies; 21+ messages in thread From: Jon Hunter @ 2025-02-26 10:11 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 26/02/2025 10:02, Russell King (Oracle) wrote: > On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: >> Hi Russell, >> >> On 19/02/2025 20:57, Russell King (Oracle) wrote: >>> So, let's try something (I haven't tested this, and its likely you >>> will need to work it in to your other change.) >>> >>> Essentially, this disables the receive clock stop around the reset, >>> something the stmmac driver has never done in the past. >>> >>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>> index 1cbea627b216..8e975863a2e3 100644 >>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>> @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) >>> rtnl_lock(); >>> mutex_lock(&priv->lock); >>> + phy_eee_rx_clock_stop(priv->dev->phydev, false); >>> + >>> stmmac_reset_queues_param(priv); >>> stmmac_free_tx_skbufs(priv); >>> @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) >>> stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); >>> + phy_eee_rx_clock_stop(priv->dev->phydev, >>> + priv->phylink_config.eee_rx_clk_stop_enable); >>> + >>> stmmac_enable_all_queues(priv); >>> stmmac_enable_all_dma_irq(priv); >> >> >> Sorry for the delay, I have been testing various issues recently and needed >> a bit more time to test this. >> >> It turns out that what I had proposed last week does not work. I believe >> that with all the various debug/instrumentation I had added, I was again >> getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, >> it did not work :-( >> >> However, what you are suggesting above, all by itself, is working. I have >> tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working >> reliably. I have also tested on some other boards that use the same stmmac >> driver (but use the Aquantia PHY) and I have not seen any issues. So this >> does fix the issue I am seeing. >> >> I know we are getting quite late in the rc for v6.14, but not sure if we >> could add this as a fix? > > The patch above was something of a hack, bypassing the layering, so I > would like to consider how this should be done properly. > > I'm still wondering whether the early call to phylink_resume() is > symptomatic of this same issue, or whether there is a PHY that needs > phy_start() to be called to output its clock even with link down that > we don't know about. > > The phylink_resume() call is relevant to this because I'd like to put: > > phy_eee_rx_clock_stop(priv->dev->phydev, > priv->phylink_config.eee_rx_clk_stop_enable); > > in there to ensure that the PHY is correctly configured for clock-stop, > but given stmmac's placement that wouldn't work. > > I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop > at the PHY. > > I think the only thing we could do is try solving this problem as per > above and see what the fall-out from it is. I don't get the impression > that stmmac users are particularly active at testing patches though, so > it may take months to get breakage reports. We can ask Furong to test as he seems to active and making changes, but otherwise I am not sure how well it is being tested across various devices. On the other hand, it feels like there are still lingering issues like this with the driver and so I would hope this is moving in the right direction. Let me know if you have a patch you want me to test and I will run in on our Tegra186, Tegra194 and Tegra234 devices that all use this. Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 10:11 ` Jon Hunter @ 2025-02-26 10:59 ` Russell King (Oracle) 2025-02-26 15:55 ` Jon Hunter 2025-02-26 11:37 ` Russell King (Oracle) 1 sibling, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-26 10:59 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: > > On 26/02/2025 10:02, Russell King (Oracle) wrote: > > On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: > > > Hi Russell, > > > > > > On 19/02/2025 20:57, Russell King (Oracle) wrote: > > > > So, let's try something (I haven't tested this, and its likely you > > > > will need to work it in to your other change.) > > > > > > > > Essentially, this disables the receive clock stop around the reset, > > > > something the stmmac driver has never done in the past. > > > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > index 1cbea627b216..8e975863a2e3 100644 > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) > > > > rtnl_lock(); > > > > mutex_lock(&priv->lock); > > > > + phy_eee_rx_clock_stop(priv->dev->phydev, false); > > > > + > > > > stmmac_reset_queues_param(priv); > > > > stmmac_free_tx_skbufs(priv); > > > > @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) > > > > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > > > > + phy_eee_rx_clock_stop(priv->dev->phydev, > > > > + priv->phylink_config.eee_rx_clk_stop_enable); > > > > + > > > > stmmac_enable_all_queues(priv); > > > > stmmac_enable_all_dma_irq(priv); > > > > > > > > > Sorry for the delay, I have been testing various issues recently and needed > > > a bit more time to test this. > > > > > > It turns out that what I had proposed last week does not work. I believe > > > that with all the various debug/instrumentation I had added, I was again > > > getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, > > > it did not work :-( > > > > > > However, what you are suggesting above, all by itself, is working. I have > > > tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working > > > reliably. I have also tested on some other boards that use the same stmmac > > > driver (but use the Aquantia PHY) and I have not seen any issues. So this > > > does fix the issue I am seeing. > > > > > > I know we are getting quite late in the rc for v6.14, but not sure if we > > > could add this as a fix? > > > > The patch above was something of a hack, bypassing the layering, so I > > would like to consider how this should be done properly. > > > > I'm still wondering whether the early call to phylink_resume() is > > symptomatic of this same issue, or whether there is a PHY that needs > > phy_start() to be called to output its clock even with link down that > > we don't know about. > > > > The phylink_resume() call is relevant to this because I'd like to put: > > > > phy_eee_rx_clock_stop(priv->dev->phydev, > > priv->phylink_config.eee_rx_clk_stop_enable); > > > > in there to ensure that the PHY is correctly configured for clock-stop, > > but given stmmac's placement that wouldn't work. > > > > I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop > > at the PHY. > > > > I think the only thing we could do is try solving this problem as per > > above and see what the fall-out from it is. I don't get the impression > > that stmmac users are particularly active at testing patches though, so > > it may take months to get breakage reports. > > > We can ask Furong to test as he seems to active and making changes, but > otherwise I am not sure how well it is being tested across various devices. > On the other hand, it feels like there are still lingering issues like this > with the driver and so I would hope this is moving in the right direction. > > Let me know if you have a patch you want me to test and I will run in on our > Tegra186, Tegra194 and Tegra234 devices that all use this. Do we think this needs to be a patch for the net tree or the net-next tree? I think we've established that it's been a long-standing bug, so maybe if we target net-next to give it more time to be tested? -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 10:59 ` Russell King (Oracle) @ 2025-02-26 15:55 ` Jon Hunter 2025-02-26 16:00 ` Russell King (Oracle) 0 siblings, 1 reply; 21+ messages in thread From: Jon Hunter @ 2025-02-26 15:55 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 26/02/2025 10:59, Russell King (Oracle) wrote: > On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: >> >> On 26/02/2025 10:02, Russell King (Oracle) wrote: >>> On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: >>>> Hi Russell, >>>> >>>> On 19/02/2025 20:57, Russell King (Oracle) wrote: >>>>> So, let's try something (I haven't tested this, and its likely you >>>>> will need to work it in to your other change.) >>>>> >>>>> Essentially, this disables the receive clock stop around the reset, >>>>> something the stmmac driver has never done in the past. >>>>> >>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>> index 1cbea627b216..8e975863a2e3 100644 >>>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>> @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) >>>>> rtnl_lock(); >>>>> mutex_lock(&priv->lock); >>>>> + phy_eee_rx_clock_stop(priv->dev->phydev, false); >>>>> + >>>>> stmmac_reset_queues_param(priv); >>>>> stmmac_free_tx_skbufs(priv); >>>>> @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) >>>>> stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); >>>>> + phy_eee_rx_clock_stop(priv->dev->phydev, >>>>> + priv->phylink_config.eee_rx_clk_stop_enable); >>>>> + >>>>> stmmac_enable_all_queues(priv); >>>>> stmmac_enable_all_dma_irq(priv); >>>> >>>> >>>> Sorry for the delay, I have been testing various issues recently and needed >>>> a bit more time to test this. >>>> >>>> It turns out that what I had proposed last week does not work. I believe >>>> that with all the various debug/instrumentation I had added, I was again >>>> getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, >>>> it did not work :-( >>>> >>>> However, what you are suggesting above, all by itself, is working. I have >>>> tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working >>>> reliably. I have also tested on some other boards that use the same stmmac >>>> driver (but use the Aquantia PHY) and I have not seen any issues. So this >>>> does fix the issue I am seeing. >>>> >>>> I know we are getting quite late in the rc for v6.14, but not sure if we >>>> could add this as a fix? >>> >>> The patch above was something of a hack, bypassing the layering, so I >>> would like to consider how this should be done properly. >>> >>> I'm still wondering whether the early call to phylink_resume() is >>> symptomatic of this same issue, or whether there is a PHY that needs >>> phy_start() to be called to output its clock even with link down that >>> we don't know about. >>> >>> The phylink_resume() call is relevant to this because I'd like to put: >>> >>> phy_eee_rx_clock_stop(priv->dev->phydev, >>> priv->phylink_config.eee_rx_clk_stop_enable); >>> >>> in there to ensure that the PHY is correctly configured for clock-stop, >>> but given stmmac's placement that wouldn't work. >>> >>> I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop >>> at the PHY. >>> >>> I think the only thing we could do is try solving this problem as per >>> above and see what the fall-out from it is. I don't get the impression >>> that stmmac users are particularly active at testing patches though, so >>> it may take months to get breakage reports. >> >> >> We can ask Furong to test as he seems to active and making changes, but >> otherwise I am not sure how well it is being tested across various devices. >> On the other hand, it feels like there are still lingering issues like this >> with the driver and so I would hope this is moving in the right direction. >> >> Let me know if you have a patch you want me to test and I will run in on our >> Tegra186, Tegra194 and Tegra234 devices that all use this. > > Do we think this needs to be a patch for the net tree or the net-next > tree? I think we've established that it's been a long-standing bug, > so maybe if we target net-next to give it more time to be tested? > Yes I agree there is a long-standing issue here. What is unfortunate for Linux v6.14 is that failure rate is much higher. However, I don't see what I can really do about that. I can mark suspend as broken for Linux v6.14 for this device and then hopefully we will get this resolved properly. Thanks! Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 15:55 ` Jon Hunter @ 2025-02-26 16:00 ` Russell King (Oracle) 2025-02-26 16:06 ` Jon Hunter 0 siblings, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-26 16:00 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On Wed, Feb 26, 2025 at 03:55:47PM +0000, Jon Hunter wrote: > > On 26/02/2025 10:59, Russell King (Oracle) wrote: > > On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: > > > > > > On 26/02/2025 10:02, Russell King (Oracle) wrote: > > > > On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: > > > > > Hi Russell, > > > > > > > > > > On 19/02/2025 20:57, Russell King (Oracle) wrote: > > > > > > So, let's try something (I haven't tested this, and its likely you > > > > > > will need to work it in to your other change.) > > > > > > > > > > > > Essentially, this disables the receive clock stop around the reset, > > > > > > something the stmmac driver has never done in the past. > > > > > > > > > > > > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > > > index 1cbea627b216..8e975863a2e3 100644 > > > > > > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > > > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c > > > > > > @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) > > > > > > rtnl_lock(); > > > > > > mutex_lock(&priv->lock); > > > > > > + phy_eee_rx_clock_stop(priv->dev->phydev, false); > > > > > > + > > > > > > stmmac_reset_queues_param(priv); > > > > > > stmmac_free_tx_skbufs(priv); > > > > > > @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) > > > > > > stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); > > > > > > + phy_eee_rx_clock_stop(priv->dev->phydev, > > > > > > + priv->phylink_config.eee_rx_clk_stop_enable); > > > > > > + > > > > > > stmmac_enable_all_queues(priv); > > > > > > stmmac_enable_all_dma_irq(priv); > > > > > > > > > > > > > > > Sorry for the delay, I have been testing various issues recently and needed > > > > > a bit more time to test this. > > > > > > > > > > It turns out that what I had proposed last week does not work. I believe > > > > > that with all the various debug/instrumentation I had added, I was again > > > > > getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, > > > > > it did not work :-( > > > > > > > > > > However, what you are suggesting above, all by itself, is working. I have > > > > > tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working > > > > > reliably. I have also tested on some other boards that use the same stmmac > > > > > driver (but use the Aquantia PHY) and I have not seen any issues. So this > > > > > does fix the issue I am seeing. > > > > > > > > > > I know we are getting quite late in the rc for v6.14, but not sure if we > > > > > could add this as a fix? > > > > > > > > The patch above was something of a hack, bypassing the layering, so I > > > > would like to consider how this should be done properly. > > > > > > > > I'm still wondering whether the early call to phylink_resume() is > > > > symptomatic of this same issue, or whether there is a PHY that needs > > > > phy_start() to be called to output its clock even with link down that > > > > we don't know about. > > > > > > > > The phylink_resume() call is relevant to this because I'd like to put: > > > > > > > > phy_eee_rx_clock_stop(priv->dev->phydev, > > > > priv->phylink_config.eee_rx_clk_stop_enable); > > > > > > > > in there to ensure that the PHY is correctly configured for clock-stop, > > > > but given stmmac's placement that wouldn't work. > > > > > > > > I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop > > > > at the PHY. > > > > > > > > I think the only thing we could do is try solving this problem as per > > > > above and see what the fall-out from it is. I don't get the impression > > > > that stmmac users are particularly active at testing patches though, so > > > > it may take months to get breakage reports. > > > > > > > > > We can ask Furong to test as he seems to active and making changes, but > > > otherwise I am not sure how well it is being tested across various devices. > > > On the other hand, it feels like there are still lingering issues like this > > > with the driver and so I would hope this is moving in the right direction. > > > > > > Let me know if you have a patch you want me to test and I will run in on our > > > Tegra186, Tegra194 and Tegra234 devices that all use this. > > > > Do we think this needs to be a patch for the net tree or the net-next > > tree? I think we've established that it's been a long-standing bug, > > so maybe if we target net-next to give it more time to be tested? > > > > Yes I agree there is a long-standing issue here. What is unfortunate for > Linux v6.14 is that failure rate is much higher. However, I don't see what I > can really do about that. I can mark suspend as broken for Linux v6.14 for > this device and then hopefully we will get this resolved properly. If we put the patches in net-next, it can have longer to be tested - it won't go straight into 6.14, but will wait until after net-next gets merged, and it'll then be backported to 6.14 stable trees. I think the fix that I've outlined is too big and too risky to go straight into 6.14, but the smaller fix may be better, but would then need to be rewritten into the larger fix. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 16:00 ` Russell King (Oracle) @ 2025-02-26 16:06 ` Jon Hunter 0 siblings, 0 replies; 21+ messages in thread From: Jon Hunter @ 2025-02-26 16:06 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org On 26/02/2025 16:00, Russell King (Oracle) wrote: > On Wed, Feb 26, 2025 at 03:55:47PM +0000, Jon Hunter wrote: >> >> On 26/02/2025 10:59, Russell King (Oracle) wrote: >>> On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: >>>> >>>> On 26/02/2025 10:02, Russell King (Oracle) wrote: >>>>> On Tue, Feb 25, 2025 at 02:21:01PM +0000, Jon Hunter wrote: >>>>>> Hi Russell, >>>>>> >>>>>> On 19/02/2025 20:57, Russell King (Oracle) wrote: >>>>>>> So, let's try something (I haven't tested this, and its likely you >>>>>>> will need to work it in to your other change.) >>>>>>> >>>>>>> Essentially, this disables the receive clock stop around the reset, >>>>>>> something the stmmac driver has never done in the past. >>>>>>> >>>>>>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>>>> index 1cbea627b216..8e975863a2e3 100644 >>>>>>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>>>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c >>>>>>> @@ -7926,6 +7926,8 @@ int stmmac_resume(struct device *dev) >>>>>>> rtnl_lock(); >>>>>>> mutex_lock(&priv->lock); >>>>>>> + phy_eee_rx_clock_stop(priv->dev->phydev, false); >>>>>>> + >>>>>>> stmmac_reset_queues_param(priv); >>>>>>> stmmac_free_tx_skbufs(priv); >>>>>>> @@ -7937,6 +7939,9 @@ int stmmac_resume(struct device *dev) >>>>>>> stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw); >>>>>>> + phy_eee_rx_clock_stop(priv->dev->phydev, >>>>>>> + priv->phylink_config.eee_rx_clk_stop_enable); >>>>>>> + >>>>>>> stmmac_enable_all_queues(priv); >>>>>>> stmmac_enable_all_dma_irq(priv); >>>>>> >>>>>> >>>>>> Sorry for the delay, I have been testing various issues recently and needed >>>>>> a bit more time to test this. >>>>>> >>>>>> It turns out that what I had proposed last week does not work. I believe >>>>>> that with all the various debug/instrumentation I had added, I was again >>>>>> getting lucky. So when I tested again this week on top of vanilla v6.14-rc2, >>>>>> it did not work :-( >>>>>> >>>>>> However, what you are suggesting above, all by itself, is working. I have >>>>>> tested this on top of vanilla v6.14-rc2 and v6.14-rc4 and it is working >>>>>> reliably. I have also tested on some other boards that use the same stmmac >>>>>> driver (but use the Aquantia PHY) and I have not seen any issues. So this >>>>>> does fix the issue I am seeing. >>>>>> >>>>>> I know we are getting quite late in the rc for v6.14, but not sure if we >>>>>> could add this as a fix? >>>>> >>>>> The patch above was something of a hack, bypassing the layering, so I >>>>> would like to consider how this should be done properly. >>>>> >>>>> I'm still wondering whether the early call to phylink_resume() is >>>>> symptomatic of this same issue, or whether there is a PHY that needs >>>>> phy_start() to be called to output its clock even with link down that >>>>> we don't know about. >>>>> >>>>> The phylink_resume() call is relevant to this because I'd like to put: >>>>> >>>>> phy_eee_rx_clock_stop(priv->dev->phydev, >>>>> priv->phylink_config.eee_rx_clk_stop_enable); >>>>> >>>>> in there to ensure that the PHY is correctly configured for clock-stop, >>>>> but given stmmac's placement that wouldn't work. >>>>> >>>>> I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop >>>>> at the PHY. >>>>> >>>>> I think the only thing we could do is try solving this problem as per >>>>> above and see what the fall-out from it is. I don't get the impression >>>>> that stmmac users are particularly active at testing patches though, so >>>>> it may take months to get breakage reports. >>>> >>>> >>>> We can ask Furong to test as he seems to active and making changes, but >>>> otherwise I am not sure how well it is being tested across various devices. >>>> On the other hand, it feels like there are still lingering issues like this >>>> with the driver and so I would hope this is moving in the right direction. >>>> >>>> Let me know if you have a patch you want me to test and I will run in on our >>>> Tegra186, Tegra194 and Tegra234 devices that all use this. >>> >>> Do we think this needs to be a patch for the net tree or the net-next >>> tree? I think we've established that it's been a long-standing bug, >>> so maybe if we target net-next to give it more time to be tested? >>> >> >> Yes I agree there is a long-standing issue here. What is unfortunate for >> Linux v6.14 is that failure rate is much higher. However, I don't see what I >> can really do about that. I can mark suspend as broken for Linux v6.14 for >> this device and then hopefully we will get this resolved properly. > > If we put the patches in net-next, it can have longer to be tested - it > won't go straight into 6.14, but will wait until after net-next gets > merged, and it'll then be backported to 6.14 stable trees. Yes that would be great. > I think the fix that I've outlined is too big and too risky to go > straight into 6.14, but the smaller fix may be better, but would then > need to be rewritten into the larger fix. I think it is fine and better to get it fixed for the long term. Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 10:11 ` Jon Hunter 2025-02-26 10:59 ` Russell King (Oracle) @ 2025-02-26 11:37 ` Russell King (Oracle) 2025-02-26 17:24 ` Jon Hunter 1 sibling, 1 reply; 21+ messages in thread From: Russell King (Oracle) @ 2025-02-26 11:37 UTC (permalink / raw) To: Jon Hunter Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel, linux-stm32, Marcin Wojtas, Maxime Coquelin, netdev, Paolo Abeni, UNGLinuxDriver, linux-tegra@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1894 bytes --] On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: > On 26/02/2025 10:02, Russell King (Oracle) wrote: > > The patch above was something of a hack, bypassing the layering, so I > > would like to consider how this should be done properly. > > > > I'm still wondering whether the early call to phylink_resume() is > > symptomatic of this same issue, or whether there is a PHY that needs > > phy_start() to be called to output its clock even with link down that > > we don't know about. > > > > The phylink_resume() call is relevant to this because I'd like to put: > > > > phy_eee_rx_clock_stop(priv->dev->phydev, > > priv->phylink_config.eee_rx_clk_stop_enable); > > > > in there to ensure that the PHY is correctly configured for clock-stop, > > but given stmmac's placement that wouldn't work. > > > > I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop > > at the PHY. > > > > I think the only thing we could do is try solving this problem as per > > above and see what the fall-out from it is. I don't get the impression > > that stmmac users are particularly active at testing patches though, so > > it may take months to get breakage reports. > > > We can ask Furong to test as he seems to active and making changes, but > otherwise I am not sure how well it is being tested across various devices. > On the other hand, it feels like there are still lingering issues like this > with the driver and so I would hope this is moving in the right direction. > > Let me know if you have a patch you want me to test and I will run in on our > Tegra186, Tegra194 and Tegra234 devices that all use this. The attached patches shows what I'm thinking of - it's just been roughed out, and only been build tested. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last! [-- Attachment #2: 0001-net-phylink-add-config-of-PHY-receive-clock-stop-in-.patch --] [-- Type: text/x-diff, Size: 1723 bytes --] From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Bcc: linux@mail.armlinux.org.uk Subject: [PATCH net-next 1/5] net: phylink: add config of PHY receive clock-stop in phylink_resume() MIME-Version: 1.0 Content-Disposition: inline Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- drivers/net/phy/phylink.c | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index a3b186ab3854..0aae0bb2a254 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -2264,7 +2264,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy, pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0; if (pl->mac_supports_eee_ops) { - /* Explicitly configure whether the PHY is allowed to stop it's + /* Explicitly configure whether the PHY is allowed to stop its * receive clock. */ ret = phy_eee_rx_clock_stop(phy, @@ -2645,8 +2645,22 @@ EXPORT_SYMBOL_GPL(phylink_suspend); */ void phylink_resume(struct phylink *pl) { + int ret; + ASSERT_RTNL(); + if (pl->mac_supports_eee_ops && pl->phydev) { + /* Explicitly configure whether the PHY is allowed to stop its + * receive clock on resume to ensure that it is correctly + * configured. + */ + ret = phy_eee_rx_clock_stop(pl->phydev, + pl->config->eee_rx_clk_stop_enable); + if (ret == -EOPNOTSUPP) + phylink_warn(pl, "failed to set rx clock stop: %pe\n", + ERR_PTR(ret)); + } + if (test_bit(PHYLINK_DISABLE_MAC_WOL, &pl->phylink_disable_state)) { /* Wake-on-Lan enabled, MAC handling */ -- 2.30.2 [-- Attachment #3: 0002-net-phylink-add-phylink_prepare_resume.patch --] [-- Type: text/x-diff, Size: 2365 bytes --] From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Bcc: linux@mail.armlinux.org.uk Subject: [PATCH net-next 2/5] net: phylink: add phylink_prepare_resume() MIME-Version: 1.0 Content-Disposition: inline Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" Add a resume preparation function, which will ensure that the receive clock from the PHY is appropriately configured while resuming. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- drivers/net/phy/phylink.c | 24 ++++++++++++++++++++++++ include/linux/phylink.h | 1 + 2 files changed, 25 insertions(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 0aae0bb2a254..976e569feb70 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -2636,6 +2636,30 @@ void phylink_suspend(struct phylink *pl, bool mac_wol) } EXPORT_SYMBOL_GPL(phylink_suspend); +/** + * phylink_prepare_resume() - prepare to resume a network device + * @pl: a pointer to a &struct phylink returned from phylink_create() + * + * Optional, thus must be called prior to phylink_resume(). + * + * Prepare to resume a network device, preparing the PHY as necessary. + */ +void phylink_prepare_resume(struct phylink *pl) +{ + ASSERT_RTNL(); + + /* If the MAC requires the receive clock, but receive clock + * stop was enabled at the PHY, we need to ensure that the + * receive clock is running. Disable receive clock stop. + * phylink_resume() will re-enable it if necessary. + */ + if (pl->mac_supports_eee_ops && pl->phydev && + pl->config->mac_requires_rxc && + pl->config->eee_rx_clk_stop_enable) + phy_eee_rx_clock_stop(pl->phydev, false); +} +EXPORT_SYMBOL_GPL(phylink_prepare_resume); + /** * phylink_resume() - handle a network device resume event * @pl: a pointer to a &struct phylink returned from phylink_create() diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 08df65f6867a..071ed4683c8c 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -699,6 +699,7 @@ void phylink_start(struct phylink *); void phylink_stop(struct phylink *); void phylink_suspend(struct phylink *pl, bool mac_wol); +void phylink_prepare_resume(struct phylink *pl); void phylink_resume(struct phylink *pl); void phylink_ethtool_get_wol(struct phylink *, struct ethtool_wolinfo *); -- 2.30.2 [-- Attachment #4: 0003-net-stmmac-move-phylink_resume-after-resume-setup-is.patch --] [-- Type: text/x-diff, Size: 1800 bytes --] From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Bcc: linux@mail.armlinux.org.uk Subject: [PATCH net-next 3/5] net: stmmac: move phylink_resume() after resume setup is complete MIME-Version: 1.0 Content-Disposition: inline Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" Move phylink_resume() to be after the setup in stmmac_resume() has completed, as phylink_resume() may result in an immediate call to the .mac_link_up method, which will enable the transmitter and receiver, and enable the transmit queues. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index d552e64eaa90..b9f651d77c4f 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7926,16 +7926,6 @@ int stmmac_resume(struct device *dev) return ret; } - rtnl_lock(); - if (device_may_wakeup(priv->device) && priv->plat->pmt) { - phylink_resume(priv->phylink); - } else { - phylink_resume(priv->phylink); - if (device_may_wakeup(priv->device)) - phylink_speed_up(priv->phylink); - } - rtnl_unlock(); - rtnl_lock(); mutex_lock(&priv->lock); @@ -7954,6 +7944,15 @@ int stmmac_resume(struct device *dev) stmmac_enable_all_dma_irq(priv); mutex_unlock(&priv->lock); + + if (device_may_wakeup(priv->device) && priv->plat->pmt) { + phylink_resume(priv->phylink); + } else { + phylink_resume(priv->phylink); + if (device_may_wakeup(priv->device)) + phylink_speed_up(priv->phylink); + } + rtnl_unlock(); netif_device_attach(ndev); -- 2.30.2 [-- Attachment #5: 0004-net-stmmac-simplify-calls-to-phylink_suspend-and-phy.patch --] [-- Type: text/x-diff, Size: 2370 bytes --] From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Bcc: linux@mail.armlinux.org.uk Subject: [PATCH net-next 4/5] net: stmmac: simplify calls to phylink_suspend() and phylink_resume() MIME-Version: 1.0 Content-Disposition: inline Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" Currently, the calls to phylink's suspend and resume functions are inside overly complex tests, and boil down to: if (device_may_wakeup(priv->device) && priv->plat->pmt) { call phylink } else { call phylink and if (device_may_wakeup(priv->device)) do something else } This results in phylink always being called, possibly with differing arguments for phylink_suspend(). Simplify this code, noting that each site is slightly different due to the order in which phylink is called and the "something else". Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +++++++------------ 1 file changed, 8 insertions(+), 14 deletions(-) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index b9f651d77c4f..262718e5c4f3 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7831,13 +7831,11 @@ int stmmac_suspend(struct device *dev) mutex_unlock(&priv->lock); rtnl_lock(); - if (device_may_wakeup(priv->device) && priv->plat->pmt) { - phylink_suspend(priv->phylink, true); - } else { - if (device_may_wakeup(priv->device)) - phylink_speed_down(priv->phylink, false); - phylink_suspend(priv->phylink, false); - } + if (device_may_wakeup(priv->device) && !priv->plat->pmt) + phylink_speed_down(priv->phylink, false); + + phylink_suspend(priv->phylink, + device_may_wakeup(priv->device) && priv->plat->pmt); rtnl_unlock(); if (stmmac_fpe_supported(priv)) @@ -7945,13 +7943,9 @@ int stmmac_resume(struct device *dev) mutex_unlock(&priv->lock); - if (device_may_wakeup(priv->device) && priv->plat->pmt) { - phylink_resume(priv->phylink); - } else { - phylink_resume(priv->phylink); - if (device_may_wakeup(priv->device)) - phylink_speed_up(priv->phylink); - } + phylink_resume(priv->phylink); + if (device_may_wakeup(priv->device) && !priv->plat->pmt) + phylink_speed_up(priv->phylink); rtnl_unlock(); -- 2.30.2 [-- Attachment #6: 0005-net-stmmac-call-phylink_prepare_resume.patch --] [-- Type: text/x-diff, Size: 1473 bytes --] From: "Russell King (Oracle)" <rmk+kernel@armlinux.org.uk> Bcc: linux@mail.armlinux.org.uk Subject: [PATCH net-next 5/5] net: stmmac: call phylink_prepare_resume() MIME-Version: 1.0 Content-Disposition: inline Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset="utf-8" The stmmac core needs the receive clock to be running in order to complete its software triggered reset. However, the media link may be in EEE low-power mode, and as the driver allows the PHY receive clock to be disabled, the receive clock may not be present while resuming. This has been observed with Tegra 186. Fix this by using the newly provided phylink_prepare_resume() to temporarily disable receive clock stop while resuming. phylink_resume() will restore the receive clock stop setting according to the configuration passed from the netdev driver. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 262718e5c4f3..31ec1818211d 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7925,6 +7925,8 @@ int stmmac_resume(struct device *dev) } rtnl_lock(); + phylink_prepare_resume(priv->phylink); + mutex_lock(&priv->lock); stmmac_reset_queues_param(priv); -- 2.30.2 ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support 2025-02-26 11:37 ` Russell King (Oracle) @ 2025-02-26 17:24 ` Jon Hunter 0 siblings, 0 replies; 21+ messages in thread From: Jon Hunter @ 2025-02-26 17:24 UTC (permalink / raw) To: Russell King (Oracle) Cc: Andrew Lunn, Heiner Kallweit, Alexandre Torgue, Andrew Lunn, Bryan Whitehead, David S. Miller, Eric Dumazet, Jakub Kicinski, linux-arm-kernel@lists.infradead.org, linux-stm32@st-md-mailman.stormreply.com, Marcin Wojtas, Maxime Coquelin, netdev@vger.kernel.org, Paolo Abeni, UNGLinuxDriver@microchip.com, linux-tegra@vger.kernel.org On 26/02/2025 11:37, Russell King (Oracle) wrote: > On Wed, Feb 26, 2025 at 10:11:58AM +0000, Jon Hunter wrote: >> On 26/02/2025 10:02, Russell King (Oracle) wrote: >>> The patch above was something of a hack, bypassing the layering, so I >>> would like to consider how this should be done properly. >>> >>> I'm still wondering whether the early call to phylink_resume() is >>> symptomatic of this same issue, or whether there is a PHY that needs >>> phy_start() to be called to output its clock even with link down that >>> we don't know about. >>> >>> The phylink_resume() call is relevant to this because I'd like to put: >>> >>> phy_eee_rx_clock_stop(priv->dev->phydev, >>> priv->phylink_config.eee_rx_clk_stop_enable); >>> >>> in there to ensure that the PHY is correctly configured for clock-stop, >>> but given stmmac's placement that wouldn't work. >>> >>> I'm then thinking of phylink_pre_resume() to disable the EEE clock-stop >>> at the PHY. >>> >>> I think the only thing we could do is try solving this problem as per >>> above and see what the fall-out from it is. I don't get the impression >>> that stmmac users are particularly active at testing patches though, so >>> it may take months to get breakage reports. >> >> >> We can ask Furong to test as he seems to active and making changes, but >> otherwise I am not sure how well it is being tested across various devices. >> On the other hand, it feels like there are still lingering issues like this >> with the driver and so I would hope this is moving in the right direction. >> >> Let me know if you have a patch you want me to test and I will run in on our >> Tegra186, Tegra194 and Tegra234 devices that all use this. > > The attached patches shows what I'm thinking of - it's just been roughed > out, and only been build tested. I have tested these patches on Tegra186, Tegra194 and Tegra234 that they are all working fine. Thanks Jon -- nvpublic ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-02-26 17:24 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <Z4gdtOaGsBhQCZXn@shell.armlinux.org.uk>
[not found] ` <E1tYAEG-0014QH-9O@rmk-PC.armlinux.org.uk>
2025-02-13 11:05 ` [PATCH net-next 9/9] net: stmmac: convert to phylink managed EEE support Jon Hunter
2025-02-13 11:37 ` Russell King (Oracle)
2025-02-13 12:00 ` Russell King (Oracle)
2025-02-14 10:58 ` Jon Hunter
2025-02-14 11:21 ` Russell King (Oracle)
2025-02-14 17:03 ` Jon Hunter
2025-02-19 14:01 ` Jon Hunter
2025-02-19 15:36 ` Russell King (Oracle)
2025-02-19 17:52 ` Jon Hunter
2025-02-19 19:13 ` Russell King (Oracle)
2025-02-19 20:05 ` Jon Hunter
2025-02-19 20:57 ` Russell King (Oracle)
2025-02-25 14:21 ` Jon Hunter
2025-02-26 10:02 ` Russell King (Oracle)
2025-02-26 10:11 ` Jon Hunter
2025-02-26 10:59 ` Russell King (Oracle)
2025-02-26 15:55 ` Jon Hunter
2025-02-26 16:00 ` Russell King (Oracle)
2025-02-26 16:06 ` Jon Hunter
2025-02-26 11:37 ` Russell King (Oracle)
2025-02-26 17:24 ` Jon Hunter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox