* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
  [not found] <Z8m-CRucPxDW5zZK@shell.armlinux.org.uk>
From: Jon Hunter @ 2025-03-07 16:11 UTC
To: Russell King (Oracle), Thierry Reding, Lad, Prabhakar
Cc: Alexandre Torgue, Andrew Lunn, Andrew Lunn, David S. Miller,
	Eric Dumazet, Heiner Kallweit, Jakub Kicinski, linux-arm-kernel,
	linux-stm32, Maxime Coquelin, netdev, Paolo Abeni,
	linux-tegra@vger.kernel.org

Hi Russell,

On 06/03/2025 15:23, Russell King (Oracle) wrote:
> Hi,
>
> This is a second approach to solving the STMMAC reset issues caused by
> the lack of receive clock from the PHY where the media is in low power
> mode with a PHY that supports receive clock-stop.
>
> The first approach centred around only addressing the issue in the
> resume path, but it seems to also happen when the platform glue module
> is removed and re-inserted (Jon - can you check whether that's also
> the case for you please?)
>
> As this is more targeted, I've dropped the patches from this series
> which move the call to phylink_resume(), so the link may still come
> up too early on resume - but that's something I also intend to fix.
>
> This is experimental - so I value test reports for this change.


The subject indicates 3 patches, but I only see 2 patches? Can you confirm
if there are 2 or 3?

So far I have only tested the resume case with the 2 patches to make sure
that is working, but on Tegra186, which has been the most problematic, it is
not working reliably on top of next-20250305.

Cheers,
Jon

-- 
nvpublic

* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
From: Russell King (Oracle) @ 2025-03-07 17:07 UTC
To: Jon Hunter
Cc: Thierry Reding, Lad, Prabhakar, Alexandre Torgue, Andrew Lunn,
	Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni, linux-tegra@vger.kernel.org

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote:
> Hi Russell,
>
> On 06/03/2025 15:23, Russell King (Oracle) wrote:
> > Hi,
> >
> > This is a second approach to solving the STMMAC reset issues caused by
> > the lack of receive clock from the PHY where the media is in low power
> > mode with a PHY that supports receive clock-stop.
> >
> > The first approach centred around only addressing the issue in the
> > resume path, but it seems to also happen when the platform glue module
> > is removed and re-inserted (Jon - can you check whether that's also
> > the case for you please?)
> >
> > As this is more targeted, I've dropped the patches from this series
> > which move the call to phylink_resume(), so the link may still come
> > up too early on resume - but that's something I also intend to fix.
> >
> > This is experimental - so I value test reports for this change.
>
>
> The subject indicates 3 patches, but I only see 2 patches? Can you confirm
> if there are 2 or 3?

Yes, 2 patches is correct.

> So far I have only tested the resume case with the 2 patches to make sure
> that is working, but on Tegra186, which has been the most problematic, it is
> not working reliably on top of next-20250305.

To confirm, you're seeing stmmac_reset() sporadically timing out on
resume even with these patches applied? That's rather disappointing.

Do either of the two attached diffs make any difference?

Thanks for testing!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

[-- Attachment #2: stmmac-block-rx-clk-stop.diff --]
[-- Type: text/x-diff, Size: 1098 bytes --]

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8d3cae5b43c5..63d30e09c095 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3108,9 +3108,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		priv->plat->dma_cfg->atds = 1;
 
 	/* Note that the PHY clock must be running for reset to complete. */
-	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_reset(priv, priv->ioaddr);
-	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret) {
 		netdev_err(priv->dev, "Failed to reset the dma\n");
 		return ret;
@@ -3480,7 +3478,9 @@ static int stmmac_hw_setup(struct net_device *dev, bool ptp_register)
 	phylink_pcs_pre_init(priv->phylink, priv->hw->phylink_pcs);
 
 	/* DMA initialization and SW reset */
+	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_init_dma_engine(priv);
+	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret < 0) {
 		netdev_err(priv->dev, "%s: DMA engine initialization failed\n",
 			   __func__);

[-- Attachment #3: stmmac-block-rx-clk-stop-2.diff --]
[-- Type: text/x-diff, Size: 1303 bytes --]

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 8d3cae5b43c5..bebc9f98c875 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3108,9 +3108,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 		priv->plat->dma_cfg->atds = 1;
 
 	/* Note that the PHY clock must be running for reset to complete. */
-	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_reset(priv, priv->ioaddr);
-	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret) {
 		netdev_err(priv->dev, "Failed to reset the dma\n");
 		return ret;
@@ -4045,7 +4043,9 @@ static int __stmmac_open(struct net_device *dev,
 		}
 	}
 
+	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_hw_setup(dev, true);
+	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret < 0) {
 		netdev_err(priv->dev, "%s: Hw setup failed\n", __func__);
 		goto init_error;
@@ -7949,7 +7949,9 @@ int stmmac_resume(struct device *dev)
 	stmmac_free_tx_skbufs(priv);
 	stmmac_clear_descriptors(priv, &priv->dma_conf);
 
+	phylink_rx_clk_stop_block(priv->phylink);
 	stmmac_hw_setup(ndev, false);
+	phylink_rx_clk_stop_unblock(priv->phylink);
 	stmmac_init_coalesce(priv);
 	stmmac_set_rx_mode(ndev);
 

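The two attached diffs differ only in how wide the block..unblock bracket is drawn around the reset. As a rough userspace illustration of why the bracket matters (not kernel code), the sketch below models the receive clock as stoppable only while the link is in LPI and nothing has blocked clock-stop. Every name prefixed with toy_ is hypothetical, the counting behaviour of block/unblock is an assumption made for the model, and the -1 return merely stands in for the reset timing out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for link state; this is NOT the kernel's struct phylink. */
struct toy_link {
	unsigned int blocked;	/* outstanding block requests (assumed counted) */
	bool lpi;		/* media is in low power idle */
};

/* In this model the PHY stops its RX clock only in LPI with no blockers. */
static bool toy_rx_clk_running(const struct toy_link *l)
{
	return !(l->lpi && l->blocked == 0);
}

static void toy_rx_clk_stop_block(struct toy_link *l)
{
	l->blocked++;
}

static void toy_rx_clk_stop_unblock(struct toy_link *l)
{
	assert(l->blocked > 0);	/* unbalanced unblock is a caller bug */
	l->blocked--;
}

/* Stand-in for the MAC reset: it completes only while the RX clock runs. */
static int toy_reset(const struct toy_link *l)
{
	return toy_rx_clk_running(l) ? 0 : -1;
}

/* Caller-side bracket, in the spirit of the attached diffs. */
static int toy_hw_setup_bracketed(struct toy_link *l)
{
	int ret;

	toy_rx_clk_stop_block(l);
	ret = toy_reset(l);
	toy_rx_clk_stop_unblock(l);
	return ret;
}
```

With the link in LPI, calling toy_reset() directly fails in this model, while toy_hw_setup_bracketed() succeeds and leaves the block count balanced.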
* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
From: Jon Hunter @ 2025-03-10 14:20 UTC
To: Russell King (Oracle)
Cc: Thierry Reding, Lad, Prabhakar, Alexandre Torgue, Andrew Lunn,
	Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni, linux-tegra@vger.kernel.org

On 07/03/2025 17:07, Russell King (Oracle) wrote:
> On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote:
>> Hi Russell,
>>
>> On 06/03/2025 15:23, Russell King (Oracle) wrote:
>>> Hi,
>>>
>>> This is a second approach to solving the STMMAC reset issues caused by
>>> the lack of receive clock from the PHY where the media is in low power
>>> mode with a PHY that supports receive clock-stop.
>>>
>>> The first approach centred around only addressing the issue in the
>>> resume path, but it seems to also happen when the platform glue module
>>> is removed and re-inserted (Jon - can you check whether that's also
>>> the case for you please?)
>>>
>>> As this is more targeted, I've dropped the patches from this series
>>> which move the call to phylink_resume(), so the link may still come
>>> up too early on resume - but that's something I also intend to fix.
>>>
>>> This is experimental - so I value test reports for this change.
>>
>>
>> The subject indicates 3 patches, but I only see 2 patches? Can you confirm
>> if there are 2 or 3?
>
> Yes, 2 patches is correct.
>
>> So far I have only tested the resume case with the 2 patches to make sure
>> that is working, but on Tegra186, which has been the most problematic, it is
>> not working reliably on top of next-20250305.
>
> To confirm, you're seeing stmmac_reset() sporadically timing out on
> resume even with these patches applied? That's rather disappointing.

So I am no longer seeing the reset fail, from what I can see, but now
NFS is not responding after resume ...

[   49.825094] Enabling non-boot CPUs ...
[   49.829760] Detected PIPT I-cache on CPU1
[   49.832694] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004
[   49.844120] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116
[   49.856231] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066
[   49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030]
[   49.875389] CPU1 is up
[   49.877187] Detected PIPT I-cache on CPU2
[   49.880824] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004
[   49.892266] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116
[   49.904467] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066
[   49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030]
[   49.923610] CPU2 is up
[   49.925194] Detected PIPT I-cache on CPU3
[   49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073]
[   49.935866] CPU3 is up
[   49.937983] Detected PIPT I-cache on CPU4
[   49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073]
[   49.948593] CPU4 is up
[   49.950810] Detected PIPT I-cache on CPU5
[   49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073]
[   49.961431] CPU5 is up
[   50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
[   50.077634] dwmac4: Master AXI performs any burst length
[   50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
[   50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
[   50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
[   50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device
[   50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC
[   50.147552] OOM killer enabled.
[   50.148441] Restarting tasks ... done.
[   50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed
[   50.154761] random: crng reseeded on system resumption
[   50.162912] PM: suspend exit
[   50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed
[   50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed
[   50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed
[  234.474848] nfs: server 10.26.51.252 not responding, still trying
[  234.538769] nfs: server 10.26.51.252 not responding, still trying
[  237.546922] nfs: server 10.26.51.252 not responding, still trying
[  254.762753] nfs: server 10.26.51.252 not responding, timed out
[  254.762771] nfs: server 10.26.51.252 not responding, timed out
[  254.766376] nfs: server 10.26.51.252 not responding, timed out
[  254.766392] nfs: server 10.26.51.252 not responding, timed out
[  254.783778] nfs: server 10.26.51.252 not responding, timed out
[  254.789582] nfs: server 10.26.51.252 not responding, timed out
[  254.795421] nfs: server 10.26.51.252 not responding, timed out
[  254.801193] nfs: server 10.26.51.252 not responding, timed out

> Do either of the two attached diffs make any difference?

I will try these next.

Thanks
Jon

-- 
nvpublic

* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
From: Jon Hunter @ 2025-03-11 13:25 UTC
To: Russell King (Oracle)
Cc: Thierry Reding, Lad, Prabhakar, Alexandre Torgue, Andrew Lunn,
	Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni, linux-tegra@vger.kernel.org

On 10/03/2025 14:20, Jon Hunter wrote:
>
> On 07/03/2025 17:07, Russell King (Oracle) wrote:
>> On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote:
>>> Hi Russell,
>>>
>>> On 06/03/2025 15:23, Russell King (Oracle) wrote:
>>>> Hi,
>>>>
>>>> This is a second approach to solving the STMMAC reset issues caused by
>>>> the lack of receive clock from the PHY where the media is in low power
>>>> mode with a PHY that supports receive clock-stop.
>>>>
>>>> The first approach centred around only addressing the issue in the
>>>> resume path, but it seems to also happen when the platform glue module
>>>> is removed and re-inserted (Jon - can you check whether that's also
>>>> the case for you please?)
>>>>
>>>> As this is more targeted, I've dropped the patches from this series
>>>> which move the call to phylink_resume(), so the link may still come
>>>> up too early on resume - but that's something I also intend to fix.
>>>>
>>>> This is experimental - so I value test reports for this change.
>>>
>>>
>>> The subject indicates 3 patches, but I only see 2 patches? Can you confirm
>>> if there are 2 or 3?
>>
>> Yes, 2 patches is correct.
>>
>>> So far I have only tested the resume case with the 2 patches to make sure
>>> that is working, but on Tegra186, which has been the most problematic, it is
>>> not working reliably on top of next-20250305.
>>
>> To confirm, you're seeing stmmac_reset() sporadically timing out on
>> resume even with these patches applied? That's rather disappointing.
>
> So I am no longer seeing the reset fail, from what I can see, but now
> NFS is not responding after resume ...
>
> [   49.825094] Enabling non-boot CPUs ...
> [   49.829760] Detected PIPT I-cache on CPU1
> [   49.832694] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004
> [   49.844120] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116
> [   49.856231] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066
> [   49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030]
> [   49.875389] CPU1 is up
> [   49.877187] Detected PIPT I-cache on CPU2
> [   49.880824] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004
> [   49.892266] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116
> [   49.904467] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066
> [   49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030]
> [   49.923610] CPU2 is up
> [   49.925194] Detected PIPT I-cache on CPU3
> [   49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073]
> [   49.935866] CPU3 is up
> [   49.937983] Detected PIPT I-cache on CPU4
> [   49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073]
> [   49.948593] CPU4 is up
> [   49.950810] Detected PIPT I-cache on CPU5
> [   49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073]
> [   49.961431] CPU5 is up
> [   50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
> [   50.077634] dwmac4: Master AXI performs any burst length
> [   50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
> [   50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
> [   50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> [   50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device
> [   50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC
> [   50.147552] OOM killer enabled.
> [   50.148441] Restarting tasks ... done.
> [   50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed
> [   50.154761] random: crng reseeded on system resumption
> [   50.162912] PM: suspend exit
> [   50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed
> [   50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed
> [   50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed
> [  234.474848] nfs: server 10.26.51.252 not responding, still trying
> [  234.538769] nfs: server 10.26.51.252 not responding, still trying
> [  237.546922] nfs: server 10.26.51.252 not responding, still trying
> [  254.762753] nfs: server 10.26.51.252 not responding, timed out
> [  254.762771] nfs: server 10.26.51.252 not responding, timed out
> [  254.766376] nfs: server 10.26.51.252 not responding, timed out
> [  254.766392] nfs: server 10.26.51.252 not responding, timed out
> [  254.783778] nfs: server 10.26.51.252 not responding, timed out
> [  254.789582] nfs: server 10.26.51.252 not responding, timed out
> [  254.795421] nfs: server 10.26.51.252 not responding, timed out
> [  254.801193] nfs: server 10.26.51.252 not responding, timed out
>
>> Do either of the two attached diffs make any difference?
>
> I will try these next.


I tried both of the diffs, but both had the same problem as above and
I see these nfs timeouts after resuming. What works the best is the
original change you proposed (this is based upon the latest two
patches) ...

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e2146d3aee74..48a646b76a29 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
 		priv->plat->dma_cfg->atds = 1;
-	/* Note that the PHY clock must be running for reset to complete. */
-	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_reset(priv, priv->ioaddr);
-	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret) {
 		netdev_err(priv->dev, "Failed to reset the dma\n");
 		return ret;
@@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev)
 	rtnl_lock();
 	mutex_lock(&priv->lock);
+	/* Note that the PHY clock must be running for reset to complete. */
+	phylink_rx_clk_stop_block(priv->phylink);
 	stmmac_reset_queues_param(priv);
 	stmmac_free_tx_skbufs(priv);
@@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev)
 	stmmac_set_rx_mode(ndev);
 	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
+	phylink_rx_clk_stop_unblock(priv->phylink);
 	stmmac_enable_all_queues(priv);
 	stmmac_enable_all_dma_irq(priv);

-- 
nvpublic

* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
From: Russell King (Oracle) @ 2025-03-11 13:58 UTC
To: Jon Hunter
Cc: Thierry Reding, Lad, Prabhakar, Alexandre Torgue, Andrew Lunn,
	Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni, linux-tegra@vger.kernel.org

On Tue, Mar 11, 2025 at 01:25:58PM +0000, Jon Hunter wrote:
>
> On 10/03/2025 14:20, Jon Hunter wrote:
> >
> > On 07/03/2025 17:07, Russell King (Oracle) wrote:
> > > On Fri, Mar 07, 2025 at 04:11:19PM +0000, Jon Hunter wrote:
> > > > Hi Russell,
> > > >
> > > > On 06/03/2025 15:23, Russell King (Oracle) wrote:
> > > > > Hi,
> > > > >
> > > > > This is a second approach to solving the STMMAC reset issues caused by
> > > > > the lack of receive clock from the PHY where the media is in low power
> > > > > mode with a PHY that supports receive clock-stop.
> > > > >
> > > > > The first approach centred around only addressing the issue in the
> > > > > resume path, but it seems to also happen when the platform glue module
> > > > > is removed and re-inserted (Jon - can you check whether that's also
> > > > > the case for you please?)
> > > > >
> > > > > As this is more targeted, I've dropped the patches from this series
> > > > > which move the call to phylink_resume(), so the link may still come
> > > > > up too early on resume - but that's something I also intend to fix.
> > > > >
> > > > > This is experimental - so I value test reports for this change.
> > > >
> > > >
> > > > The subject indicates 3 patches, but I only see 2 patches? Can you confirm
> > > > if there are 2 or 3?
> > >
> > > Yes, 2 patches is correct.
> > >
> > > > So far I have only tested the resume case with the 2 patches to make sure
> > > > that is working, but on Tegra186, which has been the most problematic, it is
> > > > not working reliably on top of next-20250305.
> > >
> > > To confirm, you're seeing stmmac_reset() sporadically timing out on
> > > resume even with these patches applied? That's rather disappointing.
> >
> > So I am no longer seeing the reset fail, from what I can see, but now
> > NFS is not responding after resume ...
> >
> > [   49.825094] Enabling non-boot CPUs ...
> > [   49.829760] Detected PIPT I-cache on CPU1
> > [   49.832694] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU1: 0x0000009444c004
> > [   49.844120] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU1: 0x00000010305116
> > [   49.856231] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU1: 0x00000003001066
> > [   49.868081] CPU1: Booted secondary processor 0x0000000000 [0x4e0f0030]
> > [   49.875389] CPU1 is up
> > [   49.877187] Detected PIPT I-cache on CPU2
> > [   49.880824] CPU features: SANITY CHECK: Unexpected variation in SYS_CTR_EL0. Boot CPU: 0x0000008444c004, CPU2: 0x0000009444c004
> > [   49.892266] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64DFR0_EL1. Boot CPU: 0x00000010305106, CPU2: 0x00000010305116
> > [   49.904467] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_DFR0_EL1. Boot CPU: 0x00000003010066, CPU2: 0x00000003001066
> > [   49.916257] CPU2: Booted secondary processor 0x0000000001 [0x4e0f0030]
> > [   49.923610] CPU2 is up
> > [   49.925194] Detected PIPT I-cache on CPU3
> > [   49.929010] CPU3: Booted secondary processor 0x0000000101 [0x411fd073]
> > [   49.935866] CPU3 is up
> > [   49.937983] Detected PIPT I-cache on CPU4
> > [   49.941824] CPU4: Booted secondary processor 0x0000000102 [0x411fd073]
> > [   49.948593] CPU4 is up
> > [   49.950810] Detected PIPT I-cache on CPU5
> > [   49.954651] CPU5: Booted secondary processor 0x0000000103 [0x411fd073]
> > [   49.961431] CPU5 is up
> > [   50.069784] dwc-eth-dwmac 2490000.ethernet eth0: configuring for phy/rgmii link mode
> > [   50.077634] dwmac4: Master AXI performs any burst length
> > [   50.080718] dwc-eth-dwmac 2490000.ethernet eth0: No Safety Features support found
> > [   50.088172] dwc-eth-dwmac 2490000.ethernet eth0: IEEE 1588-2008 Advanced Timestamp supported
> > [   50.096851] dwc-eth-dwmac 2490000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> > [   50.110897] usb-conn-gpio 3520000.padctl:ports:usb2-0:connector: repeated role: device
> > [   50.113922] tegra-xusb 3530000.usb: Firmware timestamp: 2020-07-06 13:39:28 UTC
> > [   50.147552] OOM killer enabled.
> > [   50.148441] Restarting tasks ... done.
> > [   50.152552] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.154761] random: crng reseeded on system resumption
> > [   50.162912] PM: suspend exit
> > [   50.212215] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.271578] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [   50.338597] VDDIO_SDMMC3_AP: voltage operation not allowed
> > [  234.474848] nfs: server 10.26.51.252 not responding, still trying
> > [  234.538769] nfs: server 10.26.51.252 not responding, still trying
> > [  237.546922] nfs: server 10.26.51.252 not responding, still trying
> > [  254.762753] nfs: server 10.26.51.252 not responding, timed out
> > [  254.762771] nfs: server 10.26.51.252 not responding, timed out
> > [  254.766376] nfs: server 10.26.51.252 not responding, timed out
> > [  254.766392] nfs: server 10.26.51.252 not responding, timed out
> > [  254.783778] nfs: server 10.26.51.252 not responding, timed out
> > [  254.789582] nfs: server 10.26.51.252 not responding, timed out
> > [  254.795421] nfs: server 10.26.51.252 not responding, timed out
> > [  254.801193] nfs: server 10.26.51.252 not responding, timed out
> >
> > > Do either of the two attached diffs make any difference?
> >
> > I will try these next.
>
>
> I tried both of the diffs, but both had the same problem as above and
> I see these nfs timeouts after resuming. What works the best is the
> original change you proposed (this is based upon the latest two
> patches) ...

I'm wondering whether there's something else which needs the RX clock
running in order to take effect.

> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index e2146d3aee74..48a646b76a29 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>  	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
>  		priv->plat->dma_cfg->atds = 1;
> -	/* Note that the PHY clock must be running for reset to complete. */
> -	phylink_rx_clk_stop_block(priv->phylink);
>  	ret = stmmac_reset(priv, priv->ioaddr);
> -	phylink_rx_clk_stop_unblock(priv->phylink);
>  	if (ret) {
>  		netdev_err(priv->dev, "Failed to reset the dma\n");
>  		return ret;
> @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev)
>  	rtnl_lock();
>  	mutex_lock(&priv->lock);
> +	/* Note that the PHY clock must be running for reset to complete. */
> +	phylink_rx_clk_stop_block(priv->phylink);
>  	stmmac_reset_queues_param(priv);
>  	stmmac_free_tx_skbufs(priv);
> @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev)
>  	stmmac_set_rx_mode(ndev);
>  	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
> +	phylink_rx_clk_stop_unblock(priv->phylink);
>  	stmmac_enable_all_queues(priv);
>  	stmmac_enable_all_dma_irq(priv);

If you haven't already, can you try shrinking down the number of
functions that are within the block..unblock region please?

Looking at the functions called:

stmmac_reset_queues_param()
stmmac_free_tx_skbufs()
stmmac_clear_descriptors()
	These look like they're only manipulating software state

stmmac_hw_setup()
	We know this calls stmmac_reset() and thus needs the blocking

stmmac_init_coalesce()
	Looks like it's only manipulating software state

stmmac_set_rx_mode()
	This manipulates GMAC_RXQ_CTRL4, GMAC_HASH_TAB(*), GMAC_ADDR_HIGH(*),
	GMAC_ADDR_LOW(*), and GMAC_PACKET_FILTER.

stmmac_restore_hw_vlan_rx_fltr()
	This manipulates GMAC_VLAN_TAG_DATA, GMAC_VLAN_TAG,
	GMAC_VLAN_HASH_TABLE, GMAC_VLAN_TAG

I wonder whether the last two also require the RX clock to be running.

The reason I want to track this down is that we may need to add
block..unblock elsewhere in the driver to ensure that the RX clock
is running when configuration is done elsewhere.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

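The function-by-function reasoning above can be written down as a small table, which makes the candidate block..unblock window explicit. This is only a sketch of the hypothesis as stated in the message, not a tested result: the may_need_rx_clk flags encode the guess that hardware setup and the two register-writing functions are the ones at risk, and struct resume_step and block_window() are hypothetical helpers that exist nowhere in the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One call made in the resume path, in call order (hypothetical type). */
struct resume_step {
	const char *name;
	bool may_need_rx_clk;	/* hypothesis from the message, unverified */
};

static const struct resume_step resume_steps[] = {
	{ "stmmac_reset_queues_param",      false },	/* software state */
	{ "stmmac_free_tx_skbufs",          false },	/* software state */
	{ "stmmac_clear_descriptors",       false },	/* software state */
	{ "stmmac_hw_setup",                true  },	/* calls stmmac_reset() */
	{ "stmmac_init_coalesce",           false },	/* software state */
	{ "stmmac_set_rx_mode",             true  },	/* writes MAC filter registers */
	{ "stmmac_restore_hw_vlan_rx_fltr", true  },	/* writes MAC VLAN registers */
};

#define NUM_RESUME_STEPS (sizeof(resume_steps) / sizeof(resume_steps[0]))

/*
 * Find the smallest contiguous range [*first, *last] that covers every
 * flagged step. Returns false if no step is flagged.
 */
static bool block_window(const struct resume_step *steps, size_t n,
			 size_t *first, size_t *last)
{
	bool found = false;

	assert(steps && first && last);
	for (size_t i = 0; i < n; i++) {
		if (!steps[i].may_need_rx_clk)
			continue;
		if (!found) {
			*first = i;
			found = true;
		}
		*last = i;
	}
	return found;
}
```

Under these assumed flags the window runs from stmmac_hw_setup() to stmmac_restore_hw_vlan_rx_fltr(); whether that matches the hardware is exactly what the requested testing is meant to establish.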
* Re: [PATCH RFC net-next v2 0/3] net: stmmac: approach 2 to solve EEE LPI reset issues
From: Jon Hunter @ 2025-03-11 17:32 UTC
To: Russell King (Oracle)
Cc: Thierry Reding, Lad, Prabhakar, Alexandre Torgue, Andrew Lunn,
	Andrew Lunn, David S. Miller, Eric Dumazet, Heiner Kallweit,
	Jakub Kicinski, linux-arm-kernel, linux-stm32, Maxime Coquelin,
	netdev, Paolo Abeni, linux-tegra@vger.kernel.org

On 11/03/2025 13:58, Russell King (Oracle) wrote:

...

> I'm wondering whether there's something else which needs the RX clock
> running in order to take effect.
>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index e2146d3aee74..48a646b76a29 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
>>  	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
>>  		priv->plat->dma_cfg->atds = 1;
>> -	/* Note that the PHY clock must be running for reset to complete. */
>> -	phylink_rx_clk_stop_block(priv->phylink);
>>  	ret = stmmac_reset(priv, priv->ioaddr);
>> -	phylink_rx_clk_stop_unblock(priv->phylink);
>>  	if (ret) {
>>  		netdev_err(priv->dev, "Failed to reset the dma\n");
>>  		return ret;
>> @@ -7951,6 +7948,8 @@ int stmmac_resume(struct device *dev)
>>  	rtnl_lock();
>>  	mutex_lock(&priv->lock);
>> +	/* Note that the PHY clock must be running for reset to complete. */
>> +	phylink_rx_clk_stop_block(priv->phylink);
>>  	stmmac_reset_queues_param(priv);
>>  	stmmac_free_tx_skbufs(priv);
>> @@ -7961,6 +7960,7 @@ int stmmac_resume(struct device *dev)
>>  	stmmac_set_rx_mode(ndev);
>>  	stmmac_restore_hw_vlan_rx_fltr(priv, ndev, priv->hw);
>> +	phylink_rx_clk_stop_unblock(priv->phylink);
>>  	stmmac_enable_all_queues(priv);
>>  	stmmac_enable_all_dma_irq(priv);
>
> If you haven't already, can you try shrinking down the number of
> functions that are within the block..unblock region please?

It seems that at a minimum I need to block/unblock around the following
functions ...

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e2146d3aee74..46c343088b1f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3109,10 +3109,7 @@ static int stmmac_init_dma_engine(struct stmmac_priv *priv)
 	if (priv->extend_desc && (priv->mode == STMMAC_RING_MODE))
 		priv->plat->dma_cfg->atds = 1;
-	/* Note that the PHY clock must be running for reset to complete. */
-	phylink_rx_clk_stop_block(priv->phylink);
 	ret = stmmac_reset(priv, priv->ioaddr);
-	phylink_rx_clk_stop_unblock(priv->phylink);
 	if (ret) {
 		netdev_err(priv->dev, "Failed to reset the dma\n");
 		return ret;
@@ -7953,10 +7950,13 @@ int stmmac_resume(struct device *dev)
 	stmmac_reset_queues_param(priv);
+	/* Note that the PHY clock must be running for reset to complete. */
+	phylink_rx_clk_stop_block(priv->phylink);
 	stmmac_free_tx_skbufs(priv);
 	stmmac_clear_descriptors(priv, &priv->dma_conf);
 	stmmac_hw_setup(ndev, false);
+	phylink_rx_clk_stop_unblock(priv->phylink);
 	stmmac_init_coalesce(priv);
 	stmmac_set_rx_mode(ndev);

> Looking at the functions called:
>
> stmmac_reset_queues_param()
> stmmac_free_tx_skbufs()
> stmmac_clear_descriptors()
> 	These look like they're only manipulating software state

So it appears that the last two need to be in the block/unblock region
and ...

> stmmac_hw_setup()
> 	We know this calls stmmac_reset() and thus needs the blocking

... this one, which is no surprise, but the others are OK.

Please note that so far I have only tested on the Tegra186 board, which
seems to be the most sensitive.

Cheers
Jon

-- 
nvpublic

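The narrowing reported in this message amounts to shrinking a window over the resume steps from both ends while the suspend/resume test still passes. A minimal sketch of that procedure follows, using the same call order as stmmac_resume() (index 0 is stmmac_reset_queues_param(), index 6 is stmmac_restore_hw_vlan_rx_fltr()). The names shrink_window() and toy_resume_ok() are hypothetical; toy_resume_ok() simply hard-codes the outcome reported above (steps 1 to 3 must be covered) in place of a real test run on hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if resume works with block..unblock around steps lo..hi. */
typedef bool (*resume_ok_fn)(size_t lo, size_t hi);

/*
 * Greedily shrink a known-good window: first move the start forward,
 * then move the end backward, keeping each change only if the test
 * still passes.
 */
static void shrink_window(resume_ok_fn ok, size_t *lo, size_t *hi)
{
	assert(ok(*lo, *hi));	/* must start from a working window */

	while (*lo < *hi && ok(*lo + 1, *hi))
		(*lo)++;
	while (*hi > *lo && ok(*lo, *hi - 1))
		(*hi)--;
}

/*
 * Stand-in for a hardware test, encoding the result reported in this
 * message: stmmac_free_tx_skbufs() (1), stmmac_clear_descriptors() (2)
 * and stmmac_hw_setup() (3) must all be inside the window.
 */
static bool toy_resume_ok(size_t lo, size_t hi)
{
	return lo <= 1 && hi >= 3;
}
```

Starting from the full window 0..6, this converges on 1..3. On real hardware the failure is sporadic, so each probe would presumably need many suspend/resume cycles before a window could be called good.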