[PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T

* [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
@ 2025-10-26 12:29 Laurent Pinchart
  2025-10-27  1:31 ` Fabio Estevam
                   ` (3 more replies)
  0 siblings, 4 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-26 12:29 UTC (permalink / raw)
  To: devicetree, imx, linux-arm-kernel
  Cc: Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
(DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
device produces an interrupt storm. Disable EEE for 1000T to fix it.

Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
---
The exact reason for the interrupt storm is unknown, and my attempts to
diagnose it were hindered by my lack of expertise with DWMAC. As far as I
understand, the DWMAC implements EEE support, and so does the RTL8211E
PHY according to its datasheet. What each side does exactly is unknown
to me. One theory I've heard to explain the issue is that the two
implementations conflict. There is no register in the RTL8211E PHY to
disable EEE on the PHY side while still advertising its support to the
peer and relying on the implementation in the DWMAC (if this even makes
sense), so disabling EEE is the only viable option.

This patch is likely a workaround, but it fixes ethernet usage on the
board, so it's in my opinion worth being merged. If someone with better
knowledge of EEE and DWMAC, as well as an interest in getting it working
properly on the Debix board, wants to submit additional patches to drop
eee-broken-1000t, I will be happy to test them.
---
 arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts b/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
index 9422beee30b2..4aa47f71425b 100644
--- a/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
+++ b/arch/arm64/boot/dts/freescale/imx8mp-debix-model-a.dts
@@ -102,6 +102,7 @@ ethphy0: ethernet-phy@1 { /* RTL8211E */
 			reset-gpios = <&gpio4 18 GPIO_ACTIVE_LOW>;
 			reset-assert-us = <20>;
 			reset-deassert-us = <200000>;
+			eee-broken-1000t;
 		};
 	};
 };
-- 
Regards,

Laurent Pinchart


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-26 12:29 [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T Laurent Pinchart
@ 2025-10-27  1:31 ` Fabio Estevam
  2025-10-27  3:08 ` Andrew Lunn
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 51+ messages in thread
From: Fabio Estevam @ 2025-10-27  1:31 UTC (permalink / raw)
  To: Laurent Pinchart, Andrew Lunn
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

Adding Andrew in case he can help with the review.

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-26 12:29 [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T Laurent Pinchart
  2025-10-27  1:31 ` Fabio Estevam
@ 2025-10-27  3:08 ` Andrew Lunn
  2025-10-27  7:27   ` Laurent Pinchart
  2025-10-27  9:12   ` Oleksij Rempel
  2025-10-27  9:07 ` Russell King (Oracle)
  2025-10-27 15:13 ` Russell King (Oracle)
  3 siblings, 2 replies; 51+ messages in thread
From: Andrew Lunn @ 2025-10-27  3:08 UTC (permalink / raw)
  To: Laurent Pinchart, Russell King
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

Adding Russell King

On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> devices produces an interrupts storm. Disable EEE support to fix it.
> 
> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> ---
> The exact reason for the interrupt storm is unknown, and my attempts to
> diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> understand, the DWMAC implements EEE support, and so does the RTL8211E
> PHY according to its datasheet.

I believe for DWMAC it is a synthesis option. However, there is a bit
indicating if the hardware supports it.

The PHY should not be able to trigger an interrupt storm in the
MAC. So this is likely to be a DWMAC issue.

Which interrupt bit is causing the storm?

> What each side does exactly is unknown
> to me. One theory I've heard to explain the issue is that the two
> implementations conflict. There is no register in the RTL8211E PHY to
> disable EEE on the PHY side while still advertising its support to the
> peer and relying on the implementation in the DWMAC (if this even makes
> sense)

It does not make sense. EEE is split into two major parts. The two
PHYs communicate with each other to negotiate the feature, if both
ends support it and both ends want to use it. The result of this
negotiation is then passed to the MACs.

It is then the MAC who decides when to send a Low Power Indication to
the PHY to tell the PHY to enter low power mode. The MAC also wakes
the PHY when it has packets to send.
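
As a rough illustration of that split, the sketch below resolves the EEE
modes that both PHYs advertise and then lets the MAC side decide whether it
may signal LPI at the current speed. The helper names are hypothetical and
are not kernel APIs; the bit values are assumed to follow the IEEE 802.3 EEE
advertisement register layout (MMD 7, register 60). The last helper models
what a property such as eee-broken-1000t asks for: the mode is dropped from
the local advertisement, so it can never be negotiated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the EEE advertisement register (MMD 7.60). */
#define EEE_ADV_100TX 0x0002u
#define EEE_ADV_1000T 0x0004u

/* Hypothetical helper: EEE modes that both link partners advertise. */
static uint16_t eee_resolve(uint16_t local_adv, uint16_t partner_adv)
{
	return local_adv & partner_adv;
}

/* Hypothetical helper: may the MAC signal LPI for the given speed bit? */
static bool mac_may_enter_lpi(uint16_t resolved, uint16_t speed_bit)
{
	return (resolved & speed_bit) != 0;
}

/* Hypothetical helper: remove modes marked broken from the local advertisement. */
static uint16_t eee_mask_broken(uint16_t local_adv, uint16_t broken)
{
	return local_adv & (uint16_t)~broken;
}
```

With 1000T masked out locally, the resolved set no longer contains 1000T
even if the peer advertises it, so the MAC has no reason to enter LPI on a
gigabit link.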

A quick look at the data sheet for the RTL8211E suggests this is what
it supports.

There are a few PHYs which implement SmartEEE, or some other similar
name. They operate differently, the PHY does it all, and the MAC is
not even aware EEE is happening. Such PHYs should really only be
paired with MACs which do not support EEE. An EEE capable MAC paired
with a SmartEEE PHY could have problems, but hopefully the EEE
abilities and negotiation registers in the PHY would be sufficient to
dissuade the MAC from doing EEE. But I would not expect a setup like
this to trigger an interrupt storm.

	Andrew

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  3:08 ` Andrew Lunn
@ 2025-10-27  7:27   ` Laurent Pinchart
  2025-10-27  8:47     ` Emanuele Ghidoli
                       ` (2 more replies)
  2025-10-27  9:12   ` Oleksij Rempel
  1 sibling, 3 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27  7:27 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

Hi Andrew,

Thank you for your quick reply.

On Mon, Oct 27, 2025 at 04:08:42AM +0100, Andrew Lunn wrote:
> Adding Russell King
> 
> On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > devices produces an interrupts storm. Disable EEE support to fix it.
> > 
> > Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > ---
> > The exact reason for the interrupt storm is unknown, and my attempts to
> > diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> > understand, the DWMAC implements EEE support, and so does the RTL8211E
> > PHY according to its datasheet.
> 
> I believe for DWMAC it is a synthesis option. However, there is a bit
> indicating if the hardware supports it.
> 
> The PHY should not be able to trigger an interrupt storm in the
> MAC. So this is likely to be an DWMAC issue.
> 
> Which interrupt bit is causing the storm?

That's where I hit my first wall :-)

I've tried to diagnose the issue by adding interrupt counters to
dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
(0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
interrupt storm, increasing at around 10k per second. However, the
corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.
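
For reference, the counting approach can be sketched in stand-alone C as
below. This is not the actual debug patch: the struct and function names are
hypothetical, and only the register offsets and the RGSMIIIS bit position
come from the description above. The second helper reports status bits that
are asserted although the enable mask does not select them, which is the
situation observed here for bit 0.

```c
#include <stdint.h>

#define GMAC_INT_STATUS_OFFSET 0x00b0u /* offset as stated above */
#define GMAC_INT_EN_OFFSET     0x00b4u /* offset as stated above */
#define RGSMIIIS_BIT           0u      /* bit position as stated above */
#define NUM_STATUS_BITS        32u

/* Hypothetical debug state: one counter per bit of the status word. */
struct irq_bit_counters {
	uint64_t count[NUM_STATUS_BITS];
};

/* Increment the counter of every bit that is set in this status sample. */
static void count_status_bits(struct irq_bit_counters *c, uint32_t status)
{
	for (unsigned int bit = 0; bit < NUM_STATUS_BITS; bit++) {
		if (status & (UINT32_C(1) << bit))
			c->count[bit]++;
	}
}

/* Status bits that are asserted but not selected by the enable mask. */
static uint32_t unexpected_bits(uint32_t status, uint32_t enable)
{
	return status & ~enable;
}
```

Calling count_status_bits() once per handler invocation and dumping the
counters periodically gives a per-bit rate, while a non-zero result from
unexpected_bits() flags a status bit that should not have raised the
interrupt on its own.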

The ENET_EQOS interrupt on the i.MX8MP is an OR'ed signal that combines
four interrupt sources:

- ENET QOS TSN LPI RX exit Interrupt
- ENET QOS TSN Host System Interrupt
- ENET QOS TSN Host System RX Channel Interrupts
- ENET QOS TSN Host System TX Channel Interrupts

The last two interrupt sources are themselves local OR of channels[4:0].

I would suspect that the LPI RX exit interrupt is the one that fires
constantly given its name, but I'm not sure how to test that.
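
To make the wiring concrete, here is a small model of the combined line. It
is only a sketch of the OR structure listed above, with hypothetical type
and function names, and it says nothing about which source actually fires
on the hardware.

```c
#include <stdbool.h>
#include <stdint.h>

#define CHANNEL_MASK 0x1fu /* channels[4:0] */

/* Hypothetical model of the four sources feeding the ENET_EQOS interrupt. */
struct eqos_irq_sources {
	bool lpi_rx_exit;    /* LPI RX exit interrupt */
	bool host_system;    /* host system interrupt */
	uint8_t rx_channels; /* one bit per RX channel */
	uint8_t tx_channels; /* one bit per TX channel */
};

/* The combined line is asserted if any single source is asserted. */
static bool eqos_irq_line(const struct eqos_irq_sources *s)
{
	return s->lpi_rx_exit || s->host_system ||
	       (s->rx_channels & CHANNEL_MASK) != 0 ||
	       (s->tx_channels & CHANNEL_MASK) != 0;
}
```

In this model the line can be asserted by lpi_rx_exit alone while the host
system and channel sources are idle, which would be consistent with a
handler that finds no enabled cause in the MAC status register.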

> > What each side does exactly is unknown
> > to me. One theory I've heard to explain the issue is that the two
> > implementations conflict. There is no register in the RTL8211E PHY to
> > disable EEE on the PHY side while still advertising its support to the
> > peer and relying on the implementation in the DWMAC (if this even makes
> > sense)
> 
> It does not make sense. EEE is split into two major parts. The two
> PHYs communicate with each other to negotiate the feature, if both
> ends support it and both ends want to use it. The result of this
> negotiation is then passed to the MACs.
> 
> It is then the MAC who decides when to send a Low Power Indication to
> the PHY to tell the PHY to enter low power mode. The MAC also wakes
> the PHY when it has packets to send.
> 
> A quick look at the data sheet for the RTL8211E suggests this is what
> is supports.
> 
> There are a few PHYs which implement SmartEEE, or some other similar
> name. They operate differently, the PHY does it all, and the MAC is
> not even aware EEE is happening. Such PHYs should really only be
> paired with MACs which do not support EEE. An EEE capable MAC paired
> with a SmartEEE PHY could have problems, but hopefully the EEE
> abilities and negotiation registers in the PHY would be sufficient to
> dissuade the MAC from doing EEE. But i would not expect a setup like
> this to trigger an interrupt storm.

Thanks for the explanation, I read documents to try and figure out how
it worked and didn't find such a clear and concise high-level summary.

I'm not very experienced with ethernet, but I can easily test patches or
even rough ideas on hardware.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  7:27   ` Laurent Pinchart
@ 2025-10-27  8:47     ` Emanuele Ghidoli
  2025-10-27  9:00       ` Russell King (Oracle)
  2025-10-27  9:32     ` Russell King (Oracle)
  2025-10-27 11:22     ` Russell King (Oracle)
  2 siblings, 1 reply; 51+ messages in thread
From: Emanuele Ghidoli @ 2025-10-27  8:47 UTC (permalink / raw)
  To: Laurent Pinchart, Andrew Lunn
  Cc: Russell King, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo, Emanuele Ghidoli



Hi Laurent,
I had the same problem, interrupt storm plus link instability with dwmac.

I found that commit 2c81f3357136 ("net: stmmac: convert to phylink PCS support")
is the one causing the problem for me.

But the PHY used by our board clearly does not support EEE, so I disabled it
directly in the driver.

I’m very interested in your investigation, as I’d like to understand why that
commit causes a regression, given that it supposedly just switches to using
phylink for EEE management.

See https://lore.kernel.org/all/20251023144857.529566-1-ghidoliemanuele@gmail.com/

Thanks and regards,
Emanuele

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  8:47     ` Emanuele Ghidoli
@ 2025-10-27  9:00       ` Russell King (Oracle)
  2025-10-27  9:18         ` Emanuele Ghidoli
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27  9:00 UTC (permalink / raw)
  To: Emanuele Ghidoli
  Cc: Laurent Pinchart, Andrew Lunn, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo, Emanuele Ghidoli

On Mon, Oct 27, 2025 at 09:47:53AM +0100, Emanuele Ghidoli wrote:
> Hi Laurent,
> I had the same problem, interrupt storm plus link instability with dwmac.

You never said that in your patch description. You said "it causes
link instability and communication failures." Have you investigated
what the cause of the interrupt storm is?

> I found out that 2c81f3357136 ("net: stmmac: convert to phylink PCS support")
> commit is the one causing the problem to me.

You claim this commit enables EEE by default. It does. However, stmmac
_before_ this commit enables EEE by default as I've already explained,
quoting the old code which effects this. I've asked you to test
further. So far, I've heard nothing back.

What has changed is that we no longer do anything with the RGSMIIS
status, and in theory keep the mask/enable for this disabled. However,
that is a subsequent commit.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-26 12:29 [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T Laurent Pinchart
  2025-10-27  1:31 ` Fabio Estevam
  2025-10-27  3:08 ` Andrew Lunn
@ 2025-10-27  9:07 ` Russell King (Oracle)
  2025-10-27  9:33   ` Laurent Pinchart
  2025-10-27 13:33   ` Russell King (Oracle)
  2025-10-27 15:13 ` Russell King (Oracle)
  3 siblings, 2 replies; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27  9:07 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> devices produces an interrupts storm. Disable EEE support to fix it.

As this is the second problem that has been reported recently, please
bisect the conversion of stmmac to phylink managed EEE support and see
whether there is anything in that which is causing this.

Please also confirm that EEE was enabled (as shown through ethtool)
prior to stmmac's conversion. I believe it was, due to this code that
was present in stmmac_init_phy():

-               if (priv->dma_cap.eee)
-                       phy_support_eee(phydev);
-
                ret = phylink_connect_phy(priv->phylink, phydev);

> The exact reason for the interrupt storm is unknown, and my attempts to
> diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> understand, the DWMAC implements EEE support, and so does the RTL8211E
> PHY according to its datasheet. What each side does exactly is unknown
> to me. One theory I've heard to explain the issue is that the two
> implementations conflict. There is no register in the RTL8211E PHY to
> disable EEE on the PHY side while still advertising its support to the
> peer and relying on the implementation in the DWMAC (if this even makes
> sense), so disabling EEE is the only viable option.
> 
> This patch is likely a workaround, but it fixes ethernet usage on the
> board, so it's in my opinion worth being merged. If someone with better
> knowledge of EEE and DWMAC, as well as an interest in getting it working
> properly on the Debix board, wants to submit additional patches to drop
> eee-broken-1000t, I will be happy to test them.

The changes to stmmac have been tested on nVidia Jetson Xavier NX,
which uses RGMII with dwmac4 and a RTL8211F PHY, connected to a Netgear
GS108 switch. Your board seems to be using a similar setup.

I will re-test today.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  3:08 ` Andrew Lunn
  2025-10-27  7:27   ` Laurent Pinchart
@ 2025-10-27  9:12   ` Oleksij Rempel
  2025-10-27 10:02     ` Laurent Pinchart
  2025-11-12 12:34     ` Russell King (Oracle)
  1 sibling, 2 replies; 51+ messages in thread
From: Oleksij Rempel @ 2025-10-27  9:12 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Laurent Pinchart, Russell King, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 04:08:42AM +0100, Andrew Lunn wrote:
> Adding Russell King
>
> On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > devices produces an interrupts storm. Disable EEE support to fix it.
> >
> > Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > ---
> > The exact reason for the interrupt storm is unknown, and my attempts to
> > diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> > understand, the DWMAC implements EEE support, and so does the RTL8211E
> > PHY according to its datasheet.
>
> I believe for DWMAC it is a synthesis option. However, there is a bit
> indicating if the hardware supports it.
>
> The PHY should not be able to trigger an interrupt storm in the
> MAC. So this is likely to be an DWMAC issue.
>
> Which interrupt bit is causing the storm?
>
> > What each side does exactly is unknown
> > to me. One theory I've heard to explain the issue is that the two
> > implementations conflict. There is no register in the RTL8211E PHY to
> > disable EEE on the PHY side while still advertising its support to the
> > peer and relying on the implementation in the DWMAC (if this even makes
> > sense)
>
> It does not make sense. EEE is split into two major parts. The two
> PHYs communicate with each other to negotiate the feature, if both
> ends support it and both ends want to use it. The result of this
> negotiation is then passed to the MACs.
>
> It is then the MAC who decides when to send a Low Power Indication to
> the PHY to tell the PHY to enter low power mode. The MAC also wakes
> the PHY when it has packets to send.
>
> A quick look at the data sheet for the RTL8211E suggests this is what
> is supports.
>
> There are a few PHYs which implement SmartEEE, or some other similar
> name. They operate differently, the PHY does it all, and the MAC is
> not even aware EEE is happening. Such PHYs should really only be
> paired with MACs which do not support EEE. An EEE capable MAC paired
> with a SmartEEE PHY could have problems, but hopefully the EEE
> abilities and negotiation registers in the PHY would be sufficient to
> dissuade the MAC from doing EEE. But i would not expect a setup like
> this to trigger an interrupt storm.

Please note that the RTL8211E PHY uses an undocumented SmartEEE mode by default.
It ignores RGMII LPI opcodes and does its own thing. This can be confirmed by
monitoring the RGMII TX and MDI lines with an oscilloscope while changing the
tx-timer configuration. I also confirmed this information from another
source. To disable SmartEEE and use plain MAC-based mode, NDA documentation
is needed.

Best Regards,
Oleksij

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:00       ` Russell King (Oracle)
@ 2025-10-27  9:18         ` Emanuele Ghidoli
  0 siblings, 0 replies; 51+ messages in thread
From: Emanuele Ghidoli @ 2025-10-27  9:18 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Laurent Pinchart, Andrew Lunn, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo, Emanuele Ghidoli



On 27/10/2025 10:00, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 09:47:53AM +0100, Emanuele Ghidoli wrote:
>> On 27/10/2025 08:27, Laurent Pinchart wrote:
>>> Hi Andrew,
>>>
>>> Thank you for your quick reply.
>>>
>>> On Mon, Oct 27, 2025 at 04:08:42AM +0100, Andrew Lunn wrote:
>>>> Adding Russell King
>>>>
>>>> On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
>>>>> Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
>>>>> (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
>>>>> devices produces an interrupts storm. Disable EEE support to fix it.
>>>>>
>>>>> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
>>>>> ---
>>>>> The exact reason for the interrupt storm is unknown, and my attempts to
>>>>> diagnose it was hindered by my lack of expertise with DWMAC. As far as I
>>>>> understand, the DWMAC implements EEE support, and so does the RTL8211E
>>>>> PHY according to its datasheet.
>>>>
>>>> I believe for DWMAC it is a synthesis option. However, there is a bit
>>>> indicating if the hardware supports it.
>>>>
>>>> The PHY should not be able to trigger an interrupt storm in the
>>>> MAC. So this is likely to be an DWMAC issue.
>>>>
>>>> Which interrupt bit is causing the storm?
>>>
>>> That's where I hit my first wall :-)
>>>
>>> I've tried to diagnose the issue by adding interrupt counters to
>>> dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
>>> (0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
>>> interrupts storm, increasing at around 10k per second. However, the
>>> corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.
>>>
>>> The ENET_EQOS interrupt on the i.MX8MP is an OR'ed signal that combines
>>> four interrupt sources:
>>>
>>> - ENET QOS TSN LPI RX exit Interrupt
>>> - ENET QOS TSN Host System Interrupt
>>> - ENET QOS TSN Host System RX Channel Interrupts
>>> - ENET QOS TSN Host System TX Channel Interrupts
>>>
>>> The last two interrupt sources are themselves local OR of channels[4:0].
>>>
>>> I would suspect that the LPI RX exit interrupt is the one that fires
>>> constantly given its name, but I'm not sure how to test that.
>>>
>>>>> What each side does exactly is unknown
>>>>> to me. One theory I've heard to explain the issue is that the two
>>>>> implementations conflict. There is no register in the RTL8211E PHY to
>>>>> disable EEE on the PHY side while still advertising its support to the
>>>>> peer and relying on the implementation in the DWMAC (if this even makes
>>>>> sense)
>>>>
>>>> It does not make sense. EEE is split into two major parts. The two
>>>> PHYs communicate with each other to negotiate the feature, if both
>>>> ends support it and both ends want to use it. The result of this
>>>> negotiation is then passed to the MACs.
>>>>
>>>> It is then the MAC who decides when to send a Low Power Indication to
>>>> the PHY to tell the PHY to enter low power mode. The MAC also wakes
>>>> the PHY when it has packets to send.
>>>>
>>>> A quick look at the data sheet for the RTL8211E suggests this is what
>>>> is supports.
>>>>
>>>> There are a few PHYs which implement SmartEEE, or some other similar
>>>> name. They operate differently, the PHY does it all, and the MAC is
>>>> not even aware EEE is happening. Such PHYs should really only be
>>>> paired with MACs which do not support EEE. An EEE capable MAC paired
>>>> with a SmartEEE PHY could have problems, but hopefully the EEE
>>>> abilities and negotiation registers in the PHY would be sufficient to
>>>> dissuade the MAC from doing EEE. But i would not expect a setup like
>>>> this to trigger an interrupt storm.
>>>
>>> Thanks for the explanation, I read documents to try and figure out how
>>> it worked and didn't find such a clear and concise high-level summary.
>>>
>>> I'm not very experienced with ethernet, but I can easily test patches or
>>> even rough ideas on hardware.
>>>
>>
>> Hi Laurent,
>> I had the same problem, interrupt storm plus link instability with dwmac.
> 
> You never said that in your patch description. You said "it causes
> link instability and communication failures." Have you investigated
> what the cause of the interrupt storm is?
> 
>> I found out that 2c81f3357136 ("net: stmmac: convert to phylink PCS support")
>> commit is the one causing the problem to me.
The correct commit is 4218647d4556 (“net: stmmac: convert to phylink managed
EEE support”).
>
> You claim this commit enables EEE by default. It does. However, stmmac
> _before_ this commit enables EEE by default as I've already explained,
> quoting the old code which effects this. I've asked you to test
> further. So far, I've heard nothing back.
> 
> What has changed is that we no longer do anything with the RGSMIIIS
> status, and in theory keep the mask/enable for this disabled. However,
> that is a subsequent commit.
> 
Hi Russell,

Sorry, I made a copy-and-paste mistake earlier.


I identified it through a bisect, and reverting this commit (or disabling EEE)
resolves the issue I’m seeing.

I’m continuing to investigate further to understand the root cause.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  7:27   ` Laurent Pinchart
  2025-10-27  8:47     ` Emanuele Ghidoli
@ 2025-10-27  9:32     ` Russell King (Oracle)
  2025-10-27 23:08       ` Laurent Pinchart
  2025-10-27 11:22     ` Russell King (Oracle)
  2 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27  9:32 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Andrew Lunn, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 09:27:49AM +0200, Laurent Pinchart wrote:
> I've tried to diagnose the issue by adding interrupt counters to
> dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
> (0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
> interrupts storm, increasing at around 10k per second. However, the
> corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.

This is a change in the PCS series rather than the EEE series. It would
be good to narrow down when this problem appeared for you.

The RGSMIIIS bit set without RGSMIIIM (0x00b4 bit 0) shouldn't result
in an interrupt storm since the status will be masked. That doesn't
mean that RGSMIIIS won't be set. So, at this point I'm not worried
about that.

Can you print the intr_status and intr_enable values in dwmac4_irq_status(),
maybe something like this:

	static int ctr = 0;

	if (ctr++ >= 9996) {
		printk("stmmac: INTS=%08x INTE=%08x\n", intr_status,
			intr_enable);

		if (ctr >= 10000)
			ctr = 0;
	}

	/* Discard disabled bits */
	intr_status &= intr_enable;

which should avoid too much noise during "normal" operation. It'll
print four consecutive interrupts every 10000.
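
For anyone who wants to check the counter arithmetic without booting a
kernel, here is a minimal user-space sketch of the same rate limiting.
The function names should_print() and count_prints() are made up for
this illustration and do not exist in stmmac; only the "ctr++ >= 9996"
test and the reset at 10000 are taken from the snippet above.

```c
#include <stdbool.h>

/*
 * User-space model of the rate-limited debug print. It mirrors the
 * "ctr++ >= 9996" test and the reset once the counter reaches 10000.
 * The name and the explicit counter argument are illustrative only.
 */
bool should_print(unsigned int *ctr)
{
	/* Post-increment: the comparison sees the value before the bump. */
	bool print = (*ctr)++ >= 9996;

	if (print && *ctr >= 10000)
		*ctr = 0;

	return print;
}

/* Count how many of the first n interrupts would have been printed. */
unsigned int count_prints(unsigned int n)
{
	unsigned int ctr = 0, printed = 0;

	for (unsigned int i = 0; i < n; i++)
		if (should_print(&ctr))
			printed++;

	return printed;
}
```

Calling count_prints(10000) gives the number of lines logged per 10000
interrupts, and count_prints(9996) shows that nothing is logged before
the window opens.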

> I would suspect that the LPI RX exit interrupt is the one that fires
> constantly given its name, but I'm not sure how to test that.

You can check this because the LPI interrupts have statistic counter
associated with them. ethtool -S should give these, look for
lpi_mode_n.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:07 ` Russell King (Oracle)
@ 2025-10-27  9:33   ` Laurent Pinchart
  2025-10-27  9:45     ` Russell King (Oracle)
  2025-10-27 13:33   ` Russell King (Oracle)
  1 sibling, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27  9:33 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Emanuele Ghidoli

Hi Russell,

On Mon, Oct 27, 2025 at 09:07:32AM +0000, Russell King (Oracle) wrote:
> On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > devices produces an interrupts storm. Disable EEE support to fix it.
> 
> As this is the second problem that has been reported recently, please
> bisect the conversion of stmmac to phylink managed EEE support and see
> whether there is anything in that which is causing this.

Emanuele Ghidoli has bisected this to commit 2c81f3357136 ("net: stmmac:
convert to phylink PCS support"), as reported in [1]. I can test that
commit and the commit just before tonight.

[1] https://lore.kernel.org/all/341f56de-9dde-4c44-9542-b523e1917dcb@gmail.com/

> Please also confirm that EEE was enabled (as shown through ethtool)
> prior to stmmac's conversion. I believe it was, due to this code that
> was present in stmmac_init_phy():
> 
> -               if (priv->dma_cap.eee)
> -                       phy_support_eee(phydev);
> -
>                 ret = phylink_connect_phy(priv->phylink, phydev);
> 
> > The exact reason for the interrupt storm is unknown, and my attempts to
> > diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> > understand, the DWMAC implements EEE support, and so does the RTL8211E
> > PHY according to its datasheet. What each side does exactly is unknown
> > to me. One theory I've heard to explain the issue is that the two
> > implementations conflict. There is no register in the RTL8211E PHY to
> > disable EEE on the PHY side while still advertising its support to the
> > peer and relying on the implementation in the DWMAC (if this even makes
> > sense), so disabling EEE is the only viable option.
> > 
> > This patch is likely a workaround, but it fixes ethernet usage on the
> > board, so it's in my opinion worth being merged. If someone with better
> > knowledge of EEE and DWMAC, as well as an interest in getting it working
> > properly on the Debix board, wants to submit additional patches to drop
> > eee-broken-1000t, I will be happy to test them.
> 
> The changes to stmmac have been tested on nVidia Jetson Xavier NX,
> which uses RGMII with dwmac4 and a RTL8211F PHY, connected to a Netgear
> GS108 switch. It seems that your board is using a similar setup.

Very similar indeed, with a RTL8211E instead of the RTL8211F.

> I will re-test today.

Thank you.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:33   ` Laurent Pinchart
@ 2025-10-27  9:45     ` Russell King (Oracle)
  2025-10-27  9:55       ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27  9:45 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Emanuele Ghidoli

On Mon, Oct 27, 2025 at 11:33:46AM +0200, Laurent Pinchart wrote:
> Hi Russell,
> 
> On Mon, Oct 27, 2025 at 09:07:32AM +0000, Russell King (Oracle) wrote:
> > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > devices produces an interrupts storm. Disable EEE support to fix it.
> > 
> > As this is the second problem that has been reported recently, please
> > bisect the conversion of stmmac to phylink managed EEE support and see
> > whether there is anything in that which is causing this.
> 
> Emanuele Ghidoli has bisected this to commit 2c81f3357136 ("net: stmmac:
> convert to phylink PCS support"), as reported in [1]. I can test that
> commit and the commit just before tonight.
> 
> [1] https://lore.kernel.org/all/341f56de-9dde-4c44-9542-b523e1917dcb@gmail.com/

As you will notice in that thread, I responded to it last week, so
I am well aware of it. I am also well aware that the claims Emanuele made
in his commit description are demonstrably false.

That's not to say the commit isn't a problem, but the explanation of
why it's a problem doesn't make sense right now, and thus what needs to
be done to fix it is unknown.

I don't think going around disabling EEE on individual platforms is the
right approach.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:45     ` Russell King (Oracle)
@ 2025-10-27  9:55       ` Laurent Pinchart
  0 siblings, 0 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27  9:55 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Emanuele Ghidoli

On Mon, Oct 27, 2025 at 09:45:54AM +0000, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 11:33:46AM +0200, Laurent Pinchart wrote:
> > On Mon, Oct 27, 2025 at 09:07:32AM +0000, Russell King (Oracle) wrote:
> > > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > > devices produces an interrupts storm. Disable EEE support to fix it.
> > > 
> > > As this is the second problem that has been reported recently, please
> > > bisect the conversion of stmmac to phylink managed EEE support and see
> > > whether there is anything in that which is causing this.
> > 
> > Emanuele Ghidoli has bisected this to commit 2c81f3357136 ("net: stmmac:
> > convert to phylink PCS support"), as reported in [1]. I can test that
> > commit and the commit just before tonight.
> > 
> > [1] https://lore.kernel.org/all/341f56de-9dde-4c44-9542-b523e1917dcb@gmail.com/
> 
> As you will notice in that thread, I have responded to it last week, so
> I am well aware of it. I am also well aware of the claims Emanuele made
> in his commit description are demonstrably false.
> 
> That's not to say the commit isn't a problem, but the explanation of
> why it's a problem doesn't make sense right now, and thus what needs to
> be done to fix it is unknown.
> 
> I don't think going around disabling EEE on individual platforms is the
> right approach.

I fully agree with you, seeing how eee-broken-1000t spread on many
i.MX8MP DT sources raised a red cargo-cult flag. I didn't have the
knowledge to properly diagnose and fix this myself, hence this patch,
hoping someone more knowledgeable than me could help. Based on the
replies received so far, it seems I was right to be hopeful :-)

I'll perform more tests tonight, printing the interrupt status values as
you requested in a separate e-mail.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:12   ` Oleksij Rempel
@ 2025-10-27 10:02     ` Laurent Pinchart
  2025-10-27 10:23       ` Oleksij Rempel
  2025-11-12 12:34     ` Russell King (Oracle)
  1 sibling, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27 10:02 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Russell King, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

Hi Oleksij,

On Mon, Oct 27, 2025 at 10:12:12AM +0100, Oleksij Rempel wrote:
> On Mon, Oct 27, 2025 at 04:08:42AM +0100, Andrew Lunn wrote:
> > Adding Russell King
> >
> > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > devices produces an interrupts storm. Disable EEE support to fix it.
> > >
> > > Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> > > ---
> > > The exact reason for the interrupt storm is unknown, and my attempts to
> > > diagnose it was hindered by my lack of expertise with DWMAC. As far as I
> > > understand, the DWMAC implements EEE support, and so does the RTL8211E
> > > PHY according to its datasheet.
> >
> > I believe for DWMAC it is a synthesis option. However, there is a bit
> > indicating if the hardware supports it.
> >
> > The PHY should not be able to trigger an interrupt storm in the
> > MAC. So this is likely to be an DWMAC issue.
> >
> > Which interrupt bit is causing the storm?
> >
> > > What each side does exactly is unknown
> > > to me. One theory I've heard to explain the issue is that the two
> > > implementations conflict. There is no register in the RTL8211E PHY to
> > > disable EEE on the PHY side while still advertising its support to the
> > > peer and relying on the implementation in the DWMAC (if this even makes
> > > sense)
> >
> > It does not make sense. EEE is split into two major parts. The two
> > PHYs communicate with each other to negotiate the feature, if both
> > ends support it and both ends want to use it. The result of this
> > negotiation is then passed to the MACs.
> >
> > It is then the MAC who decides when to send a Low Power Indication to
> > the PHY to tell the PHY to enter low power mode. The MAC also wakes
> > the PHY when it has packets to send.
> >
> > A quick look at the data sheet for the RTL8211E suggests this is what
> > is supports.
> >
> > There are a few PHYs which implement SmartEEE, or some other similar
> > name. They operate differently, the PHY does it all, and the MAC is
> > not even aware EEE is happening. Such PHYs should really only be
> > paired with MACs which do not support EEE. An EEE capable MAC paired
> > with a SmartEEE PHY could have problems, but hopefully the EEE
> > abilities and negotiation registers in the PHY would be sufficient to
> > dissuade the MAC from doing EEE. But i would not expect a setup like
> > this to trigger an interrupt storm.
> 
> Please note, the RTL8211E PHY uses an undocumented SmartEEE mode by default.
> It ignores RGMII LPI opcodes and does its own thing. This can be confirmed by
> monitoring the RGMII TX and MDI lines with an oscilloscope and changing the
> tx-timer configuration. I also confirmed this information from another
> source. To disable SmartEEE and use plain MAC-based mode, NDA documentation
> is needed.

That's useful information, thank you. Would you by any chance know if
such an NDA would allow contributing the feature upstream?

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:02     ` Laurent Pinchart
@ 2025-10-27 10:23       ` Oleksij Rempel
  2025-10-27 10:31         ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Oleksij Rempel @ 2025-10-27 10:23 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Andrew Lunn, Russell King, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

Hi Laurent,

On Mon, Oct 27, 2025 at 12:02:27PM +0200, Laurent Pinchart wrote:
> Hi Oleksij,
> > Please note, RTL8211E PHY do use undocumented SmartEEE mode by default.
> > It ignores RGMII LPI opcodes and doing own thing. It can be confirmed by
> > monitoring RGMII TX and MDI lines with oscilloscope and changing
> > tx-timer configurations. I also confirmed this information from other
> > source. To disable SmartEEE and use plain MAC based mode, NDA documentation
> > is needed.
> 
> That's useful information, thank you. Would you by any chance to know if
> such NDA would allow contributing the feature upstream ?

Good question, but the NDA process was actually aborted. We didn't move
forward due to a lack of time and ultimately, a lack of commercial
interest from any projects or customers for this PHY.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:23       ` Oleksij Rempel
@ 2025-10-27 10:31         ` Laurent Pinchart
  2025-10-27 10:34           ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27 10:31 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Russell King, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 11:23:03AM +0100, Oleksij Rempel wrote:
> On Mon, Oct 27, 2025 at 12:02:27PM +0200, Laurent Pinchart wrote:
> > Hi Oleksij,
> > > Please note, RTL8211E PHY do use undocumented SmartEEE mode by default.
> > > It ignores RGMII LPI opcodes and doing own thing. It can be confirmed by
> > > monitoring RGMII TX and MDI lines with oscilloscope and changing
> > > tx-timer configurations. I also confirmed this information from other
> > > source. To disable SmartEEE and use plain MAC based mode, NDA documentation
> > > is needed.
> > 
> > That's useful information, thank you. Would you by any chance to know if
> > such NDA would allow contributing the feature upstream ?
> 
> Good question, but the NDA process was actually aborted. We didn't move
> forward due to a lack of time and ultimately, a lack of commercial
> interest from any projects or customers for this PHY.

Fair enough. I've tried :-)

If we can't disable SmartEEE in the PHY, does it mean we need to somehow
disable EEE in the MAC, but still program the PHY to advertise EEE to
the link partner ?

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:31         ` Laurent Pinchart
@ 2025-10-27 10:34           ` Russell King (Oracle)
  2025-10-27 10:44             ` Oleksij Rempel
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27 10:34 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Oleksij Rempel, Andrew Lunn, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 12:31:07PM +0200, Laurent Pinchart wrote:
> On Mon, Oct 27, 2025 at 11:23:03AM +0100, Oleksij Rempel wrote:
> > On Mon, Oct 27, 2025 at 12:02:27PM +0200, Laurent Pinchart wrote:
> > > Hi Oleksij,
> > > > Please note, RTL8211E PHY do use undocumented SmartEEE mode by default.
> > > > It ignores RGMII LPI opcodes and doing own thing. It can be confirmed by
> > > > monitoring RGMII TX and MDI lines with oscilloscope and changing
> > > > tx-timer configurations. I also confirmed this information from other
> > > > source. To disable SmartEEE and use plain MAC based mode, NDA documentation
> > > > is needed.
> > > 
> > > That's useful information, thank you. Would you by any chance to know if
> > > such NDA would allow contributing the feature upstream ?
> > 
> > Good question, but the NDA process was actually aborted. We didn't move
> > forward due to a lack of time and ultimately, a lack of commercial
> > interest from any projects or customers for this PHY.
> 
> Fair enough. I've tried :-)
> 
> If we can't disable SmartEEE in the PHY, does it mean we need to somehow
> disable EEE in the MAC, but still program the PHY to advertise EEE to
> the link partner ?

Or maybe the PHY needs to have EEE capability disabled?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:34           ` Russell King (Oracle)
@ 2025-10-27 10:44             ` Oleksij Rempel
  2025-10-27 10:48               ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Oleksij Rempel @ 2025-10-27 10:44 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Laurent Pinchart, devicetree, Conor Dooley, imx, Shawn Guo,
	Sascha Hauer, Kieran Bingham, Andrew Lunn, Daniel Scally,
	Pengutronix Kernel Team, Stefan Klug, Krzysztof Kozlowski,
	Fabio Estevam, Rob Herring, linux-arm-kernel

On Mon, Oct 27, 2025 at 10:34:35AM +0000, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 12:31:07PM +0200, Laurent Pinchart wrote:
> > On Mon, Oct 27, 2025 at 11:23:03AM +0100, Oleksij Rempel wrote:
> > > On Mon, Oct 27, 2025 at 12:02:27PM +0200, Laurent Pinchart wrote:
> > > > Hi Oleksij,
> > > > > Please note, RTL8211E PHY do use undocumented SmartEEE mode by default.
> > > > > It ignores RGMII LPI opcodes and doing own thing. It can be confirmed by
> > > > > monitoring RGMII TX and MDI lines with oscilloscope and changing
> > > > > tx-timer configurations. I also confirmed this information from other
> > > > > source. To disable SmartEEE and use plain MAC based mode, NDA documentation
> > > > > is needed.
> > > > 
> > > > That's useful information, thank you. Would you by any chance to know if
> > > > such NDA would allow contributing the feature upstream ?
> > > 
> > > Good question, but the NDA process was actually aborted. We didn't move
> > > forward due to a lack of time and ultimately, a lack of commercial
> > > interest from any projects or customers for this PHY.
> > 
> > Fair enough. I've tried :-)
> > 
> > If we can't disable SmartEEE in the PHY, does it mean we need to somehow
> > disable EEE in the MAC, but still program the PHY to advertise EEE to
> > the link partner ?
> 
> Or maybe the PHY needs to have EEE capability disabled?

Ack. With a comment in the code explaining why we prefer this way, in case
someone wants to spend time on making it work. Probably SmartEEE or some
other word should be used.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:44             ` Oleksij Rempel
@ 2025-10-27 10:48               ` Russell King (Oracle)
  2025-10-27 12:50                 ` Andrew Lunn
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27 10:48 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Laurent Pinchart, devicetree, Conor Dooley, imx, Shawn Guo,
	Sascha Hauer, Kieran Bingham, Andrew Lunn, Daniel Scally,
	Pengutronix Kernel Team, Stefan Klug, Krzysztof Kozlowski,
	Fabio Estevam, Rob Herring, linux-arm-kernel

On Mon, Oct 27, 2025 at 11:44:52AM +0100, Oleksij Rempel wrote:
> On Mon, Oct 27, 2025 at 10:34:35AM +0000, Russell King (Oracle) wrote:
> > On Mon, Oct 27, 2025 at 12:31:07PM +0200, Laurent Pinchart wrote:
> > > On Mon, Oct 27, 2025 at 11:23:03AM +0100, Oleksij Rempel wrote:
> > > > On Mon, Oct 27, 2025 at 12:02:27PM +0200, Laurent Pinchart wrote:
> > > > > Hi Oleksij,
> > > > > > Please note, RTL8211E PHY do use undocumented SmartEEE mode by default.
> > > > > > It ignores RGMII LPI opcodes and doing own thing. It can be confirmed by
> > > > > > monitoring RGMII TX and MDI lines with oscilloscope and changing
> > > > > > tx-timer configurations. I also confirmed this information from other
> > > > > > source. To disable SmartEEE and use plain MAC based mode, NDA documentation
> > > > > > is needed.
> > > > > 
> > > > > That's useful information, thank you. Would you by any chance to know if
> > > > > such NDA would allow contributing the feature upstream ?
> > > > 
> > > > Good question, but the NDA process was actually aborted. We didn't move
> > > > forward due to a lack of time and ultimately, a lack of commercial
> > > > interest from any projects or customers for this PHY.
> > > 
> > > Fair enough. I've tried :-)
> > > 
> > > If we can't disable SmartEEE in the PHY, does it mean we need to somehow
> > > disable EEE in the MAC, but still program the PHY to advertise EEE to
> > > the link partner ?
> > 
> > Or maybe the PHY needs to have EEE capability disabled?
> 
> Ack. With comment in the code, why we prefer this way, in case some one
> wont to spend time on making it work. Probably SmartEEE or some other
> word should be used.

So we have options.

However, we need to get to the bottom of what caused the change of
behaviour before we start throwing solutions at this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  7:27   ` Laurent Pinchart
  2025-10-27  8:47     ` Emanuele Ghidoli
  2025-10-27  9:32     ` Russell King (Oracle)
@ 2025-10-27 11:22     ` Russell King (Oracle)
  2025-10-27 23:15       ` Laurent Pinchart
  2 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27 11:22 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Andrew Lunn, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 09:27:49AM +0200, Laurent Pinchart wrote:
> I've tried to diagnose the issue by adding interrupt counters to
> dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
> (0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
> interrupts storm, increasing at around 10k per second. However, the
> corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.

I'll add to my comments earlier, because it may help you work out
what's going on.

RGSMIIIS will be set when the LNKSTS bit (bit 19) of 0xf8 changes
state. RGSMIIIS is only cleared by reading this register. So, something
else to test would be to do a dummy read of this register and see
whether the interrupt storm still has the RGSMIIIS bit set.
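
To make the latch-and-clear behaviour concrete, here is a toy
user-space model of it. This is a sketch only: struct mac_model and the
model_*() helpers are invented for illustration, and the only details
taken from the description above are that a change of LNKSTS (bit 19 of
0xf8) sets RGSMIIIS (bit 0 of the interrupt status) and that reading
0xf8 clears it.

```c
#include <stdbool.h>
#include <stdint.h>

#define LNKSTS		(1u << 19)	/* link status bit in register 0xf8 */
#define RGSMIIIS	(1u << 0)	/* latched bit in the interrupt status */

/* Toy model of the two registers, not the real hardware layout. */
struct mac_model {
	uint32_t phyif_ctrl_status;	/* models register 0xf8 */
	uint32_t int_status;		/* models register 0xb0 */
};

/* A change of LNKSTS latches RGSMIIIS; no change leaves it alone. */
void model_set_link(struct mac_model *m, bool up)
{
	uint32_t old = m->phyif_ctrl_status & LNKSTS;
	uint32_t now = up ? LNKSTS : 0;

	if (old != now)
		m->int_status |= RGSMIIIS;

	m->phyif_ctrl_status = (m->phyif_ctrl_status & ~LNKSTS) | now;
}

/* Reading register 0xf8 returns its value and clears the latched bit. */
uint32_t model_read_phyif(struct mac_model *m)
{
	m->int_status &= ~RGSMIIIS;
	return m->phyif_ctrl_status;
}
```

In the model, the status bit stays set after a link change until
model_read_phyif() is called, which is the effect the dummy read is
meant to test on real hardware.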

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 10:48               ` Russell King (Oracle)
@ 2025-10-27 12:50                 ` Andrew Lunn
  2025-10-27 14:50                   ` Oleksij Rempel
  0 siblings, 1 reply; 51+ messages in thread
From: Andrew Lunn @ 2025-10-27 12:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Laurent Pinchart, devicetree, Conor Dooley, imx,
	Shawn Guo, Sascha Hauer, Kieran Bingham, Daniel Scally,
	Pengutronix Kernel Team, Stefan Klug, Krzysztof Kozlowski,
	Fabio Estevam, Rob Herring, linux-arm-kernel

> > Ack. With comment in the code, why we prefer this way, in case some one
> > wont to spend time on making it work. Probably SmartEEE or some other
> > word should be used.
> 
> So we have options.
> 
> However, we need to get to the bottom of what caused the change of
> behaviour before we start throwing solutions at this.

It also seems like the PHY is FUBAR. If the standard 802.3 EEE
registers are being used, i.e. a management plane is using them to
negotiate EEE with the link partner, the PHY firmware should disable
SmartEEE and only provide 802.3 EEE.

It sounds like this PHY is not 802.3 compatible.

	Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:07 ` Russell King (Oracle)
  2025-10-27  9:33   ` Laurent Pinchart
@ 2025-10-27 13:33   ` Russell King (Oracle)
  1 sibling, 0 replies; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27 13:33 UTC (permalink / raw)
  To: Laurent Pinchart, Emanuele Ghidoli
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 09:07:32AM +0000, Russell King (Oracle) wrote:
> The changes to stmmac have been tested on nVidia Jetson Xavier NX,
> which uses RGMII with dwmac4 and a RTL8211F PHY, connected to a Netgear
> GS108 switch. It seems that your board is using a similar setup.
> 
> I will re-test today.

I just booted net-next on this platform.

# ethtool -S eth0 | grep lpi_mode_n
     irq_tx_path_in_lpi_mode_n: 24
     irq_tx_path_exit_lpi_mode_n: 23
     irq_rx_path_in_lpi_mode_n: 201
     irq_rx_path_exit_lpi_mode_n: 200
# ethtool --show-eee eth0
EEE Settings for eth0:
        EEE status: enabled - active
        Tx LPI: 1000000 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

So it looks like everything is working as it should here.

stmmac was converted to phylink managed EEE in v6.14-rc1. I've built
v6.13 to check that my assertions w.r.t. EEE defaulting to being
enabled are correct, and:

# ethtool -S eth0 | grep lpi
     irq_tx_path_in_lpi_mode_n: 15
     irq_tx_path_exit_lpi_mode_n: 14
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
# ethtool --show-eee eth0
EEE Settings for eth0:
        EEE status: enabled - active
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

So, as I have asserted in response to Emanuele, the conversion of
stmmac to phylink-managed EEE hasn't changed whether EEE is enabled
by default. It was enabled by default before phylink-managed EEE, and
as I always try to do, I try to avoid introducing different behaviours
when converting drivers to a new implementation. That point holds up
here w.r.t. whether EEE is enabled by default. Hence, blaming these
problems on the phylink conversion enabling EEE by default is
incorrect - and I wish people would *stop* jumping to false conclusions
without evidence. As phylink maintainer, it is extremely disheartening
to keep having problems falsely levelled at phylink.

Note that the difference in the v6.13 output is that the receive path at
the MAC doesn't enter LPI mode. This is because PHY-mode EEE is enabled,
which prevents the LPI state on the receive side being forwarded to the
MAC. I fixed this via commit bfc17c165835 ("net: phy: realtek: disable
PHY-mode EEE") for RTL8211F PHYs, merged in v6.15-rc1.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 12:50                 ` Andrew Lunn
@ 2025-10-27 14:50                   ` Oleksij Rempel
  0 siblings, 0 replies; 51+ messages in thread
From: Oleksij Rempel @ 2025-10-27 14:50 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle), Laurent Pinchart, devicetree, Conor Dooley,
	imx, Shawn Guo, Sascha Hauer, Kieran Bingham, Daniel Scally,
	Pengutronix Kernel Team, Stefan Klug, Krzysztof Kozlowski,
	Fabio Estevam, Rob Herring, linux-arm-kernel

On Mon, Oct 27, 2025 at 01:50:21PM +0100, Andrew Lunn wrote:
> > > Ack. With a comment in the code explaining why we prefer this way, in
> > > case someone wants to spend time on making it work. Probably SmartEEE
> > > or some other word should be used.
> > 
> > So we have options.
> > 
> > However, we need to get to the bottom of what caused the change of
> > behaviour before we start throwing solutions at this.
> 
> It also seems like the PHY is FUBAR. If the standard 802.3 EEE
> registers are being used, a management plane is using them to
> negotiate EEE with the link partner, the PHY firmware should disable
> SmartEEE and only provide 802.3 EEE.
> 
> It sounds like this PHY is not 802.3 compatible.

I do not know of a better place to post this, so I'm adding it here for
the archive. At least it explains one reason why EEE fails. Something
like this is not possible to handle on the MAC side. At the same time, it
is hard to diagnose:

In 100BASE-TX EEE mode, the link partner may drop link or loss packet
when the local MAC/PHY (device) starts to transmit the “Wake” signal
immediately following the “Sleep”/“Refresh” signal to exit Low-Power
Idle mode and return to Active mode.

Many EEE PHY link partners require a short “Quiet” (Tq) duration after
receiving the “Sleep”/“Refresh” signal. Without this short Tq wait time,
that is not specified in the IEEE 802.3az Standard, link drop or packet
loss can occur.

https://ww1.microchip.com/downloads/aemDocuments/documents/OTH/ProductDocuments/Errata/80000708B.pdf

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-26 12:29 [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T Laurent Pinchart
                   ` (2 preceding siblings ...)
  2025-10-27  9:07 ` Russell King (Oracle)
@ 2025-10-27 15:13 ` Russell King (Oracle)
  2025-10-27 19:52   ` Andrew Lunn
  2025-10-27 23:46   ` Laurent Pinchart
  3 siblings, 2 replies; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-27 15:13 UTC (permalink / raw)
  To: Laurent Pinchart, Oleksij Rempel, Emanuele Ghidoli
  Cc: devicetree, imx, linux-arm-kernel, Daniel Scally, Kieran Bingham,
	Stefan Klug, Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> devices produces an interrupts storm. Disable EEE support to fix it.

We've finally got to the bottom of what's going on here. Please try
this patch (it's building locally, but will take some time because
I'd wound the tree back to 6.13 and 6.14, so it's going to be a full
rebuild.) Thus, there may be compile bugs remaining.

This uncovered a latent bug in Emanuele's case - the TI PHY drivers
report EEE capabilities despite not being capable which also needs
fixing. The patch below will stop stmmac enabling EEE by default on
PHYs described in firmware, which is the behaviour the driver used
to have.

If we decide that EEE should be enabled by default, then we'll need
to revert this change. However, given Oleksij's recent input, I'm
wondering whether EEE should default to disabled given the issues
with Tq. The suggestion there is that many PHYs get it wrong and thus
are incompatible with each other when EEE is enabled.

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index fd5106880192..c18690a6804f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1208,6 +1208,24 @@ static int stmmac_init_phy(struct net_device *dev)
 	return 0;
 }
 
+static bool stmmac_has_fw_phy(struct stmmac_priv *priv)
+{
+	struct fwnode_handle *fwnode;
+
+	fwnode = priv->plat->port_node;
+	if (!fwnode)
+		fwnode = dev_fwnode(priv->device);
+
+	if (!fwnode)
+		return false;
+
+	fwnode = fwnode_get_phy_node(fwnode);
+	if (fwnode)
+		fwnode_handle_put(fwnode);
+
+	return !!fwnode;
+}
+
 static int stmmac_phylink_setup(struct stmmac_priv *priv)
 {
 	struct stmmac_mdio_bus_data *mdio_bus_data;
@@ -1270,7 +1288,7 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 		/* All full duplex speeds above 100Mbps are supported */
 		config->lpi_capabilities = ~(MAC_1000FD - 1) | MAC_100FD;
 		config->lpi_timer_default = eee_timer * 1000;
-		config->eee_enabled_default = true;
+		config->eee_enabled_default = !stmmac_has_fw_phy(priv);
 	}
 
 	config->wol_phy_speed_ctrl = true;

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 15:13 ` Russell King (Oracle)
@ 2025-10-27 19:52   ` Andrew Lunn
  2025-10-27 23:46   ` Laurent Pinchart
  1 sibling, 0 replies; 51+ messages in thread
From: Andrew Lunn @ 2025-10-27 19:52 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Laurent Pinchart, Oleksij Rempel, Emanuele Ghidoli, devicetree,
	imx, linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

> If we decide that EEE should be enabled by default, then we'll need
> to revert this change. However, given Oleksij's recent input, I'm
> wondering whether EEE should default to disabled given the issues
> with Tq. The suggestion there is that many PHYs get it wrong and thus
> are incompatible with each other when EEE is enabled.

I would probably default to leaving the hardware alone and taking over
its existing configuration. If the reset default or strapping has EEE
enabled, it's probably been reasonably well tested. If the reset default
or strapping has EEE disabled, it probably is not so well tested, so it
may be dangerous to enable by default.

      Andrew

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:32     ` Russell King (Oracle)
@ 2025-10-27 23:08       ` Laurent Pinchart
  0 siblings, 0 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27 23:08 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 09:32:24AM +0000, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 09:27:49AM +0200, Laurent Pinchart wrote:
> > I've tried to diagnose the issue by adding interrupt counters to
> > dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
> > (0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
> > interrupts storm, increasing at around 10k per second. However, the
> > corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.
> 
> This is a change in the PCS series rather than the EEE series. It would
> be good to narrow down when this problem appeared for you.
> 
> The RGSMIIIS bit set without RGSMIIIM (0x00b4 bit 0) shouldn't result
> in an interrupt storm since the status will be masked. That doesn't
> mean that RGSMIIIS won't be set. So, at this point I'm not worried
> about that.
> 
> Can you print the intr_status and intr_enable values in
> dwmac4_irq_status(), maybe something like this:
> 
> 	static int ctr = 0;
> 
> 	if (ctr++ >= 9996) {
> 		printk("stmmac: INTS=%08x INTE=%08x\n", intr_status,
> 			intr_enable);
> 
> 		if (ctr >= 10000)
> 			ctr = 0;
> 	}
> 
>         /* Discard disabled bits */
>         intr_status &= intr_enable;
> 
> which should avoid too much noise during "normal" operation. It'll
> print four consecutive interrupts every 10000.

I'm always getting the same values:

[   62.638187] stmmac: INTS=00000001 INTE=00001030

Now the funny part. I get about 20 of those messages printed to the
serial console every time I press enter, and rarely otherwise. Typing
other characters in the console does not trigger the messages.

> > I would suspect that the LPI RX exit interrupt is the one that fires
> > constantly given its name, but I'm not sure how to test that.
> 
> You can check this because the LPI interrupts have statistics counters
> associated with them. ethtool -S should give these, look for
> lpi_mode_n.

# ethtool -S eth0 | grep lpi
     irq_tx_path_in_lpi_mode_n: 32
     irq_tx_path_exit_lpi_mode_n: 32
     irq_rx_path_in_lpi_mode_n: 2512
     irq_rx_path_exit_lpi_mode_n: 2508

That seems reasonable.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 11:22     ` Russell King (Oracle)
@ 2025-10-27 23:15       ` Laurent Pinchart
  0 siblings, 0 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27 23:15 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, devicetree, imx, linux-arm-kernel, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 11:22:35AM +0000, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 09:27:49AM +0200, Laurent Pinchart wrote:
> > I've tried to diagnose the issue by adding interrupt counters to
> > dwmac4_irq_status(), counting interrupts for each bit of GMAC_INT_STATUS
> > (0x00b0). Bit RGSMIIIS (0) is the only one that seems linked to the
> > interrupts storm, increasing at around 10k per second. However, the
> > corresponding bit in GMAC_INT_EN (0x00b4) is *not* set.
> 
> I'll add to my comments earlier, because it may help you work out
> what's going on.
> 
> RGSMIIS will be set when the LNKSTS bit (bit 19) of 0xf8 changes
> state. RGSMIIS is only cleared by reading this register. So, something
> else to test would be to do a dummy read of this register and see
> whether the interrupt storm still has the RGSMIIS bit set.

It does. I then get

[   22.880935] stmmac: INTS=00000000 INTE=00001030

with the same interrupt storm. This is getting weirder and weirder.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 15:13 ` Russell King (Oracle)
  2025-10-27 19:52   ` Andrew Lunn
@ 2025-10-27 23:46   ` Laurent Pinchart
  2025-10-28  0:57     ` Russell King (Oracle)
  1 sibling, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-27 23:46 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 03:13:51PM +0000, Russell King (Oracle) wrote:
> On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > devices produces an interrupts storm. Disable EEE support to fix it.
> 
> We've finally got to the bottom of what's going on here. Please try
> this patch (it's building locally, but will take some time because
> I'd wound the tree back to 6.13 and 6.14, so it's going to be a full
> rebuild.) Thus, there may be compile bugs remaining.

I've applied it on top of a branch based on v6.18-rc3 plus "[PATCH
net-next 0/5] net: stmmac: more cleanups" ([1]) and "[PATCH net-next v2
0/6] net: add phylink managed WoL and convert stmmac" ([2]) to make the
patch apply cleanly.

[1] https://lore.kernel.org/all/aO_HIwT_YvxkDS8D@shell.armlinux.org.uk/
[2] https://lore.kernel.org/all/aPnyW54J80h9DmhB@shell.armlinux.org.uk/

The base branch exhibits the interrupt storm issue. The patch
unfortunately doesn't fix it.

> This uncovered a latent bug in Emanuele's case - the TI PHY drivers
> report EEE capabilities despite not being capable which also needs
> fixing. The patch below will stop stmmac enabling EEE by default on
> PHYs described in firmware, which is the behaviour the driver used
> to have.
> 
> If we decide that EEE should be enabled by default, then we'll need
> to revert this change. However, given Oleksij's recent input, I'm
> wondering whether EEE should default to disabled given the issues
> with Tq. The suggestion there is that many PHYs get it wrong and thus
> are incompatible with each other when EEE is enabled.
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index fd5106880192..c18690a6804f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1208,6 +1208,24 @@ static int stmmac_init_phy(struct net_device *dev)
>  	return 0;
>  }
>  
> +static bool stmmac_has_fw_phy(struct stmmac_priv *priv)
> +{
> +	struct fwnode_handle *fwnode;
> +
> +	fwnode = priv->plat->port_node;
> +	if (!fwnode)
> +		fwnode = dev_fwnode(priv->device);
> +
> +	if (!fwnode)
> +		return false;
> +
> +	fwnode = fwnode_get_phy_node(fwnode);
> +	if (fwnode)
> +		fwnode_handle_put(fwnode);
> +
> +	return !!fwnode;
> +}
> +
>  static int stmmac_phylink_setup(struct stmmac_priv *priv)
>  {
>  	struct stmmac_mdio_bus_data *mdio_bus_data;
> @@ -1270,7 +1288,7 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
>  		/* All full duplex speeds above 100Mbps are supported */
>  		config->lpi_capabilities = ~(MAC_1000FD - 1) | MAC_100FD;
>  		config->lpi_timer_default = eee_timer * 1000;
> -		config->eee_enabled_default = true;
> +		config->eee_enabled_default = !stmmac_has_fw_phy(priv);
>  	}
>  
>  	config->wol_phy_speed_ctrl = true;

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27 23:46   ` Laurent Pinchart
@ 2025-10-28  0:57     ` Russell King (Oracle)
  2025-10-28  7:18       ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-10-28  0:57 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo

On Tue, Oct 28, 2025 at 01:46:48AM +0200, Laurent Pinchart wrote:
> On Mon, Oct 27, 2025 at 03:13:51PM +0000, Russell King (Oracle) wrote:
> > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > devices produces an interrupts storm. Disable EEE support to fix it.
> > 
> > We've finally got to the bottom of what's going on here. Please try
> > this patch (it's building locally, but will take some time because
> > I'd wound the tree back to 6.13 and 6.14, so it's going to be a full
> > rebuild.) Thus, there may be compile bugs remaining.
> 
> I've applied it on top of a branch based on v6.18-rc3 plus "[PATCH
> net-next 0/5] net: stmmac: more cleanups" ([1]) and "[PATCH net-next v2
> 0/6] net: add phylink managed WoL and convert stmmac" ([2]) to make the
> patch apply cleanly.
> 
> [1] https://lore.kernel.org/all/aO_HIwT_YvxkDS8D@shell.armlinux.org.uk/
> [2] https://lore.kernel.org/all/aPnyW54J80h9DmhB@shell.armlinux.org.uk/
> 
> The base branch exhibits the interrupt storm issue. The patch
> unfortunately doesn't fix it.

So it's highly unlikely that your problem is the same as Emanuele's.

Do you know when the interrupt storm behaviour started? If not, I'd
suggest testing 6.13 and 6.14 as a starting point to see whether
the phylink-managed EEE conversion is involved.

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-28  0:57     ` Russell King (Oracle)
@ 2025-10-28  7:18       ` Laurent Pinchart
  2025-11-11 23:54         ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-10-28  7:18 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

(CC'ing Catalin)

On Tue, Oct 28, 2025 at 12:57:55AM +0000, Russell King (Oracle) wrote:
> On Tue, Oct 28, 2025 at 01:46:48AM +0200, Laurent Pinchart wrote:
> > On Mon, Oct 27, 2025 at 03:13:51PM +0000, Russell King (Oracle) wrote:
> > > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > > devices produces an interrupts storm. Disable EEE support to fix it.
> > > 
> > > We've finally got to the bottom of what's going on here. Please try
> > > this patch (it's building locally, but will take some time because
> > > I'd wound the tree back to 6.13 and 6.14, so it's going to be a full
> > > rebuild.) Thus, there may be compile bugs remaining.
> > 
> > I've applied it on top of a branch based on v6.18-rc3 plus "[PATCH
> > net-next 0/5] net: stmmac: more cleanups" ([1]) and "[PATCH net-next v2
> > 0/6] net: add phylink managed WoL and convert stmmac" ([2]) to make the
> > patch apply cleanly.
> > 
> > [1] https://lore.kernel.org/all/aO_HIwT_YvxkDS8D@shell.armlinux.org.uk/
> > [2] https://lore.kernel.org/all/aPnyW54J80h9DmhB@shell.armlinux.org.uk/
> > 
> > The base branch exhibits the interrupt storm issue. The patch
> > unfortunately doesn't fix it.
> 
> So it's highly unlikely that your problem is the same as Emanuele's.
> 
> Do you know when the interrupt storm behaviour started? If not, I'd
> suggest testing 6.13 and 6.14 as a starting point to see whether
> the phylink-managed EEE conversion is involved.

I can't test it right now (no access to hardware during daytime for this
week), but if I recall correctly my colleague Stefan Klug bisected the
issue to

commit dda1bc1d8ad13672c2728eedee0dd02d27a5314a
Author: Catalin Popescu <catalin.popescu@leica-geosystems.com>
Date:   Mon Oct 7 15:44:24 2024 +0200

    arm64: dts: imx8mp: add cpuidle state "cpu-pd-wait"

    So far, only WFI is supported on i.MX8mp platform. Add support for
    deeper cpuidle state "cpu-pd-wait" that would allow for better power
    usage during runtime. This is a port from NXP downstream kernel.

    Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com>
    Signed-off-by: Shawn Guo <shawnguo@kernel.org>

I didn't notice it at the time because my board was connected to a
switch that didn't support EEE.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-28  7:18       ` Laurent Pinchart
@ 2025-11-11 23:54         ` Laurent Pinchart
  2025-11-12 12:03           ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-11 23:54 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> (CC'ing Catalin)
> 
> On Tue, Oct 28, 2025 at 12:57:55AM +0000, Russell King (Oracle) wrote:
> > On Tue, Oct 28, 2025 at 01:46:48AM +0200, Laurent Pinchart wrote:
> > > On Mon, Oct 27, 2025 at 03:13:51PM +0000, Russell King (Oracle) wrote:
> > > > On Sun, Oct 26, 2025 at 02:29:04PM +0200, Laurent Pinchart wrote:
> > > > > Energy Efficient Ethernet (EEE) is broken at least for 1000T on the EQOS
> > > > > (DWMAC) interface. When connected to an EEE-enabled peer, the ethernet
> > > > > devices produces an interrupts storm. Disable EEE support to fix it.
> > > > 
> > > > We've finally got to the bottom of what's going on here. Please try
> > > > this patch (it's building locally, but will take some time because
> > > > I'd wound the tree back to 6.13 and 6.14, so it's going to be a full
> > > > rebuild.) Thus, there may be compile bugs remaining.
> > > 
> > > I've applied it on top of a branch based on v6.18-rc3 plus "[PATCH
> > > net-next 0/5] net: stmmac: more cleanups" ([1]) and "[PATCH net-next v2
> > > 0/6] net: add phylink managed WoL and convert stmmac" ([2]) to make the
> > > patch apply cleanly.
> > > 
> > > [1] https://lore.kernel.org/all/aO_HIwT_YvxkDS8D@shell.armlinux.org.uk/
> > > [2] https://lore.kernel.org/all/aPnyW54J80h9DmhB@shell.armlinux.org.uk/
> > > 
> > > The base branch exhibits the interrupt storm issue. The patch
> > > unfortunately doesn't fix it.
> > 
> > So it's highly unlikely that your problem is the same as Emanuele's.
> > 
> > Do you know when the interrupt storm behaviour started? If not, I'd
> > suggest testing 6.13 and 6.14 as a starting point to see whether
> > the phylink-managed EEE conversion is involved.
> 
> I can't test it right now (no access to hardware during daytime for this
> week), but if I recall correctly my colleague Stefan Klug bisected the
> issue to
> 
> commit dda1bc1d8ad13672c2728eedee0dd02d27a5314a
> Author: Catalin Popescu <catalin.popescu@leica-geosystems.com>
> Date:   Mon Oct 7 15:44:24 2024 +0200
> 
>     arm64: dts: imx8mp: add cpuidle state "cpu-pd-wait"
> 
>     So far, only WFI is supported on i.MX8mp platform. Add support for
>     deeper cpuidle state "cpu-pd-wait" that would allow for better power
>     usage during runtime. This is a port from NXP downstream kernel.
> 
>     Signed-off-by: Catalin Popescu <catalin.popescu@leica-geosystems.com>
>     Signed-off-by: Shawn Guo <shawnguo@kernel.org>
> 
> I didn't notice it at the time because my board was connected to a
> switch that didn't support EEE.

I can confirm that reverting that commit makes the issue disappear. So
we're dealing with an interrupt storm that occurs when all three of the
following conditions are true:

- cpu-pd-wait is enabled
- EEE is enabled
- the peer also supports EEE

Furthermore, I tried counting bits from all the interrupt status
registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.

The debix board's DT doesn't specify a multi-queue setup, so only
channel 0 gets processed in stmmac_dma_interrupt(). I thought that could
explain why Q1IS to Q4IS stay set (but not why Q0IS also has a high
count, or why Q1IS to Q4IS are set in the first place), and enabled
multi-queue support in DT by copying the imx8mp-evk configuration. I
then see lots of non-zero DMA_CH1_STATUS, DMA_CH2_STATUS and
DMA_CH4_STATUS values (but DMA_CH3_STATUS stays 0 all the time), but
sadly this doesn't fix the interrupt storm.

I don't know how much sense all this makes, and I'm sorry if the above
information is unclear, incomplete or completely wrong; my experience
with the DWMAC is very limited.

I don't think I can debug this further and figure out the root cause
unassisted in a reasonable amount of time, so I'd like to merge
disabling EEE as a workaround for the time being, unless someone has any
idea of what I could test next. I'll submit a v2 of this patch with an
updated commit message.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-11 23:54         ` Laurent Pinchart
@ 2025-11-12 12:03           ` Russell King (Oracle)
  2025-11-12 22:25             ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-12 12:03 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > I didn't notice it at the time because my board was connected to a
> > switch that didn't support EEE.
> 
> I can confirm that reverting that commit makes the issue disappear. So
> we're dealing with an interrupt storm that occurs when all three of the
> following conditions are true:
> 
> - cpu-pd-wait is enabled
> - EEE is enabled
> - the peer also supports EEE

Thanks - overall, please take the statistics and interrupt status bits
with a pinch of salt - I suspect there are cases where the interrupt
is not actually enabled, and the code doesn't take action to clear
down a set status bit, but _does_ count it - so every interrupt that
happens increments the counter.

> 
> Furthermore, I tried counting bits from all the interrupt status
> registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
> bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.

TBU means that the transmitter found that the next buffer was owned by
the "application" rather than the hardware, which would be normal after
getting to the end of the queued packets.

ETI means that a packet has been transferred into MTL memory, and thus
would occur for every transmitted packet.

Having dug into the imx8m documentation and the driver this morning,
I don't think TBU and ETI are the source of the interrupt storm. Their
corresponding interrupt enable bits are DMA_CHAN_INTR_ENA_TBUE and
DMA_CHAN_INTR_ENA_ETE (driver names). Both of these only appear in a
header file - the code never enables these interrupts. So, TBU and ETI
should not be causing an interrupt storm.

As for QxIS, stmmac_common_interrupt() will iterate over the queues
in use, calling stmmac_host_mtl_irq_status() aka dwmac4_irq_mtl_status()
for each. Only if this happens will MTL_CHAN_INT_CTRL() be read, which
clears the status bit. In other words, if e.g. Q1IS is set but only
one queue is being used, dwmac4_irq_mtl_status() won't be called for
queue 1, and thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.

> The debix board's DT doesn't specify a multi-queue setup, so only
> channel 0 gets processed in stmmac_dma_interrupt(). I thought that could
> explain why Q1IS to Q4IS stay set (but not why Q0IS also has a high
> count, or why Q1IS to Q4IS are set in the first place), and enabled
> multi-queue support in DT by copying the imx8mp-evk configuration. I
> then see lots of non-zero DMA_CH1_STATUS, DMA_CH2_STATUS and
> DMA_CH4_STATUS values (but DMA_CH3_STATUS stays 0 all the time), but
> sadly this doesn't fix the interrupt storm.

Now, a queue will only be enabled if stmmac_dma_rx_mode() /
stmmac_dma_tx_mode() is called, which only happens for queues that are
going to be used. So, I think QxIS being set for x >= 1 is a red
herring.

Given that the driver does a software reset which clears out all the
registers, any stale configuration for queues e.g. from a boot loader
can't be preserved.

> I don't think I can debug this further and figure out the root cause
> unassisted in a reasonable amount of time, so I'd like to merge
> disabling EEE as a workaround for the time being, unless someone has any
> idea of what I could test next. I'll submit a v2 of this patch with an
> updated commit message.

I'm also not fully conversant with dwmac hardware, especially not the
v5.10 hardware that is in imx8m. All the above is stuff I've pieced
together this morning from reading the driver code and the imx8m
manual. I'm putting in effort here to try and get to the bottom of
your problem without hardware... it would be helpful if others could
do the same rather than throwing their hands up.

The driver is really crappy, and part of the reason it's crappy is
because of this kind of "patch in a workaround because we can't be
bothered to do the research and fix problems properly" attitude.

I'm saying enough is enough. I'm saying no, not going to merge a
workaround for this problem. I want to see stmmac improve. I've
put in considerable effort over the last year or so sorting out
fundamental issues that others just can't be bothered to solve
properly (like the DMA reset failures on resume that have plagued
this driver which no one seems _capable_ of fixing, yet I, with no
experience of stmmac, was able to analyse the issue, read the
available documentation, and fix the problem properly once and for
all.) Either I'm bloody good at what I do and everyone else is
useless, or it's laziness by others. It pisses me off that I seem
to be one of the few who is willing to put the effort in to stuff
in the kernel to see _improvement_. I don't _have_ to work on stmmac,
but me working on stmmac benefits a lot of people.

What I'm saying is, we need more people willing to put effort in
and less bodging.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-10-27  9:12   ` Oleksij Rempel
  2025-10-27 10:02     ` Laurent Pinchart
@ 2025-11-12 12:34     ` Russell King (Oracle)
  2025-11-12 12:41       ` Kieran Bingham
  2025-11-12 21:32       ` Laurent Pinchart
  1 sibling, 2 replies; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-12 12:34 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Laurent Pinchart, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Mon, Oct 27, 2025 at 10:12:12AM +0100, Oleksij Rempel wrote:
> Please note, the RTL8211E PHY uses an undocumented SmartEEE mode by default.

Same as RTL8211F I believe (as used on the Jetson Xavier NX platform I
have.) I submitted commit bfc17c165835 ("net: phy: realtek: disable
PHY-mode EEE") to get EEE working on this platform.

> It ignores RGMII LPI opcodes and does its own thing. This can be confirmed by
> monitoring the RGMII TX and MDI lines with an oscilloscope and changing
> tx-timer configurations. I also confirmed this information from another
> source. To disable SmartEEE and use plain MAC-based mode, NDA documentation
> is needed.

What I saw there was similar to what you describe (although I have no
way to monitor these signals.) No interrupt storms, but while the
stmmac TX path would enter LPI mode (whether that provoked anything
in the PHY, I do not know), the RX path never entered LPI mode because
the PHY never forwarded that status.

So, I don't think having SmartEEE enabled on the RTL8211E would cause
this interrupt storm that Laurent is reporting.

In Emanuele's case, things are different. The TI PHY reports that EEE
is supported, implements the autoneg registers for EEE, but *doesn't*
implement the necessary hardware for detecting/entering/exiting LPI
mode. So, if EEE is negotiated, the remote end thinks it can enter
LPI mode... which likely causes the link to drop as the TI PHY can't
cope with that, and I suspect that's the cause of Emanuele's problem.

I'm wondering why "arm64: dts: imx8mp: add cpuidle state "cpu-pd-wait""
impacts this - could it be that entering the idle state does more than
just affect the CPU domain, and also interferes with the EQOS domain in
some way? Given that the entry/exit to this state is all buried in
PSCI stuff, without digging through the ATF implementation for this
platform and then cross-referencing the iMX8M documentation, I don't
know what effect this has on the system. Is it possible that PSCI is
messing with the EQOS?

What about the clock tree? Is it possible that the stmmac and/or RGMII
clocks could be lost when cpu-pd-wait state is entered on all CPUs?

Has anyone checked whether there's anything in the errata
documentation?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 12:34     ` Russell King (Oracle)
@ 2025-11-12 12:41       ` Kieran Bingham
  2025-11-12 12:56         ` Russell King (Oracle)
  2025-11-12 21:32       ` Laurent Pinchart
  1 sibling, 1 reply; 51+ messages in thread
From: Kieran Bingham @ 2025-11-12 12:41 UTC (permalink / raw)
  To: Oleksij Rempel, Russell King
  Cc: Andrew Lunn, Laurent Pinchart, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

Quoting Russell King (Oracle) (2025-11-12 12:34:48)
> On Mon, Oct 27, 2025 at 10:12:12AM +0100, Oleksij Rempel wrote:
> > Please note, the RTL8211E PHY uses an undocumented SmartEEE mode by default.
> 
> Same as RTL8211F I believe (as used on the Jetson Xavier NX platform I
> have.) I submitted commit bfc17c165835 ("net: phy: realtek: disable
> PHY-mode EEE") to get EEE working on this platform.
> 
> > It ignores RGMII LPI opcodes and does its own thing. This can be confirmed by
> > monitoring the RGMII TX and MDI lines with an oscilloscope and changing
> > tx-timer configurations. I also confirmed this information from another
> > source. To disable SmartEEE and use plain MAC-based mode, NDA documentation
> > is needed.
> 
> What I saw there was similar to what you describe (although I have no
> way to monitor these signals.) No interrupt storms, but while the
> stmmac TX path would enter LPI mode (whether that provoked anything
> in the PHY, I do not know), the RX path never entered LPI mode because
> the PHY never forwarded that status.
> 
> So, I don't think having SmartEEE enabled on the RTL8211E would cause
> this interrupt storm that Laurent is reporting.

Perhaps further complicating matters.

I have a Debix Model A as well ... I'm in a different office to Laurent
- and I believe EEE is enabled on my board/network switch.

I do not get an interrupt storm.

I'm not sure how this helps yet - I don't know what to debug as I can't
reproduce the issue!

I can provide remote access to the board with ssh if that helps anyone
who wants to look at something specific about my setup or run anything
if anyone has ideas of what to check my side.

Perhaps we can find some subtle difference between a working case and a
non-working case...

--
Kieran




> In Emanuele's case, things are different. The TI PHY reports that EEE
> is supported, implements the autoneg registers for EEE, but *doesn't*
> implement the necessary hardware for detecting/entering/exiting LPI
> mode. So, if EEE is negotiated, the remote end thinks it can enter
> LPI mode... which likely causes the link to drop as the TI PHY can't
> cope with that, and I suspect that's the cause of Emanuele's problem.
> 
> I'm wondering why "arm64: dts: imx8mp: add cpuidle state "cpu-pd-wait""
> impacts this - could it be that entering the idle state does more than
> just affect the CPU domain, and also interferes with the EQOS domain in
> some way? Given that the entry/exit to this state is all buried in
> PSCI stuff, without digging through the ATF implementation for this
> platform and then cross-referencing the iMX8M documentation, I don't
> know what effect this has on the system. Is it possible that PSCI is
> messing with the EQOS?
> 
> What about the clock tree? Is it possible that the stmmac and/or RGMII
> clocks could be lost when cpu-pd-wait state is entered on all CPUs?
> 
> Has anyone checked whether there's anything in the errata
> documentation?
> 
> -- 
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 12:41       ` Kieran Bingham
@ 2025-11-12 12:56         ` Russell King (Oracle)
  2025-11-13  1:17           ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-12 12:56 UTC (permalink / raw)
  To: Kieran Bingham
  Cc: Oleksij Rempel, Andrew Lunn, Laurent Pinchart, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Wed, Nov 12, 2025 at 12:41:50PM +0000, Kieran Bingham wrote:
> Perhaps further complicating matters.
> 
> I have a Debix Model A as well ... I'm in a different office to Laurent
> - and I believe EEE is enabled on my board/network switch.
> 
> I do not get an interrupt storm.
> 
> I'm not sure how this helps yet - I don't know what to debug as I can't
> reproduce the issue!
> 
> I can provide remote access to the board with ssh if that helps anyone
> who wants to look at something specific about my setup or run anything
> if anyone has ideas of what to check my side.
> 
> Perhaps we can find some subtle difference between a working case and a
> non-working case...

Thanks, that's interesting. I guess the next steps would be to try and
work out what's different between your two setups.

- same board revision?
- same firmware/ATF?
- same kernel/modules?
- same type of link partner? (I suspect that, given the cpu-pd-wait
   interaction, this isn't the problem.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 12:34     ` Russell King (Oracle)
  2025-11-12 12:41       ` Kieran Bingham
@ 2025-11-12 21:32       ` Laurent Pinchart
  1 sibling, 0 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-12 21:32 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Andrew Lunn, devicetree, imx, linux-arm-kernel,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Wed, Nov 12, 2025 at 12:34:48PM +0000, Russell King (Oracle) wrote:
> On Mon, Oct 27, 2025 at 10:12:12AM +0100, Oleksij Rempel wrote:
> > Please note, the RTL8211E PHY uses an undocumented SmartEEE mode by default.
> 
> Same as RTL8211F I believe (as used on the Jetson Xavier NX platform I
> have.) I submitted commit bfc17c165835 ("net: phy: realtek: disable
> PHY-mode EEE") to get EEE working on this platform.
> 
> > It ignores RGMII LPI opcodes and does its own thing. This can be confirmed by
> > monitoring the RGMII TX and MDI lines with an oscilloscope and changing
> > tx-timer configurations. I also confirmed this information from another
> > source. To disable SmartEEE and use plain MAC-based mode, NDA documentation
> > is needed.
> 
> What I saw there was similar to what you describe (although I have no
> way to monitor these signals.) No interrupt storms, but while the
> stmmac TX path would enter LPI mode (whether that provoked anything
> in the PHY, I do not know), the RX path never entered LPI mode because
> the PHY never forwarded that status.
> 
> So, I don't think having SmartEEE enabled on the RTL8211E would cause
> this interrupt storm that Laurent is reporting.
> 
> In Emanuele's case, things are different. The TI PHY reports that EEE
> is supported, implements the autoneg registers for EEE, but *doesn't*
> implement the necessary hardware for detecting/entering/exiting LPI
> mode. So, if EEE is negotiated, the remote end thinks it can enter
> LPI mode... which likely causes the link to drop as the TI PHY can't
> cope with that, and I suspect that's the cause of Emanuele's problem.
> 
> I'm wondering why "arm64: dts: imx8mp: add cpuidle state "cpu-pd-wait""
> impacts this - could it be that entering the idle state does more than
> just affect the CPU domain, and also interferes with the EQOS domain in
> some way? Given that the entry/exit to this state is all buried in
> PSCI stuff, without digging through the ATF implementation for this
> platform and then cross-referencing the iMX8M documentation, I don't
> know what effect this has on the system. Is it possible that PSCI is
> messing with the EQOS?

I'm running the mainline Trusted Firmware-A v2.13. I'm not familiar with
the code base, but tracing the cpu_standby operation, I haven't seen any
code interacting directly with the EQOS.

> What about the clock tree? Is it possible that the stmmac and/or RGMII
> clocks could be lost when cpu-pd-wait state is entered on all CPUs?

That's something I am suspecting too, but reading the code I don't see
where it would occur. I've also tried to see if we could be missing
power domain handling for the EQOS, but I don't see a mention of a
related power domain in the reference manual or the BSP kernel.

Interestingly, running `stress -c 5` helps, so the issue seems related
to CPUs getting suspended. However, I appear to have previously spoken
too fast. While reverting the cpuidle state commit helps with the
interrupt storm, it doesn't fully get rid of it. I still get several
hundreds of thousands of EQOS interrupts during boot. The situation then
appears to calm down after boot completes. Adding the `eee-broken-1000t`
property, on the other hand, gets rid of the problem completely and
interrupt counts return to normal. It may therefore be that the
problem was present before cpuidle states were introduced, but with a
low-enough impact at runtime that it went unnoticed.
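
For anyone wanting to compare interrupt counts between configurations, a
small helper can total the per-CPU columns of a /proc/interrupts line.
This is only a sketch: the function name is made up, and it assumes the
usual layout of an "NN:" prefix followed by one decimal counter per CPU
and then the interrupt controller name.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Sum the per-CPU counters on one line of /proc/interrupts.
 * Parsing stops at the first token that is not a plain decimal number,
 * which is assumed to be the interrupt controller name.
 * Returns 0 when the line has no "NN:" prefix.
 */
unsigned long long toy_sum_irq_line(const char *line)
{
	const char *p = strchr(line, ':');
	unsigned long long total = 0;

	if (!p)
		return 0;
	p++;

	for (;;) {
		char *end;
		unsigned long long n;

		/* Skip the padding between columns. */
		while (*p == ' ' || *p == '\t')
			p++;
		if (!isdigit((unsigned char)*p))
			break;

		n = strtoull(p, &end, 10);
		/* Reject tokens that merely start with digits. */
		if (*end != '\0' && !isspace((unsigned char)*end))
			break;

		total += n;
		p = end;
	}

	return total;
}
```

Sampling the EQOS line twice, a few seconds apart, and subtracting the
two totals gives an interrupt rate that can be compared with and without
`eee-broken-1000t`.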

> Has anyone checked whether there's anything in the errata
> documentation?

Yes I have. The document is available at
https://www.nxp.com/webapp/Download?colCode=IMX8MP_1P33A (it annoyingly
requires an NXP account, but is otherwise publicly accessible). There
are three items related to the EQOS:

- ENET_QOS: Failure to generate Fatal Bus Error interrupt when
  descriptor posted write is enabled

- ENET_QOS: MAC incorrectly discards the received packets when Preamble
  Byte does not precede SFD or SMD

- ENET_QOS: Scheduled transmit packet not sent in the allotted slot or
  the remaining fragment of a Preempted Packet incorrectly dropped due
  to scheduling timeout in the EST GCL

Those do not seem related. I haven't seen any other errata entries that
seem related.

Here's a lockup report I've received from the kernel while testing:

[  156.563792] CPU#0 Utilization every 4000ms during lockup:
[  156.563799]  #1:   0% system,         26% softirq,    75% hardirq,     0% idle
[  156.563808]  #2:   0% system,         26% softirq,    75% hardirq,     0% idle
[  156.563818]  #3:   0% system,         26% softirq,    75% hardirq,     0% idle
[  156.563827]  #4:   0% system,         26% softirq,    75% hardirq,     0% idle
[  156.563836]  #5:   0% system,         25% softirq,    76% hardirq,     0% idle
[  156.566161] CPU#0 Detect HardIRQ Time exceeds 50%. Most frequent HardIRQs:
[  156.566167]  #1: 2030282     irq#200
[  156.566173]  #2: 5           irq#11
[  156.566181] Modules linked in: gpio_adp5585 pwm_adp5585 hantro_vpu v4l2_vp9 rockchip_isp1 v4l2_jpeg dw100 v4l2_h264 v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig videobuf2_memopsr
[  156.566335] irq event stamp: 6180519
[  156.659469] hardirqs last  enabled at (6180518): [<ffff800081285d88>] exit_to_kernel_mode+0x10/0x20
[  156.668532] hardirqs last disabled at (6180519): [<ffff800081285e68>] enter_from_kernel_mode+0x10/0x40
[  156.677848] softirqs last  enabled at (633238): [<ffff8000800cafe4>] handle_softirqs+0x4ac/0x4d0
[  156.686644] softirqs last disabled at (633245): [<ffff800080010394>] __do_softirq+0x1c/0x28
[  156.695008] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.18.0-rc3-dirty #915 PREEMPT
[  156.695020] Hardware name: Polyhex Debix Model A i.MX8MPlus board (DT)
[  156.695028] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  156.695039] pc : handle_softirqs+0xfc/0x4d0
[  156.695050] lr : handle_softirqs+0xf8/0x4d0
[  156.695061] sp : ffff800082aebf30
[  156.695066] x29: ffff800082aebf30 x28: ffff800081bea000 x27: ffff800081cd60c0
[  156.695090] x26: ffff800081ce0e00 x25: ffff800081ceaa80 x24: 0000000000000000
[  156.695110] x23: 0000000060000005 x22: 0000000000000008 x21: ffff8000800101a8
[  156.695128] x20: ffff800082aebf30 x19: 0000000000000000 x18: 0000000000000000
[  156.695148] x17: ffff7ffffded5000 x16: ffff800082ae8000 x15: 0000000000000000
[  156.695168] x14: 0000000000000000 x13: 0000000000000000 x12: ffff0000019714f8
[  156.695187] x11: 0000000000000039 x10: 0000000000000039 x9 : ffff80008128a08c
[  156.695205] x8 : ffff800082aebe58 x7 : 0000000000000000 x6 : ffff800082aebf00
[  156.695224] x5 : ffff800082aebe88 x4 : 0000000000000000 x3 : 0000000000000001
[  156.695245] x2 : ffff7ffffded5000 x1 : 00000000000c48c4 x0 : ffff800081bea510
[  156.695267] Call trace:
[  156.695274]  handle_softirqs+0xfc/0x4d0 (P)
[  156.695289]  __do_softirq+0x1c/0x28
[  156.695301]  ____do_softirq+0x18/0x30
[  156.695315]  call_on_irq_stack+0x30/0x70
[  156.695330]  do_softirq_own_stack+0x24/0x38
[  156.695344]  __irq_exit_rcu+0x174/0x1c0
[  156.695356]  irq_exit_rcu+0x18/0x48
[  156.695370]  el1_interrupt+0x40/0x60
[  156.695385]  el1h_64_irq_handler+0x18/0x28
[  156.695405]  el1h_64_irq+0x6c/0x70
[  156.695418]  default_idle_call+0xbc/0x298 (P)
[  156.695430]  do_idle+0x21c/0x288
[  156.695445]  cpu_startup_entry+0x40/0x50
[  156.695459]  rest_init+0x100/0x190
[  156.695472]  start_kernel+0x7e0/0x938
[  156.695483]  __primary_switched+0x88/0x98

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 12:03           ` Russell King (Oracle)
@ 2025-11-12 22:25             ` Laurent Pinchart
  2025-11-13  1:06               ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-12 22:25 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > I didn't notice it at the time because my board was connected to a
> > > switch that didn't support EEE.
> > 
> > I can confirm that reverting that commit makes the issue disappear. So
> > we're dealing with an interrupt storm that occurs when all three of the
> > following conditions are true:
> > 
> > - cpu-pd-wait is enabled
> > - EEE is enabled
> > - the peer also supports EEE
> 
> Thanks - overall, please take the statistics and interrupt status bits
> with a pinch of salt - I suspect there are cases where the interrupt
> is not actually enabled, and the code doesn't take action to clear
> down a set status bit, but _does_ count it - so every interrupt that
> happens increments the counter.

True. To (partly) avoid that, I've dropped the line that discards
disabled bits in dwmac4_irq_status():

 	/* Discard disabled bits */
-	intr_status &= intr_enable;

to ensure that all bits are processed and cleared. I then didn't see any
high count of any of the GMAC_INT_STATUS interrupts. For
MTL_INTERRUPT_STATUS it's a bit different, as by default only one queue
is processed.
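
To make the counting pitfall concrete, here is a toy model in plain C
(not the stmmac code; the structure and function names are invented for
the illustration). It counts every status bit that is set on each pass,
and clears either only the enabled bits or all of them, depending on
discard_disabled:

```c
#include <stdint.h>

#define TOY_NUM_BITS 32

/* Hypothetical device state for the illustration. */
struct toy_dev {
	uint32_t status;			/* latched status bits */
	uint32_t enable;			/* interrupt enable mask */
	unsigned long count[TOY_NUM_BITS];	/* per-bit debug counters */
};

/*
 * One pass of a simplified interrupt handler.
 * When discard_disabled is non-zero, only enabled bits are handled and
 * cleared; otherwise every set bit is handled and cleared.
 */
void toy_handle_irq(struct toy_dev *dev, int discard_disabled)
{
	uint32_t status = dev->status;
	uint32_t handled = discard_disabled ? (status & dev->enable) : status;

	/* Count every bit that is set, whether or not it is enabled. */
	for (unsigned int bit = 0; bit < TOY_NUM_BITS; bit++)
		if (status & (UINT32_C(1) << bit))
			dev->count[bit]++;

	/* Only the handled bits get cleared. */
	dev->status &= ~handled;
}
```

With bit 2 latched but not enabled and bit 0 firing 1000 times, count[2]
also reaches 1000 when discard_disabled is set, even though bit 2 never
raised an interrupt. With discard_disabled cleared, bit 2 is cleared on
the first pass and is counted only once.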

> > Furthermore, I tried counting bits from all the interrupt status
> > registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
> > bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.
> 
> TBU means that the transmitter found that the next buffer was owned by
> the "application" rather than the hardware, which would be normal after
> getting to the end of the queued packets.
> 
> ETI means that a packet has been transferred into MTL memory, and thus
> would occur for every transmitted packet.
> 
> Having dug into the imx8m documentation and the driver this morning,
> I don't think TBU and ETI are the source of the interrupt storm. Their
> corresponding interrupt enable bits are DMA_CHAN_INTR_ENA_TBUE and
> DMA_CHAN_INTR_ENA_ETE (driver names). Both of these only appear in a
> header file - the code never enables these interrupts. So, TBU and ETI
> should not be causing an interrupt storm.
> 
> As for QxIS, stmmac_common_interrupt() will iterate over the queues
> in use, calling stmmac_host_mtl_irq_status() aka dwmac4_irq_mtl_status()
> for each. Only if this happens will MTL_CHAN_INT_CTRL() be read which
> clears the status bit. In other words, if e.g. Q1IS is set, but only
> one queue is being used, dwmac4_irq_mtl_status() won't be called for
> queue 1, and thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.

That's why I tried to enable all 5 queues in DT, but alas, it didn't
help. I'll try again and count all possible interrupts.

> > The debix board's DT doesn't specify a multi-queue setup, so only
> > channel 0 gets processed in stmmac_dma_interrupt(). I thought that could
> > explain why Q1IS to Q4IS stay set (but not why Q0IS also has a high
> > count, or why Q1IS to Q4IS are set in the first place), and enabled
> > multi-queue support in DT by copying the imx8mp-evk configuration. I
> > then see lots of non-zero DMA_CH1_STATUS, DMA_CH2_STATUS and
> > DMA_CH4_STATUS values (but DMA_CH3_STATUS stays 0 all the time), but
> > sadly this doesn't fix the interrupt storm.
> 
> Now, a queue will only be enabled if stmmac_dma_rx_mode() /
> stmmac_dma_tx_mode() is called, which only happens for queues that are
> going to be used. So, I think QxIS being set for x >= 1 is a red
> herring.
> 
> Given that the driver does a software reset which clears out all the
> registers, any stale configuration for queues e.g. from a boot loader
> can't be preserved.

I agree that it seems really weird. And why this would be related to
cpuidle and EEE is also a mystery.

> > I don't think I can debug this further and figure out the root cause
> > unassisted in a reasonable amount of time, so I'd like to merge
> > disabling EEE as a workaround for the time being, unless someone has any
> > idea of what I could test next. I'll submit a v2 of this patch with an
> > updated commit message.
> 
> I'm also not fully conversant with dwmac hardware, especially not the
> v5.10 hardware that is in imx8m. All the above is stuff I've pieced
> together this morning from reading the driver code and the imx8m
> manual. I'm putting in effort here to try and get to the bottom of
> your problem without hardware... it would be helpful if others could
> do the same rather than throwing their hands up.

More help would certainly be welcome. And I really appreciate your
support Russell.

> The driver is really crappy, and part of the reason it's crappy is
> because of this kind of "patch in a workaround because we can't be
> bothered to do the research and fix problems properly" attitude.
> 
> I'm saying enough is enough. I'm saying no, not going to merge a
> workaround for this problem. I want to see stmmac improve. I've
> put in considerable effort over the last year or so sorting out
> fundamental issues that others just can't be bothered to solve
> properly (like the DMA reset failures on resume that have plagued
> this driver which no one seems _capable_ of fixing, yet I, with no
> experience of stmmac, was able to analyse the issue, read the
> available documentation, and fix the problem properly once and for
> all.) Either I'm bloody good at what I do and everyone else is
> useless, or it's laziness by others. It pisses me off that I seem
> to be one of the few who is willing to put the effort in to stuff
> in the kernel to see _improvement_. I don't _have_ to work on stmmac,
> but me working on stmmac benefits a lot of people.
> 
> What I'm saying is, we need more people willing to put effort in
> and less bodging.

While I would like to merge a workaround and move on, I also understand
your position, having had the exact same stance in other kernel areas
and pushing for problems to be fixed correctly instead of worked around.
The only argument I have to defend the workaround approach is that I'm
putting in a lot of hours trying to do the right things in other
subsystems, and I can hardly scale that to networking. It's not a great
argument though.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 22:25             ` Laurent Pinchart
@ 2025-11-13  1:06               ` Laurent Pinchart
  2025-11-13 10:59                 ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-13  1:06 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

On Thu, Nov 13, 2025 at 12:25:52AM +0200, Laurent Pinchart wrote:
> On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > > I didn't notice it at the time because my board was connected to a
> > > > switch that didn't support EEE.
> > > 
> > > I can confirm that reverting that commit makes the issue disappear. So
> > > we're dealing with an interrupt storm that occurs when all three of the
> > > following conditions are true:
> > > 
> > > - cpu-pd-wait is enabled
> > > - EEE is enabled
> > > - the peer also supports EEE
> > 
> > Thanks - overall, please take the statistics and interrupt status bits
> > with a pinch of salt - I suspect there are cases where the interrupt
> > is not actually enabled, and the code doesn't take action to clear
> > down a set status bit, but _does_ count it - so every interrupt that
> > happens increments the counter.
> 
> True. To (partly) avoid that, I've dropped the line that discards
> disabled bits in dwmac4_irq_status():
> 
>  	/* Discard disabled bits */
> -	intr_status &= intr_enable;
> 
> to ensure that all bits are processed and cleared. I then didn't see any
> high count of any of the GMAC_INT_STATUS interrupts. For
> MTL_INTERRUPT_STATUS it's a bit different, as by default only one queue
> is processed.
> 
> > > Furthermore, I tried counting bits from all the interrupt status
> > > registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
> > > bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.
> > 
> > TBU means that the transmitter found that the next buffer was owned by
> > the "application" rather than the hardware, which would be normal after
> > getting to the end of the queued packets.
> > 
> > ETI means that a packet has been transferred into MTL memory, and thus
> > would occur for every transmitted packet.
> > 
> > Having dug into the imx8m documentation and the driver this morning,
> > I don't think TBU and ETI are the source of the interrupt storm. Their
> > corresponding interrupt enable bits are DMA_CHAN_INTR_ENA_TBUE and
> > DMA_CHAN_INTR_ENA_ETE (driver names). Both of these only appear in a
> > header file - the code never enables these interrupts. So, TBU and ETI
> > should not be causing an interrupt storm.
> > 
> > As for QxIS, stmmac_common_interrupt() will iterate over the queues
> > in use, calling stmmac_host_mtl_irq_status() aka dwmac4_irq_mtl_status()
> > for each. Only if this happens will MTL_CHAN_INT_CTRL() be read which
> > clears the status bit. In other words, if e.g. Q1IS is set, but only
> > > one queue is being used, dwmac4_irq_mtl_status() won't be called for
> > queue 1, and thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.
> 
> That's why I tried to enable all 5 queues in DT, but alas, it didn't
> help. I'll try again and count all possible interrupts.

Here's my debug patch (not very pretty, sorry about that):

diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 8f34c9ad457f..52810c45c635 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -139,6 +139,253 @@ struct stmmac_extra_stats {
 	unsigned long rx_vlan;
 	unsigned long rx_split_hdr_pkt_n;
 	/* Tx/Rx IRQ error info */
+	unsigned long irq_down_n;
+	unsigned long irq_fpe_n;
+	unsigned long irq_sfty_n;
+
+	unsigned long irq_gmac_0_n;
+	unsigned long irq_gmac_1_n;
+	unsigned long irq_gmac_2_n;
+	unsigned long irq_gmac_3_n;
+	unsigned long irq_gmac_4_n;
+	unsigned long irq_gmac_5_n;
+	unsigned long irq_gmac_6_n;
+	unsigned long irq_gmac_7_n;
+	unsigned long irq_gmac_8_n;
+	unsigned long irq_gmac_9_n;
+	unsigned long irq_gmac_10_n;
+	unsigned long irq_gmac_11_n;
+	unsigned long irq_gmac_12_n;
+	unsigned long irq_gmac_13_n;
+	unsigned long irq_gmac_14_n;
+	unsigned long irq_gmac_15_n;
+	unsigned long irq_gmac_16_n;
+	unsigned long irq_gmac_17_n;
+	unsigned long irq_gmac_18_n;
+	unsigned long irq_gmac_19_n;
+	unsigned long irq_gmac_20_n;
+	unsigned long irq_gmac_21_n;
+	unsigned long irq_gmac_22_n;
+	unsigned long irq_gmac_23_n;
+	unsigned long irq_gmac_24_n;
+	unsigned long irq_gmac_25_n;
+	unsigned long irq_gmac_26_n;
+	unsigned long irq_gmac_27_n;
+	unsigned long irq_gmac_28_n;
+	unsigned long irq_gmac_29_n;
+	unsigned long irq_gmac_30_n;
+	unsigned long irq_gmac_31_n;
+
+	unsigned long irq_mtl0_n;
+	unsigned long irq_mtl1_n;
+	unsigned long irq_mtl2_n;
+	unsigned long irq_mtl3_n;
+	unsigned long irq_mtl4_n;
+
+	unsigned long irq_mtl_0_n;
+	unsigned long irq_mtl_1_n;
+	unsigned long irq_mtl_2_n;
+	unsigned long irq_mtl_3_n;
+	unsigned long irq_mtl_4_n;
+	unsigned long irq_mtl_5_n;
+	unsigned long irq_mtl_6_n;
+	unsigned long irq_mtl_7_n;
+	unsigned long irq_mtl_8_n;
+	unsigned long irq_mtl_9_n;
+	unsigned long irq_mtl_10_n;
+	unsigned long irq_mtl_11_n;
+	unsigned long irq_mtl_12_n;
+	unsigned long irq_mtl_13_n;
+	unsigned long irq_mtl_14_n;
+	unsigned long irq_mtl_15_n;
+	unsigned long irq_mtl_16_n;
+	unsigned long irq_mtl_17_n;
+	unsigned long irq_mtl_18_n;
+	unsigned long irq_mtl_19_n;
+	unsigned long irq_mtl_20_n;
+	unsigned long irq_mtl_21_n;
+	unsigned long irq_mtl_22_n;
+	unsigned long irq_mtl_23_n;
+	unsigned long irq_mtl_24_n;
+	unsigned long irq_mtl_25_n;
+	unsigned long irq_mtl_26_n;
+	unsigned long irq_mtl_27_n;
+	unsigned long irq_mtl_28_n;
+	unsigned long irq_mtl_29_n;
+	unsigned long irq_mtl_30_n;
+	unsigned long irq_mtl_31_n;
+
+	unsigned long irq_chan0_n;
+	unsigned long irq_chan1_n;
+	unsigned long irq_chan2_n;
+	unsigned long irq_chan3_n;
+	unsigned long irq_chan4_n;
+
+	unsigned long irq_chan0_0_n;
+	unsigned long irq_chan0_1_n;
+	unsigned long irq_chan0_2_n;
+	unsigned long irq_chan0_3_n;
+	unsigned long irq_chan0_4_n;
+	unsigned long irq_chan0_5_n;
+	unsigned long irq_chan0_6_n;
+	unsigned long irq_chan0_7_n;
+	unsigned long irq_chan0_8_n;
+	unsigned long irq_chan0_9_n;
+	unsigned long irq_chan0_10_n;
+	unsigned long irq_chan0_11_n;
+	unsigned long irq_chan0_12_n;
+	unsigned long irq_chan0_13_n;
+	unsigned long irq_chan0_14_n;
+	unsigned long irq_chan0_15_n;
+	unsigned long irq_chan0_16_n;
+	unsigned long irq_chan0_17_n;
+	unsigned long irq_chan0_18_n;
+	unsigned long irq_chan0_19_n;
+	unsigned long irq_chan0_20_n;
+	unsigned long irq_chan0_21_n;
+	unsigned long irq_chan0_22_n;
+	unsigned long irq_chan0_23_n;
+	unsigned long irq_chan0_24_n;
+	unsigned long irq_chan0_25_n;
+	unsigned long irq_chan0_26_n;
+	unsigned long irq_chan0_27_n;
+	unsigned long irq_chan0_28_n;
+	unsigned long irq_chan0_29_n;
+	unsigned long irq_chan0_30_n;
+	unsigned long irq_chan0_31_n;
+
+	unsigned long irq_chan1_0_n;
+	unsigned long irq_chan1_1_n;
+	unsigned long irq_chan1_2_n;
+	unsigned long irq_chan1_3_n;
+	unsigned long irq_chan1_4_n;
+	unsigned long irq_chan1_5_n;
+	unsigned long irq_chan1_6_n;
+	unsigned long irq_chan1_7_n;
+	unsigned long irq_chan1_8_n;
+	unsigned long irq_chan1_9_n;
+	unsigned long irq_chan1_10_n;
+	unsigned long irq_chan1_11_n;
+	unsigned long irq_chan1_12_n;
+	unsigned long irq_chan1_13_n;
+	unsigned long irq_chan1_14_n;
+	unsigned long irq_chan1_15_n;
+	unsigned long irq_chan1_16_n;
+	unsigned long irq_chan1_17_n;
+	unsigned long irq_chan1_18_n;
+	unsigned long irq_chan1_19_n;
+	unsigned long irq_chan1_20_n;
+	unsigned long irq_chan1_21_n;
+	unsigned long irq_chan1_22_n;
+	unsigned long irq_chan1_23_n;
+	unsigned long irq_chan1_24_n;
+	unsigned long irq_chan1_25_n;
+	unsigned long irq_chan1_26_n;
+	unsigned long irq_chan1_27_n;
+	unsigned long irq_chan1_28_n;
+	unsigned long irq_chan1_29_n;
+	unsigned long irq_chan1_30_n;
+	unsigned long irq_chan1_31_n;
+
+	unsigned long irq_chan2_0_n;
+	unsigned long irq_chan2_1_n;
+	unsigned long irq_chan2_2_n;
+	unsigned long irq_chan2_3_n;
+	unsigned long irq_chan2_4_n;
+	unsigned long irq_chan2_5_n;
+	unsigned long irq_chan2_6_n;
+	unsigned long irq_chan2_7_n;
+	unsigned long irq_chan2_8_n;
+	unsigned long irq_chan2_9_n;
+	unsigned long irq_chan2_10_n;
+	unsigned long irq_chan2_11_n;
+	unsigned long irq_chan2_12_n;
+	unsigned long irq_chan2_13_n;
+	unsigned long irq_chan2_14_n;
+	unsigned long irq_chan2_15_n;
+	unsigned long irq_chan2_16_n;
+	unsigned long irq_chan2_17_n;
+	unsigned long irq_chan2_18_n;
+	unsigned long irq_chan2_19_n;
+	unsigned long irq_chan2_20_n;
+	unsigned long irq_chan2_21_n;
+	unsigned long irq_chan2_22_n;
+	unsigned long irq_chan2_23_n;
+	unsigned long irq_chan2_24_n;
+	unsigned long irq_chan2_25_n;
+	unsigned long irq_chan2_26_n;
+	unsigned long irq_chan2_27_n;
+	unsigned long irq_chan2_28_n;
+	unsigned long irq_chan2_29_n;
+	unsigned long irq_chan2_30_n;
+	unsigned long irq_chan2_31_n;
+
+	unsigned long irq_chan3_0_n;
+	unsigned long irq_chan3_1_n;
+	unsigned long irq_chan3_2_n;
+	unsigned long irq_chan3_3_n;
+	unsigned long irq_chan3_4_n;
+	unsigned long irq_chan3_5_n;
+	unsigned long irq_chan3_6_n;
+	unsigned long irq_chan3_7_n;
+	unsigned long irq_chan3_8_n;
+	unsigned long irq_chan3_9_n;
+	unsigned long irq_chan3_10_n;
+	unsigned long irq_chan3_11_n;
+	unsigned long irq_chan3_12_n;
+	unsigned long irq_chan3_13_n;
+	unsigned long irq_chan3_14_n;
+	unsigned long irq_chan3_15_n;
+	unsigned long irq_chan3_16_n;
+	unsigned long irq_chan3_17_n;
+	unsigned long irq_chan3_18_n;
+	unsigned long irq_chan3_19_n;
+	unsigned long irq_chan3_20_n;
+	unsigned long irq_chan3_21_n;
+	unsigned long irq_chan3_22_n;
+	unsigned long irq_chan3_23_n;
+	unsigned long irq_chan3_24_n;
+	unsigned long irq_chan3_25_n;
+	unsigned long irq_chan3_26_n;
+	unsigned long irq_chan3_27_n;
+	unsigned long irq_chan3_28_n;
+	unsigned long irq_chan3_29_n;
+	unsigned long irq_chan3_30_n;
+	unsigned long irq_chan3_31_n;
+
+	unsigned long irq_chan4_0_n;
+	unsigned long irq_chan4_1_n;
+	unsigned long irq_chan4_2_n;
+	unsigned long irq_chan4_3_n;
+	unsigned long irq_chan4_4_n;
+	unsigned long irq_chan4_5_n;
+	unsigned long irq_chan4_6_n;
+	unsigned long irq_chan4_7_n;
+	unsigned long irq_chan4_8_n;
+	unsigned long irq_chan4_9_n;
+	unsigned long irq_chan4_10_n;
+	unsigned long irq_chan4_11_n;
+	unsigned long irq_chan4_12_n;
+	unsigned long irq_chan4_13_n;
+	unsigned long irq_chan4_14_n;
+	unsigned long irq_chan4_15_n;
+	unsigned long irq_chan4_16_n;
+	unsigned long irq_chan4_17_n;
+	unsigned long irq_chan4_18_n;
+	unsigned long irq_chan4_19_n;
+	unsigned long irq_chan4_20_n;
+	unsigned long irq_chan4_21_n;
+	unsigned long irq_chan4_22_n;
+	unsigned long irq_chan4_23_n;
+	unsigned long irq_chan4_24_n;
+	unsigned long irq_chan4_25_n;
+	unsigned long irq_chan4_26_n;
+	unsigned long irq_chan4_27_n;
+	unsigned long irq_chan4_28_n;
+	unsigned long irq_chan4_29_n;
+	unsigned long irq_chan4_30_n;
+	unsigned long irq_chan4_31_n;
+
 	unsigned long tx_undeflow_irq;
 	unsigned long tx_process_stopped_irq;
 	unsigned long tx_jabber_irq;
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
index d85bc0bb5c3c..b1a6416ce9e1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c
@@ -630,8 +630,16 @@ static int dwmac4_irq_mtl_status(struct stmmac_priv *priv,
 	u32 mtl_int_qx_status;
 	int ret = 0;
 
+	if (!WARN_ON(chan >= 5))
+		(&priv->xstats.irq_mtl0_n)[chan]++;
+
 	mtl_int_qx_status = readl(ioaddr + MTL_INT_STATUS);
 
+	for (unsigned int i = 0; i < 32; ++i) {
+		if (mtl_int_qx_status & BIT(i))
+			(&priv->xstats.irq_mtl_0_n)[i]++;
+	}
+
 	/* Check MTL Interrupt */
 	if (mtl_int_qx_status & MTL_INT_QX(chan)) {
 		/* read Queue x Interrupt status */
@@ -654,11 +662,16 @@ static int dwmac4_irq_status(struct mac_device_info *hw,
 {
 	void __iomem *ioaddr = hw->pcsr;
 	u32 intr_status = readl(ioaddr + GMAC_INT_STATUS);
-	u32 intr_enable = readl(ioaddr + GMAC_INT_EN);
+//	u32 intr_enable = readl(ioaddr + GMAC_INT_EN);
 	int ret = 0;
 
+	for (unsigned int i = 0; i < 32; ++i) {
+		if (intr_status & BIT(i))
+			(&x->irq_gmac_0_n)[i]++;
+	}
+
 	/* Discard disabled bits */
-	intr_status &= intr_enable;
+//	intr_status &= intr_enable;
 
 	/* Not used events (e.g. MMC interrupts) are not handled. */
 	if ((intr_status & mmc_tx_irq))
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
index 57c03d491774..106a59afc96c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_lib.c
@@ -179,6 +179,15 @@ int dwmac4_dma_interrupt(struct stmmac_priv *priv, void __iomem *ioaddr,
 	else if (dir == DMA_DIR_TX)
 		intr_status &= DMA_CHAN_STATUS_MSK_TX;
 
+	if (!WARN_ON(chan >= 5)) {
+		(&x->irq_chan0_n)[chan]++;
+
+		for (unsigned int i = 0; i < 32; ++i) {
+			if (intr_status & BIT(i))
+				(&priv->xstats.irq_chan0_0_n)[32*chan + i]++;
+		}
+	}
+
 	/* ABNORMAL interrupts */
 	if (unlikely(intr_status & DMA_CHAN_STATUS_AIS)) {
 		if (unlikely(intr_status & DMA_CHAN_STATUS_RBU))
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
index 39fa1ec92f82..492d65314e51 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_ethtool.c
@@ -78,6 +78,252 @@ static const struct stmmac_stats stmmac_gstrings_stats[] = {
 	STMMAC_STAT(rx_vlan),
 	STMMAC_STAT(rx_split_hdr_pkt_n),
 	/* Tx/Rx IRQ error info */
+	STMMAC_STAT(irq_down_n),
+	STMMAC_STAT(irq_fpe_n),
+	STMMAC_STAT(irq_sfty_n),
+
+	STMMAC_STAT(irq_gmac_0_n),
+	STMMAC_STAT(irq_gmac_1_n),
+	STMMAC_STAT(irq_gmac_2_n),
+	STMMAC_STAT(irq_gmac_3_n),
+	STMMAC_STAT(irq_gmac_4_n),
+	STMMAC_STAT(irq_gmac_5_n),
+	STMMAC_STAT(irq_gmac_6_n),
+	STMMAC_STAT(irq_gmac_7_n),
+	STMMAC_STAT(irq_gmac_8_n),
+	STMMAC_STAT(irq_gmac_9_n),
+	STMMAC_STAT(irq_gmac_10_n),
+	STMMAC_STAT(irq_gmac_11_n),
+	STMMAC_STAT(irq_gmac_12_n),
+	STMMAC_STAT(irq_gmac_13_n),
+	STMMAC_STAT(irq_gmac_14_n),
+	STMMAC_STAT(irq_gmac_15_n),
+	STMMAC_STAT(irq_gmac_16_n),
+	STMMAC_STAT(irq_gmac_17_n),
+	STMMAC_STAT(irq_gmac_18_n),
+	STMMAC_STAT(irq_gmac_19_n),
+	STMMAC_STAT(irq_gmac_20_n),
+	STMMAC_STAT(irq_gmac_21_n),
+	STMMAC_STAT(irq_gmac_22_n),
+	STMMAC_STAT(irq_gmac_23_n),
+	STMMAC_STAT(irq_gmac_24_n),
+	STMMAC_STAT(irq_gmac_25_n),
+	STMMAC_STAT(irq_gmac_26_n),
+	STMMAC_STAT(irq_gmac_27_n),
+	STMMAC_STAT(irq_gmac_28_n),
+	STMMAC_STAT(irq_gmac_29_n),
+	STMMAC_STAT(irq_gmac_30_n),
+	STMMAC_STAT(irq_gmac_31_n),
+
+	STMMAC_STAT(irq_mtl0_n),
+	STMMAC_STAT(irq_mtl1_n),
+	STMMAC_STAT(irq_mtl2_n),
+	STMMAC_STAT(irq_mtl3_n),
+	STMMAC_STAT(irq_mtl4_n),
+	STMMAC_STAT(irq_mtl_0_n),
+	STMMAC_STAT(irq_mtl_1_n),
+	STMMAC_STAT(irq_mtl_2_n),
+	STMMAC_STAT(irq_mtl_3_n),
+	STMMAC_STAT(irq_mtl_4_n),
+	STMMAC_STAT(irq_mtl_5_n),
+	STMMAC_STAT(irq_mtl_6_n),
+	STMMAC_STAT(irq_mtl_7_n),
+	STMMAC_STAT(irq_mtl_8_n),
+	STMMAC_STAT(irq_mtl_9_n),
+	STMMAC_STAT(irq_mtl_10_n),
+	STMMAC_STAT(irq_mtl_11_n),
+	STMMAC_STAT(irq_mtl_12_n),
+	STMMAC_STAT(irq_mtl_13_n),
+	STMMAC_STAT(irq_mtl_14_n),
+	STMMAC_STAT(irq_mtl_15_n),
+	STMMAC_STAT(irq_mtl_16_n),
+	STMMAC_STAT(irq_mtl_17_n),
+	STMMAC_STAT(irq_mtl_18_n),
+	STMMAC_STAT(irq_mtl_19_n),
+	STMMAC_STAT(irq_mtl_20_n),
+	STMMAC_STAT(irq_mtl_21_n),
+	STMMAC_STAT(irq_mtl_22_n),
+	STMMAC_STAT(irq_mtl_23_n),
+	STMMAC_STAT(irq_mtl_24_n),
+	STMMAC_STAT(irq_mtl_25_n),
+	STMMAC_STAT(irq_mtl_26_n),
+	STMMAC_STAT(irq_mtl_27_n),
+	STMMAC_STAT(irq_mtl_28_n),
+	STMMAC_STAT(irq_mtl_29_n),
+	STMMAC_STAT(irq_mtl_30_n),
+	STMMAC_STAT(irq_mtl_31_n),
+
+	STMMAC_STAT(irq_chan0_n),
+	STMMAC_STAT(irq_chan1_n),
+	STMMAC_STAT(irq_chan2_n),
+	STMMAC_STAT(irq_chan3_n),
+	STMMAC_STAT(irq_chan4_n),
+
+	STMMAC_STAT(irq_chan0_0_n),
+	STMMAC_STAT(irq_chan0_1_n),
+	STMMAC_STAT(irq_chan0_2_n),
+	STMMAC_STAT(irq_chan0_3_n),
+	STMMAC_STAT(irq_chan0_4_n),
+	STMMAC_STAT(irq_chan0_5_n),
+	STMMAC_STAT(irq_chan0_6_n),
+	STMMAC_STAT(irq_chan0_7_n),
+	STMMAC_STAT(irq_chan0_8_n),
+	STMMAC_STAT(irq_chan0_9_n),
+	STMMAC_STAT(irq_chan0_10_n),
+	STMMAC_STAT(irq_chan0_11_n),
+	STMMAC_STAT(irq_chan0_12_n),
+	STMMAC_STAT(irq_chan0_13_n),
+	STMMAC_STAT(irq_chan0_14_n),
+	STMMAC_STAT(irq_chan0_15_n),
+	STMMAC_STAT(irq_chan0_16_n),
+	STMMAC_STAT(irq_chan0_17_n),
+	STMMAC_STAT(irq_chan0_18_n),
+	STMMAC_STAT(irq_chan0_19_n),
+	STMMAC_STAT(irq_chan0_20_n),
+	STMMAC_STAT(irq_chan0_21_n),
+	STMMAC_STAT(irq_chan0_22_n),
+	STMMAC_STAT(irq_chan0_23_n),
+	STMMAC_STAT(irq_chan0_24_n),
+	STMMAC_STAT(irq_chan0_25_n),
+	STMMAC_STAT(irq_chan0_26_n),
+	STMMAC_STAT(irq_chan0_27_n),
+	STMMAC_STAT(irq_chan0_28_n),
+	STMMAC_STAT(irq_chan0_29_n),
+	STMMAC_STAT(irq_chan0_30_n),
+	STMMAC_STAT(irq_chan0_31_n),
+
+	STMMAC_STAT(irq_chan1_0_n),
+	STMMAC_STAT(irq_chan1_1_n),
+	STMMAC_STAT(irq_chan1_2_n),
+	STMMAC_STAT(irq_chan1_3_n),
+	STMMAC_STAT(irq_chan1_4_n),
+	STMMAC_STAT(irq_chan1_5_n),
+	STMMAC_STAT(irq_chan1_6_n),
+	STMMAC_STAT(irq_chan1_7_n),
+	STMMAC_STAT(irq_chan1_8_n),
+	STMMAC_STAT(irq_chan1_9_n),
+	STMMAC_STAT(irq_chan1_10_n),
+	STMMAC_STAT(irq_chan1_11_n),
+	STMMAC_STAT(irq_chan1_12_n),
+	STMMAC_STAT(irq_chan1_13_n),
+	STMMAC_STAT(irq_chan1_14_n),
+	STMMAC_STAT(irq_chan1_15_n),
+	STMMAC_STAT(irq_chan1_16_n),
+	STMMAC_STAT(irq_chan1_17_n),
+	STMMAC_STAT(irq_chan1_18_n),
+	STMMAC_STAT(irq_chan1_19_n),
+	STMMAC_STAT(irq_chan1_20_n),
+	STMMAC_STAT(irq_chan1_21_n),
+	STMMAC_STAT(irq_chan1_22_n),
+	STMMAC_STAT(irq_chan1_23_n),
+	STMMAC_STAT(irq_chan1_24_n),
+	STMMAC_STAT(irq_chan1_25_n),
+	STMMAC_STAT(irq_chan1_26_n),
+	STMMAC_STAT(irq_chan1_27_n),
+	STMMAC_STAT(irq_chan1_28_n),
+	STMMAC_STAT(irq_chan1_29_n),
+	STMMAC_STAT(irq_chan1_30_n),
+	STMMAC_STAT(irq_chan1_31_n),
+
+	STMMAC_STAT(irq_chan2_0_n),
+	STMMAC_STAT(irq_chan2_1_n),
+	STMMAC_STAT(irq_chan2_2_n),
+	STMMAC_STAT(irq_chan2_3_n),
+	STMMAC_STAT(irq_chan2_4_n),
+	STMMAC_STAT(irq_chan2_5_n),
+	STMMAC_STAT(irq_chan2_6_n),
+	STMMAC_STAT(irq_chan2_7_n),
+	STMMAC_STAT(irq_chan2_8_n),
+	STMMAC_STAT(irq_chan2_9_n),
+	STMMAC_STAT(irq_chan2_10_n),
+	STMMAC_STAT(irq_chan2_11_n),
+	STMMAC_STAT(irq_chan2_12_n),
+	STMMAC_STAT(irq_chan2_13_n),
+	STMMAC_STAT(irq_chan2_14_n),
+	STMMAC_STAT(irq_chan2_15_n),
+	STMMAC_STAT(irq_chan2_16_n),
+	STMMAC_STAT(irq_chan2_17_n),
+	STMMAC_STAT(irq_chan2_18_n),
+	STMMAC_STAT(irq_chan2_19_n),
+	STMMAC_STAT(irq_chan2_20_n),
+	STMMAC_STAT(irq_chan2_21_n),
+	STMMAC_STAT(irq_chan2_22_n),
+	STMMAC_STAT(irq_chan2_23_n),
+	STMMAC_STAT(irq_chan2_24_n),
+	STMMAC_STAT(irq_chan2_25_n),
+	STMMAC_STAT(irq_chan2_26_n),
+	STMMAC_STAT(irq_chan2_27_n),
+	STMMAC_STAT(irq_chan2_28_n),
+	STMMAC_STAT(irq_chan2_29_n),
+	STMMAC_STAT(irq_chan2_30_n),
+	STMMAC_STAT(irq_chan2_31_n),
+
+	STMMAC_STAT(irq_chan3_0_n),
+	STMMAC_STAT(irq_chan3_1_n),
+	STMMAC_STAT(irq_chan3_2_n),
+	STMMAC_STAT(irq_chan3_3_n),
+	STMMAC_STAT(irq_chan3_4_n),
+	STMMAC_STAT(irq_chan3_5_n),
+	STMMAC_STAT(irq_chan3_6_n),
+	STMMAC_STAT(irq_chan3_7_n),
+	STMMAC_STAT(irq_chan3_8_n),
+	STMMAC_STAT(irq_chan3_9_n),
+	STMMAC_STAT(irq_chan3_10_n),
+	STMMAC_STAT(irq_chan3_11_n),
+	STMMAC_STAT(irq_chan3_12_n),
+	STMMAC_STAT(irq_chan3_13_n),
+	STMMAC_STAT(irq_chan3_14_n),
+	STMMAC_STAT(irq_chan3_15_n),
+	STMMAC_STAT(irq_chan3_16_n),
+	STMMAC_STAT(irq_chan3_17_n),
+	STMMAC_STAT(irq_chan3_18_n),
+	STMMAC_STAT(irq_chan3_19_n),
+	STMMAC_STAT(irq_chan3_20_n),
+	STMMAC_STAT(irq_chan3_21_n),
+	STMMAC_STAT(irq_chan3_22_n),
+	STMMAC_STAT(irq_chan3_23_n),
+	STMMAC_STAT(irq_chan3_24_n),
+	STMMAC_STAT(irq_chan3_25_n),
+	STMMAC_STAT(irq_chan3_26_n),
+	STMMAC_STAT(irq_chan3_27_n),
+	STMMAC_STAT(irq_chan3_28_n),
+	STMMAC_STAT(irq_chan3_29_n),
+	STMMAC_STAT(irq_chan3_30_n),
+	STMMAC_STAT(irq_chan3_31_n),
+
+	STMMAC_STAT(irq_chan4_0_n),
+	STMMAC_STAT(irq_chan4_1_n),
+	STMMAC_STAT(irq_chan4_2_n),
+	STMMAC_STAT(irq_chan4_3_n),
+	STMMAC_STAT(irq_chan4_4_n),
+	STMMAC_STAT(irq_chan4_5_n),
+	STMMAC_STAT(irq_chan4_6_n),
+	STMMAC_STAT(irq_chan4_7_n),
+	STMMAC_STAT(irq_chan4_8_n),
+	STMMAC_STAT(irq_chan4_9_n),
+	STMMAC_STAT(irq_chan4_10_n),
+	STMMAC_STAT(irq_chan4_11_n),
+	STMMAC_STAT(irq_chan4_12_n),
+	STMMAC_STAT(irq_chan4_13_n),
+	STMMAC_STAT(irq_chan4_14_n),
+	STMMAC_STAT(irq_chan4_15_n),
+	STMMAC_STAT(irq_chan4_16_n),
+	STMMAC_STAT(irq_chan4_17_n),
+	STMMAC_STAT(irq_chan4_18_n),
+	STMMAC_STAT(irq_chan4_19_n),
+	STMMAC_STAT(irq_chan4_20_n),
+	STMMAC_STAT(irq_chan4_21_n),
+	STMMAC_STAT(irq_chan4_22_n),
+	STMMAC_STAT(irq_chan4_23_n),
+	STMMAC_STAT(irq_chan4_24_n),
+	STMMAC_STAT(irq_chan4_25_n),
+	STMMAC_STAT(irq_chan4_26_n),
+	STMMAC_STAT(irq_chan4_27_n),
+	STMMAC_STAT(irq_chan4_28_n),
+	STMMAC_STAT(irq_chan4_29_n),
+	STMMAC_STAT(irq_chan4_30_n),
+	STMMAC_STAT(irq_chan4_31_n),
+
 	STMMAC_STAT(tx_undeflow_irq),
 	STMMAC_STAT(tx_process_stopped_irq),
 	STMMAC_STAT(tx_jabber_irq),
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_fpe.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_fpe.c
index 75b470ee621a..32a2b440fc46 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_fpe.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_fpe.c
@@ -141,6 +141,8 @@ void stmmac_fpe_irq_status(struct stmmac_priv *priv)
 	 * here, since the status flags of MAC_FPE_CTRL_STS are "clear on read"
 	 */
 	value = readl(ioaddr + reg->mac_fpe_reg);
+	if (value)
+		priv->xstats.irq_fpe_n++;
 
 	if (value & STMMAC_MAC_FPE_CTRL_STS_TRSP) {
 		status |= FPE_EVENT_TRSP;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 650d75b73e0b..cbc748380dda 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -3771,6 +3771,7 @@ static int stmmac_request_irq_single(struct net_device *dev)
 	enum request_irq_err irq_err;
 	int ret;
 
+	netdev_info(dev, "%s: requesting main IRQ\n", __func__);
 	ret = request_irq(dev->irq, stmmac_interrupt,
 			  IRQF_SHARED, dev->name, dev);
 	if (unlikely(ret < 0)) {
@@ -3785,6 +3786,7 @@ static int stmmac_request_irq_single(struct net_device *dev)
 	 * is used for WoL
 	 */
 	if (priv->wol_irq > 0 && priv->wol_irq != dev->irq) {
+		netdev_info(dev, "%s: requesting WOL IRQ\n", __func__);
 		ret = request_irq(priv->wol_irq, stmmac_interrupt,
 				  IRQF_SHARED, dev->name, dev);
 		if (unlikely(ret < 0)) {
@@ -3798,6 +3800,7 @@ static int stmmac_request_irq_single(struct net_device *dev)
 
 	/* Request the IRQ lines */
 	if (priv->lpi_irq > 0 && priv->lpi_irq != dev->irq) {
+		netdev_info(dev, "%s: requesting LPI IRQ\n", __func__);
 		ret = request_irq(priv->lpi_irq, stmmac_interrupt,
 				  IRQF_SHARED, dev->name, dev);
 		if (unlikely(ret < 0)) {
@@ -3813,6 +3816,7 @@ static int stmmac_request_irq_single(struct net_device *dev)
 	 * Error line in case of another line is used
 	 */
 	if (priv->sfty_irq > 0 && priv->sfty_irq != dev->irq) {
+		netdev_info(dev, "%s: requesting safety IRQ\n", __func__);
 		ret = request_irq(priv->sfty_irq, stmmac_safety_interrupt,
 				  IRQF_SHARED, dev->name, dev);
 		if (unlikely(ret < 0)) {
@@ -6030,12 +6034,16 @@ static irqreturn_t stmmac_interrupt(int irq, void *dev_id)
 	struct stmmac_priv *priv = netdev_priv(dev);
 
 	/* Check if adapter is up */
-	if (test_bit(STMMAC_DOWN, &priv->state))
+	if (test_bit(STMMAC_DOWN, &priv->state)) {
+		priv->xstats.irq_down_n++;
 		return IRQ_HANDLED;
+	}
 
 	/* Check ASP error if it isn't delivered via an individual IRQ */
-	if (priv->sfty_irq <= 0 && stmmac_safety_feat_interrupt(priv))
+	if (priv->sfty_irq <= 0 && stmmac_safety_feat_interrupt(priv)) {
+		priv->xstats.irq_sfty_n++;
 		return IRQ_HANDLED;
+	}
 
 	/* To handle Common interrupts */
 	stmmac_common_interrupt(priv);


Here are the corresponding stats captured right after booting to
userspace, with the 0 counts stripped off to keep the output readable:

     irq_gmac_0_n: 1
     irq_gmac_5_n: 4047
     irq_gmac_18_n: 46
     irq_mtl0_n: 2244307
     irq_mtl_0_n: 2244307
     irq_mtl_1_n: 2244307
     irq_mtl_2_n: 2244307
     irq_mtl_3_n: 2244307
     irq_mtl_4_n: 2244307
     irq_chan0_n: 2244307
     irq_chan0_0_n: 333
     irq_chan0_2_n: 2244307
     irq_chan0_6_n: 2769
     irq_chan0_10_n: 2244307
     irq_chan0_11_n: 2799
     irq_chan0_15_n: 2701

Here are the stats after enabling five queues in DT, also captured right
after booting to userspace:

     irq_gmac_0_n: 1
     irq_gmac_5_n: 4020
     irq_gmac_18_n: 41
     irq_mtl0_n: 1286469
     irq_mtl1_n: 1286469
     irq_mtl2_n: 1286469
     irq_mtl3_n: 1286469
     irq_mtl4_n: 1286469
     irq_mtl_0_n: 6432345
     irq_mtl_1_n: 6432345
     irq_mtl_2_n: 6432345
     irq_mtl_3_n: 6432345
     irq_mtl_4_n: 6432345
     irq_chan0_n: 1286469
     irq_chan1_n: 1286469
     irq_chan2_n: 1286469
     irq_chan3_n: 1286469
     irq_chan4_n: 1286469
     irq_chan0_0_n: 416
     irq_chan0_2_n: 1286466
     irq_chan0_6_n: 3470
     irq_chan0_10_n: 1286466
     irq_chan0_11_n: 2740
     irq_chan0_15_n: 2686
     irq_chan1_2_n: 1286469
     irq_chan1_10_n: 1286469
     irq_chan2_2_n: 1286467
     irq_chan2_10_n: 1286467
     irq_chan4_2_n: 1286469
     irq_chan4_10_n: 1286469

Setting eee-broken-1000t, with a single queue:

     irq_gmac_0_n: 1
     irq_gmac_18_n: 6
     irq_mtl0_n: 2548
     irq_mtl_0_n: 2548
     irq_mtl_1_n: 2548
     irq_mtl_2_n: 2548
     irq_mtl_3_n: 2548
     irq_mtl_4_n: 2548
     irq_chan0_n: 2548
     irq_chan0_0_n: 282
     irq_chan0_2_n: 2548
     irq_chan0_6_n: 2324
     irq_chan0_10_n: 2548
     irq_chan0_11_n: 29
     irq_chan0_15_n: 2548

And eee-broken-1000t with 5 queues:

     irq_gmac_0_n: 1
     irq_gmac_18_n: 8
     irq_mtl0_n: 2672
     irq_mtl1_n: 2672
     irq_mtl2_n: 2672
     irq_mtl3_n: 2672
     irq_mtl4_n: 2672
     irq_mtl_0_n: 13360
     irq_mtl_1_n: 13360
     irq_mtl_2_n: 13360
     irq_mtl_3_n: 13360
     irq_mtl_4_n: 13360
     irq_chan0_n: 2672
     irq_chan1_n: 2672
     irq_chan2_n: 2672
     irq_chan3_n: 2672
     irq_chan4_n: 2672
     irq_chan0_0_n: 283
     irq_chan0_2_n: 2672
     irq_chan0_6_n: 2439
     irq_chan0_10_n: 2672
     irq_chan0_11_n: 46
     irq_chan0_15_n: 2672
     irq_chan2_2_n: 2670
     irq_chan2_10_n: 2670
     irq_chan3_2_n: 2672
     irq_chan3_10_n: 2672

I've also printed the value of the interrupt enable registers. With one
queue,

MAC_INTERRUPT_ENABLE 0x00001030
DMA_CH0_INTERRUPT_ENABLE 0x0000d041
DMA_CH1_INTERRUPT_ENABLE 0x00000000
DMA_CH2_INTERRUPT_ENABLE 0x00000000
DMA_CH3_INTERRUPT_ENABLE 0x00000000
DMA_CH4_INTERRUPT_ENABLE 0x00000000

And with 5 queues,

MAC_INTERRUPT_ENABLE 0x00001030
DMA_CH0_INTERRUPT_ENABLE 0x0000d041
DMA_CH1_INTERRUPT_ENABLE 0x0000d041
DMA_CH2_INTERRUPT_ENABLE 0x0000d041
DMA_CH3_INTERRUPT_ENABLE 0x0000d041
DMA_CH4_INTERRUPT_ENABLE 0x0000d041

(bit 0 of the DMA interrupt enable registers is sometimes not set, which
I understand is normal)

Given the enabled interrupts, I agree that the counters are misleading,
as none of the interrupt bits with high counts are enabled. I'm however
not entirely sure about the MTL interrupt status register: it's not
clear to me whether it is wired to the EQOS IRQ line, as I don't see a
corresponding interrupt enable register.

If we rule out the main EQOS IRQ line and the per-channel RX and TX IRQ
lines as the source of the interrupt storm, the last possible culprit
according to section 7.1.2 (A53 Interrupts) of the i.MX8MP reference
manual would be the "ENET QOS TSN LPI RX exit Interrupt" that is OR'ed
into IRQ 135. As that's related to EEE, it's a probable culprit, but I
don't know what controls that IRQ line. The LPI interrupt status bit
is set in MAC_INTERRUPT_STATUS with a reasonable count, and we clear the
LPI interrupt sources when that happens. Just to be sure, I modified
dwmac4_irq_status() to read and process the GMAC4_LPI_CTRL_STATUS
register regardless of the LPIIS bit status, and that doesn't help. The
corresponding stats seem reasonable:

     irq_tx_path_in_lpi_mode_n: 4
     irq_tx_path_exit_lpi_mode_n: 4
     irq_rx_path_in_lpi_mode_n: 2537
     irq_rx_path_exit_lpi_mode_n: 2535

I also checked the MAC_MMC_RX_INTERRUPT and MAC_MMC_TX_INTERRUPT
registers as they contain LPI-related bits, but the corresponding mask
registers have all bits set:

MAC_MMC_RX_INTERRUPT      0x00000000
MAC_MMC_TX_INTERRUPT      0x00000000
MAC_MMC_RX_INTERRUPT_MASK 0x0fffffff
MAC_MMC_TX_INTERRUPT_MASK 0x0fffffff

According to the reference manual, this masks all corresponding
interrupts.

I'm really puzzled by the "ENET QOS TSN LPI RX exit Interrupt" IRQ line.
Based on its name I would assume it would be linked to bit RLPIEX in the
MAC_LPI_CONTROL_STATUS register, but that seems quite pointless as
that's available as an interrupt source through
MAC_INTERRUPT_STATUS.LPIIS. The shortcut doesn't seem necessary. Are we
missing something, or chasing the wrong suspect?

> > > The debix board's DT doesn't specify a multi-queue setup, so only
> > > channel 0 gets processed in stmmac_dma_interrupt(). I thought that could
> > > explain why Q1IS to Q4IS stay set (but not why Q0IS also has a high
> > > count, or why Q1IS to Q4IS are set in the first place), and enabled
> > > multi-queue support in DT by copying the imx8mp-evk configuration. I
> > > then see lots of non-zero DMA_CH1_STATUS, DMA_CH2_STATUS and
> > > DMA_CH4_STATUS values (but DMA_CH3_STATUS stays 0 all the time), but
> > > sadly this doesn't fix the interrupt storm.
> > 
> > Now, a queue will only be enabled if stmmac_dma_rx_mode() /
> > stmmac_dma_tx_mode() is called, which only happens for queues that are
> > going to be used. So, I think QxIS being set for x >= 1 is a red
> > herring.
> > 
> > Given that the driver does a software reset which clears out all the
> > registers, any stale configuration for queues e.g. from a boot loader
> > can't be preserved.
> 
> I agree that it seems really weird. And why this would be related to
> cpuidle and EEE is also a mystery.
> 
> > > I don't think I can debug this further and figure out the root cause
> > > unassisted in a reasonable amount of time, so I'd like to merge
> > > disabling EEE as a workaround for the time being, unless someone has any
> > > idea of what I could test next. I'll submit a v2 of this patch with an
> > > updated commit message.
> > 
> > I'm also not fully conversant with dwmac hardware, especially not the
> > v5.10 hardware that is in imx8m. All the above is stuff I've pieced
> > together this morning from reading the driver code and the imx8m
> > manual. I'm putting in effort here to try and get to the bottom of
> > your problem without hardware... it would be helpful if others could
> > do the same rather than throwing their hands up.
> 
> More help would certainly be welcome. And I really appreciate your
> support Russell.
> 
> > The driver is really crappy, and part of the reason its crappy is
> > because of this kind of "patch in a workaround because we can't be
> > bothered to do the research and fix problems properly" attitude.
> > 
> > I'm saying enough is enough. I'm saying no, not going to merge a
> > workaround for this problem. I want to see stmmac improve. I've
> > put in considerable effort over the last year or so sorting out
> > fundamental issues that others just can't be bothered to solve
> > properly (like the DMA reset failures on resume that have plagued
> > this driver which no one seems _capable_ of fixing, yet I, with no
> > experience of stmmac, was able to analyse the issue, read the
> > available documentation, and fix the problem properly once and for
> > all.) Either I'm bloody good at what I do and everyone else is
> > useless, or it's laziness by others. It pisses me off that I seem
> > to be one of the few who is willing to put the effort in to stuff
> > in the kernel to see _improvement_. I don't _have_ to work on stmmac,
> > but me working on stmmac benefits a lot of people.
> > 
> > What I'm saying is, we need more people willing to put effort in
> > and less bodging.
> 
> While I would like to merge a workaround and move on, I also understand
> your position, having had the exact same stance in other kernel areas
> and pushing for problems to be fixed correctly instead of worked around.
> The only argument I have to defend the workaround approach is that I'm
> putting in a lot of hours trying to do the right things in other
> subsystems, and I can hardly scale that to networking. It's not a great
> argument though.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-12 12:56         ` Russell King (Oracle)
@ 2025-11-13  1:17           ` Laurent Pinchart
  0 siblings, 0 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-13  1:17 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Kieran Bingham, Oleksij Rempel, Andrew Lunn, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo

On Wed, Nov 12, 2025 at 12:56:22PM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 12, 2025 at 12:41:50PM +0000, Kieran Bingham wrote:
> > Perhaps further complicating matters.
> > 
> > I have a Debix Model A as well ... I'm in a different office to Laurent
> > - and I believe EEE is enabled on my board/network switch.
> > 
> > I do not get an interrupt storm.
> > 
> > I'm not sure how this helps yet - I don't know what to debug as I can't
> > reproduce the issue!
> > 
> > I can provide remote access to the board with ssh if that helps anyone
> > who wants to look at something specific about my setup or run anything
> > if anyone has ideas of what to check my side.
> > 
> > Perhaps we can find some subtle difference between a working case and a
> > non-working case...
> 
> Thanks, that's interesting. I guess the next steps would be to try and
> work out what's different between your two setups.
> 
> - same board revision?
> - same firmware/ATF?
> - same kernel/modules?
> - same type of link partner? (I suspect that, given the cpu-pd-wait
>    interaction, this isn't the problem.)

I've provided the binaries I use for U-Boot (including TF-A) and the
kernel to Kieran, he will test to see if they make a difference. I also
tested replacing mainline TF-A (v2.13) with the downstream NXP version
listed in the U-Boot i.MX8MP EVK documentation, and that didn't appear
to make any difference.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-13  1:06               ` Laurent Pinchart
@ 2025-11-13 10:59                 ` Russell King (Oracle)
  2025-11-14 22:26                   ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-13 10:59 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Catalin Popescu

On Thu, Nov 13, 2025 at 03:06:27AM +0200, Laurent Pinchart wrote:
> On Thu, Nov 13, 2025 at 12:25:52AM +0200, Laurent Pinchart wrote:
> > On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> > > On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > > > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > > > I didn't notice it at the time because my board was connected to a
> > > > > switch that didn't support EEE.
> > > > 
> > > > I can confirm that reverting that commit makes the issue disappear. So
> > > > we're dealing with an interrupt storm that occurs when all three of the
> > > > following conditions are true:
> > > > 
> > > > - cpu-pd-wait is enabled
> > > > - EEE is enabled
> > > > - the peer also supports EEE
> > > 
> > > Thanks - overall, please take the statistics and interrupt status bits
> > > with a pinch of salt - I suspect there are cases where the interrupt
> > > is not actually enabled, and the code doesn't take action to clear
> > > down a set status bit, but _does_ count it - so every interrupt that
> > > happens increments the counter.
> > 
> > True. To (partly) avoid that, I've dropped the line that discards
> > disabled bits in dwmac4_irq_status():
> > 
> >  	/* Discard disabled bits */
> > -	intr_status &= intr_enable;
> > 
> > to ensure that all bits are processed and cleared. I then didn't see any
> > high count of any of the GMAC_INT_STATUS interrupts. For
> > MTL_INTERRUPT_STATUS it's a bit different, as by default only one queue
> > is processed.
> > 
> > > > Furthermore, I tried counting bits from all the interrupt status
> > > > registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
> > > > bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.
> > > 
> > > TBU means that the transmitter found that the next buffer was owned by
> > > the "application" rather than the hardware, which would be normal after
> > > getting to the end of the queued packets.
> > > 
> > > ETI means that a packet has been transferred into MTL memory, and thus
> > > would occur for every transmitted packet.
> > > 
> > > Having dug into the imx8m documentation and the driver this morning,
> > > I don't think TBU and ETI are the source of the interrupt storm. Their
> > > corresponding interrupt enable bits are DMA_CHAN_INTR_ENA_TBUE and
> > > DMA_CHAN_INTR_ENA_ETE (driver names). Both of these only appear in a
> > > header file - the code never enables these interrupts. So, TBU and ETI
> > > should not be causing an interrupt storm.
> > > 
> > > As for QxIS, stmmac_common_interrupt() will iterate over the queues
> > > in use, calling stmmac_host_mtl_irq_status() aka dwmac4_irq_mtl_status()
> > > for each. Only if this happens will MTL_CHAN_INT_CTRL() be read which
> > > clears the status bit. In other words, if e.g. Q1IS is set but only
> > > one queue is being used, dwmac4_irq_mtl_status() won't be called for
> > > queue 1, and thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.
> > 
> > That's why I tried to enable all 5 queues in DT, but alas, it didn't
> > help. I'll try again and count all possible interrupts.
> 
> Here's my debug patch (not very pretty, sorry about that):

That's fine. Thanks for providing this and the raw data.

> Here are the corresponding stats captured right after booting to
> userspace, with the 0 counts stripped off to keep the output readable:
> 
>      irq_gmac_0_n: 1

RSGMIIS, disabled, cleared by read of MAC_PHYIF_CONTROL_STATUS.

>      irq_gmac_5_n: 4047

LPIIS, enabled, cleared by read of LPI_CONTROL_STATUS which is done.

>      irq_gmac_18_n: 46

MDIOIS, disabled, cleared on read of _this_ status register.

>      irq_mtl0_n: 2244307

This will increment each time dwmac4_irq_mtl_status() is called for
channel 0, which will be called each time stmmac_common_interrupt() is
called. Thus, this indicates the total number of times the stmmac
interrupt handler has been called.

>      irq_mtl_0_n: 2244307
>      irq_mtl_1_n: 2244307
>      irq_mtl_2_n: 2244307
>      irq_mtl_3_n: 2244307
>      irq_mtl_4_n: 2244307

These should be cleared by reading the corresponding queue interrupt
control/status register, iow MTL_CHAN_INT_CTRL(). However, we do not
write to MTL_CHAN_INT_CTRL() to enable any of the interrupts there, so
this looks weird to me. It would be an idea to look at what value the
MTL_CHAN_INT_CTRL() register contains, as it may provide something
useful, but I actually suspect it's another red herring.

>      irq_chan0_n: 2244307

Similarly to irq_mtl0_n, this will increment each time
dwmac4_dma_interrupt() is called for channel 0, which will be via
stmmac_napi_check(), stmmac_dma_interrupt() and
stmmac_common_interrupt(). Therefore, it is expected to have the same
value as irq_mtl0_n.

>      irq_chan0_0_n: 333
>      irq_chan0_2_n: 2244307
>      irq_chan0_6_n: 2769
>      irq_chan0_10_n: 2244307
>      irq_chan0_11_n: 2799
>      irq_chan0_15_n: 2701

Only interrupts 0, 6, 12, 14 and 15 are enabled. Status bits in this
register require '1' to be written to clear them. As the value written
back is the status that was read masked by the interrupt enable, if
bits 2 or 10 are set, they will never be cleared, so will increment
each and every time stmmac_common_interrupt() is called. Therefore,
these values are not significant.
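
To illustrate the effect, here is a toy Python model of a write-1-to-clear
status register with a masked write-back. It is only a sketch, not driver
code: handle_irq() and simulate() are made-up names, and the event pattern
(bits 2 and 10 latched once, bit 6 raised every 100th interrupt) is an
assumption chosen to make the behaviour visible.

```python
# Toy model of a write-1-to-clear (W1C) status register. Not driver code.
ENABLED = (1 << 0) | (1 << 6) | (1 << 12) | (1 << 14) | (1 << 15)


def handle_irq(status, counters):
    """Count every set status bit, then write back only the enabled ones."""
    for bit in range(32):
        if status & (1 << bit):
            counters[bit] = counters.get(bit, 0) + 1
    written = status & ENABLED   # masked write-back, as described above
    return status & ~written     # W1C: only the bits written as 1 clear


def simulate(n_irqs):
    counters = {}
    status = (1 << 2) | (1 << 10)   # bits 2 and 10 latched once, not enabled
    for i in range(n_irqs):
        if i % 100 == 0:
            status |= 1 << 6        # an occasional enabled event (assumed)
        status = handle_irq(status, counters)
    return counters
```

In this model the counters for bits 2 and 10 track the number of handler
invocations exactly, while the counter for bit 6 only reflects real events.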

> 
> Here are the stats after enabling five queues in DT, also captured right
> after booting to userspace:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_5_n: 4020
>      irq_gmac_18_n: 41
>      irq_mtl0_n: 1286469
>      irq_mtl1_n: 1286469
>      irq_mtl2_n: 1286469
>      irq_mtl3_n: 1286469
>      irq_mtl4_n: 1286469
>      irq_mtl_0_n: 6432345
>      irq_mtl_1_n: 6432345
>      irq_mtl_2_n: 6432345
>      irq_mtl_3_n: 6432345
>      irq_mtl_4_n: 6432345

These values are the sum of irq_mtl[0-4]_n, so would be expected given
the other numbers.
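
As a quick arithmetic check, assuming the debug patch bumps every set QxIS
counter once per dwmac4_irq_mtl_status() call (the helper name below is
made up for illustration):

```python
def expected_qxis_count(handler_calls_per_queue):
    """Each per-queue call counts every set QxIS bit once, so the
    per-bit counter is the sum of the per-queue call counts."""
    return sum(handler_calls_per_queue)


# irq_mtl0_n .. irq_mtl4_n from the five-queue capture above
five_queues = [1286469] * 5
print(expected_qxis_count(five_queues))  # prints 6432345
```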

>      irq_chan0_n: 1286469
>      irq_chan1_n: 1286469
>      irq_chan2_n: 1286469
>      irq_chan3_n: 1286469
>      irq_chan4_n: 1286469
>      irq_chan0_0_n: 416
>      irq_chan0_2_n: 1286466
>      irq_chan0_6_n: 3470
>      irq_chan0_10_n: 1286466
>      irq_chan0_11_n: 2740
>      irq_chan0_15_n: 2686
>      irq_chan1_2_n: 1286469
>      irq_chan1_10_n: 1286469
>      irq_chan2_2_n: 1286467
>      irq_chan2_10_n: 1286467
>      irq_chan4_2_n: 1286469
>      irq_chan4_10_n: 1286469

It's slightly interesting that irq_chanX_2_n and irq_chanX_10_n don't
match their corresponding irq_chanX_n values, which implies that they
have been clear at some point. Given that we're talking about a
difference of 0, 2 or 3, that's likely due to the first few packets,
when these bits hadn't yet been set. So again, I don't think TBU and
ETI are significant.

> Setting eee-broken-1000t, with a single queue:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_18_n: 6
>      irq_mtl0_n: 2548
>      irq_mtl_0_n: 2548
>      irq_mtl_1_n: 2548
>      irq_mtl_2_n: 2548
>      irq_mtl_3_n: 2548
>      irq_mtl_4_n: 2548
>      irq_chan0_n: 2548
>      irq_chan0_0_n: 282
>      irq_chan0_2_n: 2548
>      irq_chan0_6_n: 2324
>      irq_chan0_10_n: 2548
>      irq_chan0_11_n: 29
>      irq_chan0_15_n: 2548

These counts suggest that the interrupt handler was entered 2548 times
at the point they were captured, which corresponds to "normal"
interrupts for channel 0.

> 
> And eee-broken-1000t with 5 queues:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_18_n: 8
>      irq_mtl0_n: 2672
>      irq_mtl1_n: 2672
>      irq_mtl2_n: 2672
>      irq_mtl3_n: 2672
>      irq_mtl4_n: 2672
>      irq_mtl_0_n: 13360
>      irq_mtl_1_n: 13360
>      irq_mtl_2_n: 13360
>      irq_mtl_3_n: 13360
>      irq_mtl_4_n: 13360
>      irq_chan0_n: 2672
>      irq_chan1_n: 2672
>      irq_chan2_n: 2672
>      irq_chan3_n: 2672
>      irq_chan4_n: 2672
>      irq_chan0_0_n: 283
>      irq_chan0_2_n: 2672
>      irq_chan0_6_n: 2439
>      irq_chan0_10_n: 2672
>      irq_chan0_11_n: 46
>      irq_chan0_15_n: 2672
>      irq_chan2_2_n: 2670
>      irq_chan2_10_n: 2670
>      irq_chan3_2_n: 2672
>      irq_chan3_10_n: 2672

So channel 0 is responsible for 2672 normal interrupts. Again, this
reinforces that the other values with 2672 are likely not significant.

> Given the enabled interrupts, I agree that the counters are misleading,
> as none of the interrupt bits with high counts are enabled. I'm however
> not entirely sure about the MTL interrupt status register, it's not
> clear to me if it is wired to the EQOS IRQ line as I don't see a
> corresponding interrupt enable register.
> 
> If we rule out the main EQOS IRQ line and the per-channel RX and TX IRQ
> lines as the source of the interrupt storm, the last possible culprit
> according to section 7.1.2 (A53 Interrupts) of the i.MX8MP reference
> manual would be the "ENET QOS TSN LPI RX exit Interrupt" that is OR'ed
> into IRQ 135. As that's related to EEE, it's a probable culprit, but I
> don't know what controls that IRQ line.

As you have several interrupt signals which presumably show up in
/proc/interrupts, do the values in your IRQ counters correspond with
those interrupt sources? Are any of these interrupts shared with
anything else?

Hmm, looking at 7.1.2, and the mention of "ENET QOS TSN LPI RX exit
Interrupt" I'm wondering whether Freescale have wired the lpi_intr_o
signal of the GMAC to their OR4 gate. This is the LPI RX exit
interrupt output, and it is cleared when reading the LPI control/
status register. However, its deassertion is synchronous to the RX
clock domain, so it will take time to clear.

The purpose of this signal is to trigger hardware external to the
GMAC to restore the application clock to the MAC. I'm not sure that
this was meant to be wired to an actual CPU interrupt. The only clue
is the name which suggests it is, but there's nothing that states
there's a way to disable it being asserted which makes me more
suspicious that it's not meant to be a CPU interrupt.

So, maybe this is the cause of the interrupt storm. Maybe Kieran isn't
seeing the storm because his receive path is not entering LPI.

I think a useful check for this would be if you could either disable
LPI entry at the link partner, or hook it up to another system which
can have tx_lpi disabled, and see how the iMX8 system behaves.

If preventing the iMX8 receive path entering LPI fixes the problem,
then I think this is likely the culprit.

However, I'd be worried about this - if we "disable LPI" by way of
the advertisement at the local end, there is the possibility that a
remote system could override the negotiation and force its transmit
link into LPI mode, which would cause the iMX8MP receive side to see
LPI entry and exit, triggering this interrupt. If this is correct,
that gives an attacker a way to manipulate the iMX8MP system,
potentially causing all sorts of problems.

Hmm. Not sure I like the look of that.

If this hypothesis is correct, then yes, disabling EEE is the only
way forward for this, but I would suggest going further - ensuring
that SmartEEE is enabled on the PHY but with the advertisement
cleared (so EEE negotiation indicates not supported) to block the
receive side LPI getting to the EQOS.

This also means that 100M EEE would also be affected, so just
disabling 1G EEE in DT is insufficient.

Andrew - if we need to go down this path, I think we need a flag in
the PHY flags to indicate that we want SmartEEE enabled.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-13 10:59                 ` Russell King (Oracle)
@ 2025-11-14 22:26                   ` Laurent Pinchart
  2025-11-18  1:50                     ` Wei Fang
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-14 22:26 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree, imx,
	linux-arm-kernel, Daniel Scally, Kieran Bingham, Stefan Klug,
	Conor Dooley, Fabio Estevam, Krzysztof Kozlowski,
	Pengutronix Kernel Team, Rob Herring, Sascha Hauer, Shawn Guo,
	Fugang Duan, Joakim Zhang, Wei Fang, Yannick Vignon

Dropping Catalin Popescu from CC as his e-mail address bounces, and
adding Fugang Duan, Joakim Zhang, Wei Fang and Yannick Vignon from NXP
who have worked on upstream i.MX8MP support in the driver.

Fugang, Joakim, Wei and Yannick, there's a question for you below.

On Thu, Nov 13, 2025 at 10:59:23AM +0000, Russell King (Oracle) wrote:
> On Thu, Nov 13, 2025 at 03:06:27AM +0200, Laurent Pinchart wrote:
> > On Thu, Nov 13, 2025 at 12:25:52AM +0200, Laurent Pinchart wrote:
> > > On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> > > > On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > > > > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > > > > I didn't notice it at the time because my board was connected to a
> > > > > > switch that didn't support EEE.
> > > > > 
> > > > > I can confirm that reverting that commit makes the issue disappear. So
> > > > > we're dealing with an interrupt storm that occurs when all three of the
> > > > > following conditions are true:
> > > > > 
> > > > > - cpu-pd-wait is enabled
> > > > > - EEE is enabled
> > > > > - the peer also supports EEE
> > > > 
> > > > Thanks - overall, please take the statistics and interrupt status bits
> > > > with a pinch of salt - I suspect there are cases where the interrupt
> > > > is not actually enabled, and the code doesn't take action to clear
> > > > down a set status bit, but _does_ count it - so every interrupt that
> > > > happens increments the counter.
> > > 
> > > True. To (partly) avoid that, I've dropped the line that discards
> > > disabled bits in dwmac4_irq_status():
> > > 
> > >  	/* Discard disabled bits */
> > > -	intr_status &= intr_enable;
> > > 
> > > to ensure that all bits are processed and cleared. I then didn't see any
> > > high count of any of the GMAC_INT_STATUS interrupts. For
> > > MTL_INTERRUPT_STATUS it's a bit different, as by default only one queue
> > > is processed.
> > > 
> > > > > Furthermore, I tried counting bits from all the interrupt status
> > > > > registers I could find. The count of MTL_INTERRUPT_STATUS Q0IS to Q4IS
> > > > > bits is very high, and so are the DMA_CH0_STATUS TBU and ETI bits.
> > > > 
> > > > TBU means that the transmitter found that the next buffer was owned by
> > > > the "application" rather than the hardware, which would be normal after
> > > > getting to the end of the queued packets.
> > > > 
> > > > ETI means that a packet has been transferred into MTL memory, and thus
> > > > would occur for every transmitted packet.
> > > > 
> > > > Having dug into the imx8m documentation and the driver this morning,
> > > > I don't think TBU and ETI are the source of the interrupt storm. Their
> > > > corresponding interrupt enable bits are DMA_CHAN_INTR_ENA_TBUE and
> > > > DMA_CHAN_INTR_ENA_ETE (driver names). Both of these only appear in a
> > > > header file - the code never enables these interrupts. So, TBU and ETI
> > > > should not be causing an interrupt storm.
> > > > 
> > > > As for QxIS, stmmac_common_interrupt() will iterate over the queues
> > > > in use, calling stmmac_host_mtl_irq_status() aka dwmac4_irq_mtl_status()
> > > > for each. Only if this happens will MTL_CHAN_INT_CTRL() be read which
> > > > clears the status bit. In other words, if e.g. Q1IS is set, but only
> > > > one queue is being used, dwmac4_irq_mtl_status() won't be called for
> > > > queue 1, and thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.
> > > 
> > > That's why I tried to enable all 5 queues in DT, but alas, it didn't
> > > help. I'll try again and count all possible interrupts.
> > 
> > Here's my debug patch (not very pretty, sorry about that):
> 
> That's fine. Thanks for providing this and the raw data.
> 
> > Here are the corresponding stats captured right after booting to
> > userspace, with the 0 counts stripped off to keep the output readable:
> > 
> >      irq_gmac_0_n: 1
> 
> RSGMIIS, disabled, cleared by read of MAC_PHYIF_CONTROL_STATUS.
> 
> >      irq_gmac_5_n: 4047
> 
> LPIIS, enabled, cleared by read of LPI_CONTROL_STATUS which is done.
> 
> >      irq_gmac_18_n: 46
> 
> MDIOIS, disabled, clear on read of _this_ status register
> 
> >      irq_mtl0_n: 2244307
> 
> This will increment each time dwmac4_irq_mtl_status() is called for
> channel 0, which will be called each time stmmac_common_interrupt() is
> called. Thus, this indicates the total number of times the stmmac
> interrupt handler has been called.

Yes, my goal with the irq_mtlX_n counters was to check for which
channels/queues dwmac4_irq_mtl_status() was called.

> >      irq_mtl_0_n: 2244307
> >      irq_mtl_1_n: 2244307
> >      irq_mtl_2_n: 2244307
> >      irq_mtl_3_n: 2244307
> >      irq_mtl_4_n: 2244307
> 
> These should be cleared by reading the corresponding queue interrupt
> control/status register, iow MTL_CHAN_INT_CTRL(). However, we do not
> write to MTL_CHAN_INT_CTRL() to enable any of the interrupts there, so
> this looks weird to me, so it would be an idea to look at what value
> this MTL_CHAN_INT_CTRL() register contains, it may provide something
> useful, but I actually suspect it's another red herring.

All the MTL_CHAN_INT_CTRL() registers read as 0x00000002, so the
interrupts are not enabled.

> >      irq_chan0_n: 2244307
> 
> Similarly to irq_mtl0_n, this will increment each time
> dwmac4_dma_interrupt() is called for channel 0, which will be via
> stmmac_napi_check(), stmmac_dma_interrupt() and
> stmmac_common_interrupt(). Therefore, it is expected to have the same
> value as irq_mtl0_n.
> 
> >      irq_chan0_0_n: 333
> >      irq_chan0_2_n: 2244307
> >      irq_chan0_6_n: 2769
> >      irq_chan0_10_n: 2244307
> >      irq_chan0_11_n: 2799
> >      irq_chan0_15_n: 2701
> 
> Only interrupts 0, 6, 12, 14 and 15 are enabled. Status bits in this
> register require '1' to be written to clear them. As the value written
> back is the status that was read masked by the interrupt enable, if
> bits 2 or 10 are set, they will never be cleared, so will increment
> each and every time stmmac_common_interrupt() is called. Therefore,
> these values are not significant.

I've commented out the masking in dwmac4_dma_interrupt(), and the
counters show that bits 2 and 10 were indeed not significant:

     irq_gmac_0_n: 1
     irq_gmac_5_n: 3846
     irq_gmac_18_n: 59
     irq_mtl0_n: 2189598
     irq_mtl_0_n: 2189598
     irq_mtl_1_n: 2189598
     irq_mtl_2_n: 2189598
     irq_mtl_3_n: 2189598
     irq_mtl_4_n: 2189598
     irq_chan0_n: 2189598
     irq_chan0_0_n: 258
     irq_chan0_2_n: 2680
     irq_chan0_6_n: 2660
     irq_chan0_10_n: 2682
     irq_chan0_11_n: 1659
     irq_chan0_15_n: 2598
     irq_tx_path_in_lpi_mode_n: 6
     irq_tx_path_exit_lpi_mode_n: 6
     irq_rx_path_in_lpi_mode_n: 2012
     irq_rx_path_exit_lpi_mode_n: 2009
     irq_rgmii_n: 1
     rx_normal_irq_n: 2660
     tx_normal_irq_n: 258
     normal_irq_n: 4577
     q0_tx_irq_n: 258
     q0_rx_irq_n: 2660

There is still an interrupt storm, as shown by bits Q0IS to Q4IS in
MTL_INTERRUPT_STATUS. Those bits are documented in the i.MX8MP RM as

  Queue 0 Interrupt status

  This bit indicates that there is an interrupt from Queue 0. To reset
  this bit, the application must read Queue 0 Interrupt Control and
  Status register to get the exact cause of the interrupt and clear its
  source.

I've added counters for the MTL_CHAN_INT_CTRL() registers bits in
dwmac4_irq_mtl_status():

     irq_gmac_0_n: 1
     irq_gmac_5_n: 4088
     irq_gmac_18_n: 70
     irq_mtl0_n: 2279161
     irq_mtl_0_n: 2279161
     irq_mtl_1_n: 2279161
     irq_mtl_2_n: 2279161
     irq_mtl_3_n: 2279161
     irq_mtl_4_n: 2279161
     irq_mtl_chan0_1_n: 2279161
     irq_chan0_n: 2279161
     irq_chan0_0_n: 269
     irq_chan0_2_n: 2874
     irq_chan0_6_n: 2754
     irq_chan0_10_n: 2871
     irq_chan0_11_n: 1793
     irq_chan0_15_n: 2749
     irq_tx_path_in_lpi_mode_n: 13
     irq_tx_path_exit_lpi_mode_n: 13
     irq_rx_path_in_lpi_mode_n: 2112
     irq_rx_path_exit_lpi_mode_n: 2111
     irq_rgmii_n: 1
     rx_normal_irq_n: 2754
     tx_normal_irq_n: 269
     normal_irq_n: 4816
     q0_tx_irq_n: 269
     q0_rx_irq_n: 2754

I've then modified dwmac4_irq_mtl_status() to write back the status
value to MTL_CHAN_INT_CTRL() unconditionally:

     irq_gmac_0_n: 1
     irq_gmac_5_n: 4429
     irq_gmac_18_n: 96
     irq_mtl0_n: 5165861
     irq_mtl_0_n: 5212
     irq_mtl_1_n: 5165861
     irq_mtl_2_n: 5165861
     irq_mtl_3_n: 5165861
     irq_mtl_4_n: 5165861
     irq_mtl_chan0_1_n: 5212
     irq_chan0_n: 5165861
     irq_chan0_0_n: 274
     irq_chan0_2_n: 2965
     irq_chan0_6_n: 2858
     irq_chan0_10_n: 2965
     irq_chan0_11_n: 1899
     irq_chan0_15_n: 2838
     irq_tx_path_in_lpi_mode_n: 6
     irq_tx_path_exit_lpi_mode_n: 6
     irq_rx_path_in_lpi_mode_n: 2364
     irq_rx_path_exit_lpi_mode_n: 2363
     irq_rgmii_n: 1
     rx_normal_irq_n: 2858
     tx_normal_irq_n: 274
     normal_irq_n: 5031
     q0_tx_irq_n: 274
     q0_rx_irq_n: 2858

As expected, that clears the interrupt source for Q0IS, so
irq_mtl_chan0_1_n and irq_mtl_0_n are now under control.

Enabling support for 5 channels in DT:

     irq_gmac_0_n: 1
     irq_gmac_5_n: 4993
     irq_gmac_18_n: 74
     irq_mtl0_n: 3084994
     irq_mtl1_n: 3084994
     irq_mtl2_n: 3084994
     irq_mtl3_n: 3084994
     irq_mtl4_n: 3084994
     irq_mtl_0_n: 5433
     irq_mtl_1_n: 9272
     irq_mtl_2_n: 13218
     irq_mtl_3_n: 17084
     irq_mtl_4_n: 21010
     irq_mtl_chan0_0_n: 1
     irq_mtl_chan0_1_n: 4401
     irq_mtl_chan0_16_n: 1
     irq_mtl_chan1_1_n: 4401
     irq_mtl_chan2_1_n: 4401
     irq_mtl_chan3_1_n: 4401
     irq_mtl_chan4_1_n: 4401
     irq_chan0_n: 3084994
     irq_chan1_n: 3084994
     irq_chan2_n: 3084994
     irq_chan3_n: 3084994
     irq_chan4_n: 3084994
     irq_chan0_0_n: 266
     irq_chan0_2_n: 2923
     irq_chan0_6_n: 2809
     irq_chan0_10_n: 2925
     irq_chan0_11_n: 2203
     irq_chan0_15_n: 2738
     irq_chan1_2_n: 3
     irq_chan1_10_n: 3
     irq_chan2_2_n: 1
     irq_chan2_10_n: 1
     irq_chan3_2_n: 8
     irq_chan3_10_n: 8
     irq_chan4_2_n: 2
     irq_chan4_10_n: 2
     irq_tx_path_in_lpi_mode_n: 6
     irq_tx_path_exit_lpi_mode_n: 6
     irq_rx_path_in_lpi_mode_n: 2633
     irq_rx_path_exit_lpi_mode_n: 2632
     irq_rgmii_n: 1
     rx_normal_irq_n: 2809
     tx_normal_irq_n: 266
     normal_irq_n: 5278
     q0_tx_irq_n: 266
     q0_rx_irq_n: 2809

There are no more storms in interrupt bit counters. The only counters
that are out of control are irq_mtlX_n and irq_chanX_n, as expected, as
they simply count the number of times the IRQ handling functions are
called.

Unless we're missing some interrupt sources in other registers, I think
this indicates that the storm is not caused by the sbd_intr_o or
sbd_perch_[rt]x_intr_o signals. lpi_intr_o seems the most likely culprit
at this point (more on that below).

> > Here are the stats after enabling five queues in DT, also captured right
> > after booting to userspace:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_5_n: 4020
> >      irq_gmac_18_n: 41
> >      irq_mtl0_n: 1286469
> >      irq_mtl1_n: 1286469
> >      irq_mtl2_n: 1286469
> >      irq_mtl3_n: 1286469
> >      irq_mtl4_n: 1286469
> >      irq_mtl_0_n: 6432345
> >      irq_mtl_1_n: 6432345
> >      irq_mtl_2_n: 6432345
> >      irq_mtl_3_n: 6432345
> >      irq_mtl_4_n: 6432345
> 
> These values are the sum of irq_mtl[0-4]_n, so would be expected given
> the other numbers.
> 
> >      irq_chan0_n: 1286469
> >      irq_chan1_n: 1286469
> >      irq_chan2_n: 1286469
> >      irq_chan3_n: 1286469
> >      irq_chan4_n: 1286469
> >      irq_chan0_0_n: 416
> >      irq_chan0_2_n: 1286466
> >      irq_chan0_6_n: 3470
> >      irq_chan0_10_n: 1286466
> >      irq_chan0_11_n: 2740
> >      irq_chan0_15_n: 2686
> >      irq_chan1_2_n: 1286469
> >      irq_chan1_10_n: 1286469
> >      irq_chan2_2_n: 1286467
> >      irq_chan2_10_n: 1286467
> >      irq_chan4_2_n: 1286469
> >      irq_chan4_10_n: 1286469
> 
> It's slightly interesting that irq_chanX_2_n and irq_chanX_10_n don't
> match their corresponding irq_chanX_n values, which implies that they
> have been clear. It's likely given that we're talking about 0, 2 or 3
> times that's due to the first few packets and these bits hadn't been
> set. So again, I don't think TBU and ETI are significant.
> 
> > Setting eee-broken-1000t, with a single queue:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_18_n: 6
> >      irq_mtl0_n: 2548
> >      irq_mtl_0_n: 2548
> >      irq_mtl_1_n: 2548
> >      irq_mtl_2_n: 2548
> >      irq_mtl_3_n: 2548
> >      irq_mtl_4_n: 2548
> >      irq_chan0_n: 2548
> >      irq_chan0_0_n: 282
> >      irq_chan0_2_n: 2548
> >      irq_chan0_6_n: 2324
> >      irq_chan0_10_n: 2548
> >      irq_chan0_11_n: 29
> >      irq_chan0_15_n: 2548
> 
> These counts suggest that the interrupt handler was entered 2548 times
> at the point they were captured, which corresponds to "normal"
> interrupts for channel 0.
> 
> > 
> > And eee-broken-1000t with 5 queues:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_18_n: 8
> >      irq_mtl0_n: 2672
> >      irq_mtl1_n: 2672
> >      irq_mtl2_n: 2672
> >      irq_mtl3_n: 2672
> >      irq_mtl4_n: 2672
> >      irq_mtl_0_n: 13360
> >      irq_mtl_1_n: 13360
> >      irq_mtl_2_n: 13360
> >      irq_mtl_3_n: 13360
> >      irq_mtl_4_n: 13360
> >      irq_chan0_n: 2672
> >      irq_chan1_n: 2672
> >      irq_chan2_n: 2672
> >      irq_chan3_n: 2672
> >      irq_chan4_n: 2672
> >      irq_chan0_0_n: 283
> >      irq_chan0_2_n: 2672
> >      irq_chan0_6_n: 2439
> >      irq_chan0_10_n: 2672
> >      irq_chan0_11_n: 46
> >      irq_chan0_15_n: 2672
> >      irq_chan2_2_n: 2670
> >      irq_chan2_10_n: 2670
> >      irq_chan3_2_n: 2672
> >      irq_chan3_10_n: 2672
> 
> So channel 0 is responsible for 2672 normal interrupts. Again, this
> reinforces that the other values with 2672 are likely not significant.
> 
> > Given the enabled interrupts, I agree that the counters are misleading,
> > as none of the interrupt bits with high counts are enabled. I'm however
> > not entirely sure about the MTL interrupt status register, it's not
> > clear to me if it is wired to the EQOS IRQ line as I don't see a
> > corresponding interrupt enable register.
> > 
> > If we rule out the main EQOS IRQ line and the per-channel RX and TX IRQ
> > lines as the source of the interrupt storm, the last possible culprit
> > according to section 7.1.2 (A53 Interrupts) of the i.MX8MP reference
> > manual would be the "ENET QOS TSN LPI RX exit Interrupt" that is OR'ed
> > into IRQ 135. As that's related to EEE, it's a probable culprit, but I
> > don't know what controls that IRQ line.
> 
> As you have several interrupt signals which presumably show up in
> /proc/interrupts, do the values in your IRQ counters correspond with
> those interrupt sources? Are any of these interrupts shared with
> anything else?

# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  9:          0          0          0          0    GICv3  25 Level     vgic
 11:       4587       5251       5038       5230    GICv3  30 Level     arch_timer
 12:          0          0          0          0    GICv3  27 Level     kvm guest vtimer
 14:       3953       7210       6374       5861    GICv3  79 Level     timer@306a0000
 15:          0          0          0          0    GICv3  60 Level     30880000.serial
 16:        173          0          0          0    GICv3  59 Level     30890000.serial
 17:          0          0          0          0    GICv3  61 Level     30a60000.serial
 18:          0          0          0          0    GICv3  36 Level     30370000.snvs:snvs-powerkey
 19:          0          0          0          0    GICv3  51 Level     rtc alarm
 20:          0          0          0          0    GICv3 110 Level     30280000.watchdog
 21:         52          0          0          0    GICv3  56 Level     mmc2
 23:          0          0          0          0    GICv3  23 Level     arm-pmu
 24:          0          0          0          0    GICv3 130 Level     imx8_ddr_perf_pmu
 30:          0          0          0          0 gpio-mxc   3 Edge      pca9450-irq
 72:          0          0          0          0 gpio-mxc  11 Edge      hym8563
 73:          0          0          0          0 gpio-mxc  12 Edge      30b50000.mmc cd
195:        810          0          0          0    GICv3  67 Level     30a20000.i2c
196:        140          0          0          0    GICv3  68 Level     30a30000.i2c
197:          0          0          0          0    GICv3  69 Level     30a40000.i2c
198:         35          0          0          0    GICv3  70 Level     30a50000.i2c
199:          0          0          0          0    GICv3 109 Level     30ae0000.i2c
200:    5930706          0          0          0    GICv3 167 Level     eth0
201:          0          0          0          0    GICv3 166 Level     eth0
202:        370          0          0          0    GICv3  55 Level     mmc1
203:          0          0          0          0    GICv3 181 Level     32f10108.usb
205:         81          0          0          0    GICv3  73 Level     xhci-hcd:usb1
206:          0          0          0          0    GICv3  34 Level     30bd0000.dma-controller
207:          0          0          0          0    GICv3  49 Level     32e40000.csi
208:          0          0          0          0    GICv3  35 Level     38000000.gpu
209:          0          0          0          0    GICv3  66 Level     30e00000.dma-controller
210:          0          0          0          0    GICv3  57 Level     38008000.gpu
211:          0          0          0          0    GICv3  45 Level     38500000.npu
212:          0          0          0          0    GICv3 132 Level     32e30000.dwe
213:          0          0          0          0 irqsteer   0 Level     32fd8000.hdmi
214:          0          0          0          0    GICv3 135 Level     30e10000.dma-controller
215:          0          0          0          0    GICv3 106 Level     rkisp1
216:          0          0          0          0 irqsteer   8 Level     imx-lcdif
217:          0          0          0          0    GICv3  39 Level     38300000.video-codec
218:          0          0          0          0    GICv3  40 Level     38310000.video-codec
IPI0:       587        430        859        896       Rescheduling interrupts
IPI1:      5548       7530       6814       7366       Function call interrupts
IPI2:         0          0          0          0       CPU stop interrupts
IPI3:         0          0          0          0       CPU stop NMIs
IPI4:      2410       3635       3487       3707       Timer broadcast interrupts
IPI5:      3554       4650       3986       3762       IRQ work interrupts
IPI6:         0          0          0          0       CPU backtrace interrupts
IPI7:         0          0          0          0       KGDB roundup interrupts
Err:          0

GICv3 167 is interrupt 135 from section 7.1.2.
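
For reference, the mapping is just the GIC SPI offset of 32. A minimal
Python sketch that pulls the numbers out of a /proc/interrupts line
(parse_gic_line() is a made-up helper, and it assumes the four-CPU GICv3
layout shown above):

```python
GIC_SPI_BASE = 32  # on Arm GIC, SPI n has interrupt ID n + 32


def parse_gic_line(line, ncpus=4):
    """Split one GICv3 line of /proc/interrupts into its fields."""
    fields = line.split()
    counts = [int(c) for c in fields[1:1 + ncpus]]
    hwirq = int(fields[2 + ncpus])   # field after the "GICv3" chip name
    return {
        "linux_irq": int(fields[0].rstrip(":")),
        "total": sum(counts),
        "hwirq": hwirq,
        "spi": hwirq - GIC_SPI_BASE,
        "name": fields[-1],
    }


line = "200:    5930706          0          0          0    GICv3 167 Level     eth0"
print(parse_gic_line(line)["spi"])  # prints 135
```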

> Hmm, looking at 7.1.2, and the mention of "ENET QOS TSN LPI RX exit
> Interrupt" I'm wondering whether Freescale have wired the lpi_intr_o
> signal of the GMAC to their OR4 gate. This is the LPI RX exit
> interrupt output, and it is cleared when reading the LPI control/
> status register. However, its deassertion is synchronous to the RX
> clock domain, so it will take time to clear.

I think we're getting somewhere... All the data above confirm this
hypothesis in my opinion (or at least they rule out all the other
hypotheses I had).

Fugang, Joakim, Wei, Yannick, would you be able to check if the
lpi_intr_o signal is indeed OR'ed into interrupt 135 ? Are you aware of
the issue investigated in this mail thread ?

> The purpose of this signal is to trigger to external hardware (to the
> GMAC) to restore the application clock to the MAC. I'm not sure that
> this was meant to be wired to an actual CPU interrupt. The only clue
> is the name which suggests it is, but there's nothing that states
> there's a way to disable it being asserted which makes me more
> suspicious that it's not meant to be a CPU interrupt.

I've modified dwmac4_irq_status() to read GMAC4_LPI_CTRL_STATUS
unconditionally, and the problem persists. This could be explained by
the fact that lpi_intr_o takes time to clear as you mentioned.

Now I'm exploring unknown territory, this may be a stupid hypothesis,
but what if:

- The PHY exits LPI mode, and restarts generating the RX clock (clk_rx_i).
- The MAC detects exit from LPI, and asserts lpi_intr_o.
- Before the CPU has time to process the interrupt, the PHY enters LPI
  mode again, and stops generating the RX clock.
- The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
  register. This does not clear lpi_intr_o as there's no clk_rx_i.
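
The sequence above can be sketched as a toy Python state machine. This is
purely a model of the hypothesis, not of the real hardware: the class and
method names are made up, and it assumes that the clear requested by the
register read only takes effect while the RX clock is running, and that
the level-triggered line re-enters the handler for as long as it stays
asserted.

```python
class LpiIntrModel:
    """Hypothetical model: lpi_intr_o deasserts only with clk_rx_i running."""

    def __init__(self):
        self.rx_clk_running = True
        self.lpi_intr_o = False
        self.clear_requested = False

    def _rx_clock_edge(self):
        # The clear is synchronous to the RX clock domain (assumption).
        if self.rx_clk_running and self.clear_requested:
            self.lpi_intr_o = False
            self.clear_requested = False

    def phy_enter_lpi(self):
        self.rx_clk_running = False   # PHY stops generating the RX clock

    def phy_exit_lpi(self):
        self.rx_clk_running = True
        self._rx_clock_edge()         # a pending clear completes first
        self.lpi_intr_o = True        # then the exit event asserts the line

    def cpu_read_lpi_status(self):
        self.clear_requested = True
        self._rx_clock_edge()


def handler_runs_until_quiet(model, limit=1000):
    """Number of handler invocations before the level IRQ deasserts."""
    runs = 0
    while model.lpi_intr_o and runs < limit:
        model.cpu_read_lpi_status()
        runs += 1
    return runs
```

In this model, an LPI exit handled while the RX clock is still running
costs a single handler invocation, whereas an exit followed by a quick
re-entry into LPI keeps the handler spinning until the limit is reached.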

> So, maybe this is the cause of the interrupt storm. Maybe Kieran isn't
> seeing the storm because his receive path is not entering LPI.

Kieran told me he will perform more tests, but ran out of time this
week.

> I think a useful check for this would be if you could either disable
> LPI entry at the link partner, or hook it up to another system which
> can have tx_lpi disabled, and see how the iMX8 system behaves.

I tried that with my RTL8153 USB-ethernet adapter, but I don't think I
can really trust the result. The device doesn't respond to `ethtool
--set-eee` in an expected way, it got stuck with LPI completely disabled
and I had to disconnect and reconnect it to recover from that.

I have another USB-ethernet adapter that doesn't support EEE, and no second
i.MX8MP system I could use for testing right now. I'll see if I can find
suitable hardware, but it may take a while (I'm about to go on a trip
abroad).

> If preventing the iMX8 receive path entering LPI fixes the problem,
> then I think this is likely the culprit.
> 
> However, I'd be worried about this - if we "disable LPI" by way of
> the advertisement at the local end, there is the possibility that a
> remote system could override the negotiation and force its transmit
> link into LPI mode, which would cause the iMX8MP receive side to see
> LPI entry and exit, triggering this interrupt. If this is correct,
> that gives an attacker a way to manipulate the iMX8MP system,
> potentially causing all sorts of problems.
> 
> Hmm. Not sure I like the look of that.

I'm sure I don't like it :-/

> If this hypothesis is correct, then yes, disabling EEE is the only
> way forward for this, but I would suggest going further - ensuring
> that SmartEEE is enabled on the PHY but with the advertisement
> cleared (so EEE negotiation indicates not supported) to block the
> receive side LPI getting to the EQOS.

I'm not sure how that should be implemented, I'd appreciate guidance. In
particular, the RTL8211E appears to support SmartEEE (based on the
information provided in this mail thread), but the registers to control
it are not documented. Maybe we can just rely on the fact it will be
enabled as a reset default at boot time.

> This also means that 100M EEE would also be affected, so just
> disabling 1G EEE in DT is insufficient.

Agreed. I've just tested forcing 100BaseT with EEE enabled, and the
issue persists.

> Andrew - if we need to go down this path, I think we need a flag in
> the PHY flags to indicate that we want SmartEEE enabled.

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-14 22:26                   ` Laurent Pinchart
@ 2025-11-18  1:50                     ` Wei Fang
  2025-11-22  7:22                       ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Wei Fang @ 2025-11-18  1:50 UTC (permalink / raw)
  To: Laurent Pinchart, Clark Wang
  Cc: Oleksij Rempel, Emanuele Ghidoli, devicetree@vger.kernel.org,
	imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
	Daniel Scally, Kieran Bingham, Stefan Klug, Conor Dooley,
	Fabio Estevam, Krzysztof Kozlowski, Pengutronix Kernel Team,
	Rob Herring, Sascha Hauer, Shawn Guo, Russell King (Oracle)

Sorry, I only have a little experience with DWMAC, so I'm adding Clark to
help look at this issue.

> Dropping Catalin Popescu from CC as his e-mail address bounces, and adding
> Fugang Duan, Joakim Zhang, Wei Fang and Yannick Vignong from NXP who have
> worked on upstream i.MX8MP support in the driver.
> 
> Fugang, Joakim, Wei and Yannick, there's a question for you below.
> 
> On Thu, Nov 13, 2025 at 10:59:23AM +0000, Russell King (Oracle) wrote:
> > On Thu, Nov 13, 2025 at 03:06:27AM +0200, Laurent Pinchart wrote:
> > > On Thu, Nov 13, 2025 at 12:25:52AM +0200, Laurent Pinchart wrote:
> > > > On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> > > > > On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > > > > > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > > > > > I didn't notice it at the time because my board was
> > > > > > > connected to a switch that didn't support EEE.
> > > > > >
> > > > > > I can confirm that reverting that commit makes the issue
> > > > > > disappear. So we're dealing with an interrupt storm that
> > > > > > occurs when all three of the following conditions are true:
> > > > > >
> > > > > > - cpu-pd-wait is enabled
> > > > > > - EEE is enabled
> > > > > > - the peer also supports EEE
> > > > >
> > > > > Thanks - overall, please take the statistics and interrupt
> > > > > status bits with a pinch of salt - I suspect there are cases
> > > > > where the interrupt is not actually enabled, and the code
> > > > > doesn't take action to clear down a set status bit, but _does_
> > > > > count it - so every interrupt that happens increments the counter.
> > > >
> > > > True. To (partly) avoid that, I've dropped the line that discards
> > > > disabled bits in dwmac4_irq_status():
> > > >
> > > >  	/* Discard disabled bits */
> > > > -	intr_status &= intr_enable;
> > > >
> > > > to ensure that all bits are processed and cleared. I then didn't
> > > > see any high count of any of the GMAC_INT_STATUS interrupts. For
> > > > MTL_INTERRUPT_STATUS it's a bit different, as by default only one
> > > > queue is processed.
> > > >
> > > > > > Furthermore, I tried counting bits from all the interrupt
> > > > > > status registers I could find. The count of
> > > > > > > MTL_INTERRUPT_STATUS Q0IS to Q4IS bits is very high, and so
> > > > > > > are the DMA_CH0_STATUS TBU and ETI bits.
> > > > >
> > > > > TBU means that the transmitter found that the next buffer was
> > > > > owned by the "application" rather than the hardware, which would
> > > > > be normal after getting to the end of the queued packets.
> > > > >
> > > > > ETI means that a packet has been transferred into MTL memory,
> > > > > and thus would occur for every transmitted packet.
> > > > >
> > > > > Having dug into the imx8m documentation and the driver this
> > > > > morning, I don't think TBU and ETI are the source of the
> > > > > interrupt storm. Their corresponding interrupt enable bits are
> > > > > > DMA_CHAN_INTR_ENA_TBUE and DMA_CHAN_INTR_ENA_ETE (driver
> > > > > > names). Both of these only appear in a header file - the code
> > > > > > never enables these interrupts. So, TBU and ETI should not be
> > > > > > causing an interrupt storm.
> > > > >
> > > > > As for QxIS, stmmac_common_interrupt() will iterate over the
> > > > > queues in use, calling stmmac_host_mtl_irq_status() aka
> > > > > dwmac4_irq_mtl_status() for each. Only if this happens will
> > > > > MTL_CHAN_INT_CTRL() be read which clears the status bit. In
> > > > > > other words, if e.g. Q1IS is set, but only one queue is being
> > > > > > used, dwmac4_irq_mtl_status() won't be called for queue 1, and
> > > > > > thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.
> > > >
> > > > That's why I tried to enable all 5 queues in DT, but alas, it
> > > > didn't help. I'll try again and count all possible interrupts.
> > >
> > > Here's my debug patch (not very pretty, sorry about that):
> >
> > That's fine. Thanks for providing this and the raw data.
> >
> > > Here are the corresponding stats captured right after booting to
> > > userspace, with the 0 counts stripped off to keep the output readable:
> > >
> > >      irq_gmac_0_n: 1
> >
> > RSGMIIS, disabled, cleared by read of MAC_PHYIF_CONTROL_STATUS.
> >
> > >      irq_gmac_5_n: 4047
> >
> > LPIIS, enabled, cleared by read of LPI_CONTROL_STATUS which is done.
> >
> > >      irq_gmac_18_n: 46
> >
> > MDIOIS, disabled, clear on read of _this_ status register
> >
> > >      irq_mtl0_n: 2244307
> >
> > This will increment each time dwmac4_irq_mtl_status() is called for
> > channel 0, which will be called each time stmmac_common_interrupt() is
> > called. Thus, this indicates the total number of times the stmmac
> > interrupt handler has been called.
> 
> Yes, my goal with the irq_mtlX_n counters was to check for which
> channels/queues the dwmac4_irq_mtl_status() was called.
> 
> > >      irq_mtl_0_n: 2244307
> > >      irq_mtl_1_n: 2244307
> > >      irq_mtl_2_n: 2244307
> > >      irq_mtl_3_n: 2244307
> > >      irq_mtl_4_n: 2244307
> >
> > These should be cleared by reading the corresponding queue interrupt
> > control/status register, iow MTL_CHAN_INT_CTRL(). However, we do not
> > write to MTL_CHAN_INT_CTRL() to enable any of the interrupts there, so
> > this looks weird to me, so it would be an idea to look at what value
> > this MTL_CHAN_INT_CTRL() register contains, it may provide something
> > useful, but I actually suspect it's another red herring.
> 
> All the MTL_CHAN_INT_CTRL() registers read as 0x00000002, so the interrupts
> are not enabled.
> 
> > >      irq_chan0_n: 2244307
> >
> > Similarly to irq_mtl0_n, this will increment each time
> > dwmac4_dma_interrupt() is called for channel 0, which will be via
> > stmmac_napi_check(), stmmac_dma_interrupt() and
> > stmmac_common_interrupt(). Therefore, it is expected to have the same
> > value as irq_mtl0_n.
> >
> > >      irq_chan0_0_n: 333
> > >      irq_chan0_2_n: 2244307
> > >      irq_chan0_6_n: 2769
> > >      irq_chan0_10_n: 2244307
> > >      irq_chan0_11_n: 2799
> > >      irq_chan0_15_n: 2701
> >
> > Only interrupts 0, 6, 12, 14 and 15 are enabled. Status bits in this
> > register require '1' to be written to clear them. As the value written
> > back is the status that was read masked by the interrupt enable, if
> > bits 2 or 10 are set, they will never be cleared, so will increment
> > each and every time stmmac_common_interrupt() is called. Therefore,
> > these values are not significant.
> 
> I've commented out the masking in dwmac4_dma_interrupt(), and the counters
> show that bits 2 and 10 were indeed not significant:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_5_n: 3846
>      irq_gmac_18_n: 59
>      irq_mtl0_n: 2189598
>      irq_mtl_0_n: 2189598
>      irq_mtl_1_n: 2189598
>      irq_mtl_2_n: 2189598
>      irq_mtl_3_n: 2189598
>      irq_mtl_4_n: 2189598
>      irq_chan0_n: 2189598
>      irq_chan0_0_n: 258
>      irq_chan0_2_n: 2680
>      irq_chan0_6_n: 2660
>      irq_chan0_10_n: 2682
>      irq_chan0_11_n: 1659
>      irq_chan0_15_n: 2598
>      irq_tx_path_in_lpi_mode_n: 6
>      irq_tx_path_exit_lpi_mode_n: 6
>      irq_rx_path_in_lpi_mode_n: 2012
>      irq_rx_path_exit_lpi_mode_n: 2009
>      irq_rgmii_n: 1
>      rx_normal_irq_n: 2660
>      tx_normal_irq_n: 258
>      normal_irq_n: 4577
>      q0_tx_irq_n: 258
>      q0_rx_irq_n: 2660
> 
> There is still an interrupt storm, as shown by bits Q0IS to Q4IS in
> MTL_INTERRUPT_STATUS. Those bits are documented in the i.MX8MP RM as
> 
>   Queue 0 Interrupt status
> 
>   This bit indicates that there is an interrupt from Queue 0. To reset
>   this bit, the application must read Queue 0 Interrupt Control and
>   Status register to get the exact cause of the interrupt and clear its
>   source.
> 
> I've added counters for the MTL_CHAN_INT_CTRL() registers bits in
> dwmac4_irq_mtl_status():
> 
>      irq_gmac_0_n: 1
>      irq_gmac_5_n: 4088
>      irq_gmac_18_n: 70
>      irq_mtl0_n: 2279161
>      irq_mtl_0_n: 2279161
>      irq_mtl_1_n: 2279161
>      irq_mtl_2_n: 2279161
>      irq_mtl_3_n: 2279161
>      irq_mtl_4_n: 2279161
>      irq_mtl_chan0_1_n: 2279161
>      irq_chan0_n: 2279161
>      irq_chan0_0_n: 269
>      irq_chan0_2_n: 2874
>      irq_chan0_6_n: 2754
>      irq_chan0_10_n: 2871
>      irq_chan0_11_n: 1793
>      irq_chan0_15_n: 2749
>      irq_tx_path_in_lpi_mode_n: 13
>      irq_tx_path_exit_lpi_mode_n: 13
>      irq_rx_path_in_lpi_mode_n: 2112
>      irq_rx_path_exit_lpi_mode_n: 2111
>      irq_rgmii_n: 1
>      rx_normal_irq_n: 2754
>      tx_normal_irq_n: 269
>      normal_irq_n: 4816
>      q0_tx_irq_n: 269
>      q0_rx_irq_n: 2754
> 
> I've then modified dwmac4_irq_mtl_status() to write back the status value to
> MTL_CHAN_INT_CTRL() unconditionally:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_5_n: 4429
>      irq_gmac_18_n: 96
>      irq_mtl0_n: 5165861
>      irq_mtl_0_n: 5212
>      irq_mtl_1_n: 5165861
>      irq_mtl_2_n: 5165861
>      irq_mtl_3_n: 5165861
>      irq_mtl_4_n: 5165861
>      irq_mtl_chan0_1_n: 5212
>      irq_chan0_n: 5165861
>      irq_chan0_0_n: 274
>      irq_chan0_2_n: 2965
>      irq_chan0_6_n: 2858
>      irq_chan0_10_n: 2965
>      irq_chan0_11_n: 1899
>      irq_chan0_15_n: 2838
>      irq_tx_path_in_lpi_mode_n: 6
>      irq_tx_path_exit_lpi_mode_n: 6
>      irq_rx_path_in_lpi_mode_n: 2364
>      irq_rx_path_exit_lpi_mode_n: 2363
>      irq_rgmii_n: 1
>      rx_normal_irq_n: 2858
>      tx_normal_irq_n: 274
>      normal_irq_n: 5031
>      q0_tx_irq_n: 274
>      q0_rx_irq_n: 2858
> 
> As expected, that clears the interrupt source for Q0IS, so irq_mtl_chan0_1_n
> and irq_mtl_0_n are now under control.
> 
> Enabling support for 5 channels in DT:
> 
>      irq_gmac_0_n: 1
>      irq_gmac_5_n: 4993
>      irq_gmac_18_n: 74
>      irq_mtl0_n: 3084994
>      irq_mtl1_n: 3084994
>      irq_mtl2_n: 3084994
>      irq_mtl3_n: 3084994
>      irq_mtl4_n: 3084994
>      irq_mtl_0_n: 5433
>      irq_mtl_1_n: 9272
>      irq_mtl_2_n: 13218
>      irq_mtl_3_n: 17084
>      irq_mtl_4_n: 21010
>      irq_mtl_chan0_0_n: 1
>      irq_mtl_chan0_1_n: 4401
>      irq_mtl_chan0_16_n: 1
>      irq_mtl_chan1_1_n: 4401
>      irq_mtl_chan2_1_n: 4401
>      irq_mtl_chan3_1_n: 4401
>      irq_mtl_chan4_1_n: 4401
>      irq_chan0_n: 3084994
>      irq_chan1_n: 3084994
>      irq_chan2_n: 3084994
>      irq_chan3_n: 3084994
>      irq_chan4_n: 3084994
>      irq_chan0_0_n: 266
>      irq_chan0_2_n: 2923
>      irq_chan0_6_n: 2809
>      irq_chan0_10_n: 2925
>      irq_chan0_11_n: 2203
>      irq_chan0_15_n: 2738
>      irq_chan1_2_n: 3
>      irq_chan1_10_n: 3
>      irq_chan2_2_n: 1
>      irq_chan2_10_n: 1
>      irq_chan3_2_n: 8
>      irq_chan3_10_n: 8
>      irq_chan4_2_n: 2
>      irq_chan4_10_n: 2
>      irq_tx_path_in_lpi_mode_n: 6
>      irq_tx_path_exit_lpi_mode_n: 6
>      irq_rx_path_in_lpi_mode_n: 2633
>      irq_rx_path_exit_lpi_mode_n: 2632
>      irq_rgmii_n: 1
>      rx_normal_irq_n: 2809
>      tx_normal_irq_n: 266
>      normal_irq_n: 5278
>      q0_tx_irq_n: 266
>      q0_rx_irq_n: 2809
> 
> There are no more storms in interrupt bit counters. The only counters that are
> out of control are irq_mtlX_n and irq_chanX_n, as expected, as they simply
> count the number of times the IRQ handling functions are called.
> 
> Unless we're missing some interrupt sources in other registers, I think this
> indicates that the storm is not caused by the sbd_intr_o or
> sbd_perch_[rt]x_intr_o signals. lpi_intr_o seems the most likely culprit at this
> point (more on that below).
> 
> > > Here are the stats after enabling five queues in DT, also captured
> > > right after booting to userspace:
> > >
> > >      irq_gmac_0_n: 1
> > >      irq_gmac_5_n: 4020
> > >      irq_gmac_18_n: 41
> > >      irq_mtl0_n: 1286469
> > >      irq_mtl1_n: 1286469
> > >      irq_mtl2_n: 1286469
> > >      irq_mtl3_n: 1286469
> > >      irq_mtl4_n: 1286469
> > >      irq_mtl_0_n: 6432345
> > >      irq_mtl_1_n: 6432345
> > >      irq_mtl_2_n: 6432345
> > >      irq_mtl_3_n: 6432345
> > >      irq_mtl_4_n: 6432345
> >
> > These values are the sum of irq_mtl[0-4]_n, so would be expected given
> > the other numbers.
> >
> > >      irq_chan0_n: 1286469
> > >      irq_chan1_n: 1286469
> > >      irq_chan2_n: 1286469
> > >      irq_chan3_n: 1286469
> > >      irq_chan4_n: 1286469
> > >      irq_chan0_0_n: 416
> > >      irq_chan0_2_n: 1286466
> > >      irq_chan0_6_n: 3470
> > >      irq_chan0_10_n: 1286466
> > >      irq_chan0_11_n: 2740
> > >      irq_chan0_15_n: 2686
> > >      irq_chan1_2_n: 1286469
> > >      irq_chan1_10_n: 1286469
> > >      irq_chan2_2_n: 1286467
> > >      irq_chan2_10_n: 1286467
> > >      irq_chan4_2_n: 1286469
> > >      irq_chan4_10_n: 1286469
> >
> > > It's slightly interesting that irq_chanX_2_n and irq_chanX_10_n don't
> > > match their corresponding irq_chanX_n values, which implies that they
> > > have been cleared. Given that the difference is only 0, 2 or 3 counts,
> > > it's likely due to the first few packets, before these bits had been
> > > set. So again, I don't think TBU and ETI are significant.
> >
> > > Setting eee-broken-1000t, with a single queue:
> > >
> > >      irq_gmac_0_n: 1
> > >      irq_gmac_18_n: 6
> > >      irq_mtl0_n: 2548
> > >      irq_mtl_0_n: 2548
> > >      irq_mtl_1_n: 2548
> > >      irq_mtl_2_n: 2548
> > >      irq_mtl_3_n: 2548
> > >      irq_mtl_4_n: 2548
> > >      irq_chan0_n: 2548
> > >      irq_chan0_0_n: 282
> > >      irq_chan0_2_n: 2548
> > >      irq_chan0_6_n: 2324
> > >      irq_chan0_10_n: 2548
> > >      irq_chan0_11_n: 29
> > >      irq_chan0_15_n: 2548
> >
> > These counts suggest that the interrupt handler was entered 2548 times
> > at the point they were captured, which corresponds to "normal"
> > interrupts for channel 0.
> >
> > >
> > > And eee-broken-1000t with 5 queues:
> > >
> > >      irq_gmac_0_n: 1
> > >      irq_gmac_18_n: 8
> > >      irq_mtl0_n: 2672
> > >      irq_mtl1_n: 2672
> > >      irq_mtl2_n: 2672
> > >      irq_mtl3_n: 2672
> > >      irq_mtl4_n: 2672
> > >      irq_mtl_0_n: 13360
> > >      irq_mtl_1_n: 13360
> > >      irq_mtl_2_n: 13360
> > >      irq_mtl_3_n: 13360
> > >      irq_mtl_4_n: 13360
> > >      irq_chan0_n: 2672
> > >      irq_chan1_n: 2672
> > >      irq_chan2_n: 2672
> > >      irq_chan3_n: 2672
> > >      irq_chan4_n: 2672
> > >      irq_chan0_0_n: 283
> > >      irq_chan0_2_n: 2672
> > >      irq_chan0_6_n: 2439
> > >      irq_chan0_10_n: 2672
> > >      irq_chan0_11_n: 46
> > >      irq_chan0_15_n: 2672
> > >      irq_chan2_2_n: 2670
> > >      irq_chan2_10_n: 2670
> > >      irq_chan3_2_n: 2672
> > >      irq_chan3_10_n: 2672
> >
> > > So channel 0 is responsible for 2672 normal interrupts. Again, this
> > reinforces that the other values with 2672 are likely not significant.
> >
> > > Given the enabled interrupts, I agree that the counters are
> > > misleading, as none of the interrupt bits with high counts are
> > > enabled. I'm however not entirely sure about the MTL interrupt
> > > status register, it's not clear to me if it is wired to the EQOS IRQ
> > > line as I don't see a corresponding interrupt enable register.
> > >
> > > If we rule out the main EQOS IRQ line and the per-channel RX and TX
> > > IRQ lines as the source of the interrupt storm, the last possible
> > > culprit according to section 7.1.2 (A53 Interrupts) of the i.MX8MP
> > > reference manual would be the "ENET QOS TSN LPI RX exit Interrupt"
> > > that is OR'ed into IRQ 135. As that's related to EEE, it's a
> > > > probable culprit, but I don't know what controls that IRQ line.
> >
> > As you have several interrupt signals which presumably show up in
> > /proc/interrupts, do the values in your IRQ counters correspond with
> > those interrupt sources? Are any of these interrupts shared with
> > anything else?
> 
> # cat /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   9:          0          0          0          0    GICv3  25 Level     vgic
>  11:       4587       5251       5038       5230    GICv3  30 Level     arch_timer
>  12:          0          0          0          0    GICv3  27 Level     kvm guest vtimer
>  14:       3953       7210       6374       5861    GICv3  79 Level     timer@306a0000
>  15:          0          0          0          0    GICv3  60 Level     30880000.serial
>  16:        173          0          0          0    GICv3  59 Level     30890000.serial
>  17:          0          0          0          0    GICv3  61 Level     30a60000.serial
>  18:          0          0          0          0    GICv3  36 Level     30370000.snvs:snvs-powerkey
>  19:          0          0          0          0    GICv3  51 Level     rtc alarm
>  20:          0          0          0          0    GICv3 110 Level     30280000.watchdog
>  21:         52          0          0          0    GICv3  56 Level     mmc2
>  23:          0          0          0          0    GICv3  23 Level     arm-pmu
>  24:          0          0          0          0    GICv3 130 Level     imx8_ddr_perf_pmu
>  30:          0          0          0          0 gpio-mxc   3 Edge      pca9450-irq
>  72:          0          0          0          0 gpio-mxc  11 Edge      hym8563
>  73:          0          0          0          0 gpio-mxc  12 Edge      30b50000.mmc cd
> 195:        810          0          0          0    GICv3  67 Level     30a20000.i2c
> 196:        140          0          0          0    GICv3  68 Level     30a30000.i2c
> 197:          0          0          0          0    GICv3  69 Level     30a40000.i2c
> 198:         35          0          0          0    GICv3  70 Level     30a50000.i2c
> 199:          0          0          0          0    GICv3 109 Level     30ae0000.i2c
> 200:    5930706          0          0          0    GICv3 167 Level     eth0
> 201:          0          0          0          0    GICv3 166 Level     eth0
> 202:        370          0          0          0    GICv3  55 Level     mmc1
> 203:          0          0          0          0    GICv3 181 Level     32f10108.usb
> 205:         81          0          0          0    GICv3  73 Level     xhci-hcd:usb1
> 206:          0          0          0          0    GICv3  34 Level     30bd0000.dma-controller
> 207:          0          0          0          0    GICv3  49 Level     32e40000.csi
> 208:          0          0          0          0    GICv3  35 Level     38000000.gpu
> 209:          0          0          0          0    GICv3  66 Level     30e00000.dma-controller
> 210:          0          0          0          0    GICv3  57 Level     38008000.gpu
> 211:          0          0          0          0    GICv3  45 Level     38500000.npu
> 212:          0          0          0          0    GICv3 132 Level     32e30000.dwe
> 213:          0          0          0          0 irqsteer   0 Level     32fd8000.hdmi
> 214:          0          0          0          0    GICv3 135 Level     30e10000.dma-controller
> 215:          0          0          0          0    GICv3 106 Level     rkisp1
> 216:          0          0          0          0 irqsteer   8 Level     imx-lcdif
> 217:          0          0          0          0    GICv3  39 Level     38300000.video-codec
> 218:          0          0          0          0    GICv3  40 Level     38310000.video-codec
> IPI0:       587        430        859        896       Rescheduling interrupts
> IPI1:      5548       7530       6814       7366       Function call interrupts
> IPI2:         0          0          0          0       CPU stop interrupts
> IPI3:         0          0          0          0       CPU stop NMIs
> IPI4:      2410       3635       3487       3707       Timer broadcast interrupts
> IPI5:      3554       4650       3986       3762       IRQ work interrupts
> IPI6:         0          0          0          0       CPU backtrace interrupts
> IPI7:         0          0          0          0       KGDB roundup interrupts
> Err:          0
> 
> GICv3 167 is interrupt 135 from section 7.1.2.
> 
> > Hmm, looking at 7.1.2, and the mention of "ENET QOS TSN LPI RX exit
> > Interrupt" I'm wondering whether Freescale have wired the lpi_intr_o
> > signal of the GMAC to their OR4 gate. This is the LPI RX exit
> > interrupt output, and it is cleared when reading the LPI control/
> > status register. However, its deassertion is synchronous to the RX
> > clock domain, so it will take time to clear.
> 
> I think we're getting somewhere... All the data above confirm this hypothesis in
> my opinion (or at least they rule out all the other hypotheses I had).
> 
> Fugang, Joakim, Wei, Yannick, would you be able to check if the lpi_intr_o signal
> is indeed OR'ed into interrupt 135 ? Are you aware of the issue investigated in
> this mail thread ?
> 
> > The purpose of this signal is to trigger to external hardware (to the
> > GMAC) to restore the application clock to the MAC. I'm not sure that
> > this was meant to be wired to an actual CPU interrupt. The only clue
> > is the name which suggests it is, but there's nothing that states
> > there's a way to disable it being asserted which makes me more
> > suspicious that it's not meant to be a CPU interrupt.
> 
> I've modified dwmac4_irq_status() to read GMAC4_LPI_CTRL_STATUS
> unconditionally, and the problem persists. This could be explained by the fact
> that lpi_intr_o takes time to clear as you mentioned.
> 
> Now I'm exploring unknown territory, this may be a stupid hypothesis, but what
> if:
> 
> - The PHY exits LPI mode, and restarts generating the RX clock (clk_rx_i).
> - The MAC detects exit from LPI, and asserts lpi_intr_o.
> - Before the CPU has time to process the interrupt, the PHY enters LPI
>   mode again, and stops generating the RX clock.
> - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
>   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> 
> > So, maybe this is the cause of the interrupt storm. Maybe Kieran isn't
> > seeing the storm because his receive path is not entering LPI.
> 
> Kieran told me he will perform more tests, but ran out of time this week.
> 
> > I think a useful check for this would be if you could either disable
> > LPI entry at the link partner, or hook it up to another system which
> > can have tx_lpi disabled, and see how the iMX8 system behaves.
> 
> I tried that with my RTL8153 USB-ethernet adapter, but I don't think I can really
> trust the result. The device doesn't respond to `ethtool --set-eee` in an expected
> way, it got stuck with LPI completely disabled and I had to disconnect and
> reconnect it to recover from that.
> 
> I have another USB-ethernet adapter that doesn't support EEE, and no second
> i.MX8MP system I could use for testing right now. I'll see if I can find suitable
> hardware, but it may take a while (I'm about to go on a trip abroad).
> 
> > If preventing the iMX8 receive path entering LPI fixes the problem,
> > then I think this is likely the culprit.
> >
> > However, I'd be worried about this - if we "disable LPI" by way of the
> > advertisement at the local end, there is the possibility that a remote
> > system could override the negotiation and force its transmit link into
> > LPI mode, which would cause the iMX8MP receive side to see LPI entry
> > and exit, triggering this interrupt. If this is correct, that gives an
> > attacker a way to manipulate the iMX8MP system, potentially causing
> > all sorts of problems.
> >
> > Hmm. Not sure I like the look of that.
> 
> I'm sure I don't like it :-/
> 
> > If this hypothesis is correct, then yes, disabling EEE is the only way
> > forward for this, but I would suggest going further - ensuring that
> > SmartEEE is enabled on the PHY but with the advertisement cleared (so
> > EEE negotiation indicates not supported) to block the receive side LPI
> > getting to the EQOS.
> 
> I'm not sure how that should be implemented, I'd appreciate guidance. In
> particular, the RTL8211E appears to support SmartEEE (based on the
> information provided in this mail thread), but the registers to control it are not
> documented. Maybe we can just rely on the fact it will be enabled as a reset
> default at boot time.
> 
> > This also means that 100M EEE would also be affected, so just
> > disabling 1G EEE in DT is insufficient.
> 
> Agreed. I've just tested forcing 100BaseT with EEE enabled, and the issue
> persists.
> 
> > Andrew - if we need to go down this path, I think we need a flag in
> > the PHY flags to indicate that we want SmartEEE enabled.
> 
> --
> Regards,
> 
> Laurent Pinchart

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-18  1:50                     ` Wei Fang
@ 2025-11-22  7:22                       ` Laurent Pinchart
  2025-11-22  9:57                         ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-22  7:22 UTC (permalink / raw)
  To: Wei Fang
  Cc: Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo, Russell King (Oracle)

Hello Wei,

On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> Sorry, I only have a little experience with DWMAC, so I'm adding Clark to
> help look at this issue.

Thank you.

I think we're getting close to having a good understanding of the
problem. I've debugged it as far as I could based on the information
available publicly. Let's try to get to the bottom of this issue, as it
impacts quite a lot of people and it would be very nice to fix it
properly in mainline.

The short summary is that I'm experiencing an interrupt storm on IRQ 135
when EEE is enabled with the EQOS interface.
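
To make the numbering explicit: IRQ 135 is the number from section 7.1.2
of the reference manual, while /proc/interrupts reports the same line as
GICv3 167. A minimal sketch of the conversion, assuming the standard GIC
layout where shared peripheral interrupts start at INTID 32 (the helper
names below are made up for illustration, they are not kernel functions):

```c
#include <assert.h>
#include <stdint.h>

/* On an Arm GIC, INTIDs 0-31 are SGIs and PPIs; SPIs start at INTID 32. */
#define GIC_SPI_BASE 32u

/* Hypothetical helper: interrupt number from the SoC table -> GIC INTID. */
static inline uint32_t spi_to_intid(uint32_t spi)
{
	return spi + GIC_SPI_BASE;
}

/* Hypothetical helper: GIC INTID from /proc/interrupts -> SoC table number. */
static inline uint32_t intid_to_spi(uint32_t intid)
{
	assert(intid >= GIC_SPI_BASE);	/* SGIs and PPIs have no SPI number */
	return intid - GIC_SPI_BASE;
}
```

With this mapping, the eth0 line reported as GICv3 167 corresponds to
interrupt 135, and the second eth0 line (GICv3 166) to interrupt 134.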

My current theory is that

- The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
- The issue is triggered by the PHY exiting LPI mode.
- When it exits LPI mode, the PHY restarts generating the RX clock
  (clk_rx_i).
- The MAC detects exit from LPI, and asserts lpi_intr_o.
- Before the CPU has time to process the interrupt, the PHY enters LPI
  mode again, and stops generating the RX clock.
- The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
  registers. This does not clear lpi_intr_o as there's no clk_rx_i.
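
The sequence above can be sketched as a small toy model. This is only an
illustration of the hypothesis, not a description of the actual hardware:
the structure, the function names and the rule that the register read
only deasserts lpi_intr_o while clk_rx_i is running are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the hypothesis; not taken from the driver or the RTL. */
struct lpi_model {
	bool rx_clk_running;	/* PHY is generating clk_rx_i */
	bool lpi_intr_o;	/* level of the LPI interrupt output */
};

/* PHY leaves LPI: the RX clock restarts and the MAC asserts lpi_intr_o. */
static void phy_exit_lpi(struct lpi_model *m)
{
	m->rx_clk_running = true;
	m->lpi_intr_o = true;
}

/* PHY enters LPI again: the RX clock stops, the output keeps its level. */
static void phy_enter_lpi(struct lpi_model *m)
{
	m->rx_clk_running = false;
}

/*
 * CPU reads the LPI control/status register. Assumption: the deassertion
 * is synchronous to the RX clock, so it has no effect without clk_rx_i.
 */
static void cpu_read_lpi_status(struct lpi_model *m)
{
	if (m->rx_clk_running)
		m->lpi_intr_o = false;
}

/* Count handler invocations for a level interrupt, capped at max_irqs. */
static unsigned int run_handler(struct lpi_model *m, unsigned int max_irqs)
{
	unsigned int n = 0;

	assert(m);
	while (m->lpi_intr_o && n < max_irqs) {
		cpu_read_lpi_status(m);
		n++;
	}
	return n;
}
```

In this model, calling phy_exit_lpi() followed by run_handler() returns
after a single invocation, while calling phy_exit_lpi() and
phy_enter_lpi() before run_handler() keeps the handler running until the
cap is reached, which is what a storm on a level-triggered line would
look like.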

Would someone at NXP with access to internal documentation and/or the
RTL be able to confirm that lpi_intr_o is indeed OR'ed into IRQ 135 ?

> > Dropping Catalin Popescu from CC as his e-mail address bounces, and adding
> > Fugang Duan, Joakim Zhang, Wei Fang and Yannick Vignong from NXP who have
> > worked on upstream i.MX8MP support in the driver.
> > 
> > Fugang, Joakim, Wei and Yannick, there's a question for you below.
> > 
> > On Thu, Nov 13, 2025 at 10:59:23AM +0000, Russell King (Oracle) wrote:
> > > On Thu, Nov 13, 2025 at 03:06:27AM +0200, Laurent Pinchart wrote:
> > > > On Thu, Nov 13, 2025 at 12:25:52AM +0200, Laurent Pinchart wrote:
> > > > > On Wed, Nov 12, 2025 at 12:03:13PM +0000, Russell King (Oracle) wrote:
> > > > > > On Wed, Nov 12, 2025 at 01:54:34AM +0200, Laurent Pinchart wrote:
> > > > > > > On Tue, Oct 28, 2025 at 09:18:17AM +0200, Laurent Pinchart wrote:
> > > > > > > > I didn't notice it at the time because my board was
> > > > > > > > connected to a switch that didn't support EEE.
> > > > > > >
> > > > > > > I can confirm that reverting that commit makes the issue
> > > > > > > disappear. So we're dealing with an interrupt storm that
> > > > > > > occurs when all three of the following conditions are true:
> > > > > > >
> > > > > > > - cpu-pd-wait is enabled
> > > > > > > - EEE is enabled
> > > > > > > - the peer also supports EEE
> > > > > >
> > > > > > Thanks - overall, please take the statistics and interrupt
> > > > > > status bits with a pinch of salt - I suspect there are cases
> > > > > > where the interrupt is not actually enabled, and the code
> > > > > > doesn't take action to clear down a set status bit, but _does_
> > > > > > count it - so every interrupt that happens increments the counter.
> > > > >
> > > > > True. To (partly) avoid that, I've dropped the line that discards
> > > > > disabled bits in dwmac4_irq_status():
> > > > >
> > > > >  	/* Discard disabled bits */
> > > > > -	intr_status &= intr_enable;
> > > > >
> > > > > to ensure that all bits are processed and cleared. I then didn't
> > > > > see any high count of any of the GMAC_INT_STATUS interrupts. For
> > > > > MTL_INTERRUPT_STATUS it's a bit different, as by default only one
> > > > > queue is processed.
> > > > >
> > > > > > > Furthermore, I tried counting bits from all the interrupt
> > > > > > > status registers I could find. The count of
> > > > > > > > MTL_INTERRUPT_STATUS Q0IS to Q4IS bits is very high, and so
> > > > > > > > are the DMA_CH0_STATUS TBU and ETI bits.
> > > > > >
> > > > > > TBU means that the transmitter found that the next buffer was
> > > > > > owned by the "application" rather than the hardware, which would
> > > > > > be normal after getting to the end of the queued packets.
> > > > > >
> > > > > > ETI means that a packet has been transferred into MTL memory,
> > > > > > and thus would occur for every transmitted packet.
> > > > > >
> > > > > > Having dug into the imx8m documentation and the driver this
> > > > > > morning, I don't think TBU and ETI are the source of the
> > > > > > interrupt storm. Their corresponding interrupt enable bits are
> > > > > > > DMA_CHAN_INTR_ENA_TBUE and DMA_CHAN_INTR_ENA_ETE (driver
> > > > > > > names). Both of these only appear in a header file - the code
> > > > > > > never enables these interrupts. So, TBU and ETI should not be
> > > > > > > causing an interrupt storm.
> > > > > >
> > > > > > As for QxIS, stmmac_common_interrupt() will iterate over the
> > > > > > queues in use, calling stmmac_host_mtl_irq_status() aka
> > > > > > dwmac4_irq_mtl_status() for each. Only if this happens will
> > > > > > MTL_CHAN_INT_CTRL() be read which clears the status bit. In
> > > > > > > other words, if e.g. Q1IS is set, but only one queue is being
> > > > > > > used, dwmac4_irq_mtl_status() won't be called for queue 1, and
> > > > > > > thus MTL_CHAN_INT_CTRL() won't be read to clear Q1IS.
> > > > >
> > > > > That's why I tried to enable all 5 queues in DT, but alas, it
> > > > > didn't help. I'll try again and count all possible interrupts.
> > > >
> > > > Here's my debug patch (not very pretty, sorry about that):
> > >
> > > That's fine. Thanks for providing this and the raw data.
> > >
> > > > Here are the corresponding stats captured right after booting to
> > > > userspace, with the 0 counts stripped off to keep the output readable:
> > > >
> > > >      irq_gmac_0_n: 1
> > >
> > > RSGMIIS, disabled, cleared by read of MAC_PHYIF_CONTROL_STATUS.
> > >
> > > >      irq_gmac_5_n: 4047
> > >
> > > LPIIS, enabled, cleared by read of LPI_CONTROL_STATUS which is done.
> > >
> > > >      irq_gmac_18_n: 46
> > >
> > > MDIOIS, disabled, clear on read of _this_ status register
> > >
> > > >      irq_mtl0_n: 2244307
> > >
> > > This will increment each time dwmac4_irq_mtl_status() is called for
> > > channel 0, which will be called each time stmmac_common_interrupt() is
> > > called. Thus, this indicates the total number of times the stmmac
> > > interrupt handler has been called.
> > 
> > Yes, my goal with the irq_mtlX_n counters was to check for which
> > channels/queues the dwmac4_irq_mtl_status() was called.
> > 
> > > >      irq_mtl_0_n: 2244307
> > > >      irq_mtl_1_n: 2244307
> > > >      irq_mtl_2_n: 2244307
> > > >      irq_mtl_3_n: 2244307
> > > >      irq_mtl_4_n: 2244307
> > >
> > > These should be cleared by reading the corresponding queue interrupt
> > > control/status register, iow MTL_CHAN_INT_CTRL(). However, we do not
> > > write to MTL_CHAN_INT_CTRL() to enable any of the interrupts there, so
> > > this looks weird to me, so it would be an idea to look at what value
> > > this MTL_CHAN_INT_CTRL() register contains, it may provide something
> > > useful, but I actually suspect it's another red herring.
> > 
> > All the MTL_CHAN_INT_CTRL() registers read as 0x00000002, so the interrupts
> > are not enabled.
> > 
> > > >      irq_chan0_n: 2244307
> > >
> > > Similarly to irq_mtl0_n, this will increment each time
> > > dwmac4_dma_interrupt() is called for channel 0, which will be via
> > > stmmac_napi_check(), stmmac_dma_interrupt() and
> > > stmmac_common_interrupt(). Therefore, it is expected to have the same
> > > value as irq_mtl0_n.
> > >
> > > >      irq_chan0_0_n: 333
> > > >      irq_chan0_2_n: 2244307
> > > >      irq_chan0_6_n: 2769
> > > >      irq_chan0_10_n: 2244307
> > > >      irq_chan0_11_n: 2799
> > > >      irq_chan0_15_n: 2701
> > >
> > > Only interrupts 0, 6, 12, 14 and 15 are enabled. Status bits in this
> > > register require '1' to be written to clear them. As the value written
> > > back is the status that was read masked by the interrupt enable, if
> > > bits 2 or 10 are set, they will never be cleared, so will increment
> > > each and every time stmmac_common_interrupt() is called. Therefore,
> > > these values are not significant.
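
The masked write-back described above can be illustrated with a toy model of a write-1-to-clear status register. This is a Python sketch rather than driver code; `acknowledge`, `ENABLED_BITS` and `INTR_ENABLE` are names made up for the illustration, and only the list of enabled bits comes from the text.

```python
# Toy model of a write-1-to-clear (W1C) status register; not driver code.
ENABLED_BITS = (0, 6, 12, 14, 15)              # enabled interrupts, per the text
INTR_ENABLE = sum(1 << bit for bit in ENABLED_BITS)


def acknowledge(status, enable=INTR_ENABLE):
    """Return the status register contents after the handler's write-back."""
    written = status & enable   # the handler writes back only enabled bits
    return status & ~written    # W1C: every bit written as 1 is cleared


# Bits 2 and 10 (not enabled) pending together with bit 6 (enabled).
before = (1 << 2) | (1 << 6) | (1 << 10)
after = acknowledge(before)
print(hex(after))  # bits 2 and 10 remain set
```

Because bits 2 and 10 are never part of the written value, they stay set once raised and are counted on every pass through the handler.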
> > 
> > I've commented out the masking in dwmac4_dma_interrupt(), and the counters
> > show that bits 2 and 10 were indeed not significant:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_5_n: 3846
> >      irq_gmac_18_n: 59
> >      irq_mtl0_n: 2189598
> >      irq_mtl_0_n: 2189598
> >      irq_mtl_1_n: 2189598
> >      irq_mtl_2_n: 2189598
> >      irq_mtl_3_n: 2189598
> >      irq_mtl_4_n: 2189598
> >      irq_chan0_n: 2189598
> >      irq_chan0_0_n: 258
> >      irq_chan0_2_n: 2680
> >      irq_chan0_6_n: 2660
> >      irq_chan0_10_n: 2682
> >      irq_chan0_11_n: 1659
> >      irq_chan0_15_n: 2598
> >      irq_tx_path_in_lpi_mode_n: 6
> >      irq_tx_path_exit_lpi_mode_n: 6
> >      irq_rx_path_in_lpi_mode_n: 2012
> >      irq_rx_path_exit_lpi_mode_n: 2009
> >      irq_rgmii_n: 1
> >      rx_normal_irq_n: 2660
> >      tx_normal_irq_n: 258
> >      normal_irq_n: 4577
> >      q0_tx_irq_n: 258
> >      q0_rx_irq_n: 2660
> > 
> > There is still an interrupt storm, as shown by bits Q0IS to Q4IS in
> > MTL_INTERRUPT_STATUS. Those bits are documented in the i.MX8MP RM as
> > 
> >   Queue 0 Interrupt status
> > 
> >   This bit indicates that there is an interrupt from Queue 0. To reset
> >   this bit, the application must read Queue 0 Interrupt Control and
> >   Status register to get the exact cause of the interrupt and clear its
> >   source.
> > 
> > I've added counters for the MTL_CHAN_INT_CTRL() registers bits in
> > dwmac4_irq_mtl_status():
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_5_n: 4088
> >      irq_gmac_18_n: 70
> >      irq_mtl0_n: 2279161
> >      irq_mtl_0_n: 2279161
> >      irq_mtl_1_n: 2279161
> >      irq_mtl_2_n: 2279161
> >      irq_mtl_3_n: 2279161
> >      irq_mtl_4_n: 2279161
> >      irq_mtl_chan0_1_n: 2279161
> >      irq_chan0_n: 2279161
> >      irq_chan0_0_n: 269
> >      irq_chan0_2_n: 2874
> >      irq_chan0_6_n: 2754
> >      irq_chan0_10_n: 2871
> >      irq_chan0_11_n: 1793
> >      irq_chan0_15_n: 2749
> >      irq_tx_path_in_lpi_mode_n: 13
> >      irq_tx_path_exit_lpi_mode_n: 13
> >      irq_rx_path_in_lpi_mode_n: 2112
> >      irq_rx_path_exit_lpi_mode_n: 2111
> >      irq_rgmii_n: 1
> >      rx_normal_irq_n: 2754
> >      tx_normal_irq_n: 269
> >      normal_irq_n: 4816
> >      q0_tx_irq_n: 269
> >      q0_rx_irq_n: 2754
> > 
> > I've then modified dwmac4_irq_mtl_status() to write back the status value to
> > MTL_CHAN_INT_CTRL() unconditionally:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_5_n: 4429
> >      irq_gmac_18_n: 96
> >      irq_mtl0_n: 5165861
> >      irq_mtl_0_n: 5212
> >      irq_mtl_1_n: 5165861
> >      irq_mtl_2_n: 5165861
> >      irq_mtl_3_n: 5165861
> >      irq_mtl_4_n: 5165861
> >      irq_mtl_chan0_1_n: 5212
> >      irq_chan0_n: 5165861
> >      irq_chan0_0_n: 274
> >      irq_chan0_2_n: 2965
> >      irq_chan0_6_n: 2858
> >      irq_chan0_10_n: 2965
> >      irq_chan0_11_n: 1899
> >      irq_chan0_15_n: 2838
> >      irq_tx_path_in_lpi_mode_n: 6
> >      irq_tx_path_exit_lpi_mode_n: 6
> >      irq_rx_path_in_lpi_mode_n: 2364
> >      irq_rx_path_exit_lpi_mode_n: 2363
> >      irq_rgmii_n: 1
> >      rx_normal_irq_n: 2858
> >      tx_normal_irq_n: 274
> >      normal_irq_n: 5031
> >      q0_tx_irq_n: 274
> >      q0_rx_irq_n: 2858
> > 
> > As expected, that clears the interrupt source for Q0IS, so irq_mtl_chan0_1_n
> > and irq_mtl_0_n are now under control.
> > 
> > Enabling support for 5 channels in DT:
> > 
> >      irq_gmac_0_n: 1
> >      irq_gmac_5_n: 4993
> >      irq_gmac_18_n: 74
> >      irq_mtl0_n: 3084994
> >      irq_mtl1_n: 3084994
> >      irq_mtl2_n: 3084994
> >      irq_mtl3_n: 3084994
> >      irq_mtl4_n: 3084994
> >      irq_mtl_0_n: 5433
> >      irq_mtl_1_n: 9272
> >      irq_mtl_2_n: 13218
> >      irq_mtl_3_n: 17084
> >      irq_mtl_4_n: 21010
> >      irq_mtl_chan0_0_n: 1
> >      irq_mtl_chan0_1_n: 4401
> >      irq_mtl_chan0_16_n: 1
> >      irq_mtl_chan1_1_n: 4401
> >      irq_mtl_chan2_1_n: 4401
> >      irq_mtl_chan3_1_n: 4401
> >      irq_mtl_chan4_1_n: 4401
> >      irq_chan0_n: 3084994
> >      irq_chan1_n: 3084994
> >      irq_chan2_n: 3084994
> >      irq_chan3_n: 3084994
> >      irq_chan4_n: 3084994
> >      irq_chan0_0_n: 266
> >      irq_chan0_2_n: 2923
> >      irq_chan0_6_n: 2809
> >      irq_chan0_10_n: 2925
> >      irq_chan0_11_n: 2203
> >      irq_chan0_15_n: 2738
> >      irq_chan1_2_n: 3
> >      irq_chan1_10_n: 3
> >      irq_chan2_2_n: 1
> >      irq_chan2_10_n: 1
> >      irq_chan3_2_n: 8
> >      irq_chan3_10_n: 8
> >      irq_chan4_2_n: 2
> >      irq_chan4_10_n: 2
> >      irq_tx_path_in_lpi_mode_n: 6
> >      irq_tx_path_exit_lpi_mode_n: 6
> >      irq_rx_path_in_lpi_mode_n: 2633
> >      irq_rx_path_exit_lpi_mode_n: 2632
> >      irq_rgmii_n: 1
> >      rx_normal_irq_n: 2809
> >      tx_normal_irq_n: 266
> >      normal_irq_n: 5278
> >      q0_tx_irq_n: 266
> >      q0_rx_irq_n: 2809
> > 
> > There are no more storms in interrupt bit counters. The only counters that are
> > out of control are irq_mtlX_n and irq_chanX_n, as expected, as they simply
> > count the number of times the IRQ handling functions are called.
> > 
> > Unless we're missing some interrupt sources in other registers, I think this
> > indicates that the storm is not caused by the sbd_intr_o or
> > sbd_perch_[rt]x_intr_o signals. lpi_intr_o seems the most likely culprit at this
> > point (more on that below).
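
For anyone repeating this comparison, a short helper can pick out the counters that merely track the number of handler invocations. This is a Python sketch under one assumption: the largest value in a dump is the handler call count. `call_count_counters` is a hypothetical name, and the sample values are a subset of the five-channel dump above.

```python
def call_count_counters(text):
    """Return (calls, names) for the counters equal to the largest value."""
    counts = {}
    for line in text.splitlines():
        # Drop mail quote markers so dumps can be pasted from the thread.
        name, _, value = line.lstrip("> ").partition(":")
        if value.strip().isdigit():
            counts[name.strip()] = int(value)
    calls = max(counts.values())
    return calls, sorted(n for n, v in counts.items() if v == calls)


SAMPLE = """\
> >      irq_gmac_5_n: 4993
> >      irq_mtl0_n: 3084994
> >      irq_mtl_0_n: 5433
> >      irq_mtl_4_n: 21010
> >      irq_chan0_n: 3084994
> >      irq_chan0_2_n: 2923
> >      irq_rx_path_in_lpi_mode_n: 2633
"""

calls, names = call_count_counters(SAMPLE)
print(calls, names)
```

Any status-bit counter that shows up in `names` is a candidate storm source; here only the per-call counters do.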
> > 
> > > > Here are the stats after enabling five queues in DT, also captured
> > > > right after booting to userspace:
> > > >
> > > >      irq_gmac_0_n: 1
> > > >      irq_gmac_5_n: 4020
> > > >      irq_gmac_18_n: 41
> > > >      irq_mtl0_n: 1286469
> > > >      irq_mtl1_n: 1286469
> > > >      irq_mtl2_n: 1286469
> > > >      irq_mtl3_n: 1286469
> > > >      irq_mtl4_n: 1286469
> > > >      irq_mtl_0_n: 6432345
> > > >      irq_mtl_1_n: 6432345
> > > >      irq_mtl_2_n: 6432345
> > > >      irq_mtl_3_n: 6432345
> > > >      irq_mtl_4_n: 6432345
> > >
> > > These values are the sum of irq_mtl[0-4]_n, so would be expected given
> > > the other numbers.
> > >
> > > >      irq_chan0_n: 1286469
> > > >      irq_chan1_n: 1286469
> > > >      irq_chan2_n: 1286469
> > > >      irq_chan3_n: 1286469
> > > >      irq_chan4_n: 1286469
> > > >      irq_chan0_0_n: 416
> > > >      irq_chan0_2_n: 1286466
> > > >      irq_chan0_6_n: 3470
> > > >      irq_chan0_10_n: 1286466
> > > >      irq_chan0_11_n: 2740
> > > >      irq_chan0_15_n: 2686
> > > >      irq_chan1_2_n: 1286469
> > > >      irq_chan1_10_n: 1286469
> > > >      irq_chan2_2_n: 1286467
> > > >      irq_chan2_10_n: 1286467
> > > >      irq_chan4_2_n: 1286469
> > > >      irq_chan4_10_n: 1286469
> > >
> > > It's slightly interesting that irq_chanX_2_n and irq_chanX_10_n don't
> > > match their corresponding irq_chanX_n values, which implies that they
> > > have been cleared. It's likely given that we're talking about 0, 2 or 3
> > > times that's due to the first few packets and these bits hadn't been
> > > set. So again, I don't think TBU and ETI are significant.
> > >
> > > > Setting eee-broken-1000t, with a single queue:
> > > >
> > > >      irq_gmac_0_n: 1
> > > >      irq_gmac_18_n: 6
> > > >      irq_mtl0_n: 2548
> > > >      irq_mtl_0_n: 2548
> > > >      irq_mtl_1_n: 2548
> > > >      irq_mtl_2_n: 2548
> > > >      irq_mtl_3_n: 2548
> > > >      irq_mtl_4_n: 2548
> > > >      irq_chan0_n: 2548
> > > >      irq_chan0_0_n: 282
> > > >      irq_chan0_2_n: 2548
> > > >      irq_chan0_6_n: 2324
> > > >      irq_chan0_10_n: 2548
> > > >      irq_chan0_11_n: 29
> > > >      irq_chan0_15_n: 2548
> > >
> > > These counts suggest that the interrupt handler was entered 2548 times
> > > at the point they were captured, which corresponds to "normal"
> > > interrupts for channel 0.
> > >
> > > >
> > > > And eee-broken-1000t with 5 queues:
> > > >
> > > >      irq_gmac_0_n: 1
> > > >      irq_gmac_18_n: 8
> > > >      irq_mtl0_n: 2672
> > > >      irq_mtl1_n: 2672
> > > >      irq_mtl2_n: 2672
> > > >      irq_mtl3_n: 2672
> > > >      irq_mtl4_n: 2672
> > > >      irq_mtl_0_n: 13360
> > > >      irq_mtl_1_n: 13360
> > > >      irq_mtl_2_n: 13360
> > > >      irq_mtl_3_n: 13360
> > > >      irq_mtl_4_n: 13360
> > > >      irq_chan0_n: 2672
> > > >      irq_chan1_n: 2672
> > > >      irq_chan2_n: 2672
> > > >      irq_chan3_n: 2672
> > > >      irq_chan4_n: 2672
> > > >      irq_chan0_0_n: 283
> > > >      irq_chan0_2_n: 2672
> > > >      irq_chan0_6_n: 2439
> > > >      irq_chan0_10_n: 2672
> > > >      irq_chan0_11_n: 46
> > > >      irq_chan0_15_n: 2672
> > > >      irq_chan2_2_n: 2670
> > > >      irq_chan2_10_n: 2670
> > > >      irq_chan3_2_n: 2672
> > > >      irq_chan3_10_n: 2672
> > >
> > > So channel 0 is responsible for 2672 normal interrupts. Again, this
> > > reinforces that the other values with 2672 are likely not significant.
> > >
> > > > Given the enabled interrupts, I agree that the counters are
> > > > misleading, as none of the interrupt bits with high counts are
> > > > enabled. I'm however not entirely sure about the MTL interrupt
> > > > status register, it's not clear to me if it is wired to the EQOS IRQ
> > > > line as I don't see a corresponding interrupt enable register.
> > > >
> > > > If we rule out the main EQOS IRQ line and the per-channel RX and TX
> > > > IRQ lines as the source of the interrupt storm, the last possible
> > > > culprit according to section 7.1.2 (A53 Interrupts) of the i.MX8MP
> > > > reference manual would be the "ENET QOS TSN LPI RX exit Interrupt"
> > > > that is OR'ed into IRQ 135. As that's related to EEE, it's a
> > > > probable culprit, but I don't know what controls that IRQ line.
> > >
> > > As you have several interrupt signals which presumably show up in
> > > /proc/interrupts, do the values in your IRQ counters correspond with
> > > those interrupt sources? Are any of these interrupts shared with
> > > anything else?
> > 
> > # cat /proc/interrupts
> >            CPU0       CPU1       CPU2       CPU3
> >   9:          0          0          0          0    GICv3  25 Level     vgic
> >  11:       4587       5251       5038       5230    GICv3  30 Level     arch_timer
> >  12:          0          0          0          0    GICv3  27 Level     kvm guest vtimer
> >  14:       3953       7210       6374       5861    GICv3  79 Level     timer@306a0000
> >  15:          0          0          0          0    GICv3  60 Level     30880000.serial
> >  16:        173          0          0          0    GICv3  59 Level     30890000.serial
> >  17:          0          0          0          0    GICv3  61 Level     30a60000.serial
> >  18:          0          0          0          0    GICv3  36 Level     30370000.snvs:snvs-powerkey
> >  19:          0          0          0          0    GICv3  51 Level     rtc alarm
> >  20:          0          0          0          0    GICv3 110 Level     30280000.watchdog
> >  21:         52          0          0          0    GICv3  56 Level     mmc2
> >  23:          0          0          0          0    GICv3  23 Level     arm-pmu
> >  24:          0          0          0          0    GICv3 130 Level     imx8_ddr_perf_pmu
> >  30:          0          0          0          0 gpio-mxc   3 Edge      pca9450-irq
> >  72:          0          0          0          0 gpio-mxc  11 Edge      hym8563
> >  73:          0          0          0          0 gpio-mxc  12 Edge      30b50000.mmc cd
> > 195:        810          0          0          0    GICv3  67 Level     30a20000.i2c
> > 196:        140          0          0          0    GICv3  68 Level     30a30000.i2c
> > 197:          0          0          0          0    GICv3  69 Level     30a40000.i2c
> > 198:         35          0          0          0    GICv3  70 Level     30a50000.i2c
> > 199:          0          0          0          0    GICv3 109 Level     30ae0000.i2c
> > 200:    5930706          0          0          0    GICv3 167 Level     eth0
> > 201:          0          0          0          0    GICv3 166 Level     eth0
> > 202:        370          0          0          0    GICv3  55 Level     mmc1
> > 203:          0          0          0          0    GICv3 181 Level     32f10108.usb
> > 205:         81          0          0          0    GICv3  73 Level     xhci-hcd:usb1
> > 206:          0          0          0          0    GICv3  34 Level     30bd0000.dma-controller
> > 207:          0          0          0          0    GICv3  49 Level     32e40000.csi
> > 208:          0          0          0          0    GICv3  35 Level     38000000.gpu
> > 209:          0          0          0          0    GICv3  66 Level     30e00000.dma-controller
> > 210:          0          0          0          0    GICv3  57 Level     38008000.gpu
> > 211:          0          0          0          0    GICv3  45 Level     38500000.npu
> > 212:          0          0          0          0    GICv3 132 Level     32e30000.dwe
> > 213:          0          0          0          0 irqsteer   0 Level     32fd8000.hdmi
> > 214:          0          0          0          0    GICv3 135 Level     30e10000.dma-controller
> > 215:          0          0          0          0    GICv3 106 Level     rkisp1
> > 216:          0          0          0          0 irqsteer   8 Level     imx-lcdif
> > 217:          0          0          0          0    GICv3  39 Level     38300000.video-codec
> > 218:          0          0          0          0    GICv3  40 Level     38310000.video-codec
> > IPI0:       587        430        859        896       Rescheduling interrupts
> > IPI1:      5548       7530       6814       7366       Function call interrupts
> > IPI2:         0          0          0          0       CPU stop interrupts
> > IPI3:         0          0          0          0       CPU stop NMIs
> > IPI4:      2410       3635       3487       3707       Timer broadcast interrupts
> > IPI5:      3554       4650       3986       3762       IRQ work interrupts
> > IPI6:         0          0          0          0       CPU backtrace interrupts
> > IPI7:         0          0          0          0       KGDB roundup interrupts
> > Err:          0
> > 
> > GICv3 167 is interrupt 135 from section 7.1.2.
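
The mapping between the two numbers is a fixed offset: the GIC reserves interrupt IDs 0 to 31 for SGIs and PPIs, so shared peripheral interrupts start at ID 32. A minimal Python sketch of the conversion follows; `spi_from_hwirq` is a name chosen for this illustration, and it assumes that the reference manual numbers its interrupts as SPIs starting from 0.

```python
GIC_SPI_BASE = 32  # interrupt IDs 0-31 are SGIs and PPIs; SPIs start at 32


def spi_from_hwirq(hwirq):
    """Convert the GICv3 number shown in /proc/interrupts to an SPI number."""
    if hwirq < GIC_SPI_BASE:
        raise ValueError(f"hwirq {hwirq} is an SGI or PPI, not an SPI")
    return hwirq - GIC_SPI_BASE


print(spi_from_hwirq(167))  # first eth0 line above
print(spi_from_hwirq(166))  # second eth0 line above
```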
> > 
> > > Hmm, looking at 7.1.2, and the mention of "ENET QOS TSN LPI RX exit
> > > Interrupt" I'm wondering whether Freescale have wired the lpi_intr_o
> > > signal of the GMAC to their OR4 gate. This is the LPI RX exit
> > > interrupt output, and it is cleared when reading the LPI control/
> > > status register. However, its deassertion is synchronous to the RX
> > > clock domain, so it will take time to clear.
> > 
> > I think we're getting somewhere... All the data above confirm this hypothesis in
> > my opinion (or at least they rule out all the other hypotheses I had).
> > 
> > Fugang, Joakim, Wei, Yannick, would you be able to check if the lpi_intr_o signal
> > is indeed OR'ed into interrupt 135 ? Are you aware of the issue investigated in
> > this mail thread ?
> > 
> > > The purpose of this signal is to trigger to external hardware (to the
> > > GMAC) to restore the application clock to the MAC. I'm not sure that
> > > this was meant to be wired to an actual CPU interrupt. The only clue
> > > is the name which suggests it is, but there's nothing that states
> > > there's a way to disable it being asserted which makes me more
> > > suspicious that it's not meant to be a CPU interrupt.
> > 
> > I've modified dwmac4_irq_status() to read GMAC4_LPI_CTRL_STATUS
> > unconditionally, and the problem persists. This could be explained by the fact
> > that lpi_intr_o takes time to clear as you mentioned.
> > 
> > Now I'm exploring unknown territory, this may be a stupid hypothesis, but what
> > if:
> > 
> > - The PHY exits LPI mode, and restarts generating the RX clock (clk_rx_i).
> > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > - Before the CPU has time to process the interrupt, the PHY enters LPI
> >   mode again, and stops generating the RX clock.
> > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > 
> > > So, maybe this is the cause of the interrupt storm. Maybe Kieran isn't
> > > seeing the storm because his receive path is not entering LPI.
> > 
> > Kieran told me he will perform more tests, but ran out of time this week.
> > 
> > > I think a useful check for this would be if you could either disable
> > > LPI entry at the link partner, or hook it up to another system which
> > > can have tx_lpi disabled, and see how the iMX8 system behaves.
> > 
> > I tried that with my RTL8153 USB-ethernet adapter, but I don't think I can really
> > trust the result. The device doesn't respond to `ethtool --set-eee` in an expected
> > way, it got stuck with LPI completely disabled and I had to disconnect and
> > reconnect it to recover from that.
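
For the link-partner test suggested above, the relevant ethtool invocations can be collected as below. This Python sketch only builds the argument lists and runs nothing; `eee_test_commands` is a hypothetical helper and `eth1` is a placeholder for the link partner's interface name. Whether a given adapter honours `tx-lpi off` is not guaranteed, as the RTL8153 experience above shows.

```python
def eee_test_commands(iface):
    """Return ethtool command lines to disable TX LPI on a link partner."""
    return [
        ["ethtool", "--show-eee", iface],                   # state before
        ["ethtool", "--set-eee", iface, "tx-lpi", "off"],   # stop entering LPI
        ["ethtool", "--show-eee", iface],                   # confirm the change
    ]


for command in eee_test_commands("eth1"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run()` on the link partner; comparing the two `--show-eee` outputs shows whether the setting actually took effect before the i.MX8MP side is measured.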
> > 
> > I have another USB-ethernet adapter that doesn't support EEE, and no second
> > i.MX8MP system I could use for testing right now. I'll see if I can find suitable
> > hardware, but it may take a while (I'm about to go on a trip abroad).
> > 
> > > If preventing the iMX8 receive path entering LPI fixes the problem,
> > > then I think this is likely the culprit.
> > >
> > > However, I'd be worried about this - if we "disable LPI" by way of the
> > > advertisement at the local end, there is the possibility that a remote
> > > system could override the negotiation and force its transmit link into
> > > LPI mode, which would cause the iMX8MP receive side to see LPI entry
> > > and exit, triggering this interrupt. If this is correct, that gives an
> > > attacker a way to manipulate the iMX8MP system, potentially causing
> > > all sorts of problems.
> > >
> > > Hmm. Not sure I like the look of that.
> > 
> > I'm sure I don't like it :-/
> > 
> > > If this hypothesis is correct, then yes, disabling EEE is the only way
> > > forward for this, but I would suggest going further - ensuring that
> > > SmartEEE is enabled on the PHY but with the advertisement cleared (so
> > > EEE negotiation indicates not supported) to block the receive side LPI
> > > getting to the EQOS.
> > 
> > I'm not sure how that should be implemented, I'd appreciate guidance. In
> > particular, the RTL8211E appears to support SmartEEE (based on the
> > information provided in this mail thread), but the registers to control it are not
> > documented. Maybe we can just rely on the fact it will be enabled as a reset
> > default at boot time.
> > 
> > > This also means that 100M EEE would also be affected, so just
> > > disabling 1G EEE in DT is insufficient.
> > 
> > Agreed. I've just tested forcing 100BaseT with EEE enabled, and the issue
> > persists.
> > 
> > > Andrew - if we need to go down this path, I think we need a flag in
> > > the PHY flags to indicate that we want SmartEEE enabled.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-22  7:22                       ` Laurent Pinchart
@ 2025-11-22  9:57                         ` Russell King (Oracle)
  2025-11-23  5:38                           ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-22  9:57 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> Hello Wei,
> 
> On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > Sorry, I only have a little experience with DWMac, adding Clark to help look
> > at this issue.
> 
> Thank you.
> 
> I think we're getting close to having a good understanding of the
> problem. I've debugged it as far as I could based on the information
> available publicly. Let's try to get to the bottom of this issue, it
> impacts quite a lot of people and it would be very nice to fix it
> properly in mainline.
> 
> The short summary is that I'm experiencing an interrupt storm on IRQ 135
> when EEE is enabled with the EQOS interface.
> 
> My current theory is that
> 
> - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> - The issue is triggered by the PHY exiting LPI mode.
> - When it exits LPI mode, the PHY restarts generating the RX clock
>   (clk_rx_i).
> - The MAC detects exit from LPI, and asserts lpi_intr_o.
> - Before the CPU has time to process the interrupt, the PHY enters LPI
>   mode again, and stops generating the RX clock.
> - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
>   registers. This does not clear lpi_intr_o as there's no clk_rx_i.

Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
see whether that changes the behaviour.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-22  9:57                         ` Russell King (Oracle)
@ 2025-11-23  5:38                           ` Laurent Pinchart
  2025-11-23  8:52                             ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-23  5:38 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

Hi Russell,

On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > Hello Wei,
> > 
> > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > Sorry, I only have a little experience with DWMac, adding Clark to help look
> > > at this issue.
> > 
> > Thank you.
> > 
> > I think we're getting close to having a good understanding of the
> > problem. I've debugged it as far as I could based on the information
> > available publicly. Let's try to get to the bottom of this issue, it
> > impacts quite a lot of people and it would be very nice to fix it
> > properly in mainline.
> > 
> > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > when EEE is enabled with the EQOS interface.
> > 
> > My current theory is that
> > 
> > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > - The issue is triggered by the PHY exiting LPI mode.
> > - When it exits LPI mode, the PHY restarts generating the RX clock
> >   (clk_rx_i).
> > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > - Before the CPU has time to process the interrupt, the PHY enters LPI
> >   mode again, and stops generating the RX clock.
> > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> 
> Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> see whether that changes the behaviour.

I have tested that and it worked like a charm ! I have submitted
https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com

That was quite an adventure. Thank you so much for all your support, I'm
not sure I would have managed without you (or at least I would have
needed way more time). I really really appreciate it.

If the above patch gets accepted, we will probably be able to remove the
eee-broken-* properties from the i.MX8MP device tree files (and possibly
from i.MX8DXL and i.MX93 as well). I have mentioned that below the
commit message of the patch, with a test procedure as it should be
tested on each board.

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-23  5:38                           ` Laurent Pinchart
@ 2025-11-23  8:52                             ` Russell King (Oracle)
  2025-11-23 15:23                               ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-23  8:52 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Sun, Nov 23, 2025 at 02:38:02PM +0900, Laurent Pinchart wrote:
> Hi Russell,
> 
> On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> > On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > > Hello Wei,
> > > 
> > > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > > Sorry, I only have a little experience with DWMac, adding Clark to help look
> > > > at this issue.
> > > 
> > > Thank you.
> > > 
> > > I think we're getting close to having a good understanding of the
> > > problem. I've debugged it as far as I could based on the information
> > > available publicly. Let's try to get to the bottom of this issue, it
> > > impacts quite a lot of people and it would be very nice to fix it
> > > properly in mainline.
> > > 
> > > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > > when EEE is enabled with the EQOS interface.
> > > 
> > > My current theory is that
> > > 
> > > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > > - The issue is triggered by the PHY exiting LPI mode.
> > > - When it exits LPI mode, the PHY restarts generating the RX clock
> > >   (clk_rx_i).
> > > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > > - Before the CPU has time to process the interrupt, the PHY enters LPI
> > >   mode again, and stops generating the RX clock.
> > > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> > >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > 
> > Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> > see whether that changes the behaviour.
> 
> I have tested that and it worked like a charm ! I have submitted
> https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com
> 
> That was quite an adventure. Thank you so much for all your support, I'm
> not sure I would have managed without you (or at least I would have
> needed way more time). I really really appreciate it.
> 
> If the above patch gets accepted, we will probably be able to remove the
> eee-broken-* properties from the i.MX8MP device tree files (and possibly
> from i.MX8DXL and i.MX93 as well). I have mentioned that below the
> commit message of the patch, with a test procedure as it should be
> tested on each board.

As stated in reply to that patch, while this may reduce the severity of
the storm, I don't think it'll completely eliminate it.

I made the suggestion to set the flag as a test to confirm whether the
lpi_intr_o is indeed the problem by ensuring that the receive domain is
always clocked, and thus ensuring that the signal clears within four
clock cycles, rather than an indefinite period should the remote end
re-enter LPI mode quickly.
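
The difference between the two cases can be sketched with a toy model. This is a Python illustration, not a description of the hardware: it assumes a level-triggered line, a handler that reads the LPI status register on every invocation, and a clear request that only propagates while the RX clock ticks. `handler_calls`, `cycles_per_call` and `max_calls` are hypothetical names, and only the four-cycle figure comes from the paragraph above.

```python
CLEAR_CYCLES = 4  # RX clock cycles needed for deassertion, per the text above


def handler_calls(rx_clock_running, cycles_per_call=CLEAR_CYCLES, max_calls=1000):
    """Count handler invocations until the modelled lpi_intr_o deasserts.

    rx_clock_running -- whether the RX clock keeps ticking after the read
    cycles_per_call  -- RX clock cycles elapsing per handler invocation
    max_calls        -- cut-off so that the stuck case terminates
    """
    remaining = CLEAR_CYCLES
    calls = 0
    asserted = True  # the MAC has seen LPI exit and raised the line
    while asserted and calls < max_calls:
        calls += 1   # level interrupt: the handler runs and reads LPI status
        if rx_clock_running:
            remaining -= cycles_per_call
            asserted = remaining > 0
        # Without an RX clock the clear request never propagates.
    return calls


clocked = handler_calls(rx_clock_running=True)
stopped = handler_calls(rx_clock_running=False)
print(clocked, stopped)
```

With the clock running the line drops after a bounded number of invocations; with the clock stopped the model only ends at the `max_calls` cut-off, which corresponds to the storm.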

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-23  8:52                             ` Russell King (Oracle)
@ 2025-11-23 15:23                               ` Laurent Pinchart
  2025-11-23 17:11                                 ` Russell King (Oracle)
  0 siblings, 1 reply; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-23 15:23 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Sun, Nov 23, 2025 at 08:52:00AM +0000, Russell King (Oracle) wrote:
> On Sun, Nov 23, 2025 at 02:38:02PM +0900, Laurent Pinchart wrote:
> > On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> > > On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > > > Hello Wei,
> > > > 
> > > > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > > > Sorry, I only have a little experience with DWMac, adding Clark to help look
> > > > > at this issue.
> > > > 
> > > > Thank you.
> > > > 
> > > > I think we're getting close to having a good understanding of the
> > > > problem. I've debugged it as far as I could based on the information
> > > > available publicly. Let's try to get to the bottom of this issue, it
> > > > impacts quite a lot of people and it would be very nice to fix it
> > > > properly in mainline.
> > > > 
> > > > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > > > when EEE is enabled with the EQOS interface.
> > > > 
> > > > My current theory is that
> > > > 
> > > > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > > > - The issue is triggered by the PHY exiting LPI mode.
> > > > - When it exits LPI mode, the PHY restarts generating the RX clock
> > > >   (clk_rx_i).
> > > > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > > > - Before the CPU has time to process the interrupt, the PHY enters LPI
> > > >   mode again, and stops generating the RX clock.
> > > > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> > > >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > > 
> > > Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> > > see whether that changes the behaviour.
> > 
> > I have tested that and it worked like a charm ! I have submitted
> > https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com
> > 
> > That was quite an adventure. Thank you so much for all your support, I'm
> > not sure I would have managed without you (or at least I would have
> > needed way more time). I really really appreciate it.
> > 
> > If the above patch gets accepted, we will probably be able to remove the
> > eee-broken-* properties from the i.MX8MP device tree files (and possibly
> > from i.MX8DXL and i.MX93 as well). I have mentioned that below the
> > commit message of the patch, with a test procedure as it should be
> > tested on each board.
> 
> As stated in reply to that patch, while this may reduce the severity of
> the storm, I don't think it'll completely eliminate it.
> 
> I made the suggestion to set the flag as a test to confirm whether the
> lpi_intr_o is indeed the problem by ensuring that the receive domain is
> always clocked, and thus ensuring that the signal clears within four
> clock cycles, rather than an indefinite period should the remote end
> re-enter LPI mode quickly.

You're right. I've checked, and replied to the patch with the following
numbers.

100TX link, eee-broken-* set: 7000 interrupts
1000T link, eee-broken-* set: 2711 interrupts
100TX link, eee-broken-* unset: 9450 interrupts
1000T link, eee-broken-* unset: 6066 interrupts

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-23 15:23                               ` Laurent Pinchart
@ 2025-11-23 17:11                                 ` Russell King (Oracle)
  2025-11-24  0:12                                   ` Laurent Pinchart
  0 siblings, 1 reply; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-23 17:11 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Nov 24, 2025 at 12:23:56AM +0900, Laurent Pinchart wrote:
> On Sun, Nov 23, 2025 at 08:52:00AM +0000, Russell King (Oracle) wrote:
> > On Sun, Nov 23, 2025 at 02:38:02PM +0900, Laurent Pinchart wrote:
> > > On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> > > > On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > > > > Hello Wei,
> > > > > 
> > > > > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > > > > Sorry, I only have a little experience with DWMac, add Clark to help look
> > > > > > at this issue.
> > > > > 
> > > > > Thank you.
> > > > > 
> > > > > I think we're getting close to having a good understanding of the
> > > > > problem. I've debugged it as far as I could based on the information
> > > > > available publicly. Let's try to get to the bottom of this issue, it
> > > > > impacts quite a lot of people and it would be very nice to fix it
> > > > > properly in mainline.
> > > > > 
> > > > > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > > > > when EEE is enabled with the EQOS interface.
> > > > > 
> > > > > My current theory is that
> > > > > 
> > > > > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > > > > - The issue is triggered by the PHY exiting LPI mode.
> > > > > - When it exits LPI mode, the PHY restarts generating the RX clock
> > > > >   (clk_rx_i).
> > > > > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > > > > - Before the CPU has time to process the interrupt, the PHY enters LPI
> > > > >   mode again, and stops generating the RX clock.
> > > > > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> > > > >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > > > 
> > > > Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> > > > see whether that changes the behaviour.
> > > 
> > > I have tested that and it worked like a charm ! I have submitted
> > > https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com
> > > 
> > > That was quite an adventure. Thank you so much for all your support, I'm
> > > not sure I would have managed without you (or at least I would have
> > > needed way more time). I really really appreciate it.
> > > 
> > > If the above patch gets accepted, we will probably be able to remove the
> > > eee-broken-* properties from the i.MX8MP device tree files (and possibly
> > > from i.MX8DXL and i.MX93 as well). I have mentioned that below the
> > > commit message of the patch, with a test procedure as it should be
> > > tested on each board.
> > 
> > As stated in reply to that patch, while this may reduce the severity of
> > the storm, I don't think it'll completely eliminate it.
> > 
> > I made the suggestion to set the flag as a test to confirm whether the
> > lpi_intr_o is indeed the problem by ensuring that the receive domain is
> > always clocked, and thus ensuring that the signal clears within four
> > clock cycles, rather than an indefinite period should the remote end
> > re-enter LPI mode quickly.
> 
> You're right. I've checked, and replied to the patch with the following
> numbers.
> 
> 100TX link, eee-broken-* set: 7000 interrupts
> 1000T link, eee-broken-* set: 2711 interrupts
> 100TX link, eee-broken-* unset: 9450 interrupts
> 1000T link, eee-broken-* unset: 6066 interrupts

Sadly, I think this means for iMX8MP, the correct answer is to disable
EEE completely. What I was thinking when I brought this up is as follows,
and dwmac-imx.c can then set STMMAC_FLAG_EEE_DISABLE for iMX8MP to
prevent the use of EEE.

This works because, in phylink, pl->mac_supports_eee_ops will be set
since stmmac implements the two LPI operations. pl->mac_supports_eee
will be clear because pl->config->lpi_capabilities will be zero, and
pl->config->lpi_interfaces will be empty. This causes phylink to call
phy_disable_eee() on all PHYs that end up being attached to this
phylink instance, which should result in the PHY EEE advertisement
being cleared.
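
The decision described above can be sketched as a small standalone C
model. This is only an illustration, not kernel code: the struct, the
function names and the MODEL_FLAG_EEE_DISABLE constant are hypothetical
stand-ins, the interface set is reduced to a plain bitmap, and the
capability mask is a placeholder value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the platform flag; not the kernel definition. */
#define MODEL_FLAG_EEE_DISABLE (1u << 10)

/* Hypothetical, reduced view of the LPI fields the MAC fills in. */
struct model_config {
	unsigned int lpi_capabilities;	/* zero: MAC offers no LPI speeds */
	unsigned int lpi_interfaces;	/* bitmap, zero: empty interface set */
};

/* MAC side: only populate the LPI fields when the hardware has EEE and
 * the platform has not asked for it to be disabled.
 */
static void model_mac_setup(struct model_config *cfg, bool hw_has_eee,
			    unsigned int plat_flags,
			    unsigned int supported_interfaces)
{
	cfg->lpi_capabilities = 0;
	cfg->lpi_interfaces = 0;

	if (hw_has_eee && !(plat_flags & MODEL_FLAG_EEE_DISABLE)) {
		/* Assume all supported interfaces also support LPI */
		cfg->lpi_interfaces = supported_interfaces;
		cfg->lpi_capabilities = 1;	/* placeholder mask */
	}
}

/* phylink side: returns true when EEE would be disabled on the PHY,
 * i.e. the MAC implements the LPI operations but reports no usable
 * LPI capabilities or interfaces.
 */
static bool model_phy_eee_disabled(const struct model_config *cfg,
				   bool mac_has_lpi_ops)
{
	bool mac_supports_eee_ops = mac_has_lpi_ops;
	bool mac_supports_eee = mac_supports_eee_ops &&
				cfg->lpi_capabilities != 0 &&
				cfg->lpi_interfaces != 0;

	return mac_supports_eee_ops && !mac_supports_eee;
}

/* Small self-check of the two cases of interest. */
static int model_selftest(void)
{
	struct model_config cfg;

	/* Flag set: LPI fields stay empty, so PHY EEE gets disabled. */
	model_mac_setup(&cfg, true, MODEL_FLAG_EEE_DISABLE, 0x3);
	assert(model_phy_eee_disabled(&cfg, true));

	/* Flag clear: LPI fields are populated, PHY EEE is left alone. */
	model_mac_setup(&cfg, true, 0, 0x3);
	assert(!model_phy_eee_disabled(&cfg, true));

	return 0;
}
```

In this model a MAC that does not implement the LPI operations at all
never triggers the disable path, which matches the reasoning above that
the behaviour depends on the operations being present.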

I'll package this up into a proper patch tomorrow.

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 ++++++-
 include/linux/stmmac.h                            | 9 +++++----
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 1d37c2b5ad46..113cae2bc593 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1376,7 +1376,12 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
 				 config->supported_interfaces,
 				 pcs->supported_interfaces);
 
-	if (priv->dma_cap.eee) {
+	/* Some platforms, e.g. iMX8MP, wire lpi_intr_o to the same interrupt
+	 * used for stmmac's main interrupts, which leads to interrupt storms.
+	 * STMMAC_FLAG_EEE_DISABLE allows EEE to be disabled on such platforms.
+	 */
+	if (priv->dma_cap.eee &&
+	    !(priv->plat->flags & STMMAC_FLAG_EEE_DISABLE)) {
 		/* Assume all supported interfaces also support LPI */
 		memcpy(config->lpi_interfaces, config->supported_interfaces,
 		       sizeof(config->lpi_interfaces));
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 8ae068706b63..4c770262a2f8 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -187,10 +187,11 @@ enum dwmac_core_type {
 #define STMMAC_FLAG_MULTI_MSI_EN		BIT(7)
 #define STMMAC_FLAG_EXT_SNAPSHOT_EN		BIT(8)
 #define STMMAC_FLAG_INT_SNAPSHOT_EN		BIT(9)
-#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI		BIT(10)
-#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(11)
-#define STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP	BIT(12)
-#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY	BIT(13)
+#define STMMAC_FLAG_EEE_DISABLE			BIT(10)
+#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI		BIT(11)
+#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(12)
+#define STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP	BIT(13)
+#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY	BIT(14)
 
 struct mac_device_info;
 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-23 17:11                                 ` Russell King (Oracle)
@ 2025-11-24  0:12                                   ` Laurent Pinchart
  2025-11-24  5:44                                     ` Oleksij Rempel
  2025-11-24  8:43                                     ` Russell King (Oracle)
  0 siblings, 2 replies; 51+ messages in thread
From: Laurent Pinchart @ 2025-11-24  0:12 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Sun, Nov 23, 2025 at 05:11:27PM +0000, Russell King (Oracle) wrote:
> On Mon, Nov 24, 2025 at 12:23:56AM +0900, Laurent Pinchart wrote:
> > On Sun, Nov 23, 2025 at 08:52:00AM +0000, Russell King (Oracle) wrote:
> > > On Sun, Nov 23, 2025 at 02:38:02PM +0900, Laurent Pinchart wrote:
> > > > On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> > > > > On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > > > > > Hello Wei,
> > > > > > 
> > > > > > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > > > > > Sorry, I only have a little experience with DWMac, add Clark to help look
> > > > > > > at this issue.
> > > > > > 
> > > > > > Thank you.
> > > > > > 
> > > > > > I think we're getting close to having a good understanding of the
> > > > > > problem. I've debugged it as far as I could based on the information
> > > > > > available publicly. Let's try to get to the bottom of this issue, it
> > > > > > impacts quite a lot of people and it would be very nice to fix it
> > > > > > properly in mainline.
> > > > > > 
> > > > > > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > > > > > when EEE is enabled with the EQOS interface.
> > > > > > 
> > > > > > My current theory is that
> > > > > > 
> > > > > > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > > > > > - The issue is triggered by the PHY exiting LPI mode.
> > > > > > - When it exits LPI mode, the PHY restarts generating the RX clock
> > > > > >   (clk_rx_i).
> > > > > > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > > > > > - Before the CPU has time to process the interrupt, the PHY enters LPI
> > > > > >   mode again, and stops generating the RX clock.
> > > > > > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> > > > > >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > > > > 
> > > > > Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> > > > > see whether that changes the behaviour.
> > > > 
> > > > I have tested that and it worked like a charm ! I have submitted
> > > > https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com
> > > > 
> > > > That was quite an adventure. Thank you so much for all your support, I'm
> > > > not sure I would have managed without you (or at least I would have
> > > > needed way more time). I really really appreciate it.
> > > > 
> > > > If the above patch gets accepted, we will probably be able to remove the
> > > > eee-broken-* properties from the i.MX8MP device tree files (and possibly
> > > > from i.MX8DXL and i.MX93 as well). I have mentioned that below the
> > > > commit message of the patch, with a test procedure as it should be
> > > > tested on each board.
> > > 
> > > As stated in reply to that patch, while this may reduce the severity of
> > > the storm, I don't think it'll completely eliminate it.
> > > 
> > > I made the suggestion to set the flag as a test to confirm whether the
> > > lpi_intr_o is indeed the problem by ensuring that the receive domain is
> > > always clocked, and thus ensuring that the signal clears within four
> > > clock cycles, rather than an indefinite period should the remote end
> > > re-enter LPI mode quickly.
> > 
> > You're right. I've checked, and replied to the patch with the following
> > numbers.
> > 
> > 100TX link, eee-broken-* set: 7000 interrupts
> > 1000T link, eee-broken-* set: 2711 interrupts
> > 100TX link, eee-broken-* unset: 9450 interrupts
> > 1000T link, eee-broken-* unset: 6066 interrupts
> 
> Sadly, I think this means for iMX8MP, the correct answer is to disable
> EEE completely. What I was thinking when I brought this up is as follows,
> and dwmac-imx.c can then set STMMAC_FLAG_EEE_DISABLE for iMX8MP to
> prevent the use of EEE.

I suppose there's no way to disable EEE in the RX path while keeping it
enabled in the TX path ?

> This works because, in phylink, pl->mac_supports_eee_ops will be set
> since stmmac implements the two LPI operations. pl->mac_supports_eee
> will be clear because pl->config->lpi_capabilities will be zero, and
> pl->config->lpi_interfaces will be empty. This causes phylink to call
> phy_disable_eee() on all PHYs that end up being attached to this
> phylink instance, which should result in the PHY EEE advertisement
> being cleared.
> 
> I'll package this up into a proper patch tomorrow.

Thank you. I can respin my patch on top.

>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 ++++++-
>  include/linux/stmmac.h                            | 9 +++++----
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 1d37c2b5ad46..113cae2bc593 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1376,7 +1376,12 @@ static int stmmac_phylink_setup(struct stmmac_priv *priv)
>  				 config->supported_interfaces,
>  				 pcs->supported_interfaces);
>  
> -	if (priv->dma_cap.eee) {
> +	/* Some platforms, e.g. iMX8MP, wire lpi_intr_o to the same interrupt
> +	 * used for stmmac's main interrupts, which leads to interrupt storms.
> +	 * STMMAC_FLAG_EEE_DISABLE allows EEE to be disabled on such platforms.
> +	 */
> +	if (priv->dma_cap.eee &&
> +	    !(priv->plat->flags & STMMAC_FLAG_EEE_DISABLE)) {
>  		/* Assume all supported interfaces also support LPI */
>  		memcpy(config->lpi_interfaces, config->supported_interfaces,
>  		       sizeof(config->lpi_interfaces));
> diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
> index 8ae068706b63..4c770262a2f8 100644
> --- a/include/linux/stmmac.h
> +++ b/include/linux/stmmac.h
> @@ -187,10 +187,11 @@ enum dwmac_core_type {
>  #define STMMAC_FLAG_MULTI_MSI_EN		BIT(7)
>  #define STMMAC_FLAG_EXT_SNAPSHOT_EN		BIT(8)
>  #define STMMAC_FLAG_INT_SNAPSHOT_EN		BIT(9)
> -#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI		BIT(10)
> -#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(11)
> -#define STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP	BIT(12)
> -#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY	BIT(13)
> +#define STMMAC_FLAG_EEE_DISABLE			BIT(10)
> +#define STMMAC_FLAG_RX_CLK_RUNS_IN_LPI		BIT(11)
> +#define STMMAC_FLAG_EN_TX_LPI_CLOCKGATING	BIT(12)
> +#define STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP	BIT(13)
> +#define STMMAC_FLAG_HWTSTAMP_CORRECT_LATENCY	BIT(14)
>  
>  struct mac_device_info;
>  

-- 
Regards,

Laurent Pinchart

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-24  0:12                                   ` Laurent Pinchart
@ 2025-11-24  5:44                                     ` Oleksij Rempel
  2025-11-24  8:43                                     ` Russell King (Oracle)
  1 sibling, 0 replies; 51+ messages in thread
From: Oleksij Rempel @ 2025-11-24  5:44 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Russell King (Oracle), Wei Fang, Clark Wang, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Nov 24, 2025 at 09:12:14AM +0900, Laurent Pinchart wrote:
> On Sun, Nov 23, 2025 at 05:11:27PM +0000, Russell King (Oracle) wrote:
> > On Mon, Nov 24, 2025 at 12:23:56AM +0900, Laurent Pinchart wrote:
> > > On Sun, Nov 23, 2025 at 08:52:00AM +0000, Russell King (Oracle) wrote:
> > > > On Sun, Nov 23, 2025 at 02:38:02PM +0900, Laurent Pinchart wrote:
> > > > > On Sat, Nov 22, 2025 at 09:57:49AM +0000, Russell King (Oracle) wrote:
> > > > > > On Sat, Nov 22, 2025 at 04:22:46PM +0900, Laurent Pinchart wrote:
> > > > > > > Hello Wei,
> > > > > > > 
> > > > > > > On Tue, Nov 18, 2025 at 01:50:55AM +0000, Wei Fang wrote:
> > > > > > > > Sorry, I only have a little experience with DWMac, add Clark to help look
> > > > > > > > at this issue.
> > > > > > > 
> > > > > > > Thank you.
> > > > > > > 
> > > > > > > I think we're getting close to having a good understanding of the
> > > > > > > problem. I've debugged it as far as I could based on the information
> > > > > > > available publicly. Let's try to get to the bottom of this issue, it
> > > > > > > impacts quite a lot of people and it would be very nice to fix it
> > > > > > > properly in mainline.
> > > > > > > 
> > > > > > > The short summary is that I'm experiencing an interrupt storm on IRQ 135
> > > > > > > when EEE is enabled with the EQOS interface.
> > > > > > > 
> > > > > > > My current theory is that
> > > > > > > 
> > > > > > > - The lpi_intr_o signal of the EQOS is OR'ed into IRQ 135.
> > > > > > > - The issue is triggered by the PHY exiting LPI mode.
> > > > > > > - When it exits LPI mode, the PHY restarts generating the RX clock
> > > > > > >   (clk_rx_i).
> > > > > > > - The MAC detects exit from LPI, and asserts lpi_intr_o.
> > > > > > > - Before the CPU has time to process the interrupt, the PHY enters LPI
> > > > > > >   mode again, and stops generating the RX clock.
> > > > > > > - The CPU processes the interrupt and reads the GMAC4_LPI_CTRL_STATUS
> > > > > > >   registers. This does not clear lpi_intr_o as there's no clk_rx_i.
> > > > > > 
> > > > > > Please try setting STMMAC_FLAG_RX_CLK_RUNS_IN_LPI in dwmac-imx.c and
> > > > > > see whether that changes the behaviour.
> > > > > 
> > > > > I have tested that and it worked like a charm ! I have submitted
> > > > > https://lore.kernel.org/r/20251123053518.8478-1-laurent.pinchart@ideasonboard.com
> > > > > 
> > > > > That was quite an adventure. Thank you so much for all your support, I'm
> > > > > not sure I would have managed without you (or at least I would have
> > > > > needed way more time). I really really appreciate it.
> > > > > 
> > > > > If the above patch gets accepted, we will probably be able to remove the
> > > > > eee-broken-* properties from the i.MX8MP device tree files (and possibly
> > > > > from i.MX8DXL and i.MX93 as well). I have mentioned that below the
> > > > > commit message of the patch, with a test procedure as it should be
> > > > > tested on each board.
> > > > 
> > > > As stated in reply to that patch, while this may reduce the severity of
> > > > the storm, I don't think it'll completely eliminate it.
> > > > 
> > > > I made the suggestion to set the flag as a test to confirm whether the
> > > > lpi_intr_o is indeed the problem by ensuring that the receive domain is
> > > > always clocked, and thus ensuring that the signal clears within four
> > > > clock cycles, rather than an indefinite period should the remote end
> > > > re-enter LPI mode quickly.
> > > 
> > > You're right. I've checked, and replied to the patch with the following
> > > numbers.
> > > 
> > > 100TX link, eee-broken-* set: 7000 interrupts
> > > 1000T link, eee-broken-* set: 2711 interrupts
> > > 100TX link, eee-broken-* unset: 9450 interrupts
> > > 1000T link, eee-broken-* unset: 6066 interrupts
> > 
> > Sadly, I think this means for iMX8MP, the correct answer is to disable
> > EEE completely. What I was thinking when I brought this up is as follows,
> > and dwmac-imx.c can then set STMMAC_FLAG_EEE_DISABLE for iMX8MP to
> > prevent the use of EEE.
> 
> I suppose there's no way to disable EEE in the RX path while keeping it
> enabled in the TX path ?

For 100BaseTX it may work, but not for 1000BaseT. I guess it is not
worth it.

The other question is: will it work if SmartEEE is active? If I recall
correctly, your board is using an RTL PHY with SmartEEE support. Will it
work if we disable LPI on the MAC side, but keep announcing EEE on the
PHY side? It would be good to test it while you have the reproducer.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

* Re: [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T
  2025-11-24  0:12                                   ` Laurent Pinchart
  2025-11-24  5:44                                     ` Oleksij Rempel
@ 2025-11-24  8:43                                     ` Russell King (Oracle)
  1 sibling, 0 replies; 51+ messages in thread
From: Russell King (Oracle) @ 2025-11-24  8:43 UTC (permalink / raw)
  To: Laurent Pinchart
  Cc: Wei Fang, Clark Wang, Oleksij Rempel, Emanuele Ghidoli,
	devicetree@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, Daniel Scally,
	Kieran Bingham, Stefan Klug, Conor Dooley, Fabio Estevam,
	Krzysztof Kozlowski, Pengutronix Kernel Team, Rob Herring,
	Sascha Hauer, Shawn Guo

On Mon, Nov 24, 2025 at 09:12:14AM +0900, Laurent Pinchart wrote:
> On Sun, Nov 23, 2025 at 05:11:27PM +0000, Russell King (Oracle) wrote:
> > On Mon, Nov 24, 2025 at 12:23:56AM +0900, Laurent Pinchart wrote:
> > > You're right. I've checked, and replied to the patch with the following
> > > numbers.
> > > 
> > > 100TX link, eee-broken-* set: 7000 interrupts
> > > 1000T link, eee-broken-* set: 2711 interrupts
> > > 100TX link, eee-broken-* unset: 9450 interrupts
> > > 1000T link, eee-broken-* unset: 6066 interrupts
> > 
> > Sadly, I think this means for iMX8MP, the correct answer is to disable
> > EEE completely. What I was thinking when I brought this up is as follows,
> > and dwmac-imx.c can then set STMMAC_FLAG_EEE_DISABLE for iMX8MP to
> > prevent the use of EEE.
> 
> I suppose there's no way to disable EEE in the RX path while keeping it
> enabled in the TX path ?

Sadly not, sorry.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

end of thread, other threads:[~2025-11-24  8:43 UTC | newest]

Thread overview: 51+ messages
2025-10-26 12:29 [PATCH] arm64: dts: imx8mp-debix-model-a: Disable EEE for 1000T Laurent Pinchart
2025-10-27  1:31 ` Fabio Estevam
2025-10-27  3:08 ` Andrew Lunn
2025-10-27  7:27   ` Laurent Pinchart
2025-10-27  8:47     ` Emanuele Ghidoli
2025-10-27  9:00       ` Russell King (Oracle)
2025-10-27  9:18         ` Emanuele Ghidoli
2025-10-27  9:32     ` Russell King (Oracle)
2025-10-27 23:08       ` Laurent Pinchart
2025-10-27 11:22     ` Russell King (Oracle)
2025-10-27 23:15       ` Laurent Pinchart
2025-10-27  9:12   ` Oleksij Rempel
2025-10-27 10:02     ` Laurent Pinchart
2025-10-27 10:23       ` Oleksij Rempel
2025-10-27 10:31         ` Laurent Pinchart
2025-10-27 10:34           ` Russell King (Oracle)
2025-10-27 10:44             ` Oleksij Rempel
2025-10-27 10:48               ` Russell King (Oracle)
2025-10-27 12:50                 ` Andrew Lunn
2025-10-27 14:50                   ` Oleksij Rempel
2025-11-12 12:34     ` Russell King (Oracle)
2025-11-12 12:41       ` Kieran Bingham
2025-11-12 12:56         ` Russell King (Oracle)
2025-11-13  1:17           ` Laurent Pinchart
2025-11-12 21:32       ` Laurent Pinchart
2025-10-27  9:07 ` Russell King (Oracle)
2025-10-27  9:33   ` Laurent Pinchart
2025-10-27  9:45     ` Russell King (Oracle)
2025-10-27  9:55       ` Laurent Pinchart
2025-10-27 13:33   ` Russell King (Oracle)
2025-10-27 15:13 ` Russell King (Oracle)
2025-10-27 19:52   ` Andrew Lunn
2025-10-27 23:46   ` Laurent Pinchart
2025-10-28  0:57     ` Russell King (Oracle)
2025-10-28  7:18       ` Laurent Pinchart
2025-11-11 23:54         ` Laurent Pinchart
2025-11-12 12:03           ` Russell King (Oracle)
2025-11-12 22:25             ` Laurent Pinchart
2025-11-13  1:06               ` Laurent Pinchart
2025-11-13 10:59                 ` Russell King (Oracle)
2025-11-14 22:26                   ` Laurent Pinchart
2025-11-18  1:50                     ` Wei Fang
2025-11-22  7:22                       ` Laurent Pinchart
2025-11-22  9:57                         ` Russell King (Oracle)
2025-11-23  5:38                           ` Laurent Pinchart
2025-11-23  8:52                             ` Russell King (Oracle)
2025-11-23 15:23                               ` Laurent Pinchart
2025-11-23 17:11                                 ` Russell King (Oracle)
2025-11-24  0:12                                   ` Laurent Pinchart
2025-11-24  5:44                                     ` Oleksij Rempel
2025-11-24  8:43                                     ` Russell King (Oracle)
