Netdev List
* [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
@ 2026-04-29 12:53 Jensen Huang
  2026-05-05  8:26 ` Thorsten Leemhuis
  0 siblings, 1 reply; 8+ messages in thread
From: Jensen Huang @ 2026-04-29 12:53 UTC (permalink / raw)
  To: Russell King, Andrew Lunn, Heiner Kallweit; +Cc: regressions, netdev, LKML

Hi,

I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
When a network cable is connected during boot, the DMA reset
occasionally fails with the error message: "Failed to reset the dma".

This appears to be a timing issue related to the EEE RX clock-stop
logic. Based on my investigation with the RTL8211E PHY, I monitored
the PHY register PS1R (MMD device 3, address 0x01) and observed a
value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
clock may have already stopped.
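
As a minimal sketch of how the 0x0f40 reading can be interpreted: the bit
positions below follow my reading of the IEEE 802.3 Clause 45 PCS status 1
register (MMD 3, register 1). The macro and function names are local to this
example and are not taken from any kernel header.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* PCS status 1 bits (assumed from IEEE 802.3 Clause 45; names are local). */
#define PCS_STAT1_TX_LPI_RECEIVED    (1u << 11) /* latched */
#define PCS_STAT1_RX_LPI_RECEIVED    (1u << 10) /* latched */
#define PCS_STAT1_TX_LPI_INDICATION  (1u << 9)
#define PCS_STAT1_RX_LPI_INDICATION  (1u << 8)
#define PCS_STAT1_FAULT              (1u << 7)
#define PCS_STAT1_CLOCK_STOP_CAPABLE (1u << 6)
#define PCS_STAT1_LINK_STATUS        (1u << 2)
#define PCS_STAT1_LOW_POWER_ABILITY  (1u << 1)

/* True if every bit in mask is set in the register value. */
static bool pcs_stat1_has(uint16_t stat1, uint16_t mask)
{
	return (stat1 & mask) == mask;
}

/* The PHY currently signals LPI on its receive path. */
static bool pcs_rx_lpi_indicated(uint16_t stat1)
{
	return pcs_stat1_has(stat1, PCS_STAT1_RX_LPI_INDICATION);
}

/* The PHY advertises that it is able to stop its receive clock in LPI. */
static bool pcs_clock_stop_capable(uint16_t stat1)
{
	return pcs_stat1_has(stat1, PCS_STAT1_CLOCK_STOP_CAPABLE);
}

/* Print one line per decoded flag for a raw register value. */
static void pcs_stat1_dump(uint16_t stat1)
{
	printf("PCS status 1 = 0x%04x\n", (unsigned int)stat1);
	printf("  tx lpi received:    %d\n", pcs_stat1_has(stat1, PCS_STAT1_TX_LPI_RECEIVED));
	printf("  rx lpi received:    %d\n", pcs_stat1_has(stat1, PCS_STAT1_RX_LPI_RECEIVED));
	printf("  tx lpi indication:  %d\n", pcs_stat1_has(stat1, PCS_STAT1_TX_LPI_INDICATION));
	printf("  rx lpi indication:  %d\n", pcs_rx_lpi_indicated(stat1));
	printf("  fault:              %d\n", pcs_stat1_has(stat1, PCS_STAT1_FAULT));
	printf("  clock stop capable: %d\n", pcs_clock_stop_capable(stat1));
	printf("  link status:        %d\n", pcs_stat1_has(stat1, PCS_STAT1_LINK_STATUS));
	printf("  low power ability:  %d\n", pcs_stat1_has(stat1, PCS_STAT1_LOW_POWER_ABILITY));
}
```

Calling pcs_stat1_dump(0x0f40) from a small main() shows the four LPI bits
and the clock-stop-capable bit set. Under the assumptions above, that is
consistent with a PHY in LPI mode that is capable of stopping its RX clock;
it does not by itself prove the clock was stopped.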

While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
ensures the clock is running before the DMA reset, my tests suggest
that the phylink_rx_clk_stop_block() call might not provide a
sufficiently stable RX clock in time for the immediate DMA reset that
follows.
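
To illustrate why a stopped RX clock would surface as "Failed to reset the
dma", here is a small user-space model, not driver code. It assumes that the
MAC's software-reset bit only self-clears while the RX clock from the PHY is
running; the struct, the function names and the poll count are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software-reset bit in the modelled DMA bus mode register (bit 0 here). */
#define MODEL_DMA_SFT_RESET 0x1u

/* Hypothetical stand-in for the MAC: one register plus the RX clock state. */
struct model_gmac {
	uint32_t dma_bus_mode;
	bool rx_clk_running;
};

/* One step of "hardware" time: the reset completes only with RX clock. */
static void model_gmac_tick(struct model_gmac *mac)
{
	if (mac->rx_clk_running)
		mac->dma_bus_mode &= ~MODEL_DMA_SFT_RESET;
}

/*
 * Request a software reset and poll for completion.
 * Returns 0 when the reset bit clears, -1 if it is still set after
 * max_polls polls (the modelled equivalent of the reported error).
 */
static int model_dma_reset(struct model_gmac *mac, unsigned int max_polls)
{
	mac->dma_bus_mode |= MODEL_DMA_SFT_RESET;

	for (unsigned int i = 0; i < max_polls; i++) {
		model_gmac_tick(mac);
		if (!(mac->dma_bus_mode & MODEL_DMA_SFT_RESET))
			return 0;
	}

	return -1;
}
```

In this model, the reset succeeds on the first poll when rx_clk_running is
true and times out for any poll count when it is false, so the outcome
depends on the clock state at reset time rather than on how long the poll
lasts.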

Since stmmac already sets mac_requires_rxc = true, I modified
phylink_bringup_phy() to honor this flag. This avoids toggling the
PHY's clk_stop_enable during the initialization sequence, ensuring the
RX clock remains active and stable throughout.
With the change below, I achieved 200/200 successful reboots with the
cable connected (previously ~50% failure rate).

--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
     /* Allow the MAC to stop its clock if the PHY has the capability */
     pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;

-    if (pl->mac_supports_eee_ops) {
+    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
         /* Explicitly configure whether the PHY is allowed to stop it's
          * receive clock.
          */
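
The effect of the changed condition can be sketched as a stand-alone truth
check. This is only a model of the two if-conditions in the hunk above; the
struct and function names are hypothetical and the real phylink structures
are not reproduced.

```c
#include <stdbool.h>

/* Hypothetical reduction of the two flags the condition looks at. */
struct model_phylink {
	bool mac_supports_eee_ops;
	bool mac_requires_rxc;
};

/* Original condition: the PHY's RX clock-stop setting is configured. */
static bool configures_rx_clk_stop_before(const struct model_phylink *pl)
{
	return pl->mac_supports_eee_ops;
}

/* Proposed condition: skip that configuration when the MAC needs RXC. */
static bool configures_rx_clk_stop_after(const struct model_phylink *pl)
{
	return pl->mac_supports_eee_ops && !pl->mac_requires_rxc;
}
```

For a MAC with both flags set, which is how stmmac is described in this
report, the first function returns true and the second returns false, so the
block guarded by the condition would no longer run during PHY bring-up.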

Any feedback/testing on this would be appreciated.

Best regards,
Jensen Huang

* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-04-29 12:53 [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18 Jensen Huang
@ 2026-05-05  8:26 ` Thorsten Leemhuis
  2026-05-07 12:49   ` Jensen Huang
  0 siblings, 1 reply; 8+ messages in thread
From: Thorsten Leemhuis @ 2026-05-05  8:26 UTC (permalink / raw)
  To: Jensen Huang, Russell King
  Cc: Heiner Kallweit, Andrew Lunn, regressions, netdev, LKML

[Jumping in here, as there are no replies yet]

BTW, Russell, just in case you missed this: it looks like this regression
is caused by a change of yours.

On 4/29/26 14:53, Jensen Huang wrote:
> 
> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
> When a network cable is connected during boot, the DMA reset
> occasionally fails with the error message: "Failed to reset the dma".
> 
> This appears to be a timing issue related to the EEE RX clock-stop
> logic. Based on my investigation with the RTL8211E PHY, I monitored
> the PHY register PS1R (MMD device 3, address 0x01) and observed a
> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
> clock may have already stopped.
> 
> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")

Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
affected? That is always advisable (some people would call it required),
and even more so in this case: mainline has for a while contained a fix
for the change you mentioned that wasn't backported:
c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
enabled"). But this is not my area of expertise (and the fix is in a
different area of the code), so it might be unrelated to your issue.

Ciao, Thorsten

> ensures the clock is running before the DMA reset, my tests suggest
> that the phylink_rx_clk_stop_block() call might not provide a
> sufficiently stable RX clock in time for the immediate DMA reset that
> follows.
> 
> Since stmmac already sets mac_requires_rxc = true, I modified
> phylink_bringup_phy() to honor this flag. This avoids toggling the
> PHY's clk_stop_enable during the initialization sequence, ensuring the
> RX clock remains active and stable throughout.
> With the change below, I achieved 200/200 successful reboots with the
> cable connected (previously ~50% failure rate).
> 
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
> *pl, struct phy_device *phy,
>      /* Allow the MAC to stop its clock if the PHY has the capability */
>      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
> 
> -    if (pl->mac_supports_eee_ops) {
> +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
>          /* Explicitly configure whether the PHY is allowed to stop it's
>           * receive clock.
>           */
> 
> Any feedback/testing on this would be appreciated.
> 
> Best regards,
> Jensen Huang
> 


* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-05  8:26 ` Thorsten Leemhuis
@ 2026-05-07 12:49   ` Jensen Huang
  2026-05-07 13:13     ` Thorsten Leemhuis
  2026-05-07 13:16     ` Maxime Chevallier
  0 siblings, 2 replies; 8+ messages in thread
From: Jensen Huang @ 2026-05-07 12:49 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML

On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> [Jumping in here, as there are no replies yet]
>
> BTW, Russell, just in case you missed this: it looks like this regression
> is caused by a change of yours.
>
> On 4/29/26 14:53, Jensen Huang wrote:
> >
> > I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
> > When a network cable is connected during boot, the DMA reset
> > occasionally fails with the error message: "Failed to reset the dma".
> >
> > This appears to be a timing issue related to the EEE RX clock-stop
> > logic. Based on my investigation with the RTL8211E PHY, I monitored
> > the PHY register PS1R (MMD device 3, address 0x01) and observed a
> > value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
> > clock may have already stopped.
> >
> > While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>
> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
> affected? That is always advisable (some people would call it required),
> and even more so in this case: mainline has for a while contained a fix
> for the change you mentioned that wasn't backported:
> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
> enabled"). But this is not my area of expertise (and the fix is in a
> different area of the code), so it might be unrelated to your issue.

Thanks for the pointer.
As you suggested, I have tested the mainline and confirmed that the
issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
I verified that the issue persists in the latest stable v6.18.26.
I performed a git bisect and the result pointed exactly to the commit
you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
when VLAN is enabled").
Additionally, I tested the case where CONFIG_VLAN_8021Q is not set,
and the DMA reset issue occurs again.


Best regards,
Jensen Huang

>
> Ciao, Thorsten
>
> > ensures the clock is running before the DMA reset, my tests suggest
> > that the phylink_rx_clk_stop_block() call might not provide a
> > sufficiently stable RX clock in time for the immediate DMA reset that
> > follows.
> >
> > Since stmmac already sets mac_requires_rxc = true, I modified
> > phylink_bringup_phy() to honor this flag. This avoids toggling the
> > PHY's clk_stop_enable during the initialization sequence, ensuring the
> > RX clock remains active and stable throughout.
> > With the change below, I achieved 200/200 successful reboots with the
> > cable connected (previously ~50% failure rate).
> >
> > --- a/drivers/net/phy/phylink.c
> > +++ b/drivers/net/phy/phylink.c
> > @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
> > *pl, struct phy_device *phy,
> >      /* Allow the MAC to stop its clock if the PHY has the capability */
> >      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
> >
> > -    if (pl->mac_supports_eee_ops) {
> > +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
> >          /* Explicitly configure whether the PHY is allowed to stop it's
> >           * receive clock.
> >           */
> >
> > Any feedback/testing on this would be appreciated.
> >
> > Best regards,
> > Jensen Huang
> >
>

* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-07 12:49   ` Jensen Huang
@ 2026-05-07 13:13     ` Thorsten Leemhuis
  2026-05-08  8:19       ` Jensen Huang
  2026-05-11  7:35       ` Thorsten Leemhuis
  2026-05-07 13:16     ` Maxime Chevallier
  1 sibling, 2 replies; 8+ messages in thread
From: Thorsten Leemhuis @ 2026-05-07 13:13 UTC (permalink / raw)
  To: Ovidiu Panait, Jensen Huang
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML

[+Ovidiu Panait]

On 5/7/26 14:49, Jensen Huang wrote:
> On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
>> On 4/29/26 14:53, Jensen Huang wrote:
>
>>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
>>> When a network cable is connected during boot, the DMA reset
>>> occasionally fails with the error message: "Failed to reset the dma".
>>>
>>> This appears to be a timing issue related to the EEE RX clock-stop
>>> logic. Based on my investigation with the RTL8211E PHY, I monitored
>>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
>>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
>>> clock may have already stopped.
>>>
>>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>>
>> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
>> affected? That is always advisable (some people would call it required),
>> and even more so in this case: mainline has for a while contained a fix
>> for the change you mentioned that wasn't backported:
>> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
>> enabled"). But this is not my area of expertise (and the fix is in a
>> different area of the code), so it might be unrelated to your issue.
> 
> Thanks for the pointer.
> As you suggested, I have tested the mainline and confirmed that the
> issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
> I verified that the issue persists in the latest stable v6.18.26.
> I performed a git bisect and the result pointed exactly to the commit
> you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> when VLAN is enabled").

Great! Could you please cherry-pick c171e679ee66d7 to 6.18.y and see if
that fixes things? It sounds like it should.

@Ovidiu Panait: c171e679ee66d7 is a commit of yours. If Jensen confirms
that cherry-picking fixed the problem, I'd say we ask Greg to pick it up
for 6.18.y -- unless you see any reasons why that might be a bad idea.

> Additionally, I tested the case where CONFIG_VLAN_8021Q is not set,
> and the DMA reset issue occurs again.

I'd say that is likely best discussed in a new thread you might want to
start. Also wondering if it was like that earlier. Or iow: if that is a
regression or not.

Ciao, Thorsten

>>> ensures the clock is running before the DMA reset, my tests suggest
>>> that the phylink_rx_clk_stop_block() call might not provide a
>>> sufficiently stable RX clock in time for the immediate DMA reset that
>>> follows.
>>>
>>> Since stmmac already sets mac_requires_rxc = true, I modified
>>> phylink_bringup_phy() to honor this flag. This avoids toggling the
>>> PHY's clk_stop_enable during the initialization sequence, ensuring the
>>> RX clock remains active and stable throughout.
>>> With the change below, I achieved 200/200 successful reboots with the
>>> cable connected (previously ~50% failure rate).
>>>
>>> --- a/drivers/net/phy/phylink.c
>>> +++ b/drivers/net/phy/phylink.c
>>> @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
>>> *pl, struct phy_device *phy,
>>>      /* Allow the MAC to stop its clock if the PHY has the capability */
>>>      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
>>>
>>> -    if (pl->mac_supports_eee_ops) {
>>> +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
>>>          /* Explicitly configure whether the PHY is allowed to stop it's
>>>           * receive clock.
>>>           */
>>>
>>> Any feedback/testing on this would be appreciated.
>>>
>>> Best regards,
>>> Jensen Huang
>>>
>>


* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-07 12:49   ` Jensen Huang
  2026-05-07 13:13     ` Thorsten Leemhuis
@ 2026-05-07 13:16     ` Maxime Chevallier
  1 sibling, 0 replies; 8+ messages in thread
From: Maxime Chevallier @ 2026-05-07 13:16 UTC (permalink / raw)
  To: Jensen Huang, Thorsten Leemhuis
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML

Hi,

On 07/05/2026 14:49, Jensen Huang wrote:
> On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> <regressions@leemhuis.info> wrote:
>>
>> [Jumping in here, as there are no replies yet]
>>
>> BTW, Russell, just in case you missed this: it looks like this regression
>> is caused by a change of yours.

I think Russell is dealing with unpleasant personal stuff, let's see if we
can figure this out while he's away.

>>
>> On 4/29/26 14:53, Jensen Huang wrote:
>>>
>>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
>>> When a network cable is connected during boot, the DMA reset
>>> occasionally fails with the error message: "Failed to reset the dma".
>>>
>>> This appears to be a timing issue related to the EEE RX clock-stop
>>> logic. Based on my investigation with the RTL8211E PHY, I monitored
>>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
>>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
>>> clock may have already stopped.

From what I understand, your current hypothesis is that it takes a while
for that clock to stabilize, and therefore we're accessing the DMA
registers too soon?

Can you confirm that by adding a small delay?

>>>
>>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>>
>> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
>> affected? That is always advisable (some people would call it required),
>> and even more so in this case: mainline has for a while contained a fix
>> for the change you mentioned that wasn't backported:
>> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
>> enabled"). But this is not my area of expertise (and the fix is in a
>> different area of the code), so it might be unrelated to your issue.
> 
> Thanks for the pointer.
> As you suggested, I have tested the mainline and confirmed that the
> issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
> I verified that the issue persists in the latest stable v6.18.26.
> I performed a git bisect and the result pointed exactly to the commit
> you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> when VLAN is enabled").

Do you mean that c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
when VLAN is enabled") introduces the bug on 6.18.26?

Are you able to bisect to verify exactly when the issue was fixed between
v6.18 and v6.19?

Maxime



* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-07 13:13     ` Thorsten Leemhuis
@ 2026-05-08  8:19       ` Jensen Huang
  2026-05-11  7:35       ` Thorsten Leemhuis
  1 sibling, 0 replies; 8+ messages in thread
From: Jensen Huang @ 2026-05-08  8:19 UTC (permalink / raw)
  To: Thorsten Leemhuis, Maxime Chevallier
  Cc: Ovidiu Panait, Russell King, Heiner Kallweit, Andrew Lunn,
	regressions, netdev, LKML

Hi Thorsten, Maxime,

On Thu, May 7, 2026 at 9:45 PM Thorsten Leemhuis
<regressions@leemhuis.info> wrote:
>
> [+Ovidiu Panait]
>
> On 5/7/26 14:49, Jensen Huang wrote:
> > On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> > <regressions@leemhuis.info> wrote:
> >> On 4/29/26 14:53, Jensen Huang wrote:
> >
> >>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
> >>> When a network cable is connected during boot, the DMA reset
> >>> occasionally fails with the error message: "Failed to reset the dma".
> >>>
> >>> This appears to be a timing issue related to the EEE RX clock-stop
> >>> logic. Based on my investigation with the RTL8211E PHY, I monitored
> >>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
> >>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
> >>> clock may have already stopped.
> >>>
> >>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
> >>
> >> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
> >> affected? That is always advisable (some people would call it required),
> >> and even more so in this case: mainline has for a while contained a fix
> >> for the change you mentioned that wasn't backported:
> >> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
> >> enabled"). But this is not my area of expertise (and the fix is in a
> >> different area of the code), so it might be unrelated to your issue.
> >
> > Thanks for the pointer.
> > As you suggested, I have tested the mainline and confirmed that the
> > issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
> > I verified that the issue persists in the latest stable v6.18.26.
> > I performed a git bisect and the result pointed exactly to the commit
> > you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> > when VLAN is enabled").
>
> Great! Could you please cherry-pick c171e679ee66d7 to 6.18.y and see if
> that fixes things? It sounds like it should.
>
> @Ovidiu Panait: c171e679ee66d7 is a commit of yours. If Jensen confirms
> that cherry-picking fixed the problem, I'd say we ask Greg to pick it up
> for 6.18.y -- unless you see any reasons why that might be a bad idea.
>
> > Additionally, I tested the case where CONFIG_VLAN_8021Q is not set,
> > and the DMA reset issue occurs again.
>
> I'd say that is likely best discussed in a new thread you might want to
> start. Also wondering if it was like that earlier. Or iow: if that is a
> regression or not.

I have tested v6.18.26 and here are the results:
1. I ran "ip link add link eth0 name eth0.5 type vlan id 5" more than 10
times and did not encounter timeout issues. This might be because the
RK3399 GMAC does not support EEE.
2. Cherry-picking c171e679ee66d7 onto v6.18.26 avoids the DMA reset failure.

Additionally, I am considering proposing a new DT property (e.g.,
snps,no-eee-rx-clk-stop) to explicitly control eee_rx_clk_stop_enable.
This would provide a more robust solution for hardware combinations
that require a continuous RX clock for stability, regardless of VLAN
configurations. However, this would be better discussed in a new thread
too.


On Thu, May 7, 2026 at 9:16 PM Maxime Chevallier
<maxime.chevallier@bootlin.com> wrote:
>
> Hi,
>
> On 07/05/2026 14:49, Jensen Huang wrote:
> > On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
> > <regressions@leemhuis.info> wrote:
> >>
> >> [Jumping in here, as there are no replies yet]
> >>
> >> BTW, Russell, just in case you missed this: it looks like this regression
> >> is caused by a change of yours.
>
> I think Russell is dealing with unpleasant personal stuff, let's see if we
> can figure this out while he's away.
>
> >>
> >> On 4/29/26 14:53, Jensen Huang wrote:
> >>>
> >>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
> >>> When a network cable is connected during boot, the DMA reset
> >>> occasionally fails with the error message: "Failed to reset the dma".
> >>>
> >>> This appears to be a timing issue related to the EEE RX clock-stop
> >>> logic. Based on my investigation with the RTL8211E PHY, I monitored
> >>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
> >>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
> >>> clock may have already stopped.
>
> From what I understand, your current hypothesis is that it takes a while
> for that clock to stabilize, and therefore we're accessing the DMA
> registers too soon?
>
> Can you confirm that by adding a small delay?

Adding msleep(100) between phylink_rx_clk_stop_block() and
stmmac_init_dma_engine() did not help.

> Do you mean that c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
> when VLAN is enabled") introduces the bug on 6.18.26?
>
> Are you able to bisect to verify exactly when the issue was fixed between
> v6.18 and v6.19?

Sorry for the confusion. Commit c171e679ee66d7 is actually the fix. My
git bisect pointed to this commit as the one that avoided the issue
between v6.18 and v6.19-rc1.


Best regards,
Jensen Huang

>
> Ciao, Thorsten
>
> >>> ensures the clock is running before the DMA reset, my tests suggest
> >>> that the phylink_rx_clk_stop_block() call might not provide a
> >>> sufficiently stable RX clock in time for the immediate DMA reset that
> >>> follows.
> >>>
> >>> Since stmmac already sets mac_requires_rxc = true, I modified
> >>> phylink_bringup_phy() to honor this flag. This avoids toggling the
> >>> PHY's clk_stop_enable during the initialization sequence, ensuring the
> >>> RX clock remains active and stable throughout.
> >>> With the change below, I achieved 200/200 successful reboots with the
> >>> cable connected (previously ~50% failure rate).
> >>>
> >>> --- a/drivers/net/phy/phylink.c
> >>> +++ b/drivers/net/phy/phylink.c
> >>> @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
> >>> *pl, struct phy_device *phy,
> >>>      /* Allow the MAC to stop its clock if the PHY has the capability */
> >>>      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
> >>>
> >>> -    if (pl->mac_supports_eee_ops) {
> >>> +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
> >>>          /* Explicitly configure whether the PHY is allowed to stop it's
> >>>           * receive clock.
> >>>           */
> >>>
> >>> Any feedback/testing on this would be appreciated.
> >>>
> >>> Best regards,
> >>> Jensen Huang
> >>>
> >>
>

* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-07 13:13     ` Thorsten Leemhuis
  2026-05-08  8:19       ` Jensen Huang
@ 2026-05-11  7:35       ` Thorsten Leemhuis
  2026-05-11  8:17         ` Greg KH
  1 sibling, 1 reply; 8+ messages in thread
From: Thorsten Leemhuis @ 2026-05-11  7:35 UTC (permalink / raw)
  To: stable@vger.kernel.org
  Cc: Russell King, Heiner Kallweit, Andrew Lunn, regressions, netdev,
	LKML, Jensen Huang, Ovidiu Panait

Greg, Sasha, could you please cherry-pick c171e679ee66d7 ("net: stmmac:
Disable EEE RX clock stop when VLAN is enabled") [v6.19-rc1] to 6.18.y?
It fixes a regression for Jensen Huang (for details see below; it was
later confirmed that c171e679ee66d7 really fixes this) caused by
dd557266cf5fb0 ("net: stmmac: block PHY RXC clock-stop") [v6.15-rc1]. tia!

Ciao, Thorsten

On 5/7/26 15:13, Thorsten Leemhuis wrote:
> [+Ovidiu Panait]
> On 5/7/26 14:49, Jensen Huang wrote:
>> On Tue, May 5, 2026 at 4:26 PM Thorsten Leemhuis
>> <regressions@leemhuis.info> wrote:
>>> On 4/29/26 14:53, Jensen Huang wrote:
>>
>>>> I'm reporting a regression on RK3399 (stmmac) observed in v6.18.24.
>>>> When a network cable is connected during boot, the DMA reset
>>>> occasionally fails with the error message: "Failed to reset the dma".
>>>>
>>>> This appears to be a timing issue related to the EEE RX clock-stop
>>>> logic. Based on my investigation with the RTL8211E PHY, I monitored
>>>> the PHY register PS1R (MMD device 3, address 0x01) and observed a
>>>> value of 0x0f40. This indicates that the PHY is in LPI mode and the RX
>>>> clock may have already stopped.
>>>>
>>>> While commit dd557266cf5f ("net: stmmac: block PHY RXC clock-stop")
>>>
>>> Just wondering: have you checked whether mainline (e.g. 7.1-rc1) is still
>>> affected? That is always advisable (some people would call it required),
>>> and even more so in this case: mainline has for a while contained a fix
>>> for the change you mentioned that wasn't backported:
>>> c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop when VLAN is
>>> enabled"). But this is not my area of expertise (and the fix is in a
>>> different area of the code), so it might be unrelated to your issue.
>>
>> Thanks for the pointer.
>> As you suggested, I have tested the mainline and confirmed that the
>> issue is not present in v7.1-rc2, nor as early as v6.19-rc1. However,
>> I verified that the issue persists in the latest stable v6.18.26.
>> I performed a git bisect and the result pointed exactly to the commit
>> you mentioned: c171e679ee66d7 ("net: stmmac: Disable EEE RX clock stop
>> when VLAN is enabled").
> 
> Great! Could you please cherry-pick c171e679ee66d7 to 6.18.y and see if
> that fixes things? It sounds like it should.
> 
> @Ovidiu Panait: c171e679ee66d7 is a commit of yours. If Jensen confirms
> that cherry-picking fixed the problem, I'd say we ask Greg to pick it up
> for 6.18.y -- unless you see any reasons why that might be a bad idea.
> 
>> Additionally, I tested the case where CONFIG_VLAN_8021Q is not set,
>> and the DMA reset issue occurs again.
> 
> I'd say that is likely best discussed in a new thread you might want to
> start. Also wondering if it was like that earlier. Or iow: if that is a
> regression or not.
> 
> Ciao, Thorsten
> 
>>>> ensures the clock is running before the DMA reset, my tests suggest
>>>> that the phylink_rx_clk_stop_block() call might not provide a
>>>> sufficiently stable RX clock in time for the immediate DMA reset that
>>>> follows.
>>>>
>>>> Since stmmac already sets mac_requires_rxc = true, I modified
>>>> phylink_bringup_phy() to honor this flag. This avoids toggling the
>>>> PHY's clk_stop_enable during the initialization sequence, ensuring the
>>>> RX clock remains active and stable throughout.
>>>> With the change below, I achieved 200/200 successful reboots with the
>>>> cable connected (previously ~50% failure rate).
>>>>
>>>> --- a/drivers/net/phy/phylink.c
>>>> +++ b/drivers/net/phy/phylink.c
>>>> @@ -2171,7 +2171,7 @@ static int phylink_bringup_phy(struct phylink
>>>> *pl, struct phy_device *phy,
>>>>      /* Allow the MAC to stop its clock if the PHY has the capability */
>>>>      pl->mac_tx_clk_stop = phy_eee_tx_clock_stop_capable(phy) > 0;
>>>>
>>>> -    if (pl->mac_supports_eee_ops) {
>>>> +    if (pl->mac_supports_eee_ops && !pl->config->mac_requires_rxc) {
>>>>          /* Explicitly configure whether the PHY is allowed to stop it's
>>>>           * receive clock.
>>>>           */
>>>>
>>>> Any feedback/testing on this would be appreciated.
>>>>
>>>> Best regards,
>>>> Jensen Huang
>>>>
>>>
> 


* Re: [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18
  2026-05-11  7:35       ` Thorsten Leemhuis
@ 2026-05-11  8:17         ` Greg KH
  0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2026-05-11  8:17 UTC (permalink / raw)
  To: Thorsten Leemhuis
  Cc: stable@vger.kernel.org, Russell King, Heiner Kallweit,
	Andrew Lunn, regressions, netdev, LKML, Jensen Huang,
	Ovidiu Panait

On Mon, May 11, 2026 at 09:35:37AM +0200, Thorsten Leemhuis wrote:
> Greg, Sasha, could you please cherry-pick c171e679ee66d7 ("net: stmmac:
> Disable EEE RX clock stop when VLAN is enabled") [v6.19-rc1] to 6.18.y?
> It fixes a regression for Jensen Huang (for details see below; it was
> later confirmed that c171e679ee66d7 really fixes this) caused by
> dd557266cf5fb0 ("net: stmmac: block PHY RXC clock-stop") [v6.15-rc1]. tia!

Now queued up, thanks.

greg k-h

Thread overview: 8+ messages
2026-04-29 12:53 [REGRESSION] stmmac: Random DMA reset failure on RK3399 since v6.18 Jensen Huang
2026-05-05  8:26 ` Thorsten Leemhuis
2026-05-07 12:49   ` Jensen Huang
2026-05-07 13:13     ` Thorsten Leemhuis
2026-05-08  8:19       ` Jensen Huang
2026-05-11  7:35       ` Thorsten Leemhuis
2026-05-11  8:17         ` Greg KH
2026-05-07 13:16     ` Maxime Chevallier
