netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
@ 2025-11-16  2:38 Wei Fang
  2025-11-16 16:10 ` Andrew Lunn
  2025-11-17 15:17 ` Russell King (Oracle)
  0 siblings, 2 replies; 11+ messages in thread
From: Wei Fang @ 2025-11-16  2:38 UTC (permalink / raw)
  To: linux, andrew, hkallweit1, davem, edumazet, kuba, pabeni, eric,
	maxime.chevallier
  Cc: imx, netdev, linux-kernel

Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
initialized, so these link modes will not work for the fixed-link. This
leads to a TCP performance degradation issue observed on the i.MX943
platform.

The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
is a fixed link and the link speed is 2.5Gbps. And one of the switch
user ports is the RGMII interface, and its link speed is 1Gbps. If the
flow-control of the fixed link is not enabled, we can easily observe
the iperf performance of TCP packets is very low. Because the inbound
rate on the CPU port is greater than the outbound rate on the user port,
the switch is prone to congestion, leading to the loss of some TCP
packets and requiring multiple retransmissions.

Solving this problem should be as simple as setting the Asym_Pause and
Pause bits. The reason why the Autoneg bit needs to be set is because
it was already set before the blame commit. Moreover, Russell provides
a very good explanation of why it needs to be set in the thread [1].

[1] https://lore.kernel.org/all/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk/

Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration")
Signed-off-by: Wei Fang <wei.fang@nxp.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
---
v2 changes:
1. Improve the commit message
2. Collect the Reviewed-by tag
v1: https://lore.kernel.org/imx/20251114052808.1129942-1-wei.fang@nxp.com/
---
 drivers/net/phy/phylink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9d7799ea1c17..918244308215 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -637,6 +637,9 @@ static int phylink_validate(struct phylink *pl, unsigned long *supported,
 
 static void phylink_fill_fixedlink_supported(unsigned long *supported)
 {
+	linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, supported);
+	linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, supported);
+	linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, supported);
 	linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, supported);
 	linkmode_set_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, supported);
 	linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, supported);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-16  2:38 [PATCH v2] net: phylink: add missing supported link modes for the fixed-link Wei Fang
@ 2025-11-16 16:10 ` Andrew Lunn
  2025-11-16 17:33   ` Russell King (Oracle)
  2025-11-17  3:23   ` Wei Fang
  2025-11-17 15:17 ` Russell King (Oracle)
  1 sibling, 2 replies; 11+ messages in thread
From: Andrew Lunn @ 2025-11-16 16:10 UTC (permalink / raw)
  To: Wei Fang
  Cc: linux, hkallweit1, davem, edumazet, kuba, pabeni, eric,
	maxime.chevallier, imx, netdev, linux-kernel

On Sun, Nov 16, 2025 at 10:38:23AM +0800, Wei Fang wrote:
> Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
> initialized, so these link modes will not work for the fixed-link. This
> leads to a TCP performance degradation issue observed on the i.MX943
> platform.
> 
> The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
> is a fixed link and the link speed is 2.5Gbps. And one of the switch
> user ports is the RGMII interface, and its link speed is 1Gbps. If the
> flow-control of the fixed link is not enabled, we can easily observe
> the iperf performance of TCP packets is very low. Because the inbound
> rate on the CPU port is greater than the outbound rate on the user port,
> the switch is prone to congestion, leading to the loss of some TCP
> packets and requiring multiple retransmissions.
> 
> Solving this problem should be as simple as setting the Asym_Pause and
> Pause bits. The reason why the Autoneg bit needs to be set is because
> it was already set before the blame commit. Moreover, Russell provides
> a very good explanation of why it needs to be set in the thread [1].
> 
> [1] https://lore.kernel.org/all/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk/

There is no limit on commit message length, but URLs sometimes
die. Please just make use of Russells explanation. You can say: As
explained by Russell King, and just quote it, etc.

This also seems like two fixes: a regression for the AUTONEG bit, and
allowing pause to be set. So maybe this should be two patches?

I'm also surprised TCP is collapsing. This is not an unusual setup,
e.g. a home wireless network feeding a cable modem. A high speed link
feeding a lower speed link. What RTT is there when TCP gets into
trouble? TCP should be backing off as soon as it sees packet loss, so
reducing the bandwidth it tries to consume, and so emptying out the
buffers. But if you have big buffers in the ENETC causing high
latency, that might be an issue?  Does ENETC have BQL? It is worth
implementing, just to avoid bufferbloat problems.

	Andrew

---
pw-bot: cr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-16 16:10 ` Andrew Lunn
@ 2025-11-16 17:33   ` Russell King (Oracle)
  2025-11-17  3:23   ` Wei Fang
  1 sibling, 0 replies; 11+ messages in thread
From: Russell King (Oracle) @ 2025-11-16 17:33 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Wei Fang, hkallweit1, davem, edumazet, kuba, pabeni, eric,
	maxime.chevallier, imx, netdev, linux-kernel

On Sun, Nov 16, 2025 at 05:10:19PM +0100, Andrew Lunn wrote:
> There is no limit on commit message length, but URLs sometimes
> die. Please just make use of Russells explanation. You can say: As
> explained by Russell King, and just quote it, etc.

I think one of the reasons lore exists is to provide a stable and
reliable source of URLs into archives - it's maintained by the
kernel.org folk.

> This also seems like two fixes: a regression for the AUTONEG bit, and
> allowing pause to be set. So maybe this should be two patches?

The blamed commit caused both to change.

The old code:

	linkmode_fill(pl->supported);
	linkmode_copy(pl->link_config.advertising, pl->supported);
	phylink_validate(pl, pl->supported, &pl->link_config);

This results in pl->supported and pl->link_config.advertising being
the fullest capabilities that the MAC supports.

	s = phy_lookup_setting(pl->link_config.speed, pl->link_config.duplex,
                               pl->supported, true);

This gets the linkmode bit corresponding to the speed/duplex. We then
construct a mask:

	__ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
	...
	linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT, mask);
	linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, mask);
	linkmode_set_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, mask);

which ends up passing through just these three bits, provided the MAC
supports them:

	linkmode_and(pl->supported, pl->supported, mask);

We ensure that a port type is set:

	phylink_set(pl->supported, MII);

and then we explicitly set a single bit for the speed/duplex:

	if (s) {
		__set_bit(s->bit, pl->supported);
		__set_bit(s->bit, pl->link_config.lp_advertising);
	} else ...

lp_advertising doesn't make sense without Autoneg bit set.

After the blamed commit, rather than pl->supported being filled
prior to the phylink_Validate() call, it is now cleared and the
baseT modes are populated. Everything else is clear. This is
incorrect.

So, I think fixing this breakage in a single patch is justified.
It is restoring the old behaviour which we've had for a long time.
It isn't two bugs, it's one mistake.

> I'm also surprised TCP is collapsing. This is not an unusual setup,
> e.g. a home wireless network feeding a cable modem. A high speed link
> feeding a lower speed link. What RTT is there when TCP gets into
> trouble? TCP should be backing off as soon as it sees packet loss, so
> reducing the bandwidth it tries to consume, and so emptying out the
> buffers. But if you have big buffers in the ENETC causing high
> latency, that might be an issue?  Does ENETC have BQL? It is worth
> implementing, just to avoid bufferbloat problems.

I'd also suggest further evaluating the performance of other ports
when flow control is enabled when one port gets overwhelmed with
packets (and thus the switch starts sending pause frames to the
host port, slowing _all_ traffic to _all_ ports.)

Depending on the application, it may be better to let the congested
port drop packets (Ethernet was designed to drop packets after all)
while allowing other ports to operate, rather than throttling all
ports.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-16 16:10 ` Andrew Lunn
  2025-11-16 17:33   ` Russell King (Oracle)
@ 2025-11-17  3:23   ` Wei Fang
  2025-11-17 13:22     ` Andrew Lunn
  1 sibling, 1 reply; 11+ messages in thread
From: Wei Fang @ 2025-11-17  3:23 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux@armlinux.org.uk, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

> This also seems like two fixes: a regression for the AUTONEG bit, and
> allowing pause to be set. So maybe this should be two patches?

As Russell explained in the thread, one patch is enough.

> 
> I'm also surprised TCP is collapsing. This is not an unusual setup,
> e.g. a home wireless network feeding a cable modem. A high speed link
> feeding a lower speed link. What RTT is there when TCP gets into

The below result is the RTT when doing the iperf TCtestP
root@imx943evk:~# ./tcping -I swp2 10.193.102.224 5201
TCPinging 10.193.102.224 on port 5201
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=2 time=1.004 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=3 time=0.958 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=4 time=0.989 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=5 time=1.040 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=6 time=0.760 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=7 time=0.950 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=8 time=0.726 ms

After applying the this patch, the RTT appears to be greater, I suspect
that some iperf packets preceding the ping packet are being dropped
by the hardware, resulting in a smaller RTT.

root@imx943evk:~# ./tcping -I swp2 10.193.102.224 5201
TCPinging 10.193.102.224 on port 5201
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=1 time=0.819 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=2 time=0.752 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=3 time=1.190 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=4 time=0.932 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=5 time=1.137 ms
Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=6 time=1.279 ms


> trouble? TCP should be backing off as soon as it sees packet loss, so
> reducing the bandwidth it tries to consume, and so emptying out the
> buffers. But if you have big buffers in the ENETC causing high
> latency, that might be an issue? Does ENETC have BQL? It is worth
> implementing, just to avoid bufferbloat problems.

No, currently the ENETC driver does not support BQL, maybe we
will support it in the future.

> 
> 	Andrew
> 
> ---
> pw-bot: cr

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17  3:23   ` Wei Fang
@ 2025-11-17 13:22     ` Andrew Lunn
  2025-11-17 13:54       ` Wei Fang
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2025-11-17 13:22 UTC (permalink / raw)
  To: Wei Fang
  Cc: linux@armlinux.org.uk, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

On Mon, Nov 17, 2025 at 03:23:05AM +0000, Wei Fang wrote:
> > This also seems like two fixes: a regression for the AUTONEG bit, and
> > allowing pause to be set. So maybe this should be two patches?
> 
> As Russell explained in the thread, one patch is enough.
> 
> > 
> > I'm also surprised TCP is collapsing. This is not an unusual setup,
> > e.g. a home wireless network feeding a cable modem. A high speed link
> > feeding a lower speed link. What RTT is there when TCP gets into
> 
> The below result is the RTT when doing the iperf TCtestP
> root@imx943evk:~# ./tcping -I swp2 10.193.102.224 5201
> TCPinging 10.193.102.224 on port 5201
> Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=2 time=1.004 ms
> Reply from 10.193.102.224 (10.193.102.224) on port 5201 TCP_conn=3 time=0.958 ms

With 1ms ping times, you don't have buffer bloat problems, so that is
not the issue.

I still think you need to look at this some more. Both Russells
comments about it potentially blocking traffic for other ports, and
why TCP is doing so bad, maybe gets some traffic dumps and ask the TCP
experts.

	Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17 13:22     ` Andrew Lunn
@ 2025-11-17 13:54       ` Wei Fang
  2025-11-17 15:01         ` Andrew Lunn
  0 siblings, 1 reply; 11+ messages in thread
From: Wei Fang @ 2025-11-17 13:54 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux@armlinux.org.uk, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

> With 1ms ping times, you don't have buffer bloat problems, so that is
> not the issue.
> 
> I still think you need to look at this some more. Both Russells
> comments about it potentially blocking traffic for other ports, and
> why TCP is doing so bad, maybe gets some traffic dumps and ask the TCP
> experts.
> 

The default TCP block size sent by iperf is 128KB. ENETC then fragments
the packet via TSO and sends the fragments to the switch. Because the
link speed between ENETC and the switch CPU port is 2.5Gbps, it takes
approximately 420 us for the TCP block to be sent to the switch. However,
the link speed of the switch's user port is only 1Gbps, and the switch takes
approximately 1050 us to send the packets out. Therefore, packets
accumulate within the switch. Without flow control enabled, this can
exhaust the switch's buffer, eventually leading to congestion.

Debugging results from the switch show that many packets are being
dropped on the CPU port, and the reason of packet loss is precisely
due to congestion.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17 13:54       ` Wei Fang
@ 2025-11-17 15:01         ` Andrew Lunn
  2025-11-18  6:20           ` Wei Fang
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Lunn @ 2025-11-17 15:01 UTC (permalink / raw)
  To: Wei Fang
  Cc: linux@armlinux.org.uk, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

> The default TCP block size sent by iperf is 128KB. ENETC then fragments
> the packet via TSO and sends the fragments to the switch. Because the
> link speed between ENETC and the switch CPU port is 2.5Gbps, it takes
> approximately 420 us for the TCP block to be sent to the switch. However,
> the link speed of the switch's user port is only 1Gbps, and the switch takes
> approximately 1050 us to send the packets out. Therefore, packets
> accumulate within the switch. Without flow control enabled, this can
> exhaust the switch's buffer, eventually leading to congestion.
> 
> Debugging results from the switch show that many packets are being
> dropped on the CPU port, and the reason of packet loss is precisely
> due to congestion.

BQL might help you. It could break the 128KB burst up into a number of
smaller bursts, helping avoid the congestion.

Are there any parameters you can change with TSO? This is one
stream. Sending it out at 2.5G line rate makes no sense when you know
it is going to hit a 1G egress. You might as well limit TSO to
1G. That will help single stream traffic. If you have multiple streams
it will not help, you will still hit congestion, assuming you can do
multiple TSOs in parallel.

	Andrew

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-16  2:38 [PATCH v2] net: phylink: add missing supported link modes for the fixed-link Wei Fang
  2025-11-16 16:10 ` Andrew Lunn
@ 2025-11-17 15:17 ` Russell King (Oracle)
  2025-11-17 15:24   ` Maxime Chevallier
  2025-11-18  2:05   ` Wei Fang
  1 sibling, 2 replies; 11+ messages in thread
From: Russell King (Oracle) @ 2025-11-17 15:17 UTC (permalink / raw)
  To: Wei Fang
  Cc: andrew, hkallweit1, davem, edumazet, kuba, pabeni, eric,
	maxime.chevallier, imx, netdev, linux-kernel

On Sun, Nov 16, 2025 at 10:38:23AM +0800, Wei Fang wrote:
> Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
> initialized, so these link modes will not work for the fixed-link. This
> leads to a TCP performance degradation issue observed on the i.MX943
> platform.
> 
> The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
> is a fixed link and the link speed is 2.5Gbps. And one of the switch
> user ports is the RGMII interface, and its link speed is 1Gbps. If the
> flow-control of the fixed link is not enabled, we can easily observe
> the iperf performance of TCP packets is very low. Because the inbound
> rate on the CPU port is greater than the outbound rate on the user port,
> the switch is prone to congestion, leading to the loss of some TCP
> packets and requiring multiple retransmissions.
> 
> Solving this problem should be as simple as setting the Asym_Pause and
> Pause bits. The reason why the Autoneg bit needs to be set is because
> it was already set before the blame commit. Moreover, Russell provides
> a very good explanation of why it needs to be set in the thread [1].
> 
> [1] https://lore.kernel.org/all/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk/
> 
> Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration")
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>

Even though discussion is still going on, we do need to fix this
regression. So:

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17 15:17 ` Russell King (Oracle)
@ 2025-11-17 15:24   ` Maxime Chevallier
  2025-11-18  2:05   ` Wei Fang
  1 sibling, 0 replies; 11+ messages in thread
From: Maxime Chevallier @ 2025-11-17 15:24 UTC (permalink / raw)
  To: Russell King (Oracle), Wei Fang
  Cc: andrew, hkallweit1, davem, edumazet, kuba, pabeni, eric, imx,
	netdev, linux-kernel

Hi,

On 17/11/2025 16:17, Russell King (Oracle) wrote:
> On Sun, Nov 16, 2025 at 10:38:23AM +0800, Wei Fang wrote:
>> Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
>> initialized, so these link modes will not work for the fixed-link. This
>> leads to a TCP performance degradation issue observed on the i.MX943
>> platform.
>>
>> The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
>> is a fixed link and the link speed is 2.5Gbps. And one of the switch
>> user ports is the RGMII interface, and its link speed is 1Gbps. If the
>> flow-control of the fixed link is not enabled, we can easily observe
>> the iperf performance of TCP packets is very low. Because the inbound
>> rate on the CPU port is greater than the outbound rate on the user port,
>> the switch is prone to congestion, leading to the loss of some TCP
>> packets and requiring multiple retransmissions.
>>
>> Solving this problem should be as simple as setting the Asym_Pause and
>> Pause bits. The reason why the Autoneg bit needs to be set is because
>> it was already set before the blame commit. Moreover, Russell provides
>> a very good explanation of why it needs to be set in the thread [1].
>>
>> [1] https://lore.kernel.org/all/aRjqLN8eQDIQfBjS@shell.armlinux.org.uk/
>>
>> Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link configuration")
>> Signed-off-by: Wei Fang <wei.fang@nxp.com>
>> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> 
> Even though discussion is still going on, we do need to fix this
> regression. So:
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

I already reviewed the patch, but I agree with Russell. Being the author
of the blamed commit, I can say that this change in behavior was not
intentional :(

Maxime


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17 15:17 ` Russell King (Oracle)
  2025-11-17 15:24   ` Maxime Chevallier
@ 2025-11-18  2:05   ` Wei Fang
  1 sibling, 0 replies; 11+ messages in thread
From: Wei Fang @ 2025-11-18  2:05 UTC (permalink / raw)
  To: Russell King
  Cc: andrew@lunn.ch, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

> On Sun, Nov 16, 2025 at 10:38:23AM +0800, Wei Fang wrote:
> > Pause, Asym_Pause and Autoneg bits are not set when pl->supported is
> > initialized, so these link modes will not work for the fixed-link.
> > This leads to a TCP performance degradation issue observed on the
> > i.MX943 platform.
> >
> > The switch CPU port of i.MX943 is connected to an ENETC MAC, this link
> > is a fixed link and the link speed is 2.5Gbps. And one of the switch
> > user ports is the RGMII interface, and its link speed is 1Gbps. If the
> > flow-control of the fixed link is not enabled, we can easily observe
> > the iperf performance of TCP packets is very low. Because the inbound
> > rate on the CPU port is greater than the outbound rate on the user
> > port, the switch is prone to congestion, leading to the loss of some
> > TCP packets and requiring multiple retransmissions.
> >
> > Solving this problem should be as simple as setting the Asym_Pause and
> > Pause bits. The reason why the Autoneg bit needs to be set is because
> > it was already set before the blame commit. Moreover, Russell provides
> > a very good explanation of why it needs to be set in the thread [1].
> >
> > [1]
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore
> > .kernel.org%2Fall%2FaRjqLN8eQDIQfBjS%40shell.armlinux.org.uk%2F&data=
> 0
> >
> 5%7C02%7Cwei.fang%40nxp.com%7C046af7619cb34d31ce1f08de25ec6ac4%7
> C686ea
> >
> 1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638989894529904535%7CUnkn
> own%7CT
> >
> WFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4
> zMiI
> >
> sIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=C3b%2FDWqT8
> yliu6m
> > tGDX0JtrdDevoTik74gZwn%2F1%2BJXk%3D&reserved=0
> >
> > Fixes: de7d3f87be3c ("net: phylink: Use phy_caps_lookup for fixed-link
> > configuration")
> > Signed-off-by: Wei Fang <wei.fang@nxp.com>
> > Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> 
> Even though discussion is still going on, we do need to fix this regression. So:
> 
> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
> 
> Thanks!

Thanks, based on Andrew's comments, I have sent a v3 patch.

https://lore.kernel.org/all/20251117102943.1862680-1-wei.fang@nxp.com/


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH v2] net: phylink: add missing supported link modes for the fixed-link
  2025-11-17 15:01         ` Andrew Lunn
@ 2025-11-18  6:20           ` Wei Fang
  0 siblings, 0 replies; 11+ messages in thread
From: Wei Fang @ 2025-11-18  6:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: linux@armlinux.org.uk, hkallweit1@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	eric@nelint.com, maxime.chevallier@bootlin.com,
	imx@lists.linux.dev, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org

> > The default TCP block size sent by iperf is 128KB. ENETC then fragments
> > the packet via TSO and sends the fragments to the switch. Because the
> > link speed between ENETC and the switch CPU port is 2.5Gbps, it takes
> > approximately 420 us for the TCP block to be sent to the switch. However,
> > the link speed of the switch's user port is only 1Gbps, and the switch takes
> > approximately 1050 us to send the packets out. Therefore, packets
> > accumulate within the switch. Without flow control enabled, this can
> > exhaust the switch's buffer, eventually leading to congestion.
> >
> > Debugging results from the switch show that many packets are being
> > dropped on the CPU port, and the reason of packet loss is precisely
> > due to congestion.
> 
> BQL might help you. It could break the 128KB burst up into a number of
> smaller bursts, helping avoid the congestion.
> 

Thanks, based on your suggestion, I added BQL to the enetc driver,
but the issue still exists. The buffer pool of the CPU port has very
limited buffers, so it is easy to be congested. I removed the maximum
threshold limit for the buffer pool, allowing it to use the entire switch's
buffer at most. I can see that the TCP performance meets expectations,
but there are still some TCP retransmissions.

> Are there any parameters you can change with TSO? This is one
> stream. Sending it out at 2.5G line rate makes no sense when you know
> it is going to hit a 1G egress. You might as well limit TSO to
> 1G. That will help single stream traffic. If you have multiple streams
> it will not help, you will still hit congestion, assuming you can do
> multiple TSOs in parallel.
> 

For ENETC, TSO is mainly performed by hardware. The driver simply puts
the entire TCP block on the TX ring and informs the hardware to perform
TSO through BD. Therefore, we cannot adjust the TSO parameters to
reduce the transmission rate from ENETC to the CPU port.

For ENETC, it's not possible to set the egress rate for each stream based
on its egress port. The hardware does not support this functionality. What
we can set is the port speed of the ENETC, which is a global configuration.
And some user ports of the NETC switch are SGMII interface, the link speed
is 2.5Gbps, so we cannot set the ENETC port speed to 1Gbps. Currently,
enabling flow control for this internal link appears to be the best solution.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-11-18  6:20 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-11-16  2:38 [PATCH v2] net: phylink: add missing supported link modes for the fixed-link Wei Fang
2025-11-16 16:10 ` Andrew Lunn
2025-11-16 17:33   ` Russell King (Oracle)
2025-11-17  3:23   ` Wei Fang
2025-11-17 13:22     ` Andrew Lunn
2025-11-17 13:54       ` Wei Fang
2025-11-17 15:01         ` Andrew Lunn
2025-11-18  6:20           ` Wei Fang
2025-11-17 15:17 ` Russell King (Oracle)
2025-11-17 15:24   ` Maxime Chevallier
2025-11-18  2:05   ` Wei Fang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).