public inbox for netdev@vger.kernel.org
* [PATCH net v2] net: macb: Shuffle the tx ring before enabling tx
@ 2026-03-07  7:08 Kevin Hao
  2026-03-10 16:34 ` Simon Horman
  2026-03-11  1:00 ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 3+ messages in thread
From: Kevin Hao @ 2026-03-07  7:08 UTC (permalink / raw)
  To: netdev
  Cc: Kevin Hao, Quanyang Wang, stable, Nicolas Ferre, Claudiu Beznea,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Russell King

Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
the rootfs may take an extended time to recover after a suspend.
Upon investigation, it was determined that the issue originates from a
problem in the macb driver.

According to the Zynq UltraScale TRM [1], when transmit is disabled,
the transmit buffer queue pointer resets to point to the address
specified by the transmit buffer queue base address register.

In the current implementation, the code merely resets `queue->tx_head`
and `queue->tx_tail` to '0'. This approach presents several issues:

- Packets already queued in the tx ring are silently lost,
  leading to memory leaks since the associated skbs cannot be released.

- Concurrent write access to `queue->tx_head` and `queue->tx_tail` may
  occur from `macb_tx_poll()` or `macb_start_xmit()` when these values
  are reset to '0'.

- The transmission may become stuck on a packet that has already been sent
  out, with its 'TX_USED' bit set, but has not yet been processed. However,
  due to the manipulation of 'queue->tx_head' and 'queue->tx_tail',
  `macb_tx_poll()` incorrectly assumes there are no packets to handle
  because `queue->tx_head == queue->tx_tail`. This issue is only resolved
  when a new packet is placed at this position. This is the root cause of
  the prolonged recovery time observed for the NFS root filesystem.

To resolve this issue, shuffle the tx ring and tx skb array so that
the first unsent packet is positioned at the start of the tx ring.
Additionally, ensure that updates to `queue->tx_head` and
`queue->tx_tail` are properly protected with the appropriate lock.

[1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm

Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up")
Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Cc: stable@vger.kernel.org
---
Changes in v2:
- Resolves the issue of incomplete copying of the tx descriptor, as identified by the AI [1].

- Resolves warnings for lines exceeding 80 columns.

- Link to v1: https://lore.kernel.org/r/20260305-zynqmp-v1-1-5de72254d56b@gmail.com

[1] https://netdev-ai.bots.linux.dev/ai-review.html?id=44318d8b-d8c3-42c8-8884-238421f708c5
---
Cc: Nicolas Ferre <nicolas.ferre@microchip.com>
Cc: Claudiu Beznea <claudiu.beznea@tuxon.dev>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
---
 drivers/net/ethernet/cadence/macb_main.c | 98 +++++++++++++++++++++++++++++++-
 1 file changed, 95 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 3dcae4d5f74c9709a86fae837d25501da4484bf7..952aaf84757c20e94f3bbc98162a18330aa4cf73 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -37,6 +37,7 @@
 #include <linux/tcp.h>
 #include <linux/types.h>
 #include <linux/udp.h>
+#include <linux/gcd.h>
 #include <net/pkt_sched.h>
 #include "macb.h"
 
@@ -786,6 +787,97 @@ static void macb_mac_link_down(struct phylink_config *config, unsigned int mode,
 	netif_tx_stop_all_queues(ndev);
 }
 
+/* Use juggling algorithm to left rotate tx ring and tx skb array */
+static void gem_shuffle_tx_one_ring(struct macb_queue *queue)
+{
+	unsigned int head, tail, count, ring_size, desc_size;
+	struct macb_tx_skb tx_skb, *skb_curr, *skb_next;
+	struct macb_dma_desc *desc_curr, *desc_next;
+	unsigned int i, cycles, shift, curr, next;
+	struct macb *bp = queue->bp;
+	unsigned char desc[24];
+	unsigned long flags;
+
+	desc_size = macb_dma_desc_get_size(bp);
+
+	if (WARN_ON_ONCE(desc_size > ARRAY_SIZE(desc)))
+		return;
+
+	spin_lock_irqsave(&queue->tx_ptr_lock, flags);
+	head = queue->tx_head;
+	tail = queue->tx_tail;
+	ring_size = bp->tx_ring_size;
+	count = CIRC_CNT(head, tail, ring_size);
+
+	if (!(tail % ring_size))
+		goto unlock;
+
+	if (!count) {
+		queue->tx_head = 0;
+		queue->tx_tail = 0;
+		goto unlock;
+	}
+
+	shift = tail % ring_size;
+	cycles = gcd(ring_size, shift);
+
+	for (i = 0; i < cycles; i++) {
+		memcpy(&desc, macb_tx_desc(queue, i), desc_size);
+		memcpy(&tx_skb, macb_tx_skb(queue, i),
+		       sizeof(struct macb_tx_skb));
+
+		curr = i;
+		next = (curr + shift) % ring_size;
+
+		while (next != i) {
+			desc_curr = macb_tx_desc(queue, curr);
+			desc_next = macb_tx_desc(queue, next);
+
+			memcpy(desc_curr, desc_next, desc_size);
+
+			if (next == ring_size - 1)
+				desc_curr->ctrl &= ~MACB_BIT(TX_WRAP);
+			if (curr == ring_size - 1)
+				desc_curr->ctrl |= MACB_BIT(TX_WRAP);
+
+			skb_curr = macb_tx_skb(queue, curr);
+			skb_next = macb_tx_skb(queue, next);
+			memcpy(skb_curr, skb_next, sizeof(struct macb_tx_skb));
+
+			curr = next;
+			next = (curr + shift) % ring_size;
+		}
+
+		desc_curr = macb_tx_desc(queue, curr);
+		memcpy(desc_curr, &desc, desc_size);
+		if (i == ring_size - 1)
+			desc_curr->ctrl &= ~MACB_BIT(TX_WRAP);
+		if (curr == ring_size - 1)
+			desc_curr->ctrl |= MACB_BIT(TX_WRAP);
+		memcpy(macb_tx_skb(queue, curr), &tx_skb,
+		       sizeof(struct macb_tx_skb));
+	}
+
+	queue->tx_head = count;
+	queue->tx_tail = 0;
+
+	/* Make descriptor updates visible to hardware */
+	wmb();
+
+unlock:
+	spin_unlock_irqrestore(&queue->tx_ptr_lock, flags);
+}
+
+/* Rotate the queue so that the tail is at index 0 */
+static void gem_shuffle_tx_rings(struct macb *bp)
+{
+	struct macb_queue *queue;
+	int q;
+
+	for (q = 0, queue = bp->queues; q < bp->num_queues; q++, queue++)
+		gem_shuffle_tx_one_ring(queue);
+}
+
 static void macb_mac_link_up(struct phylink_config *config,
 			     struct phy_device *phy,
 			     unsigned int mode, phy_interface_t interface,
@@ -824,8 +916,6 @@ static void macb_mac_link_up(struct phylink_config *config,
 			ctrl |= MACB_BIT(PAE);
 
 		for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
-			queue->tx_head = 0;
-			queue->tx_tail = 0;
 			queue_writel(queue, IER,
 				     bp->rx_intr_mask | MACB_TX_INT_FLAGS | MACB_BIT(HRESP));
 		}
@@ -839,8 +929,10 @@ static void macb_mac_link_up(struct phylink_config *config,
 
 	spin_unlock_irqrestore(&bp->lock, flags);
 
-	if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC))
+	if (!(bp->caps & MACB_CAPS_MACB_IS_EMAC)) {
 		macb_set_tx_clk(bp, speed);
+		gem_shuffle_tx_rings(bp);
+	}
 
 	/* Enable Rx and Tx; Enable PTP unicast */
 	ctrl = macb_readl(bp, NCR);

---
base-commit: a0ae2a256046c0c5d3778d1a194ff2e171f16e5f
change-id: 20260214-zynqmp-74d35f3862f8

Best regards,
-- 
Kevin Hao <haokexin@gmail.com>



* Re: [PATCH net v2] net: macb: Shuffle the tx ring before enabling tx
  2026-03-07  7:08 [PATCH net v2] net: macb: Shuffle the tx ring before enabling tx Kevin Hao
@ 2026-03-10 16:34 ` Simon Horman
  2026-03-11  1:00 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2026-03-10 16:34 UTC (permalink / raw)
  To: Kevin Hao
  Cc: netdev, Quanyang Wang, stable, Nicolas Ferre, Claudiu Beznea,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Russell King

On Sat, Mar 07, 2026 at 03:08:54PM +0800, Kevin Hao wrote:
> Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
> the rootfs may take an extended time to recover after a suspend.
> Upon investigation, it was determined that the issue originates from a
> problem in the macb driver.
> 
> According to the Zynq UltraScale TRM [1], when transmit is disabled,
> the transmit buffer queue pointer resets to point to the address
> specified by the transmit buffer queue base address register.
> 
> In the current implementation, the code merely resets `queue->tx_head`
> and `queue->tx_tail` to '0'. This approach presents several issues:
> 
> - Packets already queued in the tx ring are silently lost,
>   leading to memory leaks since the associated skbs cannot be released.
> 
> - Concurrent write access to `queue->tx_head` and `queue->tx_tail` may
>   occur from `macb_tx_poll()` or `macb_start_xmit()` when these values
>   are reset to '0'.
> 
> - The transmission may become stuck on a packet that has already been sent
>   out, with its 'TX_USED' bit set, but has not yet been processed. However,
>   due to the manipulation of 'queue->tx_head' and 'queue->tx_tail',
>   `macb_tx_poll()` incorrectly assumes there are no packets to handle
>   because `queue->tx_head == queue->tx_tail`. This issue is only resolved
>   when a new packet is placed at this position. This is the root cause of
>   the prolonged recovery time observed for the NFS root filesystem.
> 
> To resolve this issue, shuffle the tx ring and tx skb array so that
> the first unsent packet is positioned at the start of the tx ring.
> Additionally, ensure that updates to `queue->tx_head` and
> `queue->tx_tail` are properly protected with the appropriate lock.
> 
> [1] https://docs.amd.com/v/u/en-US/ug1085-zynq-ultrascale-trm
> 
> Fixes: bf9cf80cab81 ("net: macb: Fix tx/rx malfunction after phy link down and up")
> Reported-by: Quanyang Wang <quanyang.wang@windriver.com>
> Signed-off-by: Kevin Hao <haokexin@gmail.com>
> Cc: stable@vger.kernel.org
> ---
> Changes in v2:
> - Resolves the issue of incomplete copying of the tx descriptor, as identified by the AI [1].
> 
> - Resolves warnings for lines exceeding 80 columns.
> 
> - Link to v1: https://lore.kernel.org/r/20260305-zynqmp-v1-1-5de72254d56b@gmail.com
> 
> [1] https://netdev-ai.bots.linux.dev/ai-review.html?id=44318d8b-d8c3-42c8-8884-238421f708c5

Reviewed-by: Simon Horman <horms@kernel.org>



* Re: [PATCH net v2] net: macb: Shuffle the tx ring before enabling tx
  2026-03-07  7:08 [PATCH net v2] net: macb: Shuffle the tx ring before enabling tx Kevin Hao
  2026-03-10 16:34 ` Simon Horman
@ 2026-03-11  1:00 ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-03-11  1:00 UTC (permalink / raw)
  To: Kevin Hao
  Cc: netdev, quanyang.wang, stable, nicolas.ferre, claudiu.beznea,
	andrew+netdev, davem, edumazet, kuba, pabeni, linux

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 07 Mar 2026 15:08:54 +0800 you wrote:
> Quanyang observed that when using an NFS rootfs on an AMD ZynqMp board,
> the rootfs may take an extended time to recover after a suspend.
> Upon investigation, it was determined that the issue originates from a
> problem in the macb driver.
> 
> According to the Zynq UltraScale TRM [1], when transmit is disabled,
> the transmit buffer queue pointer resets to point to the address
> specified by the transmit buffer queue base address register.
> 
> [...]

Here is the summary with links:
  - [net,v2] net: macb: Shuffle the tx ring before enabling tx
    https://git.kernel.org/netdev/net/c/881a0263d502

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




