public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
@ 2026-03-25 17:36 justin.chen
  2026-03-25 22:00 ` Florian Fainelli
  2026-03-26  8:05 ` Nicolai Buchwitz
  0 siblings, 2 replies; 4+ messages in thread
From: justin.chen @ 2026-03-25 17:36 UTC (permalink / raw)
  To: netdev
  Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
	bcm-kernel-feedback-list, florian.fainelli, Justin Chen

From: Justin Chen <justin.chen@broadcom.com>

The bcmgenet_timeout handler tries to take down all tx queues when
a single queue times out. This is over zealous and causes many race
conditions with queues that are still chugging along. Instead lets
only restart the timed out queue.

While reclaiming the tx queue we fast forward the write pointer to
drop any data in flight. These dropped frames are not added back
to the pool of free bds. We also need to tell the netdev that we
are dropping said data.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
---
 .../net/ethernet/broadcom/genet/bcmgenet.c    | 24 +++++++++----------
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 482a31e7b72b..f69cf490445c 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
 		drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
 		released += drop;
 		ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
+		ring->free_bds += drop;
 		while (drop--) {
 			cb_ptr = bcmgenet_put_txcb(priv, ring);
 			skb = cb_ptr->skb;
@@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
 		}
 		if (skb)
 			dev_consume_skb_any(skb);
+		netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
 		bcmgenet_tdma_ring_writel(priv, ring->index,
 					  ring->prod_index, TDMA_PROD_INDEX);
 		wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
@@ -3475,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct bcmgenet_tx_ring *ring)
 static void bcmgenet_timeout(struct net_device *dev, unsigned int txqueue)
 {
 	struct bcmgenet_priv *priv = netdev_priv(dev);
-	u32 int1_enable = 0;
-	unsigned int q;
+	struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
 
 	netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
 
-	for (q = 0; q <= priv->hw_params->tx_queues; q++)
-		bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
-
-	bcmgenet_tx_reclaim_all(dev);
+	bcmgenet_dump_tx_queue(ring);
 
-	for (q = 0; q <= priv->hw_params->tx_queues; q++)
-		int1_enable |= (1 << q);
+	bcmgenet_tx_reclaim(dev, ring, true);
 
-	/* Re-enable TX interrupts if disabled */
-	bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
+	/* Re-enable the TX interrupt for this ring */
+	bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
 
-	netif_trans_update(dev);
+	txq_trans_cond_update(txq);
 
-	BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
+	BCMGENET_STATS64_INC((&ring->stats64), errors);
 
-	netif_tx_wake_all_queues(dev);
+	netif_tx_wake_queue(txq);
 }
 
 #define MAX_MDF_FILTER	17
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
  2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
@ 2026-03-25 22:00 ` Florian Fainelli
  2026-03-25 22:11   ` Justin Chen
  2026-03-26  8:05 ` Nicolai Buchwitz
  1 sibling, 1 reply; 4+ messages in thread
From: Florian Fainelli @ 2026-03-25 22:00 UTC (permalink / raw)
  To: justin.chen, netdev
  Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
	bcm-kernel-feedback-list

On 3/25/26 10:36, justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
> 
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
> 
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
> 
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>

The first bits in bcmgenet_tx_reclaim would be fixing:

Fixes: f1bacae8b655 ("net: bcmgenet: support reclaiming unsent Tx packets")

and the other part in bcmgenet_tx_timeout() would have:

Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")

Both fixes look good to me:

Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
  2026-03-25 22:00 ` Florian Fainelli
@ 2026-03-25 22:11   ` Justin Chen
  0 siblings, 0 replies; 4+ messages in thread
From: Justin Chen @ 2026-03-25 22:11 UTC (permalink / raw)
  To: Florian Fainelli, netdev
  Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
	bcm-kernel-feedback-list



On 3/25/26 3:00 PM, Florian Fainelli wrote:
> On 3/25/26 10:36, justin.chen@broadcom.com wrote:
>> From: Justin Chen <justin.chen@broadcom.com>
>>
>> The bcmgenet_timeout handler tries to take down all tx queues when
>> a single queue times out. This is over zealous and causes many race
>> conditions with queues that are still chugging along. Instead lets
>> only restart the timed out queue.
>>
>> While reclaiming the tx queue we fast forward the write pointer to
>> drop any data in flight. These dropped frames are not added back
>> to the pool of free bds. We also need to tell the netdev that we
>> are dropping said data.
>>
>> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> 
> The first bits in bcmgenet_tx_reclaim would be fixing:
> 
> Fixes: f1bacae8b655 ("net: bcmgenet: support reclaiming unsent Tx packets")
> 
> and the other part in bcmgenet_tx_timeout() would have:
> 
> Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")
> 
> Both fixes look good to me:
> 
> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>

Thanks. I will resend v2 as two patches once we get verification the 
patch is working as intended.

Justin

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
  2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
  2026-03-25 22:00 ` Florian Fainelli
@ 2026-03-26  8:05 ` Nicolai Buchwitz
  1 sibling, 0 replies; 4+ messages in thread
From: Nicolai Buchwitz @ 2026-03-26  8:05 UTC (permalink / raw)
  To: justin.chen
  Cc: netdev, pabeni, kuba, edumazet, davem, andrew+netdev,
	bcm-kernel-feedback-list, florian.fainelli

On 25.3.2026 18:36, justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
> 
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
> 
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
> 
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> ---
>  .../net/ethernet/broadcom/genet/bcmgenet.c    | 24 +++++++++----------
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 482a31e7b72b..f69cf490445c 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct 
> net_device *dev,
>  		drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
>  		released += drop;
>  		ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
> +		ring->free_bds += drop;
>  		while (drop--) {
>  			cb_ptr = bcmgenet_put_txcb(priv, ring);
>  			skb = cb_ptr->skb;
> @@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct 
> net_device *dev,
>  		}
>  		if (skb)
>  			dev_consume_skb_any(skb);
> +		netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
>  		bcmgenet_tdma_ring_writel(priv, ring->index,
>  					  ring->prod_index, TDMA_PROD_INDEX);
>  		wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
> @@ -3475,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct 
> bcmgenet_tx_ring *ring)
>  static void bcmgenet_timeout(struct net_device *dev, unsigned int 
> txqueue)
>  {
>  	struct bcmgenet_priv *priv = netdev_priv(dev);
> -	u32 int1_enable = 0;
> -	unsigned int q;
> +	struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> 
>  	netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
> 
> -	for (q = 0; q <= priv->hw_params->tx_queues; q++)
> -		bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
> -
> -	bcmgenet_tx_reclaim_all(dev);
> +	bcmgenet_dump_tx_queue(ring);
> 
> -	for (q = 0; q <= priv->hw_params->tx_queues; q++)
> -		int1_enable |= (1 << q);
> +	bcmgenet_tx_reclaim(dev, ring, true);
> 
> -	/* Re-enable TX interrupts if disabled */
> -	bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
> +	/* Re-enable the TX interrupt for this ring */
> +	bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
> 
> -	netif_trans_update(dev);
> +	txq_trans_cond_update(txq);
> 
> -	BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
> +	BCMGENET_STATS64_INC((&ring->stats64), errors);
> 
> -	netif_tx_wake_all_queues(dev);
> +	netif_tx_wake_queue(txq);
>  }
> 
>  #define MAX_MDF_FILTER	17

Can confirm that the lockup is gone with this patch. The watchdog still 
reports a timeout (as already mentioned by you):

[Thu Mar 26 08:00:30 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 0 timed out 2377 ms
[Thu Mar 26 08:00:32 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 4 timed out 4361 ms
[Thu Mar 26 08:00:34 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 4 timed out 2001 ms

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de
Tested-by: Nicolai Buchwitz <nb@tipi-net.de

Thanks
Nicolai

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2026-03-26  8:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
2026-03-25 22:00 ` Florian Fainelli
2026-03-25 22:11   ` Justin Chen
2026-03-26  8:05 ` Nicolai Buchwitz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox