* [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
@ 2026-03-25 17:36 justin.chen
2026-03-25 22:00 ` Florian Fainelli
2026-03-26 8:05 ` Nicolai Buchwitz
0 siblings, 2 replies; 4+ messages in thread
From: justin.chen @ 2026-03-25 17:36 UTC (permalink / raw)
To: netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, Justin Chen
From: Justin Chen <justin.chen@broadcom.com>
The bcmgenet_timeout handler tries to take down all tx queues when
a single queue times out. This is over zealous and causes many race
conditions with queues that are still chugging along. Instead lets
only restart the timed out queue.
While reclaiming the tx queue we fast forward the write pointer to
drop any data in flight. These dropped frames are not added back
to the pool of free bds. We also need to tell the netdev that we
are dropping said data.
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
---
.../net/ethernet/broadcom/genet/bcmgenet.c | 24 +++++++++----------
1 file changed, 11 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 482a31e7b72b..f69cf490445c 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
released += drop;
ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
+ ring->free_bds += drop;
while (drop--) {
cb_ptr = bcmgenet_put_txcb(priv, ring);
skb = cb_ptr->skb;
@@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
}
if (skb)
dev_consume_skb_any(skb);
+ netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
bcmgenet_tdma_ring_writel(priv, ring->index,
ring->prod_index, TDMA_PROD_INDEX);
wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
@@ -3475,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct bcmgenet_tx_ring *ring)
static void bcmgenet_timeout(struct net_device *dev, unsigned int txqueue)
{
struct bcmgenet_priv *priv = netdev_priv(dev);
- u32 int1_enable = 0;
- unsigned int q;
+ struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
+ struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
- for (q = 0; q <= priv->hw_params->tx_queues; q++)
- bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
-
- bcmgenet_tx_reclaim_all(dev);
+ bcmgenet_dump_tx_queue(ring);
- for (q = 0; q <= priv->hw_params->tx_queues; q++)
- int1_enable |= (1 << q);
+ bcmgenet_tx_reclaim(dev, ring, true);
- /* Re-enable TX interrupts if disabled */
- bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
+ /* Re-enable the TX interrupt for this ring */
+ bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
- netif_trans_update(dev);
+ txq_trans_cond_update(txq);
- BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
+ BCMGENET_STATS64_INC((&ring->stats64), errors);
- netif_tx_wake_all_queues(dev);
+ netif_tx_wake_queue(txq);
}
#define MAX_MDF_FILTER 17
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
@ 2026-03-25 22:00 ` Florian Fainelli
2026-03-25 22:11 ` Justin Chen
2026-03-26 8:05 ` Nicolai Buchwitz
1 sibling, 1 reply; 4+ messages in thread
From: Florian Fainelli @ 2026-03-25 22:00 UTC (permalink / raw)
To: justin.chen, netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list
On 3/25/26 10:36, justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
>
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
>
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
>
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
The first bits in bcmgenet_tx_reclaim would be fixing:
Fixes: f1bacae8b655 ("net: bcmgenet: support reclaiming unsent Tx packets")
and the other part in bcmgenet_tx_timeout() would have:
Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")
Both fixes look good to me:
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
--
Florian
^ permalink raw reply [flat|nested] 4+ messages in thread* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
2026-03-25 22:00 ` Florian Fainelli
@ 2026-03-25 22:11 ` Justin Chen
0 siblings, 0 replies; 4+ messages in thread
From: Justin Chen @ 2026-03-25 22:11 UTC (permalink / raw)
To: Florian Fainelli, netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list
On 3/25/26 3:00 PM, Florian Fainelli wrote:
> On 3/25/26 10:36, justin.chen@broadcom.com wrote:
>> From: Justin Chen <justin.chen@broadcom.com>
>>
>> The bcmgenet_timeout handler tries to take down all tx queues when
>> a single queue times out. This is over zealous and causes many race
>> conditions with queues that are still chugging along. Instead lets
>> only restart the timed out queue.
>>
>> While reclaiming the tx queue we fast forward the write pointer to
>> drop any data in flight. These dropped frames are not added back
>> to the pool of free bds. We also need to tell the netdev that we
>> are dropping said data.
>>
>> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
>
> The first bits in bcmgenet_tx_reclaim would be fixing:
>
> Fixes: f1bacae8b655 ("net: bcmgenet: support reclaiming unsent Tx packets")
>
> and the other part in bcmgenet_tx_timeout() would have:
>
> Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")
>
> Both fixes look good to me:
>
> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Thanks. I will resend v2 as two patches once we get verification the
patch is working as intended.
Justin
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
2026-03-25 22:00 ` Florian Fainelli
@ 2026-03-26 8:05 ` Nicolai Buchwitz
1 sibling, 0 replies; 4+ messages in thread
From: Nicolai Buchwitz @ 2026-03-26 8:05 UTC (permalink / raw)
To: justin.chen
Cc: netdev, pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli
On 25.3.2026 18:36, justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
>
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
>
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
>
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> ---
> .../net/ethernet/broadcom/genet/bcmgenet.c | 24 +++++++++----------
> 1 file changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 482a31e7b72b..f69cf490445c 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct
> net_device *dev,
> drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
> released += drop;
> ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
> + ring->free_bds += drop;
> while (drop--) {
> cb_ptr = bcmgenet_put_txcb(priv, ring);
> skb = cb_ptr->skb;
> @@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct
> net_device *dev,
> }
> if (skb)
> dev_consume_skb_any(skb);
> + netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
> bcmgenet_tdma_ring_writel(priv, ring->index,
> ring->prod_index, TDMA_PROD_INDEX);
> wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
> @@ -3475,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct
> bcmgenet_tx_ring *ring)
> static void bcmgenet_timeout(struct net_device *dev, unsigned int
> txqueue)
> {
> struct bcmgenet_priv *priv = netdev_priv(dev);
> - u32 int1_enable = 0;
> - unsigned int q;
> + struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
> + struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
>
> netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
>
> - for (q = 0; q <= priv->hw_params->tx_queues; q++)
> - bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
> -
> - bcmgenet_tx_reclaim_all(dev);
> + bcmgenet_dump_tx_queue(ring);
>
> - for (q = 0; q <= priv->hw_params->tx_queues; q++)
> - int1_enable |= (1 << q);
> + bcmgenet_tx_reclaim(dev, ring, true);
>
> - /* Re-enable TX interrupts if disabled */
> - bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
> + /* Re-enable the TX interrupt for this ring */
> + bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
>
> - netif_trans_update(dev);
> + txq_trans_cond_update(txq);
>
> - BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
> + BCMGENET_STATS64_INC((&ring->stats64), errors);
>
> - netif_tx_wake_all_queues(dev);
> + netif_tx_wake_queue(txq);
> }
>
> #define MAX_MDF_FILTER 17
Can confirm that the lockup is gone with this patch. The watchdog still
reports a timeout (as already mentioned by you):
[Thu Mar 26 08:00:30 2026] bcmgenet fd580000.ethernet eth0: NETDEV
WATCHDOG: CPU: 0: transmit queue 0 timed out 2377 ms
[Thu Mar 26 08:00:32 2026] bcmgenet fd580000.ethernet eth0: NETDEV
WATCHDOG: CPU: 0: transmit queue 4 timed out 4361 ms
[Thu Mar 26 08:00:34 2026] bcmgenet fd580000.ethernet eth0: NETDEV
WATCHDOG: CPU: 0: transmit queue 4 timed out 2001 ms
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de
Tested-by: Nicolai Buchwitz <nb@tipi-net.de
Thanks
Nicolai
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-03-26 8:05 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
2026-03-25 22:00 ` Florian Fainelli
2026-03-25 22:11 ` Justin Chen
2026-03-26 8:05 ` Nicolai Buchwitz
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox