* [PATCH 0/2 net] net: bcmgenet: fix lock up when queues time out
@ 2026-03-26 18:45 justin.chen
2026-03-26 18:45 ` [PATCH net 1/2] net: bcmgenet: fix leaking free_bds justin.chen
2026-03-26 18:45 ` [PATCH net 2/2] net: bcmgenet: fix racing timeout handler justin.chen
0 siblings, 2 replies; 5+ messages in thread
From: justin.chen @ 2026-03-26 18:45 UTC (permalink / raw)
To: netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, opendmb, nb,
Justin Chen
From: Justin Chen <justin.chen@broadcom.com>
We tend to timeout on our smaller queues. This will be solved later, but
occassionally when we hit the timeout handler the queues lock up entirely.
This was due a leaking and racing timeout handler that is now fixed.
Justin Chen (2):
net: bcmgenet: fix leaking free_bds
net: bcmgenet: fix racing timeout handler
.../net/ethernet/broadcom/genet/bcmgenet.c | 24 +++++++++----------
1 file changed, 11 insertions(+), 13 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net 1/2] net: bcmgenet: fix leaking free_bds
2026-03-26 18:45 [PATCH 0/2 net] net: bcmgenet: fix lock up when queues time out justin.chen
@ 2026-03-26 18:45 ` justin.chen
2026-03-28 4:11 ` Jakub Kicinski
2026-03-26 18:45 ` [PATCH net 2/2] net: bcmgenet: fix racing timeout handler justin.chen
1 sibling, 1 reply; 5+ messages in thread
From: justin.chen @ 2026-03-26 18:45 UTC (permalink / raw)
To: netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, opendmb, nb,
Justin Chen
From: Justin Chen <justin.chen@broadcom.com>
While reclaiming the tx queue we fast forward the write pointer to
drop any data in flight. These dropped frames are not added back
to the pool of free bds. We also need to tell the netdev that we
are dropping said data.
Fixes: f1bacae8b655 ("net: bcmgenet: support reclaiming unsent Tx packets")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Tested-by: Nicolai Buchwitz <nb@tipi-net.de>
---
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 482a31e7b72b..3e1fc3bb8297 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
released += drop;
ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
+ ring->free_bds += drop;
while (drop--) {
cb_ptr = bcmgenet_put_txcb(priv, ring);
skb = cb_ptr->skb;
@@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
}
if (skb)
dev_consume_skb_any(skb);
+ netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
bcmgenet_tdma_ring_writel(priv, ring->index,
ring->prod_index, TDMA_PROD_INDEX);
wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net 2/2] net: bcmgenet: fix racing timeout handler
2026-03-26 18:45 [PATCH 0/2 net] net: bcmgenet: fix lock up when queues time out justin.chen
2026-03-26 18:45 ` [PATCH net 1/2] net: bcmgenet: fix leaking free_bds justin.chen
@ 2026-03-26 18:45 ` justin.chen
2026-03-28 4:13 ` Jakub Kicinski
1 sibling, 1 reply; 5+ messages in thread
From: justin.chen @ 2026-03-26 18:45 UTC (permalink / raw)
To: netdev
Cc: pabeni, kuba, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, opendmb, nb,
Justin Chen
From: Justin Chen <justin.chen@broadcom.com>
The bcmgenet_timeout handler tries to take down all tx queues when
a single queue times out. This is over zealous and causes many race
conditions with queues that are still chugging along. Instead lets
only restart the timed out queue.
Fixes: 13ea657806cf ("net: bcmgenet: improve TX timeout")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de>
Tested-by: Nicolai Buchwitz <nb@tipi-net.de>
---
.../net/ethernet/broadcom/genet/bcmgenet.c | 22 ++++++++-----------
1 file changed, 9 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index 3e1fc3bb8297..d33c4fa071a3 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -3477,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct bcmgenet_tx_ring *ring)
static void bcmgenet_timeout(struct net_device *dev, unsigned int txqueue)
{
struct bcmgenet_priv *priv = netdev_priv(dev);
- u32 int1_enable = 0;
- unsigned int q;
+ struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
+ struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
- for (q = 0; q <= priv->hw_params->tx_queues; q++)
- bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
-
- bcmgenet_tx_reclaim_all(dev);
+ bcmgenet_dump_tx_queue(ring);
- for (q = 0; q <= priv->hw_params->tx_queues; q++)
- int1_enable |= (1 << q);
+ bcmgenet_tx_reclaim(dev, ring, true);
- /* Re-enable TX interrupts if disabled */
- bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
+ /* Re-enable the TX interrupt for this ring */
+ bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
- netif_trans_update(dev);
+ txq_trans_cond_update(txq);
- BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
+ BCMGENET_STATS64_INC((&ring->stats64), errors);
- netif_tx_wake_all_queues(dev);
+ netif_tx_wake_queue(txq);
}
#define MAX_MDF_FILTER 17
--
2.34.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net 1/2] net: bcmgenet: fix leaking free_bds
2026-03-26 18:45 ` [PATCH net 1/2] net: bcmgenet: fix leaking free_bds justin.chen
@ 2026-03-28 4:11 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-03-28 4:11 UTC (permalink / raw)
To: justin.chen
Cc: netdev, pabeni, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, opendmb, nb
On Thu, 26 Mar 2026 11:45:28 -0700 justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
>
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 482a31e7b72b..3e1fc3bb8297 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
> drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
> released += drop;
> ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
> + ring->free_bds += drop;
> while (drop--) {
> cb_ptr = bcmgenet_put_txcb(priv, ring);
> skb = cb_ptr->skb;
> @@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct net_device *dev,
> }
> if (skb)
> dev_consume_skb_any(skb);
> + netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
> bcmgenet_tdma_ring_writel(priv, ring->index,
> ring->prod_index, TDMA_PROD_INDEX);
> wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
AI says you may be off by one here?
Does this loop miss the oldest dropped descriptor and leak its SKB and
DMA mapping?
Since bcmgenet_get_txcb() increments write_ptr after a transmission,
write_ptr always points to the next available empty slot.
When this loops backwards drop times, the first iteration retrieves that
empty descriptor (where cb->skb is NULL), and the loop will finish before
reaching the oldest uncompleted descriptor at c_index.
When write_ptr is rolled back to c_index, won't subsequent transmissions
overwrite the descriptor at c_index and permanently leak the unmapped DMA
buffer and SKB memory?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net 2/2] net: bcmgenet: fix racing timeout handler
2026-03-26 18:45 ` [PATCH net 2/2] net: bcmgenet: fix racing timeout handler justin.chen
@ 2026-03-28 4:13 ` Jakub Kicinski
0 siblings, 0 replies; 5+ messages in thread
From: Jakub Kicinski @ 2026-03-28 4:13 UTC (permalink / raw)
To: justin.chen
Cc: netdev, pabeni, edumazet, davem, andrew+netdev,
bcm-kernel-feedback-list, florian.fainelli, opendmb, nb
On Thu, 26 Mar 2026 11:45:29 -0700 justin.chen@broadcom.com wrote:
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
FWIW AI seems to suggest we should also stop NAPI and the DMA in this
case, just to make sure that the queue in question doesn't suddenly
wake up either. Which seems fair but probably as a follow up and only
if not too hard in itself..
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2026-03-28 4:13 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-26 18:45 [PATCH 0/2 net] net: bcmgenet: fix lock up when queues time out justin.chen
2026-03-26 18:45 ` [PATCH net 1/2] net: bcmgenet: fix leaking free_bds justin.chen
2026-03-28 4:11 ` Jakub Kicinski
2026-03-26 18:45 ` [PATCH net 2/2] net: bcmgenet: fix racing timeout handler justin.chen
2026-03-28 4:13 ` Jakub Kicinski
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox