public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Nicolai Buchwitz <nb@tipi-net.de>
To: justin.chen@broadcom.com
Cc: netdev@vger.kernel.org, pabeni@redhat.com, kuba@kernel.org,
	edumazet@google.com, davem@davemloft.net, andrew+netdev@lunn.ch,
	bcm-kernel-feedback-list@broadcom.com,
	florian.fainelli@broadcom.com
Subject: Re: [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler
Date: Thu, 26 Mar 2026 09:05:04 +0100	[thread overview]
Message-ID: <df5781320ba8e27f55743c0613fea23a@tipi-net.de> (raw)
In-Reply-To: <20260325173602.3676778-1-justin.chen@broadcom.com>

On 25.3.2026 18:36, justin.chen@broadcom.com wrote:
> From: Justin Chen <justin.chen@broadcom.com>
> 
> The bcmgenet_timeout handler tries to take down all tx queues when
> a single queue times out. This is over zealous and causes many race
> conditions with queues that are still chugging along. Instead lets
> only restart the timed out queue.
> 
> While reclaiming the tx queue we fast forward the write pointer to
> drop any data in flight. These dropped frames are not added back
> to the pool of free bds. We also need to tell the netdev that we
> are dropping said data.
> 
> Signed-off-by: Justin Chen <justin.chen@broadcom.com>
> ---
>  .../net/ethernet/broadcom/genet/bcmgenet.c    | 24 +++++++++----------
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c 
> b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> index 482a31e7b72b..f69cf490445c 100644
> --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
> @@ -1985,6 +1985,7 @@ static unsigned int bcmgenet_tx_reclaim(struct 
> net_device *dev,
>  		drop = (ring->prod_index - ring->c_index) & DMA_C_INDEX_MASK;
>  		released += drop;
>  		ring->prod_index = ring->c_index & DMA_C_INDEX_MASK;
> +		ring->free_bds += drop;
>  		while (drop--) {
>  			cb_ptr = bcmgenet_put_txcb(priv, ring);
>  			skb = cb_ptr->skb;
> @@ -1996,6 +1997,7 @@ static unsigned int bcmgenet_tx_reclaim(struct 
> net_device *dev,
>  		}
>  		if (skb)
>  			dev_consume_skb_any(skb);
> +		netdev_tx_reset_queue(netdev_get_tx_queue(dev, ring->index));
>  		bcmgenet_tdma_ring_writel(priv, ring->index,
>  					  ring->prod_index, TDMA_PROD_INDEX);
>  		wr_ptr = ring->write_ptr * WORDS_PER_BD(priv);
> @@ -3475,27 +3477,23 @@ static void bcmgenet_dump_tx_queue(struct 
> bcmgenet_tx_ring *ring)
>  static void bcmgenet_timeout(struct net_device *dev, unsigned int 
> txqueue)
>  {
>  	struct bcmgenet_priv *priv = netdev_priv(dev);
> -	u32 int1_enable = 0;
> -	unsigned int q;
> +	struct bcmgenet_tx_ring *ring = &priv->tx_rings[txqueue];
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> 
>  	netif_dbg(priv, tx_err, dev, "bcmgenet_timeout\n");
> 
> -	for (q = 0; q <= priv->hw_params->tx_queues; q++)
> -		bcmgenet_dump_tx_queue(&priv->tx_rings[q]);
> -
> -	bcmgenet_tx_reclaim_all(dev);
> +	bcmgenet_dump_tx_queue(ring);
> 
> -	for (q = 0; q <= priv->hw_params->tx_queues; q++)
> -		int1_enable |= (1 << q);
> +	bcmgenet_tx_reclaim(dev, ring, true);
> 
> -	/* Re-enable TX interrupts if disabled */
> -	bcmgenet_intrl2_1_writel(priv, int1_enable, INTRL2_CPU_MASK_CLEAR);
> +	/* Re-enable the TX interrupt for this ring */
> +	bcmgenet_intrl2_1_writel(priv, 1 << txqueue, INTRL2_CPU_MASK_CLEAR);
> 
> -	netif_trans_update(dev);
> +	txq_trans_cond_update(txq);
> 
> -	BCMGENET_STATS64_INC((&priv->tx_rings[txqueue].stats64), errors);
> +	BCMGENET_STATS64_INC((&ring->stats64), errors);
> 
> -	netif_tx_wake_all_queues(dev);
> +	netif_tx_wake_queue(txq);
>  }
> 
>  #define MAX_MDF_FILTER	17

Can confirm that the lockup is gone with this patch. The watchdog still 
reports a timeout (as already mentioned by you):

[Thu Mar 26 08:00:30 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 0 timed out 2377 ms
[Thu Mar 26 08:00:32 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 4 timed out 4361 ms
[Thu Mar 26 08:00:34 2026] bcmgenet fd580000.ethernet eth0: NETDEV 
WATCHDOG: CPU: 0: transmit queue 4 timed out 2001 ms

Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de
Tested-by: Nicolai Buchwitz <nb@tipi-net.de

Thanks
Nicolai

      parent reply	other threads:[~2026-03-26  8:05 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-25 17:36 [PATCH RFC net] net: bcmgenet: fix leaking and racing timeout handler justin.chen
2026-03-25 22:00 ` Florian Fainelli
2026-03-25 22:11   ` Justin Chen
2026-03-26  8:05 ` Nicolai Buchwitz [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df5781320ba8e27f55743c0613fea23a@tipi-net.de \
    --to=nb@tipi-net.de \
    --cc=andrew+netdev@lunn.ch \
    --cc=bcm-kernel-feedback-list@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=florian.fainelli@broadcom.com \
    --cc=justin.chen@broadcom.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox