Re: [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs

From: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: netdev@vger.kernel.org, "Toke Høiland-Jørgensen" <toke@toke.dk>,
	"Eric Dumazet" <eric.dumazet@gmail.com>,
	"David S. Miller" <davem@davemloft.net>,
	"Paolo Abeni" <pabeni@redhat.com>,
	ihor.solodrai@linux.dev, "Michael S. Tsirkin" <mst@redhat.com>,
	makita.toshiaki@lab.ntt.co.jp, toshiaki.makita1@gmail.com,
	bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com
Subject: Re: [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs
Date: Thu, 6 Nov 2025 17:29:19 -0800
Message-ID: <20251106172919.24540443@kernel.org>
In-Reply-To: <176236369293.30034.1875162194564877560.stgit@firesoul>

On Wed, 05 Nov 2025 18:28:12 +0100 Jesper Dangaard Brouer wrote:
> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
> backpressure on full ptr_ring to reduce TX drops") have been found to cause
> a race condition in production environments.
> 
> Under specific circumstances, observed exclusively on ARM64 (aarch64)
> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
> permanently stalled. This happens when the race condition leads to the TXQ
> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
> preventing the attached qdisc from dequeueing packets and causing the
> network link to halt.
> 
> As a first step towards resolving this issue, this patch introduces a
> failsafe mechanism. It enables the net device watchdog by setting a timeout
> value and implements the .ndo_tx_timeout callback.
> 
> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
> and allow traffic to resume.
> 
> The log message will look like this:
> 
>  veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
>  veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
> 
> This provides a necessary recovery mechanism while the underlying race
> condition is investigated further. Subsequent patches will address the root
> cause and add more robust state handling.
> 
> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>

I think this belongs in net-next. A failsafe is not really a bug fix.
I'm slightly worried that we're missing a corner case and that this
will cause timeouts to get printed for someone's config.

> +static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
> +{
> +	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> +
> +	netdev_err(dev, "veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
> +		   atomic_long_read(&txq->trans_timeout), txqueue);

If you think trans_timeout is useful, let's add it to the message the
core prints? Then we can make this msg veth-specific; I don't think we
should be repeating what the core already printed.

> +	netif_tx_wake_queue(txq);
> +}
> +
>  static int veth_open(struct net_device *dev)
>  {
>  	struct veth_priv *priv = netdev_priv(dev);
> @@ -1711,6 +1723,7 @@ static const struct net_device_ops veth_netdev_ops = {
>  	.ndo_bpf		= veth_xdp,
>  	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
>  	.ndo_get_peer_dev	= veth_peer_dev,
> +	.ndo_tx_timeout		= veth_tx_timeout,
>  };
>  
>  static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
> @@ -1749,6 +1762,7 @@ static void veth_setup(struct net_device *dev)
>  	dev->priv_destructor = veth_dev_free;
>  	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
>  	dev->max_mtu = ETH_MAX_MTU;
> +	dev->watchdog_timeo = msecs_to_jiffies(5000);
>  
>  	dev->hw_features = VETH_FEATURES;
>  	dev->hw_enc_features = VETH_FEATURES;
> 
>
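
For readers following the thread, the failsafe described in the quoted
commit message can be modelled in userspace. The sketch below is only an
illustration and not kernel code: struct model_txq, model_watchdog_tick(),
model_tx_timeout() and model_demo() are hypothetical names, time is counted
in plain milliseconds rather than jiffies, and the check only loosely
mirrors what the core watchdog does (queue stopped and no transmit for
longer than the timeout). The 5000 ms value is the one set in veth_setup()
in the quoted diff.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of one TX queue; NOT struct netdev_queue. */
struct model_txq {
	bool stopped;                /* models the driver XOFF state */
	unsigned long trans_start;   /* time of the last transmit, in ms */
	unsigned long trans_timeout; /* how many times the watchdog fired */
};

/* Same value as msecs_to_jiffies(5000) in the patch, kept in ms here. */
#define MODEL_WATCHDOG_MS 5000UL

/* Models the driver timeout callback: log, then wake the queue. */
static void model_tx_timeout(struct model_txq *txq, unsigned int idx)
{
	printf("model: backpressure stalled(n:%lu) TXQ(%u) re-enable\n",
	       txq->trans_timeout, idx);
	txq->stopped = false;
}

/*
 * Models one watchdog pass over a single queue (assumed behaviour):
 * fire only if the queue is stopped and more than MODEL_WATCHDOG_MS
 * have passed since the last transmit. Returns true if it fired.
 */
static bool model_watchdog_tick(struct model_txq *txq, unsigned int idx,
				unsigned long now_ms)
{
	if (!txq->stopped)
		return false;
	if (now_ms - txq->trans_start <= MODEL_WATCHDOG_MS)
		return false;

	txq->trans_timeout++;
	model_tx_timeout(txq, idx);
	return true;
}

/* Stalls a queue at t=0 and ticks once per second for ten seconds. */
static int model_demo(void)
{
	struct model_txq txq = { .stopped = true };
	int fired = 0;

	for (unsigned long now = 1000; now <= 10000; now += 1000)
		fired += model_watchdog_tick(&txq, 0, now);

	return fired; /* number of watchdog firings observed */
}
```

In this model the queue is woken on the first tick past the timeout and
stays awake afterwards, so a single stall produces a single log line. It
does not model the race that causes the stall in the first place.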
