From: Jakub Kicinski <kuba@kernel.org>
To: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: netdev@vger.kernel.org, "Toke Høiland-Jørgensen" <toke@toke.dk>,
"Eric Dumazet" <eric.dumazet@gmail.com>,
"David S. Miller" <davem@davemloft.net>,
"Paolo Abeni" <pabeni@redhat.com>,
ihor.solodrai@linux.dev, "Michael S. Tsirkin" <mst@redhat.com>,
makita.toshiaki@lab.ntt.co.jp, toshiaki.makita1@gmail.com,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org, kernel-team@cloudflare.com
Subject: Re: [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs
Date: Thu, 6 Nov 2025 17:29:19 -0800 [thread overview]
Message-ID: <20251106172919.24540443@kernel.org> (raw)
In-Reply-To: <176236369293.30034.1875162194564877560.stgit@firesoul>
On Wed, 05 Nov 2025 18:28:12 +0100 Jesper Dangaard Brouer wrote:
> The changes introduced in commit dc82a33297fc ("veth: apply qdisc
> backpressure on full ptr_ring to reduce TX drops") have been found to cause
> a race condition in production environments.
>
> Under specific circumstances, observed exclusively on ARM64 (aarch64)
> systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
> permanently stalled. This happens when the race condition leads to the TXQ
> entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
> preventing the attached qdisc from dequeueing packets and causing the
> network link to halt.
>
> As a first step towards resolving this issue, this patch introduces a
> failsafe mechanism. It enables the net device watchdog by setting a timeout
> value and implements the .ndo_tx_timeout callback.
>
> If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
> which logs a warning and calls netif_tx_wake_queue() to unstall the queue
> and allow traffic to resume.
>
> The log message will look like this:
>
> veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
> veth42: veth backpressure stalled(n:1) TXQ(0) re-enable
>
> This provides a necessary recovery mechanism while the underlying race
> condition is investigated further. Subsequent patches will address the root
> cause and add more robust state handling.
>
> Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
I think this belongs in net-next.. Fail safe is not really a bug fix.
I'm slightly worried we're missing a corner case and will cause
timeouts to get printed for someone's config.
> +static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
> +{
> + struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
> +
> + netdev_err(dev, "veth backpressure stalled(n:%ld) TXQ(%u) re-enable\n",
> + atomic_long_read(&txq->trans_timeout), txqueue);
If you think the trans_timeout is useful, let's add it to the message
core prints? And then we can make this msg just veth specific, I don't
think we should be repeating what core already printed.
> + netif_tx_wake_queue(txq);
> +}
> +
> static int veth_open(struct net_device *dev)
> {
> struct veth_priv *priv = netdev_priv(dev);
> @@ -1711,6 +1723,7 @@ static const struct net_device_ops veth_netdev_ops = {
> .ndo_bpf = veth_xdp,
> .ndo_xdp_xmit = veth_ndo_xdp_xmit,
> .ndo_get_peer_dev = veth_peer_dev,
> + .ndo_tx_timeout = veth_tx_timeout,
> };
>
> static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
> @@ -1749,6 +1762,7 @@ static void veth_setup(struct net_device *dev)
> dev->priv_destructor = veth_dev_free;
> dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
> dev->max_mtu = ETH_MAX_MTU;
> + dev->watchdog_timeo = msecs_to_jiffies(5000);
>
> dev->hw_features = VETH_FEATURES;
> dev->hw_enc_features = VETH_FEATURES;
>
>
next prev parent reply other threads:[~2025-11-07 1:29 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-05 17:28 [PATCH net V3 0/2] veth: Fix TXQ stall race condition and add recovery Jesper Dangaard Brouer
2025-11-05 17:28 ` [PATCH net V3 1/2] veth: enable dev_watchdog for detecting stalled TXQs Jesper Dangaard Brouer
2025-11-07 1:29 ` Jakub Kicinski [this message]
2025-11-07 13:42 ` Jesper Dangaard Brouer
2025-11-08 1:54 ` Jakub Kicinski
2025-11-05 17:28 ` [PATCH net V3 2/2] veth: more robust handing of race to avoid txq getting stuck Jesper Dangaard Brouer
2025-11-06 14:14 ` Toshiaki Makita
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251106172919.24540443@kernel.org \
--to=kuba@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=hawk@kernel.org \
--cc=ihor.solodrai@linux.dev \
--cc=kernel-team@cloudflare.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=makita.toshiaki@lab.ntt.co.jp \
--cc=mst@redhat.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=toke@toke.dk \
--cc=toshiaki.makita1@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).