From: hawk@kernel.org
To: netdev@vger.kernel.org
Cc: hawk@kernel.org, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
	j.koeppeler@tu-berlin.de, kernel-team@cloudflare.com
Subject: [PATCH net-next 3/5] veth: add tx_timeout watchdog as BQL safety net
Date: Tue, 24 Mar 2026 18:47:02 +0100
Message-ID: <20260324174719.1224337-5-hawk@kernel.org>
In-Reply-To: <20260324174719.1224337-1-hawk@kernel.org>
From: Jesper Dangaard Brouer <hawk@kernel.org>

With the introduction of BQL (Byte Queue Limits) for veth, there are
now two independent mechanisms that can stop a transmit queue:

 - DRV_XOFF: set by netif_tx_stop_queue() when the ptr_ring is full
 - STACK_XOFF: set by BQL when the byte-in-flight limit is reached

If either mechanism stalls without a corresponding wake/completion,
the queue stops permanently. Enable the net device watchdog timer and
implement ndo_tx_timeout as a failsafe recovery.

The timeout handler resets BQL state (clearing STACK_XOFF) and wakes
the queue (clearing DRV_XOFF), covering both stop mechanisms. The
watchdog fires after 16 seconds, which accommodates worst-case NAPI
processing (budget=64 packets x 250ms per-packet consumer delay)
without false positives under normal backpressure.

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de>
---
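Editorial note: the interaction of the two stop mechanisms can be
sketched in plain userspace C. This is a simplified model, not kernel
code. Every model_* and MODEL_* name is hypothetical, and the byte
accounting is far cruder than real BQL. It only illustrates that the
queue counts as stopped while either bit is set, and that the timeout
path has to clear both.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two stop bits named in the commit message. */
#define MODEL_DRV_XOFF   (1u << 0) /* driver stop: ring has no free slot */
#define MODEL_STACK_XOFF (1u << 1) /* BQL stop: byte limit reached */

struct model_txq {
	unsigned int state;           /* bitmask of MODEL_*_XOFF */
	unsigned long bytes_inflight; /* queued but not yet completed */
	unsigned long byte_limit;     /* assumed fixed; real BQL adapts it */
};

/* The queue is stopped while either bit is set. */
static bool model_txq_stopped(const struct model_txq *q)
{
	return (q->state & (MODEL_DRV_XOFF | MODEL_STACK_XOFF)) != 0;
}

/* Driver-side stop, as when the ring is full. */
static void model_stop_queue(struct model_txq *q)
{
	q->state |= MODEL_DRV_XOFF;
}

/* BQL-like accounting on transmit: stop once the limit is reached. */
static void model_sent(struct model_txq *q, unsigned long bytes)
{
	assert(q->byte_limit > 0);
	q->bytes_inflight += bytes;
	if (q->bytes_inflight >= q->byte_limit)
		q->state |= MODEL_STACK_XOFF;
}

/* Timeout recovery: reset byte accounting, then clear the driver stop. */
static void model_tx_timeout(struct model_txq *q)
{
	q->bytes_inflight = 0;
	q->state &= ~MODEL_STACK_XOFF;
	q->state &= ~MODEL_DRV_XOFF;
}
```

In this model, clearing only one of the two bits in model_tx_timeout()
would leave model_txq_stopped() returning true whenever both had been
set, which is why the handler in the patch performs both steps.
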
 drivers/net/veth.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index b9a79d066703..beb4c31d8fd7 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1426,6 +1426,22 @@ static int veth_set_channels(struct net_device *dev,
 	goto out;
 }
 
+static void veth_tx_timeout(struct net_device *dev, unsigned int txqueue)
+{
+	struct netdev_queue *txq = netdev_get_tx_queue(dev, txqueue);
+
+	netdev_err(dev,
+		   "veth backpressure(0x%lX) stalled(n:%ld) TXQ(%u) re-enable\n",
+		   txq->state, atomic_long_read(&txq->trans_timeout), txqueue);
+
+	/* Clear both stop mechanisms:
+	 * - DRV_XOFF: set by netif_tx_stop_queue (ptr_ring backpressure)
+	 * - STACK_XOFF: set by BQL when byte limit is reached
+	 */
+	netdev_tx_reset_queue(txq);
+	netif_tx_wake_queue(txq);
+}
+
 static int veth_open(struct net_device *dev)
 {
 	struct veth_priv *priv = netdev_priv(dev);
@@ -1764,6 +1780,7 @@ static const struct net_device_ops veth_netdev_ops = {
 	.ndo_bpf		= veth_xdp,
 	.ndo_xdp_xmit		= veth_ndo_xdp_xmit,
 	.ndo_get_peer_dev	= veth_peer_dev,
+	.ndo_tx_timeout		= veth_tx_timeout,
 };
 
 static const struct xdp_metadata_ops veth_xdp_metadata_ops = {
@@ -1803,6 +1820,7 @@ static void veth_setup(struct net_device *dev)
 	dev->priv_destructor = veth_dev_free;
 	dev->pcpu_stat_type = NETDEV_PCPU_STAT_TSTATS;
 	dev->max_mtu = ETH_MAX_MTU;
+	dev->watchdog_timeo = msecs_to_jiffies(16000);
 
 	dev->hw_features = VETH_FEATURES;
 	dev->hw_enc_features = VETH_FEATURES;
--
2.43.0
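
Editorial note: the 16-second figure follows directly from the numbers
in the commit message (a budget of 64 packets at 250 ms each). The
userspace sketch below reproduces that arithmetic and shows how such a
timeout maps to timer ticks for a given tick rate. The model_* names are
hypothetical, and the round-up conversion is an assumption standing in
for the kernel's msecs_to_jiffies(), not a copy of it.

```c
#include <assert.h>

#define MODEL_NAPI_BUDGET      64u  /* packets per poll, from the commit message */
#define MODEL_PER_PKT_DELAY_MS 250u /* worst-case consumer delay per packet */

/* Worst-case duration of one full poll, in milliseconds. */
static unsigned int model_worst_case_poll_ms(void)
{
	return MODEL_NAPI_BUDGET * MODEL_PER_PKT_DELAY_MS;
}

/* Convert milliseconds to timer ticks, rounding up; hz is ticks per second. */
static unsigned long model_ms_to_ticks(unsigned int ms, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)ms * hz + 999u) / 1000u;
}
```

With these definitions, model_worst_case_poll_ms() returns 16000, which
matches the value passed to msecs_to_jiffies() in the patch, and
model_ms_to_ticks(16000, 250) returns 4000.
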