From: sashiko-bot@kernel.org
To: hawk@kernel.org
Cc: bpf@vger.kernel.org
Subject: Re: [PATCH net-next v7 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction
Date: Sat, 13 Jun 2026 08:36:05 +0000
Message-ID: <20260613083605.A69951F000E9@smtp.kernel.org>
In-Reply-To: <20260612083530.1650245-3-hawk@kernel.org>
Thank you for your contribution! Sashiko AI review found 1 potential issue to consider:
Pre-existing issues:
- [High] The attempt to clear DRV_XOFF during teardown incorrectly checks netif_running(dev), leaving the peer's TX queue stalled for 16 seconds on interface restart.
--
commit 9a6517a905c3dcf9d565963659494eab75fbf1f5
Author: Jesper Dangaard Brouer <hawk@kernel.org>
Subject: veth: implement Byte Queue Limits (BQL) for latency reduction
This commit implements Byte Queue Limits (BQL) in the veth driver to give
qdiscs visibility into the ptr_ring fill level and reduce base latency. It
charges a fixed unit per packet rather than byte lengths, and introduces
logic to drain leftover ring entries and balance DQL accounting at teardown.
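As a rough userspace illustration of the accounting described above (not
kernel code), the sketch below charges a fixed unit per packet and then
completes the leftover charges at teardown. All model_* names are
hypothetical, and MODEL_BQL_UNIT is an assumed value; the real
VETH_BQL_UNIT definition is not part of the quoted hunk.

```c
#include <assert.h>

/* Assumed value for illustration; the real VETH_BQL_UNIT is not shown here. */
#define MODEL_BQL_UNIT 1u

/* Simplified stand-in for the DQL counters behind a TX queue. */
struct model_dql {
	unsigned int num_queued;
	unsigned int num_completed;
};

static unsigned int model_inflight(const struct model_dql *d)
{
	return d->num_queued - d->num_completed;
}

/* Models the charge side (netdev_tx_sent_queue() in the patch). */
static void model_sent(struct model_dql *d, unsigned int units)
{
	d->num_queued += units;
}

/* Models the completion side (netdev_tx_completed_queue() in the patch). */
static void model_completed(struct model_dql *d, unsigned int units)
{
	assert(units <= model_inflight(d));
	d->num_completed += units;
}

/* Inflight units if teardown did NOT balance the leftover ring entries. */
static unsigned int model_inflight_without_balance(unsigned int sent_pkts,
						   unsigned int consumed_pkts)
{
	struct model_dql dql = { 0, 0 };

	assert(consumed_pkts <= sent_pkts);
	model_sent(&dql, sent_pkts * MODEL_BQL_UNIT);
	model_completed(&dql, consumed_pkts * MODEL_BQL_UNIT);
	return model_inflight(&dql);
}

/* Inflight units after teardown completes every leftover charge. */
static unsigned int model_teardown_inflight(unsigned int sent_pkts,
					    unsigned int consumed_pkts)
{
	struct model_dql dql = { 0, 0 };
	unsigned int n_bql;

	assert(consumed_pkts <= sent_pkts);
	model_sent(&dql, sent_pkts * MODEL_BQL_UNIT);
	model_completed(&dql, consumed_pkts * MODEL_BQL_UNIT);

	/* Leftover entries drained from the ring were charged but never consumed. */
	n_bql = sent_pkts - consumed_pkts;
	if (n_bql)
		model_completed(&dql, n_bql * MODEL_BQL_UNIT);

	return model_inflight(&dql);
}
```

With 5 packets charged and 2 consumed, the unbalanced variant leaves 3 units
in flight, while the balanced variant ends at 0. The model ignores
concurrency entirely.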
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index 0cfb19b760dd5..a3505627f49e4 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
[ ... ]
> @@ -1085,11 +1155,49 @@ static void veth_napi_del_range(struct net_device *dev, int start, int end)
> }
> synchronize_net();
>
> + /* This rq's frames were BQL-charged on the peer's txq[i]. */
> + peer = rtnl_dereference(priv->peer);
> +
> for (i = start; i < end; i++) {
> struct veth_rq *rq = &priv->rq[i];
> + struct netdev_queue *txq;
> + unsigned int n_bql;
>
> rq->rx_notify_masked = false;
> +
> + /* Drain leftover ring frames, counting BQL-charged SKBs that
> + * were charged via netdev_tx_sent_queue() but never consumed.
> + */
> + n_bql = veth_ptr_ring_drain(&rq->xdp_ring);
> ptr_ring_cleanup(&rq->xdp_ring, veth_ptr_free);
> +
> + if (!peer || i >= peer->num_tx_queues)
> + continue;
> +
> + txq = netdev_get_tx_queue(peer, i);
> +
> + /* Balance the peer txq's DQL accounting by completing the
> + * outstanding charges instead of netdev_tx_reset_queue():
> + * dql_reset() races with a concurrent producer, while
> + * netdev_tx_completed_queue() is the normal single-completer
> + * path and is safe here -- NAPI is gone (synchronize_net()
> + * above) and the producer stopped charging BQL once it
> + * observed rq->napi == NULL. Completing every charge drives
> + * DQL inflight to 0 and clears STACK_XOFF.
> + */
> + if (n_bql)
> + netdev_tx_completed_queue(txq, n_bql,
> + n_bql * VETH_BQL_UNIT);
> +
> + /* DRV_XOFF is independent of BQL/STACK_XOFF: a concurrent
> + * veth_xmit() may have set it between rcu_assign_pointer(napi,
> + * NULL) and synchronize_net(); with NAPI gone nothing else
> + * clears it. The completion above only clears STACK_XOFF, so
> + * still wake the txq to clear DRV_XOFF -- but only when the
> + * device is still up.
> + */
> + if (netif_running(dev))
> + netif_tx_wake_queue(txq);
[Severity: High]
This isn't a bug introduced by this patch, but the attempt to clear DRV_XOFF
during teardown is guarded by netif_running(dev), which is always false on
the dev_close() path.
Will this leave the peer's TX queue stalled?
When the local RX ring fills up, veth_xmit() stops the peer's TX queue
by setting the __QUEUE_STATE_DRV_XOFF bit. During device close, the
networking core clears the __LINK_STATE_START bit in __dev_close_many() before
it invokes the ndo_stop callback. Because of this, netif_running(dev)
evaluates to false whenever veth_napi_del_range() is reached from ndo_stop.
Consequently, the netif_tx_wake_queue() call is skipped and the peer's
transmit queue stays stopped. When the local device is brought back up,
nothing clears the DRV_XOFF bit on the peer, so the transmit path stays
stalled until the TX watchdog timer expires 16 seconds later.
Could this check be adjusted or removed so the peer's queue can be
properly woken and cleared of DRV_XOFF during teardown?
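To make the ordering concern concrete, here is a small userspace model (not
kernel code) of the close sequence described above. Every model_* name is
hypothetical; the booleans only stand in for __LINK_STATE_START and
__QUEUE_STATE_DRV_XOFF. The unconditional variant is one possible adjustment
shown for comparison, not a vetted fix; whether waking the peer's queue is
safe in every teardown context would need to be checked against the real
driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: these types mirror kernel concepts, not kernel APIs. */
struct model_dev {
	bool link_state_start;	/* stands in for __LINK_STATE_START */
};

struct model_txq {
	bool drv_xoff;		/* stands in for __QUEUE_STATE_DRV_XOFF */
};

static bool model_netif_running(const struct model_dev *dev)
{
	return dev->link_state_start;
}

/* Teardown as in the quoted hunk: wake only while the device counts as up. */
static void model_teardown_guarded(struct model_dev *dev,
				   struct model_txq *peer_txq)
{
	if (model_netif_running(dev))
		peer_txq->drv_xoff = false;
}

/* Hypothetical alternative: clear DRV_XOFF regardless of the running state. */
static void model_teardown_unconditional(struct model_dev *dev,
					 struct model_txq *peer_txq)
{
	(void)dev;
	peer_txq->drv_xoff = false;
}

/*
 * Replays the close ordering: the START bit is cleared first, then the
 * teardown callback runs. Returns true if DRV_XOFF is still set afterwards.
 */
static bool model_close_leaves_xoff(void (*teardown)(struct model_dev *,
						      struct model_txq *))
{
	struct model_dev dev = { .link_state_start = true };
	struct model_txq peer_txq = { .drv_xoff = true }; /* ring was full */

	assert(teardown);
	dev.link_state_start = false;	/* cleared before ndo_stop is invoked */
	teardown(&dev, &peer_txq);	/* stands in for the ndo_stop teardown */
	return peer_txq.drv_xoff;
}
```

In this model, model_close_leaves_xoff(model_teardown_guarded) returns true
(the queue stays stopped), while the unconditional variant returns false.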
> }
>
> for (i = start; i < end; i++) {
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260612083530.1650245-1-hawk@kernel.org?part=2