From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7FD072E738D for ; Sat, 30 May 2026 00:50:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780102212; cv=none; b=XZj2OdxeON2tk7mb7Eny7bbBhDYUNfoQz3wrOguC7bx+4wbn6XJN36bs++535vvdzzqZurLNnscrjd8BIbs/El6Z5aO9GGK3NSCjbwUx9v6g3pWJJD2cZLYmP4vur4lF4N0iX06MpbxS1UiAkH3LFTHXbdUFuoINR8JwZV4Yoxc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780102212; c=relaxed/simple; bh=zjkEWyctfF9lWvUtfPnaoMGL98fWF0naFmj8URKbkT0=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=lkQIF7qg22geww/J6kMSIbsUwAfYWt/RqGDY6d/yQw4OqGA6Z+FNxaZ/eRi273R1rRdB/imb8kYb1RXGvdnIyoOfBQ0m4I+AHbIrrU+NvZDEMXFDyuOo4rA82/xjle1lH1ox6siTxPGh/43ooTKuPEao/CjzxZZoWm1jsTodyPU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OwNK2uZ7; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OwNK2uZ7" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 3F4E51F00899; Sat, 30 May 2026 00:50:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780102211; bh=ogf+grTNuwk9LfDROG+RJdG53S4GhWvXYOaLgzGDOXg=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=OwNK2uZ7sVZVcE5lddSJmW0MPvvAh9ZWd2buovOrRV43ORFYs3FQpqRKnGRBMmb25 QqTJmOUxzifDOKM+GxODHgHEUDHeE/tXEkcU8aJDDXJKr4SN3oBJUk+nR9jZianQih MITe+ktsoGA+0Fjiq/NseCSamNFStVVCeiq2xLEdawPP08MrUYxGi6T0z3HB1DnbNy UbOaqCwx/yRHX/uZtgqDwl27cKVnLtb0X5emjRRvIZ1wB0ss+djOZ4wii9B0L577uT 1EYR4zxgcVektG7oV3Ms+F8KNc69KbIsJPdRLL9TCpehdY13VQmV6yi7hZw7HRTurj sDAxSbXWwtR4w== From: sashiko-bot@kernel.org Subject: Re: [PATCH net-next v6 5/5] veth: time-based BQL completion coalescing via ethtool tx-usecs Reply-To: sashiko-reviews@lists.linux.dev To: hawk@kernel.org Cc: bpf@vger.kernel.org In-Reply-To: <20260527135418.1166665-6-hawk@kernel.org> References: <20260527135418.1166665-6-hawk@kernel.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Sat, 30 May 2026 00:50:10 +0000 Message-Id: <20260530005011.3F4E51F00899@smtp.kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: - [High] Coalesced BQL completions are left pending indefinitely when the r= eceive ring is empty and NAPI yields, corrupting DQL timing. - [Medium] Disabling BQL coalescing at runtime (tx-usecs =3D 0) orphans pre= viously accumulated completions, permanently skewing DQL state. - [Low] Data race on `priv->tx_coal_usecs` between ethtool configuration an= d the NAPI poll data path. -- commit d710fd75a7c5d61400890032084a8df84bc9498d Author: Simon Schippers veth: time-based BQL completion coalescing via ethtool tx-usecs This patch implements time-based batching of BQL completions for the veth driver to reduce NAPI scheduling overhead. It accumulates BQL completions and flushes them when a configurable time threshold is exceeded, letting DQL discover a limit that bounds actual queuing delay to the configured interval. > diff --git a/drivers/net/veth.c b/drivers/net/veth.c > index d5675d9d52361..743d17b37223e 100644 > --- a/drivers/net/veth.c > +++ b/drivers/net/veth.c [ ... ] > @@ -265,7 +275,30 @@ static void veth_get_channels(struct net_device *dev, > static int veth_set_channels(struct net_device *dev, > struct ethtool_channels *ch); > =20 > +static int veth_get_coalesce(struct net_device *dev, > + struct ethtool_coalesce *ec, > + struct kernel_ethtool_coalesce *kernel_coal, > + struct netlink_ext_ack *extack) > +{ > + struct veth_priv *priv =3D netdev_priv(dev); > + > + ec->tx_coalesce_usecs =3D priv->tx_coal_usecs; > + return 0; > +} > + > +static int veth_set_coalesce(struct net_device *dev, > + struct ethtool_coalesce *ec, > + struct kernel_ethtool_coalesce *kernel_coal, > + struct netlink_ext_ack *extack) > +{ > + struct veth_priv *priv =3D netdev_priv(dev); > + > + priv->tx_coal_usecs =3D ec->tx_coalesce_usecs; [Severity: Low] Are we missing a WRITE_ONCE() here? priv->tx_coal_usecs is modified under the RTNL lock but read by veth_xdp_rcv() in softirq context during NAPI polling without any synchronization. Using WRITE_ONCE() and READ_ONCE() would prevent compiler reordering and load tearing. > + return 0; > +} [ ... ] > static int veth_xdp_rcv(struct veth_rq *rq, int budget, > struct veth_xdp_tx_bq *bq, > struct veth_stats *stats, > struct netdev_queue *peer_txq) > { > + struct veth_bql_state *state =3D &rq->bql_state; > int i, done =3D 0, n_xdpf =3D 0; > void *xdpf[VETH_XDP_BATCH]; > + struct veth_priv *priv; > + u64 bql_flush_ns; > + > + priv =3D netdev_priv(rq->dev); > + bql_flush_ns =3D (u64)priv->tx_coal_usecs * 1000; > + > + /* Clamp stored timestamp in case we migrated to a CPU with a behind > + * sched_clock(); prevents the deadline from never firing. > + */ > + state->time =3D min(state->time, sched_clock()); > + > + /* Flush completions that timed out since the previous NAPI poll. */ > + if (peer_txq && bql_flush_ns) > + veth_bql_maybe_complete(state, peer_txq, bql_flush_ns); [Severity: Medium] What happens if the user disables coalescing by setting tx-usecs to 0 while there are accumulated completions? Because bql_flush_ns becomes 0, this pre-loop time-based flush check will evaluate to false, skipping the flush of previously batched completions. Inside the loop, new packets are completed individually, ignoring the old state->n_bql count. If those pending completions remain stranded indefinitely, won't the in-flight byte count tracked by DQL become permanently inflated, artificially reducing the available queue capacity? > for (i =3D 0; i < budget; i++) { > void *ptr =3D __ptr_ring_consume(&rq->xdp_ring); [ ... ] > @@ -989,6 +1064,18 @@ static int veth_xdp_rcv(struct veth_rq *rq, int bud= get, > if (n_xdpf) > veth_xdp_rcv_bulk_skb(rq, xdpf, n_xdpf, bq, stats); > =20 > + /* If the ring is now empty and the peer TX queue is stalled by DQL > + * backpressure, release completions immediately to unblock it. > + */ > + if (peer_txq && state->n_bql && __ptr_ring_empty(&rq->xdp_ring)) { > + /* Pairs with smp_wmb() in __ptr_ring_produce(); ensure ring > + * emptiness is observed before reading peer_txq->state. > + */ > + smp_rmb(); > + if (test_bit(__QUEUE_STATE_STACK_XOFF, &peer_txq->state)) > + veth_bql_complete(state, peer_txq); > + } [Severity: High] Does this leave coalesced BQL completions pending indefinitely if the receive ring is empty but the peer TX queue isn't stalled by DQL? When the NAPI budget loop finishes and the RX ring is empty, the device may go idle and NAPI polling stops. The batched completions (state->n_bql) are only flushed if the TX queue is currently stalled by DQL. If the queue is not stalled, these completions are left pending indefinitely without a timer to flush them. Since DQL measures the time between enqueue and completion, wouldn't this arbitrary delay (until the next packet arrives) be perceived by DQL as a massive transmission delay? This could cause it to incorrectly shrink the dynamic limit to its minimum and severely throttle transmission performance. > + > u64_stats_update_begin(&rq->stats.syncp); > rq->stats.vs.xdp_redirect +=3D stats->xdp_redirect; --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260527135418.1166= 665-1-hawk@kernel.org?part=3D5