From: Jesper Dangaard Brouer
Date: Wed, 18 Mar 2026 17:24:53 +0100
Subject: Re: [RFC PATCH net-next 2/6] veth: implement Byte Queue Limits (BQL) for latency reduction
To: Toke Høiland-Jørgensen, netdev@vger.kernel.org
Cc: edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, davem@davemloft.net, andrew+netdev@lunn.ch, horms@kernel.org, jhs@mojatatu.com, jiri@resnulli.us, sdf@fomichev.me, j.koeppeler@tu-berlin.de, mfreemon@cloudflare.com, carges@cloudflare.com, kernel-team
References: <20260318134826.1281205-1-hawk@kernel.org> <20260318134826.1281205-3-hawk@kernel.org> <87ms05nrdw.fsf@toke.dk>
In-Reply-To: <87ms05nrdw.fsf@toke.dk>

On 18/03/2026 15.28, Toke Høiland-Jørgensen wrote:
> hawk@kernel.org writes:
>
>> From: Jesper Dangaard Brouer
>>
>> Add BQL support to the veth driver to dynamically limit the number of
>> bytes queued in the ptr_ring, giving the qdisc earlier feedback to shape
>> traffic and reduce latency.
>>
>> The BQL charge (netdev_tx_sent_queue) is placed in veth_xmit() BEFORE
>> veth_forward_skb() produces the SKB into the ptr_ring. This ordering is
>> critical: with threaded NAPI the consumer runs on a separate CPU and can
>> complete the SKB (calling dql_completed) before veth_xmit() returns. If
>> the charge happened after the produce, the completion could race ahead
>> of the charge, violating dql_completed()'s invariant that completed
>> bytes never exceed queued bytes (BUG_ON).
>>
>> Whether an SKB was BQL-charged is tracked per-SKB using a VETH_BQL_FLAG
>> bit in the ptr_ring pointer (BIT(1), alongside the existing VETH_XDP_FLAG
>> BIT(0)). The do_bql flag from veth_xmit() propagates through
>> veth_forward_skb() and veth_xdp_rx() into the ptr_ring entry. On the
>> completion side in veth_xdp_rcv(), veth_ptr_is_bql() reads the flag to
>> decide whether to call netdev_tx_completed_queue(). Per-SKB tracking is
>> necessary because the qdisc can be replaced live (e.g. noqueue->sfq or
>> vice versa via 'tc qdisc replace') while SKBs are already in-flight in
>> the ptr_ring. SKBs charged under the old qdisc must complete correctly
>> regardless of what qdisc is attached when the consumer runs, so each
>> SKB carries its own BQL-charged state rather than re-checking the peer's
>> qdisc at completion time.
>
> It's not completely obvious to me why BQL can't be active regardless of
> whether there's a qdisc installed or not? If there's no qdisc, shouldn't
> BQL auto-tune to a higher value because the queue runs empty more?
>

When the net_device doesn't have a qdisc, we hit this code path:
 - [0] https://elixir.bootlin.com/linux/v7.0-rc4/source/net/core/dev.c#L4806-L4852
 - Notice the check if (!netif_xmit_stopped(txq))
 - when the queue is stopped, the packet is dropped with
   "Virtual device %s asks to queue packet!"

We cannot unconditionally track BQL, because calling
netdev_tx_sent_queue() can set STACK_XOFF, which results in the above
code dropping packets and complaining. (With no qdisc, there is nowhere
to requeue/store the back-pressured packet.)

--Jesper