Date: Mon, 11 May 2026 20:08:07 +0200
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
To: Simon Schippers, Jakub Kicinski
Cc: Paolo Abeni, netdev@vger.kernel.org, kernel-team@cloudflare.com,
 Andrew Lunn, "David S. Miller", Eric Dumazet, Alexei Starovoitov,
 Daniel Borkmann, John Fastabend, Stanislav Fomichev,
 linux-kernel@vger.kernel.org, bpf@vger.kernel.org
References: <20260505132159.241305-1-hawk@kernel.org>
 <20260505132159.241305-4-hawk@kernel.org>
 <8f2f7f2e-6aa2-4e5b-b52d-0025b2525579@redhat.com>
 <6a597dbd-70bf-4b14-b495-2f7248fd3220@kernel.org>
 <20260508190626.4285fac0@kernel.org>
 <20260510085602.57c7a081@kernel.org>
 <41023c34-87a3-4e4f-b3ab-3ed53d171910@schippers-hamm.de>
Content-Language: en-US
From: Jesper Dangaard Brouer
In-Reply-To: <41023c34-87a3-4e4f-b3ab-3ed53d171910@schippers-hamm.de>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

On 11/05/2026 11.55, Simon Schippers wrote:
> On 5/11/26 10:11, Jesper Dangaard Brouer wrote:
>>
>>
>> On 10/05/2026 17.56, Jakub Kicinski wrote:
>>> On Sat, 9 May 2026 11:09:51 +0200 Jesper Dangaard Brouer wrote:
>>>> On 09/05/2026 04.06, Jakub Kicinski wrote:
>>>>> On Thu, 7 May 2026 21:09:09 +0200 Jesper Dangaard Brouer wrote:
>>>>>> Not against being able to modify VETH_RING_SIZE, but I don't think it is
>>>>>> the solution here.
>>>>>
>>>>> Was it evaluated, tho?
>>>>>
>>>>> It's obviously super easy these days to have AI spew no end of complex
>>>>> code. So it'd be great to have some solid, ideally production-like
>>>>> data to back this all up.
>>>>>
>>>>> VETH_RING_SIZE seems trivial, ethtool set ringparam
>>>>
>>>> No, unfortunately we cannot just decrease the VETH_RING_SIZE.
>>>
>>> To be clear - I said make it configurable with ethtool -G
>>> not change the default.
>>>
>>
>> Sure, I understand the desire to make VETH_RING_SIZE configurable.
>> If doing so we are making the Linux network stack harder to tune and
>> set up correctly. E.g. adding a qdisc to veth would also require
>> changing the ring size, but if the system also uses XDP then tuning
>> below 64 (likely 128) will lead to hard-to-find packet drops.
>
> I mean 64 still could be a 4x improvement at least.
>

No, not really. Setting it to 64 will give the same (bad) latency as
"BQL off", which is what the patchset is trying to address.

>>
>> I prefer adding something (like BQL) that auto-tunes how much of the ring
>> queue we are using. Good queues function as shock absorbers when
>> concurrent processes in the OS have scheduling noise.
>>
>> I acknowledge that Simon Schippers found that the BQL implementation was
>> actually not auto-tuning. We need to work on this, my prototype
>> implementation [1] [2] works surprisingly well.
>>
>>
>> - [1] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@kernel.org/2-09-veth-time-based-bql-coalescing.patch
>> - [2] https://lore.kernel.org/all/3e43117f-356d-4086-a176-abd7fe2e6f0a@kernel.org/
>>
>>
>>>> The reason is that XDP-redirect into veth doesn't have any
>>>> back-pressure and would simply drop packets if queue size becomes
>>>> less than the NAPI budget (64). (Yes, we use both normal path and
>>>> XDP-redirect in production).
>>>
>>> Doesn't this mean you have a queue which is not under BQL control?
>>>
>>
>> It is a matter of perspective. BQL needs between 17-55 elements in the
>> 256 queue. At the same time we handle if the ring runs full, e.g. due
>> to a sudden burst of XDP redirected packets, which pushes packets into
>> the qdisc layer.
>
> You are checking inflight/limit in the /sys directory to get the 17-55
> number, right?
>

Nope, I'm using a bpftrace program to keep track of the inflight/limit
in a BPF hashmap. Reading from /sys will not be accurate.

I moved the selftests into a GitHub repo [1] to allow us to collaborate
and evaluate the changes more easily. I explicitly kept the new BPF-based
BQL tracking as a commit [2] for your benefit.

 [1] https://github.com/netoptimizer/veth-backpressure-performance-testing/tree/main/selftests
 [2] https://github.com/netoptimizer/veth-backpressure-performance-testing/commit/f25c5dc92977

Sorry for cutting the remainder of the message, but I ran out of time, as
things are a bit challenging/hectic here at Cloudflare at the moment.

--Jesper
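
For comparison with the /sys approach asked about above, here is a minimal
sketch of sampling the BQL counters from sysfs. It assumes the standard
per-queue byte_queue_limits directory; the device name "veth0" and the
helper names are hypothetical and only for illustration. This is not the
bpftrace-based tracking from commit [2]. Each read is a point-in-time
snapshot, so it can only serve as a rough cross-check.

```python
from pathlib import Path


def bql_dir(dev, txq=0, sysfs_root="/sys/class/net"):
    """Build the path to the BQL directory of one TX queue.

    sysfs_root is a parameter only so the helper can be pointed at a
    fake tree when testing; on a real system keep the default.
    """
    return Path(sysfs_root) / dev / "queues" / f"tx-{txq}" / "byte_queue_limits"


def read_bql(queue_dir):
    """Read the 'inflight' and 'limit' files once and return them as ints.

    This is a single snapshot: the values can change between the two
    reads, and anything that happens between samples is not seen.
    """
    base = Path(queue_dir)
    return {
        name: int((base / name).read_text().strip())
        for name in ("inflight", "limit")
    }
```

Usage would be along the lines of read_bql(bql_dir("veth0")) called in a
polling loop on the sending side of the veth pair.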