Message-ID: <68223314-1a44-4aee-8207-57437ef9f3ab@schippers-hamm.de>
Date: Thu, 7 May 2026 22:12:00 +0200
From: Simon Schippers <simon@schippers-hamm.de>
To: Jesper Dangaard Brouer, Paolo Abeni, netdev@vger.kernel.org
Cc: kernel-team@cloudflare.com, Andrew Lunn, "David S. Miller",
 Eric Dumazet, Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
 John Fastabend, Stanislav Fomichev, linux-kernel@vger.kernel.org,
 bpf@vger.kernel.org
Subject: Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits
 (BQL) for latency reduction
In-Reply-To: <6a597dbd-70bf-4b14-b495-2f7248fd3220@kernel.org>
References: <20260505132159.241305-1-hawk@kernel.org>
 <20260505132159.241305-4-hawk@kernel.org>
 <8f2f7f2e-6aa2-4e5b-b52d-0025b2525579@redhat.com>
 <6a597dbd-70bf-4b14-b495-2f7248fd3220@kernel.org>

On 5/7/26 21:09, Jesper Dangaard Brouer wrote:
>
>
> On 07/05/2026 16.46, Simon Schippers wrote:
>>
>>
>> On 5/7/26 16:34, Paolo Abeni wrote:
>>> On 5/7/26 8:54 AM, Simon Schippers wrote:
>>>> On 5/5/26 15:21, hawk@kernel.org wrote:
>>>>> @@ -928,9 +968,13 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
>>>>>             }
>>>>>         } else {
>>>>>             /* ndo_start_xmit */
>>>>> -            struct sk_buff *skb = ptr;
>>>>> +            bool bql_charged = veth_ptr_is_bql(ptr);
>>>>> +            struct sk_buff *skb = veth_ptr_to_skb(ptr);
>>>>>
>>>>>             stats->xdp_bytes += skb->len;
>>>>> +            if (peer_txq && bql_charged)
>>>>> +                netdev_tx_completed_queue(peer_txq, 1, VETH_BQL_UNIT);
>>>>
>>>> In the discussion with Jonas [1], I left a comment explaining why I think
>>>> this doesn't work.
>>>>
>
> I've experimented with doing the "completion" at NAPI-end in
> veth_poll(), but that resulted in the BQL limit being 128 packets, which
> leads to bad latency results (not acceptable).
> (See detailed report later)
>
>
>>>> I still think first that adding an option to modify the hard-coded
>>>> VETH_RING_SIZE is the way to go.
>>>>
>
> I'm not against being able to modify VETH_RING_SIZE, but I don't think it
> is the solution here.
>
> The simple solution is to configure the BQL limit_min:
> `/sys/class/net//queues/tx-N/byte_queue_limits/limit_min`
>
> My experiments (below) find that limit_min=8 gives good performance.
> We can simply set the default to 8, as this still allows userspace to
> change it later if lower latency is preferred.
>
>>>> Thanks!
>>>>
>>>> [1] Link: https://lore.kernel.org/netdev/e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de/
>>>
>>> In the above discussion a 20% regression is reported, which IMHO can't
>>> be ignored. Still, the tput figures in the data are extremely low;
>>> something is possibly off?!? I would expect a few Mpps with pktgen on
>>> top of veth, while the reported data is ~20-30Kpps.
>>>
>>> /P
>>>
>>
>> The ~20-30Kpps occur when thousands of iptables rules are applied and
>> a UDP userspace application is sending.
>>
>> And there is a 20% pktgen regression (no iptables rules applied).
>>
>
> The pktgen test is a little dubious/weird, and Jonas had to modify pktgen
> to test this. John Fastabend added a config to pktgen that allows
> benchmarking the egress qdisc path; it might be better to use that.
> The samples/pktgen/pktgen_bench_xmit_mode_queue_xmit.sh is a demo usage.
>
> If redoing the tests, can you adjust limit_min to see the effect?
> /sys/class/net//queues/tx-N/byte_queue_limits/limit_min
>
> A 20% throughput performance regression is of course too much, but I will
> remind us that adding a qdisc will "cost" some overhead; that is a
Our purpose here is to reduce bufferbloat and > latency, not optimize for throughput. >=20 >=20 >> I am pretty sure the reason is because the BQL limit is stuck at 2 >> packets (because the completed queue is always called with 1 packet >> and not in a interrupt/timer with multiple packets...). >> >=20 > I've run a lot of experiments, which I made AI write a report over, see = attachment. The TL;DR is that best performance vs latency tradeoff is def= aulting BQL/DQL limit_min to be 8 packets. >=20 > I fear this patchset will stall forever, if we keep searching for a perf= ect solution without any overhead. The qdisc layer will be a baseline ove= rhead. The limit=3D2 packets is actually the optimal darkbuffer queue siz= e, but I acknowledge that this causes too many qdisc requeue events (leadi= ng to overhead). I suggest that I add another patch in V6, that defaults = limit_min to 8 (separate patch to make it easier to revert/adjust later). >=20 > I've talked with Jonas, and we want to experiment with different solutio= ns to make BQL/DQL work better with virtual devices. >=20 > This patchset helps our (production) use-case reduce mice-flow latency > from approx 22ms to 1.3ms for latency under-load. Due to the consumer > namespace being the bottleneck the requeue overhead is negligible in > comparison. >=20 > -Jesper First of all thanks for you work and I really see the advantages of avoiding bufferbloat :) But the key of the BQL algorithm, which is the *dynamic* adaption of the limit, is not working. Always calling netdev_completed_queue() with 1 packet results in a static limit of 2 packets (as seen by Jonas measurements), which you force up to 8 packets. So in the end this patchset has the same effect as just setting VETH_RING_SIZE to 8 (and giving an option to change this value).