From: Simon Schippers <simon@schippers-hamm.de>
To: Jesper Dangaard Brouer <hawk@kernel.org>,
Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>,
netdev@vger.kernel.org, kernel-team@cloudflare.com,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Stanislav Fomichev <sdf@fomichev.me>,
linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v5 3/5] veth: implement Byte Queue Limits (BQL) for latency reduction
Date: Tue, 12 May 2026 23:55:10 +0200
Message-ID: <14348957-d061-4124-9bac-45df9cf6686c@schippers-hamm.de>
In-Reply-To: <18855e57-f050-411f-9958-d4babcc81ba3@kernel.org>
On 5/12/26 15:54, Jesper Dangaard Brouer wrote:
>>> Nope, I'm using a bpftrace program to keep track of the inflight/limit
>>> in a BPF hashmap. Reading from /sys will not be accurate.
>>
>> Ah nice.
>
> Add the option --hist to have both NAPI and BQL histograms printed when
> script ends. This will give you an accurate pattern of how inflight and
> limit evolves.
>
>>>
>>> I moved the selftests into a github repo [1] to allow us to collaborate
>>> and evaluate the changes more easily. I explicitly kept the new BPF
>>> based BQL tracking as a commit[2] for your benefit.
>>>
>>> [1] https://github.com/netoptimizer/veth-backpressure-performance-testing/tree/main/selftests
>>>
>>> [2] https://github.com/netoptimizer/veth-backpressure-performance-testing/commit/f25c5dc92977
>>
>> Thanks for sharing. After minor issues I was able to set it up
>> (currently I am just using plain v5, will look at the coalescing patch
>> when I find the time):
>>
>> Can confirm the latency reduction with the default settings, in my case
>> 4.888ms to 0.241ms.
>>
>> With the same script I was also able to see a performance slow down:
>> veth_bql_test_virtme.sh --qdisc fq_codel --nrules 0
>> --> ~510 Kpps
>> Same with --bql-disable
>> --> ~570 Kpps
>> --> 12% faster
>>
>
> Thanks for running these benchmarks.
>
> Notice that --nrules 0 can easily result in no-queuing (on average),
> because the veth NAPI consumer is faster than the producer. You will
> likely see BQL inflight=1 and sink reported avg latency very low
> (remember it okay that sink get high latency penalty as long at ping
> latency remains low, as that show AQM is working).
I ran the benchmarks with --hist and I see what you mean; my results
look very similar.
Would Jonas's approach [1] of modifying pktgen be the best option to
ensure that the producer stays faster than the consumer?
[1] Link: https://lore.kernel.org/netdev/e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de/
> Hi, so what I found is that pktgen does not respect
> __QUEUE_STATE_STACK_XOFF. So the test data above is invalid, since it
> just sent packets even if the BQL "stopped" the queue. So I patched
> pktgen with the following:
>
> - if (unlikely(netif_xmit_frozen_or_drv_stopped(txq))) {
> + if (unlikely(netif_xmit_frozen_or_stopped(txq))) {
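Thanks, that explains the invalid numbers. For context, BQL stops the
queue via the stack XOFF bit, which only the second helper checks --
roughly like this (paraphrased from include/linux/netdevice.h from
memory, so possibly not verbatim):

static inline bool netif_xmit_frozen_or_drv_stopped(const struct netdev_queue *dev_queue)
{
	/* Driver XOFF or frozen only -- a BQL stop is invisible here. */
	return dev_queue->state & QUEUE_STATE_DRV_XOFF_OR_FROZEN;
}

static inline bool netif_xmit_frozen_or_stopped(const struct netdev_queue *dev_queue)
{
	/* Also covers __QUEUE_STATE_STACK_XOFF, which
	 * netdev_tx_sent_queue() sets once the BQL limit is exceeded.
	 */
	return dev_queue->state & QUEUE_STATE_ANY_XOFF_OR_FROZEN;
}

So with your one-line change pktgen honours the BQL stop as well.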
After thinking more about the implementation, I see two possible issues:

1. netdev_tx_completed_queue() never reports more than burst=64 packets
per call:
BQL only increases the limit if the queue was starved, that is, "The
queue was over-limit in the last interval (the last time completion
processing ran), and there is no more data in the queue (i.e. it's
empty)" [2].
But since at most 64 packets are reported per completion, the limit can
only grow while the queue holds <= 64 packets. Once the limit is above
64 it cannot grow any further and only stays there until the next
decrease of the limit.
2. netdev_tx_completed_queue() is called at irregular intervals:
If the consumer is slow, it is called roughly every tx_coal_usecs.
But if the consumer is fast, it is called much more frequently, probably
at irregular intervals depending on scheduling.
However, "BQL depends on periodic completion interrupts" [2].
--> How about adding something like a timer that fires every 10us and
calls netdev_tx_completed_queue() with the n_bql counts collected from
(multiple) veth_xdp_rcv() runs? That could solve both 1. and 2. (see the
rough sketch below).
[2] Link: https://medium.com/@tom_84912/byte-queue-limits-the-unauthorized-biography-61adc5730b83
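A very rough sketch of what I mean (untested and not based on the actual
v5 code; the fields bql_timer, bql_pending_pkts, bql_pending_bytes and
peer_txq on struct veth_rq are made up here, and locking/teardown is
ignored):

#define VETH_BQL_FLUSH_NS	(10 * NSEC_PER_USEC)

/* Soft hrtimer so the callback runs in softirq context, like the NAPI
 * poll that reports the BQL completions today.
 */
static enum hrtimer_restart veth_bql_flush(struct hrtimer *timer)
{
	struct veth_rq *rq = container_of(timer, struct veth_rq, bql_timer);
	unsigned int pkts, bytes;

	/* Take whatever veth_xdp_rcv() accumulated since the last flush. */
	pkts  = xchg(&rq->bql_pending_pkts, 0);
	bytes = xchg(&rq->bql_pending_bytes, 0);

	if (pkts)
		netdev_tx_completed_queue(rq->peer_txq, pkts, bytes);

	hrtimer_forward_now(timer, ns_to_ktime(VETH_BQL_FLUSH_NS));
	return HRTIMER_RESTART;
}

/* Setup when NAPI is enabled, something like:
 *	hrtimer_init(&rq->bql_timer, CLOCK_MONOTONIC,
 *		     HRTIMER_MODE_REL_PINNED_SOFT);
 *	rq->bql_timer.function = veth_bql_flush;
 *	hrtimer_start(&rq->bql_timer, ns_to_ktime(VETH_BQL_FLUSH_NS),
 *		      HRTIMER_MODE_REL_PINNED_SOFT);
 */

veth_xdp_rcv() would then only accumulate into rq->bql_pending_pkts /
rq->bql_pending_bytes instead of calling netdev_tx_completed_queue()
itself, so both the 64-packet cap per report and the irregular reporting
intervals would go away.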
>
> There is an important gotcha. We actually have micro-burst of queuing
> (likely due to scheduling noise). Reading BQL stats from /sys will show
> BQL inflight=1, but when using the option --hist is it visible that
> @inflight have a long tail (see below signature). The "qdisc" output
> line also shows this happening via requeues increasing (approx 17/sec in
> a test with 567Kpps). (this was with the time-based BQL impl).
I understand.
>
>
>>>
>>> Sorry for cutting the remaining of the message, but I ran out of time,
>>> as things are a bit challenging/hectic here at Cloudflare at the moment.
>>>
>>> --Jesper
>>
>> All good, just ignore it. I think I misunderstood something anyway.
>
> Okay, I'll ignore it as I couldn't make sense of it ;-)
> --Jesper
>
>
>
> --- BQL inflight histogram (VETH_BQL_UNIT=1, values = packets) ---
> @inflight:
> [0, 1) 306565 |@ |
> [1, 2) 9250454 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [2, 3) 5561919 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
> [3, 4) 354341 |@ |
> [4, 5) 50137 | |
> [5, 6) 16771 | |
> [6, 7) 6001 | |
> [7, 8) 3076 | |
> [8, 9) 1949 | |
> [9, 10) 1965 | |
> [10, 11) 1954 | |
> [11, 12) 1914 | |
> [12, 13) 1732 | |
> [13, 14) 1559 | |
> [14, 15) 1405 | |
> [15, 16) 1269 | |
> [16, 17) 1194 | |
> [17, 18) 1190 | |
> [18, 19) 1148 | |
> [19, 20) 1079 | |
> [20, 21) 1008 | |
> [21, 22) 951 | |
> [22, 23) 870 | |
> [23, 24) 826 | |
> [24, 25) 775 | |
> [25, 26) 764 | |
> [26, 27) 740 | |
> [27, 28) 714 | |
> [28, 29) 665 | |
> [29, 30) 626 | |
> [30, 31) 607 | |
> [31, 32) 601 | |
> [32, 33) 583 | |
> [33, 34) 593 | |
> [34, 35) 574 | |
> [35, 36) 562 | |
> [36, 37) 554 | |
> [37, 38) 538 | |
> [38, 39) 528 | |
> [39, 40) 525 | |
> [40, 41) 512 | |
> [41, 42) 542 | |
> [42, 43) 529 | |
> [43, 44) 526 | |
> [44, 45) 513 | |
> [45, 46) 503 | |
> [46, 47) 485 | |
> [47, 48) 480 | |
> [48, 49) 473 | |
> [49, 50) 474 | |
> [50, 51) 476 | |
> [51, 52) 476 | |
> [52, 53) 465 | |
> [53, 54) 454 | |
> [54, 55) 446 | |
> [55, 56) 430 | |
> [56, 57) 425 | |
> [57, 58) 425 | |
> [58, 59) 422 | |
> [59, 60) 407 | |
> [60, 61) 390 | |
> [61, 62) 370 | |
> [62, 63) 354 | |
> [63, 64) 343 | |
> [64, 65) 325 | |
> [65, 66) 303 | |
> [66, 67) 158 | |
> [67, 68) 136 | |
> [68, 69) 124 | |
> [69, 70) 110 | |
> [70, 71) 99 | |
> [71, 72) 94 | |
> [72, 73) 82 | |
> [73, 74) 74 | |
> [74, 75) 58 | |
> [75, 76) 52 | |
> [76, 77) 45 | |
> [77, 78) 40 | |
> [78, 79) 39 | |
> [79, 80) 38 | |
> [80, 81) 21 | |
> [81, 82) 4 | |
> [82, 83) 4 | |
> [83, 84) 4 | |
> [84, 85) 2 | |
> [85, 86) 2 | |
> [86, 87) 2 | |
> [87, 88) 2 | |
> [88, 89) 1 | |
>
>
> --- BQL limit histogram (auto-tuned, values = packets) ---
> @limit_val:
> [61, 62) 221346 |@ |
> [62, 63) 0 | |
> [63, 64) 772169 |@@@ |
> [64, 65) 10053949 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> [65, 66) 0 | |
> [66, 67) 0 | |
> [67, 68) 0 | |
> [68, 69) 0 | |
> [69, 70) 0 | |
> [70, 71) 457838 |@@ |
> [71, 72) 0 | |
> [72, 73) 610198 |@@@ |
> [73, 74) 0 | |
> [74, 75) 0 | |
> [75, 76) 0 | |
> [76, 77) 0 | |
> [77, 78) 0 | |
> [78, 79) 2328284 |@@@@@@@@@@@@ |
> [79, 80) 1150181 |@@@@@ |
>
> @inflight_stats: count 15593965, average 1, total 23078061
>
> @limit_stats: count 15593965, average 67, total 1054054856
>
>