From: Simon Schippers <simon.schippers@tu-dortmund.de>
To: "Jonas Köppeler" <j.koeppeler@tu-berlin.de>,
"Jesper Dangaard Brouer" <hawk@kernel.org>
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
horms@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
kernel-team@cloudflare.com, kuba@kernel.org,
netdev@vger.kernel.org, pabeni@redhat.com
Subject: Re: [PATCH net-next 5/5] selftests: net: add veth BQL stress test
Date: Fri, 1 May 2026 22:35:54 +0200 [thread overview]
Message-ID: <e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de> (raw)
In-Reply-To: <a841e7ed-eee0-4069-bd0d-ab043a1509c5@tu-berlin.de>
On 5/1/26 10:43, Jonas Köppeler wrote:
> On 4/30/26 2:31 PM, Jesper Dangaard Brouer wrote:
>>
>>
>> On 30/04/2026 11.45, Simon Schippers wrote:
>>> On 4/30/26 11:17, Jonas Köppeler wrote:
>>>> On 3/28/26 4:19 PM, Simon Schippers wrote:
>>>>> Hi, thanks for your work! I am really interested in this patchset.
>>>>>
>>>>> I am planning to submit a similar patch set (see [1]) for the tun/tap
>>>>> driver, where I am currently implementing qdisc backpressure similar
>>>>> to that used in veth.
>>>>>
>>>>> Can you run pktgen [2] to see if there is a regression?
>>>>> I think that there might be a slowdown due to BQL not choosing a big
>>>>> enough queue size.
>>>> I ran some tests using pktgen by replacing the trafficgen from the
>>>> selftest with samples/pktgen/pktgen_sample01_simple.sh (Patch v3)
>>>> and used --nrules 0. In general the throughput is quite similar:
>>>>
>>>> BQL disabled (using --bql-disable):
>>>> 2378694pps 1141Mb/sec (1141773120bps) errors: 0
>>>> 2400898pps 1152Mb/sec (1152431040bps) errors: 0
>>>> 2358125pps 1131Mb/sec (1131900000bps) errors: 0
>>>> 2402034pps 1152Mb/sec (1152976320bps) errors: 0
>>>> 2362061pps 1133Mb/sec (1133789280bps) errors: 0
>>>> 2416301pps 1159Mb/sec (1159824480bps) errors: 0
>>>> 2398496pps 1151Mb/sec (1151278080bps) errors: 0
>>>> 2415200pps 1159Mb/sec (1159296000bps) errors: 0
>>>> 2375921pps 1140Mb/sec (1140442080bps) errors: 0
>>>> 2427419pps 1165Mb/sec (1165161120bps) errors: 0
>>>> 2382461pps 1143Mb/sec (1143581280bps) errors: 0
>>>>
>>>> mean: 2392510pps
>>>>
>>>> BQL enabled:
>>>> 2159545pps 1036Mb/sec (1036581600bps) errors: 0
>>>> 2321899pps 1114Mb/sec (1114511520bps) errors: 0
>>>> 2477853pps 1189Mb/sec (1189369440bps) errors: 0
>>>> 2447857pps 1174Mb/sec (1174971360bps) errors: 0
>>>> 2400284pps 1152Mb/sec (1152136320bps) errors: 0
>>>> 2442841pps 1172Mb/sec (1172563680bps) errors: 0
>>>> 2442540pps 1172Mb/sec (1172419200bps) errors: 0
>>>> 2410585pps 1157Mb/sec (1157080800bps) errors: 0
>>>> 2395902pps 1150Mb/sec (1150032960bps) errors: 0
>>>> 2393260pps 1148Mb/sec (1148764800bps) errors: 0
>>>> 2401959pps 1152Mb/sec (1152940320bps) errors: 0
>>>>
>>>> mean: 2390411pps
>>>>
>>>> BQL enabled is ~2099pps (~0.09%) lower than BQL disabled.
>>>
>>> Sounds great!
>>>
>>> One more thing:
>>> Could you check what BQL limit settles during the test run using
>>> something like:
>>>
>>> watch -n 0.1 'cat /sys/class/net/XXXXX/queues/tx-0/byte_queue_limits/limit'
>>
>> FYI: The selftest already tracks BQL "limit" and "inflight".
>> - Jonas can just report those BQL inflight logs
>>
>> +print_periodic_stats() {
>> + local elapsed="$1"
>> +
>> + # BQL stats and watchdog counter
>> + WD_CNT=$(cat /sys/class/net/${VETH_A}/queues/tx-0/tx_timeout \
>> + 2>/dev/null) || WD_CNT="?"
>> + if [ -n "$BQL_DIR" ] && [ -d "$BQL_DIR" ]; then
>> + INFLIGHT=$(cat "$BQL_DIR/inflight" 2>/dev/null || echo "?")
>> + LIMIT=$(cat "$BQL_DIR/limit" 2>/dev/null || echo "?")
>> + echo " [${elapsed}s] BQL inflight=${INFLIGHT} limit=${LIMIT}" \
>> + "watchdog=${WD_CNT}"
>> + else
>> + echo " [${elapsed}s] watchdog=${WD_CNT} (no BQL sysfs)"
>> + fi
>>
> Hi, so what I found is that pktgen does not respect
> __QUEUE_STATE_STACK_XOFF. So the test data above is invalid: pktgen
> kept sending packets even when BQL had stopped the queue. So I patched
> pktgen with the following:
>
>
> - if (unlikely(netif_xmit_frozen_or_drv_stopped(txq))) {
> + if (unlikely(netif_xmit_frozen_or_stopped(txq))) {
>
> Test run with --nrules 0
>
> BQL disabled (using --bql-disable):
> inflight stays around 200 packets, with throughput:
> 2264138pps 1086Mb/sec (1086786240bps)
>
> BQL enabled:
> inflight is almost always 3 packets (occasionally it even drops to
> 0) and throughput is degraded:
> 1813455pps 870Mb/sec (870458400bps)
> limit is 2.
>
> BQL enabled is roughly 20% worse in throughput.
Good findings.
>
> Test run with --nrules 3500
>
> BQL disabled: Inflight ~200, throughput: 27161pps 13Mb/sec
> BQL enabled: Inflight 3 (limit 2), throughput: 26085pps 12Mb/sec
> BQL ~4% worse.
>
> Test run with --nrules 5000
>
> BQL disabled: Inflight ~200, throughput: 19395pps 9Mb/sec
> BQL enabled: Inflight 3 (limit 2), throughput: 20423pps 9Mb/sec
> BQL ~5.3% better.
>
> So it seems that BQL always steers toward a limit of 2. Could this be
> a result of calling netdev_tx_completed_queue() for every packet?
>
> Looking at the comment above netdev_tx_completed_queue in
> include/linux/netdevice.h:
>
> "Must be called at most once per TX completion round (and not per
> individual packet), so that BQL can adjust its limits appropriately."
>
> This is consistent with what Tom Herbert stated in the original BQL
> cover letter [1]:
>
> "BQL accounting is in the transmit path for every packet, and the
> function to recompute the byte limit is run once per transmit
> completion."
Yes, exactly, that will be the problem here.
BQL expects a periodic transmit completion round, but veth has none.
Adding something like a tasklet just to call netdev_tx_completed_queue()
periodically feels wrong, veth_tx_timeout() is not suited for it either,
and calling it on ptr_ring_empty() is probably also wrong.
I guess there would have to be a new "BQL" algorithm just for
software interfaces, one that considers:
1. Context switching (a bigger ring is better)
2. Cache locality (a smaller ring is better)
3. Bufferbloat (a time limit on ring residency?)
I think it is a hard problem.
How about first adding an option to modify VETH_RING_SIZE?
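One direction that would at least match the BQL API contract: accumulate
packet/byte counts across a NAPI poll round and call
netdev_tx_completed_queue() once at the end, instead of per packet.
A rough sketch, not compile-tested; the function shape, the veth_rq
fields and the peer_txq argument are assumed from the veth driver, not
taken from the posted patch:

```c
/* Sketch only: batch BQL completion accounting once per NAPI poll
 * round, as the netdev_tx_completed_queue() kernel-doc requires. */
static int veth_consume_batched(struct veth_rq *rq, int budget,
				struct netdev_queue *peer_txq)
{
	unsigned int bytes = 0;
	int pkts = 0;
	struct sk_buff *skb;

	while (pkts < budget &&
	       (skb = ptr_ring_consume(&rq->xdp_ring)) != NULL) {
		bytes += skb->len;
		pkts++;
		napi_gro_receive(&rq->xdp_napi, skb);
	}

	/* One completion call per round lets DQL size the limit from
	 * the whole batch instead of from single packets. */
	if (pkts)
		netdev_tx_completed_queue(peer_txq, pkts, bytes);

	return pkts;
}
```

Whether a per-round batch is enough for DQL to settle above 2 would need
the same pktgen runs as above; if not, the ring-size knob is still worth
having.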
>
> [1] https://lwn.net/Articles/469652/
>
> Jonas
>
>>> I guess it will just choose the ptr_ring size as limit in this case,
>>> but it would be nice if you could briefly verify this :)
>>>
>>> Thanks!
>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>> [1] Link: https://lore.kernel.org/all/20260312130639.138988-1-simon.schippers@tu-dortmund.de/
>>>>> [2] Link: https://www.kernel.org/doc/html/latest/networking/pktgen.html
>>
Thread overview: 18+ messages
2026-03-24 17:46 [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support hawk
2026-03-24 17:46 ` [PATCH " hawk
2026-03-24 17:56 ` Jesper Dangaard Brouer
2026-03-24 17:47 ` [PATCH net-next 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices hawk
2026-03-24 17:47 ` [PATCH net-next 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction hawk
2026-03-24 17:47 ` [PATCH net-next 3/5] veth: add tx_timeout watchdog as BQL safety net hawk
2026-03-24 17:47 ` [PATCH net-next 4/5] net: sched: add timeout count to NETDEV WATCHDOG message hawk
2026-03-24 17:47 ` [PATCH net-next 5/5] selftests: net: add veth BQL stress test hawk
2026-03-26 12:19 ` Jesper Dangaard Brouer
2026-03-26 19:55 ` Jakub Kicinski
2026-03-28 15:19 ` Simon Schippers
[not found] ` <1c435d90-8d08-4ac1-8b84-cc72c0b4e30f@tu-berlin.de>
2026-04-30 9:45 ` Simon Schippers
2026-04-30 12:31 ` Jesper Dangaard Brouer
[not found] ` <a841e7ed-eee0-4069-bd0d-ab043a1509c5@tu-berlin.de>
2026-05-01 20:35 ` Simon Schippers [this message]
2026-03-27 9:50 ` [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support Toke Høiland-Jørgensen
2026-03-27 12:49 ` Jesper Dangaard Brouer
2026-03-27 15:37 ` Jonas Köppeler
2026-03-28 20:06 ` Toke Høiland-Jørgensen