public inbox for netdev@vger.kernel.org
From: Simon Schippers <simon.schippers@tu-dortmund.de>
To: "Jonas Köppeler" <j.koeppeler@tu-berlin.de>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	horms@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
	kernel-team@cloudflare.com, kuba@kernel.org,
	netdev@vger.kernel.org, pabeni@redhat.com
Subject: Re: [PATCH net-next 5/5] selftests: net: add veth BQL stress test
Date: Fri, 1 May 2026 22:35:54 +0200	[thread overview]
Message-ID: <e8cdba04-aa9a-45c6-9807-8274b62920df@tu-dortmund.de> (raw)
In-Reply-To: <a841e7ed-eee0-4069-bd0d-ab043a1509c5@tu-berlin.de>

On 5/1/26 10:43, Jonas Köppeler wrote:
> On 4/30/26 2:31 PM, Jesper Dangaard Brouer wrote:
>>
>>
>> On 30/04/2026 11.45, Simon Schippers wrote:
>>> On 4/30/26 11:17, Jonas Köppeler wrote:
>>>> On 3/28/26 4:19 PM, Simon Schippers wrote:
>>>>> Hi, thanks for your work! I am really interested in this patchset.
>>>>>
>>>>> I am planning to submit a similar patch set (see [1]) for the tun/tap
>>>>> driver, where I am currently implementing qdisc backpressure similar
>>>>> to that used in veth.
>>>>>
>>>>> Can you run pktgen [2] to see if there is a regression?
>>>>> I think that there might be a slowdown due to BQL not choosing a big
>>>>> enough queue size.
>>>> I ran some tests using pktgen by replacing the trafficgen from the
>>>> selftest with samples/pktgen/pktgen_sample01_simple.sh (Patch v3)
>>>> and used --nrules 0. In general the throughput is quite similar:
>>>>
>>>> BQL disabled (using --bql-disable):
>>>>    2378694pps 1141Mb/sec (1141773120bps) errors: 0
>>>>    2400898pps 1152Mb/sec (1152431040bps) errors: 0
>>>>    2358125pps 1131Mb/sec (1131900000bps) errors: 0
>>>>    2402034pps 1152Mb/sec (1152976320bps) errors: 0
>>>>    2362061pps 1133Mb/sec (1133789280bps) errors: 0
>>>>    2416301pps 1159Mb/sec (1159824480bps) errors: 0
>>>>    2398496pps 1151Mb/sec (1151278080bps) errors: 0
>>>>    2415200pps 1159Mb/sec (1159296000bps) errors: 0
>>>>    2375921pps 1140Mb/sec (1140442080bps) errors: 0
>>>>    2427419pps 1165Mb/sec (1165161120bps) errors: 0
>>>>    2382461pps 1143Mb/sec (1143581280bps) errors: 0
>>>>
>>>>    mean: 2392510pps
>>>>
>>>> BQL enabled:
>>>>    2159545pps 1036Mb/sec (1036581600bps) errors: 0
>>>>    2321899pps 1114Mb/sec (1114511520bps) errors: 0
>>>>    2477853pps 1189Mb/sec (1189369440bps) errors: 0
>>>>    2447857pps 1174Mb/sec (1174971360bps) errors: 0
>>>>    2400284pps 1152Mb/sec (1152136320bps) errors: 0
>>>>    2442841pps 1172Mb/sec (1172563680bps) errors: 0
>>>>    2442540pps 1172Mb/sec (1172419200bps) errors: 0
>>>>    2410585pps 1157Mb/sec (1157080800bps) errors: 0
>>>>    2395902pps 1150Mb/sec (1150032960bps) errors: 0
>>>>    2393260pps 1148Mb/sec (1148764800bps) errors: 0
>>>>    2401959pps 1152Mb/sec (1152940320bps) errors: 0
>>>>
>>>>    mean: 2390411pps
>>>>
>>>> BQL enabled is ~2099pps (~0.09%) lower than BQL disabled.
>>>
>>> Sounds great!
>>>
>>> One more thing:
>>> Could you check what BQL limit settles during the test run using
>>> something like:
>>>
>>> watch -n 0.1 'cat /sys/class/net/XXXXX/queues/tx-0/byte_queue_limits/limit'
>>
>> FYI: The selftest already tracks BQL "limit" and "inflight".
>> - Jonas can just report those BQL inflight logs
>>
>> +print_periodic_stats() {
>> +    local elapsed="$1"
>> +
>> +    # BQL stats and watchdog counter
>> +    WD_CNT=$(cat /sys/class/net/${VETH_A}/queues/tx-0/tx_timeout \
>> +        2>/dev/null) || WD_CNT="?"
>> +    if [ -n "$BQL_DIR" ] && [ -d "$BQL_DIR" ]; then
>> +        INFLIGHT=$(cat "$BQL_DIR/inflight" 2>/dev/null || echo "?")
>> +        LIMIT=$(cat "$BQL_DIR/limit" 2>/dev/null || echo "?")
>> +        echo "  [${elapsed}s] BQL inflight=${INFLIGHT} limit=${LIMIT}" \
>> +            "watchdog=${WD_CNT}"
>> +    else
>> +        echo "  [${elapsed}s] watchdog=${WD_CNT} (no BQL sysfs)"
>> +    fi
>> +}
>>
> Hi, so what I found is that pktgen does not respect
> __QUEUE_STATE_STACK_XOFF. The test data above is therefore invalid:
> pktgen kept sending packets even when BQL had stopped the queue. So I
> patched pktgen with the following:
> 
> -       if (unlikely(netif_xmit_frozen_or_drv_stopped(txq))) {
> +       if (unlikely(netif_xmit_frozen_or_stopped(txq))) {
> 
> Test run with --nrules 0
> 
> BQL disabled (using --bql-disable):
> the inflight count is always around 200 packets and the throughput is
> 2264138pps 1086Mb/sec (1086786240bps)
> 
> BQL enabled:
> the inflight count is always 3 packets (occasionally even 0) and the
> throughput is degraded:
> 1813455pps 870Mb/sec (870458400bps)
> The limit settles at 2.
> 
> BQL enabled is roughly 20% worse in throughput.

Good findings.

> 
> Test run with --nrules 3500
> 
> BQL disabled: Inflight ~200, throughput: 27161pps 13Mb/sec
> BQL enabled:  Inflight 3 (limit 2),    throughput: 26085pps 12Mb/sec
> BQL ~4% worse.
> 
> Test run with --nrules 5000
> 
> BQL disabled: Inflight ~200, throughput: 19395pps 9Mb/sec
> BQL enabled:  Inflight 3 (limit 2),    throughput: 20423pps 9Mb/sec
> BQL ~5.3% better.
> 
> So it seems that BQL will always steer towards a limit of 2. Could this
> be a result of calling netdev_tx_completed_queue() for every packet?
> 
> Looking at the comment above netdev_tx_completed_queue in
> include/linux/netdevice.h:
> 
>   "Must be called at most once per TX completion round (and not per
>    individual packet), so that BQL can adjust its limits appropriately."
> 
> This is consistent with what Tom Herbert stated in the original BQL
> cover letter [1]:
> 
>   "BQL accounting is in the transmit path for every packet, and the
>    function to recompute the byte limit is run once per transmit
>    completion."

Yes, exactly, that will be the problem here.
BQL expects a periodic transmit completion round to recompute its
limit, but veth has no such completion.

Adding something like a tasklet that calls netdev_tx_completed_queue()
periodically feels wrong. veth_tx_timeout() is not suited for that
either, and calling it on ptr_ring_empty() is probably also wrong.

I guess there must be a new "BQL" algorithm just for
software interfaces, one that considers:
1. Context Switching (bigger ring size better)
2. Cache locality (smaller ring size better)
3. Bufferbloat (time limit in ring?)

I think it is a hard problem.

How about first adding an option to modify VETH_RING_SIZE?

> 
> [1] https://lwn.net/Articles/469652/
> 
> Jonas
> 
>>> I guess it will just choose the ptr_ring size as limit in this case,
>>> but it would be nice if you could briefly verify this :)
>>>
>>> Thanks!
>>>
>>>>
>>>>> Thanks!
>>>>>
>>>>> [1] Link: https://lore.kernel.org/all/20260312130639.138988-1-simon.schippers@tu-dortmund.de/
>>>>> [2] Link: https://www.kernel.org/doc/html/latest/networking/pktgen.html
>>


Thread overview: 18+ messages
2026-03-24 17:46 [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support hawk
2026-03-24 17:46 ` [PATCH " hawk
2026-03-24 17:56   ` Jesper Dangaard Brouer
2026-03-24 17:47 ` [PATCH net-next 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices hawk
2026-03-24 17:47 ` [PATCH net-next 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction hawk
2026-03-24 17:47 ` [PATCH net-next 3/5] veth: add tx_timeout watchdog as BQL safety net hawk
2026-03-24 17:47 ` [PATCH net-next 4/5] net: sched: add timeout count to NETDEV WATCHDOG message hawk
2026-03-24 17:47 ` [PATCH net-next 5/5] selftests: net: add veth BQL stress test hawk
2026-03-26 12:19   ` Jesper Dangaard Brouer
2026-03-26 19:55     ` Jakub Kicinski
2026-03-28 15:19   ` Simon Schippers
     [not found]     ` <1c435d90-8d08-4ac1-8b84-cc72c0b4e30f@tu-berlin.de>
2026-04-30  9:45       ` Simon Schippers
2026-04-30 12:31         ` Jesper Dangaard Brouer
     [not found]           ` <a841e7ed-eee0-4069-bd0d-ab043a1509c5@tu-berlin.de>
2026-05-01 20:35             ` Simon Schippers [this message]
2026-03-27  9:50 ` [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support Toke Høiland-Jørgensen
2026-03-27 12:49   ` Jesper Dangaard Brouer
2026-03-27 15:37     ` Jonas Köppeler
2026-03-28 20:06       ` Toke Høiland-Jørgensen
