From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Jonas Köppeler" <j.koeppeler@tu-berlin.de>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	netdev@vger.kernel.org
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	jhs@mojatatu.com, jiri@resnulli.us, kernel-team@cloudflare.com,
	Chris Arges <chris.arges@gmail.com>,
	Mike Freemon <mike.freemon@cloudflare.com>
Subject: Re: [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support
Date: Sat, 28 Mar 2026 21:06:48 +0100
Message-ID: <87bjg7d8h3.fsf@toke.dk>
In-Reply-To: <7d404bd3-4444-464e-8831-c8304ecf5b40@tu-berlin.de>

Jonas Köppeler <j.koeppeler@tu-berlin.de> writes:

> On 3/27/26 13:49, Jesper Dangaard Brouer wrote:
>>
>>
>> On 27/03/2026 10.50, Toke Høiland-Jørgensen wrote:
>>> hawk@kernel.org writes:
>>>
>>>> From: Jesper Dangaard Brouer <hawk@kernel.org>
>>>>
>>>> This series adds BQL (Byte Queue Limits) to the veth driver, reducing
>>>> latency by dynamically limiting in-flight bytes in the ptr_ring and
>>>> moving buffering into the qdisc where AQM algorithms can act on it.
>>>>
>>>> Problem:
>>>>    veth's 256-entry ptr_ring acts as a "dark buffer" -- packets queued
>>>>    there are invisible to the qdisc's AQM.  Under load, the ring fills
>>>>    completely (DRV_XOFF backpressure), adding up to 256 packets of
>>>>    unmanaged latency before the qdisc even sees congestion.
>>>>
>>>> Solution:
>>>>    BQL (STACK_XOFF) dynamically limits in-flight bytes, stopping the
>>>>    queue before the ring fills.  This keeps the ring shallow and pushes
>>>>    excess packets into the qdisc, where sojourn-based AQM can measure
>>>>    and drop them.
>>>
>>> So one question here: Is *Byte* queue limits really the right thing for
>>> veth? As you mention above, the ptr_ring is sized in a number of
>>> packets. On a physical NIC, accounting bytes makes sense because there's
>>> a fixed line rate, so bytes turn directly into latency.
>>>
>>> But on a veth device, the stack processing is per packet, and most
>>> processing takes the same amount of time regardless of the size of the
>>> packet (e.g., netfilter rules that operate on the skb only).
>>>
>>> So my worry would be that when you're accounting in bytes, if there's a
>>> mix of big and small packets, you'd end up with the BQL algorithm
>>> scaling to a "too large" value, which would allow a lot of small packets
>>> to be queued up, adding extra latency (or even overflowing the ring
>>> buffer if the ratio is large enough).
>>>
>>> Have you run any such experiments? 
>>
>> Thanks for bringing this up.
>> Yes, we have considered this (and agree).
>>
>> Jonas is conducting some experiments.
>> I will let Jonas answer.
> Hi,
>
> I used the provided selftest, modified so that the payload size alternates
> between 1400 bytes and sizeof(struct pkt_hdr) = 24 bytes every 5000 packets.
>
> The receiver was slowed down using 10K iptables rules. I could confirm that
> the receive queue filled up to ~66 packets, whereas the BQL limit is around
> 2884 bytes, corresponding to approximately 2 x 1400-byte packets.
>
> I compared two accounting strategies: using skb->len vs. a fixed size of 1.
>
> Ping results over 5 runs using skb->len accounting:
>
>    rtt min/avg/max/mdev = 0.636/2.784/ 9.543/1.735 ms
>    rtt min/avg/max/mdev = 0.629/2.947/10.587/1.927 ms
>    rtt min/avg/max/mdev = 0.587/2.966/11.625/1.963 ms
>    rtt min/avg/max/mdev = 0.589/3.006/10.694/1.979 ms
>
> Ping results over 5 runs using fixed size (1) accounting:
>
>    rtt min/avg/max/mdev = 0.587/2.446/6.261/1.065 ms
>    rtt min/avg/max/mdev = 0.641/2.339/6.008/0.950 ms
>    rtt min/avg/max/mdev = 0.688/2.527/5.506/1.086 ms
>    rtt min/avg/max/mdev = 0.596/2.411/5.228/1.041 ms
>
> The avg and max RTT are consistently lower with the fixed-size accounting.
> This suggests that the excess buffered packets contribute to some
> latency.
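
For a side-by-side view, the four ping summary lines quoted above for
each strategy can be averaged with a short script. This is only a
sketch: the helper names (parse_rtt, summarize) are made up for
illustration, and the numbers are copied verbatim from the quoted
output.

```python
import re
from statistics import mean

# Ping summary lines copied from the quoted results above.
SKB_LEN_RUNS = """\
rtt min/avg/max/mdev = 0.636/2.784/ 9.543/1.735 ms
rtt min/avg/max/mdev = 0.629/2.947/10.587/1.927 ms
rtt min/avg/max/mdev = 0.587/2.966/11.625/1.963 ms
rtt min/avg/max/mdev = 0.589/3.006/10.694/1.979 ms
"""

FIXED_SIZE_RUNS = """\
rtt min/avg/max/mdev = 0.587/2.446/6.261/1.065 ms
rtt min/avg/max/mdev = 0.641/2.339/6.008/0.950 ms
rtt min/avg/max/mdev = 0.688/2.527/5.506/1.086 ms
rtt min/avg/max/mdev = 0.596/2.411/5.228/1.041 ms
"""


def parse_rtt(text):
    """Return a list of (min, avg, max, mdev) tuples, in ms."""
    rows = []
    for line in text.splitlines():
        m = re.search(r"=\s*([\d.\s/]+)ms", line)
        if m:
            # float() tolerates the padding spaces ping inserts.
            rows.append(tuple(float(x) for x in m.group(1).split("/")))
    return rows


def summarize(rows):
    """Mean of the per-run avg and max RTT columns."""
    return {
        "avg": mean(r[1] for r in rows),
        "max": mean(r[2] for r in rows),
    }


for name, text in (("skb->len", SKB_LEN_RUNS), ("fixed (1)", FIXED_SIZE_RUNS)):
    s = summarize(parse_rtt(text))
    print(f"{name:10s} mean avg = {s['avg']:.3f} ms, mean max = {s['max']:.3f} ms")
```

On these figures the mean of the per-run avg RTT drops from about
2.93 ms to about 2.43 ms, and the mean of the per-run max RTT from
about 10.61 ms to about 5.75 ms.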

Right, so this sounds like fixed-size accounting is the way to go, then.
Cool :)

-Toke
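
As a rough illustration of why the accounting unit matters, the toy
model below counts how many packets fit under a limit when each packet
is charged skb->len versus a fixed cost of 1. It is a sketch under
stated assumptions, not the veth or BQL implementation:
packets_under_limit() is a hypothetical helper, the 42-byte header
overhead (Ethernet + IPv4 + UDP) is an assumption about the selftest
traffic, and the limit is treated as static although BQL adapts it at
runtime.

```python
# Toy model only; not kernel code.
HDR_OVERHEAD = 42   # ASSUMPTION: 14 (Ethernet) + 20 (IPv4) + 8 (UDP) bytes
BYTE_LIMIT = 2884   # BQL limit reported in the thread, in bytes
PKT_LIMIT = 2       # ASSUMPTION: hypothetical limit when each packet costs 1

BIG = 1400 + HDR_OVERHEAD    # assumed skb->len of a large test packet
SMALL = 24 + HDR_OVERHEAD    # assumed skb->len of a small test packet


def packets_under_limit(limit, cost_per_packet):
    """Number of equally sized packets whose summed cost stays within limit."""
    if cost_per_packet <= 0:
        raise ValueError("cost_per_packet must be positive")
    return limit // cost_per_packet


by_bytes_big = packets_under_limit(BYTE_LIMIT, BIG)
by_bytes_small = packets_under_limit(BYTE_LIMIT, SMALL)
by_count = packets_under_limit(PKT_LIMIT, 1)  # same for any packet size

print(by_bytes_big, by_bytes_small, by_count)
```

With these assumptions a byte limit sized for two large packets admits
43 small ones, whereas a per-packet limit admits the same number either
way. The model does not try to reproduce the ~66-packet queue depth
measured above.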


Thread overview: 18+ messages
2026-03-24 17:46 [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support hawk
2026-03-24 17:46 ` [PATCH " hawk
2026-03-24 17:56   ` Jesper Dangaard Brouer
2026-03-24 17:47 ` [PATCH net-next 1/5] net: add dev->bql flag to allow BQL sysfs for IFF_NO_QUEUE devices hawk
2026-03-24 17:47 ` [PATCH net-next 2/5] veth: implement Byte Queue Limits (BQL) for latency reduction hawk
2026-03-24 17:47 ` [PATCH net-next 3/5] veth: add tx_timeout watchdog as BQL safety net hawk
2026-03-24 17:47 ` [PATCH net-next 4/5] net: sched: add timeout count to NETDEV WATCHDOG message hawk
2026-03-24 17:47 ` [PATCH net-next 5/5] selftests: net: add veth BQL stress test hawk
2026-03-26 12:19   ` Jesper Dangaard Brouer
2026-03-26 19:55     ` Jakub Kicinski
2026-03-28 15:19   ` Simon Schippers
     [not found]     ` <1c435d90-8d08-4ac1-8b84-cc72c0b4e30f@tu-berlin.de>
2026-04-30  9:45       ` Simon Schippers
2026-04-30 12:31         ` Jesper Dangaard Brouer
     [not found]           ` <a841e7ed-eee0-4069-bd0d-ab043a1509c5@tu-berlin.de>
2026-05-01 20:35             ` Simon Schippers
2026-03-27  9:50 ` [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support Toke Høiland-Jørgensen
2026-03-27 12:49   ` Jesper Dangaard Brouer
2026-03-27 15:37     ` Jonas Köppeler
2026-03-28 20:06       ` Toke Høiland-Jørgensen [this message]
