From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Amery Hung <ameryhung@gmail.com>, netdev@vger.kernel.org
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	alexei.starovoitov@gmail.com, martin.lau@kernel.org,
	sinquersw@gmail.com, jhs@mojatatu.com, jiri@resnulli.us,
	stfomichev@gmail.com, ekarani.silvestre@ccc.ufcg.edu.br,
	yangpeihao@sjtu.edu.cn, xiyou.wangcong@gmail.com,
	yepeilin.cs@gmail.com, ameryhung@gmail.com,
	amery.hung@bytedance.com
Subject: Re: [PATCH bpf-next v2 00/14] bpf qdisc
Date: Thu, 02 Jan 2025 18:29:57 +0100
Message-ID: <874j2h86p6.fsf@toke.dk>
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>

Amery Hung <ameryhung@gmail.com> writes:

> Hi all,
>
> This patchset aims to support implementing qdisc using bpf struct_ops.
> This version takes a step back and only implements the minimum support
> for bpf qdisc. 1) support of adding skb to bpf_list and bpf_rbtree
> directly and 2) classful qdisc are deferred to future patchsets.
>
> * Overview *
>
> This series supports implementing qdisc using bpf struct_ops. bpf qdisc
> aims to be a flexible and easy-to-use infrastructure that allows users to
> quickly experiment with different scheduling algorithms/policies. It only
> requires users to implement core qdisc logic using bpf and implements the
> mundane part for them. In addition, the ability to easily communicate
> between qdisc and other components will also bring new opportunities for
> new applications and optimizations.

This is very cool, and I'm thrilled to see this work getting closer to
being merged! :)

> * struct_ops changes *
>
> To make struct_ops work better with bpf qdisc, two new changes are
> introduced to bpf specifically for struct_ops programs. First, we
> introduce "__ref" postfix for arguments in stub functions in patches 1-2.
> It allows Qdisc_ops->enqueue to acquire a unique referenced kptr to the
> skb argument. Through the reference object tracking mechanism in
> the verifier, we can make sure that the acquired skb will be either
> enqueued or dropped. Besides, no duplicate references can be acquired.
> Then, we allow a referenced kptr to be returned from struct_ops programs
> so that we can return an skb naturally. This is done and tested in patch 3
> and 4.
>
> * Performance of bpf qdisc *
>
> We tested several bpf qdiscs included in the selftests and their in-tree
> counterparts to give you a sense of the performance of qdisc implemented
> in bpf.
>
> The implementation of bpf_fq is fairly complex and slightly different from
> fq so later we only compare the two fifo qdiscs. bpf_fq implements the
> same fair queueing algorithm in fq, but without flow hash collision
> avoidance and garbage collection of inactive flows. bpf_fifo uses a single
> bpf_list as a queue instead of three queues for different priorities in
> pfifo_fast. The time complexity of fifo however should be similar since the
> queue selection time is negligible.
>
> Test setup:
>
>     client -> qdisc ------------->  server
>     ~~~~~~~~~~~~~~~                 ~~~~~~
>     nested VM1 @ DC1               VM2 @ DC2
>
> Throughput: iperf3 -t 600, 5 times
>
>       Qdisc        Average (GBits/sec)
>     ----------     -------------------
>     pfifo_fast       12.52 ± 0.26
>     bpf_fifo         11.72 ± 0.32 
>     fq               10.24 ± 0.13
>     bpf_fq           11.92 ± 0.64 
>
> Latency: sockperf pp --tcp -t 600, 5 times
>
>       Qdisc        Average (usec)
>     ----------     --------------
>     pfifo_fast      244.58 ± 7.93
>     bpf_fifo        244.92 ± 15.22
>     fq              234.30 ± 19.25
>     bpf_fq          221.34 ± 10.76
>
> Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
> implementation is consistent with previous observation (v8 throughput
> test on a loopback device). This should be able to be mitigated by
> supporting adding skb to bpf_list or bpf_rbtree directly in the future.

This looks pretty decent!
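
For reference, the relative differences between the two fifo qdiscs can be recomputed from the averages in the quoted tables. This is only a small arithmetic sketch: the numbers are copied from the tables above, the helper name `relative_change` is made up for illustration, and the ± spreads are ignored.

```python
# Averages copied from the quoted tables; the "± x" spreads are not modelled.
throughput_gbits = {"pfifo_fast": 12.52, "bpf_fifo": 11.72}
latency_usec = {"pfifo_fast": 244.58, "bpf_fifo": 244.92}


def relative_change(baseline, value):
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


fifo_throughput = relative_change(
    throughput_gbits["pfifo_fast"], throughput_gbits["bpf_fifo"]
)
fifo_latency = relative_change(
    latency_usec["pfifo_fast"], latency_usec["bpf_fifo"]
)

# The throughput line prints -6.4%, matching the figure in the cover letter.
print(f"bpf_fifo vs pfifo_fast throughput: {fifo_throughput:+.1f}%")
print(f"bpf_fifo vs pfifo_fast latency:    {fifo_latency:+.2f}%")
```

The fq and bpf_fq rows are left out on purpose, since the cover letter says the two implementations differ and only compares the fifo qdiscs.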

> * Clean up skb in bpf qdisc during reset *
>
> The current implementation relies on bpf qdisc implementors to correctly
> release skbs in queues (bpf graphs or maps) in .reset, which might not be
> a safe thing to do. The solution as Martin has suggested would be
> supporting private data in struct_ops. This can also help simplify
> implementation of qdisc that works with mq. For example, qdiscs in the
> selftest mostly use global data. Therefore, even if a user adds multiple
> qdisc instances under mq, they would still share the same queue.

So is the plan to fix this before merging this series? Seems like a bit
of a footgun, otherwise?
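
To illustrate the sharing problem described in the quoted text, here is a toy model in plain Python. It is not BPF code and does not use any kernel API: the class names, the `demo` helper and the packet string are all hypothetical, and the only point is the difference between queue state held in global data and queue state held per instance.

```python
from collections import deque

# Stand-in for a queue kept in global data, as the cover letter says the
# selftest qdiscs mostly do. Every instance below ends up using this one queue.
GLOBAL_QUEUE = deque()


class GlobalStateQdisc:
    """Toy qdisc whose queue lives in global data (hypothetical model)."""

    def enqueue(self, pkt):
        GLOBAL_QUEUE.append(pkt)

    def dequeue(self):
        return GLOBAL_QUEUE.popleft() if GLOBAL_QUEUE else None


class PrivateStateQdisc:
    """Toy qdisc with per-instance state, standing in for private data."""

    def __init__(self):
        self.queue = deque()

    def enqueue(self, pkt):
        self.queue.append(pkt)

    def dequeue(self):
        return self.queue.popleft() if self.queue else None


def demo(qdisc_cls):
    """Create two instances (as under mq), enqueue on one, dequeue on the other."""
    tx0, tx1 = qdisc_cls(), qdisc_cls()
    tx0.enqueue("pkt-for-tx0")
    return tx1.dequeue()


print("global data: ", demo(GlobalStateQdisc))   # tx1 sees the packet queued on tx0
print("private data:", demo(PrivateStateQdisc))  # tx1 has its own, empty queue
```

In the global-data model the second instance dequeues a packet that was queued on the first, which is the behaviour the cover letter warns about for multiple instances under mq.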

-Toke

