From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, netdev@vger.kernel.org,
daniel@iogearbox.net, andrii@kernel.org,
alexei.starovoitov@gmail.com, martin.lau@kernel.org,
sinquersw@gmail.com, toke@redhat.com, jhs@mojatatu.com,
jiri@resnulli.us, stfomichev@gmail.com,
ekarani.silvestre@ccc.ufcg.edu.br, yangpeihao@sjtu.edu.cn,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com
Subject: Re: [PATCH bpf-next v2 00/14] bpf qdisc
Date: Thu, 9 Jan 2025 17:43:46 -0800
Message-ID: <1292dc51-4ca1-45c0-8a7c-78d325530531@linux.dev>
In-Reply-To: <20241220195619.2022866-1-amery.hung@gmail.com>
On 12/20/24 11:55 AM, Amery Hung wrote:
> The implementation of bpf_fq is fairly complex and differs slightly from
> fq, so below we only compare the two fifo qdiscs. bpf_fq implements the
> same fair queueing algorithm as fq, but without flow hash collision
> avoidance and garbage collection of inactive flows. bpf_fifo uses a single
For hash collision, I think you meant >1 tcp_socks having the same hash in patch
14? This could probably be detected by adding the sk pointer value to the
bpf-map key. Not asking to change patch 14 though.
For garbage collection, I think patch 14 has it, but yes, it is iterating the bpf
map, so it is not as quick as doing gc while searching for the sk in the rbtree.
I think the only missing piece is being able to iterate the bpf_rb_root, i.e.
being able to directly search left and right of a bpf_rb_node.
> bpf_list as a queue instead of three queues for different priorities in
> pfifo_fast. The time complexity of fifo however should be similar since the
> queue selection time is negligible.
>
> Test setup:
>
> client -> qdisc -------------> server
> ~~~~~~~~~~~~~~~ ~~~~~~
> nested VM1 @ DC1 VM2 @ DC2
>
> Throughput: iperf3 -t 600, 5 times
>
> Qdisc Average (GBits/sec)
> ---------- -------------------
> pfifo_fast 12.52 ± 0.26
> bpf_fifo 11.72 ± 0.32
> fq 10.24 ± 0.13
> bpf_fq 11.92 ± 0.64
>
> Latency: sockperf pp --tcp -t 600, 5 times
>
> Qdisc Average (usec)
> ---------- --------------
> pfifo_fast 244.58 ± 7.93
> bpf_fifo 244.92 ± 15.22
> fq 234.30 ± 19.25
> bpf_fq 221.34 ± 10.76
>
> Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
> implementation is consistent with the previous observation (v8 throughput
> test on a loopback device). This could be mitigated by supporting adding
> an skb to a bpf_list or bpf_rbtree directly in the future.
>
> * Clean up skb in bpf qdisc during reset *
>
> The current implementation relies on bpf qdisc implementors to correctly
> release skbs in queues (bpf graphs or maps) in .reset, which might not be
> a safe thing to do. The solution, as Martin has suggested, would be
> supporting private data in struct_ops. This can also help simplify the
> implementation of qdiscs that work with mq. For example, qdiscs in the
> selftest mostly use global data. Therefore, even if users add multiple
> qdisc instances under mq, they would still share the same queue.
Although not as nice as priv_data, I think an mq setup with a dedicated queue
per qdisc instance can be done with bpf map-in-map.
For the cleanup part, it is similar to how a bpf kptr is cleaned up: either the
bpf program frees it, or the bpf infra will eventually clean it up during bpf
map destruction.
For priv_data, I think it could be a useful addition to bpf_struct_ops in
general, meaning it should also work for struct_ops other than Qdisc_ops. Then
all destruction and freeing could be done more automatically and seamlessly.
imo, the above improvements can be iterated later on top of the core pieces of
this set.
Thread overview: 34+ messages
2024-12-20 19:55 [PATCH bpf-next v2 00/14] bpf qdisc Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 01/14] bpf: Support getting referenced kptr from struct_ops argument Amery Hung
2025-01-23 9:57 ` Eduard Zingerman
2025-01-23 19:41 ` Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 02/14] selftests/bpf: Test referenced kptr arguments of struct_ops programs Amery Hung
2025-01-23 9:57 ` Eduard Zingerman
2025-01-24 0:04 ` Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 03/14] bpf: Allow struct_ops prog to return referenced kptr Amery Hung
2025-01-15 15:25 ` Ming Lei
2025-01-23 9:57 ` Eduard Zingerman
2025-01-23 18:19 ` Eduard Zingerman
2024-12-20 19:55 ` [PATCH bpf-next v2 04/14] selftests/bpf: Test returning referenced kptr from struct_ops programs Amery Hung
2025-01-23 9:58 ` Eduard Zingerman
2024-12-20 19:55 ` [PATCH bpf-next v2 05/14] bpf: net_sched: Support implementation of Qdisc_ops in bpf Amery Hung
2025-01-09 15:00 ` Amery Hung
2025-01-10 0:28 ` Martin KaFai Lau
2025-01-10 1:20 ` Jakub Kicinski
2024-12-20 19:55 ` [PATCH bpf-next v2 06/14] bpf: net_sched: Add basic bpf qdisc kfuncs Amery Hung
2025-01-10 0:24 ` Martin KaFai Lau
2025-01-10 18:00 ` Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 07/14] bpf: Search and add kfuncs in struct_ops prologue and epilogue Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 08/14] bpf: net_sched: Add a qdisc watchdog timer Amery Hung
2025-01-09 0:20 ` Martin KaFai Lau
2025-01-09 15:00 ` Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 09/14] bpf: net_sched: Support updating bstats Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 10/14] bpf: net_sched: Support updating qstats Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 11/14] bpf: net_sched: Allow writing to more Qdisc members Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 12/14] libbpf: Support creating and destroying qdisc Amery Hung
2024-12-20 19:55 ` [PATCH bpf-next v2 13/14] selftests: Add a basic fifo qdisc test Amery Hung
2025-01-10 0:05 ` Martin KaFai Lau
2024-12-20 19:55 ` [PATCH bpf-next v2 14/14] selftests: Add a bpf fq qdisc to selftest Amery Hung
2025-01-09 23:36 ` Martin KaFai Lau
2025-01-02 17:29 ` [PATCH bpf-next v2 00/14] bpf qdisc Toke Høiland-Jørgensen
2025-01-10 1:43 ` Martin KaFai Lau [this message]