From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
netdev@vger.kernel.org
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Thu, 25 Jan 2024 18:22:31 -0800 [thread overview]
Message-ID: <01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev> (raw)
In-Reply-To: <CAMB2axM1TVw05jZsFe7TsKKRN8jw=YOwu-+rA9bOAkOiCPyFqQ@mail.gmail.com>
On 1/23/24 9:22 PM, Amery Hung wrote:
>> I looked at the high level of the patchset. The major ops that it wants to be
>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
>> patch 4 and patch 5).
>>
>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
>> uapi. It is no long an acceptable way to add new bpf extension.
>>
>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
>> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
>>
> Partially. If using struct_ops, I think we'll need another structure
> like the following in bpf qdisc to be implemented with struct_ops bpf:
>
> struct bpf_qdisc_ops {
> int (*enqueue) (struct sk_buff *skb)
> void (*dequeue) (void)
> void (*init) (void)
> void (*reset) (void)
> };
>
> Then, Qdisc_ops will wrap around them to handle things that cannot be
> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
We can see how those limitations (calling sch_tree_lock() and returning a ptr)
can be addressed in bpf. This will also help other similar use cases.
Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you
see that blocks directly implementing the enqueue/dequeue/init/reset in the
struct Qdisc_ops?
Have you thought above ".priv_size"? It is now fixed to sizeof(struct
bpf_sched_data). It should be useful to allow the bpf prog to store its own data
there?
>
>> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
>> programmable in bpf, it can stay with the kernel implementation for now and only
>> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
>> implemented.
>>
>> You mentioned in the cover letter that:
>> "Current struct_ops attachment model does not seem to support replacing only
>> functions of a specific instance of a module, but I might be wrong."
>>
>> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
>> it can be done through the struct_ops's ".init_member". Take a look at
>> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
>> ".dump" (for example) when loading the bpf Qdisc_ops.
>>
> I have no problem with partially replacing a struct, which like you
> mentioned has been demonstrated by congestion control or sched_ext.
> What I am not sure about is the ability to create multiple bpf qdiscs,
> where each has different ".enqueue", ".dequeue", and so on. I like the
> struct_ops approach and would love to try it if struct_ops support
> this.
The need for allowing different ".enqueue/.dequeue/..." bpf
(BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because
there is only one ".id == bpf" Qdisc_ops native kernel implementation which is
then because of the limitation you mentioned above?
Am I understanding your reason correctly on why it requires to load different
bpf prog for different qdisc instances?
If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly
implemented in bpf prog itself, it can just load another bpf struct_ops which
has a different ".enqueue/.dequeue/..." implementation:
#> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
#> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
#> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o
From reading the test bpf prog, I think the set is on a good footing. Instead
of working around the limitation by wrapping the bpf prog in a predefined
"struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in
bpf and see how we could address them.
next prev parent reply other threads:[~2024-01-26 2:22 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-23 23:51 ` Martin KaFai Lau
2024-01-24 5:22 ` Amery Hung
2024-01-26 2:22 ` Martin KaFai Lau [this message]
2024-01-27 1:17 ` Amery Hung
2024-01-30 6:39 ` Martin KaFai Lau
2024-01-30 17:49 ` Kui-Feng Lee
2024-01-31 1:01 ` Martin KaFai Lau
2024-01-31 16:49 ` Kui-Feng Lee
2024-01-31 16:59 ` Amery Hung
2024-01-31 16:23 ` Amery Hung
2024-02-02 1:47 ` Martin KaFai Lau
2024-02-09 20:14 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23 0:17 ` Andrii Nakryiko
2024-01-23 19:40 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29 ` Daniel Borkmann
2024-01-26 19:49 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10 ` Daniel Borkmann
2024-01-24 12:09 ` Jamal Hadi Salim
2024-01-24 13:07 ` Daniel Borkmann
2024-01-24 14:11 ` Jamal Hadi Salim
2024-01-24 15:26 ` Daniel Borkmann
2024-01-24 21:26 ` Amery Hung
2024-01-25 11:57 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev \
--to=martin.lau@linux.dev \
--cc=ameryhung@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=toke@redhat.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yangpeihao@sjtu.edu.cn \
--cc=yepeilin.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).