From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
netdev@vger.kernel.org
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Thu, 25 Jan 2024 18:22:31 -0800 [thread overview]
Message-ID: <01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev> (raw)
In-Reply-To: <CAMB2axM1TVw05jZsFe7TsKKRN8jw=YOwu-+rA9bOAkOiCPyFqQ@mail.gmail.com>
On 1/23/24 9:22 PM, Amery Hung wrote:
>> I looked at the high level of the patchset. The major ops that it wants to be
>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
>> patch 4 and patch 5).
>>
>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
>> uapi. It is no long an acceptable way to add new bpf extension.
>>
>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
>> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
>>
> Partially. If using struct_ops, I think we'll need another structure
> like the following in bpf qdisc to be implemented with struct_ops bpf:
>
> struct bpf_qdisc_ops {
> int (*enqueue) (struct sk_buff *skb)
> void (*dequeue) (void)
> void (*init) (void)
> void (*reset) (void)
> };
>
> Then, Qdisc_ops will wrap around them to handle things that cannot be
> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
We can see how those limitations (calling sch_tree_lock() and returning a ptr)
can be addressed in bpf. This will also help other similar use cases.
Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you
see that blocks directly implementing the enqueue/dequeue/init/reset in the
struct Qdisc_ops?
Have you thought above ".priv_size"? It is now fixed to sizeof(struct
bpf_sched_data). It should be useful to allow the bpf prog to store its own data
there?
>
>> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
>> programmable in bpf, it can stay with the kernel implementation for now and only
>> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
>> implemented.
>>
>> You mentioned in the cover letter that:
>> "Current struct_ops attachment model does not seem to support replacing only
>> functions of a specific instance of a module, but I might be wrong."
>>
>> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
>> it can be done through the struct_ops's ".init_member". Take a look at
>> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
>> ".dump" (for example) when loading the bpf Qdisc_ops.
>>
> I have no problem with partially replacing a struct, which like you
> mentioned has been demonstrated by congestion control or sched_ext.
> What I am not sure about is the ability to create multiple bpf qdiscs,
> where each has different ".enqueue", ".dequeue", and so on. I like the
> struct_ops approach and would love to try it if struct_ops support
> this.
The need for allowing different ".enqueue/.dequeue/..." bpf
(BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because
there is only one ".id == bpf" Qdisc_ops native kernel implementation which is
then because of the limitation you mentioned above?
Am I understanding your reason correctly on why it requires to load different
bpf prog for different qdisc instances?
If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly
implemented in bpf prog itself, it can just load another bpf struct_ops which
has a different ".enqueue/.dequeue/..." implementation:
#> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
#> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
#> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o
From reading the test bpf prog, I think the set is on a good footing. Instead
of working around the limitation by wrapping the bpf prog in a predefined
"struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in
bpf and see how we could address them.
next prev parent reply other threads:[~2024-01-26 2:22 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-20 11:41 ` kernel test robot
2024-01-20 12:13 ` kernel test robot
2024-01-20 13:47 ` kernel test robot
2024-01-23 23:51 ` Martin KaFai Lau
2024-01-24 5:22 ` Amery Hung
2024-01-26 2:22 ` Martin KaFai Lau [this message]
2024-01-27 1:17 ` Amery Hung
2024-01-30 6:39 ` Martin KaFai Lau
2024-01-30 17:49 ` Kui-Feng Lee
2024-01-31 1:01 ` Martin KaFai Lau
2024-01-31 16:49 ` Kui-Feng Lee
2024-01-31 16:59 ` Amery Hung
2024-01-31 16:23 ` Amery Hung
2024-02-02 1:47 ` Martin KaFai Lau
2024-02-09 20:14 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23 0:17 ` Andrii Nakryiko
2024-01-23 19:40 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29 ` Daniel Borkmann
2024-01-26 19:49 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10 ` Daniel Borkmann
2024-01-24 12:09 ` Jamal Hadi Salim
2024-01-24 13:07 ` Daniel Borkmann
2024-01-24 14:11 ` Jamal Hadi Salim
2024-01-24 15:26 ` Daniel Borkmann
2024-01-24 21:26 ` Amery Hung
2024-01-25 11:57 ` Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev \
--to=martin.lau@linux.dev \
--cc=ameryhung@gmail.com \
--cc=bpf@vger.kernel.org \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=sdf@google.com \
--cc=toke@redhat.com \
--cc=xiyou.wangcong@gmail.com \
--cc=yangpeihao@sjtu.edu.cn \
--cc=yepeilin.cs@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.