netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
	jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
	xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
	netdev@vger.kernel.org
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Thu, 25 Jan 2024 18:22:31 -0800	[thread overview]
Message-ID: <01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev> (raw)
In-Reply-To: <CAMB2axM1TVw05jZsFe7TsKKRN8jw=YOwu-+rA9bOAkOiCPyFqQ@mail.gmail.com>

On 1/23/24 9:22 PM, Amery Hung wrote:
>> I looked at the high level of the patchset. The major ops that it wants to be
>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
>> patch 4 and patch 5).
>>
>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
>> uapi. It is no long an acceptable way to add new bpf extension.
>>
>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
>> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
>>
> Partially. If using struct_ops, I think we'll need another structure
> like the following in bpf qdisc to be implemented with struct_ops bpf:
> 
> struct bpf_qdisc_ops {
>      int (*enqueue) (struct sk_buff *skb)
>      void (*dequeue) (void)
>      void (*init) (void)
>      void (*reset) (void)
> };
> 
> Then, Qdisc_ops will wrap around them to handle things that cannot be
> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).

We can see how those limitations (calling sch_tree_lock() and returning a ptr) 
can be addressed in bpf. This will also help other similar use cases.

Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you 
see that blocks directly implementing the enqueue/dequeue/init/reset in the 
struct Qdisc_ops?

Have you thought above ".priv_size"? It is now fixed to sizeof(struct 
bpf_sched_data). It should be useful to allow the bpf prog to store its own data 
there?

> 
>> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
>> programmable in bpf, it can stay with the kernel implementation for now and only
>> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
>> implemented.
>>
>> You mentioned in the cover letter that:
>> "Current struct_ops attachment model does not seem to support replacing only
>> functions of a specific instance of a module, but I might be wrong."
>>
>> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
>> it can be done through the struct_ops's ".init_member". Take a look at
>> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
>> ".dump" (for example) when loading the bpf Qdisc_ops.
>>
> I have no problem with partially replacing a struct, which like you
> mentioned has been demonstrated by congestion control or sched_ext.
> What I am not sure about is the ability to create multiple bpf qdiscs,
> where each has different ".enqueue", ".dequeue", and so on. I like the
> struct_ops approach and would love to try it if struct_ops support
> this.

The need for allowing different ".enqueue/.dequeue/..." bpf 
(BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because 
there is only one ".id == bpf" Qdisc_ops native kernel implementation which is 
then because of the limitation you mentioned above?

Am I understanding your reason correctly on why it requires to load different 
bpf prog for different qdisc instances?

If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly 
implemented in bpf prog itself, it can just load another bpf struct_ops which 
has a different ".enqueue/.dequeue/..." implementation:

#> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
#> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
#> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o

 From reading the test bpf prog, I think the set is on a good footing. Instead 
of working around the limitation by wrapping the bpf prog in a predefined 
"struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in 
bpf and see how we could address them.



  reply	other threads:[~2024-01-26  2:22 UTC|newest]

Thread overview: 33+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-23 23:51   ` Martin KaFai Lau
2024-01-24  5:22     ` Amery Hung
2024-01-26  2:22       ` Martin KaFai Lau [this message]
2024-01-27  1:17         ` Amery Hung
2024-01-30  6:39           ` Martin KaFai Lau
2024-01-30 17:49             ` Kui-Feng Lee
2024-01-31  1:01               ` Martin KaFai Lau
2024-01-31 16:49                 ` Kui-Feng Lee
2024-01-31 16:59                   ` Amery Hung
2024-01-31 16:23             ` Amery Hung
2024-02-02  1:47               ` Martin KaFai Lau
2024-02-09 20:14                 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23  0:17   ` Andrii Nakryiko
2024-01-23 19:40     ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29   ` Daniel Borkmann
2024-01-26 19:49     ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10   ` Daniel Borkmann
2024-01-24 12:09   ` Jamal Hadi Salim
2024-01-24 13:07     ` Daniel Borkmann
2024-01-24 14:11       ` Jamal Hadi Salim
2024-01-24 15:26         ` Daniel Borkmann
2024-01-24 21:26           ` Amery Hung
2024-01-25 11:57             ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=ameryhung@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=toke@redhat.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yangpeihao@sjtu.edu.cn \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).