All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: bpf@vger.kernel.org, yangpeihao@sjtu.edu.cn, toke@redhat.com,
	jhs@mojatatu.com, jiri@resnulli.us, sdf@google.com,
	xiyou.wangcong@gmail.com, yepeilin.cs@gmail.com,
	netdev@vger.kernel.org
Subject: Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc
Date: Thu, 25 Jan 2024 18:22:31 -0800	[thread overview]
Message-ID: <01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev> (raw)
In-Reply-To: <CAMB2axM1TVw05jZsFe7TsKKRN8jw=YOwu-+rA9bOAkOiCPyFqQ@mail.gmail.com>

On 1/23/24 9:22 PM, Amery Hung wrote:
>> I looked at the high level of the patchset. The major ops that it wants to be
>> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
>> patch 4 and patch 5).
>>
>> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
>> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
>> uapi. It is no long an acceptable way to add new bpf extension.
>>
>> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
>> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
>> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
>>
> Partially. If using struct_ops, I think we'll need another structure
> like the following in bpf qdisc to be implemented with struct_ops bpf:
> 
> struct bpf_qdisc_ops {
>      int (*enqueue) (struct sk_buff *skb)
>      void (*dequeue) (void)
>      void (*init) (void)
>      void (*reset) (void)
> };
> 
> Then, Qdisc_ops will wrap around them to handle things that cannot be
> implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).

We can see how those limitations (calling sch_tree_lock() and returning a ptr) 
can be addressed in bpf. This will also help other similar use cases.

Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you 
see that blocks directly implementing the enqueue/dequeue/init/reset in the 
struct Qdisc_ops?

Have you thought above ".priv_size"? It is now fixed to sizeof(struct 
bpf_sched_data). It should be useful to allow the bpf prog to store its own data 
there?

> 
>> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
>> programmable in bpf, it can stay with the kernel implementation for now and only
>> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
>> implemented.
>>
>> You mentioned in the cover letter that:
>> "Current struct_ops attachment model does not seem to support replacing only
>> functions of a specific instance of a module, but I might be wrong."
>>
>> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
>> it can be done through the struct_ops's ".init_member". Take a look at
>> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
>> ".dump" (for example) when loading the bpf Qdisc_ops.
>>
> I have no problem with partially replacing a struct, which like you
> mentioned has been demonstrated by congestion control or sched_ext.
> What I am not sure about is the ability to create multiple bpf qdiscs,
> where each has different ".enqueue", ".dequeue", and so on. I like the
> struct_ops approach and would love to try it if struct_ops support
> this.

The need for allowing different ".enqueue/.dequeue/..." bpf 
(BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because 
there is only one ".id == bpf" Qdisc_ops native kernel implementation which is 
then because of the limitation you mentioned above?

Am I understanding your reason correctly on why it requires to load different 
bpf prog for different qdisc instances?

If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly 
implemented in bpf prog itself, it can just load another bpf struct_ops which 
has a different ".enqueue/.dequeue/..." implementation:

#> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
#> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
#> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o

 From reading the test bpf prog, I think the set is on a good footing. Instead 
of working around the limitation by wrapping the bpf prog in a predefined 
"struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in 
bpf and see how we could address them.



  reply	other threads:[~2024-01-26  2:22 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-17 21:56 [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 1/8] " Amery Hung
2024-01-20 11:41   ` kernel test robot
2024-01-20 12:13   ` kernel test robot
2024-01-20 13:47   ` kernel test robot
2024-01-23 23:51   ` Martin KaFai Lau
2024-01-24  5:22     ` Amery Hung
2024-01-26  2:22       ` Martin KaFai Lau [this message]
2024-01-27  1:17         ` Amery Hung
2024-01-30  6:39           ` Martin KaFai Lau
2024-01-30 17:49             ` Kui-Feng Lee
2024-01-31  1:01               ` Martin KaFai Lau
2024-01-31 16:49                 ` Kui-Feng Lee
2024-01-31 16:59                   ` Amery Hung
2024-01-31 16:23             ` Amery Hung
2024-02-02  1:47               ` Martin KaFai Lau
2024-02-09 20:14                 ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 2/8] net_sched: Add kfuncs for working with skb Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 3/8] net_sched: Introduce kfunc bpf_skb_tc_classify() Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 4/8] net_sched: Add reset program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 5/8] net_sched: Add init program Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 6/8] tools/libbpf: Add support for BPF_PROG_TYPE_QDISC Amery Hung
2024-01-23  0:17   ` Andrii Nakryiko
2024-01-23 19:40     ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 7/8] samples/bpf: Add an example of bpf fq qdisc Amery Hung
2024-01-24 10:29   ` Daniel Borkmann
2024-01-26 19:49     ` Amery Hung
2024-01-17 21:56 ` [RFC PATCH v7 8/8] samples/bpf: Add an example of bpf netem qdisc Amery Hung
2024-01-23 21:13 ` [RFC PATCH v7 0/8] net_sched: Introduce eBPF based Qdisc Stanislav Fomichev
2024-01-24 10:10   ` Daniel Borkmann
2024-01-24 12:09   ` Jamal Hadi Salim
2024-01-24 13:07     ` Daniel Borkmann
2024-01-24 14:11       ` Jamal Hadi Salim
2024-01-24 15:26         ` Daniel Borkmann
2024-01-24 21:26           ` Amery Hung
2024-01-25 11:57             ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01fdb720-c0dc-495d-a42d-756aa2bf4455@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=ameryhung@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@google.com \
    --cc=toke@redhat.com \
    --cc=xiyou.wangcong@gmail.com \
    --cc=yangpeihao@sjtu.edu.cn \
    --cc=yepeilin.cs@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.