From: sdf@google.com
To: Cong Wang <xiyou.wangcong@gmail.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
"Cong Wang" <cong.wang@bytedance.com>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
"Jamal Hadi Salim" <jhs@mojatatu.com>,
"Jiri Pirko" <jiri@resnulli.us>
Subject: Re: [RFC Patch v5 0/5] net_sched: introduce eBPF based Qdisc
Date: Fri, 24 Jun 2022 13:51:40 -0700 [thread overview]
Message-ID: <YrYj3LPaHV7thgJW@google.com> (raw)
In-Reply-To: <20220602041028.95124-1-xiyou.wangcong@gmail.com>
On 06/01, Cong Wang wrote:
> From: Cong Wang <cong.wang@bytedance.com>
> This *incomplete* patchset introduces a programmable Qdisc with eBPF.
> There are a few use cases:
> 1. Allow customizing Qdiscs in an easier way, so that people don't
> have to write a complete Qdisc kernel module just to experiment
> with new queueing theory.
> 2. Solve EDT's problem. EDT calculates the "tokens" in clsact, which
> runs before enqueue, so it is impossible to adjust those "tokens"
> after packets get dropped at enqueue. With an eBPF Qdisc, this is
> easily solved with a map shared between clsact and sch_bpf.
> 3. Replace qevents, as now the user gains much more control over the
> skb and queues.
> 4. Provide a new way to reuse TC filters. Currently TC relies on filter
> chains and blocks to reuse TC filters, but they are too complicated
> to understand. With the eBPF helper bpf_skb_tc_classify(), we can
> invoke TC filters on _any_ Qdisc (even on a different netdev) to do
> the classification.
> 5. Potentially pave a way for ingress to queue packets, although
> current implementation is still only for egress.
> 6. Possibly pave a way for handling the TCP protocol in TC, as the
> rbtree itself is already used by TCP to handle retransmissions.
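To make point 2 concrete, here is a minimal sketch of how a token
budget could be shared between a clsact EDT program and the sch_bpf
enqueue program. The map layout, the "sch_bpf/enqueue" section name
and the enqueue return values are assumptions for illustration, not
taken from the patches:

    /* Sketch: refund EDT "tokens" when the eBPF Qdisc drops a packet. */
    #include <linux/bpf.h>
    #include <linux/pkt_cls.h>
    #include <bpf/bpf_helpers.h>

    struct {
            __uint(type, BPF_MAP_TYPE_ARRAY);
            __uint(max_entries, 1);
            __type(key, __u32);
            __type(value, __u64);          /* remaining budget in bytes */
    } edt_tokens SEC(".maps");

    SEC("tc")                              /* clsact egress */
    int edt_charge(struct __sk_buff *skb)
    {
            __u32 key = 0;
            __u64 *budget = bpf_map_lookup_elem(&edt_tokens, &key);

            if (budget)
                    __sync_fetch_and_sub(budget, (__u64)skb->len);
            /* ... compute and set skb->tstamp for EDT pacing here ... */
            return TC_ACT_OK;
    }

    SEC("sch_bpf/enqueue")                 /* section name assumed */
    int enqueue(struct __sk_buff *skb)
    {
            __u32 key = 0;
            __u64 *budget = bpf_map_lookup_elem(&edt_tokens, &key);
            int over_limit = 0;            /* real queue-limit check elided */

            if (over_limit) {
                    /* Give back what clsact already charged for this skb. */
                    if (budget)
                            __sync_fetch_and_add(budget, (__u64)skb->len);
                    return 0;              /* "drop" verdict, placeholder */
            }
            return 1;                      /* "queued" verdict, placeholder */
    }

    char _license[] SEC("license") = "GPL";

The only piece the two programs have to agree on is the shared map;
the refund on drop is exactly what clsact alone cannot do today.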
> The goal here is to make this Qdisc as programmable as possible,
> that is, to replace as many existing Qdiscs as we can, whether in
> tree or out of tree. This is why I gave up on PIFO, which has
> serious limitations on programmability.
> Here is a summary of design decisions I made:
> 1. Avoid eBPF struct_ops, as it would be really hard to program
> a Qdisc with this approach: literally all of struct Qdisc_ops
> and struct Qdisc_class_ops would need to be implemented, which is
> almost as hard as writing a Qdisc kernel module.
> 2. Introduce an skb map, which will allow other eBPF programs to store
> skbs too.
> a) As eBPF maps are not directly visible to the kernel, we have to
> dump the stats via the eBPF map APIs instead of netlink.
> b) User space is not allowed to read entire packets; only __sk_buff
> itself is readable, because we don't have such a use case yet and it
> would require a different API to read the data, as map values have a
> fixed length.
> c) Two eBPF helpers are introduced for skb map operations:
> bpf_skb_map_push() and bpf_skb_map_pop(). Normal map update is
> not allowed.
> d) Multi-queue support is implemented via map-in-map, in a similar
> push/pop fashion.
> e) Use the netdevice notifier to reset the packets inside the skb map
> upon a NETDEV_DOWN event.
> 3. Integrate with existing TC infra. For example, if the user doesn't
> want to implement her own filters (e.g. a flow dissector), she should
> be able to re-use the existing TC filters. Another helper,
> bpf_skb_tc_classify(), is introduced for this purpose.
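Points 2c, 2d and 3 together suggest an enqueue program along the
lines of the sketch below, which classifies with existing TC filters
and then queues the skb by priority. The helper prototypes and IDs,
the map type name and the section name are all assumed here; the real
definitions live in the patches themselves:

    /* Sketch: reuse existing TC filters for classification, then queue
     * the skb into an skb map keyed by priority (smallest key first).
     */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Prototypes and helper IDs assumed for illustration only. */
    static long (*bpf_skb_map_push)(void *map, struct __sk_buff *skb,
                                    __u64 key) = (void *)210;
    static long (*bpf_skb_tc_classify)(struct __sk_buff *skb, int ifindex,
                                       __u32 handle) = (void *)211;

    struct {
            __uint(type, BPF_MAP_TYPE_SKB_MAP);   /* type name assumed */
            __uint(max_entries, 1024);
            __type(key, __u64);
            __type(value, __u32);                 /* placeholder value type */
    } pq SEC(".maps");

    SEC("sch_bpf/enqueue")                        /* section name assumed */
    int enqueue(struct __sk_buff *skb)
    {
            /* Ask the filters already attached to this device for a
             * class id and use it as the priority key.
             */
            __u64 prio = bpf_skb_tc_classify(skb, skb->ifindex, 0);

            if (bpf_skb_map_push(&pq, skb, prio) < 0)
                    return 0;                     /* "drop", placeholder */
            return 1;                             /* "queued", placeholder */
    }

    char _license[] SEC("license") = "GPL";

A dequeue program would then pop the smallest key with
bpf_skb_map_pop(); with map-in-map (2d) the same pattern extends to
one inner skb map per band or per hardware queue.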
> Any high-level feedback is welcome. Please kindly do not review any
> coding details until the RFC tag is removed.
> TODO:
> 1. actually test it
Can you try to implement some existing qdisc using your new mechanism?
For BPF-CC, Martin showcased how dctcp/cubic can be reimplemented;
I feel like this patch series (even as an RFC) should also have a good
example to show that a BPF qdisc is on par and can be used to at least
implement existing policies. fq/fq_codel/cake are good candidates.
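For instance, even a trivial pfifo equivalent would already exercise
the enqueue hook, the skb map and tail-drop. With the same caveats as
above (assumed map type, helper prototype, section name and return
values), it could look roughly like this:

    /* Sketch: pfifo-like behaviour via a monotonically increasing key. */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* Prototype and helper ID assumed for illustration only. */
    static long (*bpf_skb_map_push)(void *map, struct __sk_buff *skb,
                                    __u64 key) = (void *)210;

    struct {
            __uint(type, BPF_MAP_TYPE_SKB_MAP);   /* type name assumed */
            __uint(max_entries, 1000);            /* pfifo "limit" */
            __type(key, __u64);
            __type(value, __u32);                 /* placeholder value type */
    } fifo SEC(".maps");

    SEC("sch_bpf/enqueue")                        /* section name assumed */
    int pfifo_enqueue(struct __sk_buff *skb)
    {
            static __u64 seq;                     /* rising keys => FIFO order */
            __u64 key = __sync_fetch_and_add(&seq, 1);

            /* A failed push when the map is full gives tail-drop for free. */
            if (bpf_skb_map_push(&fifo, skb, key) < 0)
                    return 0;                     /* "drop", placeholder */
            return 1;                             /* "queued", placeholder */
    }

    char _license[] SEC("license") = "GPL";

The dequeue side would just pop the smallest key. If that much works
end to end, fq/fq_codel/cake become a matter of choosing keys and
per-flow state, which would make the comparison with existing
policies much easier to evaluate.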
Thread overview: 11+ messages
2022-06-02 4:10 [RFC Patch v5 0/5] net_sched: introduce eBPF based Qdisc Cong Wang
2022-06-02 4:10 ` [RFC Patch v5 1/5] net: introduce skb_rbtree_walk_safe() Cong Wang
2022-06-02 4:10 ` [RFC Patch v5 2/5] bpf: move map in map declarations to bpf.h Cong Wang
2022-06-02 4:10 ` [RFC Patch v5 3/5] bpf: introduce skb map and flow map Cong Wang
2022-06-02 4:10 ` [RFC Patch v5 4/5] net_sched: introduce eBPF based Qdisc Cong Wang
2022-06-03 23:14 ` Cong Wang
2022-06-02 4:10 ` [RFC Patch v5 5/5] net_sched: introduce helper bpf_skb_tc_classify() Cong Wang
2022-06-24 16:52 ` [RFC Patch v5 0/5] net_sched: introduce eBPF based Qdisc Toke Høiland-Jørgensen
2022-06-24 17:35 ` Dave Taht
2022-06-24 20:51 ` sdf [this message]