Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API
Date: Wed, 28 Apr 2021 00:36:25 +0200	[thread overview]
Message-ID: <87sg3b8idy.fsf@toke.dk> (raw)
In-Reply-To: <7a75062e-b439-68b3-afa3-44ea519624c7@iogearbox.net>

Daniel Borkmann <daniel@iogearbox.net> writes:

> On 4/27/21 11:55 PM, Daniel Borkmann wrote:
>> On 4/27/21 8:02 PM, Kumar Kartikeya Dwivedi wrote:
>>> On Tue, Apr 27, 2021 at 08:34:30PM IST, Daniel Borkmann wrote:
>>>> On 4/23/21 5:05 PM, Kumar Kartikeya Dwivedi wrote:
>> [...]
>>>>> +/*
>>>>> + * @ctx: Can be NULL, if not, must point to a valid object.
>>>>> + *     If the qdisc was attached during ctx_init, it will be deleted if no
>>>>> + *     filters are attached to it.
>>>>> + *     When ctx == NULL, this is a no-op.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_ctx_destroy(struct bpf_tc_ctx *ctx);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @fd: Must be >= 0.
>>>>> + * @opts: Cannot be NULL, prog_id must be unset, all other fields can be
>>>>> + *      optionally set. All fields except replace  will be set as per created
>>>>> + *        filter's attributes. parent must only be set when attach_point of ctx is
>>>>> + *        BPF_TC_CUSTOM_PARENT, otherwise parent must be unset.
>>>>> + *
>>>>> + * Fills the following fields in opts:
>>>>> + *    handle
>>>>> + *    parent
>>>>> + *    priority
>>>>> + *    prog_id
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_attach(struct bpf_tc_ctx *ctx, int fd,
>>>>> +                 struct bpf_tc_opts *opts);
>>>>> +/*
>>>>> + * @ctx: Cannot be NULL.
>>>>> + * @opts: Cannot be NULL, replace and prog_id must be unset, all other fields
>>>>> + *      must be set.
>>>>> + */
>>>>> +LIBBPF_API int bpf_tc_detach(struct bpf_tc_ctx *ctx,
>>>>> +                 const struct bpf_tc_opts *opts);
>>>>
>>>> One thing that I find a bit odd from this API is that BPF_TC_INGRESS / BPF_TC_EGRESS
>>>> needs to be set each time via bpf_tc_ctx_init(). So whenever a specific program would
>>>> be attached to both we need to 're-init' in between just to change from hook a to b,
>>>> whereas when you have BPF_TC_CUSTOM_PARENT, you could just use a different opts->parent
>>>> without going this detour (unless the clsact wasn't loaded there in the first place).
>>>
>>> Currently I check that opts->parent is unset when BPF_TC_INGRESS or BPF_TC_EGRESS
>>> is set as attach point. But since both map to clsact, we could allow the user to
>>> specify opts->parent as BPF_TC_INGRESS or BPF_TC_EGRESS (no need to use
>>> TC_H_MAKE, we can detect it from ctx->parent that it won't be a parent id). This
>>> would mean that by default attach point is what you set for ctx, but for
>>> bpf_tc_attach you can temporarily override to be some other attach point (for
>>> the same qdisc). You still won't be able to set anything other than the two
>>> though.
>> 
>> I think the assumption on auto-detecting the parent id in that case might not hold given
>> major number could very well be 0. Wrt BPF_TC_UNSPEC ... maybe it's not even needed, back
>> to drawing board ...
>> 
>> Here's how the whole API could look like, usage examples below:
>> 
>>    enum bpf_tc_attach_point {
>>      BPF_TC_INGRESS = 1 << 0,
>>      BPF_TC_EGRESS  = 1 << 1,
>>      BPF_TC_CUSTOM  = 1 << 2,
>>    };
>> 
>>    enum bpf_tc_attach_flags {
>>      BPF_TC_F_REPLACE = 1 << 0,
>>    };
>> 
>>    struct bpf_tc_hook {
>>      size_t sz;
>>      int    ifindex;
>>      enum bpf_tc_attach_point which;
>>      __u32  parent;
>>      size_t :0;
>>    };
>> 
>>    struct bpf_tc_opts {
>>      size_t sz;
>>      __u32  handle;
>>      __u16  priority;
>>      union {
>>          int   prog_fd;
>>          __u32 prog_id;
>>      };
>>      size_t :0;
>>    };
>> 
>>    LIBBPF_API int bpf_tc_hook_create(struct bpf_tc_hook *hook);
>>    LIBBPF_API int bpf_tc_hook_destroy(struct bpf_tc_hook *hook);
>> 
>>    LIBBPF_API int bpf_tc_attach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts, int flags);
>>    LIBBPF_API int bpf_tc_detach(const struct bpf_tc_hook *hook, const struct bpf_tc_opts *opts);
>>    LIBBPF_API int bpf_tc_query(const struct bpf_tc_hook *hook, struct bpf_tc_opts *opts);
>> 
>> So a user could do just:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1, .prog_fd = fd);
>> 
>>    err = bpf_tc_attach(&hook, &opts, BPF_TC_F_REPLACE);
>>    [...]
>> 
>> If it's not known whether the hook exists, then a preceding call to:
>> 
>>    err = bpf_tc_hook_create(&hook);
>>    [...]
>> 
>> The bpf_tc_query() would look like:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_EGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>> 
>>    err = bpf_tc_query(&hook, &opts);
>>    if (!err) {
>>           [...]  // gives access to: opts.prog_id
>>    }
>> 
>> The bpf_tc_detach():
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS);
>>    DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .handle = 1, .priority = 1);
>> 
>>    err = bpf_tc_detach(&hook, &opts);
>>    [...]
>> 
>> The nice thing would be that hook and opts are kept semantically separate, meaning with
>> hook you can iterate though a bunch of devs and ingress/egress locations without changing
>> opts, whereas with opts you could iterate on the cls_bpf instance itself w/o changing
>> hook. Both are kept extensible via DECLARE_LIBBPF_OPTS().
>> 
>> Now the bpf_tc_hook_destroy() one:
>> 
>>    DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>> 
>>    err = bpf_tc_hook_destroy(&hook);
>>    [...]
>> 
>> For triggering a remove of the clsact qdisc on the device, both directions are passed in.
>> Combining both is only ever allowed for bpf_tc_hook_destroy().
>
> Small addendum:
>
>      DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = 42, .which = BPF_TC_INGRESS|BPF_TC_EGRESS);
>
>      err = bpf_tc_hook_create(&hook);
>      [...]
>
> ... is also possible, of course, and then both bpf_tc_hook_{create,destroy}() are symmetric.

It should be allowed, but it wouldn't actually make any difference which
combination of TC_INGRESS and TC_EGRESS you specify, as long as one of
them is set, right? I.e., we just attach the clsact qdisc in both
cases...

-Toke

next prev parent reply	other threads:[~2021-04-27 22:36 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-23 15:05 [PATCH bpf-next v4 0/3] Add TC-BPF API Kumar Kartikeya Dwivedi
2021-04-23 15:05 ` [PATCH bpf-next v4 1/3] libbpf: add helpers for preparing netlink attributes Kumar Kartikeya Dwivedi
2021-04-23 15:05 ` [PATCH bpf-next v4 2/3] libbpf: add low level TC-BPF API Kumar Kartikeya Dwivedi
2021-04-27 15:04   ` Daniel Borkmann
2021-04-27 18:00     ` Toke Høiland-Jørgensen
2021-04-27 18:02     ` Kumar Kartikeya Dwivedi
2021-04-27 18:15       ` Toke Høiland-Jørgensen
2021-04-27 21:55       ` Daniel Borkmann
2021-04-27 22:05         ` Daniel Borkmann
2021-04-27 22:32           ` Daniel Borkmann
2021-04-27 22:36           ` Toke Høiland-Jørgensen [this message]
2021-04-27 22:40             ` Daniel Borkmann
2021-04-27 22:51               ` Toke Høiland-Jørgensen
2021-04-27 23:14                 ` Daniel Borkmann
2021-04-27 23:19                   ` Kumar Kartikeya Dwivedi
2021-04-27 20:36     ` Andrii Nakryiko
2021-04-23 15:06 ` [PATCH bpf-next v4 3/3] libbpf: add selftests for " Kumar Kartikeya Dwivedi
2021-04-27 21:50   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87sg3b8idy.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.