All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: sdf@google.com, Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, razor@blackwall.org, ast@kernel.org,
	andrii@kernel.org, martin.lau@linux.dev,
	john.fastabend@gmail.com, joannelkoong@gmail.com,
	memxor@gmail.com, joe@cilium.io, netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs
Date: Wed, 05 Oct 2022 12:50:13 +0200	[thread overview]
Message-ID: <878rluily2.fsf@toke.dk> (raw)
In-Reply-To: <YzzWDqAmN5DRTupQ@google.com>

sdf@google.com writes:

>>   	struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
>> -		__u32		target_fd;	/* container object to attach to */
>> +		union {
>> +			__u32	target_fd;	/* container object to attach to */
>> +			__u32	target_ifindex; /* target ifindex */
>> +		};
>>   		__u32		attach_bpf_fd;	/* eBPF program to attach */
>>   		__u32		attach_type;
>>   		__u32		attach_flags;
>> -		__u32		replace_bpf_fd;	/* previously attached eBPF
>
> [..]
>
>> +		union {
>> +			__u32	attach_priority;
>> +			__u32	replace_bpf_fd;	/* previously attached eBPF
>>   						 * program to replace if
>>   						 * BPF_F_REPLACE is used
>>   						 */
>> +		};
>
> The series looks exciting, haven't had a chance to look deeply, will try
> to find some time this week.
>
> We've chatted briefly about priority during the talk, let's maybe discuss
> it here more?
>
> I, as a user, still really have no clue about what priority to use.
> We have this problem at tc, and we'll seemingly have the same problem
> here? I guess it's even more relevant in k8s because internally at G we
> can control the users.
>
> Is it worth at least trying to provide some default bands / guidance?
>
> For example, having SEC('tc/ingress') receive attach_priority=124 by
> default? Maybe we can even have something like 'tc/ingress_first' get
> attach_priority=1 and 'tc/ingress_last' with attach_priority=254?
> (the names are arbitrary, we can do something better)
>
> ingress_first/ingress_last can be used by some monitoring jobs. The rest
> can use default 124. If somebody really needs a custom priority, then they
> can manually use something around 124/2 if they need to trigger before the
> 'default' priority or 124+124/2 if they want to trigger after?
>
> Thoughts? Is it worth it? Do we care?

I think we should care :)

Having "better" defaults are probably a good idea (so not everything
just ends up at priority 1 by default). However, I think ultimately the
only robust solution is to make the priority override-able. Users are
going to want to combine BPF programs in ways that their authors didn't
anticipate, so the actual priority the programs run at should not be the
sole choice of the program author.

To use the example that Daniel presented at LPC: Running datadog and
cilium at the same time broke cilium because datadog took over the
prio-1 hook point. With the bpf_link API what would change is that (a)
it would be obvious that something breaks (that is good), and (b) it
would be datadog that breaks instead of cilium (because it can no longer
just take over the hook, it'll get an error instead). However, (b) means
that the user still hasn't gotten what they wanted: the ability to run
datadog and cilium at the same time. To do this, they will need to be
able to change the priorities of one or both applications.

I know cilium at least has a configuration option to change this
somewhere, but I don't think relying on every BPF-using application to
expose this (each in their own way) is a good solution. I think of
priorities more like daemon startup at boot: this is system policy,
decided by the equivalent of the init system (and in this analogy we are
currently at the 'rc.d' stage of init system design, with the hook
priorities).

One way to resolve this is to have a central daemon that implements the
policy and does all the program loading on behalf of the users. I think
multiple such daemons exist already in more or less public and/or
complete states. However, getting everyone to agree on one is also hard,
so maybe the kernel needs to expose a mechanism for doing the actual
overriding, and then whatever daemon people run can hook into that?

Not sure what that mechanism would be? A(nother) BPF hook for overriding
priority on load? An LSM hook that rewrites the system call? (can it
already do that?) Something else?

Oh, and also, in the case of TC there's also the additional issue that
execution only chains to the next program if the current one returns
TC_ACT_UNSPEC; this should probably also be overridable somehow, for the
same reasons...

-Toke


  reply	other threads:[~2022-10-05 10:50 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05  0:55   ` sdf
2022-10-05 10:50     ` Toke Høiland-Jørgensen [this message]
2022-10-05 14:48       ` Daniel Borkmann
2022-10-05 12:35     ` Daniel Borkmann
2022-10-05 17:56       ` sdf
2022-10-05 18:21         ` Daniel Borkmann
2022-10-05 10:33   ` Toke Høiland-Jørgensen
2022-10-05 12:47     ` Daniel Borkmann
2022-10-05 14:32       ` Toke Høiland-Jørgensen
2022-10-05 14:53         ` Daniel Borkmann
2022-10-05 19:04   ` Jamal Hadi Salim
2022-10-06 20:49     ` Daniel Borkmann
2022-10-07 15:36       ` Jamal Hadi Salim
2022-10-06  0:22   ` Andrii Nakryiko
2022-10-06  5:00   ` Alexei Starovoitov
2022-10-06 14:40     ` Jamal Hadi Salim
2022-10-06 23:29       ` Alexei Starovoitov
2022-10-07 15:43         ` Jamal Hadi Salim
2022-10-06 21:29     ` Daniel Borkmann
2022-10-06 23:28       ` Alexei Starovoitov
2022-10-07 13:26         ` Daniel Borkmann
2022-10-07 14:32           ` Toke Høiland-Jørgensen
2022-10-07 16:55             ` sdf
2022-10-07 17:20               ` Toke Høiland-Jørgensen
2022-10-07 18:11                 ` sdf
2022-10-07 19:06                   ` Daniel Borkmann
2022-10-07 18:59                 ` Alexei Starovoitov
2022-10-07 19:37                   ` Daniel Borkmann
2022-10-07 22:45                     ` sdf
2022-10-07 23:41                       ` Alexei Starovoitov
2022-10-07 23:34                     ` Alexei Starovoitov
2022-10-08 11:38                       ` Toke Høiland-Jørgensen
2022-10-08 20:38                         ` Alexei Starovoitov
2022-10-13 18:30                           ` Andrii Nakryiko
2022-10-14 15:38                             ` Alexei Starovoitov
2022-10-27  9:01                               ` Daniel Xu
2022-10-06 20:15   ` Martin KaFai Lau
2022-10-06 20:54   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 20:54     ` Daniel Borkmann
2022-10-06 17:56   ` Martin KaFai Lau
2022-10-06 20:10   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:14   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:24   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=878rluily2.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joannelkoong@gmail.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=sdf@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.