netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: sdf@google.com
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf <bpf@vger.kernel.org>,
	"Nikolay Aleksandrov" <razor@blackwall.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Joanne Koong" <joannelkoong@gmail.com>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Joe Stringer" <joe@cilium.io>,
	"Network Development" <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs
Date: Fri, 7 Oct 2022 15:45:21 -0700	[thread overview]
Message-ID: <Y0CsATkd2qK1Mu2Z@google.com> (raw)
In-Reply-To: <8cc9811e-6efe-3aa5-b201-abbd4b10ceb4@iogearbox.net>

On 10/07, Daniel Borkmann wrote:
> On 10/7/22 8:59 PM, Alexei Starovoitov wrote:
> > On Fri, Oct 7, 2022 at 10:20 AM Toke H�iland-J�rgensen  
> <toke@redhat.com> wrote:
> [...]
> > > > > I was thinking a little about how this might work; i.e., how can  
> the
> > > > > kernel expose the required knobs to allow a system policy to be
> > > > > implemented without program loading having to talk to anything  
> other
> > > > > than the syscall API?
> > > >
> > > > > How about we only expose prepend/append in the prog attach UAPI,  
> and
> > > > > then have a kernel function that does the sorting like:
> > > >
> > > > > int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t  
> num_progs, struct
> > > > > bpf_prog *new_prog, bool append)
> > > >
> > > > > where the default implementation just appends/prepends to the  
> array in
> > > > > progs depending on the value of 'appen'.
> > > >
> > > > > And then use the __weak linking trick (or maybe struct_ops with a  
> member
> > > > > for TXC, another for XDP, etc?) to allow BPF to override the  
> function
> > > > > wholesale and implement whatever ordering it wants? I.e., allow  
> it can
> > > > > to just shift around the order of progs in the 'progs' array  
> whenever a
> > > > > program is loaded/unloaded?
> > > >
> > > > > This way, a userspace daemon can implement any policy it wants by  
> just
> > > > > attaching to that hook, and keeping things like how to express
> > > > > dependencies as a userspace concern?
> > > >
> > > > What if we do the above, but instead of simple global 'attach  
> first/last',
> > > > the default api would be:
> > > >
> > > > - attach before <target_fd>
> > > > - attach after <target_fd>
> > > > - attach before target_fd=-1 == first
> > > > - attach after target_fd=-1 == last
> > > >
> > > > ?
> > >
> > > Hmm, the problem with that is that applications don't generally have  
> an
> > > fd to another application's BPF programs; and obtaining them from an  
> ID
> > > is a privileged operation (CAP_SYS_ADMIN). We could have it be "attach
> > > before target *ID*" instead, which could work I guess? But then the
> > > problem becomes that it's racy: the ID you're targeting could get
> > > detached before you attach, so you'll need to be prepared to check  
> that
> > > and retry; and I'm almost certain that applications won't test for  
> this,
> > > so it'll just lead to hard-to-debug heisenbugs. Or am I being too
> > > pessimistic here?
> >
> > I like Stan's proposal and don't see any issue with FD.
> > It's good to gate specific sequencing with cap_sys_admin.
> > Also for consistency the FD is better than ID.
> >
> > I also like systemd analogy with Before=, After=.
> > systemd has a ton more ways to specify deps between Units,
> > but none of them have absolute numbers (which is what priority is).
> > The only bit I'd tweak in Stan's proposal is:
> > - attach before <target_fd>
> > - attach after <target_fd>
> > - attach before target_fd=0 == first
> > - attach after target_fd=0 == last

> I think the before(), after() could work, but the target_fd I have my  
> doubts
> that it will be practical. Maybe lets walk through a concrete real  
> example. app_a
> and app_b shipped via container_a resp container_b. Both want to install  
> tc BPF
> and we (operator/user) want to say that prog from app_b should only be  
> inserted
> after the one from app_a, never run before; if no prog_a is installed, we  
> ofc just
> run prog_b, but if prog_a is inserted, it must be before prog_b given the  
> latter
> can only run after the former. How would we get to one anothers target  
> fd? One
> could use the 0, but not if more programs sit before/after.

This fd/id has to be definitely abstracted by the loader. With the
program, we would ship some metadata like 'run_after:prog_a' for
prog_b (where prog_a might be literal function name maybe?).
However, this also depends on 'run_before:prog_b' in prog_a (in
case it happens to be started after prog_b) :-/

So yeah, some central place might still be needed; in this case, Toke's
suggestion on overriding this via bpf seems like the most flexible one.

Or maybe libbpf can consult some /etc/bpf.init.d/ directory for those?
Not sure if it's too much for libbpf or it's better done by the higher
levels? I guess we can rely on the program names and then all we really
need is some place to say 'prog X happens before Y' and for the loaders
to interpret that.

> To me it sounds reasonable to have the append mode as default mode/API,
> and an advanced option to say 'I want to run as 2nd prog, but if something
> is already attached as 2nd prog, shift all the others +1 in the array'  
> which
> would relate to your above point, Stan, of being able to stick into any
> place in the chain.

Replying to your other email here:

I'd still prefer, from the user side, to be able to stick my prog into
any place for debugging. But you suggestion to shift others for +1 works  
for me.
(although, not sure, for example, what happens if I want to shift right the
program that's at position 65k; aka already last?)

IMO, having explicit before/after+target is slightly better usability-wise
than juggling priorities, but I'm fine with either way.

  reply	other threads:[~2022-10-07 22:45 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05  0:55   ` sdf
2022-10-05 10:50     ` Toke Høiland-Jørgensen
2022-10-05 14:48       ` Daniel Borkmann
2022-10-05 12:35     ` Daniel Borkmann
2022-10-05 17:56       ` sdf
2022-10-05 18:21         ` Daniel Borkmann
2022-10-05 10:33   ` Toke Høiland-Jørgensen
2022-10-05 12:47     ` Daniel Borkmann
2022-10-05 14:32       ` Toke Høiland-Jørgensen
2022-10-05 14:53         ` Daniel Borkmann
2022-10-05 19:04   ` Jamal Hadi Salim
2022-10-06 20:49     ` Daniel Borkmann
2022-10-07 15:36       ` Jamal Hadi Salim
2022-10-06  0:22   ` Andrii Nakryiko
2022-10-06  5:00   ` Alexei Starovoitov
2022-10-06 14:40     ` Jamal Hadi Salim
2022-10-06 23:29       ` Alexei Starovoitov
2022-10-07 15:43         ` Jamal Hadi Salim
2022-10-06 21:29     ` Daniel Borkmann
2022-10-06 23:28       ` Alexei Starovoitov
2022-10-07 13:26         ` Daniel Borkmann
2022-10-07 14:32           ` Toke Høiland-Jørgensen
2022-10-07 16:55             ` sdf
2022-10-07 17:20               ` Toke Høiland-Jørgensen
2022-10-07 18:11                 ` sdf
2022-10-07 19:06                   ` Daniel Borkmann
2022-10-07 18:59                 ` Alexei Starovoitov
2022-10-07 19:37                   ` Daniel Borkmann
2022-10-07 22:45                     ` sdf [this message]
2022-10-07 23:41                       ` Alexei Starovoitov
2022-10-07 23:34                     ` Alexei Starovoitov
2022-10-08 11:38                       ` Toke Høiland-Jørgensen
2022-10-08 20:38                         ` Alexei Starovoitov
2022-10-13 18:30                           ` Andrii Nakryiko
2022-10-14 15:38                             ` Alexei Starovoitov
2022-10-27  9:01                               ` Daniel Xu
2022-10-06 20:15   ` Martin KaFai Lau
2022-10-06 20:54   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 20:54     ` Daniel Borkmann
2022-10-06 17:56   ` Martin KaFai Lau
2022-10-06 20:10   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:14   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:24   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0CsATkd2qK1Mu2Z@google.com \
    --to=sdf@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joannelkoong@gmail.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).