All of lore.kernel.org
 help / color / mirror / Atom feed
From: sdf@google.com
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	bpf <bpf@vger.kernel.org>,
	"Nikolay Aleksandrov" <razor@blackwall.org>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Joanne Koong" <joannelkoong@gmail.com>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Joe Stringer" <joe@cilium.io>,
	"Network Development" <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs
Date: Fri, 7 Oct 2022 15:45:21 -0700	[thread overview]
Message-ID: <Y0CsATkd2qK1Mu2Z@google.com> (raw)
In-Reply-To: <8cc9811e-6efe-3aa5-b201-abbd4b10ceb4@iogearbox.net>

On 10/07, Daniel Borkmann wrote:
> On 10/7/22 8:59 PM, Alexei Starovoitov wrote:
> > On Fri, Oct 7, 2022 at 10:20 AM Toke H�iland-J�rgensen  
> <toke@redhat.com> wrote:
> [...]
> > > > > I was thinking a little about how this might work; i.e., how can  
> the
> > > > > kernel expose the required knobs to allow a system policy to be
> > > > > implemented without program loading having to talk to anything  
> other
> > > > > than the syscall API?
> > > >
> > > > > How about we only expose prepend/append in the prog attach UAPI,  
> and
> > > > > then have a kernel function that does the sorting like:
> > > >
> > > > > int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t  
> num_progs, struct
> > > > > bpf_prog *new_prog, bool append)
> > > >
> > > > > where the default implementation just appends/prepends to the  
> array in
> > > > > progs depending on the value of 'appen'.
> > > >
> > > > > And then use the __weak linking trick (or maybe struct_ops with a  
> member
> > > > > for TXC, another for XDP, etc?) to allow BPF to override the  
> function
> > > > > wholesale and implement whatever ordering it wants? I.e., allow  
> it can
> > > > > to just shift around the order of progs in the 'progs' array  
> whenever a
> > > > > program is loaded/unloaded?
> > > >
> > > > > This way, a userspace daemon can implement any policy it wants by  
> just
> > > > > attaching to that hook, and keeping things like how to express
> > > > > dependencies as a userspace concern?
> > > >
> > > > What if we do the above, but instead of simple global 'attach  
> first/last',
> > > > the default api would be:
> > > >
> > > > - attach before <target_fd>
> > > > - attach after <target_fd>
> > > > - attach before target_fd=-1 == first
> > > > - attach after target_fd=-1 == last
> > > >
> > > > ?
> > >
> > > Hmm, the problem with that is that applications don't generally have  
> an
> > > fd to another application's BPF programs; and obtaining them from an  
> ID
> > > is a privileged operation (CAP_SYS_ADMIN). We could have it be "attach
> > > before target *ID*" instead, which could work I guess? But then the
> > > problem becomes that it's racy: the ID you're targeting could get
> > > detached before you attach, so you'll need to be prepared to check  
> that
> > > and retry; and I'm almost certain that applications won't test for  
> this,
> > > so it'll just lead to hard-to-debug heisenbugs. Or am I being too
> > > pessimistic here?
> >
> > I like Stan's proposal and don't see any issue with FD.
> > It's good to gate specific sequencing with cap_sys_admin.
> > Also for consistency the FD is better than ID.
> >
> > I also like systemd analogy with Before=, After=.
> > systemd has a ton more ways to specify deps between Units,
> > but none of them have absolute numbers (which is what priority is).
> > The only bit I'd tweak in Stan's proposal is:
> > - attach before <target_fd>
> > - attach after <target_fd>
> > - attach before target_fd=0 == first
> > - attach after target_fd=0 == last

> I think the before(), after() could work, but the target_fd I have my  
> doubts
> that it will be practical. Maybe lets walk through a concrete real  
> example. app_a
> and app_b shipped via container_a resp container_b. Both want to install  
> tc BPF
> and we (operator/user) want to say that prog from app_b should only be  
> inserted
> after the one from app_a, never run before; if no prog_a is installed, we  
> ofc just
> run prog_b, but if prog_a is inserted, it must be before prog_b given the  
> latter
> can only run after the former. How would we get to one anothers target  
> fd? One
> could use the 0, but not if more programs sit before/after.

This fd/id has to be definitely abstracted by the loader. With the
program, we would ship some metadata like 'run_after:prog_a' for
prog_b (where prog_a might be literal function name maybe?).
However, this also depends on 'run_before:prog_b' in prog_a (in
case it happens to be started after prog_b) :-/

So yeah, some central place might still be needed; in this case, Toke's
suggestion on overriding this via bpf seems like the most flexible one.

Or maybe libbpf can consult some /etc/bpf.init.d/ directory for those?
Not sure if it's too much for libbpf or it's better done by the higher
levels? I guess we can rely on the program names and then all we really
need is some place to say 'prog X happens before Y' and for the loaders
to interpret that.

> To me it sounds reasonable to have the append mode as default mode/API,
> and an advanced option to say 'I want to run as 2nd prog, but if something
> is already attached as 2nd prog, shift all the others +1 in the array'  
> which
> would relate to your above point, Stan, of being able to stick into any
> place in the chain.

Replying to your other email here:

I'd still prefer, from the user side, to be able to stick my prog into
any place for debugging. But you suggestion to shift others for +1 works  
for me.
(although, not sure, for example, what happens if I want to shift right the
program that's at position 65k; aka already last?)

IMO, having explicit before/after+target is slightly better usability-wise
than juggling priorities, but I'm fine with either way.

  reply	other threads:[~2022-10-07 22:45 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05  0:55   ` sdf
2022-10-05 10:50     ` Toke Høiland-Jørgensen
2022-10-05 14:48       ` Daniel Borkmann
2022-10-05 12:35     ` Daniel Borkmann
2022-10-05 17:56       ` sdf
2022-10-05 18:21         ` Daniel Borkmann
2022-10-05 10:33   ` Toke Høiland-Jørgensen
2022-10-05 12:47     ` Daniel Borkmann
2022-10-05 14:32       ` Toke Høiland-Jørgensen
2022-10-05 14:53         ` Daniel Borkmann
2022-10-05 19:04   ` Jamal Hadi Salim
2022-10-06 20:49     ` Daniel Borkmann
2022-10-07 15:36       ` Jamal Hadi Salim
2022-10-06  0:22   ` Andrii Nakryiko
2022-10-06  5:00   ` Alexei Starovoitov
2022-10-06 14:40     ` Jamal Hadi Salim
2022-10-06 23:29       ` Alexei Starovoitov
2022-10-07 15:43         ` Jamal Hadi Salim
2022-10-06 21:29     ` Daniel Borkmann
2022-10-06 23:28       ` Alexei Starovoitov
2022-10-07 13:26         ` Daniel Borkmann
2022-10-07 14:32           ` Toke Høiland-Jørgensen
2022-10-07 16:55             ` sdf
2022-10-07 17:20               ` Toke Høiland-Jørgensen
2022-10-07 18:11                 ` sdf
2022-10-07 19:06                   ` Daniel Borkmann
2022-10-07 18:59                 ` Alexei Starovoitov
2022-10-07 19:37                   ` Daniel Borkmann
2022-10-07 22:45                     ` sdf [this message]
2022-10-07 23:41                       ` Alexei Starovoitov
2022-10-07 23:34                     ` Alexei Starovoitov
2022-10-08 11:38                       ` Toke Høiland-Jørgensen
2022-10-08 20:38                         ` Alexei Starovoitov
2022-10-13 18:30                           ` Andrii Nakryiko
2022-10-14 15:38                             ` Alexei Starovoitov
2022-10-27  9:01                               ` Daniel Xu
2022-10-06 20:15   ` Martin KaFai Lau
2022-10-06 20:54   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 20:54     ` Daniel Borkmann
2022-10-06 17:56   ` Martin KaFai Lau
2022-10-06 20:10   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:14   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:24   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y0CsATkd2qK1Mu2Z@google.com \
    --to=sdf@google.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joannelkoong@gmail.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.