From: sdf@google.com
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: "Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Toke Høiland-Jørgensen" <toke@redhat.com>,
bpf <bpf@vger.kernel.org>,
"Nikolay Aleksandrov" <razor@blackwall.org>,
"Alexei Starovoitov" <ast@kernel.org>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Martin KaFai Lau" <martin.lau@linux.dev>,
"John Fastabend" <john.fastabend@gmail.com>,
"Joanne Koong" <joannelkoong@gmail.com>,
"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
"Joe Stringer" <joe@cilium.io>,
"Network Development" <netdev@vger.kernel.org>
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs
Date: Fri, 7 Oct 2022 15:45:21 -0700 [thread overview]
Message-ID: <Y0CsATkd2qK1Mu2Z@google.com> (raw)
In-Reply-To: <8cc9811e-6efe-3aa5-b201-abbd4b10ceb4@iogearbox.net>
On 10/07, Daniel Borkmann wrote:
> On 10/7/22 8:59 PM, Alexei Starovoitov wrote:
> > On Fri, Oct 7, 2022 at 10:20 AM Toke H�iland-J�rgensen
> <toke@redhat.com> wrote:
> [...]
> > > > > I was thinking a little about how this might work; i.e., how can
> the
> > > > > kernel expose the required knobs to allow a system policy to be
> > > > > implemented without program loading having to talk to anything
> other
> > > > > than the syscall API?
> > > >
> > > > > How about we only expose prepend/append in the prog attach UAPI,
> and
> > > > > then have a kernel function that does the sorting like:
> > > >
> > > > > int bpf_add_new_tcx_prog(struct bpf_prog *progs, size_t
> num_progs, struct
> > > > > bpf_prog *new_prog, bool append)
> > > >
> > > > > where the default implementation just appends/prepends to the
> array in
> > > > > progs depending on the value of 'appen'.
> > > >
> > > > > And then use the __weak linking trick (or maybe struct_ops with a
> member
> > > > > for TXC, another for XDP, etc?) to allow BPF to override the
> function
> > > > > wholesale and implement whatever ordering it wants? I.e., allow
> it can
> > > > > to just shift around the order of progs in the 'progs' array
> whenever a
> > > > > program is loaded/unloaded?
> > > >
> > > > > This way, a userspace daemon can implement any policy it wants by
> just
> > > > > attaching to that hook, and keeping things like how to express
> > > > > dependencies as a userspace concern?
> > > >
> > > > What if we do the above, but instead of simple global 'attach
> first/last',
> > > > the default api would be:
> > > >
> > > > - attach before <target_fd>
> > > > - attach after <target_fd>
> > > > - attach before target_fd=-1 == first
> > > > - attach after target_fd=-1 == last
> > > >
> > > > ?
> > >
> > > Hmm, the problem with that is that applications don't generally have
> an
> > > fd to another application's BPF programs; and obtaining them from an
> ID
> > > is a privileged operation (CAP_SYS_ADMIN). We could have it be "attach
> > > before target *ID*" instead, which could work I guess? But then the
> > > problem becomes that it's racy: the ID you're targeting could get
> > > detached before you attach, so you'll need to be prepared to check
> that
> > > and retry; and I'm almost certain that applications won't test for
> this,
> > > so it'll just lead to hard-to-debug heisenbugs. Or am I being too
> > > pessimistic here?
> >
> > I like Stan's proposal and don't see any issue with FD.
> > It's good to gate specific sequencing with cap_sys_admin.
> > Also for consistency the FD is better than ID.
> >
> > I also like systemd analogy with Before=, After=.
> > systemd has a ton more ways to specify deps between Units,
> > but none of them have absolute numbers (which is what priority is).
> > The only bit I'd tweak in Stan's proposal is:
> > - attach before <target_fd>
> > - attach after <target_fd>
> > - attach before target_fd=0 == first
> > - attach after target_fd=0 == last
> I think the before(), after() could work, but the target_fd I have my
> doubts
> that it will be practical. Maybe lets walk through a concrete real
> example. app_a
> and app_b shipped via container_a resp container_b. Both want to install
> tc BPF
> and we (operator/user) want to say that prog from app_b should only be
> inserted
> after the one from app_a, never run before; if no prog_a is installed, we
> ofc just
> run prog_b, but if prog_a is inserted, it must be before prog_b given the
> latter
> can only run after the former. How would we get to one anothers target
> fd? One
> could use the 0, but not if more programs sit before/after.
This fd/id has to be definitely abstracted by the loader. With the
program, we would ship some metadata like 'run_after:prog_a' for
prog_b (where prog_a might be literal function name maybe?).
However, this also depends on 'run_before:prog_b' in prog_a (in
case it happens to be started after prog_b) :-/
So yeah, some central place might still be needed; in this case, Toke's
suggestion on overriding this via bpf seems like the most flexible one.
Or maybe libbpf can consult some /etc/bpf.init.d/ directory for those?
Not sure if it's too much for libbpf or it's better done by the higher
levels? I guess we can rely on the program names and then all we really
need is some place to say 'prog X happens before Y' and for the loaders
to interpret that.
> To me it sounds reasonable to have the append mode as default mode/API,
> and an advanced option to say 'I want to run as 2nd prog, but if something
> is already attached as 2nd prog, shift all the others +1 in the array'
> which
> would relate to your above point, Stan, of being able to stick into any
> place in the chain.
Replying to your other email here:
I'd still prefer, from the user side, to be able to stick my prog into
any place for debugging. But you suggestion to shift others for +1 works
for me.
(although, not sure, for example, what happens if I want to shift right the
program that's at position 65k; aka already last?)
IMO, having explicit before/after+target is slightly better usability-wise
than juggling priorities, but I'm fine with either way.
next prev parent reply other threads:[~2022-10-07 22:45 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05 0:55 ` sdf
2022-10-05 10:50 ` Toke Høiland-Jørgensen
2022-10-05 14:48 ` Daniel Borkmann
2022-10-05 12:35 ` Daniel Borkmann
2022-10-05 17:56 ` sdf
2022-10-05 18:21 ` Daniel Borkmann
2022-10-05 10:33 ` Toke Høiland-Jørgensen
2022-10-05 12:47 ` Daniel Borkmann
2022-10-05 14:32 ` Toke Høiland-Jørgensen
2022-10-05 14:53 ` Daniel Borkmann
2022-10-05 19:04 ` Jamal Hadi Salim
2022-10-06 20:49 ` Daniel Borkmann
2022-10-07 15:36 ` Jamal Hadi Salim
2022-10-06 0:22 ` Andrii Nakryiko
2022-10-06 5:00 ` Alexei Starovoitov
2022-10-06 14:40 ` Jamal Hadi Salim
2022-10-06 23:29 ` Alexei Starovoitov
2022-10-07 15:43 ` Jamal Hadi Salim
2022-10-06 21:29 ` Daniel Borkmann
2022-10-06 23:28 ` Alexei Starovoitov
2022-10-07 13:26 ` Daniel Borkmann
2022-10-07 14:32 ` Toke Høiland-Jørgensen
2022-10-07 16:55 ` sdf
2022-10-07 17:20 ` Toke Høiland-Jørgensen
2022-10-07 18:11 ` sdf
2022-10-07 19:06 ` Daniel Borkmann
2022-10-07 18:59 ` Alexei Starovoitov
2022-10-07 19:37 ` Daniel Borkmann
2022-10-07 22:45 ` sdf [this message]
2022-10-07 23:41 ` Alexei Starovoitov
2022-10-07 23:34 ` Alexei Starovoitov
2022-10-08 11:38 ` Toke Høiland-Jørgensen
2022-10-08 20:38 ` Alexei Starovoitov
2022-10-13 18:30 ` Andrii Nakryiko
2022-10-14 15:38 ` Alexei Starovoitov
2022-10-27 9:01 ` Daniel Xu
2022-10-06 20:15 ` Martin KaFai Lau
2022-10-06 20:54 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 20:54 ` Daniel Borkmann
2022-10-06 17:56 ` Martin KaFai Lau
2022-10-06 20:10 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 23:14 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 23:24 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y0CsATkd2qK1Mu2Z@google.com \
--to=sdf@google.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=joannelkoong@gmail.com \
--cc=joe@cilium.io \
--cc=john.fastabend@gmail.com \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=razor@blackwall.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).