From: sdf@google.com
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, razor@blackwall.org, ast@kernel.org,
andrii@kernel.org, martin.lau@linux.dev,
john.fastabend@gmail.com, joannelkoong@gmail.com,
memxor@gmail.com, toke@redhat.com, joe@cilium.io,
netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach tc BPF programs
Date: Wed, 5 Oct 2022 10:56:43 -0700 [thread overview]
Message-ID: <Yz3FW/H06XS5toBo@google.com> (raw)
In-Reply-To: <dd7b7afd-755b-e980-02b1-cfde0dad1236@iogearbox.net>
On 10/05, Daniel Borkmann wrote:
> On 10/5/22 2:55 AM, sdf@google.com wrote:
> > On 10/05, Daniel Borkmann wrote:
> [...]
> >
> > The series looks exciting, haven't had a chance to look deeply, will try
> > to find some time this week.
> Great, thanks!
> > We've chatted briefly about priority during the talk, let's maybe
> discuss
> > it here more?
> >
> > I, as a user, still really have no clue about what priority to use.
> > We have this problem at tc, and we'll seemingly have the same problem
> > here? I guess it's even more relevant in k8s because internally at G we
> > can control the users.
> >
> > Is it worth at least trying to provide some default bands / guidance?
> >
> > For example, having SEC('tc/ingress') receive attach_priority=124 by
> > default? Maybe we can even have something like 'tc/ingress_first' get
> > attach_priority=1 and 'tc/ingress_last' with attach_priority=254?
> > (the names are arbitrary, we can do something better)
> >
> > ingress_first/ingress_last can be used by some monitoring jobs. The rest
> > can use default 124. If somebody really needs a custom priority, then
> they
> > can manually use something around 124/2 if they need to trigger before
> the
> > 'default' priority or 124+124/2 if they want to trigger after?
> >
> > Thoughts? Is it worth it? Do we care?
> I think guidance is needed, yes, I can add a few paragraphs to the libbpf
> header file where we also have the tc BPF link API. I had a brief
> discussion
> around this also with developers from datadog as they also use the infra
> via tc BPF. Overall, its a hard problem, and I don't think there's a good
> generic solution. The '*_last' is implied by prio=0, so that kernel auto-
> allocates it, and for libbpf we could add an API for it where the user
> does not need to specify prio specifically. The 'appending' is reasonable
> to me given if an application explicitly requests to be added as first
> (and e.g. enforces policy through tc BPF), but some other 3rd party
> application
> prepends itself as first, it can bypass the former, which would be too
> easy
> to shoot yourself in the foot. Overall the issue in tc land is that
> ordering
> matters, skb packet data could be mangled (e.g. IPs NATed), skb fields can
> be mangled, and we can have redirect actions (dev A vs. B); the only way
> I'd
> see were this is possible if somewhat verifier can annotate the prog when
> it didn't observe any writes to skb, and no redirect was in play. Then
> you've
> kind of replicated the constraints similar to tracing where the attachment
> can say that ordering doesn't matter if all the progs are in same style.
> Otherwise, explicit corporation is needed as is today with rest of tc (or
> as Toke did in libxdp) with multi-attach. In the specific case I mentioned
> at LPC, it can be made to work given one of the two is only observing
> traffic
> at the layer, e.g. it could get prepended if there is guarantee that all
> return codes are tc_act_unspec so that there is no bypass and then you'll
> see all traffic or appended to see only traffic which made it past the
> policy. So it all depends on the applications installing programs, but to
> solve it generically is not possible given ordering and conflicting
> actions.
> So, imho, an _append() API for libbpf can be added along with guidance for
> developers when to use _append() vs explicit prio.
Agreed, it's a hard problem to solve, especially from the kernel side.
Ideally, as Toke mentions on the side thread, there should be some kind
of system daemon or some other place where the ordering is described.
But let's start with at least some guidance on the current prio.
Might be also a good idea to narrow down the prio range to 0-65k for
now? Maybe in the future we'll have some special PRIO_MONITORING_BEFORE_ALL
and PRIO_MONITORING_AFTER_ALL that trigger regardless of TC_ACT_UNSPEC?
I agree with Toke that it's another problem with the current action based
chains that's worth solving somehow (compared to, say, cgroup programs).
> Thanks,
> Daniel
> > > ����� };
> >
> > > ����� struct { /* anonymous struct used by BPF_PROG_TEST_RUN command
> */
> > > @@ -1452,7 +1460,10 @@ union bpf_attr {
> > > ����� } info;
> >
> > > ����� struct { /* anonymous struct used by BPF_PROG_QUERY command */
> > > -������� __u32������� target_fd;��� /* container object to query */
> > > +������� union {
> > > +����������� __u32��� target_fd;��� /* container object to query */
> > > +����������� __u32��� target_ifindex; /* target ifindex */
next prev parent reply other threads:[~2022-10-05 17:56 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05 0:55 ` sdf
2022-10-05 10:50 ` Toke Høiland-Jørgensen
2022-10-05 14:48 ` Daniel Borkmann
2022-10-05 12:35 ` Daniel Borkmann
2022-10-05 17:56 ` sdf [this message]
2022-10-05 18:21 ` Daniel Borkmann
2022-10-05 10:33 ` Toke Høiland-Jørgensen
2022-10-05 12:47 ` Daniel Borkmann
2022-10-05 14:32 ` Toke Høiland-Jørgensen
2022-10-05 14:53 ` Daniel Borkmann
2022-10-05 19:04 ` Jamal Hadi Salim
2022-10-06 20:49 ` Daniel Borkmann
2022-10-07 15:36 ` Jamal Hadi Salim
2022-10-06 0:22 ` Andrii Nakryiko
2022-10-06 5:00 ` Alexei Starovoitov
2022-10-06 14:40 ` Jamal Hadi Salim
2022-10-06 23:29 ` Alexei Starovoitov
2022-10-07 15:43 ` Jamal Hadi Salim
2022-10-06 21:29 ` Daniel Borkmann
2022-10-06 23:28 ` Alexei Starovoitov
2022-10-07 13:26 ` Daniel Borkmann
2022-10-07 14:32 ` Toke Høiland-Jørgensen
2022-10-07 16:55 ` sdf
2022-10-07 17:20 ` Toke Høiland-Jørgensen
2022-10-07 18:11 ` sdf
2022-10-07 19:06 ` Daniel Borkmann
2022-10-07 18:59 ` Alexei Starovoitov
2022-10-07 19:37 ` Daniel Borkmann
2022-10-07 22:45 ` sdf
2022-10-07 23:41 ` Alexei Starovoitov
2022-10-07 23:34 ` Alexei Starovoitov
2022-10-08 11:38 ` Toke Høiland-Jørgensen
2022-10-08 20:38 ` Alexei Starovoitov
2022-10-13 18:30 ` Andrii Nakryiko
2022-10-14 15:38 ` Alexei Starovoitov
2022-10-27 9:01 ` Daniel Xu
2022-10-06 20:15 ` Martin KaFai Lau
2022-10-06 20:54 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 20:54 ` Daniel Borkmann
2022-10-06 17:56 ` Martin KaFai Lau
2022-10-06 20:10 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 23:14 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-06 23:24 ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06 3:19 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Yz3FW/H06XS5toBo@google.com \
--to=sdf@google.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=joannelkoong@gmail.com \
--cc=joe@cilium.io \
--cc=john.fastabend@gmail.com \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=razor@blackwall.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).