netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>
Cc: bpf@vger.kernel.org, razor@blackwall.org, ast@kernel.org,
	andrii@kernel.org, martin.lau@linux.dev,
	john.fastabend@gmail.com, joannelkoong@gmail.com,
	memxor@gmail.com, toke@redhat.com, joe@cilium.io,
	netdev@vger.kernel.org
Subject: Re: [PATCH bpf-next 02/10] bpf: Implement BPF link handling for tc BPF programs
Date: Wed, 5 Oct 2022 20:19:20 -0700	[thread overview]
Message-ID: <CAEf4Bzak_v01v5Y6dNT_1KAcax_hvVqZM4o+d_w5OJSWeLJz2g@mail.gmail.com> (raw)
In-Reply-To: <20221004231143.19190-3-daniel@iogearbox.net>

On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work adds BPF links for tc. As a recap, a BPF link represents the attachment
> of a BPF program to a BPF hook point. The BPF link holds a single reference to
> keep BPF program alive. Moreover, hook points do not reference a BPF link, only
> the application's fd or pinning does. A BPF link holds meta-data specific to
> attachment and implements operations for link creation, (atomic) BPF program
> update, detachment and introspection.
>
> The motivation for BPF links for tc BPF programs is multi-fold, for example:
>
> - "It's especially important for applications that are deployed fleet-wide
>    and that don't "control" hosts they are deployed to. If such application
>    crashes and no one notices and does anything about that, BPF program will
>    keep running draining resources or even just, say, dropping packets. We
>    at FB had outages due to such permanent BPF attachment semantics. With
>    fd-based BPF link we are getting a framework, which allows safe, auto-
>    detachable behavior by default, unless application explicitly opts in by
>    pinning the BPF link." [0]
>
> -  From Cilium-side the tc BPF programs we attach to host-facing veth devices
>    and phys devices build the core datapath for Kubernetes Pods, and they
>    implement forwarding, load-balancing, policy, EDT-management, etc, within
>    BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
>    experienced hard-to-debug issues in a user's staging environment where
>    another Kubernetes application using tc BPF attached to the same prio/handle
>    of cls_bpf, wiping all Cilium-based BPF programs from underneath it. The
>    goal is to establish a clear/safe ownership model via links which cannot
>    accidentally be overridden. [1]
>
> BPF links for tc can co-exist with non-link attachments, and the semantics are
> in line also with XDP links: BPF links cannot replace other BPF links, BPF
> links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
> lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
> would solve mentioned issue of safe ownership model as 3rd party applications
> would not be able to accidentally wipe Cilium programs, even if they are not
> BPF link aware.
>
> Earlier attempts [2] have tried to integrate BPF links into core tc machinery
> to solve cls_bpf, which has been intrusive to the generic tc kernel API with
> extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
> be wiped from the qdisc also. Locking a tc BPF program in place this way, is
> getting into layering hacks given the two object models are vastly different.
> We chose to implement a prerequisite of the fd-based tc BPF attach API, so
> that the BPF link implementation fits in naturally similar to other link types
> which are fd-based and without the need for changing core tc internal APIs.
>
> BPF programs for tc can then be successively migrated from cls_bpf to the new
> tc BPF link without needing to change the program's source code, just the BPF
> loader mechanics for attaching.
>
>   [0] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com/
>   [1] https://lpc.events/event/16/contributions/1353/
>   [2] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com/
>
> Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---

have you considered supporting BPF cookie from the outset? It should
be trivial if you remove union from bpf_prog_array_item. If not, then
we should reject LINK_CREATE if bpf_cookie is non-zero.

>  include/linux/bpf.h            |   5 +-
>  include/net/xtc.h              |  14 ++++
>  include/uapi/linux/bpf.h       |   5 ++
>  kernel/bpf/net.c               | 116 ++++++++++++++++++++++++++++++---
>  kernel/bpf/syscall.c           |   3 +
>  tools/include/uapi/linux/bpf.h |   5 ++
>  6 files changed, 139 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 71e5f43db378..226a74f65704 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
>         union {
>                 struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
>                 u64 bpf_cookie;
> -               u32 bpf_priority;
> +               struct {
> +                       u32 bpf_priority;
> +                       u32 bpf_id;

this is link_id, is that right? should we name it as such?

> +               };
>         };
>  };
>

[...]

> diff --git a/kernel/bpf/net.c b/kernel/bpf/net.c
> index ab9a9dee615b..22b7a9b05483 100644
> --- a/kernel/bpf/net.c
> +++ b/kernel/bpf/net.c
> @@ -8,7 +8,7 @@
>  #include <net/xtc.h>
>
>  static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
> -                            struct bpf_prog *nprog, u32 prio, u32 flags)
> +                            u32 id, struct bpf_prog *nprog, u32 prio, u32 flags)

similarly here, id -> link_id or something like that, it's quite
confusing what kind of ID it is otherwise

>  {
>         struct bpf_prog_array_item *item, *tmp;
>         struct xtc_entry *entry, *peer;
> @@ -27,10 +27,13 @@ static int __xtc_prog_attach(struct net_device *dev, bool ingress, u32 limit,
>                 if (!oprog)
>                         break;
>                 if (item->bpf_priority == prio) {
> -                       if (flags & BPF_F_REPLACE) {
> +                       if (item->bpf_id == id &&
> +                           (flags & BPF_F_REPLACE)) {

[...]

  reply	other threads:[~2022-10-06  3:20 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-04 23:11 [PATCH bpf-next 00/10] BPF link support for tc BPF programs Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 01/10] bpf: Add initial fd-based API to attach " Daniel Borkmann
2022-10-05  0:55   ` sdf
2022-10-05 10:50     ` Toke Høiland-Jørgensen
2022-10-05 14:48       ` Daniel Borkmann
2022-10-05 12:35     ` Daniel Borkmann
2022-10-05 17:56       ` sdf
2022-10-05 18:21         ` Daniel Borkmann
2022-10-05 10:33   ` Toke Høiland-Jørgensen
2022-10-05 12:47     ` Daniel Borkmann
2022-10-05 14:32       ` Toke Høiland-Jørgensen
2022-10-05 14:53         ` Daniel Borkmann
2022-10-05 19:04   ` Jamal Hadi Salim
2022-10-06 20:49     ` Daniel Borkmann
2022-10-07 15:36       ` Jamal Hadi Salim
2022-10-06  0:22   ` Andrii Nakryiko
2022-10-06  5:00   ` Alexei Starovoitov
2022-10-06 14:40     ` Jamal Hadi Salim
2022-10-06 23:29       ` Alexei Starovoitov
2022-10-07 15:43         ` Jamal Hadi Salim
2022-10-06 21:29     ` Daniel Borkmann
2022-10-06 23:28       ` Alexei Starovoitov
2022-10-07 13:26         ` Daniel Borkmann
2022-10-07 14:32           ` Toke Høiland-Jørgensen
2022-10-07 16:55             ` sdf
2022-10-07 17:20               ` Toke Høiland-Jørgensen
2022-10-07 18:11                 ` sdf
2022-10-07 19:06                   ` Daniel Borkmann
2022-10-07 18:59                 ` Alexei Starovoitov
2022-10-07 19:37                   ` Daniel Borkmann
2022-10-07 22:45                     ` sdf
2022-10-07 23:41                       ` Alexei Starovoitov
2022-10-07 23:34                     ` Alexei Starovoitov
2022-10-08 11:38                       ` Toke Høiland-Jørgensen
2022-10-08 20:38                         ` Alexei Starovoitov
2022-10-13 18:30                           ` Andrii Nakryiko
2022-10-14 15:38                             ` Alexei Starovoitov
2022-10-27  9:01                               ` Daniel Xu
2022-10-06 20:15   ` Martin KaFai Lau
2022-10-06 20:54   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 02/10] bpf: Implement BPF link handling for " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko [this message]
2022-10-06 20:54     ` Daniel Borkmann
2022-10-06 17:56   ` Martin KaFai Lau
2022-10-06 20:10   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 03/10] bpf: Implement link update for tc BPF link programs Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 04/10] bpf: Implement link introspection " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:14   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 05/10] bpf: Implement link detach " Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-06 23:24   ` Martin KaFai Lau
2022-10-04 23:11 ` [PATCH bpf-next 06/10] libbpf: Change signature of bpf_prog_query Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 07/10] libbpf: Add extended attach/detach opts Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 08/10] libbpf: Add support for BPF tc link Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko
2022-10-04 23:11 ` [PATCH bpf-next 09/10] bpftool: Add support for tc fd-based attach types Daniel Borkmann
2022-10-04 23:11 ` [PATCH bpf-next 10/10] bpf, selftests: Add various BPF tc link selftests Daniel Borkmann
2022-10-06  3:19   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEf4Bzak_v01v5Y6dNT_1KAcax_hvVqZM4o+d_w5OJSWeLJz2g@mail.gmail.com \
    --to=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=joannelkoong@gmail.com \
    --cc=joe@cilium.io \
    --cc=john.fastabend@gmail.com \
    --cc=martin.lau@linux.dev \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=razor@blackwall.org \
    --cc=toke@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).