BPF List
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	 Yonghong Song <yonghong.song@linux.dev>
Cc: bpf@vger.kernel.org,  Alexei Starovoitov <ast@kernel.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	 Jakub Sitnicki <jakub@cloudflare.com>,
	 John Fastabend <john.fastabend@gmail.com>,
	 kernel-team@fb.com,  Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
Date: Wed, 03 Apr 2024 10:47:54 -0700	[thread overview]
Message-ID: <660d964a1444b_1cf6b20885@john.notmuch> (raw)
In-Reply-To: <CAEf4BzaYh6q_0TUUARd8m0eK+5RBxWQJ8=-n84vdayPPh4Z61g@mail.gmail.com>

Andrii Nakryiko wrote:
> On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> > On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> > >> internal request to support bpf_link for sk_msg programs so user
> > >> space can have a uniform handling with bpf_link based libbpf
> > >> APIs. Using bpf_link based libbpf API also has a benefit which
> > >> makes system robust by decoupling prog life cycle and
> > >> attachment life cycle.
> > >>

Thanks again for working on it.

> > >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> > >> ---
> > >>   include/linux/bpf.h            |   6 +
> > >>   include/linux/skmsg.h          |   4 +
> > >>   include/uapi/linux/bpf.h       |   5 +
> > >>   kernel/bpf/syscall.c           |   4 +
> > >>   net/core/sock_map.c            | 263 ++++++++++++++++++++++++++++++++-
> > >>   tools/include/uapi/linux/bpf.h |   5 +
> > >>   6 files changed, 279 insertions(+), 8 deletions(-)
> > >>
> 
> [...]
> 
> > >>          psock_set_prog(pprog, prog);
> > >> -       return 0;
> > >> +       if (link)
> > >> +               *plink = link;
> > >> +
> > >> +out:
> > >> +       mutex_unlock(&sockmap_prog_update_mutex);
> > > why this mutex is not per-sockmap?
> >
> > My thinking is the system probably won't have lots of sockmaps and
> > sockmap attach/detach/update_prog should not be that frequent. But
> > I could be wrong.
> >

For my use case at least we have a map per protocol we want to inspect.
So its rather small set <10 I would say. Also they are created once
when the agent starts and when config changes from operator (user decides
to remove/add a parser). Config changing is rather rare. I don't think
this would be paticularly painful in practice now to have a global
lock.

> 
> That seems like an even more of an argument to keep mutex per sockmap.
> It won't add a lot of memory, but it is conceptually cleaner, as each
> sockmap instance (and corresponding links) are completely independent,
> even from a locking perspective.
> 
> But I can't say I feel very strongly about this.
> 
> > >
> > >> +       return ret;
> > >>   }
> > >>
> 
> [...]
> 
> > >
> > >> +
> > >> +static void sock_map_link_release(struct bpf_link *link)
> > >> +{
> > >> +       struct sockmap_link *sockmap_link = get_sockmap_link(link);
> > >> +
> > >> +       mutex_lock(&sockmap_link_mutex);
> > > similar to the above, why is this mutex not sockmap-specific? And I'd
> > > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > > case to keep it simple.
> >
> > This is to protect sockmap_link->map. They could share the same lock.
> > Let me double check...
> 
> If you keep that global sockmap_prog_update_mutex then I'd probably
> reuse that one here for simplicity (and named it a bit more
> generically, "sockmap_mutex" or something like that, just like we have
> global "cgroup_mutex").

I was leaning to a per map lock, but because a global lock simplifies this
part a bunch I would agree just use a single sockmap_mutex throughout.

If someone has a use case where they want to add/remove maps dynamically
maybe they can let us know what that is. For us, on my todo list, I want
to just remove the map notion and bind progs to socks directly. The
original map idea was for a L7 load balancer, but other than quick hacks
I've never built such a thing nor ran it in production. Maybe someday
I'll find the time.

> 
> [...]
> 
> > >> +       if (old && link->prog != old) {
> > > hm.. even if old matches link->prog, we should unset old and set new
> > > link (link overrides prog attachment, basically), it shouldn't matter
> > > if old == link->prog, unless I'm missing something?
> >
> > In xdp link (net/core/dev.c), we have
> >
> >          cur_prog = dev_xdp_prog(dev, mode);
> >          /* can't replace attached prog with link */
> >          if (link && cur_prog) {
> >                  NL_SET_ERR_MSG(extack, "Can't replace active XDP
> > program with BPF link");
> >                  return -EBUSY;
> >          }
> >          if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
> >                  NL_SET_ERR_MSG(extack, "Active program does not match
> > expected");
> >                  return -EEXIST;
> >          }
> >
> > if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> > in order to do prog update.
> > for sockmap prog update, in link_update (syscall.c), the only way
> > we can get a non-NULL old_prog is with the following:
> >
> >          if (flags & BPF_F_REPLACE) {
> >                  old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
> >                  if (IS_ERR(old_prog)) {
> >                          ret = PTR_ERR(old_prog);
> >                          old_prog = NULL;
> >                          goto out_put_progs;
> >                  }
> >          } else if (attr->link_update.old_prog_fd) {
> >                  ret = -EINVAL;
> >                  goto out_put_progs;
> >          }
> > Basically, we have BPF_F_REPLACE here.
> > So similar to xdp link, I think we should check old_prog to
> > be equal to link->prog in order to do link update_prog.
> 
> ah, ok, that's BPF_F_REPLACE case. See, it's confusing that we have
> this logic split between multiple places, in dev_xdp_attach() it's a
> bit more centralized.
> 
> >
> > >
> > >> +               ret = -EINVAL;
> > >> +               goto out;
> > >> +       }
> 
> [...]
> 
> > >> +
> > >> +       ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> > >> +       if (ret) {
> > >> +               bpf_link_cleanup(&link_primer);
> > >> +               goto out;
> > >> +       }
> > >> +
> > >> +       bpf_prog_inc(prog);
> > > if link was created successfully, it "inherits" prog's refcnt, so you
> > > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > > prog only if this function returns error
> >
> > The reason I did this is due to
> >
> > static inline void psock_set_prog(struct bpf_prog **pprog,
> >                                    struct bpf_prog *prog)
> > {
> >          prog = xchg(pprog, prog);
> >          if (prog)
> >                  bpf_prog_put(prog);
> > }
> >
> > You can see when the prog is swapped due to link_update or prog_attach,
> > its reference count is decremented by 1. This is necessary for prog_attach,
> > but as you mentioned, indeed, it is not necessary for link-based approach.
> > Let me see whether I can refactor code to make it easy not to increase
> > reference count of prog here.
> >
> 
> ah, ok, its another sockmap-specific convention, np
> 
> >
> > >
> > >> +
> > >> +       return bpf_link_settle(&link_primer);
> > >> +
> > >> +out:
> > >> +       bpf_map_put_with_uref(map);
> > >> +       return ret;
> > >> +}
> > >> +
> > >>   static int sock_map_iter_attach_target(struct bpf_prog *prog,
> > >>                                         union bpf_iter_link_info *linfo,
> > >>                                         struct bpf_iter_aux_info *aux)
> > >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > >> index 9585f5345353..31660c3ffc01 100644
> > >> --- a/tools/include/uapi/linux/bpf.h
> > >> +++ b/tools/include/uapi/linux/bpf.h
> > >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> > >>          BPF_LINK_TYPE_TCX = 11,
> > >>          BPF_LINK_TYPE_UPROBE_MULTI = 12,
> > >>          BPF_LINK_TYPE_NETKIT = 13,
> > >> +       BPF_LINK_TYPE_SOCKMAP = 14,
> > >>          __MAX_BPF_LINK_TYPE,
> > >>   };
> > >>
> > >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> > >>                          __u32 ifindex;
> > >>                          __u32 attach_type;
> > >>                  } netkit;
> > >> +               struct {
> > >> +                       __u32 map_id;
> > >> +                       __u32 attach_type;
> > >> +               } sockmap;
> > >>          };
> > >>   } __attribute__((aligned(8)));
> > >>
> > >> --
> > >> 2.43.0
> > >>



  reply	other threads:[~2024-04-03 17:47 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-26  2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-03-26  2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
2024-04-02 17:39   ` Eduard Zingerman
2024-04-03  0:06     ` Yonghong Song
2024-04-02 17:45   ` Andrii Nakryiko
2024-04-03  1:08     ` Yonghong Song
2024-04-03 16:43       ` Andrii Nakryiko
2024-04-03 17:47         ` John Fastabend [this message]
2024-04-03 22:09           ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
2024-04-04  1:11             ` John Fastabend
2024-04-04  3:31               ` Yonghong Song
2024-04-05  4:41                 ` John Fastabend
2024-04-06  1:10                   ` Martin KaFai Lau
2024-04-04  3:18           ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-04-05  4:42             ` John Fastabend
2024-03-26  2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
2024-04-02 13:18   ` Eduard Zingerman
2024-04-02 17:46   ` Andrii Nakryiko
2024-04-03  0:07     ` Yonghong Song
2024-03-26  2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
2024-03-27 11:58   ` Quentin Monnet
2024-03-26  2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
2024-04-02 13:18   ` Eduard Zingerman
2024-03-26  2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
2024-04-02 13:17   ` Eduard Zingerman
2024-04-02 18:56     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=660d964a1444b_1cf6b20885@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub@cloudflare.com \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox