From: John Fastabend <john.fastabend@gmail.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>,
Yonghong Song <yonghong.song@linux.dev>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jakub Sitnicki <jakub@cloudflare.com>,
John Fastabend <john.fastabend@gmail.com>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
Date: Wed, 03 Apr 2024 10:47:54 -0700 [thread overview]
Message-ID: <660d964a1444b_1cf6b20885@john.notmuch> (raw)
In-Reply-To: <CAEf4BzaYh6q_0TUUARd8m0eK+5RBxWQJ8=-n84vdayPPh4Z61g@mail.gmail.com>
Andrii Nakryiko wrote:
> On Tue, Apr 2, 2024 at 6:08 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> >
> >
> > On 4/2/24 10:45 AM, Andrii Nakryiko wrote:
> > > On Mon, Mar 25, 2024 at 7:22 PM Yonghong Song <yonghong.song@linux.dev> wrote:
> > >> Add bpf_link support for sk_msg and sk_skb programs. We have an
> > >> internal request to support bpf_link for sk_msg programs so user
> > >> space can have a uniform handling with bpf_link based libbpf
> > >> APIs. Using bpf_link based libbpf API also has a benefit which
> > >> makes system robust by decoupling prog life cycle and
> > >> attachment life cycle.
> > >>
Thanks again for working on it.
> > >> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
> > >> ---
> > >> include/linux/bpf.h | 6 +
> > >> include/linux/skmsg.h | 4 +
> > >> include/uapi/linux/bpf.h | 5 +
> > >> kernel/bpf/syscall.c | 4 +
> > >> net/core/sock_map.c | 263 ++++++++++++++++++++++++++++++++-
> > >> tools/include/uapi/linux/bpf.h | 5 +
> > >> 6 files changed, 279 insertions(+), 8 deletions(-)
> > >>
>
> [...]
>
> > >> psock_set_prog(pprog, prog);
> > >> - return 0;
> > >> + if (link)
> > >> + *plink = link;
> > >> +
> > >> +out:
> > >> + mutex_unlock(&sockmap_prog_update_mutex);
> > > why this mutex is not per-sockmap?
> >
> > My thinking is the system probably won't have lots of sockmaps and
> > sockmap attach/detach/update_prog should not be that frequent. But
> > I could be wrong.
> >
For my use case at least we have a map per protocol we want to inspect.
So its rather small set <10 I would say. Also they are created once
when the agent starts and when config changes from operator (user decides
to remove/add a parser). Config changing is rather rare. I don't think
this would be paticularly painful in practice now to have a global
lock.
>
> That seems like an even more of an argument to keep mutex per sockmap.
> It won't add a lot of memory, but it is conceptually cleaner, as each
> sockmap instance (and corresponding links) are completely independent,
> even from a locking perspective.
>
> But I can't say I feel very strongly about this.
>
> > >
> > >> + return ret;
> > >> }
> > >>
>
> [...]
>
> > >
> > >> +
> > >> +static void sock_map_link_release(struct bpf_link *link)
> > >> +{
> > >> + struct sockmap_link *sockmap_link = get_sockmap_link(link);
> > >> +
> > >> + mutex_lock(&sockmap_link_mutex);
> > > similar to the above, why is this mutex not sockmap-specific? And I'd
> > > just combine sockmap_link_mutex and sockmap_prog_update_mutex in this
> > > case to keep it simple.
> >
> > This is to protect sockmap_link->map. They could share the same lock.
> > Let me double check...
>
> If you keep that global sockmap_prog_update_mutex then I'd probably
> reuse that one here for simplicity (and named it a bit more
> generically, "sockmap_mutex" or something like that, just like we have
> global "cgroup_mutex").
I was leaning to a per map lock, but because a global lock simplifies this
part a bunch I would agree just use a single sockmap_mutex throughout.
If someone has a use case where they want to add/remove maps dynamically
maybe they can let us know what that is. For us, on my todo list, I want
to just remove the map notion and bind progs to socks directly. The
original map idea was for a L7 load balancer, but other than quick hacks
I've never built such a thing nor ran it in production. Maybe someday
I'll find the time.
>
> [...]
>
> > >> + if (old && link->prog != old) {
> > > hm.. even if old matches link->prog, we should unset old and set new
> > > link (link overrides prog attachment, basically), it shouldn't matter
> > > if old == link->prog, unless I'm missing something?
> >
> > In xdp link (net/core/dev.c), we have
> >
> > cur_prog = dev_xdp_prog(dev, mode);
> > /* can't replace attached prog with link */
> > if (link && cur_prog) {
> > NL_SET_ERR_MSG(extack, "Can't replace active XDP
> > program with BPF link");
> > return -EBUSY;
> > }
> > if ((flags & XDP_FLAGS_REPLACE) && cur_prog != old_prog) {
> > NL_SET_ERR_MSG(extack, "Active program does not match
> > expected");
> > return -EEXIST;
> > }
> >
> > if flags has XDP_FLAGS_REPLACE, link saved prog must be equal to old_prog
> > in order to do prog update.
> > for sockmap prog update, in link_update (syscall.c), the only way
> > we can get a non-NULL old_prog is with the following:
> >
> > if (flags & BPF_F_REPLACE) {
> > old_prog = bpf_prog_get(attr->link_update.old_prog_fd);
> > if (IS_ERR(old_prog)) {
> > ret = PTR_ERR(old_prog);
> > old_prog = NULL;
> > goto out_put_progs;
> > }
> > } else if (attr->link_update.old_prog_fd) {
> > ret = -EINVAL;
> > goto out_put_progs;
> > }
> > Basically, we have BPF_F_REPLACE here.
> > So similar to xdp link, I think we should check old_prog to
> > be equal to link->prog in order to do link update_prog.
>
> ah, ok, that's BPF_F_REPLACE case. See, it's confusing that we have
> this logic split between multiple places, in dev_xdp_attach() it's a
> bit more centralized.
>
> >
> > >
> > >> + ret = -EINVAL;
> > >> + goto out;
> > >> + }
>
> [...]
>
> > >> +
> > >> + ret = sock_map_prog_update(map, prog, NULL, &sockmap_link->link, attach_type);
> > >> + if (ret) {
> > >> + bpf_link_cleanup(&link_primer);
> > >> + goto out;
> > >> + }
> > >> +
> > >> + bpf_prog_inc(prog);
> > > if link was created successfully, it "inherits" prog's refcnt, so you
> > > shouldn't do another bpf_prog_inc()? generic link_create() logic puts
> > > prog only if this function returns error
> >
> > The reason I did this is due to
> >
> > static inline void psock_set_prog(struct bpf_prog **pprog,
> > struct bpf_prog *prog)
> > {
> > prog = xchg(pprog, prog);
> > if (prog)
> > bpf_prog_put(prog);
> > }
> >
> > You can see when the prog is swapped due to link_update or prog_attach,
> > its reference count is decremented by 1. This is necessary for prog_attach,
> > but as you mentioned, indeed, it is not necessary for link-based approach.
> > Let me see whether I can refactor code to make it easy not to increase
> > reference count of prog here.
> >
>
> ah, ok, its another sockmap-specific convention, np
>
> >
> > >
> > >> +
> > >> + return bpf_link_settle(&link_primer);
> > >> +
> > >> +out:
> > >> + bpf_map_put_with_uref(map);
> > >> + return ret;
> > >> +}
> > >> +
> > >> static int sock_map_iter_attach_target(struct bpf_prog *prog,
> > >> union bpf_iter_link_info *linfo,
> > >> struct bpf_iter_aux_info *aux)
> > >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > >> index 9585f5345353..31660c3ffc01 100644
> > >> --- a/tools/include/uapi/linux/bpf.h
> > >> +++ b/tools/include/uapi/linux/bpf.h
> > >> @@ -1135,6 +1135,7 @@ enum bpf_link_type {
> > >> BPF_LINK_TYPE_TCX = 11,
> > >> BPF_LINK_TYPE_UPROBE_MULTI = 12,
> > >> BPF_LINK_TYPE_NETKIT = 13,
> > >> + BPF_LINK_TYPE_SOCKMAP = 14,
> > >> __MAX_BPF_LINK_TYPE,
> > >> };
> > >>
> > >> @@ -6720,6 +6721,10 @@ struct bpf_link_info {
> > >> __u32 ifindex;
> > >> __u32 attach_type;
> > >> } netkit;
> > >> + struct {
> > >> + __u32 map_id;
> > >> + __u32 attach_type;
> > >> + } sockmap;
> > >> };
> > >> } __attribute__((aligned(8)));
> > >>
> > >> --
> > >> 2.43.0
> > >>
next prev parent reply other threads:[~2024-04-03 17:47 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-26 2:21 [PATCH bpf-next v3 0/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-03-26 2:21 ` [PATCH bpf-next v3 1/5] " Yonghong Song
2024-04-02 17:39 ` Eduard Zingerman
2024-04-03 0:06 ` Yonghong Song
2024-04-02 17:45 ` Andrii Nakryiko
2024-04-03 1:08 ` Yonghong Song
2024-04-03 16:43 ` Andrii Nakryiko
2024-04-03 17:47 ` John Fastabend [this message]
2024-04-03 22:09 ` run bpf prog w/o sockmap [was: bpf: Add bpf_link support for sk_msg and sk_skb progs] Martin KaFai Lau
2024-04-04 1:11 ` John Fastabend
2024-04-04 3:31 ` Yonghong Song
2024-04-05 4:41 ` John Fastabend
2024-04-06 1:10 ` Martin KaFai Lau
2024-04-04 3:18 ` [PATCH bpf-next v3 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs Yonghong Song
2024-04-05 4:42 ` John Fastabend
2024-03-26 2:22 ` [PATCH bpf-next v3 2/5] libbpf: Add bpf_link support for BPF_PROG_TYPE_SOCKMAP Yonghong Song
2024-04-02 13:18 ` Eduard Zingerman
2024-04-02 17:46 ` Andrii Nakryiko
2024-04-03 0:07 ` Yonghong Song
2024-03-26 2:22 ` [PATCH bpf-next v3 3/5] bpftool: Add link dump support for BPF_LINK_TYPE_SOCKMAP Yonghong Song
2024-03-27 11:58 ` Quentin Monnet
2024-03-26 2:22 ` [PATCH bpf-next v3 4/5] selftests/bpf: Refactor out helper functions for a few tests Yonghong Song
2024-04-02 13:18 ` Eduard Zingerman
2024-03-26 2:22 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add some tests with new bpf_program__attach_sockmap() APIs Yonghong Song
2024-04-02 13:17 ` Eduard Zingerman
2024-04-02 18:56 ` Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=660d964a1444b_1cf6b20885@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=jakub@cloudflare.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=yonghong.song@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.