From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Jakub Sitnicki <jakub@cloudflare.com>,
John Fastabend <john.fastabend@gmail.com>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v4 1/5] bpf: Add bpf_link support for sk_msg and sk_skb progs
Date: Fri, 5 Apr 2024 22:21:19 -0700
Message-ID: <bbfb67d2-2674-41ed-a748-eb364fd22146@linux.dev>
In-Reply-To: <CAEf4Bza+V-DerE=Y2cuNNYgyv0jv0b4RH2r3gXt9=uzZzA2JTQ@mail.gmail.com>
On 4/5/24 1:12 PM, Andrii Nakryiko wrote:
> On Wed, Apr 3, 2024 at 7:53 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Add bpf_link support for sk_msg and sk_skb programs. We have an
>> internal request to support bpf_link for sk_msg programs so that user
>> space can handle them uniformly through the bpf_link based libbpf
>> APIs. Using the bpf_link based libbpf APIs also makes the system more
>> robust by decoupling the prog life cycle from the attachment life
>> cycle.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>  include/linux/bpf.h            |   6 +
>>  include/linux/skmsg.h          |   4 +
>>  include/uapi/linux/bpf.h       |   5 +
>>  kernel/bpf/syscall.c           |   4 +
>>  net/core/sock_map.c            | 268 ++++++++++++++++++++++++++++++++-
>>  tools/include/uapi/linux/bpf.h |   5 +
>>  6 files changed, 284 insertions(+), 8 deletions(-)
>>
> [...]
>
>> @@ -103,7 +111,7 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
>>  		goto put_prog;
>>  	}
>>
>> -	ret = sock_map_prog_update(map, NULL, prog, attr->attach_type);
>> +	ret = sock_map_prog_update(map, NULL, prog, NULL, attr->attach_type);
>>  put_prog:
>>  	bpf_prog_put(prog);
>>  put_map:
>> @@ -1488,21 +1496,79 @@ static int sock_map_prog_lookup(struct bpf_map *map, struct bpf_prog ***pprog,
>>  	return 0;
>>  }
>>
>> +static int sock_map_link_lookup(struct bpf_map *map, struct bpf_link ***plink,
>> +				struct bpf_link *link, bool skip_check, u32 which)
> why not combine prog + link into a single lookup? also it seems like
> sock_map_prog_lookup has some additional EBUSY conditions, do we need
> to replicate them here?
I can combine them together.
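
To make the idea concrete, here is a rough userspace sketch of what a
single combined lookup could look like. It is only an illustration under
stated assumptions: struct prog, struct link, enum slot and
prog_link_lookup() are hypothetical stand-ins rather than kernel code,
and the mutual exclusion between the two verdict slots is my assumption
about which "additional EBUSY conditions" are meant. The link-conflict
test is copied from the quoted sock_map_link_lookup().

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only pointer identity matters. */
struct prog { int id; };
struct link { int id; };

/* Hypothetical selector mirroring the four attach types in the hunk. */
enum slot { MSG_VERDICT, STREAM_PARSER, STREAM_VERDICT, SKB_VERDICT };

struct progs {
	struct prog *msg_parser, *stream_parser, *stream_verdict, *skb_verdict;
	struct link *msg_parser_link, *stream_parser_link;
	struct link *stream_verdict_link, *skb_verdict_link;
};

/* One switch resolves both the prog slot and the link slot. */
static int prog_link_lookup(struct progs *progs, struct prog ***pprog,
			    struct link ***plink, struct link *link,
			    bool skip_link_check, enum slot which)
{
	struct prog **cur_pprog;
	struct link **cur_plink;

	switch (which) {
	case MSG_VERDICT:
		cur_pprog = &progs->msg_parser;
		cur_plink = &progs->msg_parser_link;
		break;
	case STREAM_PARSER:
		cur_pprog = &progs->stream_parser;
		cur_plink = &progs->stream_parser_link;
		break;
	case STREAM_VERDICT:
		/* Assumed EBUSY rule: the two verdict slots exclude each other. */
		if (progs->skb_verdict)
			return -EBUSY;
		cur_pprog = &progs->stream_verdict;
		cur_plink = &progs->stream_verdict_link;
		break;
	case SKB_VERDICT:
		if (progs->stream_verdict)
			return -EBUSY;
		cur_pprog = &progs->skb_verdict;
		cur_plink = &progs->skb_verdict_link;
		break;
	default:
		return -EOPNOTSUPP;
	}

	/* Same link-conflict test as in the quoted sock_map_link_lookup(). */
	if (!skip_link_check &&
	    ((!link && *cur_plink) || (link && link != *cur_plink)))
		return -EBUSY;

	*pprog = cur_pprog;
	*plink = cur_plink;
	return 0;
}
```

With one helper, the prog-side and link-side checks cannot drift apart,
and the caller gets both slot pointers from a single call.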
>
>> +{
>> +	struct sk_psock_progs *progs = sock_map_progs(map);
>> +	struct bpf_link **cur_plink;
>> +
>> +	switch (which) {
>> +	case BPF_SK_MSG_VERDICT:
>> +		cur_plink = &progs->msg_parser_link;
>> +		break;
>> +#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
>> +	case BPF_SK_SKB_STREAM_PARSER:
>> +		cur_plink = &progs->stream_parser_link;
>> +		break;
>> +#endif
>> +	case BPF_SK_SKB_STREAM_VERDICT:
>> +		cur_plink = &progs->stream_verdict_link;
>> +		break;
>> +	case BPF_SK_SKB_VERDICT:
>> +		cur_plink = &progs->skb_verdict_link;
>> +		break;
>> +	default:
>> +		return -EOPNOTSUPP;
>> +	}
>> +
>> +	if (!skip_check && ((!link && *cur_plink) || (link && link != *cur_plink)))
>> +		return -EBUSY;
>> +
>> +	*plink = cur_plink;
>> +	return 0;
>> +}
>> +
>> +/* Handle the following four cases:
>> + * prog_attach: prog != NULL, old == NULL, link == NULL
>> + * prog_detach: prog == NULL, old != NULL, link == NULL
>> + * link_attach: prog != NULL, old == NULL, link != NULL
>> + * link_detach: prog == NULL, old != NULL, link != NULL
>> + */
>>  static int sock_map_prog_update(struct bpf_map *map, struct bpf_prog *prog,
>> -				struct bpf_prog *old, u32 which)
>> +				struct bpf_prog *old, struct bpf_link *link,
>> +				u32 which)
>>  {
>>  	struct bpf_prog **pprog;
>> +	struct bpf_link **plink;
>>  	int ret;
>>
>> +	mutex_lock(&sockmap_mutex);
>> +
>>  	ret = sock_map_prog_lookup(map, &pprog, which);
>>  	if (ret)
>> -		return ret;
>> +		goto out;
>>
>> -	if (old)
>> -		return psock_replace_prog(pprog, prog, old);
>> +	if (!link || prog)
>> +		ret = sock_map_link_lookup(map, &plink, NULL, false, which);
>> +	else
>> +		ret = sock_map_link_lookup(map, &plink, NULL, true, which);
>> +	if (ret)
>> +		goto out;
>> +
>> +	if (old) {
>> +		ret = psock_replace_prog(pprog, prog, old);
>> +		if (!ret)
>> +			*plink = NULL;
>> +		goto out;
>> +	}
>>
>>  	psock_set_prog(pprog, prog);
>> -	return 0;
>> +	if (link)
>> +		*plink = link;
> nit: feels more natural to do
>
> if (old) {
>     psock_replace_prog(...)
> } else {
>     psock_set_prog(...)
> }
>
> it's two alternatives, not one unlikely vs one main use case (but it's minor)
Ack, indeed better.
>
>> +
>> +out:
>> +	mutex_unlock(&sockmap_mutex);
>> +	return ret;
>>  }
>>
>> int sock_map_bpf_prog_query(const union bpf_attr *attr,
>> @@ -1657,6 +1723,192 @@ void sock_map_close(struct sock *sk, long timeout)
>> }
>> EXPORT_SYMBOL_GPL(sock_map_close);
>>
>> +struct sockmap_link {
>> +	struct bpf_link link;
>> +	struct bpf_map *map;
>> +	enum bpf_attach_type attach_type;
>> +};
>> +
>> +static void sock_map_link_release(struct bpf_link *link)
>> +{
>> +	struct sockmap_link *sockmap_link = container_of(link, struct sockmap_link, link);
>> +
>> +	if (sockmap_link->map) {
> nit: if (!sockmap_link->map) return;
>
> and reduce nesting of everything else
>
>> +		WARN_ON_ONCE(sock_map_prog_update(sockmap_link->map, NULL, link->prog, link,
>> +						  sockmap_link->attach_type));
> I think sockmap_link->map access in general has to be always protected
> by sockmap_mutex (I'd do that even for the if above), because it can
> race with force-detach logic at least
Ack. Will fix this.
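
As a rough illustration of the two points above (early return, and
checking ->map only while the lock is held), a userspace sketch could
look like the following. Everything here is hypothetical: the lock is a
plain depth counter standing in for sockmap_mutex, and detach_locked()
stands in for a detach helper that expects the caller to already hold
the lock, which the release path would need so that it does not take
the mutex twice.

```c
#include <stddef.h>

/* Userspace stand-ins; the names are illustrative only. */
struct map { int refs; };
struct sockmap_link_sketch {
	struct map *map;
	int attach_type;
};

static int lock_depth;   /* stand-in for sockmap_mutex: number of holders */
static int detach_calls; /* counts how often the detach helper ran */

static void lock(void)   { lock_depth++; }
static void unlock(void) { lock_depth--; }

/* Hypothetical detach helper that assumes the lock is already held. */
static int detach_locked(struct map *map, int attach_type)
{
	(void)map;
	(void)attach_type;
	detach_calls++;
	return 0;
}

static void map_put(struct map *map) { map->refs--; }

static void link_release_sketch(struct sockmap_link_sketch *sl)
{
	lock();
	/* ->map is only read under the lock, so a racing detach is seen. */
	if (!sl->map)
		goto out;

	detach_locked(sl->map, sl->attach_type);
	map_put(sl->map);
	sl->map = NULL;
out:
	unlock();
}
```

Calling link_release_sketch() a second time is a no-op, because the
first call cleared ->map while holding the lock.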
>
>> +
>> +		mutex_lock(&sockmap_mutex);
>> +		bpf_map_put_with_uref(sockmap_link->map);
>> +		sockmap_link->map = NULL;
>> +		mutex_unlock(&sockmap_mutex);
>> +	}
>> +}
>> +
> [...]
>
>> +	if (old) {
>> +		ret = psock_replace_prog(pprog, prog, old);
>> +		goto out;
>> +	}
>> +
>> +	psock_set_prog(pprog, prog);
>> +
>> +out:
> same nit, feels like
>
> if (old) /* replace */ else /* set */ is more natural, and then you
> can move xchg logic before out: knowing that it's the only success
> case
Ack. Will do.
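
A minimal userspace sketch of that shape, under stated assumptions:
struct prog, struct link, replace_prog(), set_prog() and
link_update_prog_sketch() are hypothetical stand-ins, plain assignments
replace the atomic cmpxchg()/xchg(), and the locking is left out. I am
also assuming that replace fails with -ENOENT when the slot no longer
holds the old prog. The point is only the control flow: replace and set
are two alternatives, failure jumps to out, and the link->prog swap sits
on the single success path.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace stand-ins; refcnt models the prog reference count. */
struct prog { int refcnt; };
struct link { struct prog *prog; };

static void prog_inc(struct prog *p) { p->refcnt++; }
static void prog_put(struct prog *p) { if (p) p->refcnt--; }

/* Succeeds only if the slot still holds 'old' (non-atomic model). */
static int replace_prog(struct prog **pprog, struct prog *prog,
			struct prog *old)
{
	if (*pprog != old)
		return -ENOENT;
	*pprog = prog;
	prog_put(old);
	return 0;
}

/* Unconditionally installs 'prog' and drops whatever was there. */
static void set_prog(struct prog **pprog, struct prog *prog)
{
	struct prog *prev = *pprog;

	*pprog = prog;
	prog_put(prev);
}

static int link_update_prog_sketch(struct link *link, struct prog **pprog,
				   struct prog *prog)
{
	struct prog *old = link->prog;
	int ret = 0;

	if (old)
		ret = replace_prog(pprog, prog, old);
	else
		set_prog(pprog, prog);
	if (ret)
		goto out;

	/* Only reached on success: the link takes its own reference. */
	prog_inc(prog);
	link->prog = prog;
	prog_put(old);
out:
	return ret;
}
```

In this model the slot consumes the caller's reference on the new prog
and the link takes an extra one, so a failed replace leaves both the
link and the new prog's count untouched.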
>
>> +	if (!ret) {
>> +		bpf_prog_inc(prog);
>> +		old = xchg(&link->prog, prog);
>> +		bpf_prog_put(old);
>> +	}
>> +	mutex_unlock(&sockmap_mutex);
>> +	return ret;
>> +}
>> +
> [...]