Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs

From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs
Date: Wed, 7 May 2025 21:42:40 -0700
Message-ID: <01f9b38e-d14d-4e35-93aa-d9722a565474@linux.dev>
In-Reply-To: <CAEf4BzbY9nZk32hoA7UNjfPVeMWzLh8FyVhG2rChoje0wyxxKw@mail.gmail.com>



On 4/23/25 7:19 AM, Andrii Nakryiko wrote:
> On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Current cgroup prog ordering is appending at attachment time. This is not
>> ideal. In some cases, users want specific ordering at a particular cgroup
>> level. To address this, the existing mprog API seems an ideal solution with
>> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
>>
>> But there are a few obstacles to directly use kernel mprog interface.
>> Currently cgroup bpf progs already support prog attach/detach/replace
>> and link-based attach/detach/replace. For example, in struct
>> bpf_prog_array_item, the cgroup_storage field needs to be together
>> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
>> as the member, which makes it difficult to use kernel mprog interface.
>>
>> In another case, the current cgroup prog detach tries to use the
>> same flag as in attach. This is different from mprog kernel interface
>> which uses flags passed from user space.
>>
>> So to avoid modifying existing behavior, I made the following changes to
>> support mprog API for cgroup progs:
>>   - The support is for prog list at cgroup level. Cross-level prog list
>>     (a.k.a. effective prog list) is not supported.
>>   - Previously, BPF_F_PREORDER is supported only for prog attach, now
>>     BPF_F_PREORDER is also supported by link-based attach.
>>   - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID is supported similar to
>>     kernel mprog but with different implementation.
>>   - For detach and replace, use the existing implementation.
>>   - For attach, detach and replace, the revision for a particular prog
>>     list, associated with a particular attach type, will be updated
>>     by increasing count by 1.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/uapi/linux/bpf.h       |   7 ++
>>   kernel/bpf/cgroup.c            | 151 ++++++++++++++++++++++++++++-----
>>   kernel/bpf/syscall.c           |  58 ++++++++-----
>>   tools/include/uapi/linux/bpf.h |   7 ++
>>   4 files changed, 181 insertions(+), 42 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 71d5ac83cf5d..a5c7992e8f7c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1794,6 +1794,13 @@ union bpf_attr {
>>                                  };
>>                                  __u64           expected_revision;
>>                          } netkit;
>> +                       struct {
>> +                               union {
>> +                                       __u32   relative_fd;
>> +                                       __u32   relative_id;
>> +                               };
>> +                               __u64           expected_revision;
>> +                       } cgroup;
>>                  };
>>          } link_create;
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 84f58f3d028a..ffd455051131 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -624,6 +624,90 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>          return NULL;
>>   }
>>
>> +static struct bpf_prog *get_cmp_prog(struct hlist_head *progs, struct bpf_prog *prog,
>> +                                    u32 flags, u32 id_or_fd, struct bpf_prog_list **ppltmp)
>> +{
>> +       struct bpf_prog *cmp_prog = NULL, *pltmp_prog;
>> +       bool preorder = !!(flags & BPF_F_PREORDER);
> nit: !!() pattern is not necessary when assigning to bool and is just
> a visual and cognitive noise

Okay, will remove !! operators.

>
>> +       struct bpf_prog_list *pltmp;
>> +       bool id = flags & BPF_F_ID;
>> +       bool found;
>> +
>> +       if (id || id_or_fd) {
> let's invert the condition and exit early? also I find "id" as a bool
> very confusing, I think it's fine to just open-coded it in two places
> you actually check for BPF_F_ID flag.

Open-coded flags & BPF_F_ID is okay.

> But also, isn't this `if (id)` part redundant? Would just `if
> (id_or_fd)` be enough?

Yes, id_or_fd should be enough.

>
>
>> +               /* flags must have BPF_F_BEFORE or BPF_F_AFTER */
>> +               if (!(flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               if (id)
>> +                       cmp_prog = bpf_prog_by_id(id_or_fd);
>> +               else
>> +                       cmp_prog = bpf_prog_get(id_or_fd);
> how about we use "anchor" terminology here? So this would be "anchor
> program" or anchor_prog?

anchor_prog sounds good.

>
>> +               if (IS_ERR(cmp_prog))
>> +                       return cmp_prog;
>> +               if (cmp_prog->type != prog->type)
> bpf_prog_put?

Yes. Missed bpf_prog_put. Will fix.

>
> pw-bot: cr
>
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               found = false;
>> +               hlist_for_each_entry(pltmp, progs, node) {
>> +                       pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
>> +                       if (pltmp_prog == cmp_prog) {
> try keeping nesting minimal:
>
> if (pltmp_prog != cmp_prog)
>      continue;

Good point. Will do.

>
>> +                               if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
>> +                                       return ERR_PTR(-EINVAL);
>> +                               found = true;
>> +                               *ppltmp = pltmp;
> we don't need found flag if we set ppltmp to NULL before loop, and to
> non-NULL if we find the match

Ack.

>
>> +                               break;
>> +                       }
>> +               }
>> +               if (!found)
> bpf_prog_put(cmp_prog)

Again, will do bpf_prog_put before returning the error.
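
To make the review points on this helper concrete (early exit, "anchor"
naming, dropping the reference on every error path, no separate found
flag), here is a rough userspace sketch. It is not the kernel code:
struct prog, struct prog_list, prog_put(), find_anchor_entry() and the
F_* flags are hypothetical stand-ins for bpf_prog, bpf_prog_list,
bpf_prog_put() and the BPF_F_* flags, and the anchor is assumed to
arrive with a reference already taken.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel types. */
struct prog {
	int type;
	int refcnt;
};

struct prog_list {
	struct prog *prog;
	bool preorder;			/* models pl->flags & BPF_F_PREORDER */
	struct prog_list *next;
};

#define F_BEFORE	(1U << 0)	/* models BPF_F_BEFORE */
#define F_AFTER		(1U << 1)	/* models BPF_F_AFTER */
#define F_PREORDER	(1U << 2)	/* models BPF_F_PREORDER */

static void prog_put(struct prog *p)	/* models bpf_prog_put() */
{
	p->refcnt--;
}

/*
 * Look up the list entry that holds @anchor. On success return 0 and
 * leave the reference with the caller; on failure drop the reference
 * and return a negative errno. *out doubles as the "found" indicator.
 */
static int find_anchor_entry(struct prog_list *head, const struct prog *prog,
			     struct prog *anchor, unsigned int flags,
			     struct prog_list **out)
{
	bool preorder = flags & F_PREORDER;
	struct prog_list *pl;
	int err = -EINVAL;

	*out = NULL;
	if (!anchor)			/* nothing to resolve, exit early */
		return 0;

	if (!(flags & (F_BEFORE | F_AFTER)))
		goto err_put;
	if (anchor->type != prog->type)
		goto err_put;

	for (pl = head; pl; pl = pl->next) {
		if (pl->prog != anchor)
			continue;
		if (pl->preorder != preorder)
			goto err_put;
		*out = pl;
		return 0;
	}
	err = -ENOENT;
err_put:
	prog_put(anchor);
	return err;
}
```

The single err_put label is just one way to guarantee that no error
return leaks the anchor reference; the caller is assumed to drop it
after a successful insertion.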

>
>> +                       return ERR_PTR(-ENOENT);
>> +       }
>> +
>> +       return cmp_prog;
>> +}
>> +
>> +static int insert_pl_to_hlist(struct bpf_prog_list *pl, struct hlist_head *progs,
>> +                             struct bpf_prog *prog, u32 flags, u32 id_or_fd)
>> +{
>> +       struct hlist_node *last, *last_node = NULL;
>> +       struct bpf_prog_list *pltmp = NULL;
>> +       struct bpf_prog *cmp_prog;
>> +
>> +       /* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
>> +       if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
>> +               return -EINVAL;
>> +
>> +       cmp_prog = get_cmp_prog(progs, prog, flags, id_or_fd, &pltmp);
> why get_cmp_prog can't return this last_node if we have BPF_F_AFTER
> with no id/fd specified? Then you wouldn't have to special-case
> appending (same for prepending, actually)?

We could do this. Let me try this.
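
One possible reading of this suggestion, as a userspace sketch only: a
single helper resolves the insertion position for all cases (prepend,
append, before/after an anchor), so the insertion itself is one
statement with no special-casing. The sketch uses a plain singly linked
list and a pointer-to-pointer slot rather than the kernel hlist API;
struct node, insert_slot(), insert_node() and the F_* flags are
hypothetical names, and the flags are assumed to have been validated
already (anchor implies exactly one of before/after).

```c
#include <assert.h>
#include <stddef.h>

#define F_BEFORE	(1U << 0)	/* models BPF_F_BEFORE */
#define F_AFTER		(1U << 1)	/* models BPF_F_AFTER */

struct node {
	int id;
	struct node *next;
};

/*
 * Return the link that the new node should be stored into:
 *   no anchor + F_BEFORE -> head of the list
 *   no anchor otherwise  -> tail of the list (the existing append behavior)
 *   anchor + F_AFTER     -> right behind the anchor
 *   anchor + F_BEFORE    -> the link that currently points at the anchor
 */
static struct node **insert_slot(struct node **head, struct node *anchor,
				 unsigned int flags)
{
	struct node **slot = head;

	if (!anchor) {
		if (flags & F_BEFORE)
			return head;
		while (*slot)
			slot = &(*slot)->next;
		return slot;
	}

	if (flags & F_AFTER)
		return &anchor->next;

	while (*slot != anchor) {
		assert(*slot);		/* anchor must already be in the list */
		slot = &(*slot)->next;
	}
	return slot;
}

static void insert_node(struct node **head, struct node *n,
			struct node *anchor, unsigned int flags)
{
	struct node **slot = insert_slot(head, anchor, flags);

	n->next = *slot;
	*slot = n;
}
```

An empty list needs no separate branch here, since the head link is
both the first and the last slot.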

>
>> +       if (IS_ERR(cmp_prog))
>> +               return PTR_ERR(cmp_prog);
>> +
>> +       if (hlist_empty(progs)) {
>> +               hlist_add_head(&pl->node, progs);
>> +       } else {
>> +               hlist_for_each(last, progs) {
>> +                       if (last->next)
>> +                               continue;
>> +                       last_node = last;
>> +                       break;
>> +               }
>> +
>> +               if (!cmp_prog) {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_head(&pl->node, progs);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               } else {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_before(&pl->node, &pltmp->node);
>> +                       else if (flags & BPF_F_AFTER)
>> +                               hlist_add_behind(&pl->node, &pltmp->node);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   /**
>>    * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
>>    *                         propagate the change to descendants
>> @@ -633,6 +717,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>    * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>>    * @type: Type of attach operation
>>    * @flags: Option flags
>> + * @id_or_fd: Relative prog id or fd
>> + * @revision: bpf_prog_list revision
>>    *
>>    * Exactly one of @prog or @link can be non-null.
>>    * Must be called with cgroup_mutex held.
>> @@ -640,7 +726,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>   static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                                 struct bpf_prog *prog, struct bpf_prog *replace_prog,
>>                                 struct bpf_cgroup_link *link,
>> -                              enum bpf_attach_type type, u32 flags)
>> +                              enum bpf_attach_type type, u32 flags, u32 id_or_fd,
>> +                              u64 revision)
>>   {
>>          u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>>          struct bpf_prog *old_prog = NULL;
>> @@ -656,6 +743,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>              ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>>                  /* invalid combination */
>>                  return -EINVAL;
>> +       if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +               /* only either replace or insertion with before/after */
>> +               return -EINVAL;
>>          if (link && (prog || replace_prog))
>>                  /* only either link or prog/replace_prog can be specified */
>>                  return -EINVAL;
>> @@ -663,9 +753,12 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                  /* replace_prog implies BPF_F_REPLACE, and vice versa */
>>                  return -EINVAL;
>>
>> +
>>          atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
>>          if (atype < 0)
>>                  return -EINVAL;
>> +       if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))
> this is happening under lock, no need for atomic operations

Ack
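
For illustration, the check could reduce to a plain load and compare,
since every reader and writer is serialized by the same lock. This is a
userspace sketch under that assumption; struct cgroup_bpf_model,
NR_ATYPES, check_revision() and bump_revision() are hypothetical names,
and whether the revisions array itself changes type is left to patch
1/4.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_ATYPES 4	/* placeholder size, not the kernel's value */

/* Hypothetical stand-in for the per-attach-type revisions in cgroup_bpf. */
struct cgroup_bpf_model {
	uint64_t revisions[NR_ATYPES];
};

/*
 * Caller is assumed to hold the lock that serializes all updates
 * (cgroup_mutex in the kernel), so no atomic access is needed.
 * An expected revision of 0 means "do not check".
 */
static int check_revision(const struct cgroup_bpf_model *bpf,
			  unsigned int atype, uint64_t expected)
{
	assert(atype < NR_ATYPES);
	if (expected && expected != bpf->revisions[atype])
		return -ESTALE;
	return 0;
}

/* Called after a successful attach, detach or replace, under the same lock. */
static void bump_revision(struct cgroup_bpf_model *bpf, unsigned int atype)
{
	assert(atype < NR_ATYPES);
	bpf->revisions[atype]++;
}
```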

>
>> +               return -ESTALE;
>>
>>          progs = &cgrp->bpf.progs[atype];
>>
> [...]
>
>> @@ -1063,6 +1161,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>          struct bpf_prog_array *effective;
>>          int cnt, ret = 0, i;
>>          int total_cnt = 0;
>> +       u64 revision = 0;
>>          u32 flags;
>>
>>          if (effective_query && prog_attach_flags)
>> @@ -1100,6 +1199,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>                  return -EFAULT;
>>          if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
>>                  return -EFAULT;
>> +       if (!effective_query && from_atype == to_atype)
>> +               revision = atomic64_read(&cgrp->bpf.revisions[from_atype]);
> even here we hold cgroup_mutex

Ack

>
>> +       if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
>> +               return -EFAULT;
>>          if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
>>                  /* return early if user requested only program count + flags */
>>                  return 0;
> [...]
>
>> @@ -1336,7 +1441,9 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>          }
>>
>>          err = cgroup_bpf_attach(cgrp, NULL, NULL, link,
>> -                               link->type, BPF_F_ALLOW_MULTI);
>> +                               link->type, BPF_F_ALLOW_MULTI | attr->link_create.flags,
>> +                               attr->link_create.cgroup.relative_fd,
>> +                               attr->link_create.cgroup.expected_revision);
>>          if (err) {
>>                  bpf_link_cleanup(&link_primer);
>>                  goto out_put_cgroup;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 9794446bc8c6..48cf855f949f 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -4183,6 +4183,23 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>>          }
>>   }
>>
>> +static bool bpf_cgroup_prog_attached(enum bpf_prog_type ptype)
> I find this "attached" naming misleading. I think the name should call
> out that we are just checking if program type is cgroup-attaching, so
> maybe something like "is_cgroup_prog_type", or something along those
> lines?
>
>
> But for LSM we need to look at expected_attach_type to be
> BPF_LSM_CGROUP, so maybe pass both prog type and expected attach type?

Yes, let us pass both the prog type and the expected attach type, and
name the function is_cgroup_prog_type().
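
A minimal sketch of what such a helper might look like, so that LSM
programs count as cgroup programs only when attached as BPF_LSM_CGROUP.
The enums below are local stand-ins with placeholder values so that the
snippet compiles on its own; in the kernel they come from
include/uapi/linux/bpf.h, and the final signature may well differ.

```c
#include <stdbool.h>

/* Local stand-ins; the values are placeholders, not the UAPI values. */
enum bpf_prog_type {
	BPF_PROG_TYPE_KPROBE,
	BPF_PROG_TYPE_CGROUP_DEVICE,
	BPF_PROG_TYPE_CGROUP_SKB,
	BPF_PROG_TYPE_CGROUP_SOCK,
	BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
	BPF_PROG_TYPE_CGROUP_SOCKOPT,
	BPF_PROG_TYPE_CGROUP_SYSCTL,
	BPF_PROG_TYPE_SOCK_OPS,
	BPF_PROG_TYPE_LSM,
};

enum bpf_attach_type {
	BPF_CGROUP_INET_INGRESS,
	BPF_LSM_MAC,
	BPF_LSM_CGROUP,
};

static bool is_cgroup_prog_type(enum bpf_prog_type ptype,
				enum bpf_attach_type atype)
{
	switch (ptype) {
	case BPF_PROG_TYPE_CGROUP_DEVICE:
	case BPF_PROG_TYPE_CGROUP_SKB:
	case BPF_PROG_TYPE_CGROUP_SOCK:
	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
	case BPF_PROG_TYPE_CGROUP_SYSCTL:
	case BPF_PROG_TYPE_SOCK_OPS:
		return true;
	case BPF_PROG_TYPE_LSM:
		/* only cgroup-attached LSM programs qualify */
		return atype == BPF_LSM_CGROUP;
	default:
		return false;
	}
}
```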

>
>> +{
>> +       switch (ptype) {
>> +       case BPF_PROG_TYPE_CGROUP_DEVICE:
>> +       case BPF_PROG_TYPE_CGROUP_SKB:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
>> +       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
>> +       case BPF_PROG_TYPE_CGROUP_SYSCTL:
>> +       case BPF_PROG_TYPE_SOCK_OPS:
>> +       case BPF_PROG_TYPE_LSM:
>> +               return true;
>> +       default:
>> +               return false;
>> +       }
>> +}
>> +
> [...]
