Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs
Date: Wed, 7 May 2025 21:42:40 -0700	[thread overview]
Message-ID: <01f9b38e-d14d-4e35-93aa-d9722a565474@linux.dev> (raw)
In-Reply-To: <CAEf4BzbY9nZk32hoA7UNjfPVeMWzLh8FyVhG2rChoje0wyxxKw@mail.gmail.com>



On 4/23/25 7:19 AM, Andrii Nakryiko wrote:
> On Thu, Apr 10, 2025 at 6:15 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>> Current cgroup prog ordering is appending at attachment time. This is not
>> ideal. In some cases, users want specific ordering at a particular cgroup
>> level. To address this, the existing mprog API seems an ideal solution with
>> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
>>
>> But there are a few obstacles to directly use kernel mprog interface.
>> Currently cgroup bpf progs already support prog attach/detach/replace
>> and link-based attach/detach/replace. For example, in struct
>> bpf_prog_array_item, the cgroup_storage field needs to be together
>> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
>> as the member, which makes it difficult to use kernel mprog interface.
>>
>> In another case, the current cgroup prog detach tries to use the
>> same flag as in attach. This is different from mprog kernel interface
>> which uses flags passed from user space.
>>
>> So to avoid modifying existing behavior, I made the following changes to
>> support mprog API for cgroup progs:
>>   - The support is for prog list at cgroup level. Cross-level prog list
>>     (a.k.a. effective prog list) is not supported.
>>   - Previously, BPF_F_PREORDER is supported only for prog attach, now
>>     BPF_F_PREORDER is also supported by link-based attach.
>>   - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID is supported similar to
>>     kernel mprog but with different implementation.
>>   - For detach and replace, use the existing implementation.
>>   - For attach, detach and replace, the revision for a particular prog
>>     list, associated with a particular attach type, will be updated
>>     by increasing count by 1.
>>
>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>> ---
>>   include/uapi/linux/bpf.h       |   7 ++
>>   kernel/bpf/cgroup.c            | 151 ++++++++++++++++++++++++++++-----
>>   kernel/bpf/syscall.c           |  58 ++++++++-----
>>   tools/include/uapi/linux/bpf.h |   7 ++
>>   4 files changed, 181 insertions(+), 42 deletions(-)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 71d5ac83cf5d..a5c7992e8f7c 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1794,6 +1794,13 @@ union bpf_attr {
>>                                  };
>>                                  __u64           expected_revision;
>>                          } netkit;
>> +                       struct {
>> +                               union {
>> +                                       __u32   relative_fd;
>> +                                       __u32   relative_id;
>> +                               };
>> +                               __u64           expected_revision;
>> +                       } cgroup;
>>                  };
>>          } link_create;
>>
>> diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
>> index 84f58f3d028a..ffd455051131 100644
>> --- a/kernel/bpf/cgroup.c
>> +++ b/kernel/bpf/cgroup.c
>> @@ -624,6 +624,90 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>          return NULL;
>>   }
>>
>> +static struct bpf_prog *get_cmp_prog(struct hlist_head *progs, struct bpf_prog *prog,
>> +                                    u32 flags, u32 id_or_fd, struct bpf_prog_list **ppltmp)
>> +{
>> +       struct bpf_prog *cmp_prog = NULL, *pltmp_prog;
>> +       bool preorder = !!(flags & BPF_F_PREORDER);
> nit: !!() pattern is not necessary when assigning to bool and is just
> a visual and cognitive noise

Okay, will remove !! operators.

>
>> +       struct bpf_prog_list *pltmp;
>> +       bool id = flags & BPF_F_ID;
>> +       bool found;
>> +
>> +       if (id || id_or_fd) {
> let's invert the condition and exit early? also I find "id" as a bool
> very confusing, I think it's fine to just open-coded it in two places
> you actually check for BPF_F_ID flag.

Open-codedflags & BPF_F_ID is okay.

> But also, isn't this `if (id)` part redundant? Would just `if
> (id_or_fd)` be enough?

Yes, id_or_fd should be enough.

>
>
>> +               /* flags must have BPF_F_BEFORE or BPF_F_AFTER */
>> +               if (!(flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               if (id)
>> +                       cmp_prog = bpf_prog_by_id(id_or_fd);
>> +               else
>> +                       cmp_prog = bpf_prog_get(id_or_fd);
> how about we use "anchor" terminology here? So this would be "anchor
> program" or anchor_prog?

anchor_prog sounds good.

>
>> +               if (IS_ERR(cmp_prog))
>> +                       return cmp_prog;
>> +               if (cmp_prog->type != prog->type)
> bpf_prog_put?

Yes. Missed bpf_prog_put. Will fix.

>
> pw-bot: cr
>
>> +                       return ERR_PTR(-EINVAL);
>> +
>> +               found = false;
>> +               hlist_for_each_entry(pltmp, progs, node) {
>> +                       pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
>> +                       if (pltmp_prog == cmp_prog) {
> try keeping nesting minimal:
>
> if (pltmp_prog != cmp_prog)
>      continue;

Good point. Will do.

>
>> +                               if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
>> +                                       return ERR_PTR(-EINVAL);
>> +                               found = true;
>> +                               *ppltmp = pltmp;
> we don't need found flag if we set ppltmp to NULL before loop, and to
> non-NULL if we find the match

Ack.

>
>> +                               break;
>> +                       }
>> +               }
>> +               if (!found)
> bpf_prog_put(cmp_prog)

Again, will do bpf_prog_put before turn error.

>
>> +                       return ERR_PTR(-ENOENT);
>> +       }
>> +
>> +       return cmp_prog;
>> +}
>> +
>> +static int insert_pl_to_hlist(struct bpf_prog_list *pl, struct hlist_head *progs,
>> +                             struct bpf_prog *prog, u32 flags, u32 id_or_fd)
>> +{
>> +       struct hlist_node *last, *last_node = NULL;
>> +       struct bpf_prog_list *pltmp = NULL;
>> +       struct bpf_prog *cmp_prog;
>> +
>> +       /* flags cannot have both BPF_F_BEFORE and BPF_F_AFTER */
>> +       if ((flags & BPF_F_BEFORE) && (flags & BPF_F_AFTER))
>> +               return -EINVAL;
>> +
>> +       cmp_prog = get_cmp_prog(progs, prog, flags, id_or_fd, &pltmp);
> why get_cmp_prog can't return this last_node if we have BPF_F_AFTER
> with no id/fd specified? Then you wouldn't have to special-case
> appending (same for prepending, actually)?

We could do this. Let me try this.

>
>> +       if (IS_ERR(cmp_prog))
>> +               return PTR_ERR(cmp_prog);
>> +
>> +       if (hlist_empty(progs)) {
>> +               hlist_add_head(&pl->node, progs);
>> +       } else {
>> +               hlist_for_each(last, progs) {
>> +                       if (last->next)
>> +                               continue;
>> +                       last_node = last;
>> +                       break;
>> +               }
>> +
>> +               if (!cmp_prog) {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_head(&pl->node, progs);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               } else {
>> +                       if (flags & BPF_F_BEFORE)
>> +                               hlist_add_before(&pl->node, &pltmp->node);
>> +                       else if (flags & BPF_F_AFTER)
>> +                               hlist_add_behind(&pl->node, &pltmp->node);
>> +                       else
>> +                               hlist_add_behind(&pl->node, last_node);
>> +               }
>> +       }
>> +
>> +       return 0;
>> +}
>> +
>>   /**
>>    * __cgroup_bpf_attach() - Attach the program or the link to a cgroup, and
>>    *                         propagate the change to descendants
>> @@ -633,6 +717,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>    * @replace_prog: Previously attached program to replace if BPF_F_REPLACE is set
>>    * @type: Type of attach operation
>>    * @flags: Option flags
>> + * @id_or_fd: Relative prog id or fd
>> + * @revision: bpf_prog_list revision
>>    *
>>    * Exactly one of @prog or @link can be non-null.
>>    * Must be called with cgroup_mutex held.
>> @@ -640,7 +726,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>   static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                                 struct bpf_prog *prog, struct bpf_prog *replace_prog,
>>                                 struct bpf_cgroup_link *link,
>> -                              enum bpf_attach_type type, u32 flags)
>> +                              enum bpf_attach_type type, u32 flags, u32 id_or_fd,
>> +                              u64 revision)
>>   {
>>          u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>>          struct bpf_prog *old_prog = NULL;
>> @@ -656,6 +743,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>              ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>>                  /* invalid combination */
>>                  return -EINVAL;
>> +       if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>> +               /* only either replace or insertion with before/after */
>> +               return -EINVAL;
>>          if (link && (prog || replace_prog))
>>                  /* only either link or prog/replace_prog can be specified */
>>                  return -EINVAL;
>> @@ -663,9 +753,12 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>                  /* replace_prog implies BPF_F_REPLACE, and vice versa */
>>                  return -EINVAL;
>>
>> +
>>          atype = bpf_cgroup_atype_find(type, new_prog->aux->attach_btf_id);
>>          if (atype < 0)
>>                  return -EINVAL;
>> +       if (revision && revision != atomic64_read(&cgrp->bpf.revisions[atype]))
> this is happening under lock, no need for atomic operations

Ack

>
>> +               return -ESTALE;
>>
>>          progs = &cgrp->bpf.progs[atype];
>>
> [...]
>
>> @@ -1063,6 +1161,7 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>          struct bpf_prog_array *effective;
>>          int cnt, ret = 0, i;
>>          int total_cnt = 0;
>> +       u64 revision = 0;
>>          u32 flags;
>>
>>          if (effective_query && prog_attach_flags)
>> @@ -1100,6 +1199,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
>>                  return -EFAULT;
>>          if (copy_to_user(&uattr->query.prog_cnt, &total_cnt, sizeof(total_cnt)))
>>                  return -EFAULT;
>> +       if (!effective_query && from_atype == to_atype)
>> +               revision = atomic64_read(&cgrp->bpf.revisions[from_atype]);
> even here we hold cgroup_mutex

Ack

>
>> +       if (copy_to_user(&uattr->query.revision, &revision, sizeof(revision)))
>> +               return -EFAULT;
>>          if (attr->query.prog_cnt == 0 || !prog_ids || !total_cnt)
>>                  /* return early if user requested only program count + flags */
>>                  return 0;
> [...]
>
>> @@ -1336,7 +1441,9 @@ int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog)
>>          }
>>
>>          err = cgroup_bpf_attach(cgrp, NULL, NULL, link,
>> -                               link->type, BPF_F_ALLOW_MULTI);
>> +                               link->type, BPF_F_ALLOW_MULTI | attr->link_create.flags,
>> +                               attr->link_create.cgroup.relative_fd,
>> +                               attr->link_create.cgroup.expected_revision);
>>          if (err) {
>>                  bpf_link_cleanup(&link_primer);
>>                  goto out_put_cgroup;
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index 9794446bc8c6..48cf855f949f 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -4183,6 +4183,23 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
>>          }
>>   }
>>
>> +static bool bpf_cgroup_prog_attached(enum bpf_prog_type ptype)
> I find this "attached" naming misleading. I think the name should call
> out that we are just checking if program type is cgroup-attaching, so
> maybe something like "is_cgroup_prog_type", or something along those
> lines?
>
>
> But for LSM we need to look at expected_attach_type to be
> BPF_LSM_CGROUP, so maybe pass both prog type and expected attach type?

Yes, let us pass both prog type and expected attach type with
func name is_cgroup_prog_type().

>
>> +{
>> +       switch (ptype) {
>> +       case BPF_PROG_TYPE_CGROUP_DEVICE:
>> +       case BPF_PROG_TYPE_CGROUP_SKB:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK:
>> +       case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
>> +       case BPF_PROG_TYPE_CGROUP_SOCKOPT:
>> +       case BPF_PROG_TYPE_CGROUP_SYSCTL:
>> +       case BPF_PROG_TYPE_SOCK_OPS:
>> +       case BPF_PROG_TYPE_LSM:
>> +               return true;
>> +       default:
>> +               return false;
>> +       }
>> +}
>> +
> [...]

next prev parent reply	other threads:[~2025-05-08  4:42 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-11  1:15 [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 1/4] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
2025-04-12  0:41   ` kernel test robot
2025-05-08  4:18     ` Yonghong Song
2025-04-12  1:13   ` kernel test robot
2025-04-23 23:19   ` Andrii Nakryiko
2025-05-08  4:19     ` Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 2/4] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-04-23 23:19   ` Andrii Nakryiko
2025-05-08  4:42     ` Yonghong Song [this message]
2025-04-11  1:15 ` [RFC PATCH bpf-next 3/4] libbpf: Support link-based cgroup attach with options Yonghong Song
2025-04-11  1:15 ` [RFC PATCH bpf-next 4/4] selftests/bpf: Add two selftests for mprog API based cgroup progs Yonghong Song
2025-04-23 23:20 ` [RFC PATCH bpf-next 0/4] bpf: Implement mprog API on top of existing " Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=01f9b38e-d14d-4e35-93aa-d9722a565474@linux.dev \
    --to=yonghong.song@linux.dev \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.