From: Yonghong Song <yonghong.song@linux.dev>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team@fb.com, Martin KaFai Lau <martin.lau@kernel.org>
Subject: Re: [PATCH bpf-next v3 2/5] bpf: Implement mprog API on top of existing cgroup progs
Date: Wed, 28 May 2025 10:23:45 -0700 [thread overview]
Message-ID: <5bfb5b19-40bd-4b37-a4a4-00e2a3473447@linux.dev> (raw)
In-Reply-To: <CAEf4BzZi7frCiq_vWfXb=QtNvpv91kf=9CymXNJUxRiPW7fxFQ@mail.gmail.com>
On 5/27/25 2:36 PM, Andrii Nakryiko wrote:
> On Fri, May 23, 2025 at 6:04 PM Yonghong Song <yonghong.song@linux.dev> wrote:
>>
>>
>> On 5/22/25 1:45 PM, Andrii Nakryiko wrote:
>>> On Sat, May 17, 2025 at 9:27 AM Yonghong Song <yonghong.song@linux.dev> wrote:
>>>> Current cgroup prog ordering is appending at attachment time. This is not
>>>> ideal. In some cases, users want specific ordering at a particular cgroup
>>>> level. To address this, the existing mprog API seems an ideal solution with
>>>> supporting BPF_F_BEFORE and BPF_F_AFTER flags.
>>>>
>>>> But there are a few obstacles to directly use kernel mprog interface.
>>>> Currently cgroup bpf progs already support prog attach/detach/replace
>>>> and link-based attach/detach/replace. For example, in struct
>>>> bpf_prog_array_item, the cgroup_storage field needs to be together
>>>> with bpf prog. But the mprog API struct bpf_mprog_fp only has bpf_prog
>>>> as the member, which makes it difficult to use kernel mprog interface.
>>>>
>>>> In another case, the current cgroup prog detach tries to use the
>>>> same flag as in attach. This is different from mprog kernel interface
>>>> which uses flags passed from user space.
>>>>
>>>> So to avoid modifying existing behavior, I made the following changes to
>>>> support mprog API for cgroup progs:
>>>> - The support is for prog list at cgroup level. Cross-level prog list
>>>> (a.k.a. effective prog list) is not supported.
>>>> - Previously, BPF_F_PREORDER is supported only for prog attach, now
>>>> BPF_F_PREORDER is also supported by link-based attach.
>>>> - For attach, BPF_F_BEFORE/BPF_F_AFTER/BPF_F_ID/BPF_F_LINK is supported
>>>> similar to kernel mprog but with different implementation.
>>>> - For detach and replace, use the existing implementation.
>>>> - For attach, detach and replace, the revision for a particular prog
>>>> list, associated with a particular attach type, will be updated
>>>> by increasing count by 1.
>>>>
>>>> Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
>>>> ---
>>>> include/uapi/linux/bpf.h | 7 ++
>>>> kernel/bpf/cgroup.c | 195 +++++++++++++++++++++++++++++----
>>>> kernel/bpf/syscall.c | 43 +++++---
>>>> tools/include/uapi/linux/bpf.h | 7 ++
>>>> 4 files changed, 214 insertions(+), 38 deletions(-)
>>>>
[...]
>>>> + }
>>>> +
>>>> + if (link) {
>>>> + anchor_link = bpf_get_anchor_link(flags, id_or_fd, prog->type);
>>>> + if (IS_ERR(anchor_link))
>>>> + return ERR_PTR(PTR_ERR(anchor_link));
>>>> + anchor_prog = anchor_link->prog;
>>>> + } else if (id || id_or_fd) {
>>>> + anchor_prog = bpf_get_anchor_prog(flags, id_or_fd, prog->type);
>>>> + if (IS_ERR(anchor_prog))
>>>> + return ERR_PTR(PTR_ERR(anchor_prog));
>>>> + }
>>>> +
>>>> + if (!anchor_prog) {
>>>> + /* if there is no anchor_prog, then BPF_F_PREORDER doesn't matter
>>>> + * since either prepend or append to a combined list of progs will
>>>> + * end up with correct result.
>>>> + */
>>>> + hlist_for_each_entry(pltmp, progs, node) {
>>>> + if (flags & BPF_F_BEFORE)
>>>> + return pltmp;
>>>> + if (pltmp->node.next)
>>>> + continue;
>>>> + return pltmp;
>>>> + }
>>>> + return NULL;
>>>> + }
>>>> +
>>>> + hlist_for_each_entry(pltmp, progs, node) {
>>>> + pltmp_prog = pltmp->link ? pltmp->link->link.prog : pltmp->prog;
>>>> + if (pltmp_prog != anchor_prog)
>>>> + continue;
>>>> + if (!!(pltmp->flags & BPF_F_PREORDER) != preorder)
>>>> + goto out;
>>> hm... thinking about this a bit more, is it illegal to have the same
>>> BPF program attached as PREORDER and POSTORDER? That seems legit to
>>> me, do we artificially disallow this?
>> Good question, in find_attach_entry(), we have
>>
>> hlist_for_each_entry(pl, progs, node) {
>> if (prog && pl->prog == prog && prog != replace_prog)
>> /* disallow attaching the same prog twice */
>> return ERR_PTR(-EINVAL);
>> if (link && pl->link == link)
>> /* disallow attaching the same link twice */
>> return ERR_PTR(-EINVAL);
>> }
>>
>> Basically, two same progs are not allowed. Here we didn't check PREORDER flag.
>> Should we relax this for this patch set?
> So with BPF link-based attachment we already allow multiple same BPF
> programs to be attached, regardless of PREORDER. I don't think we need
> to make any relaxations for old-style program attachment. But your
> mprog API is, effectively ignoring link stuff and checking for
> uniqueness of the program (regardless of whether it came from link or
> not), which is problematic, I think, no?
You are correct. I should check link instead of prog for link-based
attachment in mprog. Will fix.
>
> [...]
>
>>>> @@ -640,7 +765,8 @@ static struct bpf_prog_list *find_attach_entry(struct hlist_head *progs,
>>>> static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>>> struct bpf_prog *prog, struct bpf_prog *replace_prog,
>>>> struct bpf_cgroup_link *link,
>>>> - enum bpf_attach_type type, u32 flags)
>>>> + enum bpf_attach_type type, u32 flags, u32 id_or_fd,
>>>> + u64 revision)
>>>> {
>>>> u32 saved_flags = (flags & (BPF_F_ALLOW_OVERRIDE | BPF_F_ALLOW_MULTI));
>>>> struct bpf_prog *old_prog = NULL;
>>>> @@ -656,6 +782,9 @@ static int __cgroup_bpf_attach(struct cgroup *cgrp,
>>>> ((flags & BPF_F_REPLACE) && !(flags & BPF_F_ALLOW_MULTI)))
>>>> /* invalid combination */
>>>> return -EINVAL;
>>>> + if ((flags & BPF_F_REPLACE) && (flags & (BPF_F_BEFORE | BPF_F_AFTER)))
>>> but can it be that neither is set?
>> I would say it is possible. In that case, the new prog is appended to
>> the end of prog list.
> ok, makes sense
>
>>>> + /* only either replace or insertion with before/after */
>>>> + return -EINVAL;
>>>> if (link && (prog || replace_prog))
>>>> /* only either link or prog/replace_prog can be specified */
>>>> return -EINVAL;
>>> [...]
next prev parent reply other threads:[~2025-05-28 17:23 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-05-17 16:27 [PATCH bpf-next v3 0/5] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-05-17 16:27 ` [PATCH bpf-next v3 1/5] cgroup: Add bpf prog revisions to struct cgroup_bpf Yonghong Song
2025-05-17 16:27 ` [PATCH bpf-next v3 2/5] bpf: Implement mprog API on top of existing cgroup progs Yonghong Song
2025-05-22 20:45 ` Andrii Nakryiko
2025-05-24 1:03 ` Yonghong Song
2025-05-27 21:36 ` Andrii Nakryiko
2025-05-28 17:23 ` Yonghong Song [this message]
2025-05-17 16:27 ` [PATCH bpf-next v3 3/5] libbpf: Support link-based cgroup attach with options Yonghong Song
2025-05-17 16:27 ` [PATCH bpf-next v3 4/5] selftests/bpf: Move some tc_helpers.h functions to test_progs.h Yonghong Song
2025-05-17 16:27 ` [PATCH bpf-next v3 5/5] selftests/bpf: Add two selftests for mprog API based cgroup progs Yonghong Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5bfb5b19-40bd-4b37-a4a4-00e2a3473447@linux.dev \
--to=yonghong.song@linux.dev \
--cc=andrii.nakryiko@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.