From: Roman Gushchin <roman.gushchin@linux.dev>
To: Amery Hung <ameryhung@gmail.com>
Cc: Song Liu <song@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	 Alexei Starovoitov <ast@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@kernel.org>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 JP Kobryn <inwardvessel@gmail.com>,
	 linux-mm@kvack.org, cgroups@vger.kernel.org,
	 bpf@vger.kernel.org,  Martin KaFai Lau <martin.lau@kernel.org>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups
Date: Thu, 30 Oct 2025 12:06:42 -0700	[thread overview]
Message-ID: <877bwcus3h.fsf@linux.dev>
In-Reply-To: <CAMB2axMkYS1j=KeECZQ9rnupP8kw7dn1LnGV4udxMp=f=qoEQA@mail.gmail.com> (Amery Hung's message of "Thu, 30 Oct 2025 11:19:52 -0700")

Amery Hung <ameryhung@gmail.com> writes:

> On Thu, Oct 30, 2025 at 11:09 AM Song Liu <song@kernel.org> wrote:
>>
>> On Thu, Oct 30, 2025 at 10:22 AM Roman Gushchin
>> <roman.gushchin@linux.dev> wrote:
>> >
>> > Song Liu <song@kernel.org> writes:
>> >
>> > > On Mon, Oct 27, 2025 at 4:17 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>> > > [...]
>> > >>  struct bpf_struct_ops_value {
>> > >>         struct bpf_struct_ops_common_value common;
>> > >> @@ -1359,6 +1360,18 @@ int bpf_struct_ops_link_create(union bpf_attr *attr)
>> > >>         }
>> > >>         bpf_link_init(&link->link, BPF_LINK_TYPE_STRUCT_OPS, &bpf_struct_ops_map_lops, NULL,
>> > >>                       attr->link_create.attach_type);
>> > >> +#ifdef CONFIG_CGROUPS
>> > >> +       if (attr->link_create.cgroup.relative_fd) {
>> > >> +               struct cgroup *cgrp;
>> > >> +
>> > >> +               cgrp = cgroup_get_from_fd(attr->link_create.cgroup.relative_fd);
>> > >
>> > > We should use "target_fd" here, not relative_fd.
>> > >
>> > > Also, 0 is a valid fd, so we cannot use target_fd == 0 to attach to
>> > > global memcg.
>> >
>> > Yep, but then we need to somehow signal that a cgroup fd was passed,
>> > so that struct ops'es which are not attached to cgroups keep working
>> > as before. And we can't use link_create.attach_type.
>> >
>> > Should I use link_create.flags? E.g. something like adding a new flag:
>> >
>> > @@ -1224,6 +1224,7 @@ enum bpf_perf_event_type {
>> >  #define BPF_F_AFTER            (1U << 4)
>> >  #define BPF_F_ID               (1U << 5)
>> >  #define BPF_F_PREORDER         (1U << 6)
>> > +#define BPF_F_CGROUP           (1U << 7)
>> >  #define BPF_F_LINK             BPF_F_LINK /* 1 << 13 */
>> >
>> >  /* If BPF_F_STRICT_ALIGNMENT is used in BPF_PROG_LOAD command, the
>> >
>> > and then do something like this:
>> >
>> > int bpf_struct_ops_link_create(union bpf_attr *attr)
>> > {
>> >         <...>
>> >         if (attr->link_create.flags & BPF_F_CGROUP) {
>> >                 struct cgroup *cgrp;
>> >
>> >                 cgrp = cgroup_get_from_fd(attr->link_create.target_fd);
>> >                 if (IS_ERR(cgrp)) {
>> >                         err = PTR_ERR(cgrp);
>> >                         goto err_out;
>> >                 }
>> >
>> >                 link->cgroup_id = cgroup_id(cgrp);
>> >                 cgroup_put(cgrp);
>> >         }
>> >
>> > Does it sound right?
>>
>> I believe adding a flag (BPF_F_CGROUP or some other name), is the
>> right solution for this.
>>
>> OTOH, I am not sure whether we want to add cgroup fd/id to the
>> bpf link. I personally prefer the model used by TCP congestion
>> control: the link attaches the struct_ops to a global list, then each
>> user picks a struct_ops from the list. But I do agree this might be
>> overkill for cgroup use cases.
>
> +1.
>
> In TCP congestion control and BPF qdisc's model:
>
> During link_create, both add the struct_ops to a list, where it can be
> indexed by name. The struct_ops is not "active" at this point.
> Then, each has its own interface to 'apply' the struct_ops to a
> socket or a queue: setsockopt() or netlink.
>
> But maybe cgroup-related struct_ops are different.

Both the tcp congestion and qdisc cases are somewhat different because
there already is a way to select between multiple implementations; bpf
just adds another one. In the oom case that's not true: as of today,
there is only one (global) oom killer. Of course we can create
interfaces to allow a user to make a choice. But the question is: do we
want to create such an interface for the oom case specifically (and
later for each new case separately), or is there a place for some
generalization?


Ok, let me summarize the options we discussed here:

1) Make the attachment details (e.g. cgroup_id) part of the struct ops
itself. The attachment happens at reg() time.

  +: It's convenient for complex stateful struct ops'es, because a
      single entity represents a combination of code and data.
  -: No way to attach a single struct ops to multiple entities.

This approach is used by Tejun for the per-cgroup sched_ext prototype;
a rough sketch of the idea follows below.
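
For illustration, here is what 1) could look like from the BPF program
side. The struct name, the cgroup_id field and the callback are
hypothetical (loosely modeled on bpf_oom_ops from this series), not an
existing UAPI:

/* option 1 (sketch): the target cgroup is a member of the struct ops
 * map itself, so attaching to N cgroups means loading N maps.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char _license[] SEC("license") = "GPL";

SEC("struct_ops/handle_out_of_memory")
int BPF_PROG(handle_out_of_memory, struct oom_control *oc)
{
        /* policy logic goes here; 0 means "defer to the kernel oom killer" */
        return 0;
}

SEC(".struct_ops.link")
struct bpf_oom_ops oom_policy = {
        /* the attachment detail lives in the struct ops itself: userspace
         * fills .cgroup_id before load and .reg() attaches to that cgroup
         */
        .cgroup_id            = 0,    /* set by the loader */
        .handle_out_of_memory = (void *)handle_out_of_memory,
};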

2) Make the attachment details part of bpf_link creation. The
attachment still happens at reg() time.

  +: A single struct ops can be attached to multiple entities.
  -: Implementing stateful struct ops'es is harder and requires passing
     an additional argument (some sort of "self") to all callbacks.

I'm using this approach in the bpf oom proposal; see the userspace-side
sketch below.
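
To make 2) more concrete, here is a rough userspace-side sketch matching
the kernel snippet quoted above. BPF_F_CGROUP is the flag proposed in
this thread (not part of the current UAPI) and the helper name is made
up; the rest is plain libbpf:

/* option 2 (sketch): the attachment target is passed at link creation,
 * so a single loaded struct_ops map can be attached to several cgroups.
 */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <bpf/bpf.h>

#ifndef BPF_F_CGROUP
#define BPF_F_CGROUP (1U << 7)  /* proposed in this thread, not yet UAPI */
#endif

static int attach_to_cgroup(int struct_ops_map_fd, const char *cgrp_path)
{
        LIBBPF_OPTS(bpf_link_create_opts, opts,
                .flags = BPF_F_CGROUP,  /* "target_fd is a cgroup fd" */
        );
        int cgrp_fd, link_fd;

        cgrp_fd = open(cgrp_path, O_RDONLY);
        if (cgrp_fd < 0)
                return -errno;

        /* for struct_ops links the first argument is the struct_ops map fd
         * (it shares a union with prog_fd in struct bpf_attr)
         */
        link_fd = bpf_link_create(struct_ops_map_fd, cgrp_fd,
                                  BPF_STRUCT_OPS, &opts);
        close(cgrp_fd);
        return link_fd;
}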

3) Move the attachment out of the .reg() scope entirely. reg() registers
the implementation system-wide, and then some 3rd-party interface
(e.g. cgroupfs) is used to select the implementation.

  +: ?
  -: New hard-coded interfaces might be required to enable bpf-driven
     kernel customization. The "attachment" code is not shared between
     various struct ops cases.
     Implementing stateful struct ops'es is harder and requires passing
     an additional argument (some sort of "self") to all callbacks.

This approach works well for cases where there is already a selection
of implementations (e.g. tcp congestion control algorithms) and bpf is
just adding another one; the setsockopt() example below shows the
selection step.
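
For comparison, the selection step of the existing tcp congestion
control model (the prototypical case for 3)) looks like this from an
application; "bpf_cubic" is just an example, i.e. whatever name the
struct_ops registered, and the helper is made up:

/* option 3: registration only adds the implementation to a global list;
 * a pre-existing, unrelated interface selects it per socket.
 */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

static int use_bpf_cc(int sock_fd)
{
        const char cc[] = "bpf_cubic";  /* name registered by the struct_ops */

        return setsockopt(sock_fd, IPPROTO_TCP, TCP_CONGESTION,
                          cc, sizeof(cc));
}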

I personally lean towards 2), but I can easily implement bpf_oom with
1), and most likely with 3) too, though it's a bit more complicated.

So I guess we all need to come to an agreement on which way to go.

Thanks!


Thread overview: 83+ messages
2025-10-27 23:17 [PATCH v2 00/23] mm: BPF OOM Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 01/23] bpf: move bpf_struct_ops_link into bpf.h Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 02/23] bpf: initial support for attaching struct ops to cgroups Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 15:57     ` Roman Gushchin
2025-10-29 18:01   ` Song Liu
2025-10-29 20:26     ` Roman Gushchin
2025-10-30 17:22     ` Roman Gushchin
2025-10-30 18:03       ` Song Liu
2025-10-30 18:19         ` Amery Hung
2025-10-30 19:06           ` Roman Gushchin [this message]
2025-10-30 21:34             ` Song Liu
2025-10-30 22:42               ` Martin KaFai Lau
2025-10-30 23:14                 ` Roman Gushchin
2025-10-31  0:05                 ` Song Liu
2025-10-30 22:19             ` bpf_st_ops and cgroups. Was: " Alexei Starovoitov
2025-10-30 23:24               ` Roman Gushchin
2025-10-31  3:03                 ` Yafang Shao
2025-10-31  6:14                 ` Song Liu
2025-10-31 11:35                   ` Yafang Shao
2025-10-31 17:37                 ` Alexei Starovoitov
2025-10-29 18:14   ` Tejun Heo
2025-10-29 20:25     ` Roman Gushchin
2025-10-29 20:36       ` Tejun Heo
2025-10-29 21:18         ` Song Liu
2025-10-29 21:27           ` Tejun Heo
2025-10-29 21:37             ` Song Liu
2025-10-29 21:45               ` Tejun Heo
2025-10-30  4:32                 ` Song Liu
2025-10-30 16:13                   ` Tejun Heo
2025-10-30 17:56                     ` Song Liu
2025-10-29 21:53           ` Roman Gushchin
2025-10-29 22:43             ` Alexei Starovoitov
2025-10-29 22:53               ` Tejun Heo
2025-10-29 23:53                 ` Alexei Starovoitov
2025-10-30  0:03                   ` Tejun Heo
2025-10-30  0:16                     ` Alexei Starovoitov
2025-10-30  6:33                       ` Yafang Shao
2025-10-29 21:04   ` Song Liu
2025-10-30  0:43   ` Martin KaFai Lau
2025-10-27 23:17 ` [PATCH v2 03/23] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 04/23] mm: define mem_cgroup_get_from_ino() outside of CONFIG_SHRINKER_DEBUG Roman Gushchin
2025-10-31  8:32   ` Michal Hocko
2025-10-27 23:17 ` [PATCH v2 05/23] mm: declare memcg_page_state_output() in memcontrol.h Roman Gushchin
2025-10-31  8:34   ` Michal Hocko
2025-10-27 23:17 ` [PATCH v2 06/23] mm: introduce BPF struct ops for OOM handling Roman Gushchin
2025-10-27 23:57   ` bot+bpf-ci
2025-10-28 17:45   ` Alexei Starovoitov
2025-10-28 18:42     ` Roman Gushchin
2025-10-28 22:07       ` Alexei Starovoitov
2025-10-28 22:56         ` Roman Gushchin
2025-10-28 21:33   ` Song Liu
2025-10-28 23:24     ` Roman Gushchin
2025-10-30  0:20   ` Martin KaFai Lau
2025-10-30  5:57   ` Yafang Shao
2025-10-30 14:26     ` Roman Gushchin
2025-10-31  9:02   ` Michal Hocko
2025-11-02 21:36     ` Roman Gushchin
2025-11-03 19:00       ` Michal Hocko
2025-11-04  1:45         ` Roman Gushchin
2025-11-04  8:18           ` Michal Hocko
2025-11-04 18:14             ` Roman Gushchin
2025-11-04 19:22               ` Michal Hocko
2025-10-27 23:17 ` [PATCH v2 07/23] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2025-10-31  9:05   ` Michal Hocko
2025-11-02 21:09     ` Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 08/23] mm: introduce BPF kfuncs to deal with memcg pointers Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 16:10     ` Roman Gushchin
2025-10-28 17:12       ` Alexei Starovoitov
2025-10-28 18:03         ` Chris Mason
2025-10-28 18:32           ` Roman Gushchin
2025-10-28 17:42   ` Tejun Heo
2025-10-28 18:12     ` Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 09/23] mm: introduce bpf_get_root_mem_cgroup() BPF kfunc Roman Gushchin
2025-10-27 23:17 ` [PATCH v2 10/23] mm: introduce BPF kfuncs to access memcg statistics and events Roman Gushchin
2025-10-27 23:48   ` bot+bpf-ci
2025-10-28 16:16     ` Roman Gushchin
2025-10-31  9:08   ` Michal Hocko
2025-10-31  9:31 ` [PATCH v2 00/23] mm: BPF OOM Michal Hocko
2025-10-31 16:48   ` Lance Yang
2025-11-02 20:53   ` Roman Gushchin
2025-11-03 18:18     ` Michal Hocko
