From: Roman Gushchin <roman.gushchin@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 LKML <linux-kernel@vger.kernel.org>,
	 Alexei Starovoitov <ast@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@kernel.org>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Andrii Nakryiko <andrii@kernel.org>,
	 JP Kobryn <inwardvessel@gmail.com>,
	 linux-mm <linux-mm@kvack.org>,
	 "open list:CONTROL GROUP (CGROUP)" <cgroups@vger.kernel.org>,
	 bpf <bpf@vger.kernel.org>,
	 Martin KaFai Lau <martin.lau@kernel.org>,
	 Song Liu <song@kernel.org>,
	 Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	 Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v2 06/23] mm: introduce BPF struct ops for OOM handling
Date: Tue, 28 Oct 2025 15:56:31 -0700
Message-ID: <87frb2octc.fsf@linux.dev>
In-Reply-To: <CAADnVQ+iEcMaJ68LNt2XxOeJtdZkCzJwDk9ueovQbASrX7WMdg@mail.gmail.com> (Alexei Starovoitov's message of "Tue, 28 Oct 2025 15:07:49 -0700")

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Tue, Oct 28, 2025 at 11:42 AM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
>>
>> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
>>
>> > On Mon, Oct 27, 2025 at 4:18 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>> >>
>> >> +bool bpf_handle_oom(struct oom_control *oc)
>> >> +{
>> >> +       struct bpf_oom_ops *bpf_oom_ops = NULL;
>> >> +       struct mem_cgroup __maybe_unused *memcg;
>> >> +       int idx, ret = 0;
>> >> +
>> >> +       /* All bpf_oom_ops structures are protected using bpf_oom_srcu */
>> >> +       idx = srcu_read_lock(&bpf_oom_srcu);
>> >> +
>> >> +#ifdef CONFIG_MEMCG
>> >> +       /* Find the nearest bpf_oom_ops traversing the cgroup tree upwards */
>> >> +       for (memcg = oc->memcg; memcg; memcg = parent_mem_cgroup(memcg)) {
>> >> +               bpf_oom_ops = READ_ONCE(memcg->bpf_oom);
>> >> +               if (!bpf_oom_ops)
>> >> +                       continue;
>> >> +
>> >> +               /* Call BPF OOM handler */
>> >> +               ret = bpf_ops_handle_oom(bpf_oom_ops, memcg, oc);
>> >> +               if (ret && oc->bpf_memory_freed)
>> >> +                       goto exit;
>> >> +       }
>> >> +#endif /* CONFIG_MEMCG */
>> >> +
>> >> +       /*
>> >> +        * System-wide OOM or per-memcg BPF OOM handler wasn't successful?
>> >> +        * Try system_bpf_oom.
>> >> +        */
>> >> +       bpf_oom_ops = READ_ONCE(system_bpf_oom);
>> >> +       if (!bpf_oom_ops)
>> >> +               goto exit;
>> >> +
>> >> +       /* Call BPF OOM handler */
>> >> +       ret = bpf_ops_handle_oom(bpf_oom_ops, NULL, oc);
>> >> +exit:
>> >> +       srcu_read_unlock(&bpf_oom_srcu, idx);
>> >> +       return ret && oc->bpf_memory_freed;
>> >> +}
>> >
>> > ...
>> >
>> >> +static int bpf_oom_ops_reg(void *kdata, struct bpf_link *link)
>> >> +{
>> >> +       struct bpf_struct_ops_link *ops_link = container_of(link, struct bpf_struct_ops_link, link);
>> >> +       struct bpf_oom_ops **bpf_oom_ops_ptr = NULL;
>> >> +       struct bpf_oom_ops *bpf_oom_ops = kdata;
>> >> +       struct mem_cgroup *memcg = NULL;
>> >> +       int err = 0;
>> >> +
>> >> +       if (IS_ENABLED(CONFIG_MEMCG) && ops_link->cgroup_id) {
>> >> +               /* Attach to a memory cgroup? */
>> >> +               memcg = mem_cgroup_get_from_ino(ops_link->cgroup_id);
>> >> +               if (IS_ERR_OR_NULL(memcg))
>> >> +                       return PTR_ERR(memcg);
>> >> +               bpf_oom_ops_ptr = bpf_oom_memcg_ops_ptr(memcg);
>> >> +       } else {
>> >> +               /* System-wide OOM handler */
>> >> +               bpf_oom_ops_ptr = &system_bpf_oom;
>> >> +       }
>> >
>> > I don't like the fallback and special case of cgroup_id == 0.
>> > imo it would be cleaner to require CONFIG_MEMCG for this feature
>> > and only allow attach to a cgroup.
>> > There is always a root cgroup that can be attached to and that
>> > handler will be acting as "system wide" oom handler.
>>
>> I thought about it, but then it can't be used on !CONFIG_MEMCG
>> configurations and also before cgroupfs is mounted, root cgroup
>> is created etc.
>
> before that bpf isn't viable either, and oom is certainly not an issue.
>
>> This is why system-wide things are often handled in a
>> special way, e.g. by PSI (grep system_group_pcpu).
>>
>> I think supporting !CONFIG_MEMCG configurations might be useful for
>> some very stripped-down VMs, for example.
>
> I thought I wouldn't need to convince the guy who converted bpf maps
> to memcg and it made it pretty much mandatory for the bpf subsystem :)
> I think the following is long overdue:
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index eb3de35734f0..af60be6d3d41 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -34,6 +34,7 @@ config BPF_SYSCALL
>         select NET_SOCK_MSG if NET
>         select NET_XGRESS if NET
>         select PAGE_POOL if NET
> +       depends on MEMCG
>         default n
>
> With this we can clean up a ton of code.
> Let's not add more hacks just because some weird thing
> still wants !MEMCG. If they do, they will survive without bpf.

Ok, this is bold, but why not?
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

I guess you are going to land it separately?
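
For illustration only, here is a minimal userspace sketch of the lookup
that results from Alexei's suggestion: once the special system-wide slot
is gone, a handler attached to the root cgroup is simply the last one
reached when walking up from the OOMing cgroup. Everything in it is an
assumption made for the example: `struct cg`, `oom_handler_fn`,
`handle_oom()` and the two toy handlers are hypothetical stand-ins, not
the kernel's types or the code from the patch, and SRCU protection and
the `oc->bpf_memory_freed` check are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not struct mem_cgroup. */
struct cg;
typedef bool (*oom_handler_fn)(struct cg *attached_to, struct cg *oom_cg);

struct cg {
	const char *name;
	struct cg *parent;       /* NULL for the root cgroup */
	oom_handler_fn handler;  /* NULL if nothing is attached here */
	int calls;               /* how many times this handler was invoked */
};

/*
 * Walk from the OOMing cgroup towards the root and try every attached
 * handler in turn. Stop at the first one that reports freed memory.
 * A handler on the root cgroup acts as the "system-wide" handler,
 * so no separate fallback slot is needed.
 */
static bool handle_oom(struct cg *oom_cg)
{
	for (struct cg *cg = oom_cg; cg; cg = cg->parent) {
		if (!cg->handler)
			continue;
		cg->calls++;
		if (cg->handler(cg, oom_cg))
			return true;
	}
	/* Nothing helped: the caller would fall back to the default path. */
	return false;
}

/* Toy handler that never manages to free memory. */
static bool handler_fails(struct cg *attached_to, struct cg *oom_cg)
{
	(void)attached_to;
	(void)oom_cg;
	return false;
}

/* Toy handler that always reports success. */
static bool handler_frees(struct cg *attached_to, struct cg *oom_cg)
{
	(void)attached_to;
	(void)oom_cg;
	return true;
}
```

With a tree root <- mid <- leaf, `handler_fails` on leaf and
`handler_frees` on root, `handle_oom(&leaf)` tries leaf first, skips mid
and ends at root, which is the behaviour the separate `system_bpf_oom`
slot provides in the patch as posted.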

