All of lore.kernel.org
 help / color / mirror / Atom feed
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: linux-mm@kvack.org,  bpf@vger.kernel.org,
	 Suren Baghdasaryan <surenb@google.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@suse.com>,
	 David Rientjes <rientjes@google.com>,
	 Matt Bobrowski <mattbobrowski@google.com>,
	 Song Liu <song@kernel.org>,  Alexei Starovoitov <ast@kernel.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 01/14] mm: introduce bpf struct ops for OOM handling
Date: Wed, 20 Aug 2025 19:22:31 -0700	[thread overview]
Message-ID: <875xehh0rc.fsf@linux.dev> (raw)
In-Reply-To: <CAP01T76xFkhsQKCtCynnHR4t6KyciQ4=VW2jhF8mcZEVBjsF1w@mail.gmail.com> (Kumar Kartikeya Dwivedi's message of "Thu, 21 Aug 2025 02:36:49 +0200")

Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:

> On Thu, 21 Aug 2025 at 02:25, Roman Gushchin <roman.gushchin@linux.dev> wrote:
>>
>> Kumar Kartikeya Dwivedi <memxor@gmail.com> writes:
>>
>> > On Mon, 18 Aug 2025 at 19:01, Roman Gushchin <roman.gushchin@linux.dev> wrote:
>> >>
>> >> Introduce a bpf struct ops for implementing custom OOM handling policies.
>> >>
>> >> The struct ops provides the bpf_handle_out_of_memory() callback,
>> >> which expected to return 1 if it was able to free some memory and 0
>> >> otherwise.
>> >>
>> >> In the latter case it's guaranteed that the in-kernel OOM killer will
>> >> be invoked. Otherwise the kernel also checks the bpf_memory_freed
>> >> field of the oom_control structure, which is expected to be set by
>> >> kfuncs suitable for releasing memory. It's a safety mechanism which
>> >> prevents a bpf program to claim forward progress without actually
>> >> releasing memory. The callback program is sleepable to enable using
>> >> iterators, e.g. cgroup iterators.
>> >>
>> >> The callback receives struct oom_control as an argument, so it can
>> >> easily filter out OOM's it doesn't want to handle, e.g. global vs
>> >> memcg OOM's.
>> >>
>> >> The callback is executed just before the kernel victim task selection
>> >> algorithm, so all heuristics and sysctls like panic on oom,
>> >> sysctl_oom_kill_allocating_task and sysctl_oom_kill_allocating_task
>> >> are respected.
>> >>
>> >> The struct ops also has the name field, which allows to define a
>> >> custom name for the implemented policy. It's printed in the OOM report
>> >> in the oom_policy=<policy> format. "default" is printed if bpf is not
>> >> used or policy name is not specified.
>> >>
>> >> [  112.696676] test_progs invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
>> >>                oom_policy=bpf_test_policy
>> >> [  112.698160] CPU: 1 UID: 0 PID: 660 Comm: test_progs Not tainted 6.16.0-00015-gf09eb0d6badc #102 PREEMPT(full)
>> >> [  112.698165] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-5.fc42 04/01/2014
>> >> [  112.698167] Call Trace:
>> >> [  112.698177]  <TASK>
>> >> [  112.698182]  dump_stack_lvl+0x4d/0x70
>> >> [  112.698192]  dump_header+0x59/0x1c6
>> >> [  112.698199]  oom_kill_process.cold+0x8/0xef
>> >> [  112.698206]  bpf_oom_kill_process+0x59/0xb0
>> >> [  112.698216]  bpf_prog_7ecad0f36a167fd7_test_out_of_memory+0x2be/0x313
>> >> [  112.698229]  bpf__bpf_oom_ops_handle_out_of_memory+0x47/0xaf
>> >> [  112.698236]  ? srso_alias_return_thunk+0x5/0xfbef5
>> >> [  112.698240]  bpf_handle_oom+0x11a/0x1e0
>> >> [  112.698250]  out_of_memory+0xab/0x5c0
>> >> [  112.698258]  mem_cgroup_out_of_memory+0xbc/0x110
>> >> [  112.698274]  try_charge_memcg+0x4b5/0x7e0
>> >> [  112.698288]  charge_memcg+0x2f/0xc0
>> >> [  112.698293]  __mem_cgroup_charge+0x30/0xc0
>> >> [  112.698299]  do_anonymous_page+0x40f/0xa50
>> >> [  112.698311]  __handle_mm_fault+0xbba/0x1140
>> >> [  112.698317]  ? srso_alias_return_thunk+0x5/0xfbef5
>> >> [  112.698335]  handle_mm_fault+0xe6/0x370
>> >> [  112.698343]  do_user_addr_fault+0x211/0x6a0
>> >> [  112.698354]  exc_page_fault+0x75/0x1d0
>> >> [  112.698363]  asm_exc_page_fault+0x26/0x30
>> >> [  112.698366] RIP: 0033:0x7fa97236db00
>> >>
>> >> It's possible to load multiple bpf struct programs. In the case of
>> >> oom, they will be executed one by one in the same order they been
>> >> loaded until one of them returns 1 and bpf_memory_freed is set to 1
>> >> - an indication that the memory was freed. This allows to have
>> >> multiple bpf programs to focus on different types of OOM's - e.g.
>> >> one program can only handle memcg OOM's in one memory cgroup.
>> >> But the filtering is done in bpf - so it's fully flexible.
>> >
>> > I think a natural question here is ordering. Is this ability to have
>> > multiple OOM programs critical right now?
>>
>> Good question. Initially I had only supported a single bpf policy.
>> But then I realized that likely people would want to have different
>> policies handling different parts of the cgroup tree.
>> E.g. a global policy and several policies handling OOMs only
>> in some memory cgroups.
>> So having just a single policy is likely a no go.
>
> If the ordering is more to facilitate scoping, would it then be better
> to support attaching the policy to specific memcg/cgroup?

Well, it has some advantages and disadvantages. First, it will require
way more infrastructure on the memcg side. Second, the interface is not
super clear: we don't want to have a struct ops per cgroup, I guess.
And in many case a single policy for all memcgs is just fine, so asking
the user to attach it to all memcgs is just adding a toil and creating
all kinds of races.
So I see your point, but I'm not yet convinced, to be honest.

Thanks!

  reply	other threads:[~2025-08-21  2:22 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-18 17:01 [PATCH v1 00/14] mm: BPF OOM Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 01/14] mm: introduce bpf struct ops for OOM handling Roman Gushchin
2025-08-19  4:09   ` Suren Baghdasaryan
2025-08-19 20:06     ` Roman Gushchin
2025-08-20 19:34       ` Suren Baghdasaryan
2025-08-20 19:52         ` Roman Gushchin
2025-08-20 20:01           ` Suren Baghdasaryan
2025-08-26 16:23         ` Amery Hung
2025-08-20 11:28   ` Kumar Kartikeya Dwivedi
2025-08-21  0:24     ` Roman Gushchin
2025-08-21  0:36       ` Kumar Kartikeya Dwivedi
2025-08-21  2:22         ` Roman Gushchin [this message]
2025-08-21 15:54           ` Suren Baghdasaryan
2025-08-22 19:27       ` Martin KaFai Lau
2025-08-25 17:00         ` Roman Gushchin
2025-08-26 18:01           ` Martin KaFai Lau
2025-08-26 19:52             ` Alexei Starovoitov
2025-08-27 18:28               ` Roman Gushchin
2025-09-02 17:31               ` Roman Gushchin
2025-09-02 22:30                 ` Martin KaFai Lau
2025-09-02 23:36                   ` Roman Gushchin
2025-10-04  2:00                   ` Roman Gushchin
2025-10-06 23:21                     ` Andrii Nakryiko
2025-10-06 23:52                       ` Roman Gushchin
2025-10-06 23:57                         ` Andrii Nakryiko
2025-10-07  0:41                           ` Roman Gushchin
2025-10-08  1:07                             ` Song Liu
2025-10-08  2:15                               ` Roman Gushchin
2025-10-08  7:03                                 ` Song Liu
2025-10-08 17:02                                   ` Roman Gushchin
2025-10-07  2:25                     ` Martin KaFai Lau
2025-09-03  0:29                 ` Tejun Heo
2025-09-03 23:30                   ` Roman Gushchin
2025-09-04  6:39                     ` Tejun Heo
2025-09-04 14:32                       ` Roman Gushchin
2025-09-04 16:26                         ` Alexei Starovoitov
2025-09-04 16:58                           ` Tejun Heo
2025-08-26 16:56   ` Amery Hung
2025-08-18 17:01 ` [PATCH v1 02/14] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2025-08-20  9:17   ` Kumar Kartikeya Dwivedi
2025-08-20 22:32     ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 03/14] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 04/14] mm: introduce bpf kfuncs to deal with memcg pointers Roman Gushchin
2025-08-20  9:21   ` Kumar Kartikeya Dwivedi
2025-08-20 22:43     ` Roman Gushchin
2025-08-20 23:33       ` Kumar Kartikeya Dwivedi
2025-08-18 17:01 ` [PATCH v1 05/14] mm: introduce bpf_get_root_mem_cgroup() bpf kfunc Roman Gushchin
2025-08-20  9:25   ` Kumar Kartikeya Dwivedi
2025-08-20 22:45     ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 06/14] mm: introduce bpf_out_of_memory() " Roman Gushchin
2025-08-19  4:09   ` Suren Baghdasaryan
2025-08-19 20:16     ` Roman Gushchin
2025-08-20  9:34   ` Kumar Kartikeya Dwivedi
2025-08-20 22:59     ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 07/14] mm: allow specifying custom oom constraint for bpf triggers Roman Gushchin
2025-10-02 16:37   ` ChaosEsque Team
2025-08-18 17:01 ` [PATCH v1 08/14] mm: introduce bpf_task_is_oom_victim() kfunc Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 09/14] bpf: selftests: introduce read_cgroup_file() helper Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 10/14] bpf: selftests: bpf OOM handler test Roman Gushchin
2025-08-20  9:33   ` Kumar Kartikeya Dwivedi
2025-08-20 22:49     ` Roman Gushchin
2025-08-20 20:23   ` Andrii Nakryiko
2025-08-21  0:10     ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 11/14] sched: psi: refactor psi_trigger_create() Roman Gushchin
2025-08-19  4:09   ` Suren Baghdasaryan
2025-08-19 20:28     ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 12/14] sched: psi: implement psi trigger handling using bpf Roman Gushchin
2025-08-19  4:11   ` Suren Baghdasaryan
2025-08-19 22:31     ` Roman Gushchin
2025-08-19 23:31       ` Roman Gushchin
2025-08-20 23:56         ` Suren Baghdasaryan
2025-08-26 17:03   ` Amery Hung
2025-08-18 17:01 ` [PATCH v1 13/14] sched: psi: implement bpf_psi_create_trigger() kfunc Roman Gushchin
2025-08-20 20:30   ` Andrii Nakryiko
2025-08-21  0:36     ` Roman Gushchin
2025-08-22 19:13       ` Andrii Nakryiko
2025-08-22 19:57       ` Martin KaFai Lau
2025-08-25 16:56         ` Roman Gushchin
2025-08-18 17:01 ` [PATCH v1 14/14] bpf: selftests: psi struct ops test Roman Gushchin
2025-08-19  4:08 ` [PATCH v1 00/14] mm: BPF OOM Suren Baghdasaryan
2025-08-19 19:52   ` Roman Gushchin
2025-08-20 21:06 ` Shakeel Butt
2025-08-21  0:01   ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875xehh0rc.fsf@linux.dev \
    --to=roman.gushchin@linux.dev \
    --cc=akpm@linux-foundation.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mattbobrowski@google.com \
    --cc=memxor@gmail.com \
    --cc=mhocko@suse.com \
    --cc=rientjes@google.com \
    --cc=song@kernel.org \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.