From: Roman Gushchin <roman.gushchin@linux.dev>
To: Suren Baghdasaryan <surenb@google.com>
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Alexei Starovoitov <ast@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
David Rientjes <rientjes@google.com>,
Josh Don <joshdon@google.com>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org
Subject: Re: [PATCH rfc 00/12] mm: BPF OOM
Date: Tue, 29 Apr 2025 23:01:09 +0000 [thread overview]
Message-ID: <aBFaNcaeYwzoJ13j@google.com> (raw)
In-Reply-To: <CAJuCfpHnND1UJ1ZqiyshPqwbZDfeN41HOUuc7DWQfSM1cATBmQ@mail.gmail.com>
On Tue, Apr 29, 2025 at 03:44:14PM -0700, Suren Baghdasaryan wrote:
> On Sun, Apr 27, 2025 at 8:36 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > This patchset adds an ability to customize the out of memory
> > handling using bpf.
> >
> > It focuses on two parts:
> > 1) OOM handling policy,
> > 2) PSI-based OOM invocation.
> >
> > The idea to use bpf for customizing the OOM handling is not new, but
> > unlike the previous proposal [1], which augmented the existing task
> > ranking-based policy, this one tries to be as generic as possible and
> > leverage the full power of the modern bpf.
> >
> > It provides a generic hook which is called before the existing OOM
> > killer code and allows implementing any policy, e.g. picking a victim
> > task or memory cgroup or potentially even releasing memory in other
> > ways, e.g. deleting tmpfs files (the last one might require some
> > additional but relatively simple changes).
> >
> > The past attempt to implement memory-cgroup aware policy [2] showed
> > that there are multiple opinions on what the best policy is. As it's
> > highly workload-dependent and specific to a concrete way of organizing
> > workloads, the structure of the cgroup tree etc, a customizable
> > bpf-based implementation is preferable over a in-kernel implementation
> > with a dozen on sysctls.
> >
> > The second part is related to the fundamental question on when to
> > declare the OOM event. It's a trade-off between the risk of
> > unnecessary OOM kills and associated work losses and the risk of
> > infinite trashing and effective soft lockups. In the last few years
> > several PSI-based userspace solutions were developed (e.g. OOMd [3] or
> > systemd-OOMd [4]). The common idea was to use userspace daemons to
> > implement custom OOM logic as well as rely on PSI monitoring to avoid
> > stalls. In this scenario the userspace daemon was supposed to handle
> > the majority of OOMs, while the in-kernel OOM killer worked as the
> > last resort measure to guarantee that the system would never deadlock
> > on the memory. But this approach creates additional infrastructure
> > churn: userspace OOM daemon is a separate entity which needs to be
> > deployed, updated, monitored. A completely different pipeline needs to
> > be built to monitor both types of OOM events and collect associated
> > logs. A userspace daemon is more restricted in terms on what data is
> > available to it. Implementing a daemon which can work reliably under a
> > heavy memory pressure in the system is also tricky.
>
> I didn't read the whole patchset yet but want to mention couple
> features that we should not forget:
> - memory reaping. Maybe you already call oom_reap_task_mm() after BPF
> oom-handler kills a process or maybe BPF handler is expected to
> implement it?
> - kill reporting to userspace. I think BPF handler would be expected
> to implement it?
The patchset implements the bpf_oom_kill_process() helper, which kills
the desired process the same way as the kernel oom killer: with the help
of the oom reaper, dumping corresponding stats into dmesg and bumping
corresponding memcg- and system-level stats.
For additional reporting generic bpf<->userspace interaction mechanims
can be used, e.g. ringbuffer.
Thanks!
prev parent reply other threads:[~2025-04-29 23:01 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-28 3:36 [PATCH rfc 00/12] mm: BPF OOM Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 01/12] mm: introduce a bpf hook for OOM handling Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 02/12] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 03/12] bpf: treat fmodret tracing program's arguments as trusted Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 04/12] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 05/12] mm: introduce bpf kfuncs to deal with memcg pointers Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 06/12] mm: introduce bpf_get_root_mem_cgroup() bpf kfunc Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 07/12] bpf: selftests: introduce read_cgroup_file() helper Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 08/12] bpf: selftests: bpf OOM handler test Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 09/12] sched: psi: bpf hook to handle psi events Roman Gushchin
2025-04-28 6:11 ` kernel test robot
2025-04-30 0:28 ` Suren Baghdasaryan
2025-04-30 0:58 ` Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 10/12] mm: introduce bpf_out_of_memory() bpf kfunc Roman Gushchin
2025-04-29 11:46 ` Michal Hocko
2025-04-29 21:31 ` Roman Gushchin
2025-04-30 7:27 ` Michal Hocko
2025-04-30 14:53 ` Roman Gushchin
2025-05-05 8:08 ` Michal Hocko
2025-04-28 3:36 ` [PATCH rfc 11/12] bpf: selftests: introduce open_cgroup_file() helper Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 12/12] bpf: selftests: psi handler test Roman Gushchin
2025-04-28 10:43 ` [PATCH rfc 00/12] mm: BPF OOM Matt Bobrowski
2025-04-28 17:24 ` Roman Gushchin
2025-04-29 1:56 ` Kumar Kartikeya Dwivedi
2025-04-29 15:42 ` Roman Gushchin
2025-05-02 17:26 ` Song Liu
2025-04-29 11:42 ` Michal Hocko
2025-04-29 14:44 ` Roman Gushchin
2025-04-29 21:56 ` Suren Baghdasaryan
2025-04-29 22:17 ` Roman Gushchin
2025-04-29 23:01 ` Suren Baghdasaryan
2025-04-29 22:44 ` Suren Baghdasaryan
2025-04-29 23:01 ` Roman Gushchin [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aBFaNcaeYwzoJ13j@google.com \
--to=roman.gushchin@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=joshdon@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.