public inbox for bpf@vger.kernel.org
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Michal Hocko <mhocko@suse.com>
Cc: Chuyi Zhou <zhouchuyi@bytedance.com>,
	hannes@cmpxchg.org, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, muchun.song@linux.dev, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, wuyun.abel@bytedance.com,
	robin.lu@bytedance.com
Subject: Re: [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task
Date: Mon, 7 Aug 2023 10:28:17 -0700
Message-ID: <ZNEpsUFgKFIAAgrp@P9FQF9L96D.lan>
In-Reply-To: <ZNCXgsZL7bKsCEBM@dhcp22.suse.cz>

On Mon, Aug 07, 2023 at 09:04:34AM +0200, Michal Hocko wrote:
> On Mon 07-08-23 10:21:09, Chuyi Zhou wrote:
> > 
> > 
> > On 2023/8/4 21:34, Michal Hocko wrote:
> > > On Fri 04-08-23 21:15:57, Chuyi Zhou wrote:
> > > [...]
> > > > > +	switch (bpf_oom_evaluate_task(task, oc, &points)) {
> > > > > +		case -EOPNOTSUPP: break; /* No BPF policy */
> > > > > +		case -EBUSY: goto abort; /* abort search process */
> > > > > +		case 0: goto next; /* ignore process */
> > > > > +		default: goto select; /* note the task */
> > > > > +	}

To be honest, I can't say I like it. IMO it's not really using the full bpf
potential and is too attached to the current oom implementation.

First, I'm a bit concerned about the implicit restrictions we apply to bpf programs
which will potentially be executed thousands of times under very heavy memory
pressure. We will need to make sure that they don't allocate (much) memory, don't
take any locks which might deadlock with other memory allocations, etc.
This will potentially require hard restrictions on what these programs can and can't
do, and this is something that the bpf community will have to maintain long-term.

Second, if we're introducing bpf here (of which I'm not yet convinced),
IMO we should use it in a more generic and expressive way.
Instead of adding hooks into the existing oom killer implementation, we can call
a bpf program before invoking the in-kernel oom killer and let it do whatever
it takes to free some memory. E.g. we can provide it with an API to kill individual
tasks as well as all tasks in a cgroup.
This approach is more generic and will allow solving certain problems which
can't be solved by the current oom killer, e.g. deleting files from a tmpfs
instead of killing tasks.
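
To make the control flow concrete, here is a minimal userspace sketch of
"call a policy first, fall back to the in-kernel oom killer". It is not kernel
code: struct oom_ctx, oom_policy_fn, handle_oom(), default_oom_kill() and
drop_tmpfs_policy() are all hypothetical names made up for illustration, and
the "pages freed" accounting is an assumption about how success could be judged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the OOM context; NOT the kernel's struct oom_control. */
struct oom_ctx {
	unsigned long pages_needed;	/* how much memory has to be freed */
	unsigned long pages_freed;	/* updated by whoever handles the OOM */
	bool kernel_killer_ran;		/* set when the fallback path is taken */
};

/* Hypothetical policy hook: returns true if it believes it handled the OOM. */
typedef bool (*oom_policy_fn)(struct oom_ctx *ctx);

/* Stand-in for the existing in-kernel oom killer (assumed to always succeed). */
void default_oom_kill(struct oom_ctx *ctx)
{
	ctx->kernel_killer_ran = true;
	ctx->pages_freed = ctx->pages_needed;
}

/*
 * Run the policy first. Fall back to the default killer if there is no
 * policy, if the policy declines, or if it did not free enough memory.
 */
void handle_oom(struct oom_ctx *ctx, oom_policy_fn policy)
{
	assert(ctx != NULL);

	if (policy && policy(ctx) && ctx->pages_freed >= ctx->pages_needed)
		return;

	default_oom_kill(ctx);
}

/* Example policy: pretend to delete tmpfs files worth 128 pages. */
bool drop_tmpfs_policy(struct oom_ctx *ctx)
{
	ctx->pages_freed += 128;
	return true;
}
```

The point of the sketch is that the policy is free to do anything that frees
memory, and the existing killer stays as a safety net when the policy is absent
or falls short.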

So I think the alternative approach is to provide some sort of interface to
pre-select oom victims. E.g. at the memcg level it can look like:

echo PID >> memory.oom.victim_proc

If the list is empty, the default oom killer is invoked.
If there are tasks, the first one is killed on OOM.
A similar interface can exist to choose between sibling cgroups:

echo CGROUP_NAME >> memory.oom.victim_cgroup
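
The list semantics for the per-task file can be sketched in plain C as below.
This is only a userspace model: struct victim_list, victim_list_append(),
pick_victim() and MAX_VICTIMS are hypothetical names, and removing the chosen
PID from the list once it is picked is my assumption, not something decided.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_VICTIMS 16	/* arbitrary limit, chosen for the sketch only */

/* Hypothetical model of the contents of memory.oom.victim_proc. */
struct victim_list {
	int pids[MAX_VICTIMS];
	size_t count;
};

/* Models `echo PID >> memory.oom.victim_proc`: append to the end of the list. */
int victim_list_append(struct victim_list *vl, int pid)
{
	assert(vl != NULL);

	if (pid <= 0 || vl->count == MAX_VICTIMS)
		return -1;

	vl->pids[vl->count++] = pid;
	return 0;
}

/*
 * Returns the PID to kill on OOM, or 0 if the list is empty, in which case
 * the caller would invoke the default oom killer. The first entry wins and
 * is dropped from the list (assumption: a killed task should not stay listed).
 */
int pick_victim(struct victim_list *vl)
{
	int pid;

	assert(vl != NULL);

	if (vl->count == 0)
		return 0;

	pid = vl->pids[0];
	vl->count--;
	memmove(&vl->pids[0], &vl->pids[1], vl->count * sizeof(vl->pids[0]));
	return pid;
}
```

A list for sibling cgroups could follow the same shape, with names instead of
PIDs.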

This is just a rough idea.

Thanks!

Thread overview: 21+ messages
2023-08-04  9:38 [RFC PATCH 0/2] mm: Select victim using bpf_select_task Chuyi Zhou
2023-08-04  9:38 ` [RFC PATCH 1/2] mm, oom: Introduce bpf_select_task Chuyi Zhou
2023-08-04 11:29   ` Michal Hocko
2023-08-04 13:15     ` Chuyi Zhou
2023-08-04 13:34       ` Michal Hocko
2023-08-07  2:21         ` Chuyi Zhou
2023-08-07  7:04           ` Michal Hocko
2023-08-07 17:28             ` Roman Gushchin [this message]
2023-08-08  8:18               ` Michal Hocko
2023-08-08 21:41                 ` Roman Gushchin
2023-08-09  7:53                   ` Michal Hocko
2023-08-10  4:00                     ` Abel Wu
2023-08-15 19:52                       ` Roman Gushchin
2023-08-10 19:41                     ` Martin KaFai Lau
2023-08-15 19:03                       ` Roman Gushchin
2023-08-14 11:25                     ` Chuyi Zhou
2023-08-22 12:42                       ` Michal Hocko
2023-08-04 11:34   ` Alan Maguire
2023-08-04 23:55     ` Chuyi Zhou
2023-08-07  8:32       ` Michal Hocko
2023-08-04  9:38 ` [RFC PATCH 2/2] bpf: Add OOM policy test Chuyi Zhou
