linux-fsdevel.vger.kernel.org archive mirror
From: Yosry Ahmed <yosryahmed@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: "程垲涛 Chengkaitao Cheng" <chengkaitao@didiglobal.com>,
	"tj@kernel.org" <tj@kernel.org>,
	"lizefan.x@bytedance.com" <lizefan.x@bytedance.com>,
	"hannes@cmpxchg.org" <hannes@cmpxchg.org>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"roman.gushchin@linux.dev" <roman.gushchin@linux.dev>,
	"shakeelb@google.com" <shakeelb@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"brauner@kernel.org" <brauner@kernel.org>,
	"muchun.song@linux.dev" <muchun.song@linux.dev>,
	"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
	"zhengqi.arch@bytedance.com" <zhengqi.arch@bytedance.com>,
	"ebiederm@xmission.com" <ebiederm@xmission.com>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"chengzhihao1@huawei.com" <chengzhihao1@huawei.com>,
	"pilgrimtao@gmail.com" <pilgrimtao@gmail.com>,
	"haolee.swjtu@gmail.com" <haolee.swjtu@gmail.com>,
	"yuzhao@google.com" <yuzhao@google.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"vasily.averin@linux.dev" <vasily.averin@linux.dev>,
	"vbabka@suse.cz" <vbabka@suse.cz>,
	"surenb@google.com" <surenb@google.com>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"mcgrof@kernel.org" <mcgrof@kernel.org>,
	"sujiaxun@uniontech.com" <sujiaxun@uniontech.com>,
	"feng.tang@intel.com" <feng.tang@intel.com>,
	"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
	"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"David Rientjes" <rientjes@google.com>
Subject: Re: [PATCH v3 0/2] memcontrol: support cgroup level OOM protection
Date: Tue, 13 Jun 2023 01:36:51 -0700
Message-ID: <CAJD7tkawYZAWKYgttgtPjscnZTARj+QaGZLGiMiSadwC3oCELQ@mail.gmail.com>
In-Reply-To: <ZIgodGWoC/R07eak@dhcp22.suse.cz>

+David Rientjes

On Tue, Jun 13, 2023 at 1:27 AM Michal Hocko <mhocko@suse.com> wrote:
>
> On Sun 04-06-23 01:25:42, Yosry Ahmed wrote:
> [...]
> > There has been a parallel discussion in the cover letter thread of v4
> > [1]. To summarize, at Google, we have been using OOM scores to
> > describe different job priorities in a more explicit way -- regardless
> > of memory usage. It is strictly priority-based OOM killing. Ties are
> > broken based on memory usage.
> >
> > We understand that something like memory.oom.protect has an advantage
> > in the sense that you can skip killing a process if you know that it
> > won't free enough memory anyway, but for an environment where multiple
> > jobs of different priorities are running, we find it crucial to be
> > able to define strict ordering. Some jobs are simply more important
> > than others, regardless of their memory usage.
>
> I do remember that discussion. I am not a great fan of simple
> priority-based interfaces TBH. It sounds like an easy interface but it
> hits complications as soon as you try to define a proper/sensible
> hierarchical semantic. I can see how they might work on leaf memcgs with
> statically assigned priorities but that sounds like a very narrow
> usecase IMHO.

Do you mind elaborating on the problem with the hierarchical semantics?

The way it works with our internal implementation is (imo) sensible
and straightforward from a hierarchy POV. Starting at the OOM memcg
(which can be root), we recursively compare the OOM scores of the
child memcgs and pick the one with the lowest score, until we
arrive at a leaf memcg. Within that leaf, we also define per-process
scores, but these are less important to us.

I am not sure I understand why this is not sensible from a hierarchy
POV, or why it is a very narrow use case. Note that all this is
optional; by default all memcgs are given the same score, and ties are
broken based on per-memcg (or per-process) usage.
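
A minimal sketch of the selection walk described above, written in Python
for brevity. It is not the internal implementation: the `Memcg` class, the
`oom_score` and `usage` fields, and `select_victim_memcg()` are hypothetical
names used only for illustration. It also assumes that a tie between equal
scores goes to the child with the larger usage; the text above only says
that ties are broken based on usage.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Memcg:
    """Hypothetical stand-in for a memory cgroup; not a kernel structure."""
    name: str
    oom_score: int = 0   # same default for every memcg; lowest score is picked
    usage: int = 0       # memory charged to this memcg, in bytes
    children: List["Memcg"] = field(default_factory=list)


def select_victim_memcg(oom_memcg: Memcg) -> Memcg:
    """Descend from the OOM memcg, taking the lowest-score child at each level."""
    current = oom_memcg
    while current.children:
        # Lowest score first; among equal scores, larger usage first
        # (the tie-break direction is an assumption of this sketch).
        current = min(current.children, key=lambda c: (c.oom_score, -c.usage))
    return current
```

With a root that has a child "batch" (score 1) and a child "serving"
(score 10), the walk descends into "batch" no matter how much memory
"serving" uses. If the children of "batch" all keep the default score, the
one with the larger usage is returned.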

>
> I do not think we can afford a plethora of different OOM selection
> algorithms implemented in the kernel. Therefore we should really
> consider a control interface that is as extensible and as much in line
> with the existing interfaces as possible. That is why I am
> really open to the oom protection concept, which fits reasonably well
> with the reclaim protection scheme. After all, the oom killer is just a
> very aggressive method of memory reclaim.
>
> On the other hand I can see a need for customizable OOM victim selection
> functionality. We've been through that discussion on several other
> occasions and the best thing we could come up with was to allow plugging
> BPF into the victim selection process and bypassing the system
> default method. No code has ever materialized from those discussions
> though. Maybe this is the time to revive that idea?

That definitely sounds interesting, and it was brought up before. It
does sound like BPF (or a different customization framework) can be
the answer here. Interested to hear what others think as well.

>
> > It would be great if we can arrive at an interface that serves this
> > use case as well.
> >
> > Thanks!
> >
> > [1] https://lore.kernel.org/linux-mm/CAJD7tkaQdSTDX0Q7zvvYrA3Y4TcvLdWKnN3yc8VpfWRpUjcYBw@mail.gmail.com/
> --
> Michal Hocko
> SUSE Labs
