From: Roman Gushchin <guro@fb.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: <linux-mm@kvack.org>, Vladimir Davydov <vdavydov.dev@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
David Rientjes <rientjes@google.com>, Tejun Heo <tj@kernel.org>,
<kernel-team@fb.com>, <cgroups@vger.kernel.org>,
<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [v4 2/4] mm, oom: cgroup-aware OOM killer
Date: Thu, 3 Aug 2017 13:47:51 +0100 [thread overview]
Message-ID: <20170803124751.GA24563@castle.dhcp.TheFacebook.com> (raw)
In-Reply-To: <20170802072900.GA2524@dhcp22.suse.cz>
On Wed, Aug 02, 2017 at 09:29:01AM +0200, Michal Hocko wrote:
> On Tue 01-08-17 19:13:52, Roman Gushchin wrote:
> > On Tue, Aug 01, 2017 at 07:03:03PM +0200, Michal Hocko wrote:
> > > On Tue 01-08-17 16:25:48, Roman Gushchin wrote:
> > > > On Tue, Aug 01, 2017 at 04:54:35PM +0200, Michal Hocko wrote:
> > > [...]
> > > > > I would reap out the oom_kill_process into a separate patch.
> > > >
> > > > It was a separate patch, I've merged it based on Vladimir's feedback.
> > > > No problems, I can divide it back.
> > >
> > > It would make the review slightly more easier
> > > >
> > > > > > -static void oom_kill_process(struct oom_control *oc, const char *message)
> > > > > > +static void __oom_kill_process(struct task_struct *victim)
> > > > >
> > > > > To the rest of the patch. I have to say I do not quite like how it is
> > > > > implemented. I was hoping for something much simpler which would hook
> > > > > into oom_evaluate_task. If a task belongs to a memcg with kill-all flag
> > > > > then we would update the cumulative memcg badness (more specifically the
> > > > > badness of the topmost parent with kill-all flag). Memcg will then
> > > > > compete with existing self contained tasks (oom_badness will have to
> > > > > tell whether points belong to a task or a memcg to allow the caller to
> > > > > deal with it). But it shouldn't be much more complex than that.
> > > >
> > > > I'm not sure, it will be any simpler. Basically I'm doing the same:
> > > > the difference is that you want to iterate over tasks and for each
> > > > task traverse the memcg tree, update per-cgroup oom score and find
> > > > the corresponding memcg(s) with the kill-all flag. I'm doing the opposite:
> > > > traverse the cgroup tree, and for each leaf cgroup iterate over processes.
> > >
> > > Yeah but this doesn't fit very well to the existing scheme so we would
> > > need two different schemes which is not ideal from maint. point of view.
> > > We also do not have to duplicate all the tricky checks we already do in
> > > oom_evaluate_task. So I would prefer if we could try to hook there and
> > > do the special handling there.
> >
> > I hope, that iterating over all tasks just to check if there are
> > in-flight OOM victims might be optimized at some point.
> > That means, we would be able to choose a victim much cheaper.
> > It's not easy, but it feels as a right direction to go.
>
> You would have to count per each oom domain and that sounds quite
> unfeasible to me.
It's hard, but traversing the whole cgroup tree from bottom to top
for each task is just not scalable.
This is exactly why I've choosen a compromise right now: let's
iterate over all tasks, but do it by iterating over the cgroup tree.
>
> > Also, adding new tricks to the oom_evaluate_task() will make the code
> > even more hairy. Some of the existing tricks are useless for memcg selection.
>
> Not sure what you mean but oom_evaluate_task has been usable for both
> global and memcg oom paths so far. I do not see any reason why this
> shouldn't hold for a different oom killing strategy.
Yes, but in both cases we've evaluated tasks, not cgroups.
>
> > > > Also, please note, that even without the kill-all flag the decision is made
> > > > on per-cgroup level (except tasks in the root cgroup).
> > >
> > > Yeah and I am not sure this is a reasonable behavior. Why should we
> > > consider memcgs which are not kill-all as a single entity?
> >
> > I think, it's reasonable to choose a cgroup/container to blow off based on
> > the cgroup oom_priority/size (including hierarchical settings), and then
> > kill one biggest or all tasks depending on cgroup settings.
>
> But that doesn't mean you have to treat even !kill-all memcgs like a
> single entity. In fact we should compare killable entities which is
> either a task or the whole memcg if configured that way.
I believe it's absolutely valid user's intention to prioritize some
cgroups over other, even if only one task should be killed in case of OOM.
Thanks!
Roman
next prev parent reply other threads:[~2017-08-03 12:49 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20170726132718.14806-1-guro@fb.com>
2017-07-26 13:27 ` [v4 1/4] mm, oom: refactor the TIF_MEMDIE usage Roman Gushchin
2017-07-26 13:56 ` Michal Hocko
2017-07-26 14:06 ` Roman Gushchin
2017-07-26 14:24 ` Michal Hocko
2017-07-26 14:44 ` Michal Hocko
2017-07-26 14:50 ` Roman Gushchin
2017-07-26 13:27 ` [v4 2/4] mm, oom: cgroup-aware OOM killer Roman Gushchin
2017-07-27 21:41 ` kbuild test robot
2017-08-01 14:54 ` Michal Hocko
2017-08-01 15:25 ` Roman Gushchin
2017-08-01 17:03 ` Michal Hocko
2017-08-01 18:13 ` Roman Gushchin
2017-08-02 7:29 ` Michal Hocko
2017-08-03 12:47 ` Roman Gushchin [this message]
2017-08-03 13:01 ` Michal Hocko
2017-08-08 23:06 ` David Rientjes
2017-08-14 12:03 ` Roman Gushchin
2017-07-26 13:27 ` [v4 3/4] mm, oom: introduce oom_priority for memory cgroups Roman Gushchin
2017-08-08 23:14 ` David Rientjes
2017-08-14 12:39 ` Roman Gushchin
2017-07-26 13:27 ` [v4 4/4] mm, oom, docs: describe the cgroup-aware OOM killer Roman Gushchin
2017-08-08 23:24 ` David Rientjes
2017-08-14 12:28 ` Roman Gushchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170803124751.GA24563@castle.dhcp.TheFacebook.com \
--to=guro@fb.com \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=kernel-team@fb.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=penguin-kernel@I-love.SAKURA.ne.jp \
--cc=rientjes@google.com \
--cc=tj@kernel.org \
--cc=vdavydov.dev@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox