From: Tejun Heo <tj@kernel.org>
To: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <guro@fb.com>,
linux-mm@kvack.org, Michal Hocko <mhocko@kernel.org>,
Vladimir Davydov <vdavydov.dev@gmail.com>,
Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
Andrew Morton <akpm@linux-foundation.org>,
kernel-team@fb.com, cgroups@vger.kernel.org,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer
Date: Fri, 22 Sep 2017 14:05:19 -0700
Message-ID: <20170922210519.GH828415@devbig577.frc2.facebook.com>
In-Reply-To: <alpine.DEB.2.10.1709221316290.68140@chino.kir.corp.google.com>

Hello,

On Fri, Sep 22, 2017 at 01:39:55PM -0700, David Rientjes wrote:
> Current heuristic based on processes is coupled with per-process
> /proc/pid/oom_score_adj. The proposed
> heuristic has no ability to be influenced by userspace, and it needs one.
> The proposed heuristic based on memory cgroups coupled with Roman's
> per-memcg memory.oom_priority is appropriate and needed. It is not

So, this is where we disagree.  I don't think it's a good design.

> "sophisticated intelligence," it merely allows userspace to protect vital
> memory cgroups when opting into the new features (cgroups compared based
> on size and memory.oom_group) that we very much want.

which can't achieve that goal very well for a wide variety of users.

> > We even change the whole scheduling behaviors and try really hard to
> > not get locked into specific implementation details which exclude
> > future improvements. Guaranteeing OOM killing selection would be
> > crazy. Why would we prevent ourselves from doing things better in the
> > future? We aren't talking about the semantics of read(2) here. This
> > is a kernel emergency mechanism to avoid deadlock at the last moment.
>
> We merely want to prefer other memory cgroups are oom killed on system oom
> conditions before important ones, regardless if the important one is using
> more memory than the others because of the new heuristic this patchset
> introduces. This is exactly the same as /proc/pid/oom_score_adj for the
> current heuristic.

You were arguing that we should lock into a specific heuristic and
guarantee the same behavior.  We shouldn't.

When we introduce a user-visible interface, we're making a lot of
promises.  My point is that we need to be really careful when making
those promises.

> If you have this low priority maintenance job charging memory to the high
> priority hierarchy, you're already misconfigured unless you adjust
> /proc/pid/oom_score_adj because it will oom kill any larger process than
> itself in today's kernels anyway.
>
> A better configuration would be attach this hypothetical low priority
> maintenance job to its own sibling cgroup with its own memory limit to
> avoid exactly that problem: it going berserk and charging too much memory
> to the high priority container that results in one of its processes
> getting oom killed.

And how do you guarantee that across delegation boundaries?  The
points you raise on why the priority should be applied level-by-level
are exactly the same points why this doesn't really work.  OOM killing
priority isn't something which can be distributed across cgroup
hierarchy level-by-level.  The resulting decision tree doesn't make
any sense.

I'm not against adding something which works, but strict level-by-level
comparison isn't the solution.

Thanks.
--
tejun
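
The level-by-level concern in the message above can be made concrete with
a small sketch.  This is one reading of the objection, not code from the
patchset.  The Cgroup class, the tree layout, the usage numbers, and the
convention that a larger oom_priority means "kill first" are all
assumptions made for illustration.  The delegated "prod" subtree holds a
runaway maintenance job that the top-level administrator, who only sets
priorities on "prod" and "batch", cannot see.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cgroup:
    name: str
    usage: int = 0          # memory charged directly here, arbitrary units
    oom_priority: int = 0   # hypothetical knob; larger means "kill first" here
    children: List["Cgroup"] = field(default_factory=list)

    def total_usage(self) -> int:
        # Hierarchical usage: own charge plus everything below.
        return self.usage + sum(c.total_usage() for c in self.children)


def leaves(node: Cgroup) -> List[Cgroup]:
    # Collect all leaf cgroups under node.
    if not node.children:
        return [node]
    out: List[Cgroup] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def pick_level_by_level(root: Cgroup) -> Cgroup:
    # Strict level-by-level walk: at each level compare siblings only,
    # first by priority, then by hierarchical usage, and descend.
    node = root
    while node.children:
        node = max(node.children,
                   key=lambda c: (c.oom_priority, c.total_usage()))
    return node


def pick_largest_leaf(root: Cgroup) -> Cgroup:
    # Size-only comparison across all leaves, ignoring priorities.
    return max(leaves(root), key=lambda c: c.usage)


# Assumed example hierarchy: "prod" is delegated and marked important,
# "batch" is marked as the preferred victim.
root = Cgroup("root", children=[
    Cgroup("prod", oom_priority=0, children=[
        Cgroup("prod/service", usage=10),
        Cgroup("prod/maintenance", usage=900),  # runaway job inside prod
    ]),
    Cgroup("batch", oom_priority=10, children=[
        Cgroup("batch/job", usage=50),
    ]),
])

print(pick_level_by_level(root).name)  # batch/job
print(pick_largest_leaf(root).name)    # prod/maintenance
```

In this assumed layout the strict walk never looks inside "prod", so it
kills the 50-unit batch job while the 900-unit consumer keeps running.
Whether that outcome is acceptable is exactly what the thread is arguing
about; the sketch only shows that the two selection rules can diverge.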