From: Michal Hocko <mhocko@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: David Rientjes <rientjes@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org
Subject: Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves
Date: Mon, 2 Dec 2013 21:02:21 +0100
Message-ID: <20131202200221.GC5524@dhcp22.suse.cz>
In-Reply-To: <20131127163435.GA3556@cmpxchg.org>

On Wed 27-11-13 11:34:36, Johannes Weiner wrote:
> On Tue, Nov 26, 2013 at 04:53:47PM -0800, David Rientjes wrote:
> > On Fri, 22 Nov 2013, Johannes Weiner wrote:
> >
> > > But userspace in all likelihood DOES need to take action.
> > >
> > > Reclaim is a really long process. If 5 times doing 12 priority cycles
> > > and scanning thousands of pages is not enough to reclaim a single
> > > page, what does that say about the health of the memcg?
> > >
> > > But more importantly, OOM handling is just inherently racy. A task
> > > might receive the kill signal a split second *after* userspace was
> > > notified. Or a task may exit voluntarily a split second after a
> > > victim was chosen and killed.
> > >
> >
> > That's not true even today without the userspace oom handling proposal
> > currently being discussed if you have a memcg oom handler attached to a
> > parent memcg with access to more memory than an oom child memcg. The oom
> > handler can disable the child memcg's oom killer with memory.oom_control
> > and implement its own policy to deal with any notification of oom.
>
> I was never implying the kernel handler. All the races exist with
> userspace handling as well.
>
> > This patch is required to ensure that in such a scenario that the oom
> > handler sitting in the parent memcg only wakes up when it's required to
> > intervene.
>
> A task could receive an unrelated kill between the OOM notification
> and going to sleep to wait for userspace OOM handling. Or another
> task could exit voluntarily between the notification and waitqueue
> entry, which would again be short-cut by the oom_recover of the exit
> uncharges.
>
> oom:                          other tasks:
> check signal/exiting
>                               could exit or get killed here
> mem_cgroup_oom_trylock()
>                               could exit or get killed here
> mem_cgroup_oom_notify()
>                               could exit or get killed here
> if (userspace_handler)
>   sleep()                     could exit or get killed here
> else
>   oom_kill()
>                               could exit or get killed here
>
> It does not matter where your signal/exiting check is, OOM
> notification can never be race free because OOM is just an arbitrary
> line we draw. We have no idea what all the tasks are up to and how
> close they are to releasing memory. Even if we freeze the whole group
> to handle tasks, it does not change the fact that the userspace OOM
> handler might kill one task and, after the unfreeze, another task
> immediately exits voluntarily or dies from a kill signal it got a
> split second after it was frozen.
>
> You can't fix this. We just have to draw the line somewhere and
> accept that in rare situations the OOM kill was unnecessary.

But we are not talking just about races here. What if the OOM is a
result of an OOM action itself? E.g. a killed task faults memory in
while exiting and hasn't freed its memory yet. Should we notify in
such a case? What would a userspace OOM handler do? (The in-kernel
implementation has an advantage here because it can check the task's
flags.)
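
For illustration, here is a minimal sketch (not the actual mainline
code; the helper name is made up) of the kind of check only the kernel
side can make:

	/* Sketch only: a task that is already dying will release its
	 * memory shortly, so notifying (or killing) on its behalf is
	 * pointless.  Userspace never sees this state. */
	static bool task_is_dying(struct task_struct *task)
	{
		return fatal_signal_pending(task) ||
		       (task->flags & PF_EXITING);
	}
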
> So again, I don't see this patch is doing anything but blur the
> current line and make notification less predictable. And, as someone
> else in this thread already said, it's a user-visible change in
> behavior and would break known tuning usecases.

I would like to understand how such a tuning usecase would work and
how it would break with this change.

Consider the above example. You would get 2 notifications for the very
same OOM condition.

On the other hand, if the encountered exiting task was just a race,
then we basically have two cases. Either there are more tasks racing
(and not all of them are exiting) or there is only one (all are
exiting).

We will not lose any notification in the first case, because the flags
are checked before mem_cgroup_oom_trylock and so one of the tasks would
take the lock and notify.
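
Schematically (the function body is made up, only the ordering matters;
mem_cgroup_oom_trylock/mem_cgroup_oom_notify as in the current code):

	static void memcg_oom_sketch(struct mem_cgroup *memcg)
	{
		/* dying tasks bail out before taking the lock */
		if (fatal_signal_pending(current) ||
		    (current->flags & PF_EXITING))
			return;	/* case 2: everybody is exiting anyway */

		if (!mem_cgroup_oom_trylock(memcg))
			return;	/* case 1: a non-exiting racer owns the OOM */

		mem_cgroup_oom_notify(memcg);
		/* ... kill a victim or wait for the userspace handler ... */
		mem_cgroup_oom_unlock(memcg);
	}
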

The second case is more interesting. Userspace won't get a
notification, but we also know that no action is required because the
OOM will resolve itself. So we should consider whether a notification
would do more good than harm. The tuning usecase would lose one event.
Would such a rare situation skew the statistics that much? On the other
hand, a real OOM killer would do something, which means something will
be killed. I find the latter much worse.

So, all in all: I do agree with you that this path will never be race
free or free of pointless OOM actions. I also agree that drawing the
line is hard. But I am more inclined to suppress the notification when
we already know that _no action_ is required, because IMHO the vast
majority of oom listeners are there to _do_ an action, and that action
is mostly deadly.
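
To make that concrete, this is the kind of listener I mean: a minimal
cgroup-v1 memory.oom_control client (the memcg path is an assumption,
error handling omitted) that blocks until the kernel notifies and then
has to _do_ something:

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <string.h>
	#include <unistd.h>
	#include <sys/eventfd.h>

	int main(void)
	{
		const char *cg = "/sys/fs/cgroup/memory/mygroup"; /* assumed */
		char path[256], reg[64];
		uint64_t events;

		int efd = eventfd(0, 0);

		snprintf(path, sizeof(path), "%s/memory.oom_control", cg);
		int ofd = open(path, O_RDONLY);

		/* register "<eventfd fd> <oom_control fd>" with the group */
		snprintf(path, sizeof(path), "%s/cgroup.event_control", cg);
		int cfd = open(path, O_WRONLY);
		snprintf(reg, sizeof(reg), "%d %d", efd, ofd);
		write(cfd, reg, strlen(reg));

		/* blocks until an OOM notification fires for the group */
		while (read(efd, &events, sizeof(events)) == sizeof(events))
			fprintf(stderr, "OOM event, taking action\n");

		return 0;
	}
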

Finally, if this is too controversial, then I would at least like to
see the same check introduced before we go to sleep for the
oom_kill_disable case, because that is a real bug.

Thanks!
--
Michal Hocko
SUSE Labs