From: Johannes Weiner <hannes@cmpxchg.org>
To: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
cgroups@vger.kernel.org
Subject: Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves
Date: Wed, 27 Nov 2013 18:19:31 -0500 [thread overview]
Message-ID: <20131127231931.GG3556@cmpxchg.org> (raw)
In-Reply-To: <alpine.DEB.2.02.1311271343250.9222@chino.kir.corp.google.com>
On Wed, Nov 27, 2013 at 01:51:20PM -0800, David Rientjes wrote:
> On Wed, 27 Nov 2013, Johannes Weiner wrote:
>
> > > > But more importantly, OOM handling is just inherently racy. A task
> > > > might receive the kill signal a split second *after* userspace was
> > > > notified. Or a task may exit voluntarily a split second after a
> > > > victim was chosen and killed.
> > > >
> > >
> > > That's not true even today without the userspace oom handling proposal
> > > currently being discussed if you have a memcg oom handler attached to a
> > > parent memcg with access to more memory than an oom child memcg. The oom
> > > handler can disable the child memcg's oom killer with memory.oom_control
> > > and implement its own policy to deal with any notification of oom.
> >
> > I was never implying the kernel handler. All the races exist with
> > userspace handling as well.
> >
>
> A process may indeed exit immediately after a different process was oom
> killed. A process may also free memory immediately after a process was
> oom killed.
>
> > > This patch is required to ensure that in such a scenario that the oom
> > > handler sitting in the parent memcg only wakes up when it's required to
> > > intervene.
> >
> > A task could receive an unrelated kill between the OOM notification
> > and going to sleep to wait for userspace OOM handling. Or another
> > task could exit voluntarily between the notification and waitqueue
> > entry, which would again be short-cut by the oom_recover of the exit
> > uncharges.
> >
> > oom: other tasks:
> > check signal/exiting
> > could exit or get killed here
> > mem_cgroup_oom_trylock()
> > could exit or get killed here
> > mem_cgroup_oom_notify()
> > could exit or get killed here
> > if (userspace_handler)
> > sleep() could exit or get killed here
> > else
> > oom_kill()
> > could exit or get killed here
> >
> > It does not matter where your signal/exiting check is, OOM
> > notification can never be race free because OOM is just an arbitrary
> > line we draw. We have no idea what all the tasks are up to and how
> > close they are to releasing memory. Even if we freeze the whole group
> > to handle tasks, it does not change the fact that the userspace OOM
> > handler might kill one task and after the unfreeze another task
> > immediately exits voluntarily or got a kill signal a split second
> > after it was frozen.
> >
> > You can't fix this. We just have to draw the line somewhere and
> > accept that in rare situations the OOM kill was unnecessary. So
> > again, I don't see this patch is doing anything but blur the current
> > line and make notification less predictable. And, as someone else in
> > this thread already said, it's a uservisible change in behavior and
> > would break known tuning usecases.
> >
>
> The patch is drawing the line at "the kernel can no longer do anything to
> free memory", and that's the line where userspace should be notified or a
> process killed by the kernel.
>
> Giving current access to memory reserves in the oom killer is an
> optimization so that all reclaim is exhausted prior to declaring
> that they are necessary, the kernel still has the ability to allow
> that process to exit and free memory.
"they" are necessary?
> This is the same as the oom notifiers within the kernel that free
> memory from s390 and powerpc archs: the kernel still has the ability
> to free memory.
They're not the same at all. One is the kernel freeing memory, the
other is a random coincidence.
It's such an unlikely condition that you are not really helping the
notification to be less racy wrt concurrent memory freeing, which I
tried to explain still exists big time. But it's enough to screw up
somebody's tuning effort by not reporting OOM, even though 60 reclaim
cycles have not produced a single page, just because the last
allocation happened to be in a dying task in that run.
> If you wish to be notified that you've simply reached the memcg
> limit, for whatever reason, you can monitor memory.failcnt or
> register a memory threshold.
Given a machine and a workload, I would like the OOM threshold to be
as predictable and reproducible as possible. We can count on reclaim,
we can't count on the final straw coming from a dying task.
next prev parent reply other threads:[~2013-11-27 23:19 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-31 1:39 [patch] mm, memcg: add memory.oom_control notification for system oom David Rientjes
2013-10-31 5:49 ` Johannes Weiner
2013-11-13 22:19 ` David Rientjes
2013-11-13 23:34 ` Johannes Weiner
2013-11-14 0:56 ` David Rientjes
2013-11-14 3:25 ` Johannes Weiner
2013-11-14 22:57 ` David Rientjes
2013-11-14 23:26 ` [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves David Rientjes
2013-11-14 23:26 ` [patch 2/2] mm, memcg: add memory.oom_control notification for system oom David Rientjes
2013-11-18 18:52 ` Michal Hocko
2013-11-19 1:25 ` David Rientjes
2013-11-19 12:41 ` Michal Hocko
2013-11-18 12:52 ` [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves Michal Hocko
2013-11-18 12:55 ` Michal Hocko
2013-11-19 1:19 ` David Rientjes
2013-11-18 15:41 ` Johannes Weiner
2013-11-18 16:51 ` Michal Hocko
2013-11-19 1:22 ` David Rientjes
2013-11-22 16:51 ` Johannes Weiner
2013-11-27 0:53 ` David Rientjes
2013-11-27 16:34 ` Johannes Weiner
2013-11-27 21:51 ` David Rientjes
2013-11-27 23:19 ` Johannes Weiner [this message]
2013-11-28 0:22 ` David Rientjes
2013-11-28 2:28 ` Johannes Weiner
2013-11-28 2:52 ` David Rientjes
2013-11-28 3:16 ` Johannes Weiner
2013-12-02 20:02 ` Michal Hocko
2013-12-02 21:25 ` Johannes Weiner
2013-12-03 12:04 ` Michal Hocko
2013-12-03 20:17 ` Johannes Weiner
2013-12-03 21:00 ` Michal Hocko
2013-12-03 21:23 ` Johannes Weiner
2013-12-03 23:50 ` David Rientjes
2013-12-04 3:34 ` Johannes Weiner
2013-12-04 11:13 ` Michal Hocko
2013-12-05 0:23 ` David Rientjes
2013-12-09 12:48 ` Michal Hocko
2013-12-09 21:46 ` David Rientjes
2013-12-09 22:51 ` Johannes Weiner
2013-12-09 23:05 ` Johannes Weiner
2014-01-10 0:34 ` David Rientjes
2013-12-10 10:38 ` Michal Hocko
2013-12-11 1:03 ` David Rientjes
2013-12-11 9:55 ` Michal Hocko
2013-12-11 22:40 ` David Rientjes
2013-12-12 10:31 ` Michal Hocko
2013-12-12 10:50 ` Michal Hocko
2013-12-12 12:11 ` Michal Hocko
2013-12-12 12:37 ` Michal Hocko
2013-12-13 23:55 ` David Rientjes
2013-12-17 16:23 ` Michal Hocko
2013-12-17 20:50 ` David Rientjes
2013-12-18 20:04 ` Michal Hocko
2013-12-19 6:09 ` David Rientjes
2013-12-19 14:41 ` Michal Hocko
2014-01-08 0:25 ` Andrew Morton
2014-01-08 10:33 ` Michal Hocko
2014-01-09 14:30 ` [PATCH] memcg: Do not hang on OOM when killed by userspace OOM " Michal Hocko
2014-01-09 21:40 ` David Rientjes
2014-01-10 8:23 ` Michal Hocko
2014-01-10 21:33 ` David Rientjes
2014-01-15 14:26 ` Michal Hocko
2014-01-15 21:19 ` David Rientjes
2014-01-16 10:12 ` Michal Hocko
2014-01-21 6:13 ` David Rientjes
2014-01-21 13:21 ` Michal Hocko
2014-01-09 21:34 ` [patch 1/2] mm, memcg: avoid oom notification when current needs " David Rientjes
2014-01-09 22:47 ` Andrew Morton
2014-01-10 0:01 ` David Rientjes
2014-01-10 0:12 ` Andrew Morton
2014-01-10 0:23 ` David Rientjes
2014-01-10 0:35 ` David Rientjes
2014-01-10 22:14 ` Johannes Weiner
2014-01-12 22:10 ` David Rientjes
2014-01-15 14:34 ` Michal Hocko
2014-01-15 21:23 ` David Rientjes
2014-01-16 9:32 ` Michal Hocko
2014-01-21 5:58 ` David Rientjes
2014-01-21 6:04 ` Greg Kroah-Hartmann
2014-01-21 6:08 ` David Rientjes
2014-01-10 8:30 ` Michal Hocko
2014-01-10 21:38 ` David Rientjes
2014-01-10 22:34 ` Johannes Weiner
2014-01-12 22:14 ` David Rientjes
2013-11-18 15:54 ` [patch] mm, memcg: add memory.oom_control notification for system oom Johannes Weiner
2013-11-18 23:15 ` One Thousand Gnomes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131127231931.GG3556@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.cz \
--cc=rientjes@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox