From: Johannes Weiner <hannes@cmpxchg.org>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
stable@kernel.org, Michal Hocko <mhocko@suse.cz>,
azurit@pobox.sk, mm-commits@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [merged] mm-memcg-handle-non-error-oom-situations-more-gracefully.patch removed from -mm tree
Date: Wed, 27 Nov 2013 22:13:13 -0500
Message-ID: <20131128031313.GK3556@cmpxchg.org>
In-Reply-To: <alpine.DEB.2.02.1311271826001.5120@chino.kir.corp.google.com>

On Wed, Nov 27, 2013 at 06:38:31PM -0800, David Rientjes wrote:
> On Wed, 27 Nov 2013, Johannes Weiner wrote:
>
> > > The task that is bypassing the memcg charge to the root memcg may not be
> > > the process that is chosen by the oom killer, and it's possible the amount
> > > of memory freed by killing the victim is less than the amount of memory
> > > bypassed.
> >
> > That's true, though unlikely.
> >
>
> Well, the "goto bypass" allows it and it's trivial to cause by
> manipulating /proc/pid/oom_score_adj values to prefer processes with very
> little rss. It will just continue looping and killing processes as they
> are forked and never cause the memcg to free memory below its limit. At
> least the "goto nomem" allows us to free some memory instead of leaking to
> the root memcg.

Yes, that's the better way of doing it, I'll send the patch. Thanks.

> > > Were you targeting these to 3.13 instead? If so, it would have already
> > > appeared in 3.13-rc1 anyway. Is it still a work in progress?
> >
> > I don't know how to answer this question.
> >
>
> It appears as though this work is being developed in Linus's tree rather
> than -mm, so I'm asking if we should consider backing some of it out for
> 3.14 instead.

The changes fix a deadlock problem. Are they creating problems that
are worse than deadlocks, that would justify their revert?

> > > Should we be checking mem_cgroup_margin() here to ensure
> > > task_in_memcg_oom() is still accurate and we haven't raced by freeing
> > > memory?
> >
> > We would have invoked the OOM killer long before this point prior to
> > my patches. There is a line we draw and from that point on we start
> > killing things. I tried to explain multiple times now that there is
> > no race-free OOM killing and I'm tired of it. Convince me otherwise
> > or stop repeating this nonsense.
> >
>
> In our internal kernel we call mem_cgroup_margin() with the order of the
> charge immediately prior to sending the SIGKILL to see if it's still
> needed even after selecting the victim. It makes the race smaller.
>
> It's obvious that after the SIGKILL is sent, either from the kernel or
> from userspace, that memory might subsequently be freed or another process
> might exit before the process killed could even wake up. There's nothing
> we can do about that since we don't have psychic abilities. I think we
> should try to reduce the chance for unnecessary oom killing as much as
> possible, however.

Since we can't physically draw a perfect line, we should strive for a
reasonable and intuitive line. After that it's rapidly diminishing
returns. Killing something after that much reclaim effort without
success is a completely reasonable and intuitive line to draw. It's
also the line that was drawn a long time ago, and we're not breaking
it because of a micro-optimization.
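
For readers following the mem_cgroup_margin() exchange above, the
re-check David describes (look at the remaining margin for a charge of
the given order right before sending SIGKILL) can be sketched roughly
as below. This is a userspace toy model under stated assumptions, not
kernel code: struct toy_memcg, toy_margin() and toy_kill_still_needed()
are hypothetical names invented for illustration, and the real
mem_cgroup_margin() works on kernel-internal counters that are not
modeled here.

```c
#include <stdbool.h>

/*
 * Toy stand-in for a memory cgroup. NOT the kernel's struct mem_cgroup;
 * it only tracks a limit and the current usage, both in pages.
 */
struct toy_memcg {
	unsigned long limit_pages;
	unsigned long usage_pages;
};

/* Pages that could still be charged before the limit is reached. */
static unsigned long toy_margin(const struct toy_memcg *cg)
{
	if (cg->usage_pages >= cg->limit_pages)
		return 0;
	return cg->limit_pages - cg->usage_pages;
}

/*
 * Re-check made just before the kill: a charge of 2^order pages still
 * fails only if the margin is smaller than that. If memory was freed
 * in the meantime, the kill can be skipped.
 */
static bool toy_kill_still_needed(const struct toy_memcg *cg,
				  unsigned int order)
{
	return toy_margin(cg) < (1UL << order);
}
```

The check only narrows the window: memory can still be freed between
the re-check and the victim actually exiting, which is the point both
sides of the thread agree on.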