From: Nikanth Karthikesan <knikanth@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: "Evgeniy Polyakov" <zbr@ioremap.net>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Alan Cox" <alan@lxorguk.ukuu.org.uk>,
	linux-kernel@vger.kernel.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Chris Snook" <csnook@redhat.com>,
	"Arve Hjønnevåg" <arve@android.com>,
	"Paul Menage" <menage@google.com>,
	containers@lists.linux-foundation.org
Subject: Re: [RFC] [PATCH] Cgroup based OOM killer controller
Date: Fri, 23 Jan 2009 15:15:36 +0530
Message-ID: <200901231515.37442.knikanth@suse.de>
In-Reply-To: <alpine.DEB.2.00.0901220218120.2851@chino.kir.corp.google.com>

On Thursday 22 January 2009 15:57:19 David Rientjes wrote:
> On Thu, 22 Jan 2009, Evgeniy Polyakov wrote:
> > > In an exclusive cpuset, a task's memory is restricted to a set of mems
> > > that the administrator has designated.  If it is oom, the kernel must
> > > free memory on those nodes or the next allocation will again trigger an
> > > oom (leading to a needlessly killed task that was in a disjoint
> > > cpuset).
> > >
> > > Really.
> >
> > The whole point of oom-killer is to kill the most appropriate task to
> > free the memory. And while task is selected system-wide and some
> > tunables are added to tweak the behaviour local to some subsystems, this
> > cpuset feature is hardcoded into the selection algorithm.
>
> Of course, because the oom killer must be aware that tasks in disjoint
> cpusets are more likely than not to result in no memory freeing for
> current's subsequent allocations.
>

Yes, the problem is that cpuset does not track the tasks which have allocated
from this node but have since either moved or changed their set of allowable
nodes. Because of that, it cannot limit oom killing to just those tasks and
could kill some innocent tasks at times.

As it is unable to make a deterministic decision the way memcg does, it plays
with the badness value and only suggests, but does not restrict, which tasks
have to be killed.
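
To make the "suggests but does not restrict" point concrete, here is a
minimal userspace sketch, not kernel code. struct task, mems_intersect(),
badness_hint() and pick_victim() are hypothetical names, rss_pages stands in
for the real base score, and the divide-by-8 penalty is an assumption
modelled on the badness() heuristic under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task record; not the kernel's task_struct. */
struct task {
    unsigned long rss_pages;    /* stand-in for the base badness score */
    unsigned long mems_allowed; /* bitmask of memory nodes the task may use */
};

/* True if the two tasks share at least one allowed memory node. */
static bool mems_intersect(const struct task *a, const struct task *b)
{
    return (a->mems_allowed & b->mems_allowed) != 0;
}

/* Heuristic: a task in a disjoint cpuset is penalised, never excluded. */
static unsigned long badness_hint(const struct task *p, const struct task *cur)
{
    unsigned long points = p->rss_pages;

    if (!mems_intersect(p, cur))
        points /= 8; /* assumed penalty factor */
    return points;
}

/*
 * Return the index of the highest-scoring task, or -1 if none is eligible.
 * With strict == true, tasks whose nodes are disjoint from cur's are
 * skipped outright instead of merely being scored lower.
 */
static int pick_victim(const struct task *tasks, size_t n,
                       const struct task *cur, bool strict)
{
    int best = -1;
    unsigned long best_points = 0;

    assert(tasks != NULL && cur != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long points;

        if (strict && !mems_intersect(&tasks[i], cur))
            continue;
        points = strict ? tasks[i].rss_pages : badness_hint(&tasks[i], cur);
        if (best < 0 || points > best_points) {
            best = (int)i;
            best_points = points;
        }
    }
    return best;
}
```

With a 1000-page task on the oom node and a 16000-page task in a disjoint
cpuset, the hint-based selection still picks the disjoint task (16000 / 8 =
2000 > 1000), while the strict variant picks the task that shares the node.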

This bug is present even without this patch.

> > And when some tunable starts doing own calculation, behaviour of this
> > hardcoded feature changes.
>
> Yes, it is possible to elevate oom_adj scores to override the cpuset
> preference.  That's actually intended since it is now possible for the
> administrator to specify that, against the belief of the kernel, that
> killing a task will free memory in these cpuset-constrained ooms.  That's
> probably because it has either been moved to a different cpuset or its set
> of allowable nodes is dynamic.
>

This patch adds one more, easier way for the administrator to override it.
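
For reference, the existing oom_adj override mentioned above can be sketched
as follows. This is a userspace illustration under stated assumptions:
apply_oom_adj() and score() are hypothetical names, the shift-based scaling
and the divide-by-8 cpuset penalty are modelled on the badness() heuristic of
the time, and the special "disable" value of oom_adj is not modelled.

```c
#include <assert.h>

/* Scale a score by oom_adj: positive values shift left, negative shift right. */
static unsigned long apply_oom_adj(unsigned long points, int oom_adj)
{
    /* Assumed tunable range; the special disable value is left out. */
    assert(oom_adj >= -16 && oom_adj <= 15);

    if (oom_adj > 0) {
        if (points == 0)
            points = 1; /* so a positive adjustment always has an effect */
        points <<= oom_adj;
    } else if (oom_adj < 0) {
        points >>= -oom_adj;
    }
    return points;
}

/* Base score, then the assumed cpuset penalty, then the oom_adj scaling. */
static unsigned long score(unsigned long base, int disjoint, int oom_adj)
{
    unsigned long points = base;

    if (disjoint)
        points /= 8; /* assumed penalty for a disjoint cpuset */
    return apply_oom_adj(points, oom_adj);
}
```

Under these assumptions an oom_adj of 3 exactly cancels the cpuset penalty
(x / 8 << 3), and anything higher makes the disjoint task the preferred
victim again, which is the override described in the quoted text.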

> > > Then the scope of this new cgroup is restricted to not being used with
> > > cpusets that could oom.
> >
> > These are perpendicular tasks - cpusets limit one area of the oom
> > handling, cgroup order - another. Some people needs cpusets, others want
> > cgroups. cpusets are not something exceptional so that only they have to
> > be taken into account when doing system-wide operation like OOM
> > condition handling.
>
> A cpuset is a cgroup.  If I am using cpusets, this patch fails to
> adequately allow me to describe my oom preferences for both
> cpuset-constrained ooms and global unconstrained ooms, which is a major
> drawback.
>

The current cpuset oom handling has to be fixed; the very problem of
killing innocent processes exists even without the oom-controller.

Thanks
Nikanth
