public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Mandeep Singh Baines <msb@chromium.org>
To: David Rientjes <rientjes@google.com>
Cc: Mandeep Singh Baines <msb@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Rik van Riel <riel@redhat.com>, Ying Han <yinghan@google.com>,
	linux-kernel@vger.kernel.org, gspencer@chromium.org,
	piman@chromium.org, wad@chromium.org, olofj@chromium.org
Subject: Re: [PATCH] oom: create a resource limit for oom_adj
Date: Thu, 11 Nov 2010 10:30:50 -0800	[thread overview]
Message-ID: <20101111183050.GI7363@google.com> (raw)
In-Reply-To: <alpine.DEB.2.00.1011102328080.7571@chino.kir.corp.google.com>

David Rientjes (rientjes@google.com) wrote:
> On Wed, 10 Nov 2010, Mandeep Singh Baines wrote:
> 
> > For ChromiumOS, we'd like to be able to oom_adj a process up/down
> > as its leaves/enters the foreground. Currently, it is not possible
> > to oom_adj down without CAP_SYS_RESOURCE. This patch creates a new
> > resource limit, RLIMIT_OOMADJ, which is works in a similar fashion
> > to RLIMIT_NICE. This allows a process's oom_adj to be lowered
> > without CAP_SYS_RESOURCE as long as the new value is greater
> > than the resource limit.
> > 
> 
> First of all, oom_adj is deprecated and scheduled for removal in a couple 
> of years (see Documentation/feature-removal-schedule.txt) so any work in 
> this area should be targeting oom_score_adj instead.
> 

Ah. Thanks for the pointer.

> What is the anticipated use case for this?  We know that you want to lower 
> oom_adj without CAP_SYS_RESOURCE, but what's the expected behavior when an 
> app moves from foreground to background?  I assume it's something like 

The focus here is the web browser's tabs. In our case, each is a process. If
OOM is going to kill a process, you'd rather it kill the tab you looked at
hours ago instead of the one you're looking at now. So you'd like to have a
policy where the LRU tab gets killed first. We'd like to use oom_score_adj
as the mechanism to implement an LRU policy like this.

> having an oom_adj of 0 in the background and +15 in the foreground.  If 
> so, does /proc/sys/vm/oom_kill_allocating_task get you most of what you're 
> looking for?
> 

As explained above, oom_kill_allocating_task won't give us what we want.

> I'm wondering if we can avoid yet another resource limit for something 
> like this.
> 
> > Alternative considered:
> > 
> > * a setuid binary
> > * a daemon with CAP_SYS_RESOURCE
> > 
> > Since you don't wan't all processes to be able to reduce their
> > oom_adj, a setuid or daemon implementation would be complex. The
> > alternatives also have much higher overhead.
> > 
> 
> What do you anticipate will be writing to oom_score_adj with this patch, 
> the app itself?
> 

A process in the browser session will do the adusting. We'd rather not give
it CAP_SYS_RESOURCE. It should only be allowed to change oom_score_adj up
and down within the bounds set by the administrator. Analagous to renice()
which we also do using a similar policy.

> > Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> > ---
> >  fs/proc/base.c                 |   12 ++++++++++--
> >  include/asm-generic/resource.h |    5 ++++-
> >  2 files changed, 14 insertions(+), 3 deletions(-)
> > 
> > diff --git a/fs/proc/base.c b/fs/proc/base.c
> > index f3d02ca..4384013 100644
> > --- a/fs/proc/base.c
> > +++ b/fs/proc/base.c
> > @@ -462,6 +462,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
> >  	[RLIMIT_NICE] = {"Max nice priority", NULL},
> >  	[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
> >  	[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
> > +	[RLIMIT_OOMADJ] = {"Max OOM adjust", NULL},
> 
> s/Max/Min, right?
> 

This is a MAX value because of how resource limits work. On the other hand,
it is really controlling the minimum oom_adj. So its a toss up for me.
More than happy to change if you prefer Min.

> >  };
> >  
> >  /* Display limits for a process */
> > @@ -1057,8 +1058,15 @@ static ssize_t oom_adjust_write(struct file *file, const char __user *buf,
> >  	}
> >  
> >  	if (oom_adjust < task->signal->oom_adj && !capable(CAP_SYS_RESOURCE)) {
> > -		err = -EACCES;
> > -		goto err_sighand;
> > +		/* convert oom_adj [15,-17] to rlimit style value [1,33] */
> > +		long oom_rlim = OOM_ADJUST_MAX + 1 - oom_adjust;
> > +
> 
> Ouch, that's a rather unfortunate mapping.
> 

Unfortunate but unavoidable. The resource limit code checks to see if the
new limit is greater than the limit. This code was based on the can_nice()
code in sched.c.

> > +		if (oom_rlim > task->signal->rlim[RLIMIT_OOMADJ].rlim_cur) {
> > +			unlock_task_sighand(task, &flags);
> > +			put_task_struct(task);
> > +			err = -EACCES;
> > +			goto err_sighand;
> 
> err_sighand has duplicate unlock_task_sighand() and put_task_struct(); 
> since you're missing the task_unlock(task) here, just using goto 
> err_sighand would suffice.
> 

D'oh. Forward port error. I should be more careful. Thanks for catching:)

> > +		}
> >  	}
> >  
> >  	if (oom_adjust != task->signal->oom_adj) {

Thank you for reviewing this patch.

Should I send an updated oom_score_adj patch?

Regards,
Mandeep

  reply	other threads:[~2010-11-11 18:31 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-11-11  4:35 [PATCH] oom: create a resource limit for oom_adj Mandeep Singh Baines
2010-11-11  7:35 ` David Rientjes
2010-11-11 18:30   ` Mandeep Singh Baines [this message]
2010-11-11 20:57     ` David Rientjes
2010-11-11 22:25       ` Mandeep Singh Baines
2010-11-11 23:19         ` David Rientjes
2010-11-11 23:56           ` Mandeep Singh Baines
2010-11-13  0:46             ` [PATCH] oom: allow a non-CAP_SYS_RESOURCE proces to oom_score_adj down Mandeep Singh Baines
2010-11-14  1:37               ` David Rientjes
2010-11-15 22:01                 ` [PATCH v2] " Mandeep Singh Baines
2010-11-15 22:06                   ` David Rientjes
2010-11-16  0:03                     ` [PATCH v3] " Mandeep Singh Baines
2010-11-14  5:07 ` [PATCH] oom: create a resource limit for oom_adj KOSAKI Motohiro
  -- strict thread matches above, loose matches on Subject: below --
2010-11-11  5:19 Figo.zhang
     [not found] <fNx73-1cI-1@gated-at.bofh.it>
     [not found] ` <fNzVf-5UY-3@gated-at.bofh.it>
     [not found]   ` <fNKdY-6FU-11@gated-at.bofh.it>
     [not found]     ` <fNMps-1S1-21@gated-at.bofh.it>
2010-11-11 23:15       ` Bodo Eggert
2010-11-11 23:21         ` David Rientjes
2010-11-14  5:07         ` KOSAKI Motohiro
2010-11-14 21:42           ` David Rientjes
2010-11-23  7:16             ` KOSAKI Motohiro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20101111183050.GI7363@google.com \
    --to=msb@chromium.org \
    --cc=akpm@linux-foundation.org \
    --cc=gspencer@chromium.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=olofj@chromium.org \
    --cc=piman@chromium.org \
    --cc=riel@redhat.com \
    --cc=rientjes@google.com \
    --cc=wad@chromium.org \
    --cc=yinghan@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox