* [PATCH] RSS ulimit enforcement for 2.6.8
@ 2004-08-05 17:05 Rik van Riel
2004-08-05 20:34 ` Bill Davidsen
2004-08-05 22:31 ` Andrew Morton
0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2004-08-05 17:05 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
It works in a very simple way: if a process has more resident memory
than its RSS limit allows, we pretend it didn't access any of its
pages, making it easy for the pageout code to evict the pages.
In addition to this, we don't allow a process that exceeds its RSS
limit to have the swapout protection token.
I have tested the patch on my system here and it appears to be working
fine.
Signed-off-by: Rik van Riel <riel@redhat.com>
--- linux-2.6.8-rc3/include/linux/init_task.h.rsslim 2004-08-05 10:58:03.375581943 -0400
+++ linux-2.6.8-rc3/include/linux/init_task.h 2004-08-05 10:58:47.075606510 -0400
@@ -3,6 +3,7 @@
#include <linux/file.h>
#include <linux/pagg.h>
+#include <asm/resource.h>
#define INIT_FILES \
{ \
@@ -43,6 +44,7 @@
.mmlist = LIST_HEAD_INIT(name.mmlist), \
.cpu_vm_mask = CPU_MASK_ALL, \
.default_kioctx = INIT_KIOCTX(name.default_kioctx, name), \
+ .rlimit_rss = RLIM_INFINITY, \
}
#define INIT_SIGNALS(sig) { \
--- linux-2.6.8-rc3/include/linux/sched.h.rsslim 2004-08-05 11:03:43.736528322 -0400
+++ linux-2.6.8-rc3/include/linux/sched.h 2004-08-05 11:04:16.418825667 -0400
@@ -235,7 +235,7 @@ struct mm_struct {
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
- unsigned long rss, total_vm, locked_vm;
+ unsigned long rlimit_rss, rss, total_vm, locked_vm;
unsigned long def_flags;
unsigned long saved_auxv[40]; /* for /proc/PID/auxv */
--- linux-2.6.8-rc3/fs/exec.c.rsslim 2004-08-05 11:07:53.137489937 -0400
+++ linux-2.6.8-rc3/fs/exec.c 2004-08-05 11:09:23.126777053 -0400
@@ -1109,6 +1109,11 @@ int do_execve(char * filename,
retval = init_new_context(current, bprm.mm);
if (retval < 0)
goto out_mm;
+ if (likely(current->mm)) {
+ bprm.mm->rlimit_rss = current->mm->rlimit_rss;
+ } else {
+ bprm.mm->rlimit_rss = init_mm.rlimit_rss;
+ }
bprm.argc = count(argv, bprm.p / sizeof(void *));
if ((retval = bprm.argc) < 0)
--- linux-2.6.8-rc3/kernel/sys.c.rsslim 2004-08-05 11:04:37.023708596 -0400
+++ linux-2.6.8-rc3/kernel/sys.c 2004-08-05 11:06:15.023615619 -0400
@@ -1527,6 +1527,14 @@ asmlinkage long sys_setrlimit(unsigned i
if (retval)
return retval;
+ /* The rlimit is specified in bytes, convert to pages for mm. */
+ if (resource == RLIMIT_RSS && current->mm) {
+ unsigned long pages = RLIM_INFINITY;
+ if (new_rlim.rlim_cur != RLIM_INFINITY)
+ pages = new_rlim.rlim_cur >> PAGE_SHIFT;
+ current->mm->rlimit_rss = pages;
+ }
+
*old_rlim = new_rlim;
return 0;
}
--- linux-2.6.8-rc3/mm/rmap.c.rsslim 2004-08-05 11:06:30.945888921 -0400
+++ linux-2.6.8-rc3/mm/rmap.c 2004-08-05 11:07:25.990548554 -0400
@@ -233,6 +233,9 @@ static int page_referenced_one(struct pa
if (mm != current->mm && has_swap_token(mm))
referenced++;
+ if (mm->rss > mm->rlimit_rss)
+ referenced = 0;
+
(*mapcount)--;
out_unmap:
--- linux-2.6.8-rc3/mm/thrash.c.rsslim 2004-08-05 11:09:34.101519320 -0400
+++ linux-2.6.8-rc3/mm/thrash.c 2004-08-05 11:10:39.515102287 -0400
@@ -24,7 +24,7 @@ struct mm_struct * swap_token_mm = &init
/*
* Take the token away if the process had no page faults
* in the last interval, or if it has held the token for
- * too long.
+ * too long, or if the process exceeds its RSS limit.
*/
#define SWAP_TOKEN_ENOUGH_RSS 1
#define SWAP_TOKEN_TIMED_OUT 2
@@ -35,6 +35,8 @@ static int should_release_swap_token(str
ret = SWAP_TOKEN_ENOUGH_RSS;
else if (time_after(jiffies, swap_token_timeout))
ret = SWAP_TOKEN_TIMED_OUT;
+ else if (mm->rss > mm->rlimit_rss)
+ ret = SWAP_TOKEN_ENOUGH_RSS;
mm->recent_pagein = 0;
return ret;
}
@@ -59,8 +61,8 @@ void grab_swap_token(void)
if (time_after(jiffies, swap_token_check)) {
/* Can't get swapout protection if we exceed our RSS limit. */
- // if (current->mm->rss > current->mm->rlimit_rss)
- // return;
+ if (current->mm->rss > current->mm->rlimit_rss)
+ return;
/* ... or if we recently held the token. */
if (time_before(jiffies, current->mm->swap_token_time))
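The byte-to-page conversion in the sys_setrlimit hunk and the over-limit check in the rmap.c hunk can be tried out in user space. What follows is a toy model under stated assumptions, not kernel code: toy_mm, TOY_PAGE_SHIFT, TOY_RLIM_INFINITY and the toy_* functions are names invented for illustration, and a 4 KiB page size is assumed.

```c
#include <assert.h>
#include <limits.h>

/* Assumptions for this sketch: 4 KiB pages, and "no limit" encoded as
 * ULONG_MAX, standing in for the kernel's RLIM_INFINITY. */
#define TOY_PAGE_SHIFT    12
#define TOY_RLIM_INFINITY ULONG_MAX

/* Hypothetical stand-in for the two mm_struct fields the patch touches. */
struct toy_mm {
	unsigned long rss;        /* resident pages */
	unsigned long rlimit_rss; /* limit in pages, or TOY_RLIM_INFINITY */
};

/* Models the sys_setrlimit hunk: the rlimit is in bytes, the mm counts pages. */
static unsigned long toy_rss_limit_pages(unsigned long rlim_cur_bytes)
{
	if (rlim_cur_bytes == TOY_RLIM_INFINITY)
		return TOY_RLIM_INFINITY;
	return rlim_cur_bytes >> TOY_PAGE_SHIFT;
}

/* The comparison used in the rmap.c and thrash.c hunks. */
static int toy_over_rss_limit(const struct toy_mm *mm)
{
	return mm->rss > mm->rlimit_rss;
}

/* Models the rmap.c hunk: an over-limit mm never reports a page as
 * referenced, whatever the hardware accessed bit (0 or 1) said. */
static int toy_page_referenced(const struct toy_mm *mm, int hw_referenced)
{
	assert(hw_referenced == 0 || hw_referenced == 1);
	return toy_over_rss_limit(mm) ? 0 : hw_referenced;
}
```

With these assumptions a 16 MB limit becomes 4096 pages. Because the comparison is strict, a process sitting at exactly 4096 resident pages still has its references honoured; at 4097 pages every page looks unreferenced to the pageout code.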
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 17:05 [PATCH] RSS ulimit enforcement for 2.6.8 Rik van Riel
@ 2004-08-05 20:34 ` Bill Davidsen
2004-08-05 20:49 ` Rik van Riel
2004-08-05 22:31 ` Andrew Morton
1 sibling, 1 reply; 13+ messages in thread
From: Bill Davidsen @ 2004-08-05 20:34 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel
Rik van Riel wrote:
> The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
> It works in a very simple way: if a process has more resident memory
> than its RSS limit allows, we pretend it didn't access any of its
> pages, making it easy for the pageout code to evict the pages.
>
> In addition to this, we don't allow a process that exceeds its RSS
> limit to have the swapout protection token.
>
> I have tested the patch on my system here and it appears to be working
> fine.
You have had better luck getting that to compile than I have, but I'm
still working on it. I assume that the note about sched compiling with
SMP set will get me going.
Wish there was something like RSS for cache, so that one process reading
every inode on the planet, or doing an md5 on an 11GB file wouldn't push
every damn process out if it's waiting for me to finish typing a line...
I did a brute force patch for 2.4.18 to limit the total memory used for
cache, but it would sure be nice to just limit by process. Yes I know
cache is shared, I have looked at this before :-(
--
-bill davidsen (davidsen@tmr.com)
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 20:34 ` Bill Davidsen
@ 2004-08-05 20:49 ` Rik van Riel
2004-08-05 21:20 ` Andrew Morton
2004-08-08 3:09 ` Bill Davidsen
0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2004-08-05 20:49 UTC (permalink / raw)
To: Bill Davidsen; +Cc: Andrew Morton, linux-kernel
On Thu, 5 Aug 2004, Bill Davidsen wrote:
> Rik van Riel wrote:
> > The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
> Wish there was something like RSS for cache, so that one process reading
> every inode on the planet, or doing an md5 on an 11GB file wouldn't push
> every damn process out if it's waiting for me to finish typing a line...
I guess that's beyond the scope of a simple patch, you may
be interested in CKRM for something like that:
http://ckrm.sf.net/
For now I'm just interested in filling out the holes in
rlimit for the mainline kernel, as well as putting some
simple resource enforcement things in place.
I'm not about to add something complex at this stage ;)
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 20:49 ` Rik van Riel
@ 2004-08-05 21:20 ` Andrew Morton
2004-08-10 7:28 ` Kurt Garloff
2004-08-08 3:09 ` Bill Davidsen
1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-08-05 21:20 UTC (permalink / raw)
To: Rik van Riel; +Cc: davidsen, linux-kernel
Rik van Riel <riel@redhat.com> wrote:
>
> On Thu, 5 Aug 2004, Bill Davidsen wrote:
> > Rik van Riel wrote:
> > > The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
>
> > Wish there was something like RSS for cache, so that one process reading
> > every inode on the planet, or doing an md5 on an 11GB file wouldn't push
> > every damn process out if it's waiting for me to finish typing a line...
>
> I guess that's beyond the scope of a simple patch
It might not be. We could come up with some dopey per-process flag,
inherited across fork which means "invalidate each file's pagecache when I
close it". get/set that flag with a new syscall, or sys_prctl(). That
way, people could do:
/bin/run-cache-friendly tar cf /dev/tape /huge-filesystem
and not have their pagecache trodden all over. Extra points for nuking
dentries and inodes too.
It's not particularly pretty, but it would be effective for the most
commonly complained about scenarios.
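For comparison, an application can approximate the "invalidate when I close it" behaviour for its own files today, without any new flag. The sketch below is not the per-process flag proposed above: it is a per-file, cooperative hint using posix_fadvise() with POSIX_FADV_DONTNEED, on systems whose kernel and C library provide that call. The helper name read_then_drop_cache is made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600 /* for posix_fadvise() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read fd to end of file, then hint that its cached
 * pages are no longer needed.
 * Returns 0 on success, -1 if read() failed, or the positive error number
 * returned by posix_fadvise(). */
static int read_then_drop_cache(int fd)
{
	char buf[65536];
	ssize_t n;

	assert(fd >= 0);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		; /* a real tool would write buf to the tape or archive here */
	if (n < 0)
		return -1;
	/* offset 0 with len 0 covers the whole file; this is advice only,
	 * and the kernel is free to keep pages it cannot drop. */
	return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

The difference from the proposal is who has to cooperate: here every program must be modified to issue the hint, whereas an inherited per-process flag would let a wrapper apply the policy to an unmodified tar.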
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 21:20 ` Andrew Morton
@ 2004-08-10 7:28 ` Kurt Garloff
0 siblings, 0 replies; 13+ messages in thread
From: Kurt Garloff @ 2004-08-10 7:28 UTC (permalink / raw)
To: Andrew Morton; +Cc: Rik van Riel, davidsen, linux-kernel
On Thu, Aug 05, 2004 at 02:20:19PM -0700, Andrew Morton wrote:
> It might not be. We could come up with some dopey per-process flag,
> inherited across fork which means "invalidate each file's pagecache when I
> close it".
Currently, we don't even have a way to explicitly drop page cache
for a file or filesystem except for umounting it.
In the old buffer cache days, BLKFLSBUF would have done it, but
that's pretty much without any effect nowadays.
So maybe you want to add the ioctl as well.
It would be useful for diagnostics and benchmarking as well.
Regards,
--
Kurt Garloff <garloff@suse.de> Cologne, DE
SUSE LINUX AG / Novell, Nuernberg, DE Director SUSE Labs
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 20:49 ` Rik van Riel
2004-08-05 21:20 ` Andrew Morton
@ 2004-08-08 3:09 ` Bill Davidsen
1 sibling, 0 replies; 13+ messages in thread
From: Bill Davidsen @ 2004-08-08 3:09 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel
Rik van Riel wrote:
> On Thu, 5 Aug 2004, Bill Davidsen wrote:
>
>>Rik van Riel wrote:
>>
>>>The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
>
>
>>Wish there was something like RSS for cache, so that one process reading
>>every inode on the planet, or doing an md5 on an 11GB file wouldn't push
>>every damn process out if it's waiting for me to finish typing a line...
>
>
> I guess that's beyond the scope of a simple patch, you may
> be interested in CKRM for something like that:
>
> http://ckrm.sf.net/
Interesting stuff.
>
> For now I'm just interested in filling out the holes in
> rlimit for the mainline kernel, as well as putting some
> simple resource enforcement things in place.
>
> I'm not about to add something complex at this stage ;)
>
I really wasn't asking that you should, just mumbling and hoping that
some VM-savvy person would say "I can do that!" and offer an elegant
solution. Given how little more cache helps for most loads on a machine
with adequate memory, it seems silly to have almost all the programs on
a 2GB machine pushed out to make room for pages read exactly once by a
program copying a 4GB file.
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 17:05 [PATCH] RSS ulimit enforcement for 2.6.8 Rik van Riel
2004-08-05 20:34 ` Bill Davidsen
@ 2004-08-05 22:31 ` Andrew Morton
2004-08-06 0:19 ` Rik van Riel
1 sibling, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-08-05 22:31 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel
Rik van Riel <riel@redhat.com> wrote:
>
> The patch below implements RSS ulimit enforcement for 2.6.8-rc3-mm1.
> It works in a very simple way: if a process has more resident memory
> than its RSS limit allows, we pretend it didn't access any of its
> pages, making it easy for the pageout code to evict the pages.
>
> In addition to this, we don't allow a process that exceeds its RSS
> limit to have the swapout protection token.
Thanks.
I'd kinda expected that the patch would try to limit a process to its
RLIMIT_RSS all the time. So if a process is set to 16MB and tries to use
32MB it gets to do a lot of swapping. But you're not doing that. Instead,
the patch is preferentially penalising processes which are over their limit
when we enter page reclaim. What are the pros and cons, and what is the
thinking behind this?
Also, I wonder if it would be useful if refill_inactive_zone() were to
unconditionally move pages from over-rss-limit mm's onto the inactive list,
ignoring swappiness. Or if we should explicitly deactivate pages which are
newly added to the LRU on behalf of an over-rss-limit process.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-05 22:31 ` Andrew Morton
@ 2004-08-06 0:19 ` Rik van Riel
2004-08-06 0:36 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2004-08-06 0:19 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thu, 5 Aug 2004, Andrew Morton wrote:
> I'd kinda expected that the patch would try to limit a process to its
> RLIMIT_RSS all the time. So if a process is set to 16MB and tries to use
> 32MB it gets to do a lot of swapping. But you're not doing that. Instead,
> the patch is preferentially penalising processes which are over their limit
> when we enter page reclaim. What are the pros and cons, and what is the
> thinking behind this?
Hard limiting a process when there is memory available
means that it's trying to saturate the IO subsystem,
slowing down other tasks in the system.
Basically when memory isn't the bottleneck, I think you
shouldn't try to create an IO bottleneck for the other
tasks in the system, just because an RLIMIT_RSS got set.
The downside is that the pages of the process need to be
swapped out when something else needs the memory, but if
the alternative is constant swapping, we'd have the IO
overhead regardless...
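The distinction being drawn here can be put as a toy decision function. This is only an illustration of the two policies under discussion, with invented names (toy_policy, toy_evicts_now); it is not code from the patch.

```c
#include <assert.h>

/* Hypothetical labels for the two policies being compared. */
enum toy_policy { TOY_SOFT_LIMIT, TOY_HARD_LIMIT };

/* Returns 1 if pages of a process with `rss` resident pages and a limit of
 * `limit` pages would be pushed out right now, 0 otherwise.
 * reclaim_running is nonzero when the system is short on memory and page
 * reclaim is already scanning. */
static int toy_evicts_now(enum toy_policy policy, unsigned long rss,
			  unsigned long limit, int reclaim_running)
{
	int over = rss > limit;

	assert(policy == TOY_SOFT_LIMIT || policy == TOY_HARD_LIMIT);
	if (policy == TOY_HARD_LIMIT)
		return over; /* swaps even when memory is free */
	return over && reclaim_running; /* only penalised under pressure */
}
```

Under the soft policy an over-limit process on an idle machine generates no swap IO at all; it only becomes the preferred victim once something else needs the memory.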
> Also, I wonder if it would be useful if refill_inactive_zone() were to
> unconditionally move pages from over-rss-limit mm's onto the inactive
> list, ignoring swappiness. Or if we should explicitly deactivate pages
> which are newly added to the LRU on behalf of an over-rss-limit process.
If the current patch isn't effective enough, we may want
to add more code. However, we may want to try the simplest
possible approach first.
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-06 0:19 ` Rik van Riel
@ 2004-08-06 0:36 ` Andrew Morton
2004-08-06 0:51 ` Rik van Riel
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-08-06 0:36 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel
Rik van Riel <riel@redhat.com> wrote:
>
> > Also, I wonder if it would be useful if refill_inactive_zone() were to
> > unconditionally move pages from over-rss-limit mm's onto the inactive
> > list, ignoring swappiness. Or if we should explicitly deactivate pages
> > which are newly added to the LRU on behalf of an over-rss-limit process.
>
> If the current patch isn't effective enough, we may want
> to add more code. However, we may want to try the simplest
> possible approach first.
How do we know whether it is effective enough? How do we define this?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-06 0:36 ` Andrew Morton
@ 2004-08-06 0:51 ` Rik van Riel
2004-08-06 0:53 ` Andrew Morton
0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2004-08-06 0:51 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thu, 5 Aug 2004, Andrew Morton wrote:
> > If the current patch isn't effective enough, we may want
> > to add more code. However, we may want to try the simplest
> > possible approach first.
>
> How do we know whether it is effective enough? How do we define this?
Good question. I guess our usual answer is "throw it out there
and wait for somebody to show up with a workload that needs more".
If you want I could add a bit more code proactively, but how do
we find out whether it's really needed ?
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-06 0:51 ` Rik van Riel
@ 2004-08-06 0:53 ` Andrew Morton
2004-08-06 0:59 ` Rik van Riel
0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2004-08-06 0:53 UTC (permalink / raw)
To: Rik van Riel; +Cc: linux-kernel
Rik van Riel <riel@redhat.com> wrote:
>
> On Thu, 5 Aug 2004, Andrew Morton wrote:
>
> > > If the current patch isn't effective enough, we may want
> > > to add more code. However, we may want to try the simplest
> > > possible approach first.
> >
> > How do we know whether it is effective enough? How do we define this?
>
> Good question. I guess our usual answer is "throw it out there
> and wait for somebody to show up with a workload that needs more".
>
> If you want I could add a bit more code proactively, but how do
> we find out whether it's really needed ?
Good question. What I'm groping for here is some definition of what we
actually want the feature to _do_. Once we have that, and have suitably
argued about it, we can then go off and see if the patch actually does it.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-06 0:53 ` Andrew Morton
@ 2004-08-06 0:59 ` Rik van Riel
2004-08-06 2:22 ` Nick Piggin
0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2004-08-06 0:59 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
On Thu, 5 Aug 2004, Andrew Morton wrote:
> Good question. What I'm groping for here is some definition of what we
> actually want the feature to _do_. Once we have that, and have suitably
> argued about it, we can then go off and see if the patch actually does it.
What I want the feature to do is allow users to set an
RSS rlimit to prevent a process from hogging up all the
machine's memory.
I am not looking for a hard memory limit, since that
would just cause extra IO, which has bad consequences
for the rest of the system.
In addition, I would like the patch to be relatively
low impact, not giving us much maintenance overhead or
much runtime overhead.
If anybody has good reasons for needing hard per-process
RSS limits, let us know. So far I haven't seen anybody
with a workload that somehow requires a hard limit.
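From the user's side, setting such a limit is an ordinary setrlimit() call (or "ulimit -m" in shells that support it). A minimal sketch, with set_soft_rss_limit as a made-up helper name:

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_RSS of the calling process to
 * `bytes`, clamped to the current hard limit so that no privilege is needed.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int set_soft_rss_limit(rlim_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_RSS, &rl) != 0)
		return -1;
	if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
		bytes = rl.rlim_max;
	assert(rl.rlim_max == RLIM_INFINITY || bytes <= rl.rlim_max);
	rl.rlim_cur = bytes; /* the limit is expressed in bytes */
	return setrlimit(RLIMIT_RSS, &rl);
}
```

A successful return only means the kernel recorded the value. Whether anything happens once the process grows past it depends on the kernel's enforcement, which is the subject of this thread.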
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH] RSS ulimit enforcement for 2.6.8
2004-08-06 0:59 ` Rik van Riel
@ 2004-08-06 2:22 ` Nick Piggin
0 siblings, 0 replies; 13+ messages in thread
From: Nick Piggin @ 2004-08-06 2:22 UTC (permalink / raw)
To: Rik van Riel; +Cc: Andrew Morton, linux-kernel
Rik van Riel wrote:
>On Thu, 5 Aug 2004, Andrew Morton wrote:
>
>
>>Good question. What I'm groping for here is some definition of what we
>>actually want the feature to _do_. Once we have that, and have suitably
>>argued about it, we can then go off and see if the patch actually does it.
>>
>
>What I want the feature to do is allow users to set an
>RSS rlimit to prevent a process from hogging up all the
>machine's memory.
>
>I am not looking for a hard memory limit, since that
>would just cause extra IO, which has bad consequences
>for the rest of the system.
>
>In addition, I would like the patch to be relatively
>low impact, not giving us much maintenance overhead or
>much runtime overhead.
>
>If anybody has good reasons for needing hard per-process
>RSS limits, let us know. So far I haven't seen anybody
>with a workload that somehow requires a hard limit.
>
>
FWIW, I like Rik's approach. One tiny request might be just to do the
patch underneath the thrashing control patch so it can be sent to Linus
earlier.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread
Thread overview: 13+ messages
2004-08-05 17:05 [PATCH] RSS ulimit enforcement for 2.6.8 Rik van Riel
2004-08-05 20:34 ` Bill Davidsen
2004-08-05 20:49 ` Rik van Riel
2004-08-05 21:20 ` Andrew Morton
2004-08-10 7:28 ` Kurt Garloff
2004-08-08 3:09 ` Bill Davidsen
2004-08-05 22:31 ` Andrew Morton
2004-08-06 0:19 ` Rik van Riel
2004-08-06 0:36 ` Andrew Morton
2004-08-06 0:51 ` Rik van Riel
2004-08-06 0:53 ` Andrew Morton
2004-08-06 0:59 ` Rik van Riel
2004-08-06 2:22 ` Nick Piggin