* Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5
[not found] <Pine.LNX.4.21.0012291138380.1403-100000@duckman.distro.conectiva>
@ 2000-12-30 2:25 ` Dieter Nützel
0 siblings, 0 replies; 5+ messages in thread
From: Dieter Nützel @ 2000-12-30 2:25 UTC (permalink / raw)
To: Rik van Riel; +Cc: Linux Kernel List
Am Freitag, 29. Dezember 2000 14:38 schrieben Sie:
> On Fri, 29 Dec 2000, Dieter Nützel wrote:
> > your patch didn't apply clean.
> > Have you another version?
>
> It should apply just fine. What error messages did
> patch give ?
>
Applied #2 against my running 2.4.0-test13-pre5 + ReiserFS 3.6.23 tree and
a clean test13-pre5 (test12 + test13-pre5). Same for both of them:
SunWave1>patch -p0 -E -N <patches/2.4.0-test13-pre5-VM-fix
patching file `linux-2.4.0-test13-pre5/mm/filemap.c'
Hunk #1 FAILED at 1912.
Hunk #2 FAILED at 2438.
Hunk #3 FAILED at 2448.
Hunk #4 FAILED at 2493.
4 out of 4 hunks FAILED -- saving rejects to
linux-2.4.0-test13-pre5/mm/filemap.c.rej
patching file `linux-2.4.0-test13-pre5/mm/memory.c'
Hunk #1 FAILED at 1198.
1 out of 1 hunk FAILED -- saving rejects to
linux-2.4.0-test13-pre5/mm/memory.c.rej
patching file `linux-2.4.0-test13-pre5/mm/vmscan.c'
Hunk #1 FAILED at 49.
Hunk #2 FAILED at 59.
Hunk #3 FAILED at 92.
Hunk #4 FAILED at 108.
Hunk #5 FAILED at 159.
Hunk #6 FAILED at 200.
Hunk #7 FAILED at 271.
Hunk #8 FAILED at 290.
Hunk #9 FAILED at 310.
Hunk #10 succeeded at 390 with fuzz 2.
Hunk #11 FAILED at 430.
Hunk #12 FAILED at 575.
Hunk #13 FAILED at 586.
Hunk #14 FAILED at 618.
Hunk #15 FAILED at 932.
Hunk #16 FAILED at 944.
Hunk #17 FAILED at 953.
Hunk #18 FAILED at 972.
Hunk #19 succeeded at 1007 with fuzz 2.
Hunk #20 succeeded at 1182 with fuzz 2.
17 out of 20 hunks FAILED -- saving rejects to
linux-2.4.0-test13-pre5/mm/vmscan.c.rej
patching file `linux-2.4.0-test13-pre5/include/linux/mm.h'
Hunk #1 succeeded at 460 with fuzz 2.
patching file `linux-2.4.0-test13-pre5/include/linux/swap.h'
filemap.c : offset of 3 lines needed
memory.c : dito
vmscan.c : dito
Now, I hacked it by 'hand' and got it running.
I did dbench and saw nearly same results then Daniel Phillips
But my disk is to small. Some writes failed...:-(
Test machine: 256 MB, Athlon 550 SlotA, SCSI, ReiserFS 3.6.23, Blocksize=4K
Test: dbench 48
Throughput: 10.89 MB/sec
Elapsed Time: 9 min 47 secs
"Guten Rutsch in's neue Jahr!" :-)
-Dieter
--
Dieter Nützel
Graduate Student, Computer Science
University of Hamburg
Department of Computer Science
Cognitive Systems Group
Vogt-Kölln-Straße 30
D-22527 Hamburg, Germany
email: nuetzel@kogs.informatik.uni-hamburg.de
@home: Dieter.Nuetzel@hamburg.de
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5
@ 2001-01-08 4:51 Peter Chubb
0 siblings, 0 replies; 5+ messages in thread
From: Peter Chubb @ 2001-01-08 4:51 UTC (permalink / raw)
To: linux-kernel; +Cc: ingo.oeser, riel
Ingo wrote:
> On Wed, Jan 03, 2001 at 09:43:54AM -0200, Rik van Riel wrote:
> > On Fri, 28 Dec 2000, Mike Sklar wrote:
> > > If I wanted to adjust the rlim_cur value of a running
> > > processes, is there any sort of interface for that?
> >
> > Hmmm, I don't think there is an interface to adjust the
> > per-process ulimit settings on-the-fly ...
> >
> > Does anybody know if there's an interface for this ?
> If you don't mean "kill -TERM", no there isn't. It would be evil
> to the process anyway.
The RSS limits patch I sent to linux-kernel some time ago provided an
experimental /proc interface to allow exactly this.
The patch against 2.2.16 is still on our FTP server at
ftp://ftp-au.aurema.com/private/aurpjc31/linux-2216-rsslimit.diff.bz2
Here's the patch against 2.4.0. The main differences between this and
Rik's patch are:
-- you choose soft or hard limits at kernel config time with my
patch; with Rik's you get both (rlim_cur is `soft' rlim_max is
`hard')
-- Rik's patch does some extra stuff to the VM code as well as
the RSS limits
-- Rik's patch doesn't affect swap behaviour (except in so far
as processes over their RSS limit will tend to swap, which reduces
memory pressure on all other processes); my patch means that
processes over RSS limit suffer somewhat
-- My patch puts the limit into the struct mm for slightly more
cache-friendly behaviour, and to allow later interfacing with
per-user resource-management software (it should be possible
to write a kernel module to adjust RSS limits to implement per-user
limits without affecting per-process RLIMIT values)
-- My patch has a /proc interface to allow setting
rlimit[RLIMIT_RSS]
-- my patch implements the rss accounting fields so that time -v
gives reasonable output
Index: linux-2.4.0/CREDITS
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/CREDITS,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 CREDITS
--- linux-2.4.0/CREDITS 2001/01/04 23:02:54 1.1.1.5
+++ linux-2.4.0/CREDITS 2001/01/08 04:41:41
@@ -491,6 +491,24 @@
S: Stanford, California 94305
S: USA
+N: Kingsley Cheung
+E: kingsley@aurema.com
+D: Page fault calculation
+D: /proc/<pid>/rss support
+D: kswapd improvements regarding process RSS limits
+S: Aurema Pty Limited
+S: PO Box 305, Strawberry Hills NSW 2012,
+S: Australia
+
+N: Peter Chubb
+E: peterc@aurema.com
+D: Page fault calculation
+D: /proc/<pid>/rss support
+D: kswapd improvements regarding process RSS limits
+S: Aurema Pty Limited
+S: PO Box 305, Strawberry Hills NSW 2012,
+S: Australia
+
N: Juan Jose Ciarlante
W: http://juanjox.kernelnotes.org/
E: jjciarla@raiz.uncu.edu.ar
Index: linux-2.4.0/Documentation/Configure.help
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/Documentation/Configure.help,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 Configure.help
--- linux-2.4.0/Documentation/Configure.help 2001/01/07 21:44:33 1.1.1.6
+++ linux-2.4.0/Documentation/Configure.help 2001/01/08 04:41:41
@@ -16955,6 +16955,50 @@
another UltraSPARC-IIi-cEngine boardset with a 7-segment display,
you should say N to this option.
+RSS Softlimits (EXPERIMENTAL)
+CONFIG_RSS_SOFTLIMIT
+ If you want the setrlimit(RLIMIT_RSS, ...) system call to work, say
+ Y either here or for RSS Hardlimits. If you don't understand this
+ you don't need it, so say N.
+
+ RSS Softlimits will make it more likely that pages will be stolen
+ from processes that have a resident set size (i.e., real memory
+ footprint) greater than their limit. Processes with a limit set
+ that is below their actual need may still exceed their limits, and
+ in this instance kswapd may work excessively hard.
+
+ Because of the way that RSS is measured and controlled, the limit is
+ approximate only.
+
+ It is harmless to have RSS Softlimits and RSS Hardlimits both set.
+
+RSS Hardlimits (EXPERIMENTAL)
+CONFIG_RSS_HARDLIMIT
+ If you want the setrlimit(RLIMIT_RSS, ...) system call to work, say
+ Y either here or for RSS Softlimits. If you don't understand this
+ you don't need it, so say N.
+
+ RSS Hardlimits changes the behaviour of the kernel at page-fault
+ time. If a process is over its RSS limit when it wants to get a new
+ page, then with this configuration option enabled the process's
+ memory space will be reduced before the page-fault continues.
+
+ Because of the way that RSS is measured and controlled, the actual
+ memory footprint of a process may exceed the set limit for a short
+ time.
+
+ It is harmless to have RSS Softlimits and RSS Hardlimits both set.
+
+Support for /proc/pid/rss (EXPERIMENTAL)
+CONFIG_PROC_RSS
+ Saying Y here adds an extra file inside each process directory in the
+ /proc file system that allows measurement and control of resident
+ set size (real memory footprint). The file format is documented in
+ Documentation/proc_rss.txt
+
+ The main purpose of this file is for testing the results of the RSS
+ Hardlimits or Softlimits configuration options.
+
IA-64 system type
CONFIG_IA64_GENERIC
This selects the system type of your hardware. A "generic" kernel
Index: linux-2.4.0/arch/alpha/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/alpha/config.in,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 config.in
--- linux-2.4.0/arch/alpha/config.in 2001/01/04 23:31:47 1.1.1.5
+++ linux-2.4.0/arch/alpha/config.in 2001/01/08 04:41:42
@@ -231,6 +231,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
choice 'Kernel core (/proc/kcore) format' \
"ELF CONFIG_KCORE_ELF \
Index: linux-2.4.0/arch/alpha/kernel/osf_sys.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/alpha/kernel/osf_sys.c,v
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.1 osf_sys.c
--- linux-2.4.0/arch/alpha/kernel/osf_sys.c 2000/10/04 00:17:47 1.1.1.1
+++ linux-2.4.0/arch/alpha/kernel/osf_sys.c 2001/01/08 04:41:42
@@ -1164,6 +1164,10 @@
r.ru_minflt = current->min_flt;
r.ru_majflt = current->maj_flt;
r.ru_nswap = current->nswap;
+ r.ru_maxrss = current->maxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = current->irss;
+ r.ru_isrss = 0;
break;
case RUSAGE_CHILDREN:
r.ru_utime.tv_sec = CT_TO_SECS(current->times.tms_cutime);
@@ -1173,6 +1177,10 @@
r.ru_minflt = current->cmin_flt;
r.ru_majflt = current->cmaj_flt;
r.ru_nswap = current->cnswap;
+ r.ru_maxrss = current->cmaxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = current->cirss;
+ r.ru_isrss = 0;
break;
default:
r.ru_utime.tv_sec = CT_TO_SECS(current->times.tms_utime +
@@ -1186,6 +1194,10 @@
r.ru_minflt = current->min_flt + current->cmin_flt;
r.ru_majflt = current->maj_flt + current->cmaj_flt;
r.ru_nswap = current->nswap + current->cnswap;
+ r.ru_maxrss = current->maxrss > current->cmaxrss ? current->maxrss : current->cmaxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = current->irss + current>cirss;
+ r.ru_isrss = 0;
break;
}
Index: linux-2.4.0/arch/arm/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/arm/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/arm/config.in 2000/12/13 00:39:26 1.1.1.3
+++ linux-2.4.0/arch/arm/config.in 2001/01/08 04:41:42
@@ -251,6 +251,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
tristate 'NWFPE math emulation' CONFIG_NWFPE
choice 'Kernel core (/proc/kcore) format' \
"ELF CONFIG_KCORE_ELF \
Index: linux-2.4.0/arch/arm/mm/fault-common.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/arm/mm/fault-common.c,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 fault-common.c
--- linux-2.4.0/arch/arm/mm/fault-common.c 2000/12/13 05:19:47 1.1.1.2
+++ linux-2.4.0/arch/arm/mm/fault-common.c 2001/01/08 04:41:42
@@ -100,9 +100,15 @@
switch (fault) {
case 2:
tsk->maj_flt++;
+#if CONFIG_PROC_RSS
+ update_flt_rate(&(tsk->maj_flt_rate), &(tsk->maj_flt_time));
+#endif
return fault;
case 1:
tsk->min_flt++;
+#if CONFIG_PROC_RSS
+ update_flt_rate(&(tsk->min_flt_rate), &(tsk->min_flt_time));
+#endif
case 0:
return fault;
}
Index: linux-2.4.0/arch/i386/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/i386/config.in,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 config.in
--- linux-2.4.0/arch/i386/config.in 2001/01/04 23:31:17 1.1.1.5
+++ linux-2.4.0/arch/i386/config.in 2001/01/08 04:41:42
@@ -226,6 +226,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
choice 'Kernel core (/proc/kcore) format' \
"ELF CONFIG_KCORE_ELF \
Index: linux-2.4.0/arch/i386/mm/fault.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 fault.c
--- linux-2.4.0/arch/i386/mm/fault.c 2000/12/13 00:20:39 1.1.1.3
+++ linux-2.4.0/arch/i386/mm/fault.c 2001/01/08 04:41:42
@@ -4,6 +4,7 @@
* Copyright (C) 1995 Linus Torvalds
*/
+#include <linux/config.h>
#include <linux/signal.h>
#include <linux/sched.h>
#include <linux/kernel.h>
@@ -18,6 +19,10 @@
#include <linux/interrupt.h>
#include <linux/init.h>
+#ifdef CONFIG_PROC_RSS
+#include <linux/rss.h>
+#endif
+
#include <asm/system.h>
#include <asm/uaccess.h>
#include <asm/pgalloc.h>
@@ -196,9 +201,15 @@
switch (handle_mm_fault(mm, vma, address, write)) {
case 1:
tsk->min_flt++;
+#if CONFIG_PROC_RSS
+ update_flt_rate(&(tsk->maj_flt_rate), &(tsk->maj_flt_time));
+#endif
break;
case 2:
tsk->maj_flt++;
+#if CONFIG_PROC_RSS
+ update_flt_rate(&(tsk->min_flt_rate), &(tsk->min_flt_time));
+#endif
break;
case 0:
goto do_sigbus;
Index: linux-2.4.0/arch/ia64/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/ia64/config.in,v
retrieving revision 1.1.1.4
diff -u -b -u -r1.1.1.4 config.in
--- linux-2.4.0/arch/ia64/config.in 2001/01/07 21:44:41 1.1.1.4
+++ linux-2.4.0/arch/ia64/config.in 2001/01/08 04:41:42
@@ -93,6 +93,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
tristate 'Kernel support for ELF binaries' CONFIG_BINFMT_ELF
tristate 'Kernel support for MISC binaries' CONFIG_BINFMT_MISC
Index: linux-2.4.0/arch/ia64/ia32/binfmt_elf32.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/ia64/ia32/binfmt_elf32.c,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 binfmt_elf32.c
--- linux-2.4.0/arch/ia64/ia32/binfmt_elf32.c 2001/01/07 21:44:41 1.1.1.3
+++ linux-2.4.0/arch/ia64/ia32/binfmt_elf32.c 2001/01/08 04:41:42
@@ -204,7 +204,8 @@
for (i = 0 ; i < MAX_ARG_PAGES ; i++) {
if (bprm->page[i]) {
- current->mm->rss++;
+ if (++(current->mm->rss) > current->mm->maxrss)
+ current->mm->maxrss = current->mm->rss;
put_dirty_page(current,bprm->page[i],stack_base);
}
stack_base += PAGE_SIZE;
Index: linux-2.4.0/arch/m68k/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/m68k/config.in,v
retrieving revision 1.1.1.4
diff -u -b -u -r1.1.1.4 config.in
--- linux-2.4.0/arch/m68k/config.in 2001/01/07 21:44:42 1.1.1.4
+++ linux-2.4.0/arch/m68k/config.in 2001/01/08 04:41:42
@@ -91,6 +91,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
choice 'Kernel core (/proc/kcore) format' \
"ELF CONFIG_KCORE_ELF \
Index: linux-2.4.0/arch/m68k/atari/stram.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/m68k/atari/stram.c,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 stram.c
--- linux-2.4.0/arch/m68k/atari/stram.c 2000/12/13 05:12:51 1.1.1.2
+++ linux-2.4.0/arch/m68k/atari/stram.c 2001/01/08 04:41:43
@@ -642,7 +642,8 @@
set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
swap_free(entry);
get_page(page);
- ++vma->vm_mm->rss;
+ if (++vma->vm_mm->rss > vma->vm_mm->maxrss)
+ vma->vm_mm->maxrss = vma->vm_mm->rss;
}
static inline void unswap_pmd(struct vm_area_struct * vma, pmd_t *dir,
Index: linux-2.4.0/arch/mips/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/mips/config.in,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 config.in
--- linux-2.4.0/arch/mips/config.in 2000/12/13 00:26:37 1.1.1.2
+++ linux-2.4.0/arch/mips/config.in 2001/01/08 04:41:43
@@ -168,6 +168,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
source drivers/parport/Config.in
Index: linux-2.4.0/arch/mips64/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/mips64/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/mips64/config.in 2000/12/13 05:22:46 1.1.1.3
+++ linux-2.4.0/arch/mips64/config.in 2001/01/08 04:41:43
@@ -104,6 +104,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
tristate 'Kernel support for 64-bit ELF binaries' CONFIG_BINFMT_ELF
bool 'Kernel support for Linux/MIPS 32-bit binary compatibility' CONFIG_MIPS32_COMPAT
if [ "$CONFIG_MIPS32_COMPAT" = "y" ]; then
Index: linux-2.4.0/arch/ppc/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/ppc/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/ppc/config.in 2000/12/13 00:28:43 1.1.1.3
+++ linux-2.4.0/arch/ppc/config.in 2001/01/08 04:41:43
@@ -118,6 +118,10 @@
bool 'Networking support' CONFIG_NET
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
Index: linux-2.4.0/arch/s390/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/s390/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/s390/config.in 2000/12/13 00:47:02 1.1.1.3
+++ linux-2.4.0/arch/s390/config.in 2001/01/08 04:41:43
@@ -44,6 +44,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
tristate 'Kernel support for ELF binaries' CONFIG_BINFMT_ELF
endmenu
Index: linux-2.4.0/arch/sh/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/sh/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/sh/config.in 2001/01/07 21:44:52 1.1.1.3
+++ linux-2.4.0/arch/sh/config.in 2001/01/08 04:41:43
@@ -129,6 +129,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
choice 'Kernel core (/proc/kcore) format' \
"ELF CONFIG_KCORE_ELF \
Index: linux-2.4.0/arch/sparc/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/sparc/config.in,v
retrieving revision 1.1.1.4
diff -u -b -u -r1.1.1.4 config.in
--- linux-2.4.0/arch/sparc/config.in 2000/12/13 05:04:31 1.1.1.4
+++ linux-2.4.0/arch/sparc/config.in 2001/01/08 04:41:43
@@ -59,6 +59,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
define_bool CONFIG_KCORE_ELF y
fi
Index: linux-2.4.0/arch/sparc64/config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/arch/sparc64/config.in,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 config.in
--- linux-2.4.0/arch/sparc64/config.in 2000/12/13 00:36:42 1.1.1.3
+++ linux-2.4.0/arch/sparc64/config.in 2001/01/08 04:41:43
@@ -51,6 +51,10 @@
bool 'System V IPC' CONFIG_SYSVIPC
bool 'BSD Process Accounting' CONFIG_BSD_PROCESS_ACCT
bool 'Sysctl support' CONFIG_SYSCTL
+if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'RSS Softlimits (EXPERIMENTAL)' CONFIG_RSS_SOFTLIMIT
+ bool 'RSS Hardlimits (EXPERIMENTAL)' CONFIG_RSS_HARDLIMIT
+fi
if [ "$CONFIG_PROC_FS" = "y" ]; then
define_bool CONFIG_KCORE_ELF y
fi
Index: linux-2.4.0/drivers/acpi/include/acgcc.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/include/acgcc.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/include/acgcc.h 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/include/acgcc.h 2001/01/04 23:30:27 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Name: acgcc.h - GCC specific defines, etc.
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/include/aclinux.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/include/aclinux.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/include/aclinux.h 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/include/aclinux.h 2001/01/04 23:30:27 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Name: aclinux.h - OS specific defines, etc.
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/include/actbl1.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/include/actbl1.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/include/actbl1.h 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/include/actbl1.h 2001/01/04 23:30:27 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Name: actbl1.h - ACPI 1.0 tables
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/include/actbl2.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/include/actbl2.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/include/actbl2.h 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/include/actbl2.h 2001/01/04 23:30:27 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Name: actbl2.h - ACPI Specification Revision 2.0 Tables
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/include/actbl71.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/include/actbl71.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/include/actbl71.h 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/include/actbl71.h 2001/01/04 23:30:24 1.1.1.1
@@ -3,7 +3,7 @@
* Name: actbl71.h - IA-64 Extensions to the ACPI Spec Rev. 0.71
* This file includes tables specific to this
* specification revision.
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/namespace/nsinit.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/namespace/nsinit.c,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/namespace/nsinit.c 2001/01/07 21:44:56 1.1.1.2
+++ linux-2.4.0/drivers/acpi/namespace/nsinit.c 2001/01/04 23:30:33 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Module Name: nsinit - namespace initialization
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/tables/tbconvrt.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/tables/tbconvrt.c,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/tables/tbconvrt.c 2001/01/07 21:44:57 1.1.1.2
+++ linux-2.4.0/drivers/acpi/tables/tbconvrt.c 2001/01/04 23:30:38 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Module Name: tbconvrt - ACPI Table conversion utilities
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/acpi/tables/tbxfroot.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/acpi/tables/tbxfroot.c,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/acpi/tables/tbxfroot.c 2001/01/07 21:44:57 1.1.1.2
+++ linux-2.4.0/drivers/acpi/tables/tbxfroot.c 2001/01/04 23:30:37 1.1.1.1
@@ -1,7 +1,7 @@
/******************************************************************************
*
* Module Name: tbxfroot - Find the root ACPI table (RSDT)
- * $Revision: 1.1.1.2 $
+ * $Revision: 1.1.1.1 $
*
*****************************************************************************/
Index: linux-2.4.0/drivers/scsi/README.osst
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/scsi/README.osst,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/scsi/README.osst 2001/01/07 21:45:39 1.1.1.2
+++ linux-2.4.0/drivers/scsi/README.osst 2001/01/04 23:22:35 1.1.1.1
@@ -189,7 +189,7 @@
#!/bin/sh
# Script to create OnStream SC-x0 device nodes (major 206)
# Usage: Makedevs.sh [nos [path to dev]]
-# $Id: README.osst,v 1.1.1.2 2001/01/07 21:45:39 peterc Exp $
+# $Id: README.osst,v 1.1.1.1 2001/01/04 23:22:35 peterc Exp $
major=206
nrs=4
dir=/dev
Index: linux-2.4.0/drivers/scsi/osst.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.c,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 osst.c
--- linux-2.4.0/drivers/scsi/osst.c 2001/01/07 21:45:39 1.1.1.2
+++ linux-2.4.0/drivers/scsi/osst.c 2001/01/08 04:41:46
@@ -16,14 +16,14 @@
Copyright 1992 - 2000 Kai Makisara
email Kai.Makisara@metla.fi
- $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.c,v 1.1.1.2 2001/01/07 21:45:39 peterc Exp $
+ $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.c,v 1.1.1.1 2001/01/04 23:22:39 peterc Exp $
Microscopic alterations - Rik Ling, 2000/12/21
Last modified: Wed Feb 2 22:04:05 2000 by makisara@kai.makisara.local
Some small formal changes - aeb, 950809
*/
-static const char * cvsid = "$Id: osst.c,v 1.1.1.2 2001/01/07 21:45:39 peterc Exp $";
+static const char * cvsid = "$Id: osst.c,v 1.1.1.1 2001/01/04 23:22:39 peterc Exp $";
const char * osst_version = "0.9.4.3";
/* The "failure to reconnect" firmware bug */
Index: linux-2.4.0/drivers/scsi/osst.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/scsi/osst.h 2001/01/07 21:45:39 1.1.1.2
+++ linux-2.4.0/drivers/scsi/osst.h 2001/01/04 23:22:39 1.1.1.1
@@ -1,5 +1,5 @@
/*
- * $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.h,v 1.1.1.2 2001/01/07 21:45:39 peterc Exp $
+ * $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst.h,v 1.1.1.1 2001/01/04 23:22:39 peterc Exp $
*/
#include <asm/byteorder.h>
Index: linux-2.4.0/drivers/scsi/osst_options.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst_options.h,v
retrieving revision 1.1.1.2
retrieving revision 1.1.1.1
diff -u -b -u -r1.1.1.2 -r1.1.1.1
--- linux-2.4.0/drivers/scsi/osst_options.h 2001/01/07 21:45:39 1.1.1.2
+++ linux-2.4.0/drivers/scsi/osst_options.h 2001/01/04 23:22:39 1.1.1.1
@@ -8,7 +8,7 @@
Changed (and renamed) for OnStream SCSI drives garloff@suse.de
2000-06-21
- $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst_options.h,v 1.1.1.2 2001/01/07 21:45:39 peterc Exp $
+ $Header: /wrk/CVSROOT/linux-2.4/drivers/scsi/osst_options.h,v 1.1.1.1 2001/01/04 23:22:39 peterc Exp $
*/
#ifndef _OSST_OPTIONS_H
Index: linux-2.4.0/fs/Config.in
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/fs/Config.in,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 Config.in
--- linux-2.4.0/fs/Config.in 2000/12/12 21:49:13 1.1.1.2
+++ linux-2.4.0/fs/Config.in 2001/01/08 04:41:51
@@ -43,6 +43,10 @@
bool '/proc file system support' CONFIG_PROC_FS
+if [ "$CONFIG_PROC_FS" = "y" -a "$CONFIG_EXPERIMENTAL" = "y" ]; then
+ bool 'Support for /proc/pid/rss? (EXPERIMENTAL)' CONFIG_PROC_RSS
+fi
+
dep_bool '/dev file system support (EXPERIMENTAL)' CONFIG_DEVFS_FS $CONFIG_EXPERIMENTAL
dep_bool ' Automatically mount at boot' CONFIG_DEVFS_MOUNT $CONFIG_DEVFS_FS
dep_bool ' Debug devfs' CONFIG_DEVFS_DEBUG $CONFIG_DEVFS_FS
Index: linux-2.4.0/fs/exec.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/fs/exec.c,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 exec.c
--- linux-2.4.0/fs/exec.c 2001/01/07 21:45:49 1.1.1.6
+++ linux-2.4.0/fs/exec.c 2001/01/08 04:41:52
@@ -321,7 +321,8 @@
struct page *page = bprm->page[i];
if (page) {
bprm->page[i] = NULL;
- current->mm->rss++;
+ if (++current->mm->rss > current->mm->maxrss)
+ current->mm->maxrss = current->mm->rss;
put_dirty_page(current,page,stack_base);
}
stack_base += PAGE_SIZE;
Index: linux-2.4.0/fs/proc/Makefile
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/fs/proc/Makefile,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 Makefile
--- linux-2.4.0/fs/proc/Makefile 2001/01/04 23:03:09 1.1.1.2
+++ linux-2.4.0/fs/proc/Makefile 2001/01/08 04:41:52
@@ -14,8 +14,7 @@
obj-y := inode.o root.o base.o generic.o array.o \
kmsg.o proc_tty.o proc_misc.o kcore.o procfs_syms.o
-ifeq ($(CONFIG_PROC_DEVICETREE),y)
-obj-y += proc_devtree.o
-endif
+obj-$(CONFIG_PROC_DEVICETREE) += proc_devtree.o
+obj-$(CONFIG_PROC_RSS) += rss.o
include $(TOPDIR)/Rules.make
Index: linux-2.4.0/fs/proc/base.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/fs/proc/base.c,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 base.c
--- linux-2.4.0/fs/proc/base.c 2000/12/12 21:49:25 1.1.1.3
+++ linux-2.4.0/fs/proc/base.c 2001/01/08 04:41:52
@@ -11,6 +11,8 @@
* go into icache. We cache the reference to task_struct upon lookup too.
* Eventually it should become a filesystem in its own. We don't use the
* rest of procfs anymore.
+ *
+ * Added support for /proc/<pid>/rss, 20.01.2000, Kingsley Cheung *
*/
#include <asm/uaccess.h>
@@ -39,6 +41,10 @@
int proc_pid_status(struct task_struct*,char*);
int proc_pid_statm(struct task_struct*,char*);
int proc_pid_cpu(struct task_struct*,char*);
+#ifdef CONFIG_PROC_RSS
+int proc_pid_rss_read(struct task_struct*,char*);
+ssize_t proc_pid_rss_write(struct task_struct*,struct file*,char*,size_t,loff_t*);
+#endif
static int proc_fd_link(struct inode *inode, struct dentry **dentry, struct vfsmount **mnt)
{
@@ -309,6 +315,14 @@
read: proc_info_read,
};
+#ifdef CONFIG_PROC_RSS
+static struct file_operations proc_rss_file_operations = {
+ read: proc_info_read,
+ write: proc_pid_rss_write,
+};
+#endif
+
+
#define MAY_PTRACE(p) \
(p==current||(p->p_pptr==current&&(p->ptrace & PT_PTRACED)&&p->state==TASK_STOPPED))
@@ -495,6 +509,9 @@
PROC_PID_STATM,
PROC_PID_MAPS,
PROC_PID_CPU,
+#ifdef CONFIG_PROC_RSS
+ PROC_PID_RSS,
+#endif /* CONFIG_PROC_RSS */
PROC_PID_FD_DIR = 0x8000, /* 0x8000-0xffff */
};
@@ -514,6 +531,9 @@
E(PROC_PID_CWD, "cwd", S_IFLNK|S_IRWXUGO),
E(PROC_PID_ROOT, "root", S_IFLNK|S_IRWXUGO),
E(PROC_PID_EXE, "exe", S_IFLNK|S_IRWXUGO),
+#ifdef CONFIG_PROC_RSS
+ E(PROC_PID_RSS, "rss", S_IFREG|S_IRUGO|S_IWUSR),
+#endif
{0,0,NULL,0}
};
#undef E
@@ -860,6 +880,12 @@
inode->i_op = &proc_mem_inode_operations;
inode->i_fop = &proc_mem_operations;
break;
+#ifdef CONFIG_PROC_RSS
+ case PROC_PID_RSS:
+ inode->i_fop = &proc_info_file_operations;
+ inode->u.proc_i.op.proc_read = proc_pid_rss_read;
+ break;
+#endif
default:
printk("procfs: impossible type (%d)",p->type);
iput(inode);
Index: linux-2.4.0/include/linux/sched.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/include/linux/sched.h,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 sched.h
--- linux-2.4.0/include/linux/sched.h 2001/01/07 21:46:10 1.1.1.6
+++ linux-2.4.0/include/linux/sched.h 2001/01/08 04:41:55
@@ -216,7 +216,9 @@
unsigned long start_code, end_code, start_data, end_data;
unsigned long start_brk, brk, start_stack;
unsigned long arg_start, arg_end, env_start, env_end;
- unsigned long rss, total_vm, locked_vm;
+ unsigned long rss;
+ unsigned long maxrss, rss_limit;
+ unsigned long total_vm, locked_vm;
unsigned long def_flags;
unsigned long cpu_vm_mask;
unsigned long swap_cnt; /* number of pages to swap on next pass */
@@ -351,6 +353,14 @@
/* mm fault and swap info: this can arguably be seen as either mm-specific or thread-specific */
unsigned long min_flt, maj_flt, nswap, cmin_flt, cmaj_flt, cnswap;
int swappable:1;
+/* #if CONFIG_PROC_RSS keep same size, but do we remove ifdefs in fork too? */
+/* major and minor page fault rates and time of occurence */
+ unsigned long maj_flt_rate, min_flt_rate; /* in pages per second */
+ unsigned long maj_flt_time, min_flt_time; /* in seconds */
+/* #endif */
+/* rss statistics -- maxrss is in the struct mm */
+ unsigned long irss;
+ unsigned long cmaxrss, cirss;
/* process credentials */
uid_t uid,euid,suid,fsuid;
gid_t gid,egid,sgid,fsgid;
Index: linux-2.4.0/include/linux/swap.h
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/include/linux/swap.h,v
retrieving revision 1.1.1.4
diff -u -b -u -r1.1.1.4 swap.h
--- linux-2.4.0/include/linux/swap.h 2001/01/04 23:06:01 1.1.1.4
+++ linux-2.4.0/include/linux/swap.h 2001/01/08 04:41:55
@@ -109,6 +109,9 @@
extern int inactive_shortage(void);
extern void wakeup_kswapd(int);
extern int try_to_free_pages(unsigned int gfp_mask);
+#ifdef CONFIG_RSS_HARDLIMIT
+extern int try_to_swap_out_page(unsigned int gfp_mask);
+#endif
/* linux/mm/page_io.c */
extern void rw_swap_page(int, struct page *, int);
Index: linux-2.4.0/kernel/exit.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/kernel/exit.c,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 exit.c
--- linux-2.4.0/kernel/exit.c 2001/01/07 21:46:13 1.1.1.5
+++ linux-2.4.0/kernel/exit.c 2001/01/08 04:41:56
@@ -49,6 +49,8 @@
current->cmin_flt += p->min_flt + p->cmin_flt;
current->cmaj_flt += p->maj_flt + p->cmaj_flt;
current->cnswap += p->nswap + p->cnswap;
+ if (p->cmaxrss > current->cmaxrss)
+ current->cmaxrss = p->cmaxrss;
/*
* Potentially available timeslices are retrieved
* here - this way the parent does not get penalized
@@ -308,6 +310,14 @@
if (mm != tsk->active_mm) BUG();
/* more a memory barrier than a real lock */
task_lock(tsk);
+ /*
+ * can't do this at wait() time, because mm is gone by then.
+ */
+ if (tsk->p_pptr) {
+ if (mm->maxrss > tsk->p_pptr->cmaxrss)
+ tsk->p_pptr->cmaxrss = mm->maxrss;
+ }
+
tsk->mm = NULL;
task_unlock(tsk);
enter_lazy_tlb(mm, current, smp_processor_id());
Index: linux-2.4.0/kernel/fork.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/kernel/fork.c,v
retrieving revision 1.1.1.5
diff -u -b -u -r1.1.1.5 fork.c
--- linux-2.4.0/kernel/fork.c 2001/01/07 21:46:13 1.1.1.5
+++ linux-2.4.0/kernel/fork.c 2001/01/08 04:41:56
@@ -202,6 +202,7 @@
atomic_set(&mm->mm_users, 1);
atomic_set(&mm->mm_count, 1);
init_MUTEX(&mm->mmap_sem);
+ mm->maxrss = mm->rss;
mm->page_table_lock = SPIN_LOCK_UNLOCKED;
mm->pgd = pgd_alloc();
if (mm->pgd)
@@ -284,6 +285,12 @@
tsk->min_flt = tsk->maj_flt = 0;
tsk->cmin_flt = tsk->cmaj_flt = 0;
tsk->nswap = tsk->cnswap = 0;
+ tsk->irss = 0;
+ tsk->cmaxrss = tsk->cirss = 0;
+#if CONFIG_PROC_RSS
+ tsk->maj_flt_time = tsk->min_flt_time = tsk->start_time / HZ;
+ tsk->maj_flt_rate = tsk->min_flt_rate = 0;
+#endif
tsk->mm = NULL;
tsk->active_mm = NULL;
Index: linux-2.4.0/kernel/sys.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/kernel/sys.c,v
retrieving revision 1.1.1.2
diff -u -b -u -r1.1.1.2 sys.c
--- linux-2.4.0/kernel/sys.c 2000/11/25 02:56:24 1.1.1.2
+++ linux-2.4.0/kernel/sys.c 2001/01/08 04:41:56
@@ -1077,6 +1077,10 @@
return -EPERM;
}
*old_rlim = new_rlim;
+
+ if (resource == RLIMIT_RSS && current->mm != &init_mm)
+ current->mm->rss_limit = (new_rlim.rlim_cur == RLIM_INFINITY) ? ULONG_MAX : (new_rlim.rlim_cur >> PAGE_SHIFT);
+
return 0;
}
@@ -1111,6 +1115,10 @@
r.ru_minflt = p->min_flt;
r.ru_majflt = p->maj_flt;
r.ru_nswap = p->nswap;
+ r.ru_maxrss = p->mm->maxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = p->irss;
+ r.ru_isrss = 0;
break;
case RUSAGE_CHILDREN:
r.ru_utime.tv_sec = CT_TO_SECS(p->times.tms_cutime);
@@ -1120,6 +1128,10 @@
r.ru_minflt = p->cmin_flt;
r.ru_majflt = p->cmaj_flt;
r.ru_nswap = p->cnswap;
+ r.ru_maxrss = p->cmaxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = p->cirss;
+ r.ru_isrss = 0;
break;
default:
r.ru_utime.tv_sec = CT_TO_SECS(p->times.tms_utime + p->times.tms_cutime);
@@ -1129,6 +1141,10 @@
r.ru_minflt = p->min_flt + p->cmin_flt;
r.ru_majflt = p->maj_flt + p->cmaj_flt;
r.ru_nswap = p->nswap + p->cnswap;
+ r.ru_maxrss = p->mm->maxrss > p->cmaxrss ? p->mm->maxrss : p->cmaxrss;
+ r.ru_ixrss = 0;
+ r.ru_idrss = p->irss + p->cirss;
+ r.ru_isrss = 0;
break;
}
return copy_to_user(ru, &r, sizeof(r)) ? -EFAULT : 0;
Index: linux-2.4.0/kernel/timer.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/kernel/timer.c,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 timer.c
--- linux-2.4.0/kernel/timer.c 2000/12/13 02:41:08 1.1.1.3
+++ linux-2.4.0/kernel/timer.c 2001/01/08 04:41:56
@@ -567,6 +567,9 @@
{
p->per_cpu_utime[cpu] += user;
p->per_cpu_stime[cpu] += system;
+ if (p->mm)
+ p->irss += p->mm->rss;
+
do_process_times(p, user, system);
do_it_virt(p, user);
do_it_prof(p);
Index: linux-2.4.0/mm/filemap.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/mm/filemap.c,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 filemap.c
--- linux-2.4.0/mm/filemap.c 2001/01/07 21:46:13 1.1.1.6
+++ linux-2.4.0/mm/filemap.c 2001/01/08 04:41:57
@@ -1967,8 +1967,12 @@
/* Make sure this doesn't exceed the process's max rss. */
error = -EIO;
+#if defined(CONFIG_RSS_HARDLIMIT) || defined(CONFIG_RSS_SOFTLIMIT)
+ rlim_rss = vma->vm_mm->rss_limit;
+#else
rlim_rss = current->rlim ? current->rlim[RLIMIT_RSS].rlim_cur :
LONG_MAX; /* default: see resource.h */
+#endif
if ((vma->vm_mm->rss + (end - start)) > rlim_rss)
return error;
Index: linux-2.4.0/mm/memory.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/mm/memory.c,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 memory.c
--- linux-2.4.0/mm/memory.c 2001/01/07 21:46:13 1.1.1.6
+++ linux-2.4.0/mm/memory.c 2001/01/08 04:41:57
@@ -872,7 +872,8 @@
*/
if (pte_same(*page_table, pte)) {
if (PageReserved(old_page))
- ++mm->rss;
+ if (++mm->rss > mm->maxrss)
+ mm->maxrss = mm->rss;
break_cow(vma, old_page, new_page, address, page_table);
/* Free the old page.. */
@@ -1019,9 +1020,15 @@
struct vm_area_struct * vma, unsigned long address,
pte_t * page_table, swp_entry_t entry, int write_access)
{
- struct page *page = lookup_swap_cache(entry);
+ struct page *page;
pte_t pte;
+#ifdef CONFIG_RSS_HARDLIMIT
+ if (mm->rss >= mm->rss_limit)
+ try_to_shrink_rss(mm, GFP_USER);
+#endif
+
+ page = lookup_swap_cache(entry);
if (!page) {
lock_kernel();
swapin_readahead(entry);
@@ -1034,7 +1041,8 @@
flush_icache_page(vma, page);
}
- mm->rss++;
+ if (++mm->rss > mm->maxrss)
+ mm->maxrss = mm->rss;
pte = mk_pte(page, vma->vm_page_prot);
@@ -1062,13 +1070,18 @@
{
struct page *page = NULL;
pte_t entry = pte_wrprotect(mk_pte(ZERO_PAGE(addr), vma->vm_page_prot));
+#ifdef CONFIG_RSS_HARDLIMIT
+ if (mm->rss > mm->rss_limit)
+ try_to_shrink_rss(mm, GFP_USER);
+#endif
if (write_access) {
page = alloc_page(GFP_HIGHUSER);
if (!page)
return -1;
clear_user_highpage(page, addr);
entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
- mm->rss++;
+ if (++mm->rss > mm->maxrss)
+ mm->maxrss = mm->rss;
flush_page_to_ram(page);
}
set_pte(page_table, entry);
@@ -1107,7 +1120,8 @@
return 0;
if (new_page == NOPAGE_OOM)
return -1;
- ++mm->rss;
+ if (++mm->rss > mm->maxrss)
+ mm->maxrss = mm->rss;
/*
* This silly early PAGE_DIRTY setting removes a race
* due to the bad i386 page protection. But it's valid
Index: linux-2.4.0/mm/swapfile.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/mm/swapfile.c,v
retrieving revision 1.1.1.3
diff -u -b -u -r1.1.1.3 swapfile.c
--- linux-2.4.0/mm/swapfile.c 2001/01/04 23:05:35 1.1.1.3
+++ linux-2.4.0/mm/swapfile.c 2001/01/08 04:41:58
@@ -231,7 +231,8 @@
set_pte(dir, pte_mkdirty(mk_pte(page, vma->vm_page_prot)));
swap_free(entry);
get_page(page);
- ++vma->vm_mm->rss;
+ if (++vma->vm_mm->rss > vma->vm_mm->maxrss)
+ vma->vm_mm->maxrss = vma->vm_mm->rss;
}
static inline void unuse_pmd(struct vm_area_struct * vma, pmd_t *dir,
Index: linux-2.4.0/mm/vmscan.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/mm/vmscan.c,v
retrieving revision 1.1.1.6
diff -u -b -u -r1.1.1.6 vmscan.c
--- linux-2.4.0/mm/vmscan.c 2001/01/07 21:46:13 1.1.1.6
+++ linux-2.4.0/mm/vmscan.c 2001/01/08 04:41:58
@@ -294,6 +294,96 @@
return result;
}
+#if CONFIG_RSS_SOFTLIMIT
+int krssd(void *unused)
+{
+ struct task_struct *tsk = current;
+#define MAX_NPROCS 10
+ pid_t procs[MAX_NPROCS];
+ struct task_struct *p, *oldp;
+ unsigned long rss;
+ int nprocs, i;
+
+ nprocs = 0;
+ (void)unused;
+
+
+ tsk->session = 1;
+ tsk->pgrp = 1;
+ strcpy(tsk->comm, "krssd");
+ sigfillset(&tsk->blocked);
+
+ printk("Starting krssd\n");
+
+ for (;;) {
+
+ /*
+ * Try not to wake up at the same time every second.
+ */
+ tsk->state = TASK_INTERRUPTIBLE;
+ schedule_timeout(HZ + 3);
+
+ /*
+ * Find up to MAX_NPROCS processes that exceed their RSS
+ * limit and attempt to shrink them.
+ * Save the PIDs of any over-limit processes in an array,
+ * so that swap_out_mm can sleep (processes can
+ * die while we're asleep)
+ * Using PIDS rather than proc_t pointers also
+ * reduces the time holding tasklist_lock.
+ */
+ read_lock(&tasklist_lock);
+
+ /* select next processes to scan */
+ oldp = NULL;
+ while (nprocs && !(oldp = find_task_by_pid(procs[--nprocs])))
+ ;
+
+ /* select init if no process found */
+ if (!oldp)
+ oldp = &init_task;
+
+ /* choose at most next MAX_NPROCS */
+ nprocs = 0; p = oldp;
+ while ((p = p->next_task) != oldp && nprocs < MAX_NPROCS)
+ if (p != &init_task && p->swappable && p->mm &&
+ p->mm->rss > p->mm->rss_limit)
+ procs[nprocs++] = p->pid;
+ read_unlock(&tasklist_lock);
+
+ /* Attempt to shrink RSS till under limit */
+ for (i = 0; i < nprocs; i ++) {
+ struct mm_struct *mm;
+ read_lock(&tasklist_lock);
+ p = find_task_by_pid(procs[i]);
+ if (p && (mm = p->mm))
+ atomic_inc(&mm->mm_count);
+ read_unlock(&tasklist_lock);
+
+ if (!p)
+ continue;
+
+ /*
+ * If pages are freed from the process but
+ * are still in use elsewhere,
+ * swap_out_process may return 0
+ * but still shrink rss.
+ * Keep calling it until it cannot do any more work,
+ * or the limit is no longer exceeded.
+ * TODO: think about hysteresis --- track
+ * persistent offenders and reduce RSS even further
+ */
+ while ((rss = mm->rss) > mm->rss_limit &&
+ (swap_out_mm(mm, GFP_KSWAPD) ||
+ rss != mm->rss))
+ ;
+ mmdrop(mm);
+ }
+ }
+}
+
+#endif /* CONFIG_RSS_SOFTLIMIT */
+
/*
* Select the task with maximal swap_cnt and try to swap out a page.
* N.B. This function returns only 0 or 1. Return values != 1 from
@@ -1149,7 +1239,17 @@
swap_setup();
kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
kernel_thread(kreclaimd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
+#ifdef CONFIG_RSS_SOFTLIMIT
+ kernel_thread(krssd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
+#endif
return 0;
}
+
+#ifdef CONFIG_RSS_HARDLIMIT
+void try_to_shrink_rss(struct mm_struct *mm, int gfp_mask)
+{
+ swap_out_mm(mm, gfp_mask);
+}
+#endif
module_init(kswapd_init)
Index: linux-2.4.0/fs/proc/rss.c
===================================================================
RCS file: /wrk/CVSROOT/linux-2.4/fs/proc/rss.c
retrieving revision 1.1
diff -u -b -u -r1.1 rss.c
--- /dev/null Wed May 6 06:32:27 1998
+++ linux-2.4.0/fs/proc/rss.c Mon Jan 8 15:31:22 2001
@@ -0,0 +1,239 @@
+/* fs/proc/rss.c
+ *
+ * 15 March 2000, Kingsley Cheung
+ * Support added for page fault calculation and /proc/pid/rss.
+ */
+#include <linux/ctype.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/resource.h>
+#include <linux/rss.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/sched.h>
+
+#include <asm/page.h>
+#include <asm/processor.h>
+#include <asm/uaccess.h>
+
+
+/*
+ * Below is a table of constants for ((k-1)/k)^n = (4/5)^n, where k is
+ * 5 seconds, n is [0..34] seconds. For n >= 35, (4/5)^n = 0.0
+ * These constants are scaled by 1000, as are the page fault
+ * rates displayed in /proc/<pid>/rss.
+ */
+
+unsigned long mov_ave_table[MOV_AVE_TABLE_SIZE] =
+{ 1000, 800, 640, 512, 410, 328, 262, 210, 168, 134, 107, 86,
+ 69, 55, 44, 35, 28, 23, 18, 14, 12, 9, 7, 6, 5, 4,
+ 3, 2, 2, 2, 1, 1, 1, 1, 1 };
+
+
+
+
+/* Support for /proc/<pid>/rss
+ *
+ * pid_rss_read can integrated into array_read in fs/proc/array.c, but
+ * since we are required to support writing, we have a different set
+ * of file operations.
+ *
+ * The data listed is the following:
+ * - current RSS limit
+ * - maximum RSS limit
+ * - RSS
+ * - major fault rate
+ * - minor fault rate
+ *
+ * Reading /proc/<pid>/rss is allowed only once through an open
+ * descriptor. To obtain more recent data, the file must be
+ * closed and opened again. The file is readable by all everyone.
+ */
+
+int proc_pid_rss_read(struct task_struct *tsk, char *buf)
+{
+ int count;
+ task_lock(tsk);
+
+ /* recalculate page fault rates due to decay */
+ decay_flt_rate(&(tsk->maj_flt_rate), &(tsk->maj_flt_time));
+ decay_flt_rate(&(tsk->min_flt_rate), &(tsk->min_flt_time));
+
+ count = sprintf(buf,
+ "%ld %ld %lu %lu %lu\n",
+ tsk->rlim[RLIMIT_RSS].rlim_cur >> PAGE_SHIFT,
+ tsk->rlim[RLIMIT_RSS].rlim_max >> PAGE_SHIFT,
+ tsk->mm ? tsk->mm->rss : 0,
+ tsk->maj_flt_rate,
+ tsk->min_flt_rate);
+
+ task_unlock(tsk);
+ return count;
+}
+
+
+
+#define skip_space(buffer) \
+do { \
+ while (*(buffer) && isspace(*(buffer))) \
+ (buffer)++; \
+} while(0)
+
+
+static int get_unsigned_numbers(const char *buf, unsigned long *numbers, int count)
+/* Pre: 'numbers' is array of longs of length 'count'
+ * Post: returns number of integers read and sets values in numbers
+ * returns -1 whenever non-digit or white-space encountered
+ */
+{
+ char *p; int i, neg;
+
+ /* loop and get integers, assuming all are in the buffer */
+ for (p = (char *) buf, i = 0; i < count && *p; i++) {
+ /* skip whitespace */
+ skip_space(p);
+ if (!(*p))
+ break;
+
+ /* test for negative values */
+ neg = 0;
+ if (*p == '-' && isdigit(*(p+1))) {
+ neg = 1; p++;
+ }
+
+ /* convert number */
+ numbers[i] = simple_strtoul(p, &p, 0);
+ if (neg)
+ numbers[i] = - numbers[i];
+
+ if (*p && !isspace(*p))
+ return -1;
+ }
+
+ /* ensure remaining numbers are non-white space */
+ skip_space(p);
+ if (*p)
+ return -1;
+
+ return i;
+}
+
+
+#define bad_rss_limit(new_limit) \
+((new_limit) < 0 || (new_limit) > RLIM_INFINITY)
+
+#define getnextline(buf) \
+({ \
+ while (*(buf)) \
+ (buf)++; \
+ (buf)++; \
+})
+
+
+/*
+ * Only the user has write permission to proc/<pid>/rss. The contents
+ * that may be written to it specify the current and maximum RSS limits
+ * the user wishes to set for that process. The contents written to the
+ * pseudo file must adhere to the following:
+ *
+ * 1. There can be at most two numbers per line: the new current
+ * RSS limit followed by whitespace and the new maximum RSS
+ * limit.
+ *
+ * 2. Whenever one number is specified on a line, the number is
+ * interpreted as the new current RSS limit.
+ *
+ * 3. The current RSS limit must never exceed the maximum RSS
+ * limit.
+ *
+ * 4. The maximum RSS limit cannot be increased except by
+ * processes with system resource capabilities.
+ *
+ * 5. The limits must lie between the range 0 and
+ * RLIM_INFINITY / PAGE_SIZE inclusive.
+ *
+ * Violation of these rules will produce invalid or write permission
+ * errors. Attempts to write to /proc/<pid>/rss for non-existent
+ * processes will produce an I/O error.
+ *
+ * Writing to /proc/<pid>/rss can continue forever through an open
+ * descriptor as long as one abides by the rules stated above.
+ * Currently, ppos is not used to limit the number of characters
+ * written.
+ */
+
+ssize_t proc_pid_rss_write(struct task_struct *tsk, struct file *file, const char *buf, size_t count, loff_t *ppos)
+{
+ struct rlimit new_rss;
+ unsigned long numbers[2];
+
+ char *page, *p;
+ int res, ret;
+
+ /* buffer to write to */
+ if (!(page = (char * ) __get_free_page(GFP_KERNEL)))
+ return -ENOMEM;
+
+ /* read user buffer up to one page only */
+ if (count > PAGE_SIZE - 1)
+ count = PAGE_SIZE - 1;
+
+ ret = count;
+
+ if (copy_from_user(page, buf, count))
+ return -EFAULT;
+
+ page[count] = 0;
+ for (p = page; *p; p++)
+ if (*p == '\n')
+ *p = 0;
+
+ /* obtain process current rss limits */
+ new_rss = tsk->rlim[RLIMIT_RSS];
+
+ /* for each line, read at most two integers */
+ for (p = page; p - page < count; getnextline(p)) {
+ /* read numbers from the line */
+ if ((res = get_unsigned_numbers(p, numbers, 2)) < 0) {
+ ret = -EINVAL;
+ goto pid_rss_end;
+ }
+
+ /* if numbers where read from the line */
+ if (res) {
+ /* assign values of new limits */
+ new_rss.rlim_cur = numbers[0] * PAGE_SIZE;
+ if (res == 2)
+ new_rss.rlim_max = numbers[1] * PAGE_SIZE;
+
+ /* ensure rss limits within defined range */
+ if (bad_rss_limit(new_rss.rlim_cur) ||
+ bad_rss_limit(new_rss.rlim_max) ||
+ new_rss.rlim_cur > new_rss.rlim_max) {
+ ret = -EINVAL;
+ goto pid_rss_end;
+ }
+
+ if (new_rss.rlim_max > tsk->rlim[RLIMIT_RSS].rlim_max
+ && !capable(CAP_SYS_RESOURCE)) {
+ ret = -EPERM;
+ goto pid_rss_end;
+ }
+
+ tsk->rlim[RLIMIT_RSS] = new_rss;
+ if (new_rss.rlim_cur == RLIM_INFINITY)
+ tsk->mm->rss_limit = ULONG_MAX;
+ else
+ tsk->mm->rss_limit = new_rss.rlim_cur >> PAGE_SHIFT;
+ }
+ }
+
+ *ppos += ret;
+ file->f_pos += ret;
+
+pid_rss_end:
+ free_page((unsigned long) page);
+
+ return ret;
+}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread[parent not found: <Pine.LNX.4.21.0012292007510.11006-100000@d13.com>]
* Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5
[not found] <Pine.LNX.4.21.0012292007510.11006-100000@d13.com>
@ 2001-01-03 11:43 ` Rik van Riel
2001-01-03 13:12 ` Ingo Oeser
0 siblings, 1 reply; 5+ messages in thread
From: Rik van Riel @ 2001-01-03 11:43 UTC (permalink / raw)
To: Mike Sklar; +Cc: linux-kernel
On Fri, 28 Dec 2000, Mike Sklar wrote:
> If I wanted to adjust the rlim_cur value of a running
> processes, is there any sort of interface for that?
Hmmm, I don't think there is an interface to adjust the
per-process ulimit settings on-the-fly ...
Does anybody know if there's an interface for this ?
regards,
Rik
--
Hollywood goes for world dumbination,
Trailer at 11.
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5
2001-01-03 11:43 ` Rik van Riel
@ 2001-01-03 13:12 ` Ingo Oeser
0 siblings, 0 replies; 5+ messages in thread
From: Ingo Oeser @ 2001-01-03 13:12 UTC (permalink / raw)
To: Rik van Riel; +Cc: Mike Sklar, linux-kernel
On Wed, Jan 03, 2001 at 09:43:54AM -0200, Rik van Riel wrote:
> On Fri, 28 Dec 2000, Mike Sklar wrote:
> > If I wanted to adjust the rlim_cur value of a running
> > processes, is there any sort of interface for that?
>
> Hmmm, I don't think there is an interface to adjust the
> per-process ulimit settings on-the-fly ...
>
> Does anybody know if there's an interface for this ?
If you don't mean "kill -TERM", no there isn't. It would be evil
to the process anyway.
Some[1] programs ask their resource limits on startup to scale to a
sane amount of memory usage for caching, operation buffers and
the like. If your readjust it to sth. smaller, they'll be killed
soon and if you readjust to sth, bigger, they wouldn't use it.
Regards
Ingo Oeser
[1] I would like to write "most programs", but most programs
assume, that they will never run out of memory and leave it to
the administrator/user to care for this issue :-(
--
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
<<<<<<<<<<<< come and join the fun >>>>>>>>>>>>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5
@ 2000-12-28 22:48 Rik van Riel
0 siblings, 0 replies; 5+ messages in thread
From: Rik van Riel @ 2000-12-28 22:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Alan Cox, linux-mm, linux-kernel
Hi Linus,
I know this is probably not the birthday present you've been
hoping for, but here is a patch agains 2.4.0-test13-pre5 which
does the following - trivial - things:
1. trivially implement RSS ulimit support, with
p->rlim[RLIMIT_RSS].rlim_max treated as a hard limit
and .rlim_cur treated as a soft limit
2. fix the return value from try_to_swap_out() to return
success whenever we make the RSS of a process smaller
3. clean up refill_inactive() ... try_to_swap_out() returns
the expected result now, so things should be balanced again
4. only call deactivate_page() from generic_file_write() if we
write "beyond the end of" the page, so partially written
pages stay active and will remain in memory longer (8% more
performance for dbench, as tested by Daniel Phillips)
5. (minor) s/unsigned int gfp_mask/int gfp_mask/ in vmscan.c
... we had both types used, which is rather inconsistent
Please consider including this patch in the next 2.4 pre-patch,
IMHO all of these things are fairly trivial and it seems to run
very nicely on my test box ;)
regards,
Rik
--
Hollywood goes for world dumbination,
Trailer at 11.
http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/
--- linux-2.4.0-test13-pre5/mm/filemap.c.orig Thu Dec 28 19:11:39 2000
+++ linux-2.4.0-test13-pre5/mm/filemap.c Thu Dec 28 19:28:06 2000
@@ -1912,7 +1912,7 @@
/* Make sure this doesn't exceed the process's max rss. */
error = -EIO;
- rlim_rss = current->rlim ? current->rlim[RLIMIT_RSS].rlim_cur :
+ rlim_rss = current->rlim ? (current->rlim[RLIMIT_RSS].rlim_cur >> PAGE_SHIFT) :
LONG_MAX; /* default: see resource.h */
if ((vma->vm_mm->rss + (end - start)) > rlim_rss)
return error;
@@ -2438,7 +2438,7 @@
}
while (count) {
- unsigned long bytes, index, offset;
+ unsigned long bytes, index, offset, partial = 0;
char *kaddr;
/*
@@ -2448,8 +2448,10 @@
offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */
index = pos >> PAGE_CACHE_SHIFT;
bytes = PAGE_CACHE_SIZE - offset;
- if (bytes > count)
+ if (bytes > count) {
bytes = count;
+ partial = 1;
+ }
/*
* Bring in the user page that we will copy from _first_.
@@ -2491,9 +2493,17 @@
buf += status;
}
unlock:
- /* Mark it unlocked again and drop the page.. */
+ /*
+ * Mark it unlocked again and release the page.
+ * In order to prevent large (fast) file writes
+ * from causing too much memory pressure we move
+ * completely written pages to the inactive list.
+ * We do, however, try to keep the pages that may
+ * still be written to (ie. partially written pages).
+ */
UnlockPage(page);
- deactivate_page(page);
+ if (!partial)
+ deactivate_page(page);
page_cache_release(page);
if (status < 0)
--- linux-2.4.0-test13-pre5/mm/memory.c.orig Thu Dec 28 19:11:39 2000
+++ linux-2.4.0-test13-pre5/mm/memory.c Thu Dec 28 19:12:04 2000
@@ -1198,6 +1198,12 @@
pgd = pgd_offset(mm, address);
pmd = pmd_alloc(pgd, address);
+ if (mm->rss >= (current->rlim[RLIMIT_RSS].rlim_max >> PAGE_SHIFT)) {
+ lock_kernel();
+ enforce_rss_limit(mm, GFP_HIGHUSER);
+ unlock_kernel();
+ }
+
if (pmd) {
pte_t * pte = pte_alloc(pmd, address);
if (pte)
--- linux-2.4.0-test13-pre5/mm/vmscan.c.orig Thu Dec 28 19:11:40 2000
+++ linux-2.4.0-test13-pre5/mm/vmscan.c Thu Dec 28 20:30:10 2000
@@ -49,7 +49,8 @@
if ((!VALID_PAGE(page)) || PageReserved(page))
goto out_failed;
- if (mm->swap_cnt)
+ /* RSS trimming doesn't change the process' chances wrt. normal swap */
+ if (mm->swap_cnt && !(gfp_mask & __GFP_RSS_LIMIT))
mm->swap_cnt--;
onlist = PageActive(page);
@@ -58,7 +59,13 @@
age_page_up(page);
goto out_failed;
}
- if (!onlist)
+ /*
+ * SUBTLE: if the page is on the active list and we're not doing
+ * RSS ulimit trimming, then we let refill_inactive_scan() take
+ * care of the down aging. Always aging down here would severely
+ * disadvantage shared mappings (of eg libc.so).
+ */
+ if (!onlist || (gfp_mask & __GFP_RSS_LIMIT))
/* The page is still mapped, so it can't be freeable... */
age_page_down_ageonly(page);
@@ -85,8 +92,8 @@
* we can just drop our reference to it without doing
* any IO - it's already up-to-date on disk.
*
- * Return 0, as we didn't actually free any real
- * memory, and we should just continue our scan.
+ * Return success, we successfully stole a page from
+ * this process.
*/
if (PageSwapCache(page)) {
entry.val = page->index;
@@ -101,8 +108,8 @@
flush_tlb_page(vma, address);
deactivate_page(page);
page_cache_release(page);
-out_failed:
- return 0;
+
+ return 1;
}
/*
@@ -152,6 +159,7 @@
out_unlock_restore:
set_pte(page_table, pte);
UnlockPage(page);
+out_failed:
return 0;
}
@@ -192,7 +200,7 @@
int result;
mm->swap_address = address + PAGE_SIZE;
result = try_to_swap_out(mm, vma, address, pte, gfp_mask);
- if (result)
+ if (result && !(gfp_mask & __GFP_RSS_LIMIT))
return result;
if (!mm->swap_cnt)
return 0;
@@ -303,6 +311,63 @@
}
/*
+ * This function is used to enforce RSS ulimits for a process. When a
+ * process gets an RSS larger than p->rlim[RLIMIT_RSS].rlim_max, this
+ * function will get called.
+ *
+ * The function is pretty similar to swap_out_mm, except for the fact
+ * that it scans the whole process regardless of return value and it
+ * keeps the swapout statistics intact to not disturb normal swapout.
+ *
+ * XXX: the caller must hold the kernel lock; this function cannot loop
+ * because mlock()ed memory could be bigger than the RSS limit.
+ */
+void enforce_rss_limit(struct mm_struct * mm, int gfp_mask)
+{
+ unsigned long address, old_swap_address;
+ struct vm_area_struct* vma;
+
+ /*
+ * Go through process' page directory.
+ */
+ old_swap_address = mm->swap_address;
+ address = mm->swap_address = 0;
+
+ /* Don't decrement mm->swap_cnt in try_to_swap_out */
+ gfp_mask |= __GFP_RSS_LIMIT;
+ if (!mm->swap_cnt)
+ mm->swap_cnt = 1;
+
+ /*
+ * Find the proper vm-area after freezing the vma chain
+ * and ptes.
+ */
+ spin_lock(&mm->page_table_lock);
+ vma = find_vma(mm, address);
+ if (vma) {
+ if (address < vma->vm_start)
+ address = vma->vm_start;
+
+ for (;;) {
+ /*
+ * Subtle: swap_out_pmd makes sure we scan the
+ * whole VMA, that's a lot more efficient than
+ * a while() loop here would ever be.
+ */
+ swap_out_vma(mm, vma, address, gfp_mask);
+ vma = vma->vm_next;
+ if (!vma)
+ break;
+ address = vma->vm_start;
+ }
+ }
+ /* Reset swap_address, RSS enforcement shouldn't disturb normal swap */
+ mm->swap_address = old_swap_address;
+
+ spin_unlock(&mm->page_table_lock);
+}
+
+/*
* Select the task with maximal swap_cnt and try to swap out a page.
* N.B. This function returns only 0 or 1. Return values != 1 from
* the lower level routines result in continued processing.
@@ -310,7 +375,7 @@
#define SWAP_SHIFT 5
#define SWAP_MIN 8
-static int swap_out(unsigned int priority, int gfp_mask, unsigned long idle_time)
+static int swap_out(unsigned int priority, int gfp_mask)
{
struct task_struct * p;
int counter;
@@ -350,14 +415,15 @@
continue;
if (mm->rss <= 0)
continue;
- /* Skip tasks which haven't slept long enough yet when idle-swapping. */
- if (idle_time && !assign && (!(p->state & TASK_INTERRUPTIBLE) ||
- time_after(p->sleep_time + idle_time * HZ, jiffies)))
- continue;
found_task++;
+ /* If the process' RSS is too big, make it smaller ;) */
+ if (mm->rss > (p->rlim[RLIMIT_RSS].rlim_max >> PAGE_SHIFT))
+ enforce_rss_limit(mm, gfp_mask);
/* Refresh swap_cnt? */
if (assign == 1) {
mm->swap_cnt = (mm->rss >> SWAP_SHIFT);
+ if (mm->rss > (p->rlim[RLIMIT_RSS].rlim_cur >> PAGE_SHIFT))
+ mm->swap_cnt = mm->rss;
if (mm->swap_cnt < SWAP_MIN)
mm->swap_cnt = SWAP_MIN;
}
@@ -497,7 +563,7 @@
#define MAX_LAUNDER (4 * (1 << page_cluster))
int page_launder(int gfp_mask, int sync)
{
- int launder_loop, maxscan, cleaned_pages, maxlaunder;
+ int launder_loop, maxscan, cleaned_pages, maxlaunder, target;
int can_get_io_locks;
struct list_head * page_lru;
struct page * page;
@@ -508,6 +574,8 @@
*/
can_get_io_locks = gfp_mask & __GFP_IO;
+ target = free_shortage();
+
launder_loop = 0;
maxlaunder = 0;
cleaned_pages = 0;
@@ -538,6 +606,12 @@
}
/*
+ * If we have enough free pages, stop doing (expensive) IO.
+ */
+ if (cleaned_pages > target && !free_shortage())
+ break;
+
+ /*
* The page is locked. IO in progress?
* Move it to the back of the list.
*/
@@ -846,10 +920,9 @@
* really care about latency. In that case we don't try
* to free too many pages.
*/
-static int refill_inactive(unsigned int gfp_mask, int user)
+static int refill_inactive(int gfp_mask, int user)
{
int priority, count, start_count, made_progress;
- unsigned long idle_time;
count = inactive_shortage() + free_shortage();
if (user)
@@ -859,17 +932,6 @@
/* Always trim SLAB caches when memory gets low. */
kmem_cache_reap(gfp_mask);
- /*
- * Calculate the minimum time (in seconds) a process must
- * have slept before we consider it for idle swapping.
- * This must be the number of seconds it takes to go through
- * all of the cache. Doing this idle swapping makes the VM
- * smoother once we start hitting swap.
- */
- idle_time = atomic_read(&page_cache_size);
- idle_time += atomic_read(&buffermem_pages);
- idle_time /= (inactive_target + 1);
-
priority = 6;
do {
made_progress = 0;
@@ -879,8 +941,11 @@
schedule();
}
- while (refill_inactive_scan(priority, 1) ||
- swap_out(priority, gfp_mask, idle_time)) {
+ /*
+ * Reclaim old pages which aren't mapped into any
+ * process.
+ */
+ while (refill_inactive_scan(priority, 1)) {
made_progress = 1;
if (--count <= 0)
goto done;
@@ -895,9 +960,9 @@
shrink_icache_memory(priority, gfp_mask);
/*
- * Then, try to page stuff out..
+ * Steal pages from processes.
*/
- while (swap_out(priority, gfp_mask, 0)) {
+ while (swap_out(priority, gfp_mask)) {
made_progress = 1;
if (--count <= 0)
goto done;
@@ -930,7 +995,7 @@
return (count < start_count);
}
-static int do_try_to_free_pages(unsigned int gfp_mask, int user)
+static int do_try_to_free_pages(int gfp_mask, int user)
{
int ret = 0;
@@ -1105,7 +1170,7 @@
* memory but are unable to sleep on kswapd because
* they might be holding some IO locks ...
*/
-int try_to_free_pages(unsigned int gfp_mask)
+int try_to_free_pages(int gfp_mask)
{
int ret = 1;
--- linux-2.4.0-test13-pre5/include/linux/mm.h.orig Thu Dec 28 19:11:45 2000
+++ linux-2.4.0-test13-pre5/include/linux/mm.h Thu Dec 28 19:32:22 2000
@@ -460,6 +460,7 @@
#else
#define __GFP_HIGHMEM 0x0 /* noop */
#endif
+#define __GFP_RSS_LIMIT 0x20
#define GFP_BUFFER (__GFP_HIGH | __GFP_WAIT)
--- linux-2.4.0-test13-pre5/include/linux/swap.h.orig Thu Dec 28 19:11:48 2000
+++ linux-2.4.0-test13-pre5/include/linux/swap.h Thu Dec 28 19:37:54 2000
@@ -108,7 +108,8 @@
extern int free_shortage(void);
extern int inactive_shortage(void);
extern void wakeup_kswapd(int);
-extern int try_to_free_pages(unsigned int gfp_mask);
+extern int try_to_free_pages(int);
+extern void enforce_rss_limit(struct mm_struct *, int);
/* linux/mm/page_io.c */
extern void rw_swap_page(int, struct page *, int);
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2001-01-08 4:51 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <Pine.LNX.4.21.0012291138380.1403-100000@duckman.distro.conectiva>
2000-12-30 2:25 ` [PATCH] VM fixes + RSS limits 2.4.0-test13-pre5 Dieter Nützel
2001-01-08 4:51 Peter Chubb
[not found] <Pine.LNX.4.21.0012292007510.11006-100000@d13.com>
2001-01-03 11:43 ` Rik van Riel
2001-01-03 13:12 ` Ingo Oeser
-- strict thread matches above, loose matches on Subject: below --
2000-12-28 22:48 Rik van Riel
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox