public inbox for linux-kernel@vger.kernel.org
From: Andries Brouwer <Andries.Brouwer@cwi.nl>
To: torvalds@osdl.org, akpm@osdl.org
Cc: linux-kernel@vger.kernel.org
Subject: [RFC] restricted overcommit
Date: Mon, 4 Oct 2004 14:44:08 +0200
Message-ID: <20041004124407.GA9146@apps.cwi.nl>

[The below is just for discussion, not to be applied]

Below is a trimmed-down version of a patch I made a few days ago.

The exec.c part can be viewed as cleanup, except that it adds
EXTRA_STACK_VM_PAGES to the initial stack mapping, following a
suggestion by Alan, giving a guaranteed lower bound on the amount
of stack available.

(Without any reasoning or experimenting I wrote 20, and have
not seen a segfault since.)
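
To make the size of that guarantee concrete, here is a minimal
userspace sketch of the arithmetic for the stack-grows-down case. It
is not kernel code: SKETCH_PAGE_SIZE and both function names are
hypothetical, and 4 KiB pages are an assumption. Only the value 20
comes from the patch.

```c
#include <assert.h>

/* Assumption for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SIZE	4096UL
/* Value taken from the patch. */
#define EXTRA_STACK_VM_PAGES	20UL

/*
 * Size in bytes of the initial stack mapping: the pages that hold
 * the argument strings plus the guaranteed extra pages.
 */
static unsigned long initial_stack_bytes(unsigned long arg_pages)
{
	return (arg_pages + EXTRA_STACK_VM_PAGES) * SKETCH_PAGE_SIZE;
}

/*
 * Lower end of the mapping when the stack grows down from stack_top,
 * mirroring vm_start = vm_end - arg_size in the patch.
 */
static unsigned long initial_stack_start(unsigned long stack_top,
					 unsigned long arg_pages)
{
	unsigned long size = initial_stack_bytes(arg_pages);

	assert(stack_top >= size);
	return stack_top - size;
}
```

Under these assumptions the extra reservation alone is 20 * 4096 =
81920 bytes (80 KiB), whatever the size of the argument strings.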

The security/commoncap.c part makes sure that something is left
for root, even when some user takes all he can, and also, that
something is left for the user, even after one of his buggy programs
ran away and took everything. (Of course he should have used
RLIMIT_AS, but nobody does.)
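
The two reductions can be sketched outside the kernel as follows.
This is only an illustration: restricted_allowed() and commit_fits()
are hypothetical names, and the starting value of `allowed' (the
limit in pages, swap already included) is passed in rather than
computed. Note that "3%" is really 1/32, i.e. 3.125%.

```c
#include <assert.h>

/*
 * Apply the two reductions from the patch to a commit limit.
 * allowed:  limit in pages before the reductions (swap included)
 * is_root:  nonzero if the caller has euid 0
 * total_vm: size of the calling process, in pages
 */
static long restricted_allowed(long allowed, int is_root, long total_vm)
{
	assert(allowed >= 0 && total_vm >= 0);

	/* Leave the last 1/32 for root. */
	if (!is_root)
		allowed -= allowed / 32;

	/* Leave 1/32 of this process's size for other processes. */
	allowed -= total_vm / 32;

	return allowed;
}

/* The request is granted only while committed space is below the limit. */
static int commit_fits(long committed, long allowed)
{
	return committed < allowed;
}
```

For example, with a limit of 320000 pages and a process of 64000
pages, an ordinary user ends up with 320000 - 10000 - 2000 = 308000
pages, while root ends up with 320000 - 2000 = 318000.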

Given these two patches, one can do

	# echo 2 > /proc/sys/vm/overcommit_memory

and work as usual and not see an OOM anymore, provided there is enough
swap.

(What I wrote a few days ago was a slightly larger patch in which
swapon/swapoff enabled the no-overcommit mode when the amount of swap
was at least twice the amount of physical memory, and disabled it
otherwise. Again, this factor 2 is mostly unmotivated.)
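
The test in that larger patch amounts to something like the sketch
below. The function name is hypothetical and the code is not taken
from that patch; it only shows the "swap at least twice RAM"
comparison, written so that the multiplication cannot overflow.

```c
#include <assert.h>

/* The mostly unmotivated factor from the text above. */
#define SWAP_TO_RAM_FACTOR	2UL

/*
 * Return 1 if no-overcommit mode should be on, i.e. if
 * swap_pages >= SWAP_TO_RAM_FACTOR * ram_pages.
 * Dividing swap_pages instead of multiplying ram_pages gives the
 * same answer for integers and avoids overflow.
 */
static int want_no_overcommit(unsigned long swap_pages,
			      unsigned long ram_pages)
{
	assert(ram_pages > 0);
	return swap_pages / SWAP_TO_RAM_FACTOR >= ram_pages;
}
```

Such a check would be re-run after every swapon and swapoff, since
both change the amount of swap.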

So, this seems to work with a sample of size 1 from the set
of Linux users. Maybe others can try this, or comment.

Andries


diff -uprN -X /linux/dontdiff a/fs/exec.c b/fs/exec.c
--- a/fs/exec.c	2004-10-01 22:46:33.000000000 +0200
+++ b/fs/exec.c	2004-10-04 14:14:33.000000000 +0200
@@ -336,6 +336,8 @@ out_sig:
 	force_sig(SIGKILL, current);
 }
 
+#define EXTRA_STACK_VM_PAGES	20	/* random */
+
 int setup_arg_pages(struct linux_binprm *bprm, int executable_stack)
 {
 	unsigned long stack_base;
@@ -373,15 +375,15 @@ int setup_arg_pages(struct linux_binprm 
 	memmove(to, to + offset, PAGE_SIZE - offset);
 	kunmap(bprm->page[j - 1]);
 
-	/* Adjust bprm->p to point to the end of the strings. */
-	bprm->p = PAGE_SIZE * i - offset;
-
 	/* Limit stack size to 1GB */
 	stack_base = current->rlim[RLIMIT_STACK].rlim_max;
 	if (stack_base > (1 << 30))
 		stack_base = 1 << 30;
 	stack_base = PAGE_ALIGN(STACK_TOP - stack_base);
 
+	/* Adjust bprm->p to point to the end of the strings. */
+	bprm->p = stack_base + PAGE_SIZE * i - offset;
+
 	mm->arg_start = stack_base;
 	arg_size = i << PAGE_SHIFT;
 
@@ -390,11 +392,13 @@ int setup_arg_pages(struct linux_binprm 
 		bprm->page[i++] = NULL;
 #else
 	stack_base = STACK_TOP - MAX_ARG_PAGES * PAGE_SIZE;
-	mm->arg_start = bprm->p + stack_base;
+	bprm->p += stack_base;
+	mm->arg_start = bprm->p;
 	arg_size = STACK_TOP - (PAGE_MASK & (unsigned long) mm->arg_start);
 #endif
 
-	bprm->p += stack_base;
+	arg_size += EXTRA_STACK_VM_PAGES * PAGE_SIZE;
+
 	if (bprm->loader)
 		bprm->loader += stack_base;
 	bprm->exec += stack_base;
@@ -415,11 +419,10 @@ int setup_arg_pages(struct linux_binprm 
 		mpnt->vm_mm = mm;
 #ifdef CONFIG_STACK_GROWSUP
 		mpnt->vm_start = stack_base;
-		mpnt->vm_end = PAGE_MASK &
-			(PAGE_SIZE - 1 + (unsigned long) bprm->p);
+		mpnt->vm_end = stack_base + arg_size;
 #else
-		mpnt->vm_start = PAGE_MASK & (unsigned long) bprm->p;
 		mpnt->vm_end = STACK_TOP;
+		mpnt->vm_start = mpnt->vm_end - arg_size;
 #endif
 		/* Adjust stack execute permissions; explicitly enable
 		 * for EXSTACK_ENABLE_X, disable for EXSTACK_DISABLE_X
diff -uprN -X /linux/dontdiff a/security/commoncap.c b/security/commoncap.c
--- a/security/commoncap.c	2004-10-01 22:46:41.000000000 +0200
+++ b/security/commoncap.c	2004-10-04 14:14:33.000000000 +0200
@@ -364,6 +364,14 @@ int cap_vm_enough_memory(long pages)
 		allowed -= allowed / 32;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
diff -uprN -X /linux/dontdiff a/security/dummy.c b/security/dummy.c
--- a/security/dummy.c	2004-10-01 22:46:41.000000000 +0200
+++ b/security/dummy.c	2004-10-04 14:16:06.000000000 +0200
@@ -153,6 +153,14 @@ static int dummy_vm_enough_memory(long p
 		* sysctl_overcommit_ratio / 100;
 	allowed += total_swap_pages;
 
+	/* Leave the last 3% for root */	
+	if (current->euid)
+		allowed -= allowed / 32;
+
+	/* Don't let a single process grow too big:
+	   leave 3% of the size of this process for other processes */
+	allowed -= current->mm->total_vm / 32;
+
 	if (atomic_read(&vm_committed_space) < allowed)
 		return 0;
 
