system hang with "__alloc_page: 1-order allocation failed"

public inbox for linux-kernel@vger.kernel.org

* system hang with "__alloc_page: 1-order allocation failed"
From: David Shoon @ 2001-03-13  1:40 UTC
  To: linux-kernel, alan; +Cc: dave

Hi,

After some testing, 2.4.2, 2.4.2-pre3, and 2.4.2-ac18 and -ac19 all
crash/hang when a fork loop (fork bomb) is executed (as a normal user) and
then killed (by the superuser). This isn't what happens with previous
kernels (2.2.x and 2.0.x): both return to normal after the process is
killed.

(This might be related to an earlier post about memory allocation?)

Anyway, a 'forkbomb' just looks like this (sorry, just clarifying the
obvious):

#include <unistd.h>

int main(void) {
    while (1)
        fork();   /* every copy keeps forking, forever */
}

With 2.4.2 and 2.4.2-pre3, after the process is killed (ctrl-c or killall -9
prog), the kernel prints the error message "__alloc_page: 1-order
allocation failed" continuously for a few minutes and then starts to
(randomly?) kill other active processes (such as xfs and bash) with
"Out of Memory: Killed process ### (etc.)". Keyboard input doesn't
work, but you can still switch vconsoles.

Under 2.4.2-ac18/19 the system doesn't show the error messages, but it
still hangs after you kill the process. Eventually all keyboard input
freezes (you can't even switch vconsoles).

I'm not sure if it helps, but the system I'm testing this on is a PIII
500 MHz with 196 MB of RAM, with swap disabled so I know it's not
disk reads/writes.

If anyone needs more info, give me a holler.

[ please cc: replies back to me since I'm not on the linux-kernel list ]

P.S. Apologies if this is already known or fixed.
--
David Shoon
dave@zylotech.com.au


* Re: system hang with "__alloc_page: 1-order allocation failed"
From: Mike Galbraith @ 2001-03-13 10:50 UTC
  To: David Shoon; +Cc: linux-kernel, alan, Rik van Riel, MOLNAR Ingo

On Tue, 13 Mar 2001, David Shoon wrote:

> Hi,

Greetings,

> After some testing, 2.4.2, 2.4.2-pre3, and 2.4.2-ac18 and -ac19 all
> crash/hang when a fork loop (fork bomb) is executed (as a normal user) and
> then killed (by the superuser). This isn't what happens with previous
> kernels (2.2.x and 2.0.x): both return to normal after the process is
> killed.
>
> (This might be related to an earlier post about memory allocation?)
>
> Anyway, a 'forkbomb' just looks like this (sorry, just clarifying the
> obvious):
>
> int main() {
>     while (1)
>         fork();
> }

Allowing users to run in an unlimited environment is bad of course, and
setting a process limit will cure that.  However...

> With 2.4.2 and 2.4.2-pre3, after the process is killed (ctrl-c or killall -9
> prog), the kernel prints the error message "__alloc_page: 1-order
> allocation failed" continuously for a few minutes and then starts to
> (randomly?) kill other active processes (such as xfs and bash) with
> "Out of Memory: Killed process ### (etc.)". Keyboard input doesn't
> work, but you can still switch vconsoles.

(The oom killer doesn't activate here.. bummer)

The first problem is that high-order allocations (order > 0) can fail for
root just as for any user.  It can (and currently does) happen that
root can't fork a task in order to kill the fork bombs.  Here, I even
modified __alloc_pages() to never let root allocations fail, but
that alone isn't enough: it can (and does) happen that even though
I never give up, the little memory that is free is so fragmented that
I cannot allocate a task struct (order 1).  Everything swappable has
been swapped and the freed memory consumed by the greedy little user
processes.. so it never gets better and the kernel just pages forever.
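The point about fragmentation is that "free memory" and "memory usable for an order-1 request" are different things: a task struct needs two physically contiguous pages. A toy model (not kernel code; the page map and both helper names are hypothetical) shows how half of memory can be free while no order-1 block exists:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not kernel code: free_map[i] is true when physical page i
 * is free. In a buddy allocator an order-1 block is two contiguous pages
 * starting at an even page index.
 */
size_t count_free_pages(const bool *free_map, size_t npages)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++)
        if (free_map[i])
            n++;
    return n;
}

/* True if at least one aligned order-1 (two-page) block is completely free. */
bool order1_available(const bool *free_map, size_t npages)
{
    for (size_t i = 0; i + 1 < npages; i += 2)
        if (free_map[i] && free_map[i + 1])
            return true;
    return false;
}
```

With every other page free, count_free_pages() reports half of memory as free, yet order1_available() is false, so a fork that needs a new task struct still cannot succeed.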

(A workaround is to lower max_threads so task structs can use at most 25%
of memory.  It works, but is really cheesy.  OTOH, allowing half of memory
to be allocated in task structs is a bit cheesy looking too.  That means
that these tasks can't be big enough to be doing real work.. no?)
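The "half of memory" figure follows from the default in fork_init() (visible in the patch later in this thread): max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2. A sketch of the arithmetic, assuming i386 values of 4 KB pages and an 8 KB (order-1) THREAD_SIZE, and using the 196 MB machine from the original report (50176 pages); the helper names are hypothetical:

```c
/* Assumed i386 values: 4 KB pages, 8 KB per task (task_struct + kernel stack). */
#define ASSUMED_PAGE_SIZE   4096UL
#define ASSUMED_THREAD_SIZE 8192UL
#define PAGES_PER_TASK      (ASSUMED_THREAD_SIZE / ASSUMED_PAGE_SIZE)

/* Default from fork_init(): task structs may fill half of memory. */
unsigned long default_max_threads(unsigned long mempages)
{
    return mempages / PAGES_PER_TASK / 2;
}

/* The workaround above: let task structs fill only `percent` of memory. */
unsigned long capped_max_threads(unsigned long mempages, unsigned long percent)
{
    return mempages * percent / 100 / PAGES_PER_TASK;
}

/* Bytes pinned in task structs when every allowed task exists. */
unsigned long task_struct_bytes(unsigned long nthreads)
{
    return nthreads * ASSUMED_THREAD_SIZE;
}
```

For 50176 pages, default_max_threads() gives 12544 tasks, which pin 98 MB, exactly half of RAM; the 25% cap gives 6272 tasks.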

The second problem is that even SysRq-E (send SIGTERM to all processes
except init) doesn't always kill reliably when you're very low on memory,
so your fork bomb may take off all over again.. and it does exactly that.

The third problem _appears_ (heavy emphasis required) to be the scheduler.
Even with the allocator never giving up on root allocations and with
max_threads reduced, root's killall -9 forkbomb never RUNS here, despite
order-1 blocks being available, unless root's shell is set to SCHED_RR.
[maybe I wasn't patient enough, but minutes seems long enough to wait]
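Putting root's shell into SCHED_RR is what let the killall run here. A minimal sketch of how a rescue process could move itself into that real-time class with the POSIX sched_setscheduler() call; the helper name is hypothetical, and the call needs root privileges:

```c
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Move the calling process into the SCHED_RR real-time class at the lowest
 * RR priority. Real-time tasks are scheduled ahead of all SCHED_OTHER tasks,
 * so a shell started from here keeps getting CPU time while a fork bomb
 * floods the run queue. Returns 0 on success, -1 (errno set) on failure;
 * unprivileged callers get EPERM.
 */
int enter_sched_rr(void)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = sched_get_priority_min(SCHED_RR);
    if (sched_setscheduler(0, SCHED_RR, &param) == -1) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

The policy is inherited across fork() and kept across exec, so calling enter_sched_rr() and then exec'ing /bin/sh gives root a real-time shell.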

None of these problems seem to be a 'big hairy deal' with real workloads..
but they are smudges on the otherwise perfect ;-) kernel.
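The process limit recommended at the start of this reply is RLIMIT_NPROC (bash's `ulimit -u` sets the same limit). A minimal sketch, with a hypothetical helper name, of how a login or session-setup program could apply it before starting the user's shell; limits are inherited by children, so a fork bomb gets EAGAIN from fork() instead of exhausting memory:

```c
#include <sys/resource.h>

/*
 * Cap the number of processes the calling user may own. The soft limit is
 * what fork() checks; setting the hard limit to the same value stops
 * unprivileged code from raising it again.
 * Returns 0 on success, -1 with errno set on failure.
 */
int limit_user_processes(rlim_t max_procs)
{
    struct rlimit rl;

    rl.rlim_cur = max_procs;
    rl.rlim_max = max_procs;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```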

> Under 2.4.2-ac18/19 the system doesn't show the error messages, but it
> still hangs after you kill the process. Eventually all keyboard input
> freezes (you can't even switch vconsoles).
>
> I'm not sure if it helps, but the system I'm testing this on is a PIII
> 500 MHz with 196 MB of RAM, with swap disabled so I know it's not
> disk reads/writes.

Running oom without swap is guaranteed doom.

	-Mike


* Re: system hang with "__alloc_page: 1-order allocation failed"
From: Rik van Riel @ 2001-03-13 13:03 UTC
  To: Mike Galbraith; +Cc: David Shoon, linux-kernel, alan, MOLNAR Ingo

On Tue, 13 Mar 2001, Mike Galbraith wrote:

> (A workaround is to lower max_threads so task structs can use at most 25%
> of memory.  It works, but is really cheesy.  OTOH, allowing half of memory
> to be allocated in task structs is a bit cheesy looking too.  That means
> that these tasks can't be big enough to be doing real work.. no?)

If half of memory is allocated for task structures, we won't
even be able to allocate the minimum number of page table
pages needed for each task ...

For a "normal" task we'll need at least 1 page directory and
3 page table pages. We only have space for half of that (2 pages
per task) when the maximum number of task_structs is allocated.

Maybe it would be good to lower the default threads-max to
about 10% or less of physical memory ?
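Rik's arithmetic, written out as a sketch: with an order-1 (two-page) task_struct and the four pages he lists for page tables, the default limit leaves each task only two spare pages, while a 10% cap leaves plenty. Page counts assume i386 with 4 KB pages, the 50176-page figure is the 196 MB machine from the original report, and the helper names are hypothetical:

```c
/* Page counts assumed from the discussion (i386, 4 KB pages). */
#define TASK_STRUCT_PAGES 2UL   /* order-1 task_struct + kernel stack */
#define MIN_PGTABLE_PAGES 4UL   /* 1 page directory + 3 page tables */

/* Pages left per task once every allowed task_struct is allocated. */
unsigned long spare_pages_per_task(unsigned long mempages, unsigned long nthreads)
{
    unsigned long used = nthreads * TASK_STRUCT_PAGES;

    if (nthreads == 0)
        return mempages;
    return used >= mempages ? 0 : (mempages - used) / nthreads;
}

/* threads-max if task_structs may occupy at most `percent` of memory. */
unsigned long threads_max_for_percent(unsigned long mempages, unsigned long percent)
{
    return mempages * percent / 100 / TASK_STRUCT_PAGES;
}
```

With the default 12544 tasks, spare_pages_per_task() returns 2, half of the 4 needed; at 10% (2508 tasks) each task has 18 spare pages.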

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


* Re: system hang with "__alloc_page: 1-order allocation failed"
From: Manfred Spraul @ 2001-03-13 18:39 UTC
  To: riel; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 808 bytes --]

> 
> Maybe it would be good to lower the default threads-max to 
> about 10% or less of physical memory ? 
>

And MIN_THREADS_LEFT_FOR_ROOT should be reintroduced: the define is still
there, but the code that uses it is missing.
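The reserve is small in code terms; the do_fork() hunk in the attached patch enforces it with MIN_THREADS_LEFT_FOR_ROOT. As a userspace model (the function name is hypothetical, and the value 4 is assumed from 2.4's <linux/threads.h>):

```c
#include <stdbool.h>

/* Assumed value, as defined in 2.4's <linux/threads.h>. */
#define MIN_THREADS_LEFT_FOR_ROOT 4

/*
 * Userspace model of the patched fork check: ordinary users are refused
 * once fewer than MIN_THREADS_LEFT_FOR_ROOT task slots remain, so root
 * can still fork a shell or a killall to clean up.
 */
bool fork_allowed(int nr_threads, int max_threads, unsigned int uid)
{
    int limit = max_threads;

    if (uid != 0)
        limit -= MIN_THREADS_LEFT_FOR_ROOT;
    return nr_threads < limit;
}
```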

I've attached an older patch that:
* reintroduces MIN_THREADS_LEFT_FOR_ROOT (the alternative is to remove the define)
* adds lower and upper bounds (16 and 32000) for /proc/sys/kernel/threads-max
* fixes bugs in get_pid(). This is the longest part of the patch, but
it's only necessary if you have more than 10,000 threads running. If you
have enough memory, launch a fork bomb: once ~32760 threads are running, the
kernel enters an endless loop in get_pid() (or at around 11,000 threads if
they intentionally create additional sessions and process groups).
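The get_pid() fix boils down to making the search terminate. A simplified model of the patched behavior, not the patch itself: the names are hypothetical, a flat in_use[] table stands in for the tasklist walk, and the constants 300 and 60 come from the patch while 0x8000 is 2.4's PID_MAX:

```c
#include <stdbool.h>

#define PID_MAX        0x8000  /* 15-bit pid space, as in 2.4 */
#define FIRST_USER_PID 300     /* the patch restarts the search here */
#define PIDS_FOR_ROOT  60      /* top pids reserved for root by the patch */

/*
 * in_use[] has PID_MAX entries; in_use[v] is true when some task uses v as
 * its pid, pgrp or session. The search starts after last_pid, wraps to
 * FIRST_USER_PID at most once, and returns -1 instead of looping forever
 * when no value is free. Only root may take the top PIDS_FOR_ROOT values.
 */
int alloc_pid(const bool *in_use, int last_pid, bool is_root)
{
    int limit = is_root ? PID_MAX : PID_MAX - PIDS_FOR_ROOT;
    int start = last_pid + 1;

    if (start >= limit || start < FIRST_USER_PID)
        start = FIRST_USER_PID;
    for (int v = start; v < limit; v++)          /* first pass, up to limit */
        if (!in_use[v])
            return v;
    for (int v = FIRST_USER_PID; v < start; v++) /* single wrap-around */
        if (!in_use[v])
            return v;
    return -1;
}
```

When every value below PID_MAX - PIDS_FOR_ROOT is taken, an ordinary user gets -1 (the fork fails) while root still receives a pid from the reserved range.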

I tested the patch with 2.4.0-test12, but not with newer kernels.

--
	Manfred

[-- Attachment #2: patch-pid --]
[-- Type: text/plain, Size: 4951 bytes --]

// $Header$
// Kernel Version:
//  VERSION = 2
//  PATCHLEVEL = 4
//  SUBLEVEL = 0
//  EXTRAVERSION = -test12
--- 2.4/kernel/sysctl.c	Sun Dec 17 18:04:04 2000
+++ build-2.4/kernel/sysctl.c	Sat Dec 23 17:29:52 2000
@@ -45,6 +45,7 @@
 extern int bdf_prm[], bdflush_min[], bdflush_max[];
 extern int sysctl_overcommit_memory;
 extern int max_threads;
+extern int threads_high, threads_low;
 extern int nr_queued_signals, max_queued_signals;
 extern int sysrq_enabled;
 
@@ -222,7 +223,8 @@
 	 0644, NULL, &proc_dointvec},
 #endif	 
 	{KERN_MAX_THREADS, "threads-max", &max_threads, sizeof(int),
-	 0644, NULL, &proc_dointvec},
+	 0644, NULL, &proc_dointvec_minmax, &sysctl_intvec, NULL,
+	&threads_low, &threads_high},
 	{KERN_RANDOM, "random", NULL, 0, 0555, random_table},
 	{KERN_OVERFLOWUID, "overflowuid", &overflowuid, sizeof(int), 0644, NULL,
 	 &proc_dointvec_minmax, &sysctl_intvec, NULL,
--- 2.4/kernel/fork.c	Sun Dec 17 18:04:04 2000
+++ build-2.4/kernel/fork.c	Sat Dec 23 16:59:47 2000
@@ -63,6 +63,11 @@
 	wq_write_unlock_irqrestore(&q->lock, flags);
 }
 
+#define THREADS_HIGH	32000
+#define THREADS_LOW	16
+int threads_high = THREADS_HIGH;
+int threads_low = THREADS_LOW;
+
 void __init fork_init(unsigned long mempages)
 {
 	/*
@@ -71,53 +76,79 @@
 	 * of memory.
 	 */
 	max_threads = mempages / (THREAD_SIZE/PAGE_SIZE) / 2;
+	if(max_threads > threads_high)
+		max_threads = threads_high;
 
 	init_task.rlim[RLIMIT_NPROC].rlim_cur = max_threads/2;
 	init_task.rlim[RLIMIT_NPROC].rlim_max = max_threads/2;
 }
 
-/* Protects next_safe and last_pid. */
-spinlock_t lastpid_lock = SPIN_LOCK_UNLOCKED;
+/*
+ * Reserve a few pid values for root, otherwise
+ * the reserved threads might not help him ;-)
+ */
+#define PIDS_FOR_ROOT	60
 
-static int get_pid(unsigned long flags)
+static int search_pid(int start, int* plimit)
 {
-	static int next_safe = PID_MAX;
+	int next_safe = *plimit;
 	struct task_struct *p;
+	int loop = 0;
 
-	if (flags & CLONE_PID)
-		return current->pid;
-
-	spin_lock(&lastpid_lock);
-	if((++last_pid) & 0xffff8000) {
-		last_pid = 300;		/* Skip daemons etc. */
-		goto inside;
-	}
-	if(last_pid >= next_safe) {
-inside:
-		next_safe = PID_MAX;
-		read_lock(&tasklist_lock);
-	repeat:
-		for_each_task(p) {
-			if(p->pid == last_pid	||
-			   p->pgrp == last_pid	||
-			   p->session == last_pid) {
-				if(++last_pid >= next_safe) {
-					if(last_pid & 0xffff8000)
-						last_pid = 300;
-					next_safe = PID_MAX;
+	if(start >= *plimit || start < 300) {
+		loop = 1;
+		start=300;
+	}
+repeat:
+	read_lock(&tasklist_lock);
+	for_each_task(p) {
+		if(p->pid == start	||
+		   p->pgrp == start	||
+		   p->session == start) {
+			if(++start >= next_safe) {
+				read_unlock(&tasklist_lock);
+				if(start >= *plimit) {
+					if(loop) {
+						next_safe=-1;
+						start=-1;
+						break;
+					}
+					loop=1;
+					start = 300;
 				}
+				next_safe = *plimit;
 				goto repeat;
 			}
-			if(p->pid > last_pid && next_safe > p->pid)
-				next_safe = p->pid;
-			if(p->pgrp > last_pid && next_safe > p->pgrp)
-				next_safe = p->pgrp;
-			if(p->session > last_pid && next_safe > p->session)
-				next_safe = p->session;
 		}
-		read_unlock(&tasklist_lock);
+		if(p->pid > start && next_safe > p->pid)
+			next_safe = p->pid;
+		if(p->pgrp > start && next_safe > p->pgrp)
+			next_safe = p->pgrp;
+		if(p->session > start && next_safe > p->session)
+			next_safe = p->session;
+	}
+	read_unlock(&tasklist_lock);
+	*plimit=next_safe; 
+	return start;
+}
+
+static int get_pid(unsigned long flags, int is_root)
+{
+	static int next_safe = PID_MAX-PIDS_FOR_ROOT;
+
+	if (flags & CLONE_PID)
+		return current->pid;
+
+	if(++last_pid < next_safe)
+		return last_pid;
+
+	next_safe = PID_MAX-PIDS_FOR_ROOT;
+	last_pid = search_pid(last_pid, &next_safe);
+
+	if(last_pid==-1 && is_root) {
+		int dummy = PID_MAX;
+		return search_pid(PID_MAX-PIDS_FOR_ROOT, &dummy);
 	}
-	spin_unlock(&lastpid_lock);
 
 	return last_pid;
 }
@@ -573,8 +604,13 @@
 	 * the kernel lock so nr_threads can't
 	 * increase under us (but it may decrease).
 	 */
-	if (nr_threads >= max_threads)
-		goto bad_fork_cleanup_count;
+	{
+		int limit = max_threads;
+		if(current->uid)
+			limit -= MIN_THREADS_LEFT_FOR_ROOT;
+		if (nr_threads >= limit)
+			goto bad_fork_cleanup_count;
+	}
 	
 	get_exec_domain(p->exec_domain);
 
@@ -586,7 +622,6 @@
 	p->state = TASK_UNINTERRUPTIBLE;
 
 	copy_flags(clone_flags, p);
-	p->pid = get_pid(clone_flags);
 
 	p->run_list.next = NULL;
 	p->run_list.prev = NULL;
@@ -662,6 +697,16 @@
 	current->counter >>= 1;
 	if (!current->counter)
 		current->need_resched = 1;
+	
+	/* get the pid value last, we must atomically add the new
+	 * thread to the task lists.
+	 * Atomicity is guaranteed by lock_kernel().
+	 */
+	p->pid = get_pid(clone_flags, !current->uid);
+	if(p->pid==-1) {
+		/* FIXME: cleanup for copy_thread? */
+		goto bad_fork_cleanup_sighand;
+	}
 
 	/*
 	 * Ok, add it to the run-queues and make it


* Re: system hang with "__alloc_page: 1-order allocation failed"
From: Chris Evans @ 2001-03-13 21:28 UTC
  To: Manfred Spraul; +Cc: linux-kernel


On Tue, 13 Mar 2001, Manfred Spraul wrote:

> * fixes bugs in get_pid(). This is the longest part of the patch, but
> it's only necessary if you have more than 10,000 threads running. If you
> have enough memory, launch a fork bomb: once ~32760 threads are running, the
> kernel enters an endless loop in get_pid() (or at around 11,000 threads if
> they intentionally create additional sessions and process groups)

I thought (on Intel) there was a 4092 hard limit?

Cheers
Chris



* Re: system hang with "__alloc_page: 1-order allocation failed"
From: Manfred Spraul @ 2001-03-13 22:45 UTC
  To: Chris Evans; +Cc: linux-kernel

From: "Chris Evans" <chris@scary.beasts.org>
>
> I thought (on Intel) there was a 4092 hard limit?
>
That's the 2.2 limit; it's gone in 2.4.

The new limits are total memory and the pid space. PIDs are intentionally
limited to 15 bits; the remaining bits are reserved.

In the worst case one running process can block 3 pid values (one for
the session, one for the process group, one for the process id), so
~11,000 running processes can exhaust the pid space.
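The ~11,000 figure as a worked sketch, assuming the 15-bit pid space and the patch's restart point of 300 (pids below it are skipped on wrap-around); the function name is hypothetical:

```c
#define PID_BITS       15    /* pids are limited to 15 bits */
#define FIRST_USER_PID 300   /* get_pid() skips values below 300 on wrap */
#define IDS_PER_TASK   3     /* pid, process group id, session id */

/* Worst-case number of running processes that exhaust the pid space. */
int tasks_to_exhaust_pids(int pid_bits, int first_pid, int ids_per_task)
{
    int usable = (1 << pid_bits) - first_pid;

    return usable / ids_per_task;
}
```

(32768 - 300) / 3 = 10822, which matches the "~11,000" above.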

--
    Manfred

