[PATCH] OOM handling - Martin Dalecki

From: Martin Dalecki <dalecki@evision-ventures.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>,
	"James A. Sutherland" <jas88@cam.ac.uk>,
	Guest section DW <dwguest@win.tue.nl>,
	Rik van Riel <riel@conectiva.com.br>,
	"Patrick O'Rourke" <orourke@missioncriticallinux.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH] OOM handling
Date: Sun, 25 Mar 2001 15:54:46 +0200
Message-ID: <3ABDF8A6.7580BD7D@evision-ventures.com>
In-Reply-To: <E14gVQf-00056B-00@the-village.bc.nu> <3ABB9CF2.E7715667@evision-ventures.com>

[-- Attachment #1: Type: text/plain, Size: 2087 bytes --]

Martin Dalecki wrote:
> 
> I have a constructive proposal:
> 
> It would make much sense to make the OOM killer
> leave alone not just root processes but also processes belonging to a
> UID lower
> than a certain value (500). This would be:
> 
> 1. Easily manageable by the admin. Just let oracle/www and analogous users
>    have a UID lower than, let's say, 500.
> 
> 2. In full analogy with the port trick done by TCP/IP (ports < 1024
> versus others).
> 
> 3. It wouldn't need any new interface (no additional crap in
> /proc).
> 
> 4. Really simple to implement, document and understand.
> 
> 5. The same way Solaris does similar things.
> 
> ...
> 
> Damn: I will skip my chess club today and will just code it up
> NOW.
> 
> Spec:
> 
> 1. Processes with a UID < 100 are immune to OOM killers.
> 2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to
> take on.
> 3. Processes with a UID >= 500 are easy targets.
> 
> Let me introduce a new terminology in full analogy to "firewalls",
> routers and thereabouts:
> 
> Processes of category 1. are called captains (oficerzy)
> Processes of category 2. are called corporals (porucznicy)
> Processes of category 3. are called privates (żołnierze)
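
A minimal userspace sketch of the spec quoted above, not code from the patch: the thresholds 100 and 500 come from the quoted spec, while the enum and function names are hypothetical and chosen only for illustration.

```c
/* Hypothetical names; only the thresholds 100 and 500 come from the spec. */
enum oom_category {
	CAPTAIN,	/* UID < 100: immune to the OOM killer */
	CORPORAL,	/* 100 <= UID < 500: hard for the OOM killer to take on */
	PRIVATE		/* UID >= 500: easy target */
};

/* Map a UID onto one of the three categories of the quoted spec. */
static enum oom_category categorize_uid(unsigned int uid)
{
	if (uid < 100)
		return CAPTAIN;
	if (uid < 500)
		return CORPORAL;
	return PRIVATE;
}
```

With this mapping, categorize_uid(99) yields CAPTAIN, categorize_uid(100) yields CORPORAL and categorize_uid(500) yields PRIVATE.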

OK, I just did it. As I already said, I have "stress tested" it by
installing the Oracle Internet Application Server suite
on a hopelessly under-equipped box ("only" 128 MByte of RAM).
The resulting patch is attached.

Considering that I'm one day late on my promise to provide this
patch, it's actually fascinating that already 4 people (in particular, not
newbies requesting a new /proc entry for everything) have asked
for reassurance that I will indeed implement it... Well,
this kind of "high" and "eager" feedback seems to me to indicate
that there is a very serious desire for it. And then of course I
just have to ask our people working with DBs here at work as well :-).

Ah... and of course I think this patch can already go directly
into the official kernel. The quality of the code should permit
it. I would especially ask Rik van Riel to have a closer look
at it...

[-- Attachment #2: oom.diff --]
[-- Type: text/plain, Size: 11110 bytes --]

diff -urN linux/mm/oom_kill.c linux-new/mm/oom_kill.c
--- linux/mm/oom_kill.c	Tue Nov 14 19:56:46 2000
+++ linux-new/mm/oom_kill.c	Sun Mar 25 17:17:34 2001
@@ -1,18 +1,64 @@
 /*
  *  linux/mm/oom_kill.c
- * 
+ *
  *  Copyright (C)  1998,2000  Rik van Riel
  *	Thanks go out to Claus Fischer for some serious inspiration and
  *	for goading me into coding this file...
  *
- *  The routines in this file are used to kill a process when
- *  we're seriously out of memory. This gets called from kswapd()
- *  in linux/mm/vmscan.c when we really run out of memory.
- *
- *  Since we won't call these routines often (on a well-configured
- *  machine) this file will double as a 'coding guide' and a signpost
- *  for newbie kernel hackers. It features several pointers to major
- *  kernel subsystems and hints as to where to find out what things do.
+ *  Sat Mar 24 22:07:15 CET 2001 Marcin Dalecki <dalecki@evision-ventures.com>:
+ *
+ *	Replaced the original algorithm with something reasonable, predictable
+ *	and manageable. I will call this "Stalin's Eviction".
+ */
+
+/*
+ *  The routines in this file are used to kill a process when the system is
+ *  entirely out of memory (both RAM and swap).  This gets called from
+ *  kswapd() in linux/mm/vmscan.c when we are in total starvation, that is,
+ *  when the only thing the system is busy with is trying to allocate some
+ *  physical memory page while there are no pages left. In such a case it
+ *  does make perfect sense to kill some offending process, just to make the
+ *  system go on and survive.
+ *
+ *  IT IS A LAST RESORT!
+ *
+ *  ALERT: In contrast to popular belief, the mechanism presented here
+ *  IS IMPORTANT for system security reasons. It prevents one corner
+ *  case of an easy DoS attack in case the sysadmin didn't take other
+ *  measures, which, being either overworked or incompetent, he usually
+ *  doesn't.
+ *
+ *  Basically the eviction goes on as follows:
+ *
+ *  1. Normal interactive user processes are the first candidates to be shot.
+ *  We consider all users with a UID >= 500 as normal interactive users.
+ *
+ *  2. If there are no processes started by a normal interactive user, we aim
+ *  at processes that are nonessential for the "life" of the system as a
+ *  whole.  We consider users with a UID >= 100 and < 500 as essential
+ *  service users.
+ *
+ *  3. If there are none of those either, we start to shut down the system
+ *  components piece by piece... (UID < 100).
+ *
+ *  In fact the heuristics used to determine which of the process classes
+ *  to aim at first are a bit more sophisticated. If you want those details,
+ *  please read the code below. It does (hopefully) speak for itself.
+ *
+ *  As an example: If you are running a big Linux box, which is mainly deployed
+ *  as an Oracle server, but where normal interactive human users can log on as
+ *  well, then you should run the Oracle server with a UID < 500 and >= 100. Then
+ *  dumb-ass losers starting 100 netscape and 500 emacs sessions won't be
+ *  able to kill the essential Oracle service any longer.
+ *
+ *  The introduction of these additional UID semantics shouldn't affect any
+ *  present systems. (Read: It won't make anything worse in comparison to
+ *  previous versions of the Linux kernel.) However, every single distributor of
+ *  "enterprise grade" applications for Linux SHOULD take note of this.
+ *
+ *  regards:
+ *
+ *		Marcin Dalecki
  */
 
 #include <linux/mm.h>
@@ -23,125 +69,141 @@
 
 /* #define DEBUG */
 
-/**
- * int_sqrt - oom_kill.c internal function, rough approximation to sqrt
- * @x: integer of which to calculate the sqrt
- * 
- * A very rough approximation to the sqrt() function.
- */
-static unsigned int int_sqrt(unsigned int x)
-{
-	unsigned int out = x;
-	while (x & ~(unsigned int)1) x >>=2, out >>=1;
-	if (x) out -= out >> 2;
-	return (out ? out : 1);
-}	
-
-/**
- * oom_badness - calculate a numeric value for how bad this task has been
- * @p: task struct of which task we should calculate
- *
- * The formula used is relatively simple and documented inline in the
- * function. The main rationale is that we want to select a good task
- * to kill when we run out of memory.
- *
- * Good in this context means that:
- * 1) we lose the minimum amount of work done
- * 2) we recover a large amount of memory
- * 3) we don't kill anything innocent of eating tons of memory
- * 4) we want to kill the minimum amount of processes (one)
- * 5) we try to kill the process the user expects us to kill, this
- *    algorithm has been meticulously tuned to meet the priniciple
- *    of least surprise ... (be careful when you change it)
- */
+#define CPU_FACTOR 32
+#define AGE_FACTOR 256
 
-static int badness(struct task_struct *p)
+enum uid_class {
+	normal,
+	service,
+	system,
+	immune
+};
+
+static int determine_uid_class(struct task_struct *p)
 {
-	int points, cpu_time, run_time;
+	int uid;
+	int uid_class = system;
 
-	if (!p->mm)
-		return 0;
-	/*
-	 * The memory size of the process is the basis for the badness.
+	/* This makes processes started by, for example, suexec better killing
+	 * candidates than root's processes themselves.
 	 */
-	points = p->mm->total_vm;
+	uid = p->uid;
+	if (p->euid > p->uid)
+		uid = p->euid;
 
-	/*
-	 * CPU time is in seconds and run time is in minutes. There is no
-	 * particular reason for this other than that it turned out to work
-	 * very well in practice. This is not safe against jiffie wraps
-	 * but we don't care _that_ much...
+	/* This implements the intended semantics of the different user id
+	 * value ranges.
 	 */
-	cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3);
-	run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+	if (uid < 100)
+		uid_class = system;
+	else if (uid < 500)
+		uid_class = service;
+	else
+		uid_class = normal;
 
-	points /= int_sqrt(cpu_time);
-	points /= int_sqrt(int_sqrt(run_time));
-
-	/*
-	 * Niced processes are most likely less important, so double
-	 * their badness points.
-	 */
-	if (p->nice > 0)
-		points *= 2;
 
-	/*
-	 * Superuser processes are usually more important, so we make it
+	/* Superuser processes are usually more important, so we make it
 	 * less likely that we kill those.
 	 */
-	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) ||
-				p->uid == 0 || p->euid == 0)
-		points /= 4;
+	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN))
+		uid_class = system;
 
-	/*
-	 * We don't want to kill a process with direct hardware access.
+	/* We don't want to kill a process with direct hardware access.
 	 * Not only could that mess up the hardware, but usually users
 	 * tend to only have this flag set on applications they think
 	 * of as important.
 	 */
 	if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO))
-		points /= 4;
-#ifdef DEBUG
-	printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n",
-	p->pid, p->comm, points);
-#endif
-	return points;
+		uid_class = system;
+
+	return uid_class;
+}
+
+static int calculate_penalty(struct task_struct *p)
+{
+	int cpu_penalty = 0;
+	int age_penalty = 0;
+
+
+	/* Now we calculate the penalty due to the cpu usage.  NOTE: This is
+	 * not safe against jiffie wraps.
+	 */
+	{
+		int run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
+
+		if (run_time > 0) {
+			cpu_penalty = (CPU_FACTOR * run_time) /
+				(((p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3)) + run_time);
+		} else
+			cpu_penalty = CPU_FACTOR;
+	}
+
+	/* Let's make older processes more important than newer ones.
+	 * This is not safe against jiffies wraps, deliberately so.
+	 */
+	if (p->start_time > 0)
+		age_penalty = AGE_FACTOR * p->start_time / jiffies;
+	else
+		age_penalty = 0;
+
+	/* OK, this should be sufficient; we don't want to make things more
+	 * complicated than needed. In particular, since there is no easy and
+	 * portable way to determine the total number of memory pages present,
+	 * we don't take this into account here.
+	 *
+	 * Let us worry about more detailed heuristics here only if there are
+	 * still many people reporting serious problems on linux-kernel.
+	 */
+
+	return cpu_penalty + age_penalty;
 }
 
 /*
- * Simple selection loop. We chose the process with the highest
- * number of 'points'. We need the locks to make sure that the
- * list of task structs doesn't change while we look the other way.
- *
- * (not docbooked, we don't want this one cluttering up the manual)
+ * Simple selection loop. We choose the process with the highest penalty.
  */
-static struct task_struct * select_bad_process(void)
+static struct task_struct * select_process(void)
 {
-	int maxpoints = 0;
-	struct task_struct *p = NULL;
-	struct task_struct *chosen = NULL;
-
-	read_lock(&tasklist_lock);
-	for_each_task(p) {
-		if (p->pid) {
-			int points = badness(p);
-			if (points > maxpoints) {
-				chosen = p;
-				maxpoints = points;
+	enum uid_class i;
+	struct task_struct *choice = NULL;
+
+	for (i = normal; i != immune; ++i) {
+		int maxpenalty = 0;
+		struct task_struct *p = NULL;
+
+		/* The locks make sure that the list of task structs doesn't
+		 * change while we look at it.
+		 */
+
+		read_lock(&tasklist_lock);
+		for_each_task(p) {
+			if (!p->mm)
+				continue;
+
+			if (i != determine_uid_class(p))
+				continue;
+
+			if (p->pid) {
+				int penalty = calculate_penalty(p);
+
+				if (penalty > maxpenalty) {
+					choice = p;
+					maxpenalty = penalty;
+				}
 			}
 		}
+		read_unlock(&tasklist_lock);
+
+		if (choice != NULL)
+			break;
 	}
-	read_unlock(&tasklist_lock);
-	return chosen;
+
+	return choice;
 }
 
-/**
- * oom_kill - kill the "best" process when we run out of memory
- *
+/*
  * If we run out of memory, we have the choice between either
  * killing a random task (bad), letting the system crash (worse)
- * OR try to be smart about which process to kill. Note that we
- * don't have to be perfect here, we just have to be good.
+ * OR try to be smart about which process to kill.
  *
  * We must be careful though to never send SIGKILL a process with
  * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that
@@ -149,14 +211,12 @@
  */
 void oom_kill(void)
 {
+	struct task_struct *p = select_process();
 
-	struct task_struct *p = select_bad_process();
-
-	/* Found nothing?!?! Either we hang forever, or we panic. */
 	if (p == NULL)
 		panic("Out of memory and no killable processes...\n");
 
-	printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n", p->pid, p->comm);
+	printk(KERN_ERR "Out of memory: killed process %d (%s).\n", p->pid, p->comm);
 
 	/*
 	 * We give our sacrificial lamb high priority and access to
@@ -180,14 +240,14 @@
 	 */
 	current->policy |= SCHED_YIELD;
 	schedule();
+
 	return;
 }
 
-/**
- * out_of_memory - is the system out of memory?
+/** out_of_memory - is the system out of memory?
  *
- * Returns 0 if there is still enough memory left,
- * 1 when we are out of memory (otherwise).
+ * Returns 0 if there is still enough memory left, 1 when we are out of memory
+ * (otherwise).
  */
 int out_of_memory(void)
 {
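
As a rough illustration of what the patch above does, the following userspace sketch mirrors its class-by-class selection and its penalty formula. It is a sketch under stated assumptions, not the kernel code: struct proc is a hypothetical stand-in for task_struct, CPU time and run time are assumed to be already in the same unit (the kernel's shifts are left out), the class names are renamed to avoid clashing with the C library, and the capability and p->mm checks are omitted.

```c
#include <stddef.h>

#define CPU_FACTOR 32
#define AGE_FACTOR 256

/* Same ordering as the patch's enum uid_class: easiest targets first. */
enum uid_class { CLASS_NORMAL, CLASS_SERVICE, CLASS_SYSTEM, CLASS_IMMUNE };

/* Hypothetical stand-in for the few task_struct fields the patch reads. */
struct proc {
	int pid;
	unsigned int uid;
	unsigned int euid;
	unsigned long start_time;	/* tick at which the process started */
	unsigned long cpu_time;		/* user + system time, in ticks */
};

/* The larger of uid and euid decides the class, as in the patch. */
static enum uid_class classify(const struct proc *p)
{
	unsigned int uid = p->euid > p->uid ? p->euid : p->uid;

	if (uid < 100)
		return CLASS_SYSTEM;
	if (uid < 500)
		return CLASS_SERVICE;
	return CLASS_NORMAL;
}

/* Idle and young processes get the highest penalty. */
static long penalty(const struct proc *p, unsigned long now)
{
	unsigned long run_time = now - p->start_time;
	long cpu_penalty = CPU_FACTOR;
	long age_penalty = 0;

	if (run_time > 0)
		cpu_penalty = (long)((CPU_FACTOR * run_time) /
				     (p->cpu_time + run_time));
	if (p->start_time > 0 && now > 0)
		age_penalty = (long)(AGE_FACTOR * p->start_time / now);

	return cpu_penalty + age_penalty;
}

/* Walk the classes in order; return the worst process of the first
 * class that has any candidate, or NULL if nothing is killable. */
static const struct proc *select_victim(const struct proc *procs, size_t n,
					unsigned long now)
{
	for (int c = CLASS_NORMAL; c != CLASS_IMMUNE; ++c) {
		const struct proc *choice = NULL;
		long max_penalty = 0;

		for (size_t i = 0; i < n; ++i) {
			const struct proc *p = &procs[i];
			long pen;

			if (p->pid == 0 || classify(p) != (enum uid_class)c)
				continue;
			pen = penalty(p, now);
			if (pen > max_penalty) {
				choice = p;
				max_penalty = pen;
			}
		}
		if (choice != NULL)
			return choice;
	}
	return NULL;
}
```

With now = 1000, a UID 500 process started at tick 900 with 10 ticks of CPU time gets 29 + 230 = 259 points, so it is picked before an older process of the same class. Service and system processes are only considered once no normal user process is left.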

2001-03-23 23:56                     ` Tim Wright
2001-03-24  0:21                       ` Tom Diehl
2001-03-23 17:26     ` James A. Sutherland
2001-03-23 17:26       ` James A. Sutherland
2001-03-23 17:32       ` Alan Cox
2001-03-23 17:32         ` Alan Cox
2001-03-23 18:58         ` Martin Dalecki
2001-03-23 18:58           ` Martin Dalecki
2001-03-25 13:54           ` Martin Dalecki [this message]
2001-03-25 13:54             ` [PATCH] OOM handling Martin Dalecki
2001-03-25 15:06             ` Rik van Riel
2001-03-25 15:06               ` Rik van Riel
2001-03-25 15:20               ` Martin Dalecki
2001-03-25 15:20                 ` Martin Dalecki
2001-03-25 15:50                 ` Jeff Garzik
2001-03-25 17:08                 ` Rik van Riel
2001-03-25 17:08                   ` Rik van Riel
2001-03-25 15:44             ` Jonathan Morton
2001-03-25 15:44               ` Jonathan Morton
2001-03-25 15:47               ` Martin Dalecki
2001-03-25 15:47                 ` Martin Dalecki
2001-03-25 16:36               ` Jonathan Morton
2001-03-25 16:36                 ` Jonathan Morton
2001-03-26 21:34                 ` Kevin Buhr
2001-03-26 21:34                   ` Kevin Buhr
2001-03-26 22:00                 ` Jonathan Morton
2001-03-26 22:00                   ` Jonathan Morton
2001-03-26  2:13             ` Matthew Chappee
2001-03-26 11:33               ` Ingo Oeser
2001-03-26 11:49                 ` Jasper Spaans
2001-03-26 16:11               ` Michael Peddemors
2001-03-23 19:45         ` [PATCH] Prevent OOM from killing init Jonathan Morton
2001-03-23 19:45           ` Jonathan Morton
2001-03-23 23:26           ` Eric W. Biederman
2001-03-23 23:26             ` Eric W. Biederman
2001-03-25 15:30         ` Martin Dalecki
2001-03-25 15:30           ` Martin Dalecki
2001-03-25 20:47         ` Stephen Satchell
2001-03-25 20:47           ` Stephen Satchell
2001-03-25 21:51         ` [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) Jonathan Morton
2001-03-25 21:51           ` Jonathan Morton
2001-03-27 15:23           ` Pavel Machek
2001-03-27 15:23             ` Pavel Machek
2001-03-23 20:16       ` [PATCH] Prevent OOM from killing init Jordi Polo
2001-03-24  0:03       ` Guest section DW
2001-03-24  0:03         ` Guest section DW
2001-03-24  7:52       ` Doug Ledford
2001-03-24  7:52         ` Doug Ledford
2001-03-25  0:32       ` Kurt Garloff
2001-03-25 15:02         ` Sandy Harris
2001-03-25 18:07         ` Guest section DW
2001-03-22 14:53   ` Patrick O'Rourke
2001-03-22 14:53     ` Patrick O'Rourke
2001-03-22 19:24   ` Philipp Rumpf
2001-03-22 19:24     ` Philipp Rumpf
2001-03-22 22:20   ` James A. Sutherland
2001-03-22 22:20     ` James A. Sutherland
2001-03-23 17:31   ` Szabolcs Szakacsits
2001-03-23 17:31     ` Szabolcs Szakacsits
2001-03-24  5:54     ` Rik van Riel
2001-03-24  5:54       ` Rik van Riel
2001-03-24  6:55       ` Juha Saarinen
2001-03-24  6:55         ` Juha Saarinen
2001-03-27  8:31       ` Roger Gammans
  -- strict thread matches above, loose matches on Subject: below --
2001-03-27 15:13 [PATCH] OOM handling Jonathan Morton
2001-03-27 16:03 ` Michel Wilson
2001-03-27 16:30   ` Martin Dalecki
2001-03-27 18:15   ` Rik van Riel
2001-03-27 16:29 ` Martin Dalecki
2001-03-27 17:07 ` Jonathan Morton

