* [PATCH] Prevent OOM from killing init
@ 2001-03-21 22:54 Patrick O'Rourke
2001-03-21 23:11 ` Eli Carter
2001-03-21 23:48 ` Rik van Riel
0 siblings, 2 replies; 106+ messages in thread
From: Patrick O'Rourke @ 2001-03-21 22:54 UTC (permalink / raw)
To: linux-mm, linux-kernel
Since the system will panic if the init process is chosen by
the OOM killer, the following patch prevents select_bad_process()
from picking init.
Pat
--- xxx/linux-2.4.3-pre6/mm/oom_kill.c Tue Nov 14 13:56:46 2000
+++ linux-2.4.3-pre6/mm/oom_kill.c Wed Mar 21 15:25:03 2001
@@ -123,7 +123,7 @@
read_lock(&tasklist_lock);
for_each_task(p) {
- if (p->pid) {
+ if (p->pid && p->pid != 1) {
int points = badness(p);
if (points > maxpoints) {
chosen = p;
--
Patrick O'Rourke
978.606.0236
orourke@missioncriticallinux.com
^ permalink raw reply [flat|nested] 106+ messages in thread* Re: [PATCH] Prevent OOM from killing init 2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke @ 2001-03-21 23:11 ` Eli Carter 2001-03-21 23:40 ` Patrick O'Rourke 2001-03-21 23:48 ` Rik van Riel 1 sibling, 1 reply; 106+ messages in thread From: Eli Carter @ 2001-03-21 23:11 UTC (permalink / raw) To: Patrick O'Rourke; +Cc: linux-mm, linux-kernel Patrick O'Rourke wrote: > > Since the system will panic if the init process is chosen by > the OOM killer, the following patch prevents select_bad_process() > from picking init. > > Pat > > --- xxx/linux-2.4.3-pre6/mm/oom_kill.c Tue Nov 14 13:56:46 2000 > +++ linux-2.4.3-pre6/mm/oom_kill.c Wed Mar 21 15:25:03 2001 > @@ -123,7 +123,7 @@ > > read_lock(&tasklist_lock); > for_each_task(p) { > - if (p->pid) { > + if (p->pid && p->pid != 1) { > int points = badness(p); > if (points > maxpoints) { > chosen = p; > Having not looked at the code... Why not "if( p->pid > 1 )"? (Or can p->pid can be negative?!, um, typecast to unsigned...) Eli -----------------------. Rule of Accuracy: When working toward Eli Carter | the solution of a problem, it always eli.carter(at)inet.com `------------------ helps if you know the answer. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:11 ` Eli Carter @ 2001-03-21 23:40 ` Patrick O'Rourke 0 siblings, 0 replies; 106+ messages in thread From: Patrick O'Rourke @ 2001-03-21 23:40 UTC (permalink / raw) To: Eli Carter; +Cc: linux-mm, linux-kernel Eli Carter wrote: > Having not looked at the code... Why not "if( p->pid > 1 )"? (Or can > p->pid can be negative?!, um, typecast to unsigned...) I simply mirrored the check done in do_exit(): if (tsk->pid == 1) panic("Attempted to kill init!"); Since PID_MAX is 32768 I do not believe pids can be negative. I suppose one could make an argument for skipping "daemons", i.e. pids below 300 (see the get_pid() function in kernel/fork.c), but I think that is a larger issue. Pat -- Patrick O'Rourke 978.606.0236 orourke@missioncriticallinux.com ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke 2001-03-21 23:11 ` Eli Carter @ 2001-03-21 23:48 ` Rik van Riel 2001-03-22 8:14 ` Eric W. Biederman ` (5 more replies) 1 sibling, 6 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-21 23:48 UTC (permalink / raw) To: Patrick O'Rourke; +Cc: linux-mm, linux-kernel On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > Since the system will panic if the init process is chosen by > the OOM killer, the following patch prevents select_bad_process() > from picking init. One question ... has the OOM killer ever selected init on anybody's system ? I think that the scoring algorithm should make sure that we never pick init, unless the system is screwed so badly that init is broken or the only process left ;) Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel @ 2001-03-22 8:14 ` Eric W. Biederman 2001-03-22 9:24 ` Rik van Riel 2001-03-22 19:29 ` Philipp Rumpf 2001-03-22 11:47 ` Guest section DW ` (4 subsequent siblings) 5 siblings, 2 replies; 106+ messages in thread From: Eric W. Biederman @ 2001-03-22 8:14 UTC (permalink / raw) To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel Rik van Riel <riel@conectiva.com.br> writes: > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > Since the system will panic if the init process is chosen by > > the OOM killer, the following patch prevents select_bad_process() > > from picking init. > > One question ... has the OOM killer ever selected init on > anybody's system ? > > I think that the scoring algorithm should make sure that > we never pick init, unless the system is screwed so badly > that init is broken or the only process left ;) Is there ever a case where killing init is the right thing to do? My impression is that if init is selected the whole machine dies. If you can kill init and still have a machine that mostly works, then I guess it makes some sense not to kill it. Guaranteeing not to select init can buy you piece of mind because init if properly setup can put the machine back together again, while not special casing init means something weird might happen and init would be selected. Eric ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 8:14 ` Eric W. Biederman @ 2001-03-22 9:24 ` Rik van Riel 2001-03-22 19:29 ` Philipp Rumpf 1 sibling, 0 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-22 9:24 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On 22 Mar 2001, Eric W. Biederman wrote: > Is there ever a case where killing init is the right thing to do? My > impression is that if init is selected the whole machine dies. If you > can kill init and still have a machine that mostly works, then I guess > it makes some sense not to kill it. > > Guaranteeing not to select init can buy you piece of mind because > init if properly setup can put the machine back together again, while > not special casing init means something weird might happen and init > would be selected. When something weird happens, it might be better to kill init and have the machine reset itself after the panic (echo 30 > /proc/sys/kernel/panic). Killing all other things and leaving just init intact makes for a machine which is as good as dead, without a chance for recovery-by-reboot... OTOH, I haven't heard of the OOM killer ever chosing init, not even of people who tried creating these special kinds of situations to trigger it on purpose. regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 8:14 ` Eric W. Biederman 2001-03-22 9:24 ` Rik van Riel @ 2001-03-22 19:29 ` Philipp Rumpf 1 sibling, 0 replies; 106+ messages in thread From: Philipp Rumpf @ 2001-03-22 19:29 UTC (permalink / raw) To: Eric W. Biederman Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, Mar 22, 2001 at 01:14:41AM -0700, Eric W. Biederman wrote: > Rik van Riel <riel@conectiva.com.br> writes: > Is there ever a case where killing init is the right thing to do? There are cases where panic() is the right thing to do. Broken init is such a case. > My impression is that if init is selected the whole machine dies. > If you can kill init and still have a machine that mostly works, you can't. > Guaranteeing not to select init can buy you piece of mind because > init if properly setup can put the machine back together again, while > not special casing init means something weird might happen and init > would be selected. If we're in a situation where long-running processes with relatively small VM are killed the box is very unlikely to be usable anyway. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel 2001-03-22 8:14 ` Eric W. Biederman @ 2001-03-22 11:47 ` Guest section DW 2001-03-22 15:01 ` Rik van Riel ` (3 more replies) 2001-03-22 14:53 ` Patrick O'Rourke ` (3 subsequent siblings) 5 siblings, 4 replies; 106+ messages in thread From: Guest section DW @ 2001-03-22 11:47 UTC (permalink / raw) To: Rik van Riel, Patrick O'Rourke; +Cc: linux-mm, linux-kernel On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > Since the system will panic if the init process is chosen by > > the OOM killer, the following patch prevents select_bad_process() > > from picking init. There is a dozen other processes that must not be killed. Init is just a random example. > One question ... has the OOM killer ever selected init on > anybody's system ? Last week I installed SuSE 7.1 somewhere. During the install: "VM: killing process rpm", leaving the installer rather confused. (An empty machine, 256MB, 144MB swap, I think 2.2.18.) Last month I had a computer algebra process running for a week. Killed. But this computation was the only task this machine had. Its sole reason of existence. Too bad - zero information out of a week's computation. (I think 2.4.0.) Clearly, Linux cannot be reliable if any process can be killed at any moment. I am not happy at all with my recent experiences. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 11:47 ` Guest section DW @ 2001-03-22 15:01 ` Rik van Riel 2001-03-22 19:04 ` Guest section DW 2001-03-22 16:41 ` Eric W. Biederman ` (2 subsequent siblings) 3 siblings, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-22 15:01 UTC (permalink / raw) To: Guest section DW; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Thu, 22 Mar 2001, Guest section DW wrote: > > One question ... has the OOM killer ever selected init on > > anybody's system ? > > Last week I installed SuSE 7.1 somewhere. > During the install: "VM: killing process rpm", > leaving the installer rather confused. > (An empty machine, 256MB, 144MB swap, I think 2.2.18.) That's the 2.2 kernel ... > Last month I had a computer algebra process running for a week. > Killed. But this computation was the only task this machine had. > Its sole reason of existence. > Too bad - zero information out of a week's computation. > (I think 2.4.0.) > > Clearly, Linux cannot be reliable if any process can be killed > at any moment. I am not happy at all with my recent experiences. Note that the OOM killer in 2.4 won't kick in until your machine is out of both memory and swap, see mm/oom_kill.c::out_of_memory(). regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 15:01 ` Rik van Riel @ 2001-03-22 19:04 ` Guest section DW 0 siblings, 0 replies; 106+ messages in thread From: Guest section DW @ 2001-03-22 19:04 UTC (permalink / raw) To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Thu, Mar 22, 2001 at 12:01:43PM -0300, Rik van Riel wrote: > > Last month I had a computer algebra process running for a week. > > Killed. But this computation was the only task this machine had. > > Its sole reason of existence. > > Too bad - zero information out of a week's computation. > > > > Clearly, Linux cannot be reliable if any process can be killed > > at any moment. I am not happy at all with my recent experiences. > > Note that the OOM killer in 2.4 won't kick in until your machine > is out of both memory and swap, see mm/oom_kill.c::out_of_memory(). Nevertheless, this process does malloc and malloc returns the requested memory. If a malloc fails the computer algebra process has the choice between various alternatives. Present a prompt, so that the user can examine variables and intermediate results, or request a dump to disk of the status of the computation. Or choose an alternative algorithm, at some other point of the space-time tradeoff curve. But no error return from malloc - just "Killed". Ach. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 11:47 ` Guest section DW 2001-03-22 15:01 ` Rik van Riel @ 2001-03-22 16:41 ` Eric W. Biederman 2001-03-22 20:28 ` Stephen Clouse 2001-03-23 17:26 ` James A. Sutherland 3 siblings, 0 replies; 106+ messages in thread From: Eric W. Biederman @ 2001-03-22 16:41 UTC (permalink / raw) To: Guest section DW Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Guest section DW <dwguest@win.tue.nl> writes: > On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: > > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > > Since the system will panic if the init process is chosen by > > > the OOM killer, the following patch prevents select_bad_process() > > > from picking init. > > There is a dozen other processes that must not be killed. > Init is just a random example. Not killing init provides enough for recovery if you truly hit an out of memory situation. With 2.4.x at least it is a box misconfiguration that causes it. The 2.2.x VM doesn't always try to swap, and free things up hard enough, before reporting out of memory. But even the 2.2.x problems are rare. > > > One question ... has the OOM killer ever selected init on > > anybody's system ? > > Last week I installed SuSE 7.1 somewhere. > During the install: "VM: killing process rpm", > leaving the installer rather confused. > (An empty machine, 256MB, 144MB swap, I think 2.2.18.) swap < RAM. ouch! This is a misconfiguration on a machine that actually starts swapping, and where out of memory problems are a reality. The fact an installer would trigger swapping on a 256MB machine is a second problem. > Last month I had a computer algebra process running for a week. > Killed. But this computation was the only task this machine had. > Its sole reason of existence. > Too bad - zero information out of a week's computation. > (I think 2.4.0.) It looks like you didn't have enough resources on that machine period. I pretty much trust 2.4.x in this department. Did that machine also have it's swap misconfigured? > > Clearly, Linux cannot be reliable if any process can be killed > at any moment. I am not happy at all with my recent experiences. Hmm. It should definitely not be at any moment. It should only be when resources are exhausted. So putting enough swap on a machine should be enough, to stop this from ever happening. Eric ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 11:47 ` Guest section DW 2001-03-22 15:01 ` Rik van Riel 2001-03-22 16:41 ` Eric W. Biederman @ 2001-03-22 20:28 ` Stephen Clouse 2001-03-22 21:01 ` Ingo Oeser ` (4 more replies) 2001-03-23 17:26 ` James A. Sutherland 3 siblings, 5 replies; 106+ messages in thread From: Stephen Clouse @ 2001-03-22 20:28 UTC (permalink / raw) To: Guest section DW Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel [-- Attachment #1: msg.pgp --] [-- Type: text/plain, Size: 1765 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, Mar 22, 2001 at 12:47:27PM +0100, Guest section DW wrote: > Last week I installed SuSE 7.1 somewhere. > During the install: "VM: killing process rpm", > leaving the installer rather confused. > (An empty machine, 256MB, 144MB swap, I think 2.2.18.) > > Last month I had a computer algebra process running for a week. > Killed. But this computation was the only task this machine had. > Its sole reason of existence. > Too bad - zero information out of a week's computation. > (I think 2.4.0.) > > Clearly, Linux cannot be reliable if any process can be killed > at any moment. I am not happy at all with my recent experiences. Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind logically justify annihilating large-VM processes that have been running for days or weeks instead of just returning ENOMEM to a process that just started up. We run Oracle on a development box here, and it's always the first to get the axe (non-root process using 70-80 MB VM). Whenever someone's testing decides to run away with memory, I usually spend the rest of the day getting intimate with the backup files, since SIGKILLing random Oracle processes, as you might have guessed, has a tendency to rape the entire database. It would be nice to give immunity to certain uids, or better yet, just turn the damn thing off entirely. I've already hacked that in...errr, out. - -- Stephen Clouse <stephenc@theiqgroup.com> Senior Programmer, IQ Coordinator Project Lead The IQ Group, Inc. <http://www.theiqgroup.com/> -----BEGIN PGP SIGNATURE----- Version: PGP 6.5.8 iQA/AwUBOrpgbgOGqGs0PadnEQLp5QCfZMwtDZRNwYQ6RJX0MJ8lRVHTj3YAoNlt pFWT2i+2y+Yze/6EYy9V0oaE =QIrK -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 20:28 ` Stephen Clouse @ 2001-03-22 21:01 ` Ingo Oeser 2001-03-22 21:23 ` Alan Cox ` (3 subsequent siblings) 4 siblings, 0 replies; 106+ messages in thread From: Ingo Oeser @ 2001-03-22 21:01 UTC (permalink / raw) To: Stephen Clouse Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, Mar 22, 2001 at 02:28:31PM -0600, Stephen Clouse wrote: [Another OOM-Killing thread] > It would be nice to give immunity to certain uids, or better > yet, just turn the damn thing off entirely. I've already > hacked that in...errr, out. That's fine and suits best for all. I have provided an API for installing such OOM handlers (and have provided even an simple example for using it). See http://www.tu-chemnitz.de/~ioe/oom-kill-api/index.html for details. It applies to all regular kernels and with some offsets even to ac20. So this is the way to go for custom OOM handling. Rik noted once, that not much research has been done yet on this topic and that he is certain, that his code cannot cover all cases. Linus on the other hand doesn't like the idea of 'plugins' for core kernel code. So this patch is the best thing, that can be done about the situation. All work should be based on it, since it allows customers and researchers, that LIKE to try such 'plugins' to try all of them instead of having to patch and recompile the kernel for every OOM handler available. I would LOVE to start a link collection for all OOM handlers based on my patch or even host them, IF they are implemented as modules (as suggested by my API). This should avoid duplicate effort of this. Of course I hope to satisfy all needs by this. I'm also willing to include any API changes (read: exported functions, structs and variables) necessary for some OOM handlers in my patch. Thanks & Regards Ingo Oeser -- 10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag> <<<<<<<<<<<< been there and had much fun >>>>>>>>>>>> ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 20:28 ` Stephen Clouse 2001-03-22 21:01 ` Ingo Oeser @ 2001-03-22 21:23 ` Alan Cox 2001-03-22 22:00 ` Guest section DW ` (3 more replies) 2001-03-23 1:31 ` Michael Peddemors ` (2 subsequent siblings) 4 siblings, 4 replies; 106+ messages in thread From: Alan Cox @ 2001-03-22 21:23 UTC (permalink / raw) To: Stephen Clouse Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind > logically justify annihilating large-VM processes that have been running for > days or weeks instead of just returning ENOMEM to a process that just started > up. How do you return an out of memory error to a C program that is out of memory due to a stack growth fault. There is actually not a language construct for it > It would be nice to give immunity to certain uids, or better yet, just turn the > damn thing off entirely. I've already hacked that in...errr, out. Eventually you have to kill something or the machine deadlocks. The oom killing doesnt kick in until that point. So its up to you how you like your errors. One of the things that we badly need to resurrect for 2.5 is the beancounter work which would let you reasonably do things like guaranteed Oracle a certain amount of the machine, or restrict all the untrusted users to a total of 200Mb hard limit between them etc ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 21:23 ` Alan Cox @ 2001-03-22 22:00 ` Guest section DW 2001-03-22 22:12 ` Ed Tomlinson ` (2 more replies) 2001-03-22 22:10 ` Doug Ledford ` (2 subsequent siblings) 3 siblings, 3 replies; 106+ messages in thread From: Guest section DW @ 2001-03-22 22:00 UTC (permalink / raw) To: Alan Cox, Stephen Clouse Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote: > > Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind > > logically justify annihilating large-VM processes that have been running for > > days or weeks instead of just returning ENOMEM to a process that just started > > up. > > How do you return an out of memory error to a C program that is out of memory > due to a stack growth fault. There is actually not a language construct for it Alan, this is a fake argument. Linux is bad, and you defend it by saying that it is impossible to be perfect. I have used various Unix flavours for approximately thirty years. Stack overflow has not been a real problem. Of course they occurred every now and then, but roughly speaking only for unchecked recursion, that is, in cases of a program bug. Presently however, a flawless program can be killed. That is what makes Linux unreliable. > Eventually you have to kill something or the machine deadlocks. Alan, this is a fake argument. When I have a computer algebra system, and it computes millions of function values for some expensive function, then it keeps a cache of already computed values. Maybe a value is needed again and we save ten seconds of computation. But of course, when we run out of memory, nothing is easier than just throwing this cache out. You see, the bug is that malloc does not fail. This means that the decisions about what to do are not taken by the program that knows what it is doing, but by the kernel. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:00 ` Guest section DW @ 2001-03-22 22:12 ` Ed Tomlinson 2001-03-22 22:52 ` Alan Cox 2001-03-23 19:57 ` Szabolcs Szakacsits 2 siblings, 0 replies; 106+ messages in thread From: Ed Tomlinson @ 2001-03-22 22:12 UTC (permalink / raw) To: Guest section DW, Alan Cox, Stephen Clouse Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thursday 22 March 2001 17:00, Guest section DW wrote: > On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote: > > > Really the whole oom_kill process seems bass-ackwards to me. I can't > > > in my mind logically justify annihilating large-VM processes that have > > > been running for days or weeks instead of just returning ENOMEM to a > > > process that just started up. > > > > How do you return an out of memory error to a C program that is out of > > memory due to a stack growth fault. There is actually not a language > > construct for it > > Alan, this is a fake argument. > Linux is bad, and you defend it by saying that it is impossible to be > perfect. > > I have used various Unix flavours for approximately thirty years. > Stack overflow has not been a real problem. Of course they occurred > every now and then, but roughly speaking only for unchecked recursion, > that is, in cases of a program bug. > > Presently however, a flawless program can be killed. > That is what makes Linux unreliable. > > > Eventually you have to kill something or the machine deadlocks. > > Alan, this is a fake argument. > When I have a computer algebra system, and it computes millions of > function values for some expensive function, then it keeps a cache > of already computed values. Maybe a value is needed again and we > save ten seconds of computation. > But of course, when we run out of memory, nothing is easier than > just throwing this cache out. > > You see, the bug is that malloc does not fail. This means that the > decisions about what to do are not taken by the program that knows > what it is doing, but by the kernel. By this arguement the OOM kill code is fine... If malloc is broken fix it. Maybe we need to stage things so that ENOMEM gets returned to requests before we are totally out of memory. If the apps ignore the errors then the kills happen. Thoughts? Ed Tomlinson ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:00 ` Guest section DW 2001-03-22 22:12 ` Ed Tomlinson @ 2001-03-22 22:52 ` Alan Cox 2001-03-22 23:27 ` Guest section DW 2001-03-23 19:57 ` Szabolcs Szakacsits 2 siblings, 1 reply; 106+ messages in thread From: Alan Cox @ 2001-03-22 22:52 UTC (permalink / raw) To: Guest section DW Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > > Eventually you have to kill something or the machine deadlocks. > > Alan, this is a fake argument. No it is not. > You see, the bug is that malloc does not fail. This means that the > decisions about what to do are not taken by the program that knows > what it is doing, but by the kernel. Even if malloc fails the situation is no different. You can do overcommit avoidance in Linux if you are bored enough to try it. I did it in 1.2 one afternoon when bored. You simply account address space. Almost everything you need to touch is in mm/*.c and localised. The only exception is ptrace. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:52 ` Alan Cox @ 2001-03-22 23:27 ` Guest section DW 2001-03-22 23:37 ` Rik van Riel 2001-03-22 23:40 ` Alan Cox 0 siblings, 2 replies; 106+ messages in thread From: Guest section DW @ 2001-03-22 23:27 UTC (permalink / raw) To: Alan Cox Cc: Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote: > > You see, the bug is that malloc does not fail. This means that the > > decisions about what to do are not taken by the program that knows > > what it is doing, but by the kernel. > Even if malloc fails the situation is no different. Why do you say so? > You can do overcommit avoidance in Linux if you are bored enough to try it. Would you accept it as the default? Would Linus? (With disk I/O we are terribly conservative, using very cautious settings, and many people use hdparm to double or triple their disk speed. But for a few these optimistic settings cause data corruption, so we do not make it the default. Similarly I would be happy if the "no overcommit", "no OOM killer" situation was the default. The people who need a reliable system will leave it that way. The people who do not mind if some process is killed once in a while use vmparm or /proc/vm/overcommit or so to make Linux achieve more on average.) Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:27 ` Guest section DW @ 2001-03-22 23:37 ` Rik van Riel 2001-03-26 19:04 ` James Antill 2001-03-22 23:40 ` Alan Cox 1 sibling, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-22 23:37 UTC (permalink / raw) To: Guest section DW Cc: Alan Cox, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Guest section DW wrote: > On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote: > > > You can do overcommit avoidance in Linux if you are bored enough to try it. > > Would you accept it as the default? Would Linus? It wouldn't help. Suppose you run without overcommit and you fill up RAM and swap to the last page. Then you change the size of one of the windows on your desktop and a program gets sent -SIGWINCH. In order to process this signal, the program needs to allocate some variables on its stack, possibly needing a new page to be allocated for its stack ... ... and since this is something which could happen to any program on the system, the result of non-overcommit would be getting a random process killed (though not completely random, syslogd and klogd would get killed more often than the others). The only solution to not getting processes killed is to run with enough memory and swap space, having an OOM killer which takes care to *NOT* let any random innocent process gets killed is nothing but a bonus, IMHO. regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:37 ` Rik van Riel @ 2001-03-26 19:04 ` James Antill 2001-03-26 20:05 ` Rik van Riel 0 siblings, 1 reply; 106+ messages in thread From: James Antill @ 2001-03-26 19:04 UTC (permalink / raw) To: Rik van Riel Cc: Guest section DW, Alan Cox, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel Rik van Riel <riel@conectiva.com.br> writes: > On Fri, 23 Mar 2001, Guest section DW wrote: > > On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote: > > > > > You can do overcommit avoidance in Linux if you are bored enough to try it. > > > > Would you accept it as the default? Would Linus? > > It wouldn't help. Suppose you run without overcommit and you > fill up RAM and swap to the last page. > > Then you change the size of one of the windows on your desktop > and a program gets sent -SIGWINCH. Ignoring the fact that most people don't use a tty based desktop, and that I'm pretty happy having my desktop die in flames when OOM (my DNS or smtp server on the other hand...). > In order to process this > signal, the program needs to allocate some variables on its > stack, possibly needing a new page to be allocated for its > stack ... man sigaltstack > ... and since this is something which could happen to any program > on the system, the result of non-overcommit would be getting a > random process killed (though not completely random, syslogd and > klogd would get killed more often than the others). I fail to see why, stack usage can be limited (and possibly cleanly handled by having a prctl() to say make sure X pages are available on the stack). If you want overcommit great, and I think it's a valid default ... but it'd be nice if I could say I don't want it for apps that aren't written using glib etc. -- # James Antill -- james@and.org :0: * ^From: .*james@and\.org /dev/null ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-26 19:04 ` James Antill @ 2001-03-26 20:05 ` Rik van Riel 0 siblings, 0 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-26 20:05 UTC (permalink / raw) To: James Antill Cc: Guest section DW, Alan Cox, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On 26 Mar 2001, James Antill wrote: > If you want overcommit great, and I think it's a valid default > ... but it'd be nice if I could say I don't want it for apps that > aren't written using glib etc. Agreed. Jonathan Morton seems to be making progress in testing and debugging the non-overcommit patch from some time ago. If things turn out to be trivial enough I wouldn't be surprised if we got to see the option of non-overcommit somewhere in future 2.4 and 2.5 kernels... regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:27 ` Guest section DW 2001-03-22 23:37 ` Rik van Riel @ 2001-03-22 23:40 ` Alan Cox 2001-03-23 20:09 ` Szabolcs Szakacsits 1 sibling, 1 reply; 106+ messages in thread From: Alan Cox @ 2001-03-22 23:40 UTC (permalink / raw) To: Guest section DW Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > > Even if malloc fails the situation is no different. > Why do you say so? Because you will fail on other things - stack overflow, signal delivery, eventually it will get you. You just cut the odds down. > > You can do overcommit avoidance in Linux if you are bored enough to try it. > > Would you accept it as the default? Would Linus? I'd like to have it there as an option. As to the default - You would have to see how much applications assume they can overcommit and rely on it. You might find you need a few Gbytes of swap just to boot ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:40 ` Alan Cox @ 2001-03-23 20:09 ` Szabolcs Szakacsits 2001-03-23 22:21 ` Alan Cox 0 siblings, 1 reply; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 20:09 UTC (permalink / raw) To: Alan Cox Cc: Guest section DW, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, 22 Mar 2001, Alan Cox wrote: > I'd like to have it there as an option. As to the default - You > would have to see how much applications assume they can overcommit > and rely on it. You might find you need a few Gbytes of swap just to > boot Seems a bit exaggeration ;) Here are numbers, http://lists.openresources.com/NetBSD/tech-userlevel/msg00722.html 6-50% more VM and the performance hit also isn't so bad as it's thought (Eduardo Horvath sent a non-overcommit patch for Linux about one year ago). Szaka ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 20:09 ` Szabolcs Szakacsits @ 2001-03-23 22:21 ` Alan Cox 2001-03-23 22:37 ` Szabolcs Szakacsits 0 siblings, 1 reply; 106+ messages in thread From: Alan Cox @ 2001-03-23 22:21 UTC (permalink / raw) To: Szabolcs Szakacsits Cc: Alan Cox, Guest section DW, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > > and rely on it. You might find you need a few Gbytes of swap just to > > boot > > Seems a bit exaggeration ;) Here are numbers, NetBSD is if I remember rightly still using a.out library styles. > 6-50% more VM and the performance hit also isn't so bad as it's thought > (Eduardo Horvath sent a non-overcommit patch for Linux about one year > ago). The Linux performance hit would be so close to zero you shouldnt be able to measure it - or it was in 1.2 anyway ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 22:21 ` Alan Cox @ 2001-03-23 22:37 ` Szabolcs Szakacsits 0 siblings, 0 replies; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 22:37 UTC (permalink / raw) To: Alan Cox Cc: Guest section DW, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Alan Cox wrote: > > > and rely on it. You might find you need a few Gbytes of swap just to > > > boot > > Seems a bit exaggeration ;) Here are numbers, > NetBSD is if I remember rightly still using a.out library styles. No, it uses ELF today, moreover the numbers were from Solaris. NetBSD also switched from non-overcommit to overcommit-only [AFAIK] mode with "random" process killing with its new UVM. > > 6-50% more VM and the performance hit also isn't so bad as it's thought > > (Eduardo Horvath sent a non-overcommit patch for Linux about one year > > ago). > The Linux performance hit would be so close to zero you shouldnt be able to > measure it - or it was in 1.2 anyway Yep, something like this :) Szaka ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:00 ` Guest section DW 2001-03-22 22:12 ` Ed Tomlinson 2001-03-22 22:52 ` Alan Cox @ 2001-03-23 19:57 ` Szabolcs Szakacsits 2 siblings, 0 replies; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 19:57 UTC (permalink / raw) To: Guest section DW Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, 22 Mar 2001, Guest section DW wrote: > Presently however, a flawless program can be killed. > That is what makes Linux unreliable. Your advocation is "save the application, crash the OS!". But you can't be blamed because everybody's first reaction is this :) But if you start to think you get the conclusion that process killing can't be avoided if you want the system keep running. But I agree Linux lacks some important things [see my other email] that could make the situation easily and inexpensively controllable. BTW, your app isn't flawless because it doesn't consider Linux memory management is [quasi-]overcommit-only at present ;) [or you used other apps as well, e.g. login, ps, cron is enough to kill your app when it stopped at OOM time]. Szaka ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 21:23 ` Alan Cox 2001-03-22 22:00 ` Guest section DW @ 2001-03-22 22:10 ` Doug Ledford 2001-03-22 22:53 ` Alan Cox 2001-03-22 23:43 ` Stephen Clouse 2001-03-23 19:26 ` Szabolcs Szakacsits 3 siblings, 1 reply; 106+ messages in thread From: Doug Ledford @ 2001-03-22 22:10 UTC (permalink / raw) To: Alan Cox Cc: Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Alan Cox wrote: > > > Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind > > logically justify annihilating large-VM processes that have been running for > > days or weeks instead of just returning ENOMEM to a process that just started > > up. > > How do you return an out of memory error to a C program that is out of memory > due to a stack growth fault. There is actually not a language construct for it Simple, you reclaim a few of those uptodate buffers. My testing here has resulting in more of my system daemons getting killed than anything else, and it never once has solved the actual problem of simple memory pressure from apps reading/writing to disk and disk cache not releasing buffers quick enough. > > It would be nice to give immunity to certain uids, or better yet, just turn the > > damn thing off entirely. I've already hacked that in...errr, out. > > Eventually you have to kill something or the machine deadlocks. The oom killing > doesnt kick in until that point. So its up to you how you like your errors. I beg to differ. If you tell me that a machine that looks like this: [dledford@monster dledford]$ free total used free shared buffers cached Mem: 1017800 1014808 2992 0 73644 796392 -/+ buffers/cache: 144772 873028 Swap: 0 0 0 [dledford@monster dledford]$ is in need of killing sshd, I'll claim you are smoking some nice stuff ;-) -- Doug Ledford <dledford@redhat.com> http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:10 ` Doug Ledford @ 2001-03-22 22:53 ` Alan Cox 2001-03-22 23:30 ` Doug Ledford 0 siblings, 1 reply; 106+ messages in thread From: Alan Cox @ 2001-03-22 22:53 UTC (permalink / raw) To: Doug Ledford Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > > How do you return an out of memory error to a C program that is out of memory > > due to a stack growth fault. There is actually not a language construct for it > > Simple, you reclaim a few of those uptodate buffers. My testing here has If you have reclaimable buffers you are not out of memory. If oom is triggered in that state it is a bug. If you are complaining that the oom killer triggers at the wrong time then thats a completely unrelated issue. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 22:53 ` Alan Cox @ 2001-03-22 23:30 ` Doug Ledford 2001-03-22 23:40 ` Alan Cox 0 siblings, 1 reply; 106+ messages in thread From: Doug Ledford @ 2001-03-22 23:30 UTC (permalink / raw) To: Alan Cox Cc: Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Alan Cox wrote: > > > > How do you return an out of memory error to a C program that is out of memory > > > due to a stack growth fault. There is actually not a language construct for it > > > > Simple, you reclaim a few of those uptodate buffers. My testing here has > > If you have reclaimable buffers you are not out of memory. If oom is triggered > in that state it is a bug. If you are complaining that the oom killer triggers > at the wrong time then thats a completely unrelated issue. Ummm, yeah, that would pretty much be the claim. Real easy to reproduce too. Take your favorite machine with lots of RAM, run just a handful of startup process and system daemons, then log in on a few terminals and do: while true; do bonnie -s (1/2 ram); done Pretty soon, system daemons will start to die. -- Doug Ledford <dledford@redhat.com> http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:30 ` Doug Ledford @ 2001-03-22 23:40 ` Alan Cox 0 siblings, 0 replies; 106+ messages in thread From: Alan Cox @ 2001-03-22 23:40 UTC (permalink / raw) To: Doug Ledford Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > Ummm, yeah, that would pretty much be the claim. Real easy to reproduce too. > Take your favorite machine with lots of RAM, run just a handful of startup > process and system daemons, then log in on a few terminals and do: > > while true; do bonnie -s (1/2 ram); done > > Pretty soon, system daemons will start to die. Then thats a bug. I assume you've provided Rik with a detailed test case already ? ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 21:23 ` Alan Cox 2001-03-22 22:00 ` Guest section DW 2001-03-22 22:10 ` Doug Ledford @ 2001-03-22 23:43 ` Stephen Clouse 2001-03-23 19:26 ` Szabolcs Szakacsits 3 siblings, 0 replies; 106+ messages in thread From: Stephen Clouse @ 2001-03-22 23:43 UTC (permalink / raw) To: Alan Cox Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel [-- Attachment #1: msg.pgp --] [-- Type: text/plain, Size: 2188 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote: > How do you return an out of memory error to a C program that is out of memory > due to a stack growth fault. There is actually not a language construct for it Hmmm...the old "Error 3 while attempting to report Error 3" dialog from MS Excel. > Eventually you have to kill something or the machine deadlocks. The oom killing > doesnt kick in until that point. So its up to you how you like your errors. It's interesting that I never recall oom being a problem (like this) with 2.0 or 2.2. And the machines I was working with at the time were far crappier than these current boxen -- they'd ride the oom line almost constantly. Back then a new process would either a) scream "Out of memory!" or b) segfault. You could argue that b is not desirable, but I'd prefer that to the current behavior, really. In fact this type of behavior still happens under 2.4 when we hit OOM on the development boxen, although not consistently (only about half the time); oom_kill annihilates something we don't want it to, then the mallocing process that triggered it decides it has become bored with life and procceds to abort/segfault anyway. I wish I could reproduce it consistently. In any case, the behavior of oom_kill (whether you consider it correct or not) is really the symptom and not the cause. We've alleviated most of it via creative use of ulimit. Still, the seemingly draconian behavior needs a bit finer-grained control. > One of the things that we badly need to resurrect for 2.5 is the beancounter > work which would let you reasonably do things like guaranteed Oracle a certain > amount of the machine, or restrict all the untrusted users to a total of 200Mb > hard limit between them etc Let me know when you branch :) Sounds like a fun project. - -- Stephen Clouse <stephenc@theiqgroup.com> Senior Programmer, IQ Coordinator Project Lead The IQ Group, Inc. <http://www.theiqgroup.com/> -----BEGIN PGP SIGNATURE----- Version: PGP 6.5.8 iQA/AwUBOrqOLAOGqGs0PadnEQKWFACfaqzjtUQD4uGaLFnxn6M9Xc4N6QIAoJO3 nJTISp0ekbXEUiAY9PJVf2vr =B3u4 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 21:23 ` Alan Cox ` (2 preceding siblings ...) 2001-03-22 23:43 ` Stephen Clouse @ 2001-03-23 19:26 ` Szabolcs Szakacsits 2001-03-23 20:41 ` Paul Jakma 3 siblings, 1 reply; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 19:26 UTC (permalink / raw) To: Alan Cox Cc: Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, 22 Mar 2001, Alan Cox wrote: > One of the things that we badly need to resurrect for 2.5 is the > beancounter work which would let you reasonably do things like > guaranteed Oracle a certain amount of the machine, or restrict all > the untrusted users to a total of 200Mb hard limit between them etc This would improve Linux reliability but it could be much better with added *optional* non-overcommit (most other OS also support this, also that's the default mostly [please no, "but it deadlocks" because it's not true, they also kill processes (Solaris, etc)]), reserved superuser memory (ala Solaris, True64, etc when OOM in non-overcommit, users complain and superuser acts, not the OS killing their tasks) and superuser *advisory* OOM killer [there was patch for this before], I think in the last area Linux is already more ahead than others at present. About the "use resource limits!". Yes, this is one solution. The *expensive* solution (admin time, worse resource utilization, etc). Others make it cheaper mixing with the above ones. Szaka ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 19:26 ` Szabolcs Szakacsits @ 2001-03-23 20:41 ` Paul Jakma 2001-03-23 21:58 ` george anzinger 2001-03-23 22:18 ` Szabolcs Szakacsits 0 siblings, 2 replies; 106+ messages in thread From: Paul Jakma @ 2001-03-23 20:41 UTC (permalink / raw) To: Szabolcs Szakacsits Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote: > About the "use resource limits!". Yes, this is one solution. The > *expensive* solution (admin time, worse resource utilization, etc). traditional user limits have worse resource utilisation? think what kind of utilisation a guaranteed allocation system would have. instead of 128MB, you'd need maybe a GB of RAM and many many GB of swap for most systems. some hopefully non-ranting points: - setting up limits on a RH system takes 1 minute by editing /etc/security/limits.conf. - Rik's current oom killer may not do a good job now, but it's impossible for it to do a /perfect/ job without implementing kernel/esp.c. - with limits set you will have: - /possible/ underutilisation on some workloads. - chance of hitting Rik's OOM killer reduced to almost nothing. no matter how good or bad Rik's killer is, i'd much rather set limits and just about /never/ have it invoked. more beancounting will make limits more useful (eg global?) and maybe dists can start setting up some kind of limits by default at install time based on the RAM installed and whether user selected server/workstation/etc.. install. Then hopefully we can be a little less concerned about how close Rik gets to the impossible task of implementing esp.c. > Szaka --paulj ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 20:41 ` Paul Jakma @ 2001-03-23 21:58 ` george anzinger 2001-03-24 5:55 ` Rik van Riel 2001-03-23 22:18 ` Szabolcs Szakacsits 1 sibling, 1 reply; 106+ messages in thread From: george anzinger @ 2001-03-23 21:58 UTC (permalink / raw) To: Paul Jakma Cc: Szabolcs Szakacsits, Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel What happens if you just make swap VERY large? Does the system thrash it self to a virtual standstill? Is this a possible answer? Supposedly you could then sneak in and blow away the bad guys manually ... George Paul Jakma wrote: > > On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote: > > > About the "use resource limits!". Yes, this is one solution. The > > *expensive* solution (admin time, worse resource utilization, etc). > > traditional user limits have worse resource utilisation? think what > kind of utilisation a guaranteed allocation system would have. instead > of 128MB, you'd need maybe a GB of RAM and many many GB of swap for > most systems. > > some hopefully non-ranting points: > > - setting up limits on a RH system takes 1 minute by editing > /etc/security/limits.conf. > > - Rik's current oom killer may not do a good job now, but it's > impossible for it to do a /perfect/ job without implementing > kernel/esp.c. > > - with limits set you will have: > - /possible/ underutilisation on some workloads. > - chance of hitting Rik's OOM killer reduced to almost nothing. > > no matter how good or bad Rik's killer is, i'd much rather set limits > and just about /never/ have it invoked. > > more beancounting will make limits more useful (eg global?) and maybe > dists can start setting up some kind of limits by default at install > time based on the RAM installed and whether user selected > server/workstation/etc.. install. > > Then hopefully we can be a little less concerned about how close Rik > gets to the impossible task of implementing esp.c. > > > Szaka > > --paulj > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 21:58 ` george anzinger @ 2001-03-24 5:55 ` Rik van Riel 0 siblings, 0 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-24 5:55 UTC (permalink / raw) To: george anzinger Cc: Paul Jakma, Szabolcs Szakacsits, Alan Cox, Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, george anzinger wrote: > What happens if you just make swap VERY large? Does the system thrash > it self to a virtual standstill? It does. I need to implement load control code (so we suspend processes in turn to keep the load low enough so we can avoid thrashing). > Is this a possible answer? Supposedly you could then sneak in and > blow away the bad guys manually ... This certainly works. regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 20:41 ` Paul Jakma 2001-03-23 21:58 ` george anzinger @ 2001-03-23 22:18 ` Szabolcs Szakacsits 2001-03-24 2:08 ` Paul Jakma 1 sibling, 1 reply; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 22:18 UTC (permalink / raw) To: Paul Jakma Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Paul Jakma wrote: > On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote: > > About the "use resource limits!". Yes, this is one solution. The > > *expensive* solution (admin time, worse resource utilization, etc). Thanks for cutting out relevant parts that said how to increase user base and satisfaction keeping and using the existent possibility as well. > traditional user limits have worse resource utilisation? think what > kind of utilisation a guaranteed allocation system would have. instead > of 128MB, you'd need maybe a GB of RAM and many many GB of swap for > most systems. Nonsense hodgepodge. See and/or mesaure the impact. I sent numbers in my former email. You also missed non-overcommit must be _optional_ [i.e. you wouldn't be forced to use it ;)]. Yes, there are users and enterprises who require it and would happily pay the 50-100% extra swap space for the same workload and extra reliability. > - setting up limits on a RH system takes 1 minute by editing > /etc/security/limits.conf. At every time you add/delete users, add/delete special apps, etc. Please note again, some people wants this way, some only for sometimes, and others really don't care because system guarantees for the admins they will always have the resources to take action [unfortunately this is not Linux]. > - Rik's current oom killer may not do a good job now, but it's > impossible for it to do a /perfect/ job without implementing > kernel/esp.c. Rik's killer is quite fine at _default_. But there will be always people who won't like it [the bastards think humans can still make better decisions than machines]. Wouldn't it be win for both sides if you could point out, "Hey, if you don't like the default, use the /proc/sys/vm/oom_killer interface"? As I said before there are also such patch by Chris Swiedler and definitely not a huge, complex one. And these stupid threads could be forgotten for good and all. > - with limits set you will have: > - /possible/ underutilisation on some workloads. Depends, guaranteed underutilisation or guaranteed extra unreliability fit the picture many times as well. > no matter how good or bad Rik's killer is, i'd much rather set limits > and just about /never/ have it invoked. Thanks for expressing your opinion but others [not necessarily me] have "occasionally" other one depending on the job what the box must do. Szaka ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 22:18 ` Szabolcs Szakacsits @ 2001-03-24 2:08 ` Paul Jakma 0 siblings, 0 replies; 106+ messages in thread From: Paul Jakma @ 2001-03-24 2:08 UTC (permalink / raw) To: Szabolcs Szakacsits; +Cc: Paul Jakma, linux-mm, Linux Kernel On Sat, 24 Mar 2001, Szabolcs Szakacsits wrote: > Nonsense hodgepodge. See and/or mesaure the impact. I sent numbers in my > former email. You also missed non-overcommit must be _optional_ [i.e. > you wouldn't be forced to use it ;)]. Yes, there are users and > enterprises who require it and would happily pay the 50-100% extra swap > space for the same workload and extra reliability. ok.. the last time OOM came up, the main objection to fully guaranteed vm was the possible huge overhead. if someone knows how to do it without a huge overhead, i'd love to see it and try it out. > At every time you add/delete users, add/delete special apps, etc. no.. pam_limits knows about groups, and you can specify limit for that group, one time. @user ... ... ... > Rik's killer is quite fine at _default_. But there will be always > people who won't like it exactly... so lets try avoid ever needing it. it is a last resort. > default, use the /proc/sys/vm/oom_killer interface"? As I said > before there are also such patch by Chris Swiedler and definitely > not a huge, complex one. uhmm.. where? > And these stupid threads could be forgotten for good and all. :) > Szaka regards, -- Paul Jakma paul@clubi.ie paul@jakma.org PGP5 key: http://www.clubi.ie/jakma/publickey.txt ------------------------------------------- Fortune: The optimum committee has no members. -- Norman Augustine ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 20:28 ` Stephen Clouse 2001-03-22 21:01 ` Ingo Oeser 2001-03-22 21:23 ` Alan Cox @ 2001-03-23 1:31 ` Michael Peddemors 2001-03-23 7:04 ` Rik van Riel 2001-03-27 15:05 ` Anthony de Boer - USEnet 2002-03-23 0:33 ` Martin Dalecki 4 siblings, 1 reply; 106+ messages in thread From: Michael Peddemors @ 2001-03-23 1:31 UTC (permalink / raw) To: Stephen Clouse Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Here, Here.. killing qmail on a server who's sole task is running mail doesn't seem to make much sense either.. > > Clearly, Linux cannot be reliable if any process can be killed > > at any moment. I am not happy at all with my recent experiences. > > Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind > logically justify annihilating large-VM processes that have been running for > days or weeks instead of just returning ENOMEM to a process that just started > up. > > We run Oracle on a development box here, and it's always the first to get the > axe (non-root process using 70-80 MB VM). Whenever someone's testing decides to > run away with memory, I usually spend the rest of the day getting intimate with > the backup files, since SIGKILLing random Oracle processes, as you might have > guessed, has a tendency to rape the entire database. -- "Catch the Magic of Linux..." -------------------------------------------------------- Michael Peddemors - Senior Consultant LinuxAdministration - Internet Services NetworkServices - Programming - Security WizardInternet Services http://www.wizard.ca Linux Support Specialist - http://www.linuxmagic.com -------------------------------------------------------- (604)589-0037 Beautiful British Columbia, Canada ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 1:31 ` Michael Peddemors @ 2001-03-23 7:04 ` Rik van Riel 2001-03-23 11:28 ` Guest section DW 0 siblings, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-23 7:04 UTC (permalink / raw) To: Michael Peddemors Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On 22 Mar 2001, Michael Peddemors wrote: > Here, Here.. killing qmail on a server who's sole task is running mail > doesn't seem to make much sense either.. I won't defend the current OOM killing code. Instead, I'm asking everybody who's unhappy with the current code to come up with something better. Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 7:04 ` Rik van Riel @ 2001-03-23 11:28 ` Guest section DW 2001-03-23 14:50 ` Eric W. Biederman 2001-03-23 21:11 ` José Luis Domingo López 0 siblings, 2 replies; 106+ messages in thread From: Guest section DW @ 2001-03-23 11:28 UTC (permalink / raw) To: Rik van Riel, Michael Peddemors Cc: Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, Mar 23, 2001 at 04:04:09AM -0300, Rik van Riel wrote: > On 22 Mar 2001, Michael Peddemors wrote: > > > Here, Here.. killing qmail on a server who's sole task is running mail > > doesn't seem to make much sense either.. > > I won't defend the current OOM killing code. > > Instead, I'm asking everybody who's unhappy with the > current code to come up with something better. To a murderer: "Why did you kill that old lady?" Reply: "I won't defend that deed, but who else should I have killed?" Andries - getting more and more unhappy with OOM Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs). Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs). Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs). Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm). [yes, that was rpm growing too large, taking a few emacs sessions] [2.4.2] ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 11:28 ` Guest section DW @ 2001-03-23 14:50 ` Eric W. Biederman 2001-03-23 15:13 ` General 2.4 impressions (was Re: [PATCH] Prevent OOM from killing init) Jeff Garzik 2001-03-23 17:21 ` [PATCH] Prevent OOM from killing init Guest section DW 2001-03-23 21:11 ` José Luis Domingo López 1 sibling, 2 replies; 106+ messages in thread From: Eric W. Biederman @ 2001-03-23 14:50 UTC (permalink / raw) To: Guest section DW Cc: Rik van Riel, Michael Peddemors, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel Guest section DW <dwguest@win.tue.nl> writes: > On Fri, Mar 23, 2001 at 04:04:09AM -0300, Rik van Riel wrote: > > On 22 Mar 2001, Michael Peddemors wrote: > > > > > Here, Here.. killing qmail on a server who's sole task is running mail > > > doesn't seem to make much sense either.. > > > > I won't defend the current OOM killing code. > > > > Instead, I'm asking everybody who's unhappy with the > > current code to come up with something better. > > To a murderer: "Why did you kill that old lady?" > Reply: "I won't defend that deed, but who else should I have killed?" > > Andries - getting more and more unhappy with OOM > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs). > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs). > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs). > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm). > > [yes, that was rpm growing too large, taking a few emacs sessions] > [2.4.2] Let me get this straight you don't have enough swap for your workload? And you don't have per process limits on root by default? So you are complaining about the OOM killer? Eric ^ permalink raw reply [flat|nested] 106+ messages in thread
* General 2.4 impressions (was Re: [PATCH] Prevent OOM from killing init) 2001-03-23 14:50 ` Eric W. Biederman @ 2001-03-23 15:13 ` Jeff Garzik 2001-03-23 16:10 ` Adding just a pinch of icache/dcache pressure Jan Harkes 2001-03-23 17:21 ` [PATCH] Prevent OOM from killing init Guest section DW 1 sibling, 1 reply; 106+ messages in thread From: Jeff Garzik @ 2001-03-23 15:13 UTC (permalink / raw) To: linux-kernel; +Cc: linux-mm Personally I think the OOM killer itself is fine. I think there are problems elsewhere which are triggering the OOM killer when it should not be triggered, ie. a leak like Doug Ledford was reporting. I definitely see heavier page/dcache usage in 2.4 -- but that is to be expected due to 2.4 changes! So it is incredibily difficult to quantify if something is wrong, and if so, where... My own impressions of 2.4 are that it "feels faster" for my own uses and it's stable. The downsides I find are that heavy fs activity seems to imply increased swapping, which jibes with a guess that the page/dcache is exceptionally greedy with releasing pages under memory pressure. </unquantified vague ramble> -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full moon on a dark night, MandrakeSoft | and a smooth road all the way to your door. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Adding just a pinch of icache/dcache pressure... 2001-03-23 15:13 ` General 2.4 impressions (was Re: [PATCH] Prevent OOM from killing init) Jeff Garzik @ 2001-03-23 16:10 ` Jan Harkes 2001-03-23 16:17 ` Andi Kleen 0 siblings, 1 reply; 106+ messages in thread From: Jan Harkes @ 2001-03-23 16:10 UTC (permalink / raw) To: Jeff Garzik; +Cc: linux-kernel, linux-mm On Fri, Mar 23, 2001 at 10:13:55AM -0500, Jeff Garzik wrote: > Personally I think the OOM killer itself is fine. I think there are > problems elsewhere which are triggering the OOM killer when it should > not be triggered, ie. a leak like Doug Ledford was reporting. > > I definitely see heavier page/dcache usage in 2.4 -- but that is to be > expected due to 2.4 changes! So it is incredibily difficult to quantify > if something is wrong, and if so, where... > > My own impressions of 2.4 are that it "feels faster" for my own uses and > it's stable. The downsides I find are that heavy fs activity seems to > imply increased swapping, which jibes with a guess that the page/dcache > is exceptionally greedy with releasing pages under memory pressure. > > </unquantified vague ramble> Like I said earlier, I should stop theorizing and write the code. Here is a teeny little patch that adds a bit of pressure to the inode and dentry slabcaches during inactive shortage. On the 512MB desktop without the change, the inode+dentry slabs typically used up about 300MB after running my normal day-to-day workload for about 24 hours. Now, the inode+dentry slabs are using only 90MB. As there is more memory available for the buffer and page caches, kswapd seems to have less trouble keeping up with my typical workload. btw. There definitely is a network receive buffer leak somewhere in either the 3c905C path or higher up in the network layers (2.4.0 or 2.4.1). The normal path does not leak anything. I was seeing it only for a couple of days when there was a failing switch that must have randomly corrupted packets. The switch got replaced and the leakage disappeared, so I went back into a non-ikd kernel and stopped looking for the problem. Jan ================= --- linux/fs/inode.c.orig Thu Mar 22 13:20:55 2001 +++ linux/fs/inode.c Thu Mar 22 14:00:10 2001 @@ -270,19 +270,6 @@ spin_unlock(&inode_lock); } -/* - * Called with the spinlock already held.. - */ -static void sync_all_inodes(void) -{ - struct super_block * sb = sb_entry(super_blocks.next); - for (; sb != sb_entry(&super_blocks); sb = sb_entry(sb->s_list.next)) { - if (!sb->s_dev) - continue; - sync_list(&sb->s_dirty); - } -} - /** * write_inode_now - write an inode to disk * @inode: inode to write to disk @@ -507,8 +494,6 @@ struct inode * inode; spin_lock(&inode_lock); - /* go simple and safe syncing everything before starting */ - sync_all_inodes(); entry = inode_unused.prev; while (entry != &inode_unused) @@ -554,6 +539,9 @@ if (priority) count = inodes_stat.nr_unused / priority; + + if (priority < 6) + sync_inodes(0); prune_icache(count); kmem_cache_shrink(inode_cachep); --- linux/mm/vmscan.c.orig Thu Mar 22 14:00:41 2001 +++ linux/mm/vmscan.c Thu Mar 22 14:35:26 2001 @@ -845,9 +845,11 @@ * reclaim unused slab cache if memory is low. */ if (free_shortage()) { + shrink_dcache_memory(5, gfp_mask); + shrink_icache_memory(5, gfp_mask); + } else { shrink_dcache_memory(DEF_PRIORITY, gfp_mask); shrink_icache_memory(DEF_PRIORITY, gfp_mask); - } else { /* * Illogical, but true. At least for now. * ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: Adding just a pinch of icache/dcache pressure... 2001-03-23 16:10 ` Adding just a pinch of icache/dcache pressure Jan Harkes @ 2001-03-23 16:17 ` Andi Kleen 2001-03-23 16:51 ` Jan Harkes 0 siblings, 1 reply; 106+ messages in thread From: Andi Kleen @ 2001-03-23 16:17 UTC (permalink / raw) To: Jan Harkes; +Cc: Jeff Garzik, linux-kernel, linux-mm On Fri, Mar 23, 2001 at 05:10:56PM +0100, Jan Harkes wrote: > btw. There definitely is a network receive buffer leak somewhere in > either the 3c905C path or higher up in the network layers (2.4.0 or > 2.4.1). The normal path does not leak anything. What do you mean with "normal path" ? And are you sure it was a leak? TCP can buffer quite a bit of skbs, but it should be bounded based on the number of sockets. -Andi ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: Adding just a pinch of icache/dcache pressure... 2001-03-23 16:17 ` Andi Kleen @ 2001-03-23 16:51 ` Jan Harkes 0 siblings, 0 replies; 106+ messages in thread From: Jan Harkes @ 2001-03-23 16:51 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Fri, Mar 23, 2001 at 05:17:16PM +0100, Andi Kleen wrote: > On Fri, Mar 23, 2001 at 05:10:56PM +0100, Jan Harkes wrote: > > btw. There definitely is a network receive buffer leak somewhere in > > either the 3c905C path or higher up in the network layers (2.4.0 or > > 2.4.1). The normal path does not leak anything. > > What do you mean with "normal path" ? > > And are you sure it was a leak? TCP can buffer quite a bit of skbs, but it > should be bounded based on the number of sockets. > > -Andi No corrupted packets. I was pretty sure it was a leak once I noticed that most of my memory got allocated here: Top 10 of the not yet freed allocations taken from /proc/memleak in an IKD-patched 2.4.2 kernel a couple of weeks ago: memleak/01-02-27__15:44:19 74603 buffer.c:1234 42956 3c59x.c:2232 13025 dcache.c:598 12392 inode.c:665 5921 dcache.c:603 4480 ll_rw_blk.c:397 2304 raid5.c:154 2105 mmap.c:276 2064 af_unix.c:1340 1312 file_table.c:62 Buffer, dcache and inode allocations are all accounted for, I was expecting the problem there. However the 3c59x.c allocations are not, each of those buffers is taken from the size-2048 slab so they were already taking about 88MB. This was after running a backup, but the backup was already over and the sockets must have been closed. The backup statistics showed tcp transfer speed to be an average of 75kB/s instead of the more typical 350kB/s Before the backup run, (01-02-27__14:41:45) 7679 3c59x.c:2232 Later that afternoon the switch was fixed and life returned to normal. I rebooted the next day and ran another backup, this is the top ten unfreed allocations after that run. memleak/01-02-28__16:03:03 191764 buffer.c:1234 13957 inode.c:665 9684 dcache.c:598 4620 ll_rw_blk.c:397 2304 raid5.c:154 1587 mmap.c:276 1066 file_table.c:62 864 raid5.c:322 846 dst.c:103 802 dcache.c:603 ... 224 3c59x.c:2232 # not even in the top 10, it is number 19 I don't have any more numbers, and can't reproduce the situation anymore. Jan ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 14:50 ` Eric W. Biederman 2001-03-23 15:13 ` General 2.4 impressions (was Re: [PATCH] Prevent OOM from killing init) Jeff Garzik @ 2001-03-23 17:21 ` Guest section DW 2001-03-23 20:18 ` Paul Jakma 2001-03-23 23:48 ` Eric W. Biederman 1 sibling, 2 replies; 106+ messages in thread From: Guest section DW @ 2001-03-23 17:21 UTC (permalink / raw) To: Eric W. Biederman Cc: Rik van Riel, Michael Peddemors, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, Mar 23, 2001 at 07:50:25AM -0700, Eric W. Biederman wrote: > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs). > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs). > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs). > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm). > > > > [yes, that was rpm growing too large, taking a few emacs sessions] > > [2.4.2] > > Let me get this straight you don't have enough swap for your workload? > And you don't have per process limits on root by default? > > So you are complaining about the OOM killer? I should not react - your questions are phrased rhetorically. But yes, I am complaining because Linux by default is unreliable. I strongly prefer a system that is reliable by default, and I'll leave it to others to run it in an unreliable mode. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:21 ` [PATCH] Prevent OOM from killing init Guest section DW @ 2001-03-23 20:18 ` Paul Jakma 2001-03-24 20:19 ` Jesse Pollard 2001-03-23 23:48 ` Eric W. Biederman 1 sibling, 1 reply; 106+ messages in thread From: Paul Jakma @ 2001-03-23 20:18 UTC (permalink / raw) To: Guest section DW Cc: Eric W. Biederman, Rik van Riel, Michael Peddemors, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Guest section DW wrote: > But yes, I am complaining because Linux by default is unreliable. no, your distribution is unreliable by default. > I strongly prefer a system that is reliable by default, > and I'll leave it to others to run it in an unreliable mode. currently, setting sensible user limits on my machines means i never get a hosed machine due to OOM. These limits are easy to set via pam_limits. (not perfect though, i think its session specific..) granted, if the machine hasn't been setup with user limits, then linux doesn't deal at all well with OOM, so this should be fixed. but it can easily be argued that admin error in not configuring limits is the main cause for OOM. > Andries regards, --paulj ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 20:18 ` Paul Jakma @ 2001-03-24 20:19 ` Jesse Pollard 0 siblings, 0 replies; 106+ messages in thread From: Jesse Pollard @ 2001-03-24 20:19 UTC (permalink / raw) To: Paul Jakma, Guest section DW Cc: Eric W. Biederman, Rik van Riel, Michael Peddemors, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Paul Jakma wrote: >On Fri, 23 Mar 2001, Guest section DW wrote: > >> But yes, I am complaining because Linux by default is unreliable. > >no, your distribution is unreliable by default. > >> I strongly prefer a system that is reliable by default, >> and I'll leave it to others to run it in an unreliable mode. > >currently, setting sensible user limits on my machines means i never >get a hosed machine due to OOM. These limits are easy to set via >pam_limits. (not perfect though, i think its session specific..) Process specific. Each forked process gets the same limits. You get OOM as soon as all processes together use more than the system capacity. >granted, if the machine hasn't been setup with user limits, then linux >doesn't deal at all well with OOM, so this should be fixed. but it can >easily be argued that admin error in not configuring limits is the >main cause for OOM. Admin has no real control is the problem. Limits are only good for one process. As soon as that process forks one other process then the useage limit is twice the limit established. >> Andries > >regards, > >--paulj -- ------------------------------------------------------------------------- Jesse I Pollard, II Email: jesse@cats-chateau.net Any opinions expressed are solely my own. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:21 ` [PATCH] Prevent OOM from killing init Guest section DW 2001-03-23 20:18 ` Paul Jakma @ 2001-03-23 23:48 ` Eric W. Biederman 1 sibling, 0 replies; 106+ messages in thread From: Eric W. Biederman @ 2001-03-23 23:48 UTC (permalink / raw) To: Guest section DW Cc: Rik van Riel, Michael Peddemors, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel Guest section DW <dwguest@win.tue.nl> writes: > On Fri, Mar 23, 2001 at 07:50:25AM -0700, Eric W. Biederman wrote: > > > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs). > > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs). > > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs). > > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm). > > > > > > [yes, that was rpm growing too large, taking a few emacs sessions] > > > [2.4.2] > > > > Let me get this straight you don't have enough swap for your workload? > > And you don't have per process limits on root by default? > > > > So you are complaining about the OOM killer? > > I should not react - your questions are phrased rhetorically. To some extent I was also very puzzled by your complaint. You have setup a system that by your definition unreliably and then you complain it is unreliable. > > But yes, I am complaining because Linux by default is unreliable. > I strongly prefer a system that is reliable by default, > and I'll leave it to others to run it in an unreliable mode. Now all I know the system didn't have enough resources to do what you asked to it do and it failed. That sounds reliable to me. Obviously you were suprised at how the system failed. Given that unix has been doing this kind of thing for decades, you obviously missed how the unix malloc overcommited memory. Does you application trap sigsegv on a different stack so you can catch stack growth failure? And how does your app handle this case? Having a no over commit kernel option would help. A cheap workaround is to call mlock_all(MCL_FUTRE...). Then you are garantteed you will always have ram locked into memory for your program. This assumes you have enough ram for your program. Eric ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 11:28 ` Guest section DW 2001-03-23 14:50 ` Eric W. Biederman @ 2001-03-23 21:11 ` José Luis Domingo López 1 sibling, 0 replies; 106+ messages in thread From: José Luis Domingo López @ 2001-03-23 21:11 UTC (permalink / raw) To: linux-kernel [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1360 bytes --] On Friday, 23 March 2001, at 12:28:15 +0100, Guest section DW wrote: > [...] > To a murderer: "Why did you kill that old lady?" > Reply: "I won't defend that deed, but who else should I have killed?" > No comments. > Andries - getting more and more unhappy with OOM > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs). > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs). > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs). > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm). > > [yes, that was rpm growing too large, taking a few emacs sessions] > [2.4.2] > OOM clearly didn't work perfectly in this case, but it worked and left your machine usable (maybe you lost data on your emacs sessions). From my (OS design newbie) point of view, there must be quite difficult to keep track of all system processes, and even a resource intensive task. If you can do it better, come up with a kernel patch, submit it, and get credit and fame for it. I would love to see Linux as the perfect OS for everyone, but won't ever complain about each other's work, mainly when I'm unable to contribute a thing. -- José Luis Domingo López Linux Registered User #189436 Debian GNU/Linux Potato (P166 64 MB RAM) jdomingo AT internautas DOT org => Spam at your own risk ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 20:28 ` Stephen Clouse ` (2 preceding siblings ...) 2001-03-23 1:31 ` Michael Peddemors @ 2001-03-27 15:05 ` Anthony de Boer - USEnet 2002-03-23 0:33 ` Martin Dalecki 4 siblings, 0 replies; 106+ messages in thread From: Anthony de Boer - USEnet @ 2001-03-27 15:05 UTC (permalink / raw) To: linux-kernel Stephen Clouse wrote: > We run Oracle on a development box here, and it's always the first to get the > axe (non-root process using 70-80 MB VM). ... > It would be nice to give immunity to certain uids, ... It would seem to me that the new capabilities stuff _could_ be the answer. Basically, all "am I root?" checks in the kernel should be becoming cap flags, the OOM killer already avoids killing root processes, it's already a tenet that yes you can hose your system doing insane things as root but that nonroot users should NOT be able to hose a system, so being able to eg. grant this capability to Oracle or ungrant it from sendmail could let a sysadmin tell the kernel what must be preserved regardless of its UID. As a baseline I'd want to see all user processes die before any UID 0 stuff, but being able to retune this would be extremely good. -- Anthony de Boer -- as seen at http://www.leftmind.net/~adb/ -- BOFH, eh? / "Just when you think you've got a handle on herding cats, \ \ along comes a three-legged cat on amphetamines." -- Skud / ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 20:28 ` Stephen Clouse ` (3 preceding siblings ...) 2001-03-27 15:05 ` Anthony de Boer - USEnet @ 2002-03-23 0:33 ` Martin Dalecki 2001-03-22 23:53 ` Rik van Riel 2001-03-23 0:20 ` Stephen Clouse 4 siblings, 2 replies; 106+ messages in thread From: Martin Dalecki @ 2002-03-23 0:33 UTC (permalink / raw) To: Stephen Clouse Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Stephen Clouse wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Thu, Mar 22, 2001 at 12:47:27PM +0100, Guest section DW wrote: > > Last week I installed SuSE 7.1 somewhere. > > During the install: "VM: killing process rpm", > > leaving the installer rather confused. > > (An empty machine, 256MB, 144MB swap, I think 2.2.18.) > > > > Last month I had a computer algebra process running for a week. > > Killed. But this computation was the only task this machine had. > > Its sole reason of existence. > > Too bad - zero information out of a week's computation. > > (I think 2.4.0.) > > > > Clearly, Linux cannot be reliable if any process can be killed > > at any moment. I am not happy at all with my recent experiences. > > Really the whole oom_kill process seems bass-ackwards to me. I can't in my mind > logically justify annihilating large-VM processes that have been running for > days or weeks instead of just returning ENOMEM to a process that just started > up. > > We run Oracle on a development box here, and it's always the first to get the > axe (non-root process using 70-80 MB VM). Whenever someone's testing decides to > run away with memory, I usually spend the rest of the day getting intimate with > the backup files, since SIGKILLing random Oracle processes, as you might have > guessed, has a tendency to rape the entire database. > > It would be nice to give immunity to certain uids, or better yet, just turn the > damn thing off entirely. I've already hacked that in...errr, out. AMEN! TO THIS! Uptime of a process is a much better mesaure for a killing candidate then it's size. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2002-03-23 0:33 ` Martin Dalecki @ 2001-03-22 23:53 ` Rik van Riel 2002-03-23 1:21 ` Martin Dalecki 2001-03-23 0:20 ` Stephen Clouse 1 sibling, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-22 23:53 UTC (permalink / raw) To: Martin Dalecki Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Sat, 23 Mar 2002, Martin Dalecki wrote: > Uptime of a process is a much better mesaure for a killing > candidate then it's size. You'll have fun with your root shell, then ;) The current OOM code takes things like uptime, used cpu, size and a bunch of other things into account. If it turns out that the code is not attaching a proper weight to some of these factors, you should be sending patches, not flames. (the code is full of comments, so it should be easy enough to find your way around the code and tweak it until it does the right thing in a number of test cases) regards, Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 23:53 ` Rik van Riel @ 2002-03-23 1:21 ` Martin Dalecki 0 siblings, 0 replies; 106+ messages in thread From: Martin Dalecki @ 2002-03-23 1:21 UTC (permalink / raw) To: Rik van Riel Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel Rik van Riel wrote: > > On Sat, 23 Mar 2002, Martin Dalecki wrote: > > > Uptime of a process is a much better mesaure for a killing > > candidate then it's size. > > You'll have fun with your root shell, then ;) You mean the remote one? > The current OOM code takes things like uptime, used cpu, size > and a bunch of other things into account. > > If it turns out that the code is not attaching a proper weight > to some of these factors, you should be sending patches, not > flames. Did I say anything insulting? I have just stated what I think is more important... BTW> it's not quite obvious that You have to look into oom_kill to find it in the kernel source where to look at. (Yes I did just find /usr/src/linux -name "oom*" becouse I happen to remember but! OK i will just place - in front of the description lines where I think that you where mislead: * Good in this context means that: * 1) we lose the minimum amount of work done -* 2) we recover a large amount of memory * 3) we don't kill anything innocent of eating tons of memory -* 4) we want to kill the minimum amount of processes (one) * 5) we try to kill the process the user expects us to kill, this * algorithm has been meticulously tuned to meet the priniciple * of least surprise ... (be careful when you change it) The following is a wrong assumtion. You usually nice processes to the background just to guarantee for example smoot interaction just in case you won't login in in some time to the machine. For example let's have an dedicated http server, which does a lot of embedded perl. It's quite clever to renice it back, just in case this remote machine get's overloaded, becouse otherwise your chances to get a login in case the machine starts to trash, would be much worser. But this doesn't mean that the process isn't more important - becouse you do it to make the machine crowl through high load peaks and still let you in in case you have something urgent to do on it. /* * Niced processes are most likely less important, so double * their badness points. */ if (p->nice > 0) points *= 2; BTW> Why the hell you don't just use a polynomial approximation for int_sqrt - the range of values is very closed an you are working in a finite ring anyway - you could very easly find a simple approximation which wouldn't need any looping. This should be reversted: points /= int_sqrt(cpu_time); points /= int_sqrt(int_sqrt(run_time)); points = p->mm->total_vm; /* * CPU time is in seconds and run time is in minutes. There is no * particular reason for this other than that it turned out to work * very well in practice. This is not safe against jiffie wraps * but we don't care _that_ much... */ cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3); run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10); points /= int_sqrt(cpu_time); points /= int_sqrt(int_sqrt(run_time)); ============================================================== NOW I SEE THE MOST IMPORTANT MISTAKE: There should be a de-normalization of the units CPU_time/total_uptime RUN_time/total_uptime mem/total_mem. Otherwise you can't map the intended logics sufficiently safe on to the calculation you do. You compare bits with seconds - which is WRONG. Let: m := memmory used by the process M := the total memmory in the system. c := cpu time used by the process u := uptime of the process. U := uptime of the system Then you calculate points as (m / sqrt(c)) / sqrt(sqrt(r)) Which is just very wired function with a non homogen behaviour. (Just take the first derivative of it in any dimension to see what I mean) You should calculate to represent you intended logics: x * (m / M) + y * (U / c) + z * (U / u), where x y z are constants representing the wighting heuristic importance one gives to those particular measure points. A simple *normalized* polynom the only thing people and computers can realy deal with. > (the code is full of comments, so it should be easy enough to > find your way around the code and tweak it until it does the > right thing in a number of test cases) > > regards, > > Rik > -- > Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml > > Virtual memory is like a game you can't win; > However, without VM there's truly nothing to lose... > > http://www.surriel.com/ > http://www.conectiva.com/ http://distro.conectiva.com/ -- - phone: +49 214 8656 283 - job: eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!) - langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort: ru_RU.KOI8-R ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2002-03-23 0:33 ` Martin Dalecki 2001-03-22 23:53 ` Rik van Riel @ 2001-03-23 0:20 ` Stephen Clouse 2002-03-23 1:30 ` Martin Dalecki 1 sibling, 1 reply; 106+ messages in thread From: Stephen Clouse @ 2001-03-23 0:20 UTC (permalink / raw) To: Martin Dalecki Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel [-- Attachment #1: msg.pgp --] [-- Type: text/plain, Size: 1752 bytes --] -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Sat, Mar 23, 2002 at 01:33:50AM +0100, Martin Dalecki wrote: > AMEN! TO THIS! > Uptime of a process is a much better mesaure for a killing candidate > then it's size. Thing is, if you take a good study of mm/oom_kill.c, it *does* take start time into account, as well as CPU time. The problem is that a process (like Oracle, in our case) using ludicrous amounts of memory can still rank at the top of the list, even with the time-based reduction factors, because total VM is the starting number in the equation for determining what to kill. Oracle or what not sitting at 80 MB for a day or two will still find a way to outrank the newly-started 1 MB shell process whose malloc triggered oom_kill in the first place. If anything, time really needs to be a hard criterion for sorting the final list on and not merely a variable in the equation and thus tied to vmsize. This is why the production database boxen aren't running 2.4 yet. I can control Oracle's usage very finely (since it uses a fixed memory pool preallocated at startup), but if something else decides to fire up on there (like the nightly backup and maintenance routine) and decides it needs just a pinch more memory than what's available -- ick. 2.2.x doesn't appear to enforce new memory allocation with a sniper rifle -- the new process just suffers a pleasant ("Out of memory!") or violent (SIGSEGV) death. - -- Stephen Clouse <stephenc@theiqgroup.com> Senior Programmer, IQ Coordinator Project Lead The IQ Group, Inc. <http://www.theiqgroup.com/> -----BEGIN PGP SIGNATURE----- Version: PGP 6.5.8 iQA/AwUBOrqW3wOGqGs0PadnEQLZUwCfWTr8HwAChQamWWvWWzZcX5DZ8PAAnROB Ja25OAQu3W1h7Ck0SU/TfKj8 =VlQt -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 0:20 ` Stephen Clouse @ 2002-03-23 1:30 ` Martin Dalecki 2001-03-23 1:37 ` Rik van Riel 0 siblings, 1 reply; 106+ messages in thread From: Martin Dalecki @ 2002-03-23 1:30 UTC (permalink / raw) To: Stephen Clouse Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Stephen Clouse wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Sat, Mar 23, 2002 at 01:33:50AM +0100, Martin Dalecki wrote: > > AMEN! TO THIS! > > Uptime of a process is a much better mesaure for a killing candidate > > then it's size. > > Thing is, if you take a good study of mm/oom_kill.c, it *does* take start time I did thing is Rik did use a non normalized formula in oom_kill for the calculation of the kill penalty a process get's. This is the main reason for the non controllable behaviour of it. > into account, as well as CPU time. The problem is that a process (like Oracle, > in our case) using ludicrous amounts of memory can still rank at the top of the > list, even with the time-based reduction factors, because total VM is the > starting number in the equation for determining what to kill. Oracle or what > not sitting at 80 MB for a day or two will still find a way to outrank the > newly-started 1 MB shell process whose malloc triggered oom_kill in the first > place. This is due to the broken calculation formula in oom_kill(). > > If anything, time really needs to be a hard criterion for sorting the final list > on and not merely a variable in the equation and thus tied to vmsize. > > This is why the production database boxen aren't running 2.4 yet. I can control > Oracle's usage very finely (since it uses a fixed memory pool preallocated at > startup), but if something else decides to fire up on there (like the nightly > backup and maintenance routine) and decides it needs just a pinch more memory > than what's available -- ick. 2.2.x doesn't appear to enforce new memory > allocation with a sniper rifle -- the new process just suffers a pleasant ("Out > of memory!") or violent (SIGSEGV) death. And you should never ever overcommit memmory to oracle! Don't make the buffers bigger then half the memmory in the system really. There ARE circumstances where oracle is using all available memmory in very random manner. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2002-03-23 1:30 ` Martin Dalecki @ 2001-03-23 1:37 ` Rik van Riel 2001-03-23 10:48 ` Martin Dalecki 0 siblings, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-23 1:37 UTC (permalink / raw) To: Martin Dalecki Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Sat, 23 Mar 2002, Martin Dalecki wrote: > This is due to the broken calculation formula in oom_kill(). Feel free to write better-working code. Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 1:37 ` Rik van Riel @ 2001-03-23 10:48 ` Martin Dalecki 2001-03-23 14:56 ` Rik van Riel 0 siblings, 1 reply; 106+ messages in thread From: Martin Dalecki @ 2001-03-23 10:48 UTC (permalink / raw) To: Rik van Riel Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel Rik van Riel wrote: > > On Sat, 23 Mar 2002, Martin Dalecki wrote: > > > This is due to the broken calculation formula in oom_kill(). > > Feel free to write better-working code. I don't get paid for it and I'm not idling through my days... ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 10:48 ` Martin Dalecki @ 2001-03-23 14:56 ` Rik van Riel 2001-03-23 16:43 ` Guest section DW 2001-03-23 20:20 ` Tom Diehl 0 siblings, 2 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-23 14:56 UTC (permalink / raw) To: Martin Dalecki Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Martin Dalecki wrote: > Rik van Riel wrote: > > On Sat, 23 Mar 2002, Martin Dalecki wrote: > > > > > This is due to the broken calculation formula in oom_kill(). > > > > Feel free to write better-working code. > > I don't get paid for it and I'm not idling through my days... <similar response from Andries> Well, in that case you'll have to live with the current OOM killer. Martin wrote down a pretty detailed description of what's wrong with my algorithm, if it really bothers him he should be able to come up with something better. Personally, I think there is more important VM code to look after, since OOM is a pretty rare occurrance anyway. regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 14:56 ` Rik van Riel @ 2001-03-23 16:43 ` Guest section DW 2001-03-24 5:57 ` Rik van Riel 2001-03-23 20:20 ` Tom Diehl 1 sibling, 1 reply; 106+ messages in thread From: Guest section DW @ 2001-03-23 16:43 UTC (permalink / raw) To: Rik van Riel, Martin Dalecki Cc: Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote: > On Fri, 23 Mar 2001, Martin Dalecki wrote: > > > Feel free to write better-working code. > > > > I don't get paid for it and I'm not idling through my days... > > <similar response from Andries> No lies please. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 16:43 ` Guest section DW @ 2001-03-24 5:57 ` Rik van Riel 2001-03-25 16:35 ` Guest section DW 0 siblings, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-24 5:57 UTC (permalink / raw) To: Guest section DW Cc: Martin Dalecki, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Guest section DW wrote: > On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote: > > On Fri, 23 Mar 2001, Martin Dalecki wrote: > > > > > Feel free to write better-working code. > > > > > > I don't get paid for it and I'm not idling through my days... > > > > <similar response from Andries> > > No lies please. You mean that you ARE willing to implement what you've been arguing for? Cool, I can't wait to see your patch. Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-24 5:57 ` Rik van Riel @ 2001-03-25 16:35 ` Guest section DW 0 siblings, 0 replies; 106+ messages in thread From: Guest section DW @ 2001-03-25 16:35 UTC (permalink / raw) To: Rik van Riel Cc: Martin Dalecki, Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel On Sat, Mar 24, 2001 at 02:57:27AM -0300, Rik van Riel wrote: > On Fri, 23 Mar 2001, Guest section DW wrote: > > On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote: > > > On Fri, 23 Mar 2001, Martin Dalecki wrote: > > > > > > > Feel free to write better-working code. > > > > > > > > I don't get paid for it and I'm not idling through my days... > > > > > > <similar response from Andries> > > > > No lies please. > > You mean that you ARE willing to implement what you've been > arguing for? There had not been any such response by me - thus you should not ascribe to me such a response. Concerning overcommit: people tell me that Eduardo Horvath in his patch submitted to l-k on 2000-03-31 already solved the problem (entirely or to a large extent). : This patch will prevent the linux kernel from allowing VM overcommit. I have not yet read the code. Andries ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 14:56 ` Rik van Riel 2001-03-23 16:43 ` Guest section DW @ 2001-03-23 20:20 ` Tom Diehl 2001-03-23 23:56 ` Tim Wright 1 sibling, 1 reply; 106+ messages in thread From: Tom Diehl @ 2001-03-23 20:20 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-kernel On Fri, 23 Mar 2001, Rik van Riel wrote: > Well, in that case you'll have to live with the current OOM > killer. Martin wrote down a pretty detailed description of > what's wrong with my algorithm, if it really bothers him he > should be able to come up with something better. > > Personally, I think there is more important VM code to look > after, since OOM is a pretty rare occurrance anyway. Well actually it is not that rare at least for me. Every 3 or 4 days I run into it (It happened again this morning). The machine has 128 Megs of ram and 256 Megs of swap. It is my desktop machine and I keep 3 or 4 netscape windows running all of the time. Well I try to at least. Every 3 or 4 days the OOM Killer kills netscape, it happened this morning. If I could fix it I would but alas I do not have the knowledge. The best I can do is test. :( This is NOT a complaint I just bring this up as another data point. It used to lock the machine so things are getting better. fwiw, I am currently running 2.4.2-ac18. The old ac kernels (do not remember exactly which ones but it was single digits) would allow the machine to start thrashing. I could usually see that it was running out of memory and if I was fast enough could kill Netscape b4 the machine locked. If I was not fast enough it would lock hard. Nothing in the logs. HTH, -- ......Tom ATA100 is another testimony to the fact that pigs can be tdiehl@pil.net made to fly given sufficient thrust (to borrow an RFC) Alan Cox lkml 11 Jan 01 ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 20:20 ` Tom Diehl @ 2001-03-23 23:56 ` Tim Wright 2001-03-24 0:21 ` Tom Diehl 0 siblings, 1 reply; 106+ messages in thread From: Tim Wright @ 2001-03-23 23:56 UTC (permalink / raw) To: Tom Diehl; +Cc: Rik van Riel, linux-kernel Netscape 4 has some very nasty habits like suddenly consuming ~80MB of memory. Disabling java support seems to eradicate most occurences of this particularly obnoxious behaviour. Under these circumstances, the OOM killer is doing exactly the right thing i.e. killing a runaway app. Tim On Fri, Mar 23, 2001 at 03:20:41PM -0500, Tom Diehl wrote: > On Fri, 23 Mar 2001, Rik van Riel wrote: > > > Well, in that case you'll have to live with the current OOM > > killer. Martin wrote down a pretty detailed description of > > what's wrong with my algorithm, if it really bothers him he > > should be able to come up with something better. > > > > Personally, I think there is more important VM code to look > > after, since OOM is a pretty rare occurrance anyway. > > Well actually it is not that rare at least for me. Every 3 or 4 days I run > into it (It happened again this morning). The machine has 128 Megs of ram > and 256 Megs of swap. It is my desktop machine and I keep 3 or 4 netscape > windows running all of the time. Well I try to at least. Every 3 or 4 days > the OOM Killer kills netscape, it happened this morning. If I could fix it > I would but alas I do not have the knowledge. The best I can do is test. :( > > This is NOT a complaint I just bring this up as another data point. > It used to lock the machine so things are getting better. fwiw, I am > currently running 2.4.2-ac18. The old ac kernels (do not remember exactly > which ones but it was single digits) would allow the machine to start > thrashing. I could usually see that it was running out of memory and if I > was fast enough could kill Netscape b4 the machine locked. If I was not > fast enough it would lock hard. Nothing in the logs. > > HTH, > > -- > ......Tom ATA100 is another testimony to the fact that pigs can be > tdiehl@pil.net made to fly given sufficient thrust (to borrow an RFC) > Alan Cox lkml 11 Jan 01 > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com IBM Linux Technology Center, Beaverton, Oregon Interested in Linux scalability ? Look at http://lse.sourceforge.net/ "Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 23:56 ` Tim Wright @ 2001-03-24 0:21 ` Tom Diehl 0 siblings, 0 replies; 106+ messages in thread From: Tom Diehl @ 2001-03-24 0:21 UTC (permalink / raw) To: Tim Wright; +Cc: Rik van Riel, linux-kernel On Fri, 23 Mar 2001, Tim Wright wrote: > Netscape 4 has some very nasty habits like suddenly consuming ~80MB of memory. > Disabling java support seems to eradicate most occurences of this particularly > obnoxious behaviour. Under these circumstances, the OOM killer is doing exactly > the right thing i.e. killing a runaway app. Thanks for the info. I sus[ected as much but I was not sure. -- ......Tom ATA100 is another testimony to the fact that pigs can be tdiehl@pil.net made to fly given sufficient thrust (to borrow an RFC) Alan Cox lkml 11 Jan 01 ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-22 11:47 ` Guest section DW ` (2 preceding siblings ...) 2001-03-22 20:28 ` Stephen Clouse @ 2001-03-23 17:26 ` James A. Sutherland 2001-03-23 17:32 ` Alan Cox ` (3 more replies) 3 siblings, 4 replies; 106+ messages in thread From: James A. Sutherland @ 2001-03-23 17:26 UTC (permalink / raw) To: Guest section DW Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Thu, 22 Mar 2001, Guest section DW wrote: > On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: > > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > > Since the system will panic if the init process is chosen by > > > the OOM killer, the following patch prevents select_bad_process() > > > from picking init. > > There is a dozen other processes that must not be killed. > Init is just a random example. That depends what you mean by "must not". If it's your missile guidance system, aircraft autopilot or life support system, the system must not run out of memory in the first place. If the system breaks down badly, killing init and thus panicking (hence rebooting, if the system is set up that way) seems the best approach. > > One question ... has the OOM killer ever selected init on > > anybody's system ? > > Last week I installed SuSE 7.1 somewhere. > During the install: "VM: killing process rpm", > leaving the installer rather confused. > (An empty machine, 256MB, 144MB swap, I think 2.2.18.) If SuSE's install program needs more than a quarter Gb of RAM, you need a better distro. > Last month I had a computer algebra process running for a week. > Killed. But this computation was the only task this machine had. > Its sole reason of existence. > Too bad - zero information out of a week's computation. A computation your system was incapable of performing. OK, it's a shame it took you a week to find this out, but the computation had to die: if the only process running cannot run, it has to die! > (I think 2.4.0.) > > Clearly, Linux cannot be reliable if any process can be killed > at any moment. What on earth did you expect to happen when the process exceeded the machine's capabilities? Using more than all the resources fails. There isn't an alternative. James. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:26 ` James A. Sutherland @ 2001-03-23 17:32 ` Alan Cox 2001-03-23 18:58 ` Martin Dalecki ` (4 more replies) 2001-03-24 0:03 ` [PATCH] Prevent OOM from killing init Guest section DW ` (2 subsequent siblings) 3 siblings, 5 replies; 106+ messages in thread From: Alan Cox @ 2001-03-23 17:32 UTC (permalink / raw) To: James A. Sutherland Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel > That depends what you mean by "must not". If it's your missile guidance > system, aircraft autopilot or life support system, the system must not run > out of memory in the first place. If the system breaks down badly, killing > init and thus panicking (hence rebooting, if the system is set up that > way) seems the best approach. Ultra reliable systems dont contain memory allocators. There are good reasons for this but the design trade offs are rather hard to make in a real world environment Solving the trivial overcommit case is not a difficult task but since I don't believe it is needed I'll wait for those who moan so loudly to do it ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:32 ` Alan Cox @ 2001-03-23 18:58 ` Martin Dalecki 2001-03-25 13:54 ` [PATCH] OOM handling Martin Dalecki 2001-03-23 19:45 ` [PATCH] Prevent OOM from killing init Jonathan Morton ` (3 subsequent siblings) 4 siblings, 1 reply; 106+ messages in thread From: Martin Dalecki @ 2001-03-23 18:58 UTC (permalink / raw) To: Alan Cox Cc: James A. Sutherland, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel I have a constructive proposal: It would make much sense to make the oom killer leave not just root processes alone but processes belonging to a UID lower then a certain value as well (500). This would be: 1. Easly managable by the admin. Just let oracle/www and analogous users have a UID lower then let's say 500. 2. In full compliance with the port trick done by TCP/IP (ports < 1024 vers other) 3. It wouldn't need any addition of new interface (no jebanoje gawno in /proc in addition() 4. Really simple to implement/document understand. 5. Be the same way as Solaris does similiar things. ... Damn: I will let my chess club alone toady and will just code it down NOW. Spec: 1. Processes with a UID < 100 are immune to OOM killers. 2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to take on. 3. Processes with a UID >= 500 are easy targets. Let me introduce a new terminology in full analogy to "fire walls" routers and therabouts: Processes of category 1. are called captains (oficerzy) Processes of category 2. are called corporals (porucznicy) Processes of category 2. are called privates (¿o³nierze) ;-) ^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH] OOM handling 2001-03-23 18:58 ` Martin Dalecki @ 2001-03-25 13:54 ` Martin Dalecki 2001-03-25 15:06 ` Rik van Riel ` (2 more replies) 0 siblings, 3 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-25 13:54 UTC (permalink / raw) To: Alan Cox, James A. Sutherland, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel [-- Attachment #1: Type: text/plain, Size: 2087 bytes --] Martin Dalecki wrote: > > I have a constructive proposal: > > It would make much sense to make the oom killer > leave not just root processes alone but processes belonging to a UID > lower > then a certain value as well (500). This would be: > > 1. Easly managable by the admin. Just let oracle/www and analogous users > have a UID lower then let's say 500. > > 2. In full compliance with the port trick done by TCP/IP (ports < 1024 > vers other) > > 3. It wouldn't need any addition of new interface (no jebanoje gawno in > /proc in addition() > > 4. Really simple to implement/document understand. > > 5. Be the same way as Solaris does similiar things. > > ... > > Damn: I will let my chess club alone toady and will just code it down > NOW. > > Spec: > > 1. Processes with a UID < 100 are immune to OOM killers. > 2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to > take on. > 3. Processes with a UID >= 500 are easy targets. > > Let me introduce a new terminology in full analogy to "fire walls" > routers and therabouts: > > Processes of category 1. are called captains (oficerzy) > Processes of category 2. are called corporals (porucznicy) > Processes of category 2. are called privates (¿o³nierze) OK I just did it. as I already told I have "stress tested it" by installing the Orcale insternet application server suide on a hoplessly underequipped box ("only" 128MByte RMA). The assorted patch is attached. Since I'm one day late up to my promise to provide this patch it's actually fascinating that already 4 people (in esp. not newbees requesting a new /proc entry for everything) for reassurance that I will indeed implement it... Well this kind of "high" and "eager" feadback seems for me to indicate that there is very serious desire for it. And then of course I just have to ask our people working with DB's here at work as well :-). Ah... and of course I think this patch can already go directly into the official kernel. The quality of code should permit it. I would esp. request Rik van Riel to have a closer look at it... [-- Attachment #2: oom.diff --] [-- Type: text/plain, Size: 11110 bytes --] diff -urN linux/mm/oom_kill.c linux-new/mm/oom_kill.c --- linux/mm/oom_kill.c Tue Nov 14 19:56:46 2000 +++ linux-new/mm/oom_kill.c Sun Mar 25 17:17:34 2001 @@ -1,18 +1,64 @@ /* * linux/mm/oom_kill.c - * + * * Copyright (C) 1998,2000 Rik van Riel * Thanks go out to Claus Fischer for some serious inspiration and * for goading me into coding this file... * - * The routines in this file are used to kill a process when - * we're seriously out of memory. This gets called from kswapd() - * in linux/mm/vmscan.c when we really run out of memory. - * - * Since we won't call these routines often (on a well-configured - * machine) this file will double as a 'coding guide' and a signpost - * for newbie kernel hackers. It features several pointers to major - * kernel subsystems and hints as to where to find out what things do. + * Sat Mar 24 22:07:15 CET 2001 Marcin Dalecki <dalecki@evision-ventures.com>: + * + * Replaced the original algorith with something reasonably, predictable + * and managable. I will call this "Stalins Eviction". + */ + +/* + * The routines in this file are used to kill a process when the system is + * entierly out of memmory (both: RAM and swap). This gets called from + * kswapd() in linux/mm/vmscan.c when we are in total starvation due to the + * fact, that the only thing the system is busy at, is to try to allocate some + * physical memmory page, where there are no pages anymore left. In such it + * does make perfect sense to kill some offending process, just to make the + * system go on and survive. + * + * IT IS A LAST RESORT! + * + * ALLERT: In contrast to popular beleve the invention of the mechanism + * presented here IS IMPORTANT for system security reasons. It is preventing + * one border corner of an easy DNS attack in case the sysadmin didn't take + * other measures, which he either overworked or incompetent as he is usually + * doesn't. + * + * Basically the eviction goes on as follows: + * + * 1. Normal interactive user processes are the first candidates for a shoot. + * We consider all users with a UID >= 500 as normal interactive users. + * + * 2. If there are no processes started by a normal interactive user, we aim + * at the processes from nonessential processes (for the "live" of the system + * as a whole). We consider users with a UID >= 100 and < 500 as essential + * service user. + * + * 3. If this still isn't the case we start to shut down the system components + * peace by peace... (UID < 100). + * + * In fact the heuristics used to determine, at which of the process classes + * to aim first, are a bit more sophisticated, If you wan't those details + * please read the code below. It does (hopefully so) speak for itself. + * + * As an example: If you are running a big Linux box, which is mainly deployed + * as an oracle server, but where normal interactive human users can log on as + * well, then you should run oracle server with a UID < 500 and >= 100. Then + * dumb ass loosers starting 100 netscape and 500 emacs sessions, won't be + * able anylonger to kill the essential oracle service. + * + * The introduction of this additional UID semantics shouldn't affect any + * present systems. (Read: It won't make anything worser in comparision to + * previous versions of the Linux kernel.) However every single distributor of + * "enterprise grade" applications for Linux SHOULD take a note on this. + * + * regards: + * + * Marcin Dalecki */ #include <linux/mm.h> @@ -23,125 +69,141 @@ /* #define DEBUG */ -/** - * int_sqrt - oom_kill.c internal function, rough approximation to sqrt - * @x: integer of which to calculate the sqrt - * - * A very rough approximation to the sqrt() function. - */ -static unsigned int int_sqrt(unsigned int x) -{ - unsigned int out = x; - while (x & ~(unsigned int)1) x >>=2, out >>=1; - if (x) out -= out >> 2; - return (out ? out : 1); -} - -/** - * oom_badness - calculate a numeric value for how bad this task has been - * @p: task struct of which task we should calculate - * - * The formula used is relatively simple and documented inline in the - * function. The main rationale is that we want to select a good task - * to kill when we run out of memory. - * - * Good in this context means that: - * 1) we lose the minimum amount of work done - * 2) we recover a large amount of memory - * 3) we don't kill anything innocent of eating tons of memory - * 4) we want to kill the minimum amount of processes (one) - * 5) we try to kill the process the user expects us to kill, this - * algorithm has been meticulously tuned to meet the priniciple - * of least surprise ... (be careful when you change it) - */ +#define CPU_FACTOR 32 +#define AGE_FACTOR 256 -static int badness(struct task_struct *p) +enum uid_class { + normal, + service, + system, + immune +}; + +static int determine_uid_class(struct task_struct *p) { - int points, cpu_time, run_time; + int uid; + int uid_class = system; - if (!p->mm) - return 0; - /* - * The memory size of the process is the basis for the badness. + /* This makes processes started by for example suexec be better killing + * candidates then root's processes themself. */ - points = p->mm->total_vm; + uid = p->uid; + if (p->euid > p->uid) + uid = p->euid; - /* - * CPU time is in seconds and run time is in minutes. There is no - * particular reason for this other than that it turned out to work - * very well in practice. This is not safe against jiffie wraps - * but we don't care _that_ much... + /* This is implementing the intendid semantics of different user id + * value ranges. */ - cpu_time = (p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3); - run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10); + if (uid < 100) + uid_class = system; + else if (uid < 500) + uid_class = service; + else + uid_class = normal; - points /= int_sqrt(cpu_time); - points /= int_sqrt(int_sqrt(run_time)); - - /* - * Niced processes are most likely less important, so double - * their badness points. - */ - if (p->nice > 0) - points *= 2; - /* - * Superuser processes are usually more important, so we make it + /* Superuser processes are usually more important, so we make it * less likely that we kill those. */ - if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN) || - p->uid == 0 || p->euid == 0) - points /= 4; + if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_ADMIN)) + uid_class = system; - /* - * We don't want to kill a process with direct hardware access. + /* We don't want to kill a process with direct hardware access. * Not only could that mess up the hardware, but usually users * tend to only have this flag set on applications they think * of as important. */ if (cap_t(p->cap_effective) & CAP_TO_MASK(CAP_SYS_RAWIO)) - points /= 4; -#ifdef DEBUG - printk(KERN_DEBUG "OOMkill: task %d (%s) got %d points\n", - p->pid, p->comm, points); -#endif - return points; + uid_class = system; + + return uid_class; +} + +static int calculate_penalty(struct task_struct *p) +{ + int cpu_penalty = 0; + int age_penalty = 0; + + + /* Now we calculate the penalty due to the cpu usage. NOTE: This is + * not safe against jiffie wraps. + */ + { + int run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10); + + if (run_time > 0) { + cpu_penalty = (CPU_FACTOR * run_time) / + ((p->times.tms_utime + p->times.tms_stime) >> (SHIFT_HZ + 3) + run_time); + } else + cpu_penalty = CPU_FACTOR; + } + + /* Let's make older processes more important then newer ones. + * This is not safe against jiffie wraps, delibrately so. + */ + if (p->start_time > 0) + age_penalty = AGE_FACTOR * p->start_time / jiffies; + else + age_penalty = 0; + + /* OK this should be sufficient, we don't want to make things more + * complicated then needed. In esp. since there is no easy and portable + * way to determine the total amount of memmory pages present, we don't + * take this into account here. + * + * Let us worry about more detailed heuristics here, only if there will + * be still many people reporting serious problems on linux-kernel. + */ + + return cpu_penalty + age_penalty; } /* - * Simple selection loop. We chose the process with the highest - * number of 'points'. We need the locks to make sure that the - * list of task structs doesn't change while we look the other way. - * - * (not docbooked, we don't want this one cluttering up the manual) + * Simple selection loop. We chose the process with the highest penalty. */ -static struct task_struct * select_bad_process(void) +static struct task_struct * select_process(void) { - int maxpoints = 0; - struct task_struct *p = NULL; - struct task_struct *chosen = NULL; - - read_lock(&tasklist_lock); - for_each_task(p) { - if (p->pid) { - int points = badness(p); - if (points > maxpoints) { - chosen = p; - maxpoints = points; + enum uid_class i; + struct task_struct *choice = NULL; + + for (i = normal; i != immune; ++i) { + int maxpenalty = 0; + struct task_struct *p = NULL; + + /* The locks make sure that the list of task structs doesn't + * change while we look at it. + */ + + read_lock(&tasklist_lock); + for_each_task(p) { + if (!p->mm) + continue; + + if (i != determine_uid_class(p)) + continue; + + if (p->pid) { + int penalty = calculate_penalty(p); + + if (penalty > maxpenalty) { + choice = p; + maxpenalty = penalty; + } } } + read_unlock(&tasklist_lock); + + if (choice != NULL) + break; } - read_unlock(&tasklist_lock); - return chosen; + + return choice; } -/** - * oom_kill - kill the "best" process when we run out of memory - * +/* * If we run out of memory, we have the choice between either * killing a random task (bad), letting the system crash (worse) - * OR try to be smart about which process to kill. Note that we - * don't have to be perfect here, we just have to be good. + * OR try to be smart about which process to kill. * * We must be careful though to never send SIGKILL a process with * CAP_SYS_RAW_IO set, send SIGTERM instead (but it's unlikely that @@ -149,14 +211,12 @@ */ void oom_kill(void) { + struct task_struct *p = select_process(); - struct task_struct *p = select_bad_process(); - - /* Found nothing?!?! Either we hang forever, or we panic. */ if (p == NULL) panic("Out of memory and no killable processes...\n"); - printk(KERN_ERR "Out of Memory: Killed process %d (%s).\n", p->pid, p->comm); + printk(KERN_ERR "Out of memory: killed process %d (%s).\n", p->pid, p->comm); /* * We give our sacrificial lamb high priority and access to @@ -180,14 +240,14 @@ */ current->policy |= SCHED_YIELD; schedule(); + return; } -/** - * out_of_memory - is the system out of memory? +/** out_of_memory - is the system out of memory? * - * Returns 0 if there is still enough memory left, - * 1 when we are out of memory (otherwise). + * Returns 0 if there is still enough memory left, 1 when we are out of memory + * (otherwise). */ int out_of_memory(void) { ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 13:54 ` [PATCH] OOM handling Martin Dalecki @ 2001-03-25 15:06 ` Rik van Riel 2001-03-25 15:20 ` Martin Dalecki 2001-03-25 15:44 ` Jonathan Morton 2001-03-26 2:13 ` Matthew Chappee 2 siblings, 1 reply; 106+ messages in thread From: Rik van Riel @ 2001-03-25 15:06 UTC (permalink / raw) To: Martin Dalecki Cc: Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Sun, 25 Mar 2001, Martin Dalecki wrote: > Ah... and of course I think this patch can already go directly > into the official kernel. The quality of code should permit > it. I would esp. request Rik van Riel to have a closer look > at it... - the algorithms are just as much black magic as the old ones - it hasn't been tested in any other workload than your Oracle server (at least, not that I've heard of) - the comments are just too rude ;) (though fun) - the AGE_FACTOR calculation will overflow after the system has an uptime of just _3_ days - your code might be good for server loads, but for normal users it will kill what amounts to a random process ... this is horribly wrong for desktop systems In short, I like some of your ideas, but I really fail to see why this version of the code would be any better than what we're having now. In fact, since there seem to be about 1000x more desktop boxes than Oracle boxes (probably even more), I'd say that the current algorithm in the kernel is better (since it's right for more systems). Now if you can make something which preserves the heuristics which serve us so well on desktop boxes and add something that makes it also work on your Oracle servers, then I'd be interested. Alternatively, I also wouldn't mind a completely new algorithm, as long as it turns out to work well on desktop boxes too. But remember that we cannot tell this without first testing the thing on a few dozen (hundreds?) of machines with different workloads... regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 15:06 ` Rik van Riel @ 2001-03-25 15:20 ` Martin Dalecki 2001-03-25 15:50 ` Jeff Garzik 2001-03-25 17:08 ` Rik van Riel 0 siblings, 2 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-25 15:20 UTC (permalink / raw) To: Rik van Riel Cc: Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel Rik van Riel wrote: > > On Sun, 25 Mar 2001, Martin Dalecki wrote: > > > Ah... and of course I think this patch can already go directly > > into the official kernel. The quality of code should permit > > it. I would esp. request Rik van Riel to have a closer look > > at it... > > - the algorithms are just as much black magic as the old ones > - it hasn't been tested in any other workload than your Oracle > server (at least, not that I've heard of) No that's not true! Read the code please. The result is a simple wighted sum without artificial unit. > - the comments are just too rude ;) > (though fun) That's only a matter for the "smooth" anglosaxons. Different cultures have different measures on this. I don't feel the need to adjust myself to the american cultural obstructivity. I esp. to the habit of don't saying clearly what one means if one want's to criticize something. > - the AGE_FACTOR calculation will overflow after the system has > an uptime of just _3_ days > - your code might be good for server loads, but for normal > users it will kill what amounts to a random process ... this > is horribly wrong for desktop systems No that isn't true. I esp. the behaviour will be predictable. > In short, I like some of your ideas, but I really fail to see why > this version of the code would be any better than what we're having > now. In fact, since there seem to be about 1000x more desktop boxes > than Oracle boxes (probably even more), I'd say that the current > algorithm in the kernel is better (since it's right for more systems). You misunderstood me compleatly. I wasn't using an running oracle db as a test case. I was using the INSTALLATION process. Since you apparently don't know about oracle I will tell you: It involves a lot of different applications. Infact TONS of: Java, shell, compiler, linker, apache, perl and whatanot. > Now if you can make something which preserves the heuristics which > serve us so well on desktop boxes and add something that makes it > also work on your Oracle servers, then I'd be interested. I would like to state: The current heuristics DON'T serve us well on desktop boxes... > Alternatively, I also wouldn't mind a completely new algorithm, as > long as it turns out to work well on desktop boxes too. But remember I was testing on a NOTEBOOK. > that we cannot tell this without first testing the thing on a few > dozen (hundreds?) of machines with different workloads... That's true for sure. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 15:20 ` Martin Dalecki @ 2001-03-25 15:50 ` Jeff Garzik 2001-03-25 17:08 ` Rik van Riel 1 sibling, 0 replies; 106+ messages in thread From: Jeff Garzik @ 2001-03-25 15:50 UTC (permalink / raw) To: Martin Dalecki; +Cc: Rik van Riel, linux-kernel Martin Dalecki wrote: > Rik van Riel wrote: > > - the comments are just too rude ;) > > (though fun) > > That's only a matter for the "smooth" anglosaxons. Different > cultures have different measures on this. I don't feel the need > to adjust myself to the american cultural obstructivity. > I esp. to the habit of don't saying clearly what one means if one > want's to criticize something. Rik should know that lkml and the kernel sources are in no way politically correct... Fuck 'em... :) Jeff -- Jeff Garzik | May you have warm words on a cold evening, Building 1024 | a full moon on a dark night, MandrakeSoft | and a smooth road all the way to your door. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 15:20 ` Martin Dalecki 2001-03-25 15:50 ` Jeff Garzik @ 2001-03-25 17:08 ` Rik van Riel 1 sibling, 0 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-25 17:08 UTC (permalink / raw) To: Martin Dalecki Cc: Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel On Sun, 25 Mar 2001, Martin Dalecki wrote: > Rik van Riel wrote: > > - the AGE_FACTOR calculation will overflow after the system has > > an uptime of just _3_ days > > I esp. the behaviour will be predictable. Ummmm ? Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 13:54 ` [PATCH] OOM handling Martin Dalecki 2001-03-25 15:06 ` Rik van Riel @ 2001-03-25 15:44 ` Jonathan Morton 2001-03-25 15:47 ` Martin Dalecki 2001-03-25 16:36 ` Jonathan Morton 2001-03-26 2:13 ` Matthew Chappee 2 siblings, 2 replies; 106+ messages in thread From: Jonathan Morton @ 2001-03-25 15:44 UTC (permalink / raw) To: Rik van Riel, Martin Dalecki Cc: Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel >- the AGE_FACTOR calculation will overflow after the system has > an uptime of just _3_ days Tsk tsk tsk... >Now if you can make something which preserves the heuristics which >serve us so well on desktop boxes and add something that makes it >also work on your Oracle servers, then I'd be interested. What do people think of my "adjustments" to the existing algorithm? Mostly it gives extra longevity to low-UID and long-running processes, which to my mind makes sense for both server and desktop boxen. Taking for example an 80Mb process under my adjustments, it is reduced to under the badness of a new shell process after less than a week's uptime (compared to several months), especially if it is run as low-UID. Small, short-lived interactive processes still don't get *too* adversely affected, but a memory hog with only a few hours' uptime will still get killed with high probability (pretty much what we want). I didn't quite understand Martin's comments about "not normalised" - presumably this is some mathematical argument, but what does this actually mean? -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 15:44 ` Jonathan Morton @ 2001-03-25 15:47 ` Martin Dalecki 2001-03-25 16:36 ` Jonathan Morton 1 sibling, 0 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-25 15:47 UTC (permalink / raw) To: Jonathan Morton Cc: Rik van Riel, Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel Jonathan Morton wrote: > > >- the AGE_FACTOR calculation will overflow after the system has > > an uptime of just _3_ days > > Tsk tsk tsk... > > >Now if you can make something which preserves the heuristics which > >serve us so well on desktop boxes and add something that makes it > >also work on your Oracle servers, then I'd be interested. > > What do people think of my "adjustments" to the existing algorithm? Mostly > it gives extra longevity to low-UID and long-running processes, which to my > mind makes sense for both server and desktop boxen. > > Taking for example an 80Mb process under my adjustments, it is reduced to > under the badness of a new shell process after less than a week's uptime > (compared to several months), especially if it is run as low-UID. Small, > short-lived interactive processes still don't get *too* adversely affected, > but a memory hog with only a few hours' uptime will still get killed with > high probability (pretty much what we want). > > I didn't quite understand Martin's comments about "not normalised" - > presumably this is some mathematical argument, but what does this actually > mean? Not mathematics. It's from physics. Very trivial physics, basic scool indeed. If you try to calculate some weightning factors which involve different units (in this case mostly seconds and bits) then you will have to make sure tha those units get factorized out. Rik is just throwing the absolute values together... Trivial example: "How long does it take to travel from A to B?" "It takes about 1000sec." "How long does it take to travel from C to D?" "It takes about 100sec." "Ah, so it's 10 times longer from A to B then from C to D". Write it down - you just divide the seconds out. In case of varying intervalls you have to normalize measures by max/min values. Since for example the amount of RAM in a box can vary as well. Otherwise your algorithms will behave very differently on boxes with low RAM in comparision to boxes with huge amounts of it. That's what one says if he talks about an algorithm "scalling well". ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 15:44 ` Jonathan Morton 2001-03-25 15:47 ` Martin Dalecki @ 2001-03-25 16:36 ` Jonathan Morton 2001-03-26 21:34 ` Kevin Buhr 2001-03-26 22:00 ` Jonathan Morton 1 sibling, 2 replies; 106+ messages in thread From: Jonathan Morton @ 2001-03-25 16:36 UTC (permalink / raw) To: Martin Dalecki Cc: Rik van Riel, Alan Cox, James A. Sutherland, Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel >> I didn't quite understand Martin's comments about "not normalised" - >> presumably this is some mathematical argument, but what does this actually >> mean? > >Not mathematics. It's from physics. Very trivial physics, basic scool >indeed. >If you try to calculate some weightning >factors which involve different units (in this case mostly seconds and >bits) >then you will have to make sure tha those units get factorized out. >Rik is just throwing the absolute values together... Understood - my Physics courses covered this as well, but not using the word "normalise". -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 16:36 ` Jonathan Morton @ 2001-03-26 21:34 ` Kevin Buhr 2001-03-26 22:00 ` Jonathan Morton 1 sibling, 0 replies; 106+ messages in thread From: Kevin Buhr @ 2001-03-26 21:34 UTC (permalink / raw) To: Jonathan Morton; +Cc: Martin Dalecki, Rik van Riel, linux-mm, linux-kernel Jonathan Morton <chromi@cyberspace.org> writes: > > Understood - my Physics courses covered this as well, but not using the > word "normalise". Be that as it may, Martin's comments about normalizing are nonsense. Rik's killer (at least in 2.4.3-pre7) produces a badness value that's a product of badness factors of various units. It then uses these products only for relative comparisons, choosing the process with maximum badness product to kill. No normalization is necessary, nor would it have any effect. The reason a 256 Meg process on a 1 Gig machine was being killed had nothing to do with normalization---it was a bug where the OOM killer was being called long before we were reduced to last resorts. Kevin <buhr@stat.wisc.edu> ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 16:36 ` Jonathan Morton 2001-03-26 21:34 ` Kevin Buhr @ 2001-03-26 22:00 ` Jonathan Morton 1 sibling, 0 replies; 106+ messages in thread From: Jonathan Morton @ 2001-03-26 22:00 UTC (permalink / raw) To: Kevin Buhr; +Cc: Martin Dalecki, Rik van Riel, linux-mm, linux-kernel >> Understood - my Physics courses covered this as well, but not using the >> word "normalise". > >Be that as it may, Martin's comments about normalizing are nonsense. >Rik's killer (at least in 2.4.3-pre7) produces a badness value that's >a product of badness factors of various units. It then uses these >products only for relative comparisons, choosing the process with >maximum badness product to kill. No normalization is necessary, nor >would it have any effect. > >The reason a 256 Meg process on a 1 Gig machine was being killed had >nothing to do with normalization---it was a bug where the OOM killer >was being called long before we were reduced to last resorts. Of course, I realised that. Actually, what the code does is take an initial badness factor (the memory usage), then divide it using goodness factors (some based on time, some purely arbitrary), both of which can be considered dimensionless. Also, at the end, the absolute value is not considered - we simply look at the biggest one and kill it. All "denormalisation" does is scale all the values, it doesn't affect which one actually turns out biggest. -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-25 13:54 ` [PATCH] OOM handling Martin Dalecki 2001-03-25 15:06 ` Rik van Riel 2001-03-25 15:44 ` Jonathan Morton @ 2001-03-26 2:13 ` Matthew Chappee 2001-03-26 11:33 ` Ingo Oeser 2001-03-26 16:11 ` Michael Peddemors 2 siblings, 2 replies; 106+ messages in thread From: Matthew Chappee @ 2001-03-26 2:13 UTC (permalink / raw) To: dalecki; +Cc: linux-kernel > OK I just did it. as I already told I have "stress tested it" by > Since I'm one day late up to my promise to provide this > patch it's actually fascinating that already 4 people (in esp. not > newbees requesting a new /proc entry for everything) > for reassurance that I will indeed implement it... Well > this kind of "high" and "eager" feadback seems for me to indicate that > there is very serious desire for it. And then of course I > just have to ask our people working with DB's here at work as well :-). I'm one of the four that contacted you. :-) I'm certainly not a newbie and it appears that you nailed the reason that I'm interested. I'm an Oracle DBA that runs a fairly large database(s) on Linux. A patch like this is important. Case in point: We do not have loads of money, so we have to double up our servers. A database server can also be an app server, or a web server, etc. Now, let's say that Joe Surfer has 10 netscape sessions open on my database server (hey, talk to my boss, it's not my fault). He's grabbing Pr0n/MP3s/whatever as fast as our 'T' will allow. One of the websites that he visits has some nasty java that bloats his browser to the point of OOM. Something has to die in order for the machine to stay alive. Remember the 100 sided die from D&D? Roll it and kill -9? Hopefully not, I should be able to tell the OOM_Killer to wipe out this user's stuff first, based on the prowess of his UID. The point being, my database shouldn't be selected for termination. Nobody ever got fired for kill -9'ing netscape, but Oracle is a different story. I urge you, consider the patch. > Ah... and of course I think this patch can already go directly > into the official kernel. The quality of code should permit > it. I would esp. request Rik van Riel to have a closer look > at it... Whoa, easy there trigger. I'd rather have a wacked out OOM_Killer than a barely-tested alternative. Matthew ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-26 2:13 ` Matthew Chappee @ 2001-03-26 11:33 ` Ingo Oeser 2001-03-26 11:49 ` Jasper Spaans 2001-03-26 16:11 ` Michael Peddemors 1 sibling, 1 reply; 106+ messages in thread From: Ingo Oeser @ 2001-03-26 11:33 UTC (permalink / raw) To: Matthew Chappee; +Cc: dalecki, linux-kernel On Sun, Mar 25, 2001 at 09:13:20PM -0500, Matthew Chappee wrote: > The point being, my database shouldn't be selected for > termination. Nobody ever got fired for kill -9'ing netscape, > but Oracle is a different story. I urge you, consider the > patch. No, you got fired for not setting ulimits. Your boss is right then! ulimit -d 65536 ulimit -v 81920 and my netscape is very happy most of the time. And my system is not disturbed. 64MB RAM + 256MB swap. In a school I had the same setup on a 256MB server (256MB swap) serving apps (StarOffice and Netscape) to ~16 X clients. I never had OOM there. I think this is the amount of memory an oracle server at least have to have, right? What are your ulimits? What are your amounts of RAM+SWAP? Regards Ingo Oeser -- 10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag> <<<<<<<<<<<< been there and had much fun >>>>>>>>>>>> ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-26 11:33 ` Ingo Oeser @ 2001-03-26 11:49 ` Jasper Spaans 0 siblings, 0 replies; 106+ messages in thread From: Jasper Spaans @ 2001-03-26 11:49 UTC (permalink / raw) To: Ingo Oeser; +Cc: Matthew Chappee, dalecki, linux-kernel On Mon, Mar 26, 2001 at 01:33:05PM +0200, Ingo Oeser wrote: > > The point being, my database shouldn't be selected for > > termination. Nobody ever got fired for kill -9'ing netscape, > > but Oracle is a different story. I urge you, consider the > > patch. > > No, you got fired for not setting ulimits. Your boss is right > then! > > ulimit -d 65536 > ulimit -v 81920 Ehm, right. Running netscape (or any other memory hog which doesn't belong on a server) on a production server seems reason enough for a little talk with your boss. On the other hand, if no other apps are running on your box, and Oracle gets killed due to OOM, you probably have underestimated your hardware needs, or Oracle has gone haywire, which is a good reason for killing it. Thus, nothing seems wrong with the current kill algorithm to me... Just my two cents, -- Q_. Jasper Spaans <j@sp3r.net> `~\ http://jsp.ds9a.nl/ Mr /\ Tel/Fax: +31-20-8749842 Zap Move all .sig for great justice! ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-26 2:13 ` Matthew Chappee 2001-03-26 11:33 ` Ingo Oeser @ 2001-03-26 16:11 ` Michael Peddemors 1 sibling, 0 replies; 106+ messages in thread From: Michael Peddemors @ 2001-03-26 16:11 UTC (permalink / raw) To: matthew; +Cc: linux-kernel Uh... and aside from init, mission critical stuff... crond should never get killed, it runs mission critical cleanup tasks.. If crond dies, might as well make the machine die in a lot of cases.. I hate to miss my nightly database exports... Getting to look more and more like we need some way to configure certain tasks at the admin level to never die.. -- "Catch the Magic of Linux..." -------------------------------------------------------- Michael Peddemors - Senior Consultant LinuxAdministration - Internet Services NetworkServices - Programming - Security WizardInternet Services http://www.wizard.ca Linux Support Specialist - http://www.linuxmagic.com -------------------------------------------------------- (604)589-0037 Beautiful British Columbia, Canada ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:32 ` Alan Cox 2001-03-23 18:58 ` Martin Dalecki @ 2001-03-23 19:45 ` Jonathan Morton 2001-03-23 23:26 ` Eric W. Biederman 2001-03-25 15:30 ` Martin Dalecki ` (2 subsequent siblings) 4 siblings, 1 reply; 106+ messages in thread From: Jonathan Morton @ 2001-03-23 19:45 UTC (permalink / raw) To: Martin Dalecki, Alan Cox Cc: James A. Sutherland, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel >It would make much sense to make the oom killer >leave not just root processes alone but processes belonging to a UID >lower >then a certain value as well (500). This would be: > >1. Easly managable by the admin. Just let oracle/www and analogous users > have a UID lower then let's say 500. That sounds vaguely sensible. However, make it a "much less likely" rather than an "impossible", otherwise we end up with an unkillable runaway root process killing everything else in userland. I'm still in favour of a failing malloc(), and I'm currently reading a bit of source and docs to figure out where this should be done and why it isn't done now. So far I've found the overcommit_memory flag, which looks kinda promising. >1. Processes with a UID < 100 are immune to OOM killers. >2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to >take on. >3. Processes with a UID >= 500 are easy targets. As I said above, "immune" can be dangerous. "Extremely hard" would be better terminology and behaviour. It also helps that the current weighting in badness() appears to leave getty processes alone, since they don't consume much and normally have long uptimes - also I believe init would try to restart them anyway. -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 19:45 ` [PATCH] Prevent OOM from killing init Jonathan Morton @ 2001-03-23 23:26 ` Eric W. Biederman 0 siblings, 0 replies; 106+ messages in thread From: Eric W. Biederman @ 2001-03-23 23:26 UTC (permalink / raw) To: Jonathan Morton Cc: Martin Dalecki, Alan Cox, James A. Sutherland, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Jonathan Morton <chromi@cyberspace.org> writes: > >It would make much sense to make the oom killer > >leave not just root processes alone but processes belonging to a UID > >lower > >then a certain value as well (500). This would be: > > > >1. Easly managable by the admin. Just let oracle/www and analogous users > > have a UID lower then let's say 500. > > That sounds vaguely sensible. However, make it a "much less likely" rather > than an "impossible", otherwise we end up with an unkillable runaway root > process killing everything else in userland. > > I'm still in favour of a failing malloc(), and I'm currently reading a bit > of source and docs to figure out where this should be done and why it isn't > done now. So far I've found the overcommit_memory flag, which looks kinda > promising. Lookup mlock & mlock_all they will handle the single process case. Of course if you OOM you still have problems but that should make them much harder to trigger. Eric ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:32 ` Alan Cox 2001-03-23 18:58 ` Martin Dalecki 2001-03-23 19:45 ` [PATCH] Prevent OOM from killing init Jonathan Morton @ 2001-03-25 15:30 ` Martin Dalecki 2001-03-25 20:47 ` Stephen Satchell 2001-03-25 21:51 ` [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) Jonathan Morton 4 siblings, 0 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-25 15:30 UTC (permalink / raw) To: Alan Cox Cc: James A. Sutherland, Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel Alan Cox wrote: > > > That depends what you mean by "must not". If it's your missile guidance > > system, aircraft autopilot or life support system, the system must not run > > out of memory in the first place. If the system breaks down badly, killing > > init and thus panicking (hence rebooting, if the system is set up that > > way) seems the best approach. > > Ultra reliable systems dont contain memory allocators. There are good reasons > for this but the design trade offs are rather hard to make in a real world > environment I esp. they run on CPU's without a stack or what? ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:32 ` Alan Cox ` (2 preceding siblings ...) 2001-03-25 15:30 ` Martin Dalecki @ 2001-03-25 20:47 ` Stephen Satchell 2001-03-25 21:51 ` [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) Jonathan Morton 4 siblings, 0 replies; 106+ messages in thread From: Stephen Satchell @ 2001-03-25 20:47 UTC (permalink / raw) To: linux-mm, linux-kernel At 05:30 PM 3/25/01 +0200, you wrote: > > Ultra reliable systems dont contain memory allocators. There are good > reasons > > for this but the design trade offs are rather hard to make in a real world > > environment > >I esp. they run on CPU's without a stack or what? No dynamic memory allocation AT ALL. That includes the prohibition of a stack. I've seen avionics-loop systems that abstract a stack but the "allocators" are part of the application and are designed to fall over gracefully when they become full -- but getting this past a project manager is hard, as it should be. Then there are those systems with rather interesting watchdog timers. If you don't tickle them just right, they fire and force a restart. The nastiest of these required that you send four specific values to a specific I/O port, and the hardware looked to see if the values violated certain timing guidelines. If you sent the code too early or too late, or if the value in the sequence was incorrect, BAM. The hardware was designed by a guy with some rather interesting experiences with software "engineers" dealing with watchdog timers... Satch ^ permalink raw reply [flat|nested] 106+ messages in thread
* [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) 2001-03-23 17:32 ` Alan Cox ` (3 preceding siblings ...) 2001-03-25 20:47 ` Stephen Satchell @ 2001-03-25 21:51 ` Jonathan Morton 2001-03-27 15:23 ` Pavel Machek 4 siblings, 1 reply; 106+ messages in thread From: Jonathan Morton @ 2001-03-25 21:51 UTC (permalink / raw) To: linux-mm, linux-kernel; +Cc: douglas [-- Attachment #1: Type: text/plain, Size: 1157 bytes --] The attached patch is against 2.4.1 and incorporates the following: - More optimistic OOM checking, and slightly improved OOM-kill algorithm, as per my previous patch. - Accounting of reserved memory, allowing for... - Non-overcommittal of memory if sysctl_overcommit_memory < 0, enforced even for root if < -1 (as per old non-overcommittal patch for 2.3.99, but fixed). - Behaviour when sysctl_overcommit_memory == 0 or 1 is as per my original patch (eg. soft-limited overcommittal and full overcommittal respectively). Defaults to -1. - If a process is larger than 4 times the free VM on the system, it is not allowed to allocate or reserve any more, unless overcomittal is allowed as above. Note that root may have less privilege to hog memory when sysctl_overcommit_memory == 0 than when == -1. - As part of the above, a new function vm_invalidate_totalmem() is available which should be called whenever the total amount of VM changes - at the moment this is done in sys_swap{on,off}(). This is to avoid having to recalculate the amount of available memory and swap whenever an allocation is needed. If someone knows a better way, let me know. [-- Attachment #2: oom-patch.2.diff --] [-- Type: text/plain, Size: 16943 bytes --] diff -ur -x via-rhine* linux-2.4.1.orig/fs/exec.c linux/fs/exec.c --- linux-2.4.1.orig/fs/exec.c Tue Jan 30 07:10:58 2001 +++ linux/fs/exec.c Sun Mar 25 17:05:03 2001 @@ -385,19 +385,27 @@ static int exec_mmap(void) { struct mm_struct * mm, * old_mm; + struct task_struct * tsk = current; + unsigned long reserved = 0; - old_mm = current->mm; + old_mm = tsk->mm; if (old_mm && atomic_read(&old_mm->mm_users) == 1) { + /* Keep old stack reservation */ mm_release(); exit_mmap(old_mm); return 0; } + reserved = vm_enough_memory(tsk->rlim[RLIMIT_STACK].rlim_cur >> + PAGE_SHIFT); + if(!reserved) + return -ENOMEM; + mm = mm_alloc(); if (mm) { - struct mm_struct *active_mm = current->active_mm; + struct mm_struct *active_mm = tsk->active_mm; - if (init_new_context(current, mm)) { + if (init_new_context(tsk, mm)) { mmdrop(mm); return -ENOMEM; } @@ -422,6 +430,8 @@ mmdrop(active_mm); return 0; } + + vm_release_memory(reserved); return -ENOMEM; } diff -ur -x via-rhine* linux-2.4.1.orig/fs/proc/proc_misc.c linux/fs/proc/proc_misc.c --- linux-2.4.1.orig/fs/proc/proc_misc.c Tue Nov 7 19:08:09 2000 +++ linux/fs/proc/proc_misc.c Sun Mar 25 16:57:07 2001 @@ -175,7 +175,9 @@ "LowTotal: %8lu kB\n" "LowFree: %8lu kB\n" "SwapTotal: %8lu kB\n" - "SwapFree: %8lu kB\n", + "SwapFree: %8lu kB\n" + "VMTotal: %8lu kB\n" + "VMReserved:%8lu kB\n", K(i.totalram), K(i.freeram), K(i.sharedram), @@ -190,7 +192,9 @@ K(i.totalram-i.totalhigh), K(i.freeram-i.freehigh), K(i.totalswap), - K(i.freeswap)); + K(i.freeswap), + K(vm_total()), + K(vm_reserved)); return proc_calc_metrics(page, start, off, count, eof, len); #undef B diff -ur -x via-rhine* linux-2.4.1.orig/include/linux/mm.h linux/include/linux/mm.h --- linux-2.4.1.orig/include/linux/mm.h Tue Jan 30 07:24:56 2001 +++ linux/include/linux/mm.h Sun Mar 25 16:57:07 2001 @@ -24,6 +24,13 @@ #include <asm/atomic.h> /* + * These are used to prevent VM overcommit. + */ +extern unsigned long vm_reserved; +extern spinlock_t vm_lock; +extern inline unsigned long vm_total(void); + +/* * Linux kernel virtual memory manager primitives. * The idea being to have a "virtual" mm in the same way * we have a virtual fs - giving a cleaner interface to the @@ -444,6 +451,14 @@ extern unsigned long do_brk(unsigned long, unsigned long); struct zone_t; + +extern long vm_enough_memory(long pages); +extern inline void vm_release_memory(long pages) { + int flags; + spin_lock_irqsave(&vm_lock, flags); + vm_reserved -= pages; + spin_unlock_irqrestore(&vm_lock, flags); +} /* filemap.c */ extern void remove_inode_page(struct page *); extern unsigned long page_unuse(struct page *); diff -ur -x via-rhine* linux-2.4.1.orig/include/linux/sched.h linux/include/linux/sched.h --- linux-2.4.1.orig/include/linux/sched.h Tue Jan 30 07:24:56 2001 +++ linux/include/linux/sched.h Sun Mar 25 16:57:07 2001 @@ -424,9 +424,9 @@ /* * Limit the stack by to some sane default: root can always - * increase this limit if needed.. 8MB seems reasonable. + * increase this limit if needed.. 2MB should be more than enough. */ -#define _STK_LIM (8*1024*1024) +#define _STK_LIM (2*1024*1024) #define DEF_COUNTER (10*HZ/100) /* 100 ms time slice */ #define MAX_COUNTER (20*HZ/100) diff -ur -x via-rhine* linux-2.4.1.orig/kernel/exit.c linux/kernel/exit.c --- linux-2.4.1.orig/kernel/exit.c Thu Jan 4 09:00:35 2001 +++ linux/kernel/exit.c Sun Mar 25 17:29:57 2001 @@ -305,6 +305,11 @@ mm_release(); if (mm) { atomic_inc(&mm->mm_count); + if (atomic_read(&mm->mm_users) == 1) { + /* Only release stack if we're the last one using this mm */ + vm_release_memory(tsk->rlim[RLIMIT_STACK].rlim_cur >> + PAGE_SHIFT); + } if (mm != tsk->active_mm) BUG(); /* more a memory barrier than a real lock */ task_lock(tsk); diff -ur -x via-rhine* linux-2.4.1.orig/kernel/fork.c linux/kernel/fork.c --- linux-2.4.1.orig/kernel/fork.c Mon Jan 22 23:54:06 2001 +++ linux/kernel/fork.c Sun Mar 25 18:23:35 2001 @@ -125,6 +125,7 @@ static inline int dup_mmap(struct mm_struct * mm) { struct vm_area_struct * mpnt, *tmp, **pprev; + unsigned long reserved = 0; int retval; flush_cache_mm(current->mm); @@ -142,6 +143,15 @@ retval = -ENOMEM; if(mpnt->vm_flags & VM_DONTCOPY) continue; + + reserved = 0; + if((mpnt->vm_flags & (VM_GROWSDOWN | VM_WRITE | VM_SHARED)) == VM_WRITE) { + unsigned long npages = mpnt->vm_end - mpnt->vm_start; + reserved = vm_enough_memory(npages >> PAGE_SHIFT); + if(!reserved) + goto fail_nomem; + } + tmp = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); if (!tmp) goto fail_nomem; @@ -280,6 +290,7 @@ static int copy_mm(unsigned long clone_flags, struct task_struct * tsk) { struct mm_struct * mm, *oldmm; + unsigned long reserved; int retval; tsk->min_flt = tsk->maj_flt = 0; @@ -305,6 +316,10 @@ } retval = -ENOMEM; + reserved = vm_enough_memory(tsk->rlim[RLIMIT_STACK].rlim_cur >> PAGE_SHIFT); + if(!reserved) + goto fail_nomem; + mm = allocate_mm(); if (!mm) goto fail_nomem; @@ -349,6 +364,8 @@ free_pt: mmput(mm); fail_nomem: + if (reserved) + vm_release_memory(reserved); return retval; } diff -ur -x via-rhine* linux-2.4.1.orig/kernel/sys.c linux/kernel/sys.c --- linux-2.4.1.orig/kernel/sys.c Mon Oct 16 20:58:51 2000 +++ linux/kernel/sys.c Sun Mar 25 16:57:07 2001 @@ -1060,6 +1060,7 @@ asmlinkage long sys_setrlimit(unsigned int resource, struct rlimit *rlim) { struct rlimit new_rlim, *old_rlim; + struct task_struct *tsk; if (resource >= RLIM_NLIMITS) return -EINVAL; @@ -1067,7 +1068,8 @@ return -EFAULT; if (new_rlim.rlim_cur < 0 || new_rlim.rlim_max < 0) return -EINVAL; - old_rlim = current->rlim + resource; + tsk = current; + old_rlim = tsk->rlim + resource; if (((new_rlim.rlim_cur > old_rlim->rlim_max) || (new_rlim.rlim_max > old_rlim->rlim_max)) && !capable(CAP_SYS_RESOURCE)) @@ -1075,6 +1077,17 @@ if (resource == RLIMIT_NOFILE) { if (new_rlim.rlim_cur > NR_OPEN || new_rlim.rlim_max > NR_OPEN) return -EPERM; + } + /* if PF_VFORK is set we're just borrowing the VM so don't touch it */ + if (resource == RLIMIT_STACK && !(tsk->flags & PF_VFORK)) { + long newpages = + ((long)(new_rlim.rlim_cur - old_rlim->rlim_cur) >> + PAGE_SHIFT); + if (newpages > 0 && !vm_enough_memory(newpages)) + /* We should really return EAGAIN or ENOMEM. */ + return -EPERM; + if (newpages < 0) + vm_release_memory(-newpages); } *old_rlim = new_rlim; return 0; diff -ur -x via-rhine* linux-2.4.1.orig/mm/mmap.c linux/mm/mmap.c --- linux-2.4.1.orig/mm/mmap.c Mon Jan 29 16:10:41 2001 +++ linux/mm/mmap.c Sun Mar 25 21:06:08 2001 @@ -36,12 +36,42 @@ __S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111 }; -int sysctl_overcommit_memory; +int sysctl_overcommit_memory = -1; + +/* Unfortunately these need to be longs so we need a spinlock. */ +unsigned long vm_reserved = 0; +unsigned long totalvm = 0; +spinlock_t vm_lock = SPIN_LOCK_UNLOCKED; + +void vm_invalidate_totalmem(void) +{ + int flags; + + spin_lock_irqsave(&vm_lock, flags); + totalvm = 0; + spin_unlock_irqrestore(&vm_lock, flags); +} + +unsigned long vm_total(void) +{ + int flags; + + spin_lock_irqsave(&vm_lock, flags); + if(!totalvm) { + struct sysinfo i; + si_meminfo(&i); + si_swapinfo(&i); + totalvm = i.totalram + i.totalswap; + } + spin_unlock_irqrestore(&vm_lock, flags); + + return totalvm; +} /* Check that a process has enough memory to allocate a * new virtual mapping. */ -int vm_enough_memory(long pages) +long vm_enough_memory(long pages) { /* Stupid algorithm to decide if we have enough memory: while * simple, it hopefully works in most obvious cases.. Easy to @@ -52,18 +82,44 @@ * (buffers+cache), use the minimum values. Allow an extra 2% * of num_physpages for safety margin. */ + /* From non-overcommit patch: only allow vm_reserved to exceed + * vm_total if we're root. + */ - long free; + int flags; + long free = 0; - /* Sometimes we want to use more memory than we have. */ - if (sysctl_overcommit_memory) - return 1; - - free = atomic_read(&buffermem_pages); - free += atomic_read(&page_cache_size); - free += nr_free_pages(); - free += nr_swap_pages; - return free > pages; + spin_lock_irqsave(&vm_lock, flags); + if(sysctl_overcommit_memory < 0) + free = vm_total() - vm_reserved; + else { + free = atomic_read(&buffermem_pages); + free += atomic_read(&page_cache_size); + free += nr_free_pages(); + free += nr_swap_pages; + } + + /* Attempt to curtail memory allocations before hard OOM occurs. + * Based on current process size, which is hopefully a good and fast heuristic. + * Also fix bug where the real OOM limit of (free == freepages.min) is not taken into account. + * In fact, we use freepages.high as the threshold to make sure there's still room for buffers+cache. + * + * -- Jonathan "Chromatix" Morton, 24th March 2001 + */ + + if(current->mm) + free -= (current->mm->total_vm / 4); + free -= freepages.high; + + if(pages > free) + if( !(sysctl_overcommit_memory == -1 && current->uid == 0) + && sysctl_overcommit_memory != 1) + pages = 0; + + vm_reserved += pages; + spin_unlock_irqrestore(&vm_lock, flags); + + return pages; } /* Remove one vm structure from the inode's i_mapping address space. */ @@ -148,10 +204,6 @@ if (find_vma_intersection(mm, oldbrk, newbrk+PAGE_SIZE)) goto out; - /* Check if we have enough memory.. */ - if (!vm_enough_memory((newbrk-oldbrk) >> PAGE_SHIFT)) - goto out; - /* Ok, looks good - let it rip. */ if (do_brk(oldbrk, newbrk-oldbrk) != oldbrk) goto out; @@ -190,6 +242,7 @@ { struct mm_struct * mm = current->mm; struct vm_area_struct * vma; + long reserved = 0; int correct_wcount = 0; int error; @@ -317,7 +370,7 @@ /* Private writable mapping? Check memory availability.. */ if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) == VM_WRITE && !(flags & MAP_NORESERVE) && - !vm_enough_memory(len >> PAGE_SHIFT)) + !(reserved = vm_enough_memory(len >> PAGE_SHIFT))) goto free_vma; if (file) { @@ -367,6 +420,7 @@ zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start); flush_tlb_range(mm, vma->vm_start, vma->vm_end); free_vma: + vm_release_memory(reserved); kmem_cache_free(vm_area_cachep, vma); return error; } @@ -546,6 +600,9 @@ area->vm_mm->total_vm -= len >> PAGE_SHIFT; if (area->vm_flags & VM_LOCKED) area->vm_mm->locked_vm -= len >> PAGE_SHIFT; + if ((area->vm_flags & (VM_GROWSDOWN | VM_WRITE | VM_SHARED)) + == VM_WRITE) + vm_release_memory(len >> PAGE_SHIFT); /* Unmapping the whole area. */ if (addr == area->vm_start && end == area->vm_end) { @@ -781,7 +838,7 @@ { struct mm_struct * mm = current->mm; struct vm_area_struct * vma; - unsigned long flags, retval; + unsigned long flags, retval, reserved = 0; len = PAGE_ALIGN(len); if (!len) @@ -812,7 +869,7 @@ if (mm->map_count > MAX_MAP_COUNT) return -ENOMEM; - if (!vm_enough_memory(len >> PAGE_SHIFT)) + if (!(reserved = vm_enough_memory(len >> PAGE_SHIFT))) return -ENOMEM; flags = vm_flags(PROT_READ|PROT_WRITE|PROT_EXEC, @@ -836,8 +893,10 @@ * create a vma struct for an anonymous mapping */ vma = kmem_cache_alloc(vm_area_cachep, SLAB_KERNEL); - if (!vma) + if (!vma) { + vm_release_memory(reserved); return -ENOMEM; + } vma->vm_mm = mm; vma->vm_start = addr; @@ -900,6 +959,9 @@ zap_page_range(mm, start, size); if (mpnt->vm_file) fput(mpnt->vm_file); + if ((mpnt->vm_flags & (VM_GROWSDOWN | VM_WRITE | VM_SHARED)) + == VM_WRITE) + vm_release_memory(size >> PAGE_SHIFT); kmem_cache_free(vm_area_cachep, mpnt); mpnt = next; } diff -ur -x via-rhine* linux-2.4.1.orig/mm/mremap.c linux/mm/mremap.c --- linux-2.4.1.orig/mm/mremap.c Fri Dec 29 22:07:24 2000 +++ linux/mm/mremap.c Sun Mar 25 16:57:07 2001 @@ -13,8 +13,6 @@ #include <asm/uaccess.h> #include <asm/pgalloc.h> -extern int vm_enough_memory(long pages); - static inline pte_t *get_one_pte(struct mm_struct *mm, unsigned long addr) { pgd_t * pgd; @@ -168,7 +166,7 @@ unsigned long flags, unsigned long new_addr) { struct vm_area_struct *vma; - unsigned long ret = -EINVAL; + unsigned long ret = -EINVAL, reserved = 0; if (flags & ~(MREMAP_FIXED | MREMAP_MAYMOVE)) goto out; @@ -240,7 +238,7 @@ /* Private writable mapping? Check memory availability.. */ if ((vma->vm_flags & (VM_SHARED | VM_WRITE)) == VM_WRITE && !(flags & MAP_NORESERVE) && - !vm_enough_memory((new_len - old_len) >> PAGE_SHIFT)) + !(reserved = vm_enough_memory((new_len - old_len) >> PAGE_SHIFT))) goto out; /* old_len exactly to the end of the area.. @@ -265,6 +263,7 @@ addr + new_len); } ret = addr; + reserved = 0; goto out; } } @@ -281,8 +280,12 @@ goto out; } ret = move_vma(vma, addr, old_len, new_len, new_addr); + if (ret != -ENOMEM) + reserved = 0; } out: + if (reserved) + vm_release_memory(reserved); return ret; } diff -ur -x via-rhine* linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c --- linux-2.4.1.orig/mm/oom_kill.c Tue Nov 14 18:56:46 2000 +++ linux/mm/oom_kill.c Sat Mar 24 23:27:59 2001 @@ -76,7 +76,9 @@ run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10); points /= int_sqrt(cpu_time); - points /= int_sqrt(int_sqrt(run_time)); + + /* Long-running processes are *very* important, so don't take the 4th root */ + points /= run_time; /* * Niced processes are most likely less important, so double @@ -93,6 +95,10 @@ p->uid == 0 || p->euid == 0) points /= 4; + /* Much the same goes for processes with low UIDs */ + if(p->uid < 100 || p->euid < 100) + points /= 2; + /* * We don't want to kill a process with direct hardware access. * Not only could that mess up the hardware, but usually users @@ -192,17 +198,24 @@ int out_of_memory(void) { struct sysinfo swp_info; + long free; /* Enough free memory? Not OOM. */ - if (nr_free_pages() > freepages.min) + free = nr_free_pages(); + if (free > freepages.min) + return 0; + + if (free + nr_inactive_clean_pages() > freepages.low) return 0; - if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low) + /* Buffers and caches can be freed up (Jonathan "Chromatix" Morton) */ + free += atomic_read(&buffermem_pages); + free += atomic_read(&page_cache_size); + if (free > freepages.low) return 0; /* Enough swap space left? Not OOM. */ - si_swapinfo(&swp_info); - if (swp_info.freeswap > 0) + if (nr_swap_pages > 0) return 0; /* Else... */ diff -ur -x via-rhine* linux-2.4.1.orig/mm/shmem.c linux/mm/shmem.c --- linux-2.4.1.orig/mm/shmem.c Sun Jan 28 03:50:08 2001 +++ linux/mm/shmem.c Sun Mar 25 18:31:56 2001 @@ -844,7 +844,6 @@ struct inode * inode; struct dentry *dentry, *root; struct qstr this; - int vm_enough_memory(long pages); error = -ENOMEM; if (!vm_enough_memory((size) >> PAGE_SHIFT)) diff -ur -x via-rhine* linux-2.4.1.orig/mm/swapfile.c linux/mm/swapfile.c --- linux-2.4.1.orig/mm/swapfile.c Fri Dec 29 22:07:24 2000 +++ linux/mm/swapfile.c Sun Mar 25 20:45:06 2001 @@ -17,6 +17,9 @@ #include <asm/pgtable.h> +extern int sysctl_overcommit_memory; +extern void vm_invalidate_totalmem(void); + spinlock_t swaplock = SPIN_LOCK_UNLOCKED; unsigned int nr_swapfiles; @@ -403,7 +406,7 @@ { struct swap_info_struct * p = NULL; struct nameidata nd; - int i, type, prev; + int i, type, prev, flags; int err; if (!capable(CAP_SYS_ADMIN)) @@ -448,7 +451,18 @@ nr_swap_pages -= p->pages; swap_list_unlock(); p->flags = SWP_USED; - err = try_to_unuse(type); + + /* Don't allow removal of swap if it will cause overcommit */ + spin_lock_irqsave(&vm_lock, flags); + if ((sysctl_overcommit_memory < 0) && + (vm_reserved > vm_total())) { + spin_unlock_irqrestore(&vm_lock, flags); + err = -ENOMEM; + } else { + spin_unlock_irqrestore(&vm_lock, flags); + err = try_to_unuse(type); + } + if (err) { /* re-insert swap space back into swap_list */ swap_list_lock(); @@ -483,6 +497,7 @@ unlock_kernel(); path_release(&nd); out: + vm_invalidate_totalmem(); return err; } @@ -557,6 +572,7 @@ unsigned long maxpages; int swapfilesize; struct block_device *bdev = NULL; + int flags; if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -787,6 +803,7 @@ out: if (swap_header) free_page((long) swap_header); + vm_invalidate_totalmem(); unlock_kernel(); return error; } [-- Attachment #3: Type: text/plain, Size: 578 bytes --] -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) 2001-03-25 21:51 ` [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) Jonathan Morton @ 2001-03-27 15:23 ` Pavel Machek 0 siblings, 0 replies; 106+ messages in thread From: Pavel Machek @ 2001-03-27 15:23 UTC (permalink / raw) To: Jonathan Morton, linux-mm, linux-kernel; +Cc: douglas Hi! > The attached patch is against 2.4.1 and incorporates the following: The patch seems to be word-wrapped... Pavel > diff -ur -x via-rhine* linux-2.4.1.orig/fs/exec.c linux/fs/exec.c > --- > linux-2.4.1.orig/fs/exec.c Tue Jan 30 07:10:58 2001 > +++ > linux/fs/exec.c Sun Mar 25 17:05:03 2001 > @@ -385,19 +385,27 @@ > static int > exec_mmap(void) > { > struct mm_struct * mm, * old_mm; > + struct > task_struct * tsk = current; > + unsigned long reserved = 0; > > - old_mm = > current->mm; -- I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care." Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:26 ` James A. Sutherland 2001-03-23 17:32 ` Alan Cox @ 2001-03-24 0:03 ` Guest section DW 2001-03-24 7:52 ` Doug Ledford 2001-03-25 0:32 ` Kurt Garloff 3 siblings, 0 replies; 106+ messages in thread From: Guest section DW @ 2001-03-24 0:03 UTC (permalink / raw) To: James A. Sutherland Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote: > > Clearly, Linux cannot be reliable if any process can be killed > > at any moment. > > What on earth did you expect to happen when the process exceeded the > machine's capabilities? Using more than all the resources fails. There > isn't an alternative. That is the wrong way to phrase these things. Large processes usually do not have a definite set of needed resources. They can use lots of memory for buffers and cache and hash and be a bit faster, or use much less and be a bit slower. Linux first promises a lot of memory, but then fails to deliver, without returning any error to the program. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:26 ` James A. Sutherland 2001-03-23 17:32 ` Alan Cox 2001-03-24 0:03 ` [PATCH] Prevent OOM from killing init Guest section DW @ 2001-03-24 7:52 ` Doug Ledford 2001-03-25 0:32 ` Kurt Garloff 3 siblings, 0 replies; 106+ messages in thread From: Doug Ledford @ 2001-03-24 7:52 UTC (permalink / raw) To: James A. Sutherland Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel "James A. Sutherland" wrote: > On Thu, 22 Mar 2001, Guest section DW wrote: > > (I think 2.4.0.) > > > > Clearly, Linux cannot be reliable if any process can be killed > > at any moment. > > What on earth did you expect to happen when the process exceeded the > machine's capabilities? Using more than all the resources fails. There > isn't an alternative. You might be successful in convincing myself or Andries of this as soon as the oom killer only kills things when the system is really out of memory. Right now, it's not really an oom killer, it's more like an "I'm Too Lazy To Free Up Some More Pages So Now You Die" (ITLTFUSMPSNYD) killer. -- Doug Ledford <dledford@redhat.com> http://people.redhat.com/dledford Please check my web site for aic7xxx updates/answers before e-mailing me about problems ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:26 ` James A. Sutherland ` (2 preceding siblings ...) 2001-03-24 7:52 ` Doug Ledford @ 2001-03-25 0:32 ` Kurt Garloff 2001-03-25 15:02 ` Sandy Harris 2001-03-25 18:07 ` Guest section DW 3 siblings, 2 replies; 106+ messages in thread From: Kurt Garloff @ 2001-03-25 0:32 UTC (permalink / raw) To: James A. Sutherland; +Cc: Linux kernel list [-- Attachment #1: Type: text/plain, Size: 874 bytes --] On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote: > If SuSE's install program needs more than a quarter Gb of RAM, you need a > better distro. Well, it's rpm ... I guess the Debian packager is more friendly. But if you choose to install a huge number of packages, the job to do for the package manager (dependencies ...) is no trivial to do with few resources. But that's not the point of the discussion. Kernel related questions IMHO are: (1) Why do we get into OOM? Can we avoid it? (2) Is OOM sometimes misdetected (or triggered too early) and why? (3) Does the OOM killer choose the right processes? Regards, -- Kurt Garloff <garloff@suse.de> Eindhoven, NL GPG key: See mail header, key servers Linux kernel development SuSE GmbH, Nuernberg, FRG SCSI, Security [-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --] ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-25 0:32 ` Kurt Garloff @ 2001-03-25 15:02 ` Sandy Harris 2001-03-25 18:07 ` Guest section DW 1 sibling, 0 replies; 106+ messages in thread From: Sandy Harris @ 2001-03-25 15:02 UTC (permalink / raw) To: Linux kernel list Kurt Garloff wrote: > Kernel related questions IMHO are: > (1) Why do we get into OOM? There was a long thread about this a few months back. We get into OOM because malloc(), calloc() etc. can allocate more memory than is actually available. e.g. Say you have machine with 64 RAM + 64 swap = 128 megs with 40 megs in use, so 88 free. Now two processes each malloc() 80 megs. Both succeed. If both processes then use that memory, someone is likely to fail later. > Can we avoid it? The obvious solution is to consider the above behaviour a bug and fix it. The second malloc() should fail. The process making that call can then look at the return value and decide what to do about the failure. However, this was extensively discussed here last year, and that solution was quite firmly rejected. I never understood the reasons. See the archives. Someone did announce they were working on patches implementing a sane malloc(). What happened to that project? ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-25 0:32 ` Kurt Garloff 2001-03-25 15:02 ` Sandy Harris @ 2001-03-25 18:07 ` Guest section DW 1 sibling, 0 replies; 106+ messages in thread From: Guest section DW @ 2001-03-25 18:07 UTC (permalink / raw) To: Kurt Garloff, James A. Sutherland, Linux kernel list On Sun, Mar 25, 2001 at 01:32:42AM +0100, Kurt Garloff wrote: > On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote: > > If SuSE's install program needs more than a quarter Gb of RAM, you need a > > better distro. > > Well, it's rpm ... Yes. I investigated and found rpm's data base corrupted, and rpm cannot handle that. Since I have several occurrences of rpm being killed by the oom killer in my logs it is entirely possible that the data base got corrupted because rpm was killed while in the process of updating it. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel 2001-03-22 8:14 ` Eric W. Biederman 2001-03-22 11:47 ` Guest section DW @ 2001-03-22 14:53 ` Patrick O'Rourke 2001-03-22 19:24 ` Philipp Rumpf ` (2 subsequent siblings) 5 siblings, 0 replies; 106+ messages in thread From: Patrick O'Rourke @ 2001-03-22 14:53 UTC (permalink / raw) To: Rik van Riel; +Cc: linux-mm, linux-kernel Rik van Riel wrote: > One question ... has the OOM killer ever selected init on > anybody's system ? Yes, which is why I created the patch. -- Patrick O'Rourke 978.606.0236 orourke@missioncriticallinux.com ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel ` (2 preceding siblings ...) 2001-03-22 14:53 ` Patrick O'Rourke @ 2001-03-22 19:24 ` Philipp Rumpf 2001-03-22 22:20 ` James A. Sutherland 2001-03-23 17:31 ` Szabolcs Szakacsits 5 siblings, 0 replies; 106+ messages in thread From: Philipp Rumpf @ 2001-03-22 19:24 UTC (permalink / raw) To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote: > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > Since the system will panic if the init process is chosen by > > the OOM killer, the following patch prevents select_bad_process() > > from picking init. > > One question ... has the OOM killer ever selected init on > anybody's system ? Yes, I managed to reproduce this a while ago. (init was the only process around though). We don't ever kill init, fwiw; we panic(), which is the right thing to do if init can't keep running. > I think that the scoring algorithm should make sure that > we never pick init, unless the system is screwed so badly > that init is broken or the only process left ;) I can't think of a situation where the OOM killer does the wrong thing. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel ` (3 preceding siblings ...) 2001-03-22 19:24 ` Philipp Rumpf @ 2001-03-22 22:20 ` James A. Sutherland 2001-03-23 17:31 ` Szabolcs Szakacsits 5 siblings, 0 replies; 106+ messages in thread From: James A. Sutherland @ 2001-03-22 22:20 UTC (permalink / raw) To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Wed, 21 Mar 2001, Rik van Riel wrote: > On Wed, 21 Mar 2001, Patrick O'Rourke wrote: > > > Since the system will panic if the init process is chosen by > > the OOM killer, the following patch prevents select_bad_process() > > from picking init. > > One question ... has the OOM killer ever selected init on > anybody's system ? Well, I managed to get the OOM killer killing init once; OTOH, I had just broken MM completely (disabled freeing of pages entirely!) so that doesn't really count, I think :-) > I think that the scoring algorithm should make sure that > we never pick init, unless the system is screwed so badly > that init is broken or the only process left ;) If the system is that badly screwed, killing init is probably the right thing to do, since this should then cause a panic, and thus a reboot if the machine is so configured? James. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-21 23:48 ` Rik van Riel ` (4 preceding siblings ...) 2001-03-22 22:20 ` James A. Sutherland @ 2001-03-23 17:31 ` Szabolcs Szakacsits 2001-03-24 5:54 ` Rik van Riel 5 siblings, 1 reply; 106+ messages in thread From: Szabolcs Szakacsits @ 2001-03-23 17:31 UTC (permalink / raw) To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Wed, 21 Mar 2001, Rik van Riel wrote: > One question ... has the OOM killer ever selected init on > anybody's system ? Hi Rik, When I ported your OOM killer to 2.2.x and integrated it into the 'reserved root memory' [*] patch, during intensive testing I found two cases when init was killed. It happened on low-end machines and when OOM killer wasn't triggered so init was killed in the page fault handler. The later was also one of the reasons I replaced the "random" OOM killer in page fault handler with yours [so there is only one OOM killer]. I also asked you at that time whether there was any reason you didn't put it also there but unfortunately you didn't answer. Practice showed it works there as well [and actually some crashes that was reported here recently could have been avoided in this way] but technically maybe I missed something? Other things that bothered me, - niced processes are penalized - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE will probably deadlock the machine [or the random OOM killer will kill the box]. Szaka [*] who are interested, it can be found at http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_memory.html ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-23 17:31 ` Szabolcs Szakacsits @ 2001-03-24 5:54 ` Rik van Riel 2001-03-24 6:55 ` Juha Saarinen 2001-03-27 8:31 ` Roger Gammans 0 siblings, 2 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-24 5:54 UTC (permalink / raw) To: Szabolcs Szakacsits; +Cc: Patrick O'Rourke, linux-mm, linux-kernel On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote: > When I ported your OOM killer to 2.2.x and integrated it into the > 'reserved root memory' [*] patch, during intensive testing I found two > cases when init was killed. It happened on low-end machines and when > OOM killer wasn't triggered so init was killed in the page fault > handler. The later was also one of the reasons I replaced the "random" > OOM killer in page fault handler with yours [so there is only one OOM > killer]. Good idea, we should do this for 2.4. I cannot remember reading an email from you about this, it's quite possible I just missed it and didn't answer because I never read it ... > Other things that bothered me, > - niced processes are penalized This can be considered a bug and should be fixed... > - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE > will probably deadlock the machine [or the random OOM killer will > kill the box]. This could indeed be a problem, though I cannot really see any case where a task would be in TASK_UNINTERRUPTIBLE permanently. OTOH, a 1GB read() will take a (much) too long time to finish. Your ideas sound really good, would you have the time to implement them for 2.4 ? regards, Rik -- Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com.br/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* RE: [PATCH] Prevent OOM from killing init 2001-03-24 5:54 ` Rik van Riel @ 2001-03-24 6:55 ` Juha Saarinen 2001-03-27 8:31 ` Roger Gammans 1 sibling, 0 replies; 106+ messages in thread From: Juha Saarinen @ 2001-03-24 6:55 UTC (permalink / raw) To: Rik van Riel, Szabolcs Szakacsits Cc: Patrick O'Rourke, linux-mm, linux-kernel :: Your ideas sound really good, would you have the time to implement :: them for 2.4 ? 2.4 or 2.5? -- Juha ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] Prevent OOM from killing init 2001-03-24 5:54 ` Rik van Riel 2001-03-24 6:55 ` Juha Saarinen @ 2001-03-27 8:31 ` Roger Gammans 1 sibling, 0 replies; 106+ messages in thread From: Roger Gammans @ 2001-03-27 8:31 UTC (permalink / raw) To: linux-kernel On Sat, Mar 24, 2001 at 02:54:55AM -0300, Rik van Riel wrote: > On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote: > > - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE > > will probably deadlock the machine [or the random OOM killer will > > kill the box]. > > This could indeed be a problem, though I cannot really see any > case where a task would be in TASK_UNINTERRUPTIBLE permanently. I've seen this with 'mt rewind' jamming on ide-tape. I'm not sure of the exact pathology , but ISTR that it was related issuing that command while the hardware was busy. In any case the point is that a badly written driver or faulty h/w even in a subsiduary system can cause this. In an ideal world of course these wouldn't happen, but OTOH is this an issue in failing a box which is going to fail anyway if we don't kill the process. If we could ensure a graceful failure so much the better. TTFN -- Roger Think of the mess on the carpet. Sensible people do all their demon-summoning in the garage, which you can just hose down afterwards. -- damerell@chiark.greenend.org.uk ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling
@ 2001-03-27 15:13 Jonathan Morton
2001-03-27 16:03 ` Michel Wilson
` (2 more replies)
0 siblings, 3 replies; 106+ messages in thread
From: Jonathan Morton @ 2001-03-27 15:13 UTC (permalink / raw)
To: linux-kernel
>> >> Of course, I realised that. Actually, what the code does is take an
>> >> initial badness factor (the memory usage), then divide it using goodness
>> >> factors (some based on time, some purely arbitrary), both of which can be
>> >> considered dimensionless. Also, at the end, the absolute value is not
>> >> considered - we simply look at the biggest one and kill it. All
>> >> "denormalisation" does is scale all the values, it doesn't affect
>>which one
>> >> actually turns out biggest.
>> >
>> >So you should realize as well that the actual code implementing this
>> >all is by no means numerically stable...
>>
>> It probably isn't, no. I'll take another look at it and do some dry runs
>> sometime, and see whether they come out as I expect.
>
>Well the output depends heavly on the actual memsize of the process,
>which IMHO isn't a good value for choosing killing candidates...
>Second there is the problem that it's not possible to wight
>the goodness values against each other. The unit
>remaining is Bit/sqr(seconds). Try to get a grasp on this.
>Please have a look at my patch. The function I'm using
>there is a simply wighted sum of two process parameters.
I just ran the following test case through my (Saturday) version of the code:
80MB Oracle process
1 hour CPU time
1 week uptime
UID = 50
The result was less than 1, which means Oracle (or virtually any other
process with an hour of CPU time and a week's uptime) would not get killed.
You're perfectly right about the numerical stability argument, though.
Integers are notoriously granular, so maybe an increase in resolution is
justified. There's also an issue where an almost-new process (with
run_time under 1024 seconds) would be given infinitely large badness - that
needs fixing. Jiffie wrap is worth taking account for, too. The comments
accompanying the code are completely wrong - cpu_time is in units of 8
seconds, and run_time is in units of 1024 seconds, NOT seconds and minutes
as described.
HOWEVER, I just took a look at your patch from Sunday. I have very serious
concerns about this, which I will try to explain below:
First, your code uses a hard and arbitrary priority level. This is
arranged such that if the "bad process" (which I use as a euphemism to
indicate a runaway memory hog) is in any class other than "normal", all
"normal" processes MUST exit before the "bad process" will even be
considered. As a test case:
Suppose you're running Sendmail as uid 25, which puts it in the "system"
class. This is a multiuser system and there are a lot of interactive,
unprivileged users present. You are also running RPC services as "service"
class, using UIDs between 100-500. Now suppose that Sendmail springs a big
memory leak and swamps the available memory, causing OOM - Sendmail is now
the "bad process" I mentioned earlier. The sysadmin isn't watching the
system closely enough to kill Sendmail manually, and in any case the system
is thrashing so hard he wouldn't be able to log in quickly.
With your code, all the interactive users would be systematically thrown
off the system (losing all their work - SIGKILL is not kind) and the RPC
services would be shut down. Depending on the relative ages of Sendmail
and other system services, other essential system daemons may also be shut
down (since your code does not take memory usage into account). Finally,
Sendmail itself is killed and the problem goes away.
In the same scenario, my version of the code would probably kill Sendmail
relatively early in the sequence, since it is the one hogging all the RAM.
A few of the larger interactive process might get killed, depending on
relative ages. The major flaw in my code is that a sufficiently long-lived
process becomes virtually immortal, even if it happens to spring a serious
leak after this time - the flaw in yours is that system processes have *too
high* priority relative to others, *right from the beginning*. Both
problems need addressing if either of our algorithms can be considered
acceptable.
Oh and BTW, I think Bit/sqr(seconds) is a perfectly acceptable unit for
"badness". Think about it - it increases with pigginess and decreases with
longevity. I really don't see a problem with it per se.
I'm going to be travelling tomorrow, so I've moved my VM work onto my
PowerBook and will consider OOM-kill-selection algorithms and
memory-accounting while I fly. See you on the other side of the ocean, and
hopefully the fresh Canadian air will help me think about this clearly.
--------------------------------------------------------------
from: Jonathan "Chromatix" Morton
mail: chromi@cyberspace.org (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk
The key to knowledge is not to rely on people to teach you it.
Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/
-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----
^ permalink raw reply [flat|nested] 106+ messages in thread* RE: [PATCH] OOM handling 2001-03-27 15:13 [PATCH] OOM handling Jonathan Morton @ 2001-03-27 16:03 ` Michel Wilson 2001-03-27 16:30 ` Martin Dalecki 2001-03-27 18:15 ` Rik van Riel 2001-03-27 16:29 ` Martin Dalecki 2001-03-27 17:07 ` Jonathan Morton 2 siblings, 2 replies; 106+ messages in thread From: Michel Wilson @ 2001-03-27 16:03 UTC (permalink / raw) To: linux-kernel > relative ages. The major flaw in my code is that a sufficiently > long-lived > process becomes virtually immortal, even if it happens to spring a serious > leak after this time - the flaw in yours is that system processes I think this could easily be fixed if you'd 'chop off' the runtime at a certain point: if(runtime > something_big) runtime = something_big; This would of course need some tuning. The only thing i don't like about this is that it's a kind of 'magical value', but i suppose it's not a very good idea to make this configurable, right? Michel Wilson. ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-27 16:03 ` Michel Wilson @ 2001-03-27 16:30 ` Martin Dalecki 2001-03-27 18:15 ` Rik van Riel 1 sibling, 0 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-27 16:30 UTC (permalink / raw) To: Michel Wilson; +Cc: linux-kernel Michel Wilson wrote: > > > relative ages. The major flaw in my code is that a sufficiently > > long-lived > > process becomes virtually immortal, even if it happens to spring a serious > > leak after this time - the flaw in yours is that system processes > > I think this could easily be fixed if you'd 'chop off' the runtime at a > certain point: > > if(runtime > something_big) > runtime = something_big; > > This would of course need some tuning. The only thing i don't like about > this is that it's a kind of 'magical value', but i suppose it's not a very > good idea to make this configurable, right? Then after some time runtime becomes allmost irrelevant. You are basically for what I call normalization by the total system uptime. ^ permalink raw reply [flat|nested] 106+ messages in thread
* RE: [PATCH] OOM handling 2001-03-27 16:03 ` Michel Wilson 2001-03-27 16:30 ` Martin Dalecki @ 2001-03-27 18:15 ` Rik van Riel 1 sibling, 0 replies; 106+ messages in thread From: Rik van Riel @ 2001-03-27 18:15 UTC (permalink / raw) To: Michel Wilson; +Cc: linux-kernel On Tue, 27 Mar 2001, Michel Wilson wrote: > > relative ages. The major flaw in my code is that a sufficiently > > long-lived > > process becomes virtually immortal, even if it happens to spring a serious > > leak after this time - the flaw in yours is that system processes > > I think this could easily be fixed if you'd 'chop off' the runtime at a > certain point: > > if(runtime > something_big) > runtime = something_big; > > This would of course need some tuning. The only thing i don't > like about this is that it's a kind of 'magical value', This is the reason I used the sqrt approximation in my OOM killer ;) Rik -- Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml Virtual memory is like a game you can't win; However, without VM there's truly nothing to lose... http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 106+ messages in thread
* Re: [PATCH] OOM handling 2001-03-27 15:13 [PATCH] OOM handling Jonathan Morton 2001-03-27 16:03 ` Michel Wilson @ 2001-03-27 16:29 ` Martin Dalecki 2001-03-27 17:07 ` Jonathan Morton 2 siblings, 0 replies; 106+ messages in thread From: Martin Dalecki @ 2001-03-27 16:29 UTC (permalink / raw) To: Jonathan Morton; +Cc: linux-kernel Jonathan Morton wrote: > > Oh and BTW, I think Bit/sqr(seconds) is a perfectly acceptable unit for > "badness". Think about it - it increases with pigginess and decreases with > longevity. I really don't see a problem with it per se. Right it's not a problem pre se, but as you already explained the problem is in the weightinig of different factors. It's a matter of principle. ^ permalink raw reply [flat|nested] 106+ messages in thread
* RE: [PATCH] OOM handling 2001-03-27 15:13 [PATCH] OOM handling Jonathan Morton 2001-03-27 16:03 ` Michel Wilson 2001-03-27 16:29 ` Martin Dalecki @ 2001-03-27 17:07 ` Jonathan Morton 2 siblings, 0 replies; 106+ messages in thread From: Jonathan Morton @ 2001-03-27 17:07 UTC (permalink / raw) To: Michel Wilson, linux-kernel >> relative ages. The major flaw in my code is that a sufficiently >> long-lived >> process becomes virtually immortal, even if it happens to spring a serious >> leak after this time - the flaw in yours is that system processes > >I think this could easily be fixed if you'd 'chop off' the runtime at a >certain point: > >if(runtime > something_big) > runtime = something_big; > >This would of course need some tuning. The only thing i don't like about >this is that it's a kind of 'magical value', but i suppose it's not a very >good idea to make this configurable, right? Configurable is good, but right now I'm considering alternative (but reasonably similar) algorithms. If I can come up with something that works reasonably well under all the scenarios I can think up - which is quite a range - then configurable options may not be necessary. In any case, other work I'm doing should make OOM a thing of the past on most systems, since malloc() and other memory-reservation calls will normally fail before OOM happens. It might just happen that totally different algorithms apply best to different usage patterns, and I can put in some logic to try and detect these patterns as needed, selecting the most appropriate algorithm. An embedded system is very different from a large batch-computation system, and likewise for an Internet server, multiuser host, or single-user workstation. Internet servers come in different sizes, too - the 486 NAT and web proxy differs considerably from the dedicated mail/web/database server. What would really help me is if a number of people with boxen under each of the above loads could send me a "snapshot" of their system, under normal load, containing the following info: - General usage pattern description, in plain English - Physical and swap memory: total sizes and current utilisation, in MB - System uptime in days - Summary of processes running at that instant, including for each process: - Approximate UID range - SIZE (not RSS, I want total size) - CPU time (with separate user and system totals if possible) - run time Generalisations would probably be helpful - I don't expect to receive a list of 500 emacs and bash processes, but indications of the distribution of the above values for sensible groupings of processes would be valuable. Of course, if you group processes, include information on how many process you're grouping. :) For your security and protection, it would probably not be wise to indicate the hostname or IP address(es) of the systems you profile in this manner. You may, however, wish to invent codenames for the machines in case it becomes necessary to refer to specific cases. Profiles can be sent to me at <chromi@cyberspace.org>, please include the string [SNAPSHOT] in the subject for easy identification. -------------------------------------------------------------- from: Jonathan "Chromatix" Morton mail: chromi@cyberspace.org (not for attachments) big-mail: chromatix@penguinpowered.com uni-mail: j.d.morton@lancaster.ac.uk The key to knowledge is not to rely on people to teach you it. Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/ -----BEGIN GEEK CODE BLOCK----- Version 3.12 GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*) -----END GEEK CODE BLOCK----- ^ permalink raw reply [flat|nested] 106+ messages in thread
end of thread, other threads:[~2001-03-27 19:53 UTC | newest] Thread overview: 106+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke 2001-03-21 23:11 ` Eli Carter 2001-03-21 23:40 ` Patrick O'Rourke 2001-03-21 23:48 ` Rik van Riel 2001-03-22 8:14 ` Eric W. Biederman 2001-03-22 9:24 ` Rik van Riel 2001-03-22 19:29 ` Philipp Rumpf 2001-03-22 11:47 ` Guest section DW 2001-03-22 15:01 ` Rik van Riel 2001-03-22 19:04 ` Guest section DW 2001-03-22 16:41 ` Eric W. Biederman 2001-03-22 20:28 ` Stephen Clouse 2001-03-22 21:01 ` Ingo Oeser 2001-03-22 21:23 ` Alan Cox 2001-03-22 22:00 ` Guest section DW 2001-03-22 22:12 ` Ed Tomlinson 2001-03-22 22:52 ` Alan Cox 2001-03-22 23:27 ` Guest section DW 2001-03-22 23:37 ` Rik van Riel 2001-03-26 19:04 ` James Antill 2001-03-26 20:05 ` Rik van Riel 2001-03-22 23:40 ` Alan Cox 2001-03-23 20:09 ` Szabolcs Szakacsits 2001-03-23 22:21 ` Alan Cox 2001-03-23 22:37 ` Szabolcs Szakacsits 2001-03-23 19:57 ` Szabolcs Szakacsits 2001-03-22 22:10 ` Doug Ledford 2001-03-22 22:53 ` Alan Cox 2001-03-22 23:30 ` Doug Ledford 2001-03-22 23:40 ` Alan Cox 2001-03-22 23:43 ` Stephen Clouse 2001-03-23 19:26 ` Szabolcs Szakacsits 2001-03-23 20:41 ` Paul Jakma 2001-03-23 21:58 ` george anzinger 2001-03-24 5:55 ` Rik van Riel 2001-03-23 22:18 ` Szabolcs Szakacsits 2001-03-24 2:08 ` Paul Jakma 2001-03-23 1:31 ` Michael Peddemors 2001-03-23 7:04 ` Rik van Riel 2001-03-23 11:28 ` Guest section DW 2001-03-23 14:50 ` Eric W. Biederman 2001-03-23 15:13 ` General 2.4 impressions (was Re: [PATCH] Prevent OOM from killing init) Jeff Garzik 2001-03-23 16:10 ` Adding just a pinch of icache/dcache pressure Jan Harkes 2001-03-23 16:17 ` Andi Kleen 2001-03-23 16:51 ` Jan Harkes 2001-03-23 17:21 ` [PATCH] Prevent OOM from killing init Guest section DW 2001-03-23 20:18 ` Paul Jakma 2001-03-24 20:19 ` Jesse Pollard 2001-03-23 23:48 ` Eric W. Biederman 2001-03-23 21:11 ` José Luis Domingo López 2001-03-27 15:05 ` Anthony de Boer - USEnet 2002-03-23 0:33 ` Martin Dalecki 2001-03-22 23:53 ` Rik van Riel 2002-03-23 1:21 ` Martin Dalecki 2001-03-23 0:20 ` Stephen Clouse 2002-03-23 1:30 ` Martin Dalecki 2001-03-23 1:37 ` Rik van Riel 2001-03-23 10:48 ` Martin Dalecki 2001-03-23 14:56 ` Rik van Riel 2001-03-23 16:43 ` Guest section DW 2001-03-24 5:57 ` Rik van Riel 2001-03-25 16:35 ` Guest section DW 2001-03-23 20:20 ` Tom Diehl 2001-03-23 23:56 ` Tim Wright 2001-03-24 0:21 ` Tom Diehl 2001-03-23 17:26 ` James A. Sutherland 2001-03-23 17:32 ` Alan Cox 2001-03-23 18:58 ` Martin Dalecki 2001-03-25 13:54 ` [PATCH] OOM handling Martin Dalecki 2001-03-25 15:06 ` Rik van Riel 2001-03-25 15:20 ` Martin Dalecki 2001-03-25 15:50 ` Jeff Garzik 2001-03-25 17:08 ` Rik van Riel 2001-03-25 15:44 ` Jonathan Morton 2001-03-25 15:47 ` Martin Dalecki 2001-03-25 16:36 ` Jonathan Morton 2001-03-26 21:34 ` Kevin Buhr 2001-03-26 22:00 ` Jonathan Morton 2001-03-26 2:13 ` Matthew Chappee 2001-03-26 11:33 ` Ingo Oeser 2001-03-26 11:49 ` Jasper Spaans 2001-03-26 16:11 ` Michael Peddemors 2001-03-23 19:45 ` [PATCH] Prevent OOM from killing init Jonathan Morton 2001-03-23 23:26 ` Eric W. Biederman 2001-03-25 15:30 ` Martin Dalecki 2001-03-25 20:47 ` Stephen Satchell 2001-03-25 21:51 ` [PATCH] non-overcommit memory, improved OOM handling, safety margin (was Re: Prevent OOM from killing init) Jonathan Morton 2001-03-27 15:23 ` Pavel Machek 2001-03-24 0:03 ` [PATCH] Prevent OOM from killing init Guest section DW 2001-03-24 7:52 ` Doug Ledford 2001-03-25 0:32 ` Kurt Garloff 2001-03-25 15:02 ` Sandy Harris 2001-03-25 18:07 ` Guest section DW 2001-03-22 14:53 ` Patrick O'Rourke 2001-03-22 19:24 ` Philipp Rumpf 2001-03-22 22:20 ` James A. Sutherland 2001-03-23 17:31 ` Szabolcs Szakacsits 2001-03-24 5:54 ` Rik van Riel 2001-03-24 6:55 ` Juha Saarinen 2001-03-27 8:31 ` Roger Gammans -- strict thread matches above, loose matches on Subject: below -- 2001-03-27 15:13 [PATCH] OOM handling Jonathan Morton 2001-03-27 16:03 ` Michel Wilson 2001-03-27 16:30 ` Martin Dalecki 2001-03-27 18:15 ` Rik van Riel 2001-03-27 16:29 ` Martin Dalecki 2001-03-27 17:07 ` Jonathan Morton
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox