public inbox for linux-kernel@vger.kernel.org
* OOM killer???
@ 2001-03-27 10:59 Rogier Wolff
  2001-03-27 12:14 ` Jonathan Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Rogier Wolff @ 2001-03-27 10:59 UTC (permalink / raw)
  To: linux-kernel


Just a quick bug-report: 

One of our machines just started spewing:

Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).
Out of Memory: Killed process 117 (sendmail).

What we did to run it out of memory, I don't know. But I do know that
it shouldn't be killing one process more than once... (the process
should not exist after one try...)

Kernel 2.4.0.

			Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots. 
* There are also old, bald pilots. 


* Re: OOM killer???
  2001-03-27 10:59 OOM killer??? Rogier Wolff
@ 2001-03-27 12:14 ` Jonathan Morton
  2001-03-27 13:24   ` Martin Dalecki
  2001-03-27 13:57   ` Jonathan Morton
  0 siblings, 2 replies; 13+ messages in thread
From: Jonathan Morton @ 2001-03-27 12:14 UTC (permalink / raw)
  To: Rogier Wolff, linux-kernel

>Out of Memory: Killed process 117 (sendmail).
>
>What we did to run it out of memory, I don't know. But I do know that
>it shouldn't be killing one process more than once... (the process
>should not exist after one try...)

This is a known bug in the Out-of-Memory handler, where it does not count the buffer and cache memory as "free" (it should), causing premature OOM killing.  It is, however, normal for the OOM killer to attempt to kill a process more than once - it takes a few scheduler cycles for the SIGKILL to actually reach the process and take effect.

Also, it probably shouldn't have killed Sendmail, since that is usually a long-running, low-UID (and important) process.  The OOM-kill selector is another thing that wants fixing, and my patch contains a *very rough* beginning to this.
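
As a rough user-space illustration of the selection heuristic described above (not the kernel code): the struct, its field names, the integer square-root helper and the zero guards are assumptions of this sketch and are not taken from mm/oom_kill.c.

```c
/* Hypothetical, simplified view of a process; not the kernel's task_struct. */
struct task_info {
    unsigned long vm_pages;   /* memory footprint, used as the starting score */
    unsigned long cpu_time;   /* scaled CPU time consumed */
    unsigned long run_time;   /* scaled wall-clock lifetime */
    unsigned int uid, euid;
};

/* Floor of the square root; never returns 0 so it is safe to divide by. */
static unsigned long isqrt_nonzero(unsigned long x)
{
    unsigned long r = 0;
    while ((r + 1) <= x / (r + 1))
        r++;
    return r ? r : 1;
}

/* Higher return value = more likely to be picked as the OOM victim. */
unsigned long badness_sketch(const struct task_info *p)
{
    unsigned long points = p->vm_pages;

    /* Processes that have done a lot of work are more valuable. */
    points /= isqrt_nonzero(p->cpu_time);

    /* Long-running processes are protected strongly: divide by the run
     * time itself rather than its 4th root.  The zero guard is an
     * assumption of this sketch. */
    points /= p->run_time ? p->run_time : 1;

    /* Root-owned processes. */
    if (p->uid == 0 || p->euid == 0)
        points /= 4;

    /* Low-UID (system) accounts; note that root matches this as well. */
    if (p->uid < 100 || p->euid < 100)
        points /= 2;

    return points;
}
```

With these rules a sendmail-like process (long-lived, low UID) ends up with a far lower score than a short-lived user process of the same size.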

The following patch should solve your problem for now, until a more detailed fix (which also clears up many other problems) is available in the stable kernel.

Alan and/or Linus may wish to apply this patch too...

(excerpt from my original patch from Saturday follows)

--- start ---
diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
--- linux-2.4.1.orig/mm/oom_kill.c      Tue Nov 14 18:56:46 2000
+++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
@@ -76,7 +76,9 @@
        run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);

        points /= int_sqrt(cpu_time);
-       points /= int_sqrt(int_sqrt(run_time));
+
+       /* Long-running processes are *very* important, so don't take the 4th root */
+       points /= run_time;

        /*
         * Niced processes are most likely less important, so double
@@ -93,6 +95,10 @@
                                p->uid == 0 || p->euid == 0)
                points /= 4;

+       /* Much the same goes for processes with low UIDs */
+       if(p->uid < 100 || p->euid < 100)
+         points /= 2;
+
        /*
         * We don't want to kill a process with direct hardware access.
         * Not only could that mess up the hardware, but usually users
@@ -192,12 +198,20 @@
 int out_of_memory(void)
 {
        struct sysinfo swp_info;
+       long free;

        /* Enough free memory?  Not OOM. */
-       if (nr_free_pages() > freepages.min)
+       free = nr_free_pages();
+       if (free > freepages.min)
+               return 0;
+
+       if (free + nr_inactive_clean_pages() > freepages.low)
                return 0;

-       if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+       /* Buffers and caches can be freed up (Jonathan "Chromatix" Morton) */
+       free += atomic_read(&buffermem_pages);
+       free += atomic_read(&page_cache_size);
+       if (free > freepages.low)
                return 0;

        /* Enough swap space left?  Not OOM. */
--- end ---
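
The free-memory test in the last hunk can be read on its own as the following user-space sketch. The struct and its field names are hypothetical stand-ins for the kernel counters used in the patch, and the swap check that follows in the real function is left out.

```c
/* Hypothetical snapshot of memory counters, all in pages. */
struct mem_counters {
    long free_pages;            /* stands in for nr_free_pages() */
    long inactive_clean_pages;  /* stands in for nr_inactive_clean_pages() */
    long buffer_pages;          /* stands in for buffermem_pages */
    long page_cache_pages;      /* stands in for page_cache_size */
    long min_watermark;         /* stands in for freepages.min */
    long low_watermark;         /* stands in for freepages.low */
};

/* Returns 1 if the memory check alone says "out of memory", else 0. */
int is_oom_sketch(const struct mem_counters *m)
{
    long free = m->free_pages;

    /* Enough free memory?  Not OOM. */
    if (free > m->min_watermark)
        return 0;

    if (free + m->inactive_clean_pages > m->low_watermark)
        return 0;

    /* Buffers and page cache can be reclaimed, so count them as free. */
    free += m->buffer_pages;
    free += m->page_cache_pages;
    if (free > m->low_watermark)
        return 0;

    return 1;
}
```

A machine with few free pages but a large page cache is no longer reported as OOM by this check, which is the premature-kill case described above.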

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----




* Re: OOM killer???
  2001-03-27 12:14 ` Jonathan Morton
@ 2001-03-27 13:24   ` Martin Dalecki
  2001-03-27 15:31     ` Jonathan Lundell
                       ` (2 more replies)
  2001-03-27 13:57   ` Jonathan Morton
  1 sibling, 3 replies; 13+ messages in thread
From: Martin Dalecki @ 2001-03-27 13:24 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Rogier Wolff, linux-kernel

Jonathan Morton wrote:
> 
> >Out of Memory: Killed process 117 (sendmail).
> >
> >What we did to run it out of memory, I don't know. But I do know that
> >it shouldn't be killing one process more than once... (the process
> >should not exist after one try...)
> 
> This is a known bug in the Out-of-Memory handler, where it does not count the buffer and cache memory as "free" (it should), causing premature OOM killing.  It is, however, normal for the OOM killer to attempt to kill a process more than once - it takes a few scheduler cycles for the SIGKILL to actually reach the process and take effect.
> 
> Also, it probably shouldn't have killed Sendmail, since that is usually a long-running, low-UID (and important) process.  The OOM-kill selector is another thing that wants fixing, and my patch contains a *very rough* beginning to this.
> 
> The following patch should solve your problem for now, until a more detailed fix (which also clears up many other problems) is available in the stable kernel.
> 
> Alan and/or Linus may wish to apply this patch too...
> 
> (excerpt from my original patch from Saturday follows)
> 
> --- start ---
> diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
> --- linux-2.4.1.orig/mm/oom_kill.c      Tue Nov 14 18:56:46 2000
> +++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
> @@ -76,7 +76,9 @@
>         run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
> 
>         points /= int_sqrt(cpu_time);
> -       points /= int_sqrt(int_sqrt(run_time));
> +
> +       /* Long-running processes are *very* important, so don't take the 4th root */
> +       points /= run_time;
> 
>         /*
>          * Niced processes are most likely less important, so double
> @@ -93,6 +95,10 @@
>                                 p->uid == 0 || p->euid == 0)
>                 points /= 4;
> 
> +       /* Much the same goes for processes with low UIDs */
> +       if(p->uid < 100 || p->euid < 100)
> +         points /= 2;
> +

Please change the 100 to 500 - this would make it consistent with
the useradd command, which starts adding new users at UID 500.


* Re: OOM killer???
  2001-03-27 12:14 ` Jonathan Morton
  2001-03-27 13:24   ` Martin Dalecki
@ 2001-03-27 13:57   ` Jonathan Morton
  1 sibling, 0 replies; 13+ messages in thread
From: Jonathan Morton @ 2001-03-27 13:57 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Rogier Wolff, linux-kernel

>Please change the 100 to 500 - this would make it consistent with
>the useradd command, which starts adding new users at UID 500.

Depends on which distribution you're using.  In my experience, almost all
the really important stuff happens below 100.  In any case, the
OOM-kill-selection algorithm in this patch is *not* final.  See my
accompanying mail.





* Re: OOM killer???
  2001-03-27 13:24   ` Martin Dalecki
@ 2001-03-27 15:31     ` Jonathan Lundell
  2001-03-27 16:07       ` Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL Greg Ingram
  2001-03-27 18:08     ` OOM killer??? Ingo Oeser
  2001-03-27 18:37     ` Jonathan Morton
  2 siblings, 1 reply; 13+ messages in thread
From: Jonathan Lundell @ 2001-03-27 15:31 UTC (permalink / raw)
  To: linux-kernel

Martin Dalecki <dalecki@evision-ventures.com> writes:

>Please change the 100 to 500 - this would make it consistent with
>the useradd command, which starts adding new users at UID 500.

It's probably best to keep it somewhere <500, so that one can have "static" (<500) UIDs of either flavor: OOM-killable or not. 100 seems like "enough" non-killable users to me, but that may be a lack of imagination on my part.

-- 
/Jonathan Lundell.


* Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL
  2001-03-27 15:31     ` Jonathan Lundell
@ 2001-03-27 16:07       ` Greg Ingram
  2001-03-27 16:14         ` Jeff Garzik
  0 siblings, 1 reply; 13+ messages in thread
From: Greg Ingram @ 2001-03-27 16:07 UTC (permalink / raw)
  To: linux-kernel


In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL.  The RTL8139
driver is not labelled as experimental.  Is this an error?

- Greg




* Re: Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL
  2001-03-27 16:07       ` Config bug? In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL Greg Ingram
@ 2001-03-27 16:14         ` Jeff Garzik
  2001-03-27 16:37           ` [PATCH] 2.2.19 drivers/net/Config.in Greg Ingram
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff Garzik @ 2001-03-27 16:14 UTC (permalink / raw)
  To: Greg Ingram; +Cc: linux-kernel, Alan Cox

Greg Ingram wrote:
> 
> In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL.  The RTL8139
> driver is not labelled as experimental.  Is this an error?

Yeah, add '(EXPERIMENTAL)' to the text.  Send a patch to Alan if you
want...

-- 
Jeff Garzik       | May you have warm words on a cold evening,
Building 1024     | a full moon on a dark night,
MandrakeSoft      | and a smooth road all the way to your door.


* [PATCH] 2.2.19 drivers/net/Config.in
  2001-03-27 16:14         ` Jeff Garzik
@ 2001-03-27 16:37           ` Greg Ingram
  0 siblings, 0 replies; 13+ messages in thread
From: Greg Ingram @ 2001-03-27 16:37 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Alan Cox


On Tue, 27 Mar 2001, Jeff Garzik wrote:

> Greg Ingram wrote:
> > 
> > In 2.2.19 CONFIG_RTL8139 depends on CONFIG_EXPERIMENTAL.  The RTL8139
> > driver is not labelled as experimental.  Is this an error?
> 
> Yeah, add '(EXPERIMENTAL)' to the text.  Send a patch to Alan if you
> want...

Okay.  I really thought it would be the other way around, that is, that
the driver is no longer experimental.  Anyway, I also tagged the 8139too
driver as experimental.  Patch follows.

- Greg

--- linux/drivers/net/Config.in.orig	Tue Mar 27 10:26:52 2001
+++ linux/drivers/net/Config.in	Tue Mar 27 10:27:39 2001
@@ -98,10 +98,10 @@
     tristate 'NI6510 support' CONFIG_NI65
   fi
   if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
-      tristate 'RealTek 8129/8139 (not 8019/8029!) support' CONFIG_RTL8139
+      tristate 'RealTek 8129/8139 (not 8019/8029!) support (EXPERIMENTAL)' CONFIG_RTL8139
   fi
   if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
-      tristate 'Alternative RealTek 8139 driver (8139too) support' CONFIG_RTL8139TOO
+      tristate 'Alternative RealTek 8139 driver (8139too) support (EXPERIMENTAL)' CONFIG_RTL8139TOO
       if [ "$CONFIG_RTL8139TOO" != "n" ]; then
           bool '  Use PIO instead of MMIO' CONFIG_8139TOO_PIO
           bool '  Support for automatic channel equalization' CONFIG_8139TOO_TUNE_TWISTER



* Re: OOM killer???
  2001-03-27 13:24   ` Martin Dalecki
  2001-03-27 15:31     ` Jonathan Lundell
@ 2001-03-27 18:08     ` Ingo Oeser
  2001-03-27 19:07       ` Martin Dalecki
  2001-03-27 18:37     ` Jonathan Morton
  2 siblings, 1 reply; 13+ messages in thread
From: Ingo Oeser @ 2001-03-27 18:08 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Jonathan Morton, Rogier Wolff, linux-kernel

On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > @@ -93,6 +95,10 @@
> >                                 p->uid == 0 || p->euid == 0)
> >                 points /= 4;
> > 
> > +       /* Much the same goes for processes with low UIDs */
> > +       if(p->uid < 100 || p->euid < 100)
> > +         points /= 2;
> > +
> 
> Please change the 100 to 500 - this would make it consistent with
> the useradd command, which starts adding new users at UID 500.

No, useradd usually reads /etc/login.defs to select the range.
The oom-killer should have configurables for that, to allow the
policy decisions in USER space -- where it belongs -- not in KERNEL space

If we use my OOM killer API, this patch would be a module and
could have module parameters to select that.

Jonathan: I URGE you to apply my patch before adding OOM killer
   stuff. What's wrong with it, that you cannot use it? ;-)

It is easy to add configurables to a module and play with them
WITHOUT recompiling.

Dynamic sysctl tables would also be possible, IF we had a value
that is DEFINED to be invalid for sysctl(2) and only valid for /proc.

It is also better to include the egid in the decision. There
are daemons that I definitely want to be killed on a workstation,
but not on a server.

e.g. My important matlab calculation, which runs in user mode
should not be killed. But killing a local webserver, which serves
my help system is ok (because I will not lose work, and might
get it over the net, if there is a problem).

So as Rik stated: The OOM killer cannot suit all people, so it
has to be configurable, to be OOM kill, not overkill ;-)
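
To make the idea of a configurable policy concrete, here is a minimal user-space sketch. Everything in it is hypothetical: the struct, the field names and the factors of 2 are only there to show the UID cut-off and an egid rule becoming parameters instead of constants, and it is not the OOM killer API mentioned above.

```c
/* Hypothetical tunables; on a real system these would be set from user
 * space (module parameter, sysctl, ...) instead of being compiled in. */
struct oom_policy {
    unsigned int protected_uid_max;  /* UIDs below this get a reduced score */
    unsigned int expendable_egid;    /* this group is a preferred victim */
};

/* Adjust a raw badness score according to the policy. */
unsigned long adjust_points(unsigned long points, unsigned int uid,
                            unsigned int euid, unsigned int egid,
                            const struct oom_policy *pol)
{
    if (uid < pol->protected_uid_max || euid < pol->protected_uid_max)
        points /= 2;   /* system accounts: less likely to be killed */
    if (egid == pol->expendable_egid)
        points *= 2;   /* admin-marked group: more likely to be killed */
    return points;
}
```

A server could then run with a cut-off of 500 and a workstation with 100, without either needing a different kernel.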

Thanks & Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>


* Re: OOM killer???
  2001-03-27 13:24   ` Martin Dalecki
  2001-03-27 15:31     ` Jonathan Lundell
  2001-03-27 18:08     ` OOM killer??? Ingo Oeser
@ 2001-03-27 18:37     ` Jonathan Morton
  2 siblings, 0 replies; 13+ messages in thread
From: Jonathan Morton @ 2001-03-27 18:37 UTC (permalink / raw)
  To: Ingo Oeser, Martin Dalecki; +Cc: Rogier Wolff, linux-kernel

>If we use my OOM killer API, this patch would be a module and
>could have module parameters to select that.
>
>Jonathan: I URGE you to apply my patch before adding OOM killer
>   stuff. What's wrong with it, that you cannot use it? ;-)
>
>It is easy to add configurables to a module and play with them
>WITHOUT recompiling.

Thanks for reminding me - I'll look into it on the plane and see what I can
do with it.

>e.g. My important matlab calculation, which runs in user mode
>should not be killed. But killing a local webserver, which serves
>my help system is ok (because I will not lose work, and might
>get it over the net, if there is a problem).
>
>So as Rik stated: The OOM killer cannot suit all people, so it
>has to be configurable, to be OOM kill, not overkill ;-)

Yes, configurability is probably a very good idea.  However, it would be
best to include a good set of general parameters in the kernel itself, so
the set of average systems needs as little tweaking as possible.  One
cannot expect every sysadmin to be familiar with these arcane (and rarely
actually used) parameters, so being able to select "server", "batch",
"workstation", "embedded" and so on would help massively.





* Re: OOM killer???
  2001-03-27 18:08     ` OOM killer??? Ingo Oeser
@ 2001-03-27 19:07       ` Martin Dalecki
  2001-03-27 19:55         ` Andreas Dilger
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Dalecki @ 2001-03-27 19:07 UTC (permalink / raw)
  To: Ingo Oeser; +Cc: Jonathan Morton, Rogier Wolff, linux-kernel

Ingo Oeser wrote:
> 
> On Tue, Mar 27, 2001 at 03:24:16PM +0200, Martin Dalecki wrote:
> > > @@ -93,6 +95,10 @@
> > >                                 p->uid == 0 || p->euid == 0)
> > >                 points /= 4;
> > >
> > > +       /* Much the same goes for processes with low UIDs */
> > > +       if(p->uid < 100 || p->euid < 100)
> > > +         points /= 2;
> > > +
> >
> > Please change the 100 to 500 - this would make it consistent with
> > the useradd command, which starts adding new users at UID 500.
> 
> No, useradd usually reads /etc/login.defs to select the range.
> The oom-killer should have configurables for that, to allow the
> policy decisions in USER space -- where it belongs -- not in KERNEL space

OK, sysctl would be more appropriate.

> If we use my OOM killer API, this patch would be a module and
> could have module parameters to select that.
> 
> Jonathan: I URGE you to apply my patch before adding OOM killer
>    stuff. What's wrong with it, that you cannot use it? ;-)
> 
> It is easy to add configurables to a module and play with them
> WITHOUT recompiling.

It's total overkill and therefore not a good design.

> Dynamic sysctl tables would also be possible, IF we had a value
> that is DEFINED to be invalid for sysctl(2) and only valid for /proc.
> 
> It is also better to include the egid in the decision. There
> are daemons that I definitely want to be killed on a workstation,
> but not on a server.
> 
> e.g. My important matlab calculation, which runs in user mode
> should not be killed. But killing a local webserver, which serves
> my help system is ok (because I will not lose work, and might
> get it over the net, if there is a problem).
> 
> So as Rik stated: The OOM killer cannot suit all people, so it
> has to be configurable, to be OOM kill, not overkill ;-)

Irony: Why then not store this information permanently - inside
the UID of the application?


* Re: OOM killer???
  2001-03-27 19:07       ` Martin Dalecki
@ 2001-03-27 19:55         ` Andreas Dilger
  2001-03-27 21:13           ` Andreas Rogge
  0 siblings, 1 reply; 13+ messages in thread
From: Andreas Dilger @ 2001-03-27 19:55 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Ingo Oeser, Jonathan Morton, Rogier Wolff, linux-kernel

Martin Dalecki writes:
> Ingo Oeser wrote:
> > So as Rik stated: The OOM killer cannot suit all people, so it
> > has to be configurable, to be OOM kill, not overkill ;-)
> 
> Irony: Why then not store this information permanently - inside
> the UID of the application?

Because in some cases (large companies and such) the UID is centrally
controlled across all machines in the company, so there are > 100 (or
500 or 1000) "system" UIDs.  At one company I did work for, there were
dozens (maybe > 100) oracle instances alone (each with different UID
and passwords for security), and lots more "system" application UIDs,
each unique.

Encoding more information into the UID is getting back to the bad old
days of "uid 0 can do anything", rather than the capability model we
are working towards.  Even so, encoding process killability info in the
UID is _still_ not putting policy in user space, because if you don't
like how the OOM killer works you still need to recompile and reboot.

Having a configurable OOM killer is not overkill, IMHO, because it is
only called in very rare cases (i.e. OOM is hopefully a rare event),
so it is definitely not on the fast path.  I'm sure people will agree
that spending a few extra cycles to kill the correct process is far
better than killing a lot of incorrect processes quickly.

Every time this subject comes up, I point to AIX and SIGDANGER - a signal
sent to processes when the system gets OOM.  If the process has registered
a SIGDANGER handler, it has a chance to free cache and such (or do a clean
shutdown), otherwise the default signal handler will kill the process.

SIGDANGER would fix the original problem (killing a numerical methods
application that has been running for weeks) perfectly - the application
can freely allocate cache memory to speed up the calculations.  When the
system gets OOM (for whatever reason), it sends SIGDANGER to applications
first and they can free buffers or do a safe shutdown, and this may get
the system out of OOM without having to kill anything.

Granted, I'm not against fixing the VM to reduce OOM conditions in the
first place.  Having SIGDANGER still gives the application a chance to
save itself before it is killed, which none of the OOM changes have
addressed at all.  It is _still_ possible to get a system into OOM from
network buffers and such, regardless of whether an application's
malloc() calls return NULL or not.

Also, having a SIGDANGER handler _could_ reduce a process's "badness"
value when looking for processes to SIGKILL, when calling all of the
SIGDANGER handlers has not freed enough memory to get out of OOM.  This
assumes that programs which register SIGDANGER handlers are important,
rather than malicious (in which case your system has other problems).
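
For illustration, this is roughly what the application side could look like. It is only a sketch: Linux has no SIGDANGER, so SIGUSR1 stands in for it here, and the cache helpers and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Linux has no SIGDANGER; SIGUSR1 is used purely as a stand-in. */
#define LOWMEM_SIGNAL SIGUSR1

static volatile sig_atomic_t lowmem_flag = 0;
static void *cache = NULL;   /* optional cache the program can live without */

/* Handler only records the warning; the real work happens outside it. */
static void on_lowmem(int signo)
{
    (void)signo;
    lowmem_flag = 1;
}

/* Register the handler; returns 0 on success, -1 on failure. */
int lowmem_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_lowmem;
    sigemptyset(&sa.sa_mask);
    return sigaction(LOWMEM_SIGNAL, &sa, NULL);
}

/* (Re)allocate the optional cache; returns 0 on success, -1 on failure. */
int cache_fill(size_t bytes)
{
    free(cache);
    cache = malloc(bytes);
    return cache ? 0 : -1;
}

/* Returns 1 if the cache is currently allocated, else 0. */
int cache_present(void)
{
    return cache != NULL;
}

/* Call from the main loop: if a warning arrived, drop the cache.
 * Returns 1 if memory was released, 0 if there was nothing to do. */
int lowmem_poll(void)
{
    if (!lowmem_flag)
        return 0;
    lowmem_flag = 0;
    free(cache);
    cache = NULL;
    return 1;
}
```

A long-running calculation would call lowmem_install() once at startup and lowmem_poll() at a convenient point in each iteration, so the cache is given back without losing the computation itself.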

Cheers, Andreas
-- 
Andreas Dilger  \ "If a man ate a pound of pasta and a pound of antipasto,
                 \  would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/               -- Dogbert


* Re: OOM killer???
  2001-03-27 19:55         ` Andreas Dilger
@ 2001-03-27 21:13           ` Andreas Rogge
  0 siblings, 0 replies; 13+ messages in thread
From: Andreas Rogge @ 2001-03-27 21:13 UTC (permalink / raw)
  To: Andreas Dilger, Martin Dalecki
  Cc: Ingo Oeser, Jonathan Morton, Rogier Wolff, linux-kernel

--On Tuesday, March 27, 2001 12:55:50 -0700 Andreas Dilger 
<adilger@turbolinux.com> wrote:

> Every time this subject comes up, I point to AIX and SIGDANGER - a signal
> sent to processes when the system gets OOM.  If the process has registered
> a SIGDANGER handler, it has a chance to free cache and such (or do a clean
> shutdown), otherwise the default signal handler will kill the process.

Having a SIGDANGER would be a fine thing, but this will need patching in all
current daemons and there has to be a possibility to configure the behaviour
of the process when receiving a SIGDANGER, i.e. it is a good idea to kill
apache on a workstation, but a very bad idea to kill apache on a webserver.
Generally I'd like to see such an implementation, but wouldn't it be better
to have a pre-selection of the processes getting SIGDANGER?

For example: if OOM occurs, send SIGDANGER to all non-root processes with a
nice-level of n or higher (where n should be discussed).

This would make it easy to "configure" SIGDANGER-unaware applications - in
the meantime, until all applications are SIGDANGER-aware - to deal with
OOM situations. You just do a "nice -n -1 httpd" and one's httpd won't
get killed when OOM occurs.

IMO this would dramatically improve the OOM-Problems right now.
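
The pre-selection rule proposed above fits in one small predicate. This is a sketch only; the function name and the choice to test the effective UID are assumptions, and the threshold n is left as a parameter because it is still open for discussion.

```c
#include <stdbool.h>

/* Hypothetical predicate: should this process be sent the low-memory
 * warning?  Root-owned processes are skipped, and so is anything that
 * runs at a nice level below the threshold. */
bool gets_warning(unsigned int euid, int nice_level, int nice_threshold)
{
    if (euid == 0)
        return false;
    return nice_level >= nice_threshold;
}
```

With a threshold of 0, an httpd started at nice level -1 is exempt, while an ordinary user process at the default nice level of 0 is not.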

--
Andreas Rogge <lu01@rogge.yi.org>
Available on IRCnet:#linux.de as Dyson

