Only 10 MB/sec with via 82c686b chipset?

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* Only 10 MB/sec with via 82c686b chipset?
@ 2001-03-21  3:48 SodaPop
  2001-03-21 13:48 ` egger
  2001-03-21 14:18 ` Only 10 MB/sec with via 82c686b chipset? Jonathan Morton
  0 siblings, 2 replies; 162+ messages in thread
From: SodaPop @ 2001-03-21  3:48 UTC (permalink / raw)
  To: linux-kernel

Only 10 MB/sec with via 82c686b chipset?

I have an IWill KK-266R motherboard with an athlon-c 1200
processor in it, and for the life of me I can't get more than
10 MB/sec through the on-board ide controller.  Yes, all the
appropriate support is turned on in the kernel to enable dma
and specific chipset support, and yes, I think I have all
relevant patches and a reasonable kernel.

I started out with stock kernel 2.4.2.  I later added Hedrick's
ide.2.4.3-p4.all.03132001.patch, which did not change the
behaviour other than to include messages in the dmesg output. I
have even tried removing the via-specific chipset support from
the kernel, only to find that enabling dma with generic support
leaves me at the same 10 MB/sec as the specific support does.

I noted a number of other interesting things;  one, that -X33,
-X34, and -X64 through -X69 all have the same 10 MB/sec transfer
rate, and two, that the 10 MB/sec transfer rate can be linearly
increased to 12 MB/sec by raising the system bus from 100 mhz to
120 mhz (all components are safely rated at 133, no overclocking
involved.)

It is also quite strange that I have been able to run 'hdparm
-t /dev/hda' and 'hdparm -t /dev/hdb' concurrently and can still
get the full 10 MB/sec on both, for a sum total of 20 MB/sec. I
would have expected that the drives would clobber each other
given that they are on the same ide bus.

>From the /proc data and other information below, it seems to me
that this is some kind of screwball tuning issue between linux and
the chipset, not the chipset and the drives.  As near as I can
tell, the chipset is talking to the drives at a much higher data
rate than 10 MB/sec, but for some reason linux isn't able to
process the data any faster than that.  Running hdparm -t in
parallel and observing a speed increase from raising the cpu
clock leads me in that direction.  (Also note that hdparm -t only
uses a few percent of cpu.  It's not like the machine doesn't
have enough processing power.)

I'm really baffled at this point.  I can't rule out that I have
done something dumb, but for the life of me I can't think of
anything else.  I've been to a number of web pages, but the
general consensus seems to be that this chipset should just work,
and work beautifully without any trouble.  There aren't any
fixes because I seem to be the only one having this problem.

Does anyone have any other ideas?

-dennis T





Misc hardware information:
----------------------------------------------------
The board has raid hardware on it, but its currently disabled with
jumpers.  Cables are high quality 80 wire/40 pin cables.  Both
drives are the same, but currently hdb has the 32 gig clip jumper
attached.  Putting the drives on separate ide busses does not
change the 10 MB/sec throughput.  Removing hdb from the chain
does not raise the throughput of hda.  Both drives are rated at
37 MB/sec continuous DTR.  Hdd is a cheap 8x cdrom.

Hdb, when installed in a nearby machine, has a 17 MB/sec data
rate.  The machine is an AMD K6-II 500, 100 MHz bus, with via
82c586 chipset, running kernel 2.4.1 (via chipset support
enabled.)




Various miscellaneous data, and dmesg output:
----------------------------------------------------

root@gurney:~# cat /proc/ide/via
----------VIA BusMastering IDE Configuration----------------
Driver Version:                     3.20
South Bridge:                       VIA vt82c686b
Revision:                           ISA 0x40 IDE 0x6
BM-DMA base:                        0xd000
PCI clock:                          33MHz
Master Read  Cycle IRDY:            0ws
Master Write Cycle IRDY:            0ws
BM IDE Status Register Read Retry:  yes
Max DRDY Pulse Width:               No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush:          yes                 yes
End Sector FIFO flush:         no                  no
Prefetch Buffer:              yes                 yes
Post Write Buffer:            yes                  no
Enabled:                      yes                 yes
Simplex only:                  no                  no
Cable Type:                   80w                 40w
-------------------drive0----drive1----drive2----drive3-----
Transfer Mode:       UDMA      UDMA       PIO       PIO
Address Setup:       30ns      30ns     120ns      60ns
Cmd Active:          90ns      90ns      90ns      90ns
Cmd Recovery:        30ns      30ns      90ns      90ns
Data Active:         90ns      90ns     330ns     240ns
Data Recovery:       30ns      30ns     270ns     240ns
Cycle Time:          20ns      20ns      50ns      90ns
Transfer Rate:  100.0MB/s 100.0MB/s  40.0MB/s  22.2MB/s

----------------------------------------------------
root@gurney:~# hdparm /dev/hda

/dev/hda:
 multcount    =  0 (off)
 I/O support  =  1 (32-bit)
 unmaskirq    =  1 (on)
 using_dma    =  1 (on)
 keepsettings =  0 (off)
 nowerr       =  0 (off)
 readonly     =  0 (off)
 readahead    =  8 (on)
 geometry     = 5606/255/63, sectors = 90069840, start = 0

----------------------------------------------------
root@gurney:~# hdparm -i /dev/hda

/dev/hda:

 Model=IBM-DTLA-307045, FwRev=TX6OA60A, SerialNo=YMEYMNF7564
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=40
 BuffType=DualPortCache, BuffSize=1916kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=90069840
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5

----------------------------------------------------
root@gurney:~# hdparm -T /dev/hda

/dev/hda:
 Timing buffer-cache reads:   128 MB in  0.88 seconds =145.45 MB/sec

----------------------------------------------------
root@gurney:~# hdparm -t /dev/hda

/dev/hda:
 Timing buffered disk reads:  64 MB in  6.11 seconds = 10.47 MB/sec

----------------------------------------------------
root@gurney:~# dmesg
Linux version 2.4.2 (root@gurney) (gcc version 2.95.2 19991024 (release)) #4 Tue Mar 20 18:02:16 CST 2001
BIOS-provided physical RAM map:
 BIOS-e820: 000000000009fc00 @ 0000000000000000 (usable)
 BIOS-e820: 0000000000000400 @ 000000000009fc00 (usable)
 BIOS-e820: 0000000000010000 @ 00000000000f0000 (reserved)
 BIOS-e820: 0000000000010000 @ 00000000ffff0000 (reserved)
 BIOS-e820: 000000000fef0000 @ 0000000000100000 (usable)
 BIOS-e820: 000000000000d000 @ 000000000fff3000 (ACPI data)
 BIOS-e820: 0000000000003000 @ 000000000fff0000 (ACPI NVS)
On node 0 totalpages: 65520
zone(0): 4096 pages.
zone(1): 61424 pages.
zone(2): 0 pages.
Kernel command line: BOOT_IMAGE=2.4.2 ro root=304
Initializing CPU#0
Detected 898.848 MHz processor.
Console: colour VGA+ 80x34
Calibrating delay loop... 1789.13 BogoMIPS
Memory: 255656k/262080k available (903k kernel code, 6036k reserved, 337k data,
192k init, 0k highmem)
Dentry-cache hash table entries: 32768 (order: 6, 262144 bytes)
Buffer-cache hash table entries: 16384 (order: 4, 65536 bytes)
Page-cache hash table entries: 65536 (order: 6, 262144 bytes)
Inode-cache hash table entries: 16384 (order: 5, 131072 bytes)
CPU: Before vendor init, caps: 0183f9ff c1c7f9ff 00000000, vendor = 2
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 256K (64 bytes/line)
CPU: After vendor init, caps: 0183f9ff c1c7f9ff 00000000 00000000
CPU: After generic, caps: 0183f9ff c1c7f9ff 00000000 00000000
CPU: Common caps: 0183f9ff c1c7f9ff 00000000 00000000
CPU: AMD Athlon(tm) Processor stepping 02
Enabling fast FPU save and restore... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.37 (20001109) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfb240, last bus=1
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Bus master read caching disabled
Unknown bridge resource 0: assuming transparent
PCI: Using IRQ router VIA [1106/0686] at 00:07.0
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Starting kswapd v1.8
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
block: queued sectors max/low 169880kB/56626kB, 512 slots per queue
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: IDE controller on PCI bus 00 dev 39
VP_IDE: chipset revision 6
VP_IDE: not 100% native mode: will probe irqs later
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
VP_IDE: VIA vt82c686b (rev 40) IDE UDMA100 controller on pci00:07.1
    ide0: BM-DMA at 0xd000-0xd007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xd008-0xd00f, BIOS settings: hdc:pio, hdd:DMA
hda: IBM-DTLA-307045, ATA DISK drive
hdb: IBM-DTLA-307045, ATA DISK drive
hdd: HITACHI CDR-7930, ATAPI CD/DVD-ROM drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
hda: 90069840 sectors (46116 MB) w/1916KiB Cache, CHS=5606/255/63, UDMA(100)
hdb: 66055248 sectors (33820 MB) w/1916KiB Cache, CHS=65531/16/63, UDMA(100)
hdd: ATAPI 8X CD-ROM drive, 128kB Cache, DMA
Uniform CD-ROM driver Revision: 3.12
Partition check:
 hda: hda1 hda2 hda3 hda4
 hdb: hdb1 hdb2
Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI enabled
ttyS00 at 0x03f8 (irq = 4) is a 16550A
Real Time Clock Driver v1.10d
8139too Fast Ethernet driver 0.9.13 loaded
PCI: Found IRQ 10 for device 00:0b.0
PCI: The same IRQ used for device 00:0f.0
eth0: RealTek RTL8139 Fast Ethernet at 0xd0800000, 00:e0:7d:95:af:a6, IRQ 10
eth0:  Identified 8139 chip type 'RTL-8139C'
Linux agpgart interface v0.99 (c) Jeff Hartmann
agpgart: Maximum main memory to use for agp memory: 203M
agpgart: Detected Via Apollo Pro KT133 chipset
agpgart: AGP aperture is 64M @ 0xd8000000
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 2048 buckets, 16Kbytes
TCP: Hash tables configured (established 16384 bind 16384)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
reiserfs: checking transaction log (device 03:04) ...
Using r5 hash to sort names
ReiserFS version 3.6.25
VFS: Mounted root (reiserfs filesystem) readonly.
Freeing unused kernel memory: 192k freed
Adding Swap: 530136k swap-space (priority -1)
hda: Write Cache SUCCESSED Flushing!<6>hda: Write Cache SUCCESSED Flushing!<6>usb.c: registered new driver usbdevfs
usb.c: registered new driver hub




^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b chipset?
  2001-03-21  3:48 Only 10 MB/sec with via 82c686b chipset? SodaPop
@ 2001-03-21 13:48 ` egger
  2001-03-22  2:14   ` TimO
                     ` (2 more replies)
  2001-03-21 14:18 ` Only 10 MB/sec with via 82c686b chipset? Jonathan Morton
  1 sibling, 3 replies; 162+ messages in thread
From: egger @ 2001-03-21 13:48 UTC (permalink / raw)
  To: soda; +Cc: linux-kernel

On 20 Mar, SodaPop wrote:

> I have an IWill KK-266R motherboard with an athlon-c 1200
> processor in it, and for the life of me I can't get more than
> 10 MB/sec through the on-board ide controller.  Yes, all the
> appropriate support is turned on in the kernel to enable dma
> and specific chipset support, and yes, I think I have all
> relevant patches and a reasonable kernel.

 Yes, actually I'm seeing the same on a KT133 board from Elitegroup.
 Although here I get a bit more: 15 MB/s

> I noted a number of other interesting things;  one, that -X33,
> -X34, and -X64 through -X69 all have the same 10 MB/sec transfer
> rate, and two, that the 10 MB/sec transfer rate can be linearly
> increased to 12 MB/sec by raising the system bus from 100 mhz to
> 120 mhz (all components are safely rated at 133, no overclocking
> involved.)

 Duh, before making such a claim you should consider the fact that
 this is overclocking your PCI/AGP bus and I have yet to see any
 graphic cards/IDE controllers/other devices which are rated for
 37MHz PCI bus speed.

-- 

Servus,
       Daniel


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b chipset?
  2001-03-21 13:48 ` egger
@ 2001-03-22  2:14   ` TimO
  2001-03-23  2:38   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
  2001-03-23  2:59   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
  2 siblings, 0 replies; 162+ messages in thread
From: TimO @ 2001-03-22  2:14 UTC (permalink / raw)
  To: linux-kernel

egger@suse.de wrote:
> 
> On 20 Mar, SodaPop wrote:
> 
> > I have an IWill KK-266R motherboard with an athlon-c 1200
> > processor in it, and for the life of me I can't get more than
> > 10 MB/sec through the on-board ide controller.  Yes, all the
> > appropriate support is turned on in the kernel to enable dma
> > and specific chipset support, and yes, I think I have all
> > relevant patches and a reasonable kernel.
> 
>  Yes, actually I'm seeing the same on a KT133 board from Elitegroup.
>  Although here I get a bit more: 15 MB/s
> 
> > I noted a number of other interesting things;  one, that -X33,
> > -X34, and -X64 through -X69 all have the same 10 MB/sec transfer
> > rate, and two, that the 10 MB/sec transfer rate can be linearly
> > increased to 12 MB/sec by raising the system bus from 100 mhz to
> > 120 mhz (all components are safely rated at 133, no overclocking
> > involved.)
> 
>  Duh, before making such a claim you should consider the fact that
>  this is overclocking your PCI/AGP bus and I have yet to see any
>  graphic cards/IDE controllers/other devices which are rated for
>  37MHz PCI bus speed.
> 
> --
> 
> Servus,
>        Daniel

Actually this depends on the MB.  On mine for instance (Athlon also, but
not IWill), the PCI bus is a quotient of the oscillator and the FSB is
a multiple of the PCI and the CPU & ev6 are multiples of the FSB. 
Memory
speed is also a multibple of PCI.  In this case increasing the FSB 
doesn't increase the PCI.  Mine has two crystals 100/133 jumper con-
figurable.

Anyway this is probably getting way off-topic; although I'd kinda like
to see Dennis' output of both buffered and sustained output.  Looks
kinda
like a hw config problem.  Mine for comparison:

Maxtor 13.4 Gig 5400rpm ATA66
Buffered:		~170 - 175 MB
Sustained:		~24 MB inner  ~26 MB outer  
PCI: 33    FSB:  100    Memory 128M@133    CPU:  750Mhz    EV6:  200

===============
-- Tim
--------------------==============++==============--------------------

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b - FIXED
  2001-03-21 13:48 ` egger
  2001-03-22  2:14   ` TimO
@ 2001-03-23  2:38   ` SodaPop
  2001-03-23  9:48     ` Alan Cox
  2001-03-23  2:59   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
  2 siblings, 1 reply; 162+ messages in thread
From: SodaPop @ 2001-03-23  2:38 UTC (permalink / raw)
  To: egger; +Cc: linux-kernel

Regarding the overclocking of the PCI bus, I was not aware of this.  The
documentation led me to believe the pci clock was fixed, however further
experimentation indicates that's clearly not the case.  Thanks.

Regarding the fix:  I installed an Ensonique AudioPci sound card, and
experienced horrible distortion, crackling, and high pitched chirps any
time I tried to use the device.  I noticed that various interrupts were
causing chunks of the real audio to sometimes slip through; on a whim I
tried ping flooding a nearby machine and the sound quality improved
greatly.

Putting two and two together, it occurred to me that the motherboard was
having irq/interrupt routing problems.  The disks could not get reasonable
throughput because the interrupts were getting choked or held up, and the
sound card couldn't properly function either.

Wonder of wonders, I flashed the bios to the latest and greatest version.
Current data transfer rates are 35.7 MB/sec on both udma drives, exactly
as expected and darn close to the continuous read limits of the disks.
The audio also started working, flawlessly.

There are other issues however - the athlon now runs significantly hotter
at idle for one, but the most serious is that the K7 kernel optimizations
cause horrendous kernel panics and crashes.  I'm running now on a kernel
compiled for 386, which seems to be stable.  I'll attempt to build other
kernels to see if I can figure out whats going on.

Net result:  IWill KK266 motherboards have bios problems, it may be a good
idea to upgrade the bios.

-dennis T

On Wed, 21 Mar 2001 egger@suse.de wrote:

> On 20 Mar, SodaPop wrote:
>
> > I have an IWill KK-266R motherboard with an athlon-c 1200
> > processor in it, and for the life of me I can't get more than
> > 10 MB/sec through the on-board ide controller.  Yes, all the
> > appropriate support is turned on in the kernel to enable dma
> > and specific chipset support, and yes, I think I have all
> > relevant patches and a reasonable kernel.
>
>  Yes, actually I'm seeing the same on a KT133 board from Elitegroup.
>  Although here I get a bit more: 15 MB/s
>
> > I noted a number of other interesting things;  one, that -X33,
> > -X34, and -X64 through -X69 all have the same 10 MB/sec transfer
> > rate, and two, that the 10 MB/sec transfer rate can be linearly
> > increased to 12 MB/sec by raising the system bus from 100 mhz to
> > 120 mhz (all components are safely rated at 133, no overclocking
> > involved.)
>
>  Duh, before making such a claim you should consider the fact that
>  this is overclocking your PCI/AGP bus and I have yet to see any
>  graphic cards/IDE controllers/other devices which are rated for
>  37MHz PCI bus speed.
>
> --
>
> Servus,
>        Daniel
>

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b - FIXED
  2001-03-23  2:38   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
@ 2001-03-23  9:48     ` Alan Cox
  2001-03-23 16:21       ` SodaPop
  2001-03-23 17:00       ` [PATCH] Prevent OOM from killing init SodaPop
  0 siblings, 2 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-23  9:48 UTC (permalink / raw)
  To: SodaPop; +Cc: egger, linux-kernel

> Wonder of wonders, I flashed the bios to the latest and greatest version.
> Current data transfer rates are 35.7 MB/sec on both udma drives, exactly
> as expected and darn close to the continuous read limits of the disks.
> The audio also started working, flawlessly.
> 
> There are other issues however - the athlon now runs significantly hotter
> at idle for one, but the most serious is that the K7 kernel optimizations
> cause horrendous kernel panics and crashes.  I'm running now on a kernel
> compiled for 386, which seems to be stable.  I'll attempt to build other
> kernels to see if I can figure out whats going on.

Check the bios update didnt leave some of the other configuration values
wrong. A 'reset to factory defaults' and resetting the stuff you need might
be a good idea. Could be it now has voltages wrong or something like that

Alan


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b - FIXED
  2001-03-23  9:48     ` Alan Cox
@ 2001-03-23 16:21       ` SodaPop
  2001-03-23 17:00       ` [PATCH] Prevent OOM from killing init SodaPop
  1 sibling, 0 replies; 162+ messages in thread
From: SodaPop @ 2001-03-23 16:21 UTC (permalink / raw)
  To: Alan Cox; +Cc: egger, linux-kernel

Thanks Alan, but no dice.  Most of the stuff on the board is autodetect
anyway, but even after a reset it exhibited the same behaviour.

Windows, however, continues to run fine.  It's not properly set up with
all the various drivers installed though, so its probably running with the
equivalent of a 486 kernel as well.

I tried rebuilding with K6-II as the cpu type instead of K7, that works
fine.  Is there any data in particular you'd like me to collect or
experiments you'd like me to try?

-dennis T


On Fri, 23 Mar 2001, Alan Cox wrote:

> > Wonder of wonders, I flashed the bios to the latest and greatest version.
> > Current data transfer rates are 35.7 MB/sec on both udma drives, exactly
> > as expected and darn close to the continuous read limits of the disks.
> > The audio also started working, flawlessly.
> >
> > There are other issues however - the athlon now runs significantly hotter
> > at idle for one, but the most serious is that the K7 kernel optimizations
> > cause horrendous kernel panics and crashes.  I'm running now on a kernel
> > compiled for 386, which seems to be stable.  I'll attempt to build other
> > kernels to see if I can figure out whats going on.
>
> Check the bios update didnt leave some of the other configuration values
> wrong. A 'reset to factory defaults' and resetting the stuff you need might
> be a good idea. Could be it now has voltages wrong or something like that
>
> Alan
>


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  9:48     ` Alan Cox
  2001-03-23 16:21       ` SodaPop
@ 2001-03-23 17:00       ` SodaPop
  2001-03-23 18:42         ` Martin Dalecki
  2001-03-23 19:19         ` Jonathan Morton
  1 sibling, 2 replies; 162+ messages in thread
From: SodaPop @ 2001-03-23 17:00 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

Rik, is there any way we could get a /proc entry for this, so that one
could do something like:

cat /proc/oom-kill-scores | sort +3

to get a process list (similar to ps) with a field for the current oom
scores?  It would likely be very useful to be able to dump the current
scores and see what will be the first thing to die, and may help people
tune the killer for specific uses.

Part of the current problem with the OOM killer is that people only know
what it's going to kill after it's too late.

-dennis T

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:00       ` [PATCH] Prevent OOM from killing init SodaPop
@ 2001-03-23 18:42         ` Martin Dalecki
  2001-03-23 20:25           ` SodaPop
  2001-03-23 19:19         ` Jonathan Morton
  1 sibling, 1 reply; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 18:42 UTC (permalink / raw)
  To: SodaPop; +Cc: Rik van Riel, linux-kernel

SodaPop wrote:
> 
> Rik, is there any way we could get a /proc entry for this, so that one
> could do something like:

I will respond; NO there is no way for security reasons this is not a
good idea.
 
> cat /proc/oom-kill-scores | sort +3

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:42         ` Martin Dalecki
@ 2001-03-23 20:25           ` SodaPop
  2001-03-23 20:33             ` Martin Dalecki
  0 siblings, 1 reply; 162+ messages in thread
From: SodaPop @ 2001-03-23 20:25 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Rik van Riel, linux-kernel

On Fri, 23 Mar 2001, Martin Dalecki wrote:

> SodaPop wrote:
> >
> > Rik, is there any way we could get a /proc entry for this, so that one
> > could do something like:
>
> I will respond; NO there is no way for security reasons this is not a
> good idea.
>
> > cat /proc/oom-kill-scores | sort +3

Oh, you mean like /proc/kcore is a bad idea for security reasons?

Duh, make its permission bits 400.

-dennis T



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:25           ` SodaPop
@ 2001-03-23 20:33             ` Martin Dalecki
  0 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 20:33 UTC (permalink / raw)
  To: SodaPop; +Cc: Rik van Riel, linux-kernel

SodaPop wrote:
> 
> On Fri, 23 Mar 2001, Martin Dalecki wrote:
> 
> > SodaPop wrote:
> > >
> > > Rik, is there any way we could get a /proc entry for this, so that one
> > > could do something like:
> >
> > I will respond; NO there is no way for security reasons this is not a
> > good idea.
> >
> > > cat /proc/oom-kill-scores | sort +3
> 
> Oh, you mean like /proc/kcore is a bad idea for security reasons?

Yes. It should be the good old /dev/core anyway.
But its far more obscure to hack at, since it isn't plain text,
so basically it's far more difficult to get mands on it...

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:00       ` [PATCH] Prevent OOM from killing init SodaPop
  2001-03-23 18:42         ` Martin Dalecki
@ 2001-03-23 19:19         ` Jonathan Morton
  1 sibling, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-23 19:19 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: linux-kernel

>> Rik, is there any way we could get a /proc entry for this, so that one
>> could do something like:
>
>I will respond; NO there is no way for security reasons this is not a
>good idea.

Just out of interest, what information does the OOM score expose that isn't
already available to Joe Random Unprivileged User?  Looking at my 2.4.1
source, nothing.  The badness() function uses the following:

- memory size
- run time
- cpu time
- nice value
- if it's a root process
- (rare) if process has direct hardware access

Apart from the last item, which is rarely encountered, all the above info
is available using 'top' or 'ps' or via the /proc filesystem already, by
any unprivileged user (unless you've make /proc su-access only, in which
case your point is moot anyway).

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b - FIXED
  2001-03-21 13:48 ` egger
  2001-03-22  2:14   ` TimO
  2001-03-23  2:38   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
@ 2001-03-23  2:59   ` SodaPop
  2 siblings, 0 replies; 162+ messages in thread
From: SodaPop @ 2001-03-23  2:59 UTC (permalink / raw)
  To: egger; +Cc: linux-kernel

Further note - kernels built with K6-2 support seem to be just fine.  But
all athlon/K7 kernels die horribly, with greatly varying death messages.
Most commonly I get bogus pointer/dereference errors and eventually init
gets killed, other times it just locks up, sometimes I get things like
'cannot exec syslogd: Out of memory'.  It looks like the memory registers
are horked up somehow.

I could try to copy some of this out by hand if anyone thought it
worthwhile.  Either way, I think IWill has some work to do yet on their
system bios.

-dennis T




On Wed, 21 Mar 2001 egger@suse.de wrote:

> On 20 Mar, SodaPop wrote:
>
> > I have an IWill KK-266R motherboard with an athlon-c 1200
> > processor in it, and for the life of me I can't get more than
> > 10 MB/sec through the on-board ide controller.  Yes, all the
> > appropriate support is turned on in the kernel to enable dma
> > and specific chipset support, and yes, I think I have all
> > relevant patches and a reasonable kernel.
>
>  Yes, actually I'm seeing the same on a KT133 board from Elitegroup.
>  Although here I get a bit more: 15 MB/s
>
> > I noted a number of other interesting things;  one, that -X33,
> > -X34, and -X64 through -X69 all have the same 10 MB/sec transfer
> > rate, and two, that the 10 MB/sec transfer rate can be linearly
> > increased to 12 MB/sec by raising the system bus from 100 mhz to
> > 120 mhz (all components are safely rated at 133, no overclocking
> > involved.)
>
>  Duh, before making such a claim you should consider the fact that
>  this is overclocking your PCI/AGP bus and I have yet to see any
>  graphic cards/IDE controllers/other devices which are rated for
>  37MHz PCI bus speed.
>
> --
>
> Servus,
>        Daniel
>


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b chipset?
  2001-03-21  3:48 Only 10 MB/sec with via 82c686b chipset? SodaPop
  2001-03-21 13:48 ` egger
@ 2001-03-21 14:18 ` Jonathan Morton
  2001-03-21 17:34   ` egger
  1 sibling, 1 reply; 162+ messages in thread
From: Jonathan Morton @ 2001-03-21 14:18 UTC (permalink / raw)
  To: egger; +Cc: linux-kernel

> Duh, before making such a claim you should consider the fact that
> this is overclocking your PCI/AGP bus and I have yet to see any
> graphic cards/IDE controllers/other devices which are rated for
> 37MHz PCI bus speed.

The "blue and white" PowerMac G3 and certain early PowerMac G4s used a
66MHz PCI card for graphics in lieu of proper AGP.  66MHz PCI is used in
certain high-end workstations, as well, but it's not normally found on
consumer-level devices.

Look at 'lspci -vvv' output for the "66MHz" flag on the devices listed
there - all the ones in my Duron system leave it unset, except for my (very
recent and pretty nippy) SCSI controller and (AGP) video card.

That said, *most* PCI devices don't like being overclocked, and it's not
well known that pushing the system bus also pushes the PCI and ISA buses in
the same manner.  A friend of mine had *severe* locking problems with his
system when he inserted his cheap SCSI adapter into his overclocked
machine, even though the other cards handled it OK (relatively speaking -
I'm not convinced).  I don't know how far he'd overclocked it, but 37MHz
kinda rings true.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: Only 10 MB/sec with via 82c686b chipset?
  2001-03-21 14:18 ` Only 10 MB/sec with via 82c686b chipset? Jonathan Morton
@ 2001-03-21 17:34   ` egger
  0 siblings, 0 replies; 162+ messages in thread
From: egger @ 2001-03-21 17:34 UTC (permalink / raw)
  To: chromi; +Cc: linux-kernel

On 21 Mar, Jonathan Morton wrote:

> The "blue and white" PowerMac G3 and certain early PowerMac G4s used a
> 66MHz PCI card for graphics in lieu of proper AGP.  66MHz PCI is used
> in certain high-end workstations, as well, but it's not normally found
> on consumer-level devices.

> Look at 'lspci -vvv' output for the "66MHz" flag on the devices listed
> there - all the ones in my Duron system leave it unset, except for my
> (very recent and pretty nippy) SCSI controller and (AGP) video card.

 Well, I wasn't talking about 66Mhz nor about 64bit cards but rather 
 normal consumer cards which are specified for 33Mhz.

> That said, *most* PCI devices don't like being overclocked, and it's
> not well known that pushing the system bus also pushes the PCI and ISA
> buses in the same manner.  A friend of mine had *severe* locking
> problems with his system when he inserted his cheap SCSI adapter into
> his overclocked machine, even though the other cards handled it OK
> (relatively speaking - I'm not convinced).  I don't know how far he'd
> overclocked it, but 37MHz kinda rings true.

 Trying to enhance a systems performance by overclocking the bus is
 about the most stupid thing one can do as one ANY of the connected
 devices memory/CPU/chipset/PCI devices/AGP cards (just to name a
 few) have a high probability of failing and all those probabilites
 factor up which basically means that the system is in a unpredictable
 state which is never acceptable for any serious kind of system. 

 Better leave the overclocking of busses to kids who are bored
 by their live.

-- 

Servus,
       Daniel

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] Prevent OOM from killing init
@ 2001-03-21 22:54 Patrick O'Rourke
  2001-03-21 23:11 ` Eli Carter
  2001-03-21 23:48 ` Rik van Riel
  0 siblings, 2 replies; 162+ messages in thread
From: Patrick O'Rourke @ 2001-03-21 22:54 UTC (permalink / raw)
  To: linux-mm, linux-kernel

Since the system will panic if the init process is chosen by
the OOM killer, the following patch prevents select_bad_process()
from picking init.

Pat

--- xxx/linux-2.4.3-pre6/mm/oom_kill.c  Tue Nov 14 13:56:46 2000
+++ linux-2.4.3-pre6/mm/oom_kill.c      Wed Mar 21 15:25:03 2001
@@ -123,7 +123,7 @@

         read_lock(&tasklist_lock);
         for_each_task(p) {
-               if (p->pid) {
+               if (p->pid && p->pid != 1) {
                         int points = badness(p);
                         if (points > maxpoints) {
                                 chosen = p;

-- 
Patrick O'Rourke
978.606.0236
orourke@missioncriticallinux.com


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke
@ 2001-03-21 23:11 ` Eli Carter
  2001-03-21 23:40   ` Patrick O'Rourke
  2001-03-21 23:48 ` Rik van Riel
  1 sibling, 1 reply; 162+ messages in thread
From: Eli Carter @ 2001-03-21 23:11 UTC (permalink / raw)
  To: Patrick O'Rourke; +Cc: linux-mm, linux-kernel

Patrick O'Rourke wrote:
> 
> Since the system will panic if the init process is chosen by
> the OOM killer, the following patch prevents select_bad_process()
> from picking init.
> 
> Pat
> 
> --- xxx/linux-2.4.3-pre6/mm/oom_kill.c  Tue Nov 14 13:56:46 2000
> +++ linux-2.4.3-pre6/mm/oom_kill.c      Wed Mar 21 15:25:03 2001
> @@ -123,7 +123,7 @@
> 
>          read_lock(&tasklist_lock);
>          for_each_task(p) {
> -               if (p->pid) {
> +               if (p->pid && p->pid != 1) {
>                          int points = badness(p);
>                          if (points > maxpoints) {
>                                  chosen = p;
> 

Having not looked at the code... Why not "if( p->pid > 1 )"?  (Or can
p->pid can be negative?!, um, typecast to unsigned...)

Eli
-----------------------.           Rule of Accuracy: When working toward
Eli Carter             |            the solution of a problem, it always 
eli.carter(at)inet.com `------------------ helps if you know the answer.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:11 ` Eli Carter
@ 2001-03-21 23:40   ` Patrick O'Rourke
  0 siblings, 0 replies; 162+ messages in thread
From: Patrick O'Rourke @ 2001-03-21 23:40 UTC (permalink / raw)
  To: Eli Carter; +Cc: linux-mm, linux-kernel

Eli Carter wrote:

> Having not looked at the code... Why not "if( p->pid > 1 )"?  (Or can
> p->pid can be negative?!, um, typecast to unsigned...)

I simply mirrored the check done in do_exit():

	if (tsk->pid == 1)
		panic("Attempted to kill init!");

Since PID_MAX is 32768 I do not believe pids can be negative.

I suppose one could make an argument for skipping "daemons", i.e.
pids below 300 (see the get_pid() function in kernel/fork.c), but
I think that is a larger issue.

Pat

-- 
Patrick O'Rourke
978.606.0236
orourke@missioncriticallinux.com

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke
  2001-03-21 23:11 ` Eli Carter
@ 2001-03-21 23:48 ` Rik van Riel
  2001-03-22  8:14   ` Eric W. Biederman
                     ` (5 more replies)
  1 sibling, 6 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-21 23:48 UTC (permalink / raw)
  To: Patrick O'Rourke; +Cc: linux-mm, linux-kernel

On Wed, 21 Mar 2001, Patrick O'Rourke wrote:

> Since the system will panic if the init process is chosen by
> the OOM killer, the following patch prevents select_bad_process()
> from picking init.

One question ... has the OOM killer ever selected init on
anybody's system ?

I think that the scoring algorithm should make sure that
we never pick init, unless the system is screwed so badly
that init is broken or the only process left ;)

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
@ 2001-03-22  8:14   ` Eric W. Biederman
  2001-03-22  9:24     ` Rik van Riel
  2001-03-22 19:29     ` Philipp Rumpf
  2001-03-22 11:47   ` Guest section DW
                     ` (4 subsequent siblings)
  5 siblings, 2 replies; 162+ messages in thread
From: Eric W. Biederman @ 2001-03-22  8:14 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

Rik van Riel <riel@conectiva.com.br> writes:

> On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
> 
> > Since the system will panic if the init process is chosen by
> > the OOM killer, the following patch prevents select_bad_process()
> > from picking init.
> 
> One question ... has the OOM killer ever selected init on
> anybody's system ?
> 
> I think that the scoring algorithm should make sure that
> we never pick init, unless the system is screwed so badly
> that init is broken or the only process left ;)

Is there ever a case where killing init is the right thing to do?
My impression is that if init is selected the whole machine dies.
If you can kill init and still have a machine that mostly works,
then I guess it makes some sense not to kill it.

Guaranteeing not to select init can buy you piece of mind because
init if properly setup can put the machine back together again, while
not special casing init means something weird might happen and init
would be selected.

Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22  8:14   ` Eric W. Biederman
@ 2001-03-22  9:24     ` Rik van Riel
  2001-03-22 19:29     ` Philipp Rumpf
  1 sibling, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-22  9:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On 22 Mar 2001, Eric W. Biederman wrote:

> Is there ever a case where killing init is the right thing to do? My
> impression is that if init is selected the whole machine dies. If you
> can kill init and still have a machine that mostly works, then I guess
> it makes some sense not to kill it.
>
> Guaranteeing not to select init can buy you piece of mind because
> init if properly setup can put the machine back together again, while
> not special casing init means something weird might happen and init
> would be selected.

When something weird happens, it might be better to kill
init and have the machine reset itself after the panic
(echo 30 > /proc/sys/kernel/panic).

Killing all other things and leaving just init intact
makes for a machine which is as good as dead, without a
chance for recovery-by-reboot...

OTOH, I haven't heard of the OOM killer ever chosing init,
not even of people who tried creating these special kinds
of situations to trigger it on purpose.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22  8:14   ` Eric W. Biederman
  2001-03-22  9:24     ` Rik van Riel
@ 2001-03-22 19:29     ` Philipp Rumpf
  1 sibling, 0 replies; 162+ messages in thread
From: Philipp Rumpf @ 2001-03-22 19:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

On Thu, Mar 22, 2001 at 01:14:41AM -0700, Eric W. Biederman wrote:
> Rik van Riel <riel@conectiva.com.br> writes:
> Is there ever a case where killing init is the right thing to do?

There are cases where panic() is the right thing to do.  Broken init
is such a case.

> My impression is that if init is selected the whole machine dies.
> If you can kill init and still have a machine that mostly works,

you can't.

> Guaranteeing not to select init can buy you piece of mind because
> init if properly setup can put the machine back together again, while
> not special casing init means something weird might happen and init
> would be selected.

If we're in a situation where long-running processes with relatively
small VM are killed the box is very unlikely to be usable anyway.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
  2001-03-22  8:14   ` Eric W. Biederman
@ 2001-03-22 11:47   ` Guest section DW
  2001-03-22 15:01     ` Rik van Riel
                       ` (3 more replies)
  2001-03-22 14:53   ` Patrick O'Rourke
                     ` (3 subsequent siblings)
  5 siblings, 4 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-22 11:47 UTC (permalink / raw)
  To: Rik van Riel, Patrick O'Rourke; +Cc: linux-mm, linux-kernel

On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
> On Wed, 21 Mar 2001, Patrick O'Rourke wrote:

> > Since the system will panic if the init process is chosen by
> > the OOM killer, the following patch prevents select_bad_process()
> > from picking init.

There is a dozen other processes that must not be killed.
Init is just a random example.

> One question ... has the OOM killer ever selected init on
> anybody's system ?

Last week I installed SuSE 7.1 somewhere.
During the install: "VM: killing process rpm",
leaving the installer rather confused.
(An empty machine, 256MB, 144MB swap, I think 2.2.18.)

Last month I had a computer algebra process running for a week.
Killed. But this computation was the only task this machine had.
Its sole reason of existence.
Too bad - zero information out of a week's computation.
(I think 2.4.0.)

Clearly, Linux cannot be reliable if any process can be killed
at any moment. I am not happy at all with my recent experiences.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 11:47   ` Guest section DW
@ 2001-03-22 15:01     ` Rik van Riel
  2001-03-22 19:04       ` Guest section DW
  2001-03-22 16:41     ` Eric W. Biederman
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-22 15:01 UTC (permalink / raw)
  To: Guest section DW; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Thu, 22 Mar 2001, Guest section DW wrote:

> > One question ... has the OOM killer ever selected init on
> > anybody's system ?
> 
> Last week I installed SuSE 7.1 somewhere.
> During the install: "VM: killing process rpm",
> leaving the installer rather confused.
> (An empty machine, 256MB, 144MB swap, I think 2.2.18.)

That's the 2.2 kernel ...


> Last month I had a computer algebra process running for a week.
> Killed. But this computation was the only task this machine had.
> Its sole reason of existence.
> Too bad - zero information out of a week's computation.
> (I think 2.4.0.)
> 
> Clearly, Linux cannot be reliable if any process can be killed
> at any moment. I am not happy at all with my recent experiences.

Note that the OOM killer in 2.4 won't kick in until your machine
is out of both memory and swap, see mm/oom_kill.c::out_of_memory().

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 15:01     ` Rik van Riel
@ 2001-03-22 19:04       ` Guest section DW
  0 siblings, 0 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-22 19:04 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Thu, Mar 22, 2001 at 12:01:43PM -0300, Rik van Riel wrote:

> > Last month I had a computer algebra process running for a week.
> > Killed. But this computation was the only task this machine had.
> > Its sole reason of existence.
> > Too bad - zero information out of a week's computation.
> > 
> > Clearly, Linux cannot be reliable if any process can be killed
> > at any moment. I am not happy at all with my recent experiences.
> 
> Note that the OOM killer in 2.4 won't kick in until your machine
> is out of both memory and swap, see mm/oom_kill.c::out_of_memory().

Nevertheless, this process does malloc and malloc returns the requested
memory. If a malloc fails the computer algebra process has the choice
between various alternatives. Present a prompt, so that the user can
examine variables and intermediate results, or request a dump to disk
of the status of the computation. Or choose an alternative algorithm,
at some other point of the space-time tradeoff curve.
But no error return from malloc - just "Killed". Ach.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 11:47   ` Guest section DW
  2001-03-22 15:01     ` Rik van Riel
@ 2001-03-22 16:41     ` Eric W. Biederman
  2001-03-22 20:28     ` Stephen Clouse
  2001-03-23 17:26     ` James A. Sutherland
  3 siblings, 0 replies; 162+ messages in thread
From: Eric W. Biederman @ 2001-03-22 16:41 UTC (permalink / raw)
  To: Guest section DW
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

Guest section DW <dwguest@win.tue.nl> writes:

> On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
> > On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
> 
> > > Since the system will panic if the init process is chosen by
> > > the OOM killer, the following patch prevents select_bad_process()
> > > from picking init.
> 
> There is a dozen other processes that must not be killed.
> Init is just a random example.

Not killing init provides enough for recovery if you truly hit
an out of memory situation.  With 2.4.x at least it is a box
misconfiguration that causes it.   The 2.2.x VM doesn't always try
to swap, and free things up hard enough, before reporting out of
memory.  But even the 2.2.x problems are rare.

> 
> > One question ... has the OOM killer ever selected init on
> > anybody's system ?
> 
> Last week I installed SuSE 7.1 somewhere.
> During the install: "VM: killing process rpm",
> leaving the installer rather confused.
> (An empty machine, 256MB, 144MB swap, I think 2.2.18.)

swap < RAM. ouch!  This is a misconfiguration on a machine that
actually starts swapping, and where out of memory problems are a
reality.  The fact an installer would trigger swapping on a 256MB
machine is a second problem. 

> Last month I had a computer algebra process running for a week.
> Killed. But this computation was the only task this machine had.
> Its sole reason of existence.
> Too bad - zero information out of a week's computation.
> (I think 2.4.0.)

It looks like you didn't have enough resources on that machine
period.  I pretty much trust 2.4.x in this department.  Did that
machine also have it's swap misconfigured?

> 
> Clearly, Linux cannot be reliable if any process can be killed
> at any moment. I am not happy at all with my recent experiences.

Hmm.  It should definitely not be at any moment.  It should only be
when resources are exhausted.  So putting enough swap on a machine
should be enough, to stop this from ever happening.

Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 11:47   ` Guest section DW
  2001-03-22 15:01     ` Rik van Riel
  2001-03-22 16:41     ` Eric W. Biederman
@ 2001-03-22 20:28     ` Stephen Clouse
  2001-03-22 21:01       ` Ingo Oeser
                         ` (4 more replies)
  2001-03-23 17:26     ` James A. Sutherland
  3 siblings, 5 replies; 162+ messages in thread
From: Stephen Clouse @ 2001-03-22 20:28 UTC (permalink / raw)
  To: Guest section DW
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

[-- Attachment #1: msg.pgp --]
[-- Type: text/plain, Size: 1765 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 22, 2001 at 12:47:27PM +0100, Guest section DW wrote:
> Last week I installed SuSE 7.1 somewhere.
> During the install: "VM: killing process rpm",
> leaving the installer rather confused.
> (An empty machine, 256MB, 144MB swap, I think 2.2.18.)
> 
> Last month I had a computer algebra process running for a week.
> Killed. But this computation was the only task this machine had.
> Its sole reason of existence.
> Too bad - zero information out of a week's computation.
> (I think 2.4.0.)
> 
> Clearly, Linux cannot be reliable if any process can be killed
> at any moment. I am not happy at all with my recent experiences.

Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
logically justify annihilating large-VM processes that have been running for 
days or weeks instead of just returning ENOMEM to a process that just started 
up.

We run Oracle on a development box here, and it's always the first to get the
axe (non-root process using 70-80 MB VM).  Whenever someone's testing decides to 
run away with memory, I usually spend the rest of the day getting intimate with
the backup files, since SIGKILLing random Oracle processes, as you might have
guessed, has a tendency to rape the entire database.

It would be nice to give immunity to certain uids, or better yet, just turn the
damn thing off entirely.  I've already hacked that in...errr, out.

- -- 
Stephen Clouse <stephenc@theiqgroup.com>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOrpgbgOGqGs0PadnEQLp5QCfZMwtDZRNwYQ6RJX0MJ8lRVHTj3YAoNlt
pFWT2i+2y+Yze/6EYy9V0oaE
=QIrK
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 20:28     ` Stephen Clouse
@ 2001-03-22 21:01       ` Ingo Oeser
  2001-03-22 21:23       ` Alan Cox
                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 162+ messages in thread
From: Ingo Oeser @ 2001-03-22 21:01 UTC (permalink / raw)
  To: Stephen Clouse
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

On Thu, Mar 22, 2001 at 02:28:31PM -0600, Stephen Clouse wrote:
[Another OOM-Killing thread] 
> It would be nice to give immunity to certain uids, or better
> yet, just turn the damn thing off entirely.  I've already
> hacked that in...errr, out.

That's fine and suits best for all.

I have provided an API for installing such OOM handlers (and have
provided even an simple example for using it).

See http://www.tu-chemnitz.de/~ioe/oom-kill-api/index.html for
details.

It applies to all regular kernels and with some offsets even to
ac20. So this is the way to go for custom OOM handling. 

Rik noted once, that not much research has been done yet on this
topic and that he is certain, that his code cannot cover all
cases.

Linus on the other hand doesn't like the idea of 'plugins' for
core kernel code. 

So this patch is the best thing, that can be done about the
situation.

All work should be based on it, since it allows customers and
researchers, that LIKE to try such 'plugins' to try all of them
instead of having to patch and recompile the kernel for every OOM
handler available.

I would LOVE to start a link collection for all OOM handlers
based on my patch or even host them, IF they are implemented as
modules (as suggested by my API). This should avoid duplicate
effort of this.

Of course I hope to satisfy all needs by this. I'm also willing
to include any API changes (read: exported functions, structs and
variables) necessary for some OOM handlers in my patch.

Thanks & Regards

Ingo Oeser
-- 
10.+11.03.2001 - 3. Chemnitzer LinuxTag <http://www.tu-chemnitz.de/linux/tag>
         <<<<<<<<<<<<     been there and had much fun   >>>>>>>>>>>>

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 20:28     ` Stephen Clouse
  2001-03-22 21:01       ` Ingo Oeser
@ 2001-03-22 21:23       ` Alan Cox
  2001-03-22 22:00         ` Guest section DW
                           ` (3 more replies)
  2001-03-23  1:31       ` Michael Peddemors
                         ` (2 subsequent siblings)
  4 siblings, 4 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-22 21:23 UTC (permalink / raw)
  To: Stephen Clouse
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

> Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
> logically justify annihilating large-VM processes that have been running for 
> days or weeks instead of just returning ENOMEM to a process that just started 
> up.

How do you return an out of memory error to a C program that is out of memory
due to a stack growth fault. There is actually not a language construct for it

> It would be nice to give immunity to certain uids, or better yet, just turn the
> damn thing off entirely.  I've already hacked that in...errr, out.

Eventually you have to kill something or the machine deadlocks. The oom killing
doesnt kick in until that point. So its up to you how you like your errors.

One of the things that we badly need to resurrect for 2.5 is the beancounter
work which would let you reasonably do things like guaranteed Oracle a certain
amount of the machine, or restrict all the untrusted users to a total of 200Mb
hard limit between them etc


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 21:23       ` Alan Cox
@ 2001-03-22 22:00         ` Guest section DW
  2001-03-22 22:12           ` Ed Tomlinson
                             ` (2 more replies)
  2001-03-22 22:10         ` Doug Ledford
                           ` (2 subsequent siblings)
  3 siblings, 3 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-22 22:00 UTC (permalink / raw)
  To: Alan Cox, Stephen Clouse
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote:
> > Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
> > logically justify annihilating large-VM processes that have been running for 
> > days or weeks instead of just returning ENOMEM to a process that just started 
> > up.
> 
> How do you return an out of memory error to a C program that is out of memory
> due to a stack growth fault. There is actually not a language construct for it

Alan, this is a fake argument.
Linux is bad, and you defend it by saying that it is impossible to be perfect.

I have used various Unix flavours for approximately thirty years.
Stack overflow has not been a real problem. Of course they occurred
every now and then, but roughly speaking only for unchecked recursion,
that is, in cases of a program bug.

Presently however, a flawless program can be killed.
That is what makes Linux unreliable.

> Eventually you have to kill something or the machine deadlocks.

Alan, this is a fake argument.
When I have a computer algebra system, and it computes millions of
function values for some expensive function, then it keeps a cache
of already computed values. Maybe a value is needed again and we
save ten seconds of computation.
But of course, when we run out of memory, nothing is easier than
just throwing this cache out.

You see, the bug is that malloc does not fail. This means that the
decisions about what to do are not taken by the program that knows
what it is doing, but by the kernel.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:00         ` Guest section DW
@ 2001-03-22 22:12           ` Ed Tomlinson
  2001-03-22 22:52           ` Alan Cox
  2001-03-23 19:57           ` Szabolcs Szakacsits
  2 siblings, 0 replies; 162+ messages in thread
From: Ed Tomlinson @ 2001-03-22 22:12 UTC (permalink / raw)
  To: Guest section DW, Alan Cox, Stephen Clouse
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

On Thursday 22 March 2001 17:00, Guest section DW wrote:
> On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote:
> > > Really the whole oom_kill process seems bass-ackwards to me.  I can't
> > > in my mind logically justify annihilating large-VM processes that have
> > > been running for days or weeks instead of just returning ENOMEM to a
> > > process that just started up.
> >
> > How do you return an out of memory error to a C program that is out of
> > memory due to a stack growth fault. There is actually not a language
> > construct for it
>
> Alan, this is a fake argument.
> Linux is bad, and you defend it by saying that it is impossible to be
> perfect.
>
> I have used various Unix flavours for approximately thirty years.
> Stack overflow has not been a real problem. Of course they occurred
> every now and then, but roughly speaking only for unchecked recursion,
> that is, in cases of a program bug.
>
> Presently however, a flawless program can be killed.
> That is what makes Linux unreliable.
>
> > Eventually you have to kill something or the machine deadlocks.
>
> Alan, this is a fake argument.
> When I have a computer algebra system, and it computes millions of
> function values for some expensive function, then it keeps a cache
> of already computed values. Maybe a value is needed again and we
> save ten seconds of computation.
> But of course, when we run out of memory, nothing is easier than
> just throwing this cache out.
>
> You see, the bug is that malloc does not fail. This means that the
> decisions about what to do are not taken by the program that knows
> what it is doing, but by the kernel.

By this arguement the OOM kill code is fine...  If malloc is broken fix it.  
Maybe we need to stage things so that ENOMEM gets returned to requests
before we are totally out of memory.  If the apps ignore the errors then the
kills happen.

Thoughts?
Ed Tomlinson

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:00         ` Guest section DW
  2001-03-22 22:12           ` Ed Tomlinson
@ 2001-03-22 22:52           ` Alan Cox
  2001-03-22 23:27             ` Guest section DW
  2001-03-23 19:57           ` Szabolcs Szakacsits
  2 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-22 22:52 UTC (permalink / raw)
  To: Guest section DW
  Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke,
	linux-mm, linux-kernel

> > Eventually you have to kill something or the machine deadlocks.
> 
> Alan, this is a fake argument.

No it is not.

> You see, the bug is that malloc does not fail. This means that the
> decisions about what to do are not taken by the program that knows
> what it is doing, but by the kernel.

Even if malloc fails the situation is no different. You can do 
overcommit avoidance in Linux if you are bored enough to try it. I did it
in 1.2 one afternoon when bored. You simply account address space. Almost
everything you need to touch is in mm/*.c and localised. The only exception
is ptrace.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:52           ` Alan Cox
@ 2001-03-22 23:27             ` Guest section DW
  2001-03-22 23:37               ` Rik van Riel
  2001-03-22 23:40               ` Alan Cox
  0 siblings, 2 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-22 23:27 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephen Clouse, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote:

> > You see, the bug is that malloc does not fail. This means that the
> > decisions about what to do are not taken by the program that knows
> > what it is doing, but by the kernel.

> Even if malloc fails the situation is no different.

Why do you say so?

> You can do overcommit avoidance in Linux if you are bored enough to try it.

Would you accept it as the default? Would Linus?

(With disk I/O we are terribly conservative, using very cautious settings,
and many people use hdparm to double or triple their disk speed.
But for a few these optimistic settings cause data corruption,
so we do not make it the default.
Similarly I would be happy if the "no overcommit", "no OOM killer"
situation was the default. The people who need a reliable system
will leave it that way. The people who do not mind if some process
is killed once in a while use vmparm or /proc/vm/overcommit or so
to make Linux achieve more on average.)

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:27             ` Guest section DW
@ 2001-03-22 23:37               ` Rik van Riel
  2001-03-26 19:04                 ` James Antill
  2001-03-22 23:40               ` Alan Cox
  1 sibling, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-22 23:37 UTC (permalink / raw)
  To: Guest section DW
  Cc: Alan Cox, Stephen Clouse, Patrick O'Rourke, linux-mm,
	linux-kernel

On Fri, 23 Mar 2001, Guest section DW wrote:
> On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote:
>
> > You can do overcommit avoidance in Linux if you are bored enough to try it.
>
> Would you accept it as the default? Would Linus?

It wouldn't help.  Suppose you run without overcommit and you
fill up RAM and swap to the last page.

Then you change the size of one of the windows on your desktop
and a program gets sent -SIGWINCH. In order to process this
signal, the program needs to allocate some variables on its
stack, possibly needing a new page to be allocated for its
stack ...

... and since this is something which could happen to any program
on the system, the result of non-overcommit would be getting a
random process killed (though not completely random, syslogd and
klogd would get killed more often than the others).

The only solution to not getting processes killed is to run with
enough memory and swap space, having an OOM killer which takes care
to *NOT* let any random innocent process gets killed is nothing but
a bonus, IMHO.

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:37               ` Rik van Riel
@ 2001-03-26 19:04                 ` James Antill
  2001-03-26 20:05                   ` Rik van Riel
  0 siblings, 1 reply; 162+ messages in thread
From: James Antill @ 2001-03-26 19:04 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Guest section DW, Alan Cox, Stephen Clouse, Patrick O'Rourke,
	linux-mm, linux-kernel

  Rik van Riel <riel@conectiva.com.br> writes:

> On Fri, 23 Mar 2001, Guest section DW wrote:
> > On Thu, Mar 22, 2001 at 10:52:09PM +0000, Alan Cox wrote:
> >
> > > You can do overcommit avoidance in Linux if you are bored enough to try it.
> >
> > Would you accept it as the default? Would Linus?
> 
> It wouldn't help.  Suppose you run without overcommit and you
> fill up RAM and swap to the last page.
> 
> Then you change the size of one of the windows on your desktop
> and a program gets sent -SIGWINCH.

 Ignoring the fact that most people don't use a tty based desktop, and
that I'm pretty happy having my desktop die in flames when OOM (my DNS
or smtp server on the other hand...).

>                                    In order to process this
> signal, the program needs to allocate some variables on its
> stack, possibly needing a new page to be allocated for its
> stack ...

man sigaltstack

> ... and since this is something which could happen to any program
> on the system, the result of non-overcommit would be getting a
> random process killed (though not completely random, syslogd and
> klogd would get killed more often than the others).

 I fail to see why, stack usage can be limited (and possibly cleanly
handled by having a prctl() to say make sure X pages are available on
the stack).

 If you want overcommit great, and I think it's a valid default
... but it'd be nice if I could say I don't want it for apps that
aren't written using glib etc.

-- 
# James Antill -- james@and.org
:0:
* ^From: .*james@and\.org
/dev/null

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-26 19:04                 ` James Antill
@ 2001-03-26 20:05                   ` Rik van Riel
  0 siblings, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-26 20:05 UTC (permalink / raw)
  To: James Antill
  Cc: Guest section DW, Alan Cox, Stephen Clouse, Patrick O'Rourke,
	linux-mm, linux-kernel

On 26 Mar 2001, James Antill wrote:

>  If you want overcommit great, and I think it's a valid default
> ... but it'd be nice if I could say I don't want it for apps that
> aren't written using glib etc.

Agreed.  Jonathan Morton seems to be making progress in testing
and debugging the non-overcommit patch from some time ago. If
things turn out to be trivial enough I wouldn't be surprised if
we got to see the option of non-overcommit somewhere in future
2.4 and 2.5 kernels...

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:27             ` Guest section DW
  2001-03-22 23:37               ` Rik van Riel
@ 2001-03-22 23:40               ` Alan Cox
  2001-03-23 20:09                 ` Szabolcs Szakacsits
  1 sibling, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-22 23:40 UTC (permalink / raw)
  To: Guest section DW
  Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke,
	linux-mm, linux-kernel

> > Even if malloc fails the situation is no different.
> Why do you say so?

Because you will fail on other things - stack overflow, signal delivery,
eventually it will get you. You just cut the odds down. 

> > You can do overcommit avoidance in Linux if you are bored enough to try it.
> 
> Would you accept it as the default? Would Linus?

I'd like to have it there as an option. As to the default - You would have to
see how much applications assume they can overcommit and rely on it. You might
find you need a few Gbytes of swap just to boot

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:40               ` Alan Cox
@ 2001-03-23 20:09                 ` Szabolcs Szakacsits
  2001-03-23 22:21                   ` Alan Cox
  0 siblings, 1 reply; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 20:09 UTC (permalink / raw)
  To: Alan Cox
  Cc: Guest section DW, Stephen Clouse, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel


On Thu, 22 Mar 2001, Alan Cox wrote:

> I'd like to have it there as an option. As to the default - You
> would have to see how much applications assume they can overcommit
> and rely on it. You might find you need a few Gbytes of swap just to
> boot

Seems a bit exaggeration ;) Here are numbers,

	http://lists.openresources.com/NetBSD/tech-userlevel/msg00722.html

6-50% more VM and the performance hit also isn't so bad as it's thought
(Eduardo Horvath sent a non-overcommit patch for Linux about one year
ago).

	Szaka


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:09                 ` Szabolcs Szakacsits
@ 2001-03-23 22:21                   ` Alan Cox
  2001-03-23 22:37                     ` Szabolcs Szakacsits
  0 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-23 22:21 UTC (permalink / raw)
  To: Szabolcs Szakacsits
  Cc: Alan Cox, Guest section DW, Stephen Clouse, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

> > and rely on it. You might find you need a few Gbytes of swap just to
> > boot
> 
> Seems a bit exaggeration ;) Here are numbers,

NetBSD is if I remember rightly still using a.out library styles. 

> 6-50% more VM and the performance hit also isn't so bad as it's thought
> (Eduardo Horvath sent a non-overcommit patch for Linux about one year
> ago).

The Linux performance hit would be so close to zero you shouldnt be able to
measure it - or it was in 1.2 anyway

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 22:21                   ` Alan Cox
@ 2001-03-23 22:37                     ` Szabolcs Szakacsits
  0 siblings, 0 replies; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 22:37 UTC (permalink / raw)
  To: Alan Cox
  Cc: Guest section DW, Stephen Clouse, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel


On Fri, 23 Mar 2001, Alan Cox wrote:
> > > and rely on it. You might find you need a few Gbytes of swap just to
> > > boot
> > Seems a bit exaggeration ;) Here are numbers,
> NetBSD is if I remember rightly still using a.out library styles.

No, it uses ELF today, moreover the numbers were from Solaris. NetBSD
also switched from non-overcommit to overcommit-only [AFAIK] mode with
"random" process killing with its new UVM.

> > 6-50% more VM and the performance hit also isn't so bad as it's thought
> > (Eduardo Horvath sent a non-overcommit patch for Linux about one year
> > ago).
> The Linux performance hit would be so close to zero you shouldnt be able to
> measure it - or it was in 1.2 anyway

Yep, something like this :)

	Szaka


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:00         ` Guest section DW
  2001-03-22 22:12           ` Ed Tomlinson
  2001-03-22 22:52           ` Alan Cox
@ 2001-03-23 19:57           ` Szabolcs Szakacsits
  2 siblings, 0 replies; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 19:57 UTC (permalink / raw)
  To: Guest section DW
  Cc: Alan Cox, Stephen Clouse, Rik van Riel, Patrick O'Rourke,
	linux-mm, linux-kernel

On Thu, 22 Mar 2001, Guest section DW wrote:
> Presently however, a flawless program can be killed.
> That is what makes Linux unreliable.

Your advocation is "save the application, crash the OS!". But you can't
be blamed because everybody's first reaction is this :) But if you start
to think you get the conclusion that process killing can't be avoided if
you want the system keep running. But I agree Linux lacks some important
things [see my other email] that could make the situation easily and
inexpensively controllable.

BTW, your app isn't flawless because it doesn't consider Linux memory
management is [quasi-]overcommit-only at present ;) [or you used other
apps as well, e.g. login, ps, cron is enough to kill your app when it
stopped at OOM time].

	Szaka

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 21:23       ` Alan Cox
  2001-03-22 22:00         ` Guest section DW
@ 2001-03-22 22:10         ` Doug Ledford
  2001-03-22 22:53           ` Alan Cox
  2001-03-22 23:43         ` Stephen Clouse
  2001-03-23 19:26         ` Szabolcs Szakacsits
  3 siblings, 1 reply; 162+ messages in thread
From: Doug Ledford @ 2001-03-22 22:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

Alan Cox wrote:
> 
> > Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
> > logically justify annihilating large-VM processes that have been running for
> > days or weeks instead of just returning ENOMEM to a process that just started
> > up.
> 
> How do you return an out of memory error to a C program that is out of memory
> due to a stack growth fault. There is actually not a language construct for it

Simple, you reclaim a few of those uptodate buffers.  My testing here has
resulting in more of my system daemons getting killed than anything else, and
it never once has solved the actual problem of simple memory pressure from
apps reading/writing to disk and disk cache not releasing buffers quick
enough.

> > It would be nice to give immunity to certain uids, or better yet, just turn the
> > damn thing off entirely.  I've already hacked that in...errr, out.
> 
> Eventually you have to kill something or the machine deadlocks. The oom killing
> doesnt kick in until that point. So its up to you how you like your errors.

I beg to differ.  If you tell me that a machine that looks like this:

[dledford@monster dledford]$ free
             total       used       free     shared    buffers     cached
Mem:       1017800    1014808       2992          0      73644     796392
-/+ buffers/cache:     144772     873028
Swap:            0          0          0
[dledford@monster dledford]$ 

is in need of killing sshd, I'll claim you are smoking some nice stuff ;-)

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:10         ` Doug Ledford
@ 2001-03-22 22:53           ` Alan Cox
  2001-03-22 23:30             ` Doug Ledford
  0 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-22 22:53 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

> > How do you return an out of memory error to a C program that is out of memory
> > due to a stack growth fault. There is actually not a language construct for it
> 
> Simple, you reclaim a few of those uptodate buffers.  My testing here has

If you have reclaimable buffers you are not out of memory. If oom is triggered
in that state it is a bug. If you are complaining that the oom killer triggers
at the wrong time then thats a completely unrelated issue.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 22:53           ` Alan Cox
@ 2001-03-22 23:30             ` Doug Ledford
  2001-03-22 23:40               ` Alan Cox
  0 siblings, 1 reply; 162+ messages in thread
From: Doug Ledford @ 2001-03-22 23:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

Alan Cox wrote:
> 
> > > How do you return an out of memory error to a C program that is out of memory
> > > due to a stack growth fault. There is actually not a language construct for it
> >
> > Simple, you reclaim a few of those uptodate buffers.  My testing here has
> 
> If you have reclaimable buffers you are not out of memory. If oom is triggered
> in that state it is a bug. If you are complaining that the oom killer triggers
> at the wrong time then thats a completely unrelated issue.

Ummm, yeah, that would pretty much be the claim.  Real easy to reproduce too. 
Take your favorite machine with lots of RAM, run just a handful of startup
process and system daemons, then log in on a few terminals and do:

while true; do bonnie -s (1/2 ram); done

Pretty soon, system daemons will start to die.

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:30             ` Doug Ledford
@ 2001-03-22 23:40               ` Alan Cox
  0 siblings, 0 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-22 23:40 UTC (permalink / raw)
  To: Doug Ledford
  Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

> Ummm, yeah, that would pretty much be the claim.  Real easy to reproduce too. 
> Take your favorite machine with lots of RAM, run just a handful of startup
> process and system daemons, then log in on a few terminals and do:
> 
> while true; do bonnie -s (1/2 ram); done
> 
> Pretty soon, system daemons will start to die.

Then thats a bug. I assume you've provided Rik with a detailed test case 
already ?

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 21:23       ` Alan Cox
  2001-03-22 22:00         ` Guest section DW
  2001-03-22 22:10         ` Doug Ledford
@ 2001-03-22 23:43         ` Stephen Clouse
  2001-03-23 19:26         ` Szabolcs Szakacsits
  3 siblings, 0 replies; 162+ messages in thread
From: Stephen Clouse @ 2001-03-22 23:43 UTC (permalink / raw)
  To: Alan Cox
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

[-- Attachment #1: msg.pgp --]
[-- Type: text/plain, Size: 2188 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, Mar 22, 2001 at 09:23:54PM +0000, Alan Cox wrote:
> How do you return an out of memory error to a C program that is out of memory
> due to a stack growth fault. There is actually not a language construct for it

Hmmm...the old "Error 3 while attempting to report Error 3" dialog from MS
Excel.  

> Eventually you have to kill something or the machine deadlocks. The oom killing
> doesnt kick in until that point. So its up to you how you like your errors.

It's interesting that I never recall oom being a problem (like this) with 2.0 or 
2.2.  And the machines I was working with at the time were far crappier than
these current boxen -- they'd ride the oom line almost constantly.  Back then a
new process would either a) scream "Out of memory!" or b) segfault.  You could
argue that b is not desirable, but I'd prefer that to the current behavior, 
really.  In fact this type of behavior still happens under 2.4 when we hit OOM
on the development boxen, although not consistently (only about half the time);
oom_kill annihilates something we don't want it to, then the mallocing process
that triggered it decides it has become bored with life and procceds to
abort/segfault anyway.  I wish I could reproduce it consistently.

In any case, the behavior of oom_kill (whether you consider it correct or
not) is really the symptom and not the cause.  We've alleviated most of it via
creative use of ulimit.  Still, the seemingly draconian behavior needs a bit
finer-grained control.

> One of the things that we badly need to resurrect for 2.5 is the beancounter
> work which would let you reasonably do things like guaranteed Oracle a certain
> amount of the machine, or restrict all the untrusted users to a total of 200Mb
> hard limit between them etc

Let me know when you branch :)  Sounds like a fun project.

- -- 
Stephen Clouse <stephenc@theiqgroup.com>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOrqOLAOGqGs0PadnEQKWFACfaqzjtUQD4uGaLFnxn6M9Xc4N6QIAoJO3
nJTISp0ekbXEUiAY9PJVf2vr
=B3u4
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 21:23       ` Alan Cox
                           ` (2 preceding siblings ...)
  2001-03-22 23:43         ` Stephen Clouse
@ 2001-03-23 19:26         ` Szabolcs Szakacsits
  2001-03-23 20:41           ` Paul Jakma
  3 siblings, 1 reply; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 19:26 UTC (permalink / raw)
  To: Alan Cox
  Cc: Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

On Thu, 22 Mar 2001, Alan Cox wrote:

> One of the things that we badly need to resurrect for 2.5 is the
> beancounter work which would let you reasonably do things like
> guaranteed Oracle a certain amount of the machine, or restrict all
> the untrusted users to a total of 200Mb hard limit between them etc

This would improve Linux reliability but it could be much better with
added *optional* non-overcommit (most other OS also support this, also
that's the default mostly [please no, "but it deadlocks" because it's
not true, they also kill processes (Solaris, etc)]), reserved superuser
memory (ala Solaris, True64, etc when OOM in non-overcommit, users
complain and superuser acts, not the OS killing their tasks) and
superuser *advisory* OOM killer [there was patch for this before], I
think in the last area Linux is already more ahead than others at
present.

About the "use resource limits!". Yes, this is one solution. The
*expensive* solution (admin time, worse resource utilization, etc).
Others make it cheaper mixing with the above ones.

        Szaka

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 19:26         ` Szabolcs Szakacsits
@ 2001-03-23 20:41           ` Paul Jakma
  2001-03-23 21:58             ` george anzinger
  2001-03-23 22:18             ` Szabolcs Szakacsits
  0 siblings, 2 replies; 162+ messages in thread
From: Paul Jakma @ 2001-03-23 20:41 UTC (permalink / raw)
  To: Szabolcs Szakacsits
  Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote:

> About the "use resource limits!". Yes, this is one solution. The
> *expensive* solution (admin time, worse resource utilization, etc).

traditional user limits have worse resource utilisation? think what
kind of utilisation a guaranteed allocation system would have. instead
of 128MB, you'd need maybe a GB of RAM and many many GB of swap for
most systems.

some hopefully non-ranting points:

- setting up limits on a RH system takes 1 minute by editing
/etc/security/limits.conf.

- Rik's current oom killer may not do a good job now, but it's
impossible for it to do a /perfect/ job without implementing
kernel/esp.c.

- with limits set you will have:
 - /possible/ underutilisation on some workloads.
 - chance of hitting Rik's OOM killer reduced to almost nothing.

no matter how good or bad Rik's killer is, i'd much rather set limits
and just about /never/ have it invoked.

more beancounting will make limits more useful (eg global?) and maybe
dists can start setting up some kind of limits by default at install
time based on the RAM installed and whether user selected
server/workstation/etc.. install.

Then hopefully we can be a little less concerned about how close Rik
gets to the impossible task of implementing esp.c.

>         Szaka

--paulj

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:41           ` Paul Jakma
@ 2001-03-23 21:58             ` george anzinger
  2001-03-24  5:55               ` Rik van Riel
  2001-03-23 22:18             ` Szabolcs Szakacsits
  1 sibling, 1 reply; 162+ messages in thread
From: george anzinger @ 2001-03-23 21:58 UTC (permalink / raw)
  To: Paul Jakma
  Cc: Szabolcs Szakacsits, Alan Cox, Stephen Clouse, Guest section DW,
	Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

What happens if you just make swap VERY large?  Does the system thrash
it self to a virtual standstill?  Is this a possible answer?  Supposedly
you could then sneak in and blow away the bad guys manually ...

George

Paul Jakma wrote:
> 
> On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote:
> 
> > About the "use resource limits!". Yes, this is one solution. The
> > *expensive* solution (admin time, worse resource utilization, etc).
> 
> traditional user limits have worse resource utilisation? think what
> kind of utilisation a guaranteed allocation system would have. instead
> of 128MB, you'd need maybe a GB of RAM and many many GB of swap for
> most systems.
> 
> some hopefully non-ranting points:
> 
> - setting up limits on a RH system takes 1 minute by editing
> /etc/security/limits.conf.
> 
> - Rik's current oom killer may not do a good job now, but it's
> impossible for it to do a /perfect/ job without implementing
> kernel/esp.c.
> 
> - with limits set you will have:
>  - /possible/ underutilisation on some workloads.
>  - chance of hitting Rik's OOM killer reduced to almost nothing.
> 
> no matter how good or bad Rik's killer is, i'd much rather set limits
> and just about /never/ have it invoked.
> 
> more beancounting will make limits more useful (eg global?) and maybe
> dists can start setting up some kind of limits by default at install
> time based on the RAM installed and whether user selected
> server/workstation/etc.. install.
> 
> Then hopefully we can be a little less concerned about how close Rik
> gets to the impossible task of implementing esp.c.
> 
> >         Szaka
> 
> --paulj
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 21:58             ` george anzinger
@ 2001-03-24  5:55               ` Rik van Riel
  0 siblings, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-24  5:55 UTC (permalink / raw)
  To: george anzinger
  Cc: Paul Jakma, Szabolcs Szakacsits, Alan Cox, Stephen Clouse,
	Guest section DW, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, george anzinger wrote:

> What happens if you just make swap VERY large?  Does the system thrash
> it self to a virtual standstill?

It does.  I need to implement load control code (so we suspend
processes in turn to keep the load low enough so we can avoid
thrashing).

> Is this a possible answer?  Supposedly you could then sneak in and
> blow away the bad guys manually ...

This certainly works.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:41           ` Paul Jakma
  2001-03-23 21:58             ` george anzinger
@ 2001-03-23 22:18             ` Szabolcs Szakacsits
  2001-03-24  2:08               ` Paul Jakma
  1 sibling, 1 reply; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 22:18 UTC (permalink / raw)
  To: Paul Jakma
  Cc: Alan Cox, Stephen Clouse, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, Paul Jakma wrote:
> On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote:
> > About the "use resource limits!". Yes, this is one solution. The
> > *expensive* solution (admin time, worse resource utilization, etc).

Thanks for cutting out relevant parts that said how to increase user
base and satisfaction keeping and using the existent possibility as
well.

> traditional user limits have worse resource utilisation? think what
> kind of utilisation a guaranteed allocation system would have. instead
> of 128MB, you'd need maybe a GB of RAM and many many GB of swap for
> most systems.

Nonsense hodgepodge. See and/or mesaure the impact. I sent numbers in my
former email. You also missed non-overcommit must be _optional_ [i.e.
you wouldn't be forced to use it ;)]. Yes, there are users and
enterprises who require it and would happily pay the 50-100% extra swap
space for the same workload and extra reliability.

> - setting up limits on a RH system takes 1 minute by editing
> /etc/security/limits.conf.

At every time you add/delete users, add/delete special apps, etc.
Please note again, some people wants this way, some only for sometimes,
and others really don't care because system guarantees for the admins
they will always have the resources to take action [unfortunately this
is not Linux].

> - Rik's current oom killer may not do a good job now, but it's
> impossible for it to do a /perfect/ job without implementing
> kernel/esp.c.

Rik's killer is quite fine at _default_. But there will be always people
who won't like it [the bastards think humans can still make better
decisions than machines]. Wouldn't it be win for both sides if you could
point out, "Hey, if you don't like the default, use the
/proc/sys/vm/oom_killer interface"? As I said before there are also
such patch by Chris Swiedler and definitely not a huge, complex one.
And these stupid threads could be forgotten for good and all.

> - with limits set you will have:
>  - /possible/ underutilisation on some workloads.

Depends, guaranteed underutilisation or guaranteed extra unreliability
fit the picture many times as well.

> no matter how good or bad Rik's killer is, i'd much rather set limits
> and just about /never/ have it invoked.

Thanks for expressing your opinion but others [not necessarily me] have
"occasionally" other one depending on the job what the box must do.

	Szaka

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 22:18             ` Szabolcs Szakacsits
@ 2001-03-24  2:08               ` Paul Jakma
  0 siblings, 0 replies; 162+ messages in thread
From: Paul Jakma @ 2001-03-24  2:08 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Paul Jakma, linux-mm, Linux Kernel

On Sat, 24 Mar 2001, Szabolcs Szakacsits wrote:

> Nonsense hodgepodge. See and/or mesaure the impact. I sent numbers in my
> former email. You also missed non-overcommit must be _optional_ [i.e.
> you wouldn't be forced to use it ;)]. Yes, there are users and
> enterprises who require it and would happily pay the 50-100% extra swap
> space for the same workload and extra reliability.

ok.. the last time OOM came up, the main objection to fully
guaranteed vm was the possible huge overhead.

if someone knows how to do it without a huge overhead, i'd love to
see it and try it out.

> At every time you add/delete users, add/delete special apps, etc.

no.. pam_limits knows about groups, and you can specify limit for
that group, one time.

@user ... ... ...

> Rik's killer is quite fine at _default_. But there will be always
> people who won't like it

exactly... so lets try avoid ever needing it. it is a last resort.

> default, use the /proc/sys/vm/oom_killer interface"? As I said
> before there are also such patch by Chris Swiedler and definitely
> not a huge, complex one.

uhmm.. where?

> And these stupid threads could be forgotten for good and all.

:)

> 	Szaka

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
-------------------------------------------
Fortune:
The optimum committee has no members.
		-- Norman Augustine


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 20:28     ` Stephen Clouse
  2001-03-22 21:01       ` Ingo Oeser
  2001-03-22 21:23       ` Alan Cox
@ 2001-03-23  1:31       ` Michael Peddemors
  2001-03-23  7:04         ` Rik van Riel
  2001-03-27 15:05       ` Anthony de Boer - USEnet
  2002-03-23  0:33       ` Martin Dalecki
  4 siblings, 1 reply; 162+ messages in thread
From: Michael Peddemors @ 2001-03-23  1:31 UTC (permalink / raw)
  To: Stephen Clouse
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

Here, Here.. killing qmail on a server who's sole task is running mail doesn't seem to make much sense either..

> > Clearly, Linux cannot be reliable if any process can be killed

> > at any moment. I am not happy at all with my recent experiences.
> 
> Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
> logically justify annihilating large-VM processes that have been running for 
> days or weeks instead of just returning ENOMEM to a process that just started 
> up.
> 
> We run Oracle on a development box here, and it's always the first to get the
> axe (non-root process using 70-80 MB VM).  Whenever someone's testing decides to 
> run away with memory, I usually spend the rest of the day getting intimate with
> the backup files, since SIGKILLing random Oracle processes, as you might have
> guessed, has a tendency to rape the entire database.

-- 
"Catch the Magic of Linux..."
--------------------------------------------------------
Michael Peddemors - Senior Consultant
LinuxAdministration - Internet Services
NetworkServices - Programming - Security
WizardInternet Services http://www.wizard.ca
Linux Support Specialist - http://www.linuxmagic.com
--------------------------------------------------------
(604)589-0037 Beautiful British Columbia, Canada


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  1:31       ` Michael Peddemors
@ 2001-03-23  7:04         ` Rik van Riel
  2001-03-23 11:28           ` Guest section DW
  0 siblings, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-23  7:04 UTC (permalink / raw)
  To: Michael Peddemors
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

On 22 Mar 2001, Michael Peddemors wrote:

> Here, Here.. killing qmail on a server who's sole task is running mail
> doesn't seem to make much sense either..

I won't defend the current OOM killing code.

Instead, I'm asking everybody who's unhappy with the
current code to come up with something better.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  7:04         ` Rik van Riel
@ 2001-03-23 11:28           ` Guest section DW
  2001-03-23 14:50             ` Eric W. Biederman
  2001-03-23 21:11             ` José Luis Domingo López
  0 siblings, 2 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-23 11:28 UTC (permalink / raw)
  To: Rik van Riel, Michael Peddemors
  Cc: Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, Mar 23, 2001 at 04:04:09AM -0300, Rik van Riel wrote:
> On 22 Mar 2001, Michael Peddemors wrote:
> 
> > Here, Here.. killing qmail on a server who's sole task is running mail
> > doesn't seem to make much sense either..
> 
> I won't defend the current OOM killing code.
> 
> Instead, I'm asking everybody who's unhappy with the
> current code to come up with something better.

To a murderer: "Why did you kill that old lady?"
Reply: "I won't defend that deed, but who else should I have killed?"

Andries - getting more and more unhappy with OOM

Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs).
Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs).
Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs).
Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm).

[yes, that was rpm growing too large, taking a few emacs sessions]
[2.4.2]

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 11:28           ` Guest section DW
@ 2001-03-23 14:50             ` Eric W. Biederman
  2001-03-23 17:21               ` Guest section DW
  2001-03-23 21:11             ` José Luis Domingo López
  1 sibling, 1 reply; 162+ messages in thread
From: Eric W. Biederman @ 2001-03-23 14:50 UTC (permalink / raw)
  To: Guest section DW
  Cc: Rik van Riel, Michael Peddemors, Stephen Clouse,
	Patrick O'Rourke, linux-mm, linux-kernel

Guest section DW <dwguest@win.tue.nl> writes:

> On Fri, Mar 23, 2001 at 04:04:09AM -0300, Rik van Riel wrote:
> > On 22 Mar 2001, Michael Peddemors wrote:
> > 
> > > Here, Here.. killing qmail on a server who's sole task is running mail
> > > doesn't seem to make much sense either..
> > 
> > I won't defend the current OOM killing code.
> > 
> > Instead, I'm asking everybody who's unhappy with the
> > current code to come up with something better.
> 
> To a murderer: "Why did you kill that old lady?"
> Reply: "I won't defend that deed, but who else should I have killed?"

> 
> Andries - getting more and more unhappy with OOM
> 
> Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs).
> Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs).
> Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs).
> Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm).
> 
> [yes, that was rpm growing too large, taking a few emacs sessions]
> [2.4.2]

Let me get this straight you don't have enough swap for your workload?
And you don't have per process limits on root by default?

So you are complaining about the OOM killer?  

Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 14:50             ` Eric W. Biederman
@ 2001-03-23 17:21               ` Guest section DW
  2001-03-23 20:18                 ` Paul Jakma
  2001-03-23 23:48                 ` Eric W. Biederman
  0 siblings, 2 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-23 17:21 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Rik van Riel, Michael Peddemors, Stephen Clouse,
	Patrick O'Rourke, linux-mm, linux-kernel

On Fri, Mar 23, 2001 at 07:50:25AM -0700, Eric W. Biederman wrote:

> > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs).
> > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs).
> > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs).
> > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm).
> > 
> > [yes, that was rpm growing too large, taking a few emacs sessions]
> > [2.4.2]
> 
> Let me get this straight you don't have enough swap for your workload?
> And you don't have per process limits on root by default?
> 
> So you are complaining about the OOM killer?  

I should not react - your questions are phrased rhetorically.

But yes, I am complaining because Linux by default is unreliable.
I strongly prefer a system that is reliable by default,
and I'll leave it to others to run it in an unreliable mode.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:21               ` Guest section DW
@ 2001-03-23 20:18                 ` Paul Jakma
  2001-03-24 20:19                   ` Jesse Pollard
  2001-03-23 23:48                 ` Eric W. Biederman
  1 sibling, 1 reply; 162+ messages in thread
From: Paul Jakma @ 2001-03-23 20:18 UTC (permalink / raw)
  To: Guest section DW
  Cc: Eric W. Biederman, Rik van Riel, Michael Peddemors,
	Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, Guest section DW wrote:

> But yes, I am complaining because Linux by default is unreliable.

no, your distribution is unreliable by default.

> I strongly prefer a system that is reliable by default,
> and I'll leave it to others to run it in an unreliable mode.

currently, setting sensible user limits on my machines means i never
get a hosed machine due to OOM. These limits are easy to set via
pam_limits. (not perfect though, i think its session specific..)

granted, if the machine hasn't been setup with user limits, then linux
doesn't deal at all well with OOM, so this should be fixed. but it can
easily be argued that admin error in not configuring limits is the
main cause for OOM.

> Andries

regards,

--paulj

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:18                 ` Paul Jakma
@ 2001-03-24 20:19                   ` Jesse Pollard
  0 siblings, 0 replies; 162+ messages in thread
From: Jesse Pollard @ 2001-03-24 20:19 UTC (permalink / raw)
  To: Paul Jakma, Guest section DW
  Cc: Eric W. Biederman, Rik van Riel, Michael Peddemors,
	Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, Paul Jakma wrote:
>On Fri, 23 Mar 2001, Guest section DW wrote:
>
>> But yes, I am complaining because Linux by default is unreliable.
>
>no, your distribution is unreliable by default.
>
>> I strongly prefer a system that is reliable by default,
>> and I'll leave it to others to run it in an unreliable mode.
>
>currently, setting sensible user limits on my machines means i never
>get a hosed machine due to OOM. These limits are easy to set via
>pam_limits. (not perfect though, i think its session specific..)

Process specific. Each forked process gets the same limits. You get OOM
as soon as all processes together use more than the system capacity.

>granted, if the machine hasn't been setup with user limits, then linux
>doesn't deal at all well with OOM, so this should be fixed. but it can
>easily be argued that admin error in not configuring limits is the
>main cause for OOM.

Admin has no real control is the problem. Limits are only good for one
process. As soon as that process forks one other process then the
useage limit is twice the limit established.

>> Andries
>
>regards,
>
>--paulj

-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: jesse@cats-chateau.net

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:21               ` Guest section DW
  2001-03-23 20:18                 ` Paul Jakma
@ 2001-03-23 23:48                 ` Eric W. Biederman
  1 sibling, 0 replies; 162+ messages in thread
From: Eric W. Biederman @ 2001-03-23 23:48 UTC (permalink / raw)
  To: Guest section DW
  Cc: Rik van Riel, Michael Peddemors, Stephen Clouse,
	Patrick O'Rourke, linux-mm, linux-kernel

Guest section DW <dwguest@win.tue.nl> writes:

> On Fri, Mar 23, 2001 at 07:50:25AM -0700, Eric W. Biederman wrote:
> 
> > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs).
> > > Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs).
> > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs).
> > > Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm).
> > > 
> > > [yes, that was rpm growing too large, taking a few emacs sessions]
> > > [2.4.2]
> > 
> > Let me get this straight you don't have enough swap for your workload?
> > And you don't have per process limits on root by default?
> > 
> > So you are complaining about the OOM killer?  
> 
> I should not react - your questions are phrased rhetorically.

To some extent I was also very puzzled by your complaint.

You have setup a system that by your definition unreliably and then
you complain it is unreliable.

> 
> But yes, I am complaining because Linux by default is unreliable.
> I strongly prefer a system that is reliable by default,
> and I'll leave it to others to run it in an unreliable mode.

Now all I know the system didn't have enough resources to do what
you asked to it do and it failed.  That sounds reliable to me.  

Obviously you were suprised at how the system failed.  Given
that unix has been doing this kind of thing for decades, you obviously
missed how the unix malloc overcommited memory.

Does you application trap sigsegv on a different stack so you can
catch stack growth failure?  And how does your app handle this case?

Having a no over commit kernel option would help.  

A cheap workaround is to call mlock_all(MCL_FUTRE...).  Then you are
garantteed you will always have ram locked into memory for your
program.   This assumes you have enough ram for your program.

Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 11:28           ` Guest section DW
  2001-03-23 14:50             ` Eric W. Biederman
@ 2001-03-23 21:11             ` José Luis Domingo López
  1 sibling, 0 replies; 162+ messages in thread
From: José Luis Domingo López @ 2001-03-23 21:11 UTC (permalink / raw)
  To: linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=us-ascii, Size: 1360 bytes --]

On Friday, 23 March 2001, at 12:28:15 +0100,
Guest section DW wrote:

> [...]
> To a murderer: "Why did you kill that old lady?"
> Reply: "I won't defend that deed, but who else should I have killed?"
> 
No comments.

> Andries - getting more and more unhappy with OOM
> 
> Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 2019 (emacs).
> Mar 23 11:48:49 mette kernel: Out of Memory: Killed process 1407 (emacs).
> Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 1495 (emacs).
> Mar 23 11:48:50 mette kernel: Out of Memory: Killed process 2800 (rpm).
> 
> [yes, that was rpm growing too large, taking a few emacs sessions]
> [2.4.2]
>
OOM clearly didn't work perfectly in this case, but it worked and left
your machine usable (maybe you lost data on your emacs sessions). From my
(OS design newbie) point of view, there must be quite difficult to keep
track of all system processes, and even a resource intensive task.

If you can do it better, come up with a kernel patch, submit it, and get
credit and fame for it. I would love to see Linux as the perfect OS for
everyone, but won't ever complain about each other's work, mainly when I'm
unable to contribute a thing.

-- 
José Luis Domingo López
Linux Registered User #189436     Debian GNU/Linux Potato (P166 64 MB RAM)
 
jdomingo AT internautas DOT   org  => Spam at your own risk


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 20:28     ` Stephen Clouse
                         ` (2 preceding siblings ...)
  2001-03-23  1:31       ` Michael Peddemors
@ 2001-03-27 15:05       ` Anthony de Boer - USEnet
  2002-03-23  0:33       ` Martin Dalecki
  4 siblings, 0 replies; 162+ messages in thread
From: Anthony de Boer - USEnet @ 2001-03-27 15:05 UTC (permalink / raw)
  To: linux-kernel

Stephen Clouse wrote:
> We run Oracle on a development box here, and it's always the first to get the
> axe (non-root process using 70-80 MB VM).  ...
> It would be nice to give immunity to certain uids, ...

It would seem to me that the new capabilities stuff _could_ be the answer.

Basically, all "am I root?" checks in the kernel should be becoming cap
flags, the OOM killer already avoids killing root processes, it's already
a tenet that yes you can hose your system doing insane things as root but
that nonroot users should NOT be able to hose a system, so being able to
eg. grant this capability to Oracle or ungrant it from sendmail could let
a sysadmin tell the kernel what must be preserved regardless of its UID.

As a baseline I'd want to see all user processes die before any UID 0
stuff, but being able to retune this would be extremely good.

-- 
Anthony de Boer -- as seen at http://www.leftmind.net/~adb/ -- BOFH, eh?
 / "Just when you think you've got a handle on herding cats, \
 \ along comes a three-legged cat on amphetamines."  -- Skud /

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 20:28     ` Stephen Clouse
                         ` (3 preceding siblings ...)
  2001-03-27 15:05       ` Anthony de Boer - USEnet
@ 2002-03-23  0:33       ` Martin Dalecki
  2001-03-22 23:53         ` Rik van Riel
  2001-03-23  0:20         ` Stephen Clouse
  4 siblings, 2 replies; 162+ messages in thread
From: Martin Dalecki @ 2002-03-23  0:33 UTC (permalink / raw)
  To: Stephen Clouse
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

Stephen Clouse wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Thu, Mar 22, 2001 at 12:47:27PM +0100, Guest section DW wrote:
> > Last week I installed SuSE 7.1 somewhere.
> > During the install: "VM: killing process rpm",
> > leaving the installer rather confused.
> > (An empty machine, 256MB, 144MB swap, I think 2.2.18.)
> >
> > Last month I had a computer algebra process running for a week.
> > Killed. But this computation was the only task this machine had.
> > Its sole reason of existence.
> > Too bad - zero information out of a week's computation.
> > (I think 2.4.0.)
> >
> > Clearly, Linux cannot be reliable if any process can be killed
> > at any moment. I am not happy at all with my recent experiences.
> 
> Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
> logically justify annihilating large-VM processes that have been running for
> days or weeks instead of just returning ENOMEM to a process that just started
> up.
> 
> We run Oracle on a development box here, and it's always the first to get the
> axe (non-root process using 70-80 MB VM).  Whenever someone's testing decides to
> run away with memory, I usually spend the rest of the day getting intimate with
> the backup files, since SIGKILLing random Oracle processes, as you might have
> guessed, has a tendency to rape the entire database.
> 
> It would be nice to give immunity to certain uids, or better yet, just turn the
> damn thing off entirely.  I've already hacked that in...errr, out.

AMEN! TO THIS!
Uptime of a process is a much better mesaure for a killing candidate
then it's size.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2002-03-23  0:33       ` Martin Dalecki
@ 2001-03-22 23:53         ` Rik van Riel
  2002-03-23  1:21           ` Martin Dalecki
  2001-03-23  0:20         ` Stephen Clouse
  1 sibling, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-22 23:53 UTC (permalink / raw)
  To: Martin Dalecki
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

On Sat, 23 Mar 2002, Martin Dalecki wrote:

> Uptime of a process is a much better mesaure for a killing
> candidate then it's size.

You'll have fun with your root shell, then  ;)

The current OOM code takes things like uptime, used cpu, size
and a bunch of other things into account.

If it turns out that the code is not attaching a proper weight
to some of these factors, you should be sending patches, not
flames.

(the code is full of comments, so it should be easy enough to
find your way around the code and tweak it until it does the
right thing in a number of test cases)

regards,

Rik
--
Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml

Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:53         ` Rik van Riel
@ 2002-03-23  1:21           ` Martin Dalecki
  0 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2002-03-23  1:21 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

Rik van Riel wrote:
> 
> On Sat, 23 Mar 2002, Martin Dalecki wrote:
> 
> > Uptime of a process is a much better mesaure for a killing
> > candidate then it's size.
> 
> You'll have fun with your root shell, then  ;)

You mean the remote one? 

> The current OOM code takes things like uptime, used cpu, size
> and a bunch of other things into account.
> 
> If it turns out that the code is not attaching a proper weight
> to some of these factors, you should be sending patches, not
> flames.

Did I say anything insulting? I have just stated what I think
is more important... BTW> it's not quite obvious that
You have to look into oom_kill to find it in the kernel
source where to look at. (Yes I did just find /usr/src/linux -name
"oom*"
becouse I happen to remember but!

OK i will just place - in front of the description lines where I think
that you where mislead:

 * Good in this context means that:
 * 1) we lose the minimum amount of work done
-* 2) we recover a large amount of memory
 * 3) we don't kill anything innocent of eating tons of memory
-* 4) we want to kill the minimum amount of processes (one)
 * 5) we try to kill the process the user expects us to kill, this
 *    algorithm has been meticulously tuned to meet the priniciple
 *    of least surprise ... (be careful when you change it)

The following is a wrong assumtion. You usually nice processes to
the background just to guarantee for example smoot interaction just
in case you won't login in in some time to the machine.

For example let's have an dedicated http server, which does a lot of
embedded perl.
It's quite clever to renice it back, just in case this
remote machine get's overloaded, becouse otherwise your chances
to get a login in case the machine starts to trash,
would be much worser. But this doesn't mean that the
process isn't more important - becouse you do it to make the
machine crowl through high load peaks and still let you in in
case you have something urgent to do on it.

        /*
         * Niced processes are most likely less important, so double
         * their badness points.
         */
        if (p->nice > 0)
                points *= 2;

BTW> Why the hell you don't just use a polynomial approximation for
int_sqrt - the range of values is very closed an you are
working in a finite ring anyway - you could very easly find
a simple approximation which wouldn't need any looping.

This should be reversted:

        points /= int_sqrt(cpu_time);
        points /= int_sqrt(int_sqrt(run_time));
    points = p->mm->total_vm;

        /*
         * CPU time is in seconds and run time is in minutes. There is
no
         * particular reason for this other than that it turned out to
work
         * very well in practice. This is not safe against jiffie wraps
         * but we don't care _that_ much...
         */
        cpu_time = (p->times.tms_utime + p->times.tms_stime) >>
(SHIFT_HZ + 3);
        run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);

        points /= int_sqrt(cpu_time);
        points /= int_sqrt(int_sqrt(run_time));

==============================================================

NOW I SEE THE MOST IMPORTANT MISTAKE:

There should be a de-normalization of the units

CPU_time/total_uptime
RUN_time/total_uptime
mem/total_mem.

Otherwise you can't map the intended logics sufficiently safe
on to the calculation you do. You compare bits with seconds - which is
WRONG.

Let:
 m := memmory used by the process 
 M := the total memmory in the system.
 c := cpu time used by the process
 u := uptime of the process.
 U := uptime of the system

Then you calculate points
as 

(m / sqrt(c)) / sqrt(sqrt(r))

Which is just very wired function with a non homogen behaviour.
(Just take the first derivative of it in any dimension to see what I
mean)

You should calculate to represent you intended logics:

 x * (m / M) + y * (U / c) + z * (U / u),

where x y z are constants representing the wighting heuristic
importance one gives to those particular measure points.

A simple *normalized* polynom the only thing people and computers can
realy deal with.

> (the code is full of comments, so it should be easy enough to
> find your way around the code and tweak it until it does the
> right thing in a number of test cases)
> 
> regards,
> 
> Rik
> --
> Linux MM bugzilla: http://linux-mm.org/bugzilla.shtml
> 
> Virtual memory is like a game you can't win;
> However, without VM there's truly nothing to lose...
> 
>                 http://www.surriel.com/
> http://www.conectiva.com/       http://distro.conectiva.com/

-- 
- phone: +49 214 8656 283
- job:   eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort:
ru_RU.KOI8-R

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2002-03-23  0:33       ` Martin Dalecki
  2001-03-22 23:53         ` Rik van Riel
@ 2001-03-23  0:20         ` Stephen Clouse
  2002-03-23  1:30           ` Martin Dalecki
  1 sibling, 1 reply; 162+ messages in thread
From: Stephen Clouse @ 2001-03-23  0:20 UTC (permalink / raw)
  To: Martin Dalecki
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

[-- Attachment #1: msg.pgp --]
[-- Type: text/plain, Size: 1752 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Mar 23, 2002 at 01:33:50AM +0100, Martin Dalecki wrote:
> AMEN! TO THIS!
> Uptime of a process is a much better mesaure for a killing candidate
> then it's size.

Thing is, if you take a good study of mm/oom_kill.c, it *does* take start time
into account, as well as CPU time.  The problem is that a process (like Oracle,
in our case) using ludicrous amounts of memory can still rank at the top of the 
list, even with the time-based reduction factors, because total VM is the
starting number in the equation for determining what to kill.  Oracle or what
not sitting at 80 MB for a day or two will still find a way to outrank the
newly-started 1 MB shell process whose malloc triggered oom_kill in the first
place.

If anything, time really needs to be a hard criterion for sorting the final list
on and not merely a variable in the equation and thus tied to vmsize.

This is why the production database boxen aren't running 2.4 yet.  I can control
Oracle's usage very finely (since it uses a fixed memory pool preallocated at
startup), but if something else decides to fire up on there (like the nightly
backup and maintenance routine) and decides it needs just a pinch more memory
than what's available -- ick.  2.2.x doesn't appear to enforce new memory 
allocation with a sniper rifle -- the new process just suffers a pleasant ("Out
of memory!") or violent (SIGSEGV) death.

- -- 
Stephen Clouse <stephenc@theiqgroup.com>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOrqW3wOGqGs0PadnEQLZUwCfWTr8HwAChQamWWvWWzZcX5DZ8PAAnROB
Ja25OAQu3W1h7Ck0SU/TfKj8
=VlQt
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  0:20         ` Stephen Clouse
@ 2002-03-23  1:30           ` Martin Dalecki
  2001-03-23  1:37             ` Rik van Riel
  0 siblings, 1 reply; 162+ messages in thread
From: Martin Dalecki @ 2002-03-23  1:30 UTC (permalink / raw)
  To: Stephen Clouse
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

Stephen Clouse wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On Sat, Mar 23, 2002 at 01:33:50AM +0100, Martin Dalecki wrote:
> > AMEN! TO THIS!
> > Uptime of a process is a much better mesaure for a killing candidate
> > then it's size.
> 
> Thing is, if you take a good study of mm/oom_kill.c, it *does* take start time

I did thing is Rik did use a non normalized formula in oom_kill for the
calculation of the kill penalty a process get's. This is the main
reason for the non controllable behaviour of it.

> into account, as well as CPU time.  The problem is that a process (like Oracle,
> in our case) using ludicrous amounts of memory can still rank at the top of the
> list, even with the time-based reduction factors, because total VM is the
> starting number in the equation for determining what to kill.  Oracle or what
> not sitting at 80 MB for a day or two will still find a way to outrank the
> newly-started 1 MB shell process whose malloc triggered oom_kill in the first
> place.

This is due to the broken calculation formula in oom_kill().

> 
> If anything, time really needs to be a hard criterion for sorting the final list
> on and not merely a variable in the equation and thus tied to vmsize.
> 
> This is why the production database boxen aren't running 2.4 yet.  I can control
> Oracle's usage very finely (since it uses a fixed memory pool preallocated at
> startup), but if something else decides to fire up on there (like the nightly
> backup and maintenance routine) and decides it needs just a pinch more memory
> than what's available -- ick.  2.2.x doesn't appear to enforce new memory
> allocation with a sniper rifle -- the new process just suffers a pleasant ("Out
> of memory!") or violent (SIGSEGV) death.

And you should never ever overcommit memmory to oracle! Don't make the
buffers bigger then half the memmory in the system really. There ARE
circumstances where oracle is using all available memmory in very random
manner.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2002-03-23  1:30           ` Martin Dalecki
@ 2001-03-23  1:37             ` Rik van Riel
  2001-03-23 10:48               ` Martin Dalecki
  0 siblings, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-23  1:37 UTC (permalink / raw)
  To: Martin Dalecki
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

On Sat, 23 Mar 2002, Martin Dalecki wrote:

> This is due to the broken calculation formula in oom_kill().

Feel free to write better-working code.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  1:37             ` Rik van Riel
@ 2001-03-23 10:48               ` Martin Dalecki
  2001-03-23 14:56                 ` Rik van Riel
  0 siblings, 1 reply; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 10:48 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

Rik van Riel wrote:
> 
> On Sat, 23 Mar 2002, Martin Dalecki wrote:
> 
> > This is due to the broken calculation formula in oom_kill().
> 
> Feel free to write better-working code.

I don't get paid for it and I'm not idling through my days...

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 10:48               ` Martin Dalecki
@ 2001-03-23 14:56                 ` Rik van Riel
  2001-03-23 16:43                   ` Guest section DW
  2001-03-23 20:20                   ` Tom Diehl
  0 siblings, 2 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-23 14:56 UTC (permalink / raw)
  To: Martin Dalecki
  Cc: Stephen Clouse, Guest section DW, Patrick O'Rourke, linux-mm,
	linux-kernel

On Fri, 23 Mar 2001, Martin Dalecki wrote:
> Rik van Riel wrote:
> > On Sat, 23 Mar 2002, Martin Dalecki wrote:
> > 
> > > This is due to the broken calculation formula in oom_kill().
> > 
> > Feel free to write better-working code.
> 
> I don't get paid for it and I'm not idling through my days...

  <similar response from Andries>

Well, in that case you'll have to live with the current OOM
killer.  Martin wrote down a pretty detailed description of
what's wrong with my algorithm, if it really bothers him he
should be able to come up with something better.

Personally, I think there is more important VM code to look
after, since OOM is a pretty rare occurrance anyway.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 14:56                 ` Rik van Riel
@ 2001-03-23 16:43                   ` Guest section DW
  2001-03-24  5:57                     ` Rik van Riel
  2001-03-23 20:20                   ` Tom Diehl
  1 sibling, 1 reply; 162+ messages in thread
From: Guest section DW @ 2001-03-23 16:43 UTC (permalink / raw)
  To: Rik van Riel, Martin Dalecki
  Cc: Stephen Clouse, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote:
> On Fri, 23 Mar 2001, Martin Dalecki wrote:

> > > Feel free to write better-working code.
> > 
> > I don't get paid for it and I'm not idling through my days...
> 
>   <similar response from Andries>

No lies please.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 16:43                   ` Guest section DW
@ 2001-03-24  5:57                     ` Rik van Riel
  2001-03-25 16:35                       ` Guest section DW
  0 siblings, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-24  5:57 UTC (permalink / raw)
  To: Guest section DW
  Cc: Martin Dalecki, Stephen Clouse, Patrick O'Rourke, linux-mm,
	linux-kernel

On Fri, 23 Mar 2001, Guest section DW wrote:
> On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote:
> > On Fri, 23 Mar 2001, Martin Dalecki wrote:
> 
> > > > Feel free to write better-working code.
> > > 
> > > I don't get paid for it and I'm not idling through my days...
> > 
> >   <similar response from Andries>
> 
> No lies please.

You mean that you ARE willing to implement what you've been
arguing for?

Cool, I can't wait to see your patch.

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  5:57                     ` Rik van Riel
@ 2001-03-25 16:35                       ` Guest section DW
  0 siblings, 0 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-25 16:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Martin Dalecki, Stephen Clouse, Patrick O'Rourke, linux-mm,
	linux-kernel

On Sat, Mar 24, 2001 at 02:57:27AM -0300, Rik van Riel wrote:
> On Fri, 23 Mar 2001, Guest section DW wrote:
> > On Fri, Mar 23, 2001 at 11:56:23AM -0300, Rik van Riel wrote:
> > > On Fri, 23 Mar 2001, Martin Dalecki wrote:
> > 
> > > > > Feel free to write better-working code.
> > > > 
> > > > I don't get paid for it and I'm not idling through my days...
> > > 
> > >   <similar response from Andries>
> > 
> > No lies please.
> 
> You mean that you ARE willing to implement what you've been
> arguing for?

There had not been any such response by me -
thus you should not ascribe to me such a response.

Concerning overcommit: people tell me that Eduardo Horvath
in his patch submitted to l-k on 2000-03-31 already solved
the problem (entirely or to a large extent).

: This patch will prevent the linux kernel from allowing VM overcommit.

I have not yet read the code.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 14:56                 ` Rik van Riel
  2001-03-23 16:43                   ` Guest section DW
@ 2001-03-23 20:20                   ` Tom Diehl
  2001-03-23 23:56                     ` Tim Wright
  1 sibling, 1 reply; 162+ messages in thread
From: Tom Diehl @ 2001-03-23 20:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel

On Fri, 23 Mar 2001, Rik van Riel wrote:

> Well, in that case you'll have to live with the current OOM
> killer.  Martin wrote down a pretty detailed description of
> what's wrong with my algorithm, if it really bothers him he
> should be able to come up with something better.
>
> Personally, I think there is more important VM code to look
> after, since OOM is a pretty rare occurrance anyway.

Well actually it is not that rare at least for me. Every 3 or 4 days I run
into it (It happened again this morning). The machine has 128 Megs of ram
and 256 Megs of swap. It is my desktop machine and I keep 3 or 4 netscape
windows running all of the time. Well I try to at least. Every 3 or 4 days
the OOM Killer kills netscape, it happened this morning. If I could fix it
I would but alas I do not have the knowledge. The best I can do is test. :(

This is NOT a complaint I just bring this up as another data point.
It used to lock the machine so things are getting better. fwiw, I am
currently running 2.4.2-ac18. The old ac kernels (do not remember exactly
which ones but it was single digits) would allow the machine to start
thrashing. I could usually see that it was running out of memory and if I
was fast enough could kill Netscape b4 the machine locked. If I was not
fast enough it would lock hard. Nothing in the logs.

HTH,

-- 
......Tom	ATA100 is another testimony to the fact that pigs can be
tdiehl@pil.net	made to fly given sufficient thrust (to borrow an RFC)
		Alan Cox lkml 11 Jan 01

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 20:20                   ` Tom Diehl
@ 2001-03-23 23:56                     ` Tim Wright
  2001-03-24  0:21                       ` Tom Diehl
  0 siblings, 1 reply; 162+ messages in thread
From: Tim Wright @ 2001-03-23 23:56 UTC (permalink / raw)
  To: Tom Diehl; +Cc: Rik van Riel, linux-kernel

Netscape 4 has some very nasty habits like suddenly consuming ~80MB of memory.
Disabling java support seems to eradicate most occurences of this particularly
obnoxious behaviour. Under these circumstances, the OOM killer is doing exactly
the right thing i.e. killing a runaway app.

Tim

On Fri, Mar 23, 2001 at 03:20:41PM -0500, Tom Diehl wrote:
> On Fri, 23 Mar 2001, Rik van Riel wrote:
> 
> > Well, in that case you'll have to live with the current OOM
> > killer.  Martin wrote down a pretty detailed description of
> > what's wrong with my algorithm, if it really bothers him he
> > should be able to come up with something better.
> >
> > Personally, I think there is more important VM code to look
> > after, since OOM is a pretty rare occurrance anyway.
> 
> Well actually it is not that rare at least for me. Every 3 or 4 days I run
> into it (It happened again this morning). The machine has 128 Megs of ram
> and 256 Megs of swap. It is my desktop machine and I keep 3 or 4 netscape
> windows running all of the time. Well I try to at least. Every 3 or 4 days
> the OOM Killer kills netscape, it happened this morning. If I could fix it
> I would but alas I do not have the knowledge. The best I can do is test. :(
> 
> This is NOT a complaint I just bring this up as another data point.
> It used to lock the machine so things are getting better. fwiw, I am
> currently running 2.4.2-ac18. The old ac kernels (do not remember exactly
> which ones but it was single digits) would allow the machine to start
> thrashing. I could usually see that it was running out of memory and if I
> was fast enough could kill Netscape b4 the machine locked. If I was not
> fast enough it would lock hard. Nothing in the logs.
> 
> HTH,
> 
> -- 
> ......Tom	ATA100 is another testimony to the fact that pigs can be
> tdiehl@pil.net	made to fly given sufficient thrust (to borrow an RFC)
> 		Alan Cox lkml 11 Jan 01
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 23:56                     ` Tim Wright
@ 2001-03-24  0:21                       ` Tom Diehl
  0 siblings, 0 replies; 162+ messages in thread
From: Tom Diehl @ 2001-03-24  0:21 UTC (permalink / raw)
  To: Tim Wright; +Cc: Rik van Riel, linux-kernel

On Fri, 23 Mar 2001, Tim Wright wrote:

> Netscape 4 has some very nasty habits like suddenly consuming ~80MB of memory.
> Disabling java support seems to eradicate most occurences of this particularly
> obnoxious behaviour. Under these circumstances, the OOM killer is doing exactly
> the right thing i.e. killing a runaway app.

Thanks for the info. I sus[ected as much but I was not sure.

-- 
......Tom	ATA100 is another testimony to the fact that pigs can be
tdiehl@pil.net	made to fly given sufficient thrust (to borrow an RFC)
		Alan Cox lkml 11 Jan 01


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 11:47   ` Guest section DW
                       ` (2 preceding siblings ...)
  2001-03-22 20:28     ` Stephen Clouse
@ 2001-03-23 17:26     ` James A. Sutherland
  2001-03-23 17:32       ` Alan Cox
                         ` (3 more replies)
  3 siblings, 4 replies; 162+ messages in thread
From: James A. Sutherland @ 2001-03-23 17:26 UTC (permalink / raw)
  To: Guest section DW
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

On Thu, 22 Mar 2001, Guest section DW wrote:
> On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
> > On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
>
> > > Since the system will panic if the init process is chosen by
> > > the OOM killer, the following patch prevents select_bad_process()
> > > from picking init.
>
> There is a dozen other processes that must not be killed.
> Init is just a random example.

That depends what you mean by "must not". If it's your missile guidance
system, aircraft autopilot or life support system, the system must not run
out of memory in the first place. If the system breaks down badly, killing
init and thus panicking (hence rebooting, if the system is set up that
way) seems the best approach.

> > One question ... has the OOM killer ever selected init on
> > anybody's system ?
>
> Last week I installed SuSE 7.1 somewhere.
> During the install: "VM: killing process rpm",
> leaving the installer rather confused.
> (An empty machine, 256MB, 144MB swap, I think 2.2.18.)

If SuSE's install program needs more than a quarter Gb of RAM, you need a
better distro.

> Last month I had a computer algebra process running for a week.
> Killed. But this computation was the only task this machine had.
> Its sole reason of existence.
> Too bad - zero information out of a week's computation.

A computation your system was incapable of performing. OK, it's a shame it
took you a week to find this out, but the computation had to die: if the
only process running cannot run, it has to die!

> (I think 2.4.0.)
>
> Clearly, Linux cannot be reliable if any process can be killed
> at any moment.

What on earth did you expect to happen when the process exceeded the
machine's capabilities? Using more than all the resources fails. There
isn't an alternative.

James.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:26     ` James A. Sutherland
@ 2001-03-23 17:32       ` Alan Cox
  2001-03-23 18:58         ` Martin Dalecki
                           ` (3 more replies)
  2001-03-24  0:03       ` Guest section DW
                         ` (2 subsequent siblings)
  3 siblings, 4 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-23 17:32 UTC (permalink / raw)
  To: James A. Sutherland
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

> That depends what you mean by "must not". If it's your missile guidance
> system, aircraft autopilot or life support system, the system must not run
> out of memory in the first place. If the system breaks down badly, killing
> init and thus panicking (hence rebooting, if the system is set up that
> way) seems the best approach.

Ultra reliable systems dont contain memory allocators. There are good reasons
for this but the design trade offs are rather hard to make in a real world
environment

Solving the trivial overcommit case is not a difficult task but since I don't
believe it is needed I'll wait for those who moan so loudly to do it


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:32       ` Alan Cox
@ 2001-03-23 18:58         ` Martin Dalecki
  2001-03-23 19:45         ` Jonathan Morton
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 18:58 UTC (permalink / raw)
  To: Alan Cox
  Cc: James A. Sutherland, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

I have a constructive proposal:

It would make much sense to make the oom killer
leave not just root processes alone but processes belonging to a UID
lower
then a certain value as well (500). This would be:

1. Easly managable by the admin. Just let oracle/www and analogous users
   have a UID lower then let's say 500.

2. In full compliance with the port trick done by TCP/IP (ports < 1024
vers other)

3. It wouldn't need any addition of new interface (no jebanoje gawno in
/proc in addition()

4. Really simple to implement/document understand.

5. Be the same way as Solaris does similiar things.

...

Damn: I will let my chess club alone toady and will just code it down
NOW.

Spec:

1. Processes with a UID < 100 are immune to OOM killers.
2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to
take on.
3. Processes with a UID >= 500 are easy targets.

Let me introduce a new terminology in full analogy to "fire walls"
routers and therabouts:

Processes of category 1. are called captains (oficerzy)
Processes of category 2. are called corporals (porucznicy)
Processes of category 2. are called privates (¿o³nierze)

;-)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:32       ` Alan Cox
  2001-03-23 18:58         ` Martin Dalecki
@ 2001-03-23 19:45         ` Jonathan Morton
  2001-03-23 23:26           ` Eric W. Biederman
  2001-03-25 15:30         ` Martin Dalecki
  2001-03-25 20:47         ` Stephen Satchell
  3 siblings, 1 reply; 162+ messages in thread
From: Jonathan Morton @ 2001-03-23 19:45 UTC (permalink / raw)
  To: Martin Dalecki, Alan Cox
  Cc: James A. Sutherland, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

>It would make much sense to make the oom killer
>leave not just root processes alone but processes belonging to a UID
>lower
>then a certain value as well (500). This would be:
>
>1. Easly managable by the admin. Just let oracle/www and analogous users
>   have a UID lower then let's say 500.

That sounds vaguely sensible.  However, make it a "much less likely" rather
than an "impossible", otherwise we end up with an unkillable runaway root
process killing everything else in userland.

I'm still in favour of a failing malloc(), and I'm currently reading a bit
of source and docs to figure out where this should be done and why it isn't
done now.  So far I've found the overcommit_memory flag, which looks kinda
promising.

>1. Processes with a UID < 100 are immune to OOM killers.
>2. Processes with a UID >= 100 && < 500 are hard for the OOM killer to
>take on.
>3. Processes with a UID >= 500 are easy targets.

As I said above, "immune" can be dangerous.  "Extremely hard" would be
better terminology and behaviour.  It also helps that the current weighting
in badness() appears to leave getty processes alone, since they don't
consume much and normally have long uptimes - also I believe init would try
to restart them anyway.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 19:45         ` Jonathan Morton
@ 2001-03-23 23:26           ` Eric W. Biederman
  0 siblings, 0 replies; 162+ messages in thread
From: Eric W. Biederman @ 2001-03-23 23:26 UTC (permalink / raw)
  To: Jonathan Morton
  Cc: Martin Dalecki, Alan Cox, James A. Sutherland, Guest section DW,
	Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

Jonathan Morton <chromi@cyberspace.org> writes:

> >It would make much sense to make the oom killer
> >leave not just root processes alone but processes belonging to a UID
> >lower
> >then a certain value as well (500). This would be:
> >
> >1. Easly managable by the admin. Just let oracle/www and analogous users
> >   have a UID lower then let's say 500.
> 
> That sounds vaguely sensible.  However, make it a "much less likely" rather
> than an "impossible", otherwise we end up with an unkillable runaway root
> process killing everything else in userland.
> 
> I'm still in favour of a failing malloc(), and I'm currently reading a bit
> of source and docs to figure out where this should be done and why it isn't
> done now.  So far I've found the overcommit_memory flag, which looks kinda
> promising.

Lookup mlock & mlock_all they will handle the single process case.

Of course if you OOM you still have problems but that should make
them much harder to trigger.

Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:32       ` Alan Cox
  2001-03-23 18:58         ` Martin Dalecki
  2001-03-23 19:45         ` Jonathan Morton
@ 2001-03-25 15:30         ` Martin Dalecki
  2001-03-25 20:47         ` Stephen Satchell
  3 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 15:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: James A. Sutherland, Guest section DW, Rik van Riel,
	Patrick O'Rourke, linux-mm, linux-kernel

Alan Cox wrote:
> 
> > That depends what you mean by "must not". If it's your missile guidance
> > system, aircraft autopilot or life support system, the system must not run
> > out of memory in the first place. If the system breaks down badly, killing
> > init and thus panicking (hence rebooting, if the system is set up that
> > way) seems the best approach.
> 
> Ultra reliable systems dont contain memory allocators. There are good reasons
> for this but the design trade offs are rather hard to make in a real world
> environment

I esp. they run on CPU's without a stack or what?

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:32       ` Alan Cox
                           ` (2 preceding siblings ...)
  2001-03-25 15:30         ` Martin Dalecki
@ 2001-03-25 20:47         ` Stephen Satchell
  3 siblings, 0 replies; 162+ messages in thread
From: Stephen Satchell @ 2001-03-25 20:47 UTC (permalink / raw)
  To: linux-mm, linux-kernel

At 05:30 PM 3/25/01 +0200, you wrote:
> > Ultra reliable systems dont contain memory allocators. There are good 
> reasons
> > for this but the design trade offs are rather hard to make in a real world
> > environment
>
>I esp. they run on CPU's without a stack or what?

No dynamic memory allocation AT ALL.  That includes the prohibition of a 
stack.  I've seen avionics-loop systems that abstract a stack but the 
"allocators" are part of the application and are designed to fall over 
gracefully when they become full -- but getting this past a project manager 
is hard, as it should be.

Then there are those systems with rather interesting watchdog timers.  If 
you don't tickle them just right, they fire and force a restart.  The 
nastiest of these required that you send four specific values to a specific 
I/O port, and the hardware looked to see if the values violated certain 
timing guidelines.  If you sent the code too early or too late, or if the 
value in the sequence was incorrect, BAM.  The hardware was designed by a 
guy with some rather interesting experiences with software "engineers" 
dealing with watchdog timers...

Satch

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:26     ` James A. Sutherland
  2001-03-23 17:32       ` Alan Cox
@ 2001-03-24  0:03       ` Guest section DW
  2001-03-24  7:52       ` Doug Ledford
  2001-03-25  0:32       ` Kurt Garloff
  3 siblings, 0 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-24  0:03 UTC (permalink / raw)
  To: James A. Sutherland
  Cc: Rik van Riel, Patrick O'Rourke, linux-mm, linux-kernel

On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote:

> > Clearly, Linux cannot be reliable if any process can be killed
> > at any moment.
> 
> What on earth did you expect to happen when the process exceeded the
> machine's capabilities? Using more than all the resources fails. There
> isn't an alternative.

That is the wrong way to phrase these things.
Large processes usually do not have a definite set of needed resources.
They can use lots of memory for buffers and cache and hash and be a bit
faster, or use much less and be a bit slower.
Linux first promises a lot of memory, but then fails to deliver,
without returning any error to the program.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:26     ` James A. Sutherland
  2001-03-23 17:32       ` Alan Cox
  2001-03-24  0:03       ` Guest section DW
@ 2001-03-24  7:52       ` Doug Ledford
  2001-03-25  0:32       ` Kurt Garloff
  3 siblings, 0 replies; 162+ messages in thread
From: Doug Ledford @ 2001-03-24  7:52 UTC (permalink / raw)
  To: James A. Sutherland
  Cc: Guest section DW, Rik van Riel, Patrick O'Rourke, linux-mm,
	linux-kernel

"James A. Sutherland" wrote:
> On Thu, 22 Mar 2001, Guest section DW wrote:
> > (I think 2.4.0.)
> >
> > Clearly, Linux cannot be reliable if any process can be killed
> > at any moment.
> 
> What on earth did you expect to happen when the process exceeded the
> machine's capabilities? Using more than all the resources fails. There
> isn't an alternative.

You might be successful in convincing myself or Andries of this as soon as the
oom killer only kills things when the system is really out of memory.  Right
now, it's not really an oom killer, it's more like an "I'm Too Lazy To Free Up
Some More Pages So Now You Die" (ITLTFUSMPSNYD) killer.

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:26     ` James A. Sutherland
                         ` (2 preceding siblings ...)
  2001-03-24  7:52       ` Doug Ledford
@ 2001-03-25  0:32       ` Kurt Garloff
  2001-03-25 15:02         ` Sandy Harris
  2001-03-25 18:07         ` Guest section DW
  3 siblings, 2 replies; 162+ messages in thread
From: Kurt Garloff @ 2001-03-25  0:32 UTC (permalink / raw)
  To: James A. Sutherland; +Cc: Linux kernel list

[-- Attachment #1: Type: text/plain, Size: 874 bytes --]

On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote:
> If SuSE's install program needs more than a quarter Gb of RAM, you need a
> better distro.

Well, it's rpm ...
I guess the Debian packager is more friendly.
But if you choose to install a huge number of packages, the job to do for
the package manager (dependencies ...) is no trivial to do with few resources.

But that's not the point of the discussion.

Kernel related questions IMHO are:
(1) Why do we get into OOM? Can we avoid it?
(2) Is OOM sometimes misdetected (or triggered too early) and why?
(3) Does the OOM killer choose the right processes?

Regards,
-- 
Kurt Garloff  <garloff@suse.de>                          Eindhoven, NL
GPG key: See mail header, key servers         Linux kernel development
SuSE GmbH, Nuernberg, FRG                               SCSI, Security

[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25  0:32       ` Kurt Garloff
@ 2001-03-25 15:02         ` Sandy Harris
  2001-03-25 18:07         ` Guest section DW
  1 sibling, 0 replies; 162+ messages in thread
From: Sandy Harris @ 2001-03-25 15:02 UTC (permalink / raw)
  To: Linux kernel list

Kurt Garloff wrote:

> Kernel related questions IMHO are:
> (1) Why do we get into OOM?

There was a long thread about this a few months back. We get into OOM because
malloc(), calloc() etc. can allocate more memory than is actually available.

e.g. Say you have machine with 64 RAM + 64 swap = 128 megs with 40 megs in use,
so 88 free. Now two processes each malloc() 80 megs. Both succeed. If both
processes then use that memory, someone is likely to fail later.

> Can we avoid it?

The obvious solution is to consider the above behaviour a bug and fix it.
The second malloc() should fail. The process making that call can then look
at the return value and decide what to do about the failure.

However, this was extensively discussed here last year, and that solution was
quite firmly rejected. I never understood the reasons. See the archives.

Someone did announce they were working on patches implementing a sane malloc().
What happened to that project?

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25  0:32       ` Kurt Garloff
  2001-03-25 15:02         ` Sandy Harris
@ 2001-03-25 18:07         ` Guest section DW
  1 sibling, 0 replies; 162+ messages in thread
From: Guest section DW @ 2001-03-25 18:07 UTC (permalink / raw)
  To: Kurt Garloff, James A. Sutherland, Linux kernel list

On Sun, Mar 25, 2001 at 01:32:42AM +0100, Kurt Garloff wrote:
> On Fri, Mar 23, 2001 at 05:26:22PM +0000, James A. Sutherland wrote:
> > If SuSE's install program needs more than a quarter Gb of RAM, you need a
> > better distro.
> 
> Well, it's rpm ...

Yes. I investigated and found rpm's data base corrupted, and rpm cannot handle
that. Since I have several occurrences of rpm being killed by the oom killer
in my logs it is entirely possible that the data base got corrupted because
rpm was killed while in the process of updating it.



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
  2001-03-22  8:14   ` Eric W. Biederman
  2001-03-22 11:47   ` Guest section DW
@ 2001-03-22 14:53   ` Patrick O'Rourke
  2001-03-22 19:24   ` Philipp Rumpf
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 162+ messages in thread
From: Patrick O'Rourke @ 2001-03-22 14:53 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

Rik van Riel wrote:


> One question ... has the OOM killer ever selected init on
> anybody's system ?

Yes, which is why I created the patch.

-- 
Patrick O'Rourke
978.606.0236
orourke@missioncriticallinux.com


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
                     ` (2 preceding siblings ...)
  2001-03-22 14:53   ` Patrick O'Rourke
@ 2001-03-22 19:24   ` Philipp Rumpf
  2001-03-22 22:20   ` James A. Sutherland
  2001-03-23 17:31   ` Szabolcs Szakacsits
  5 siblings, 0 replies; 162+ messages in thread
From: Philipp Rumpf @ 2001-03-22 19:24 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Wed, Mar 21, 2001 at 08:48:54PM -0300, Rik van Riel wrote:
> On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
> 
> > Since the system will panic if the init process is chosen by
> > the OOM killer, the following patch prevents select_bad_process()
> > from picking init.
> 
> One question ... has the OOM killer ever selected init on
> anybody's system ?

Yes, I managed to reproduce this a while ago.  (init was the only
process around though).

We don't ever kill init, fwiw;  we panic(), which is the right thing
to do if init can't keep running.

> I think that the scoring algorithm should make sure that
> we never pick init, unless the system is screwed so badly
> that init is broken or the only process left ;)

I can't think of a situation where the OOM killer does the wrong thing.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
                     ` (3 preceding siblings ...)
  2001-03-22 19:24   ` Philipp Rumpf
@ 2001-03-22 22:20   ` James A. Sutherland
  2001-03-23 17:31   ` Szabolcs Szakacsits
  5 siblings, 0 replies; 162+ messages in thread
From: James A. Sutherland @ 2001-03-22 22:20 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Wed, 21 Mar 2001, Rik van Riel wrote:

> On Wed, 21 Mar 2001, Patrick O'Rourke wrote:
>
> > Since the system will panic if the init process is chosen by
> > the OOM killer, the following patch prevents select_bad_process()
> > from picking init.
>
> One question ... has the OOM killer ever selected init on
> anybody's system ?

Well, I managed to get the OOM killer killing init once; OTOH, I had just
broken MM completely (disabled freeing of pages entirely!) so that doesn't
really count, I think :-)

> I think that the scoring algorithm should make sure that
> we never pick init, unless the system is screwed so badly
> that init is broken or the only process left ;)

If the system is that badly screwed, killing init is probably the right
thing to do, since this should then cause a panic, and thus a reboot if
the machine is so configured?


James.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:48 ` Rik van Riel
                     ` (4 preceding siblings ...)
  2001-03-22 22:20   ` James A. Sutherland
@ 2001-03-23 17:31   ` Szabolcs Szakacsits
  2001-03-24  5:54     ` Rik van Riel
  5 siblings, 1 reply; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-23 17:31 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Wed, 21 Mar 2001, Rik van Riel wrote:
> One question ... has the OOM killer ever selected init on
> anybody's system ?

Hi Rik,

When I ported your OOM killer to 2.2.x and integrated it into the
'reserved root memory' [*] patch, during intensive testing I found two
cases when init was killed. It happened on low-end machines and when OOM
killer wasn't triggered so init was killed in the page fault handler.
The later was also one of the reasons I replaced the "random" OOM killer
in page fault handler with yours [so there is only one OOM killer]. I
also asked you at that time whether there was any reason you didn't put
it also there but unfortunately you didn't answer. Practice showed it
works there as well [and actually some crashes that was reported here
recently could have been avoided in this way] but technically maybe I
missed something?

Other things that bothered me,
 - niced processes are penalized
 - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE
   will probably deadlock the machine [or the random OOM killer will
   kill the box].

	Szaka

[*] who are interested, it can be found at
	http://mlf.linux.rulez.org/mlf/ezaz/reserved_root_memory.html

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 17:31   ` Szabolcs Szakacsits
@ 2001-03-24  5:54     ` Rik van Riel
  2001-03-24  6:55       ` Juha Saarinen
  2001-03-27  8:31       ` Roger Gammans
  0 siblings, 2 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-24  5:54 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: Patrick O'Rourke, linux-mm, linux-kernel

On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote:

> When I ported your OOM killer to 2.2.x and integrated it into the
> 'reserved root memory' [*] patch, during intensive testing I found two
> cases when init was killed. It happened on low-end machines and when
> OOM killer wasn't triggered so init was killed in the page fault
> handler. The later was also one of the reasons I replaced the "random"
> OOM killer in page fault handler with yours [so there is only one OOM
> killer].

Good idea, we should do this for 2.4.  I cannot remember
reading an email from you about this, it's quite possible
I just missed it and didn't answer because I never read
it ...

> Other things that bothered me,
>  - niced processes are penalized

This can be considered a bug and should be fixed...

>  - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE
>    will probably deadlock the machine [or the random OOM killer will
>    kill the box].

This could indeed be a problem, though I cannot really see any
case where a task would be in TASK_UNINTERRUPTIBLE permanently.
OTOH, a 1GB read() will take a (much) too long time to finish.

Your ideas sound really good, would you have the time to implement
them for 2.4 ?

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* RE: [PATCH] Prevent OOM from killing init
  2001-03-24  5:54     ` Rik van Riel
@ 2001-03-24  6:55       ` Juha Saarinen
  2001-03-27  8:31       ` Roger Gammans
  1 sibling, 0 replies; 162+ messages in thread
From: Juha Saarinen @ 2001-03-24  6:55 UTC (permalink / raw)
  To: Rik van Riel, Szabolcs Szakacsits
  Cc: Patrick O'Rourke, linux-mm, linux-kernel

:: Your ideas sound really good, would you have the time to implement
:: them for 2.4 ?

2.4 or 2.5?

-- Juha

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  5:54     ` Rik van Riel
  2001-03-24  6:55       ` Juha Saarinen
@ 2001-03-27  8:31       ` Roger Gammans
  1 sibling, 0 replies; 162+ messages in thread
From: Roger Gammans @ 2001-03-27  8:31 UTC (permalink / raw)
  To: linux-kernel

On Sat, Mar 24, 2001 at 02:54:55AM -0300, Rik van Riel wrote:
> On Fri, 23 Mar 2001, Szabolcs Szakacsits wrote:
> >  - trying to kill a task that is permanently in TASK_UNINTERRUPTIBLE
> >    will probably deadlock the machine [or the random OOM killer will
> >    kill the box].
> 
> This could indeed be a problem, though I cannot really see any
> case where a task would be in TASK_UNINTERRUPTIBLE permanently.

I've seen this with 'mt rewind' jamming on ide-tape. I'm
not sure of the exact pathology , but ISTR that it
was related issuing that command while the hardware was busy.

In any case the point is that a badly written driver or faulty
h/w even in a subsiduary system can cause this.

In an ideal world of course these wouldn't happen, but OTOH
is this an issue in failing a box which is going to fail
anyway if we don't kill the process. If we could ensure
a graceful failure so much the better.

TTFN
-- 
Roger
     Think of the mess on the carpet. Sensible people do all their
     demon-summoning in the garage, which you can just hose down afterwards.
        --     damerell@chiark.greenend.org.uk

^ permalink raw reply	[flat|nested] 162+ messages in thread

* RE: [PATCH] Prevent OOM from killing init
@ 2001-03-21 23:41 Leif Sawyer
  2001-03-22  0:32 ` Kevin Buhr
  0 siblings, 1 reply; 162+ messages in thread
From: Leif Sawyer @ 2001-03-21 23:41 UTC (permalink / raw)
  To: Eli Carter, Patrick O'Rourke; +Cc: linux-kernel

Patrick O'Rourke, who wrote:
> Since the system will panic if the init process is chosen by
> the OOM killer, the following patch prevents select_bad_process()
> from picking init.
> 

(Patch deleted)

What happens when init is not pid == 1, as is often the case
during installs, booting off of cdrom, etc..


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-21 23:41 Leif Sawyer
@ 2001-03-22  0:32 ` Kevin Buhr
  0 siblings, 0 replies; 162+ messages in thread
From: Kevin Buhr @ 2001-03-22  0:32 UTC (permalink / raw)
  To: Patrick O'Rourke; +Cc: Eli Carter, linux-kernel

Leif Sawyer <lsawyer@gci.com> writes:
> 
> What happens when init is not pid == 1, as is often the case
> during installs, booting off of cdrom, etc..

Well, after spending hours scrutinizing Patrick's one-line patch, I'll
guess that, in these cases, the patch does not prevent init from being
killed by an OOM error.  But, I'll bet that was a rhetorical question.

In any event, whatever process has pid == 1, it can't voluntarily exit
without a panic, and it's the reaper of all orphaned children, so it
makes sense not to kill it.  As Eli points out, the patch is cleaner
if rewritten:

--- xxx/linux-2.4.3-pre6/mm/oom_kill.c  Tue Nov 14 13:56:46 2000
+++ linux-2.4.3-pre6/mm/oom_kill.c      Wed Mar 21 15:25:03 2001
@@ -123,7 +123,7 @@

         read_lock(&tasklist_lock);
         for_each_task(p) {
-               if (p->pid) {
+               if (p->pid > 1) {
                         int points = badness(p);
                         if (points > maxpoints) {
                                 chosen = p;

since no valid pid is ever negative.

I don't see a valid reason for *not* making this change, but I'm
batting zero for two on my last two patch submissions, so I've
probably missed something.

Kevin <buhr@stat.wisc.edu>

^ permalink raw reply	[flat|nested] 162+ messages in thread

* RE: [PATCH] Prevent OOM from killing init
@ 2001-03-22 11:08 Heusden, Folkert van
  0 siblings, 0 replies; 162+ messages in thread
From: Heusden, Folkert van @ 2001-03-22 11:08 UTC (permalink / raw)
  To: Patrick O'Rourke, linux-mm, linux-kernel

> Since the system will panic if the init process is chosen by
> the OOM killer, the following patch prevents select_bad_process()
> from picking init.

Hmmm, wouldn't it be nice to make this all configurable? Like; have
some list of PIDs that can be killed?
I would hate it the daemon that checks my UPS would get killed...
(that deamon brings the machine down safely when the UPS'
batteries get emptied).
Would be something like:

int *dont_kill_pid, ndont_kill_pid;
// initialize with at least pid '1' and n=1

         for_each_task(p) {
		int loop;
		for(loop=ndont_kill_pid-1; loop>=0; loop--)
		{
			if (dont_kill_pid[loop] == p->pid) break;
		}
              if (p->pid && !(loop>=0)) {
                         int points = badness(p);
                         if (points > maxpoints) {
                                 chosen = p;


(untested (not even compiled or anything) code)

^ permalink raw reply	[flat|nested] 162+ messages in thread

[parent not found: <4605B269DB001E4299157DD1569079D2809930@EXCHANGE03.plaza.ds.adp.com>]

* Re: [PATCH] Prevent OOM from killing init
       [not found] <4605B269DB001E4299157DD1569079D2809930@EXCHANGE03.plaza.ds.adp.com>
@ 2001-03-22 16:29 ` Rik van Riel
  2001-03-22 18:32   ` Christian Bodmer
  0 siblings, 1 reply; 162+ messages in thread
From: Rik van Riel @ 2001-03-22 16:29 UTC (permalink / raw)
  To: Tom Kondilis; +Cc: linux-mm, linux-kernel

On Thu, 22 Mar 2001, Tom Kondilis wrote:

> I had a 2.4.3pre3 do a 'Killing Init'
> My assuption is that I had a large benchmark running, while the benchmark
> was running,  I updated inittab to uncomment a mgetty of my serial port, and
> followed it with a 'telinit q'.
> When the system thought it ran out of memory with '1-order allocation
> failures' during a fork, which I think its a defect , because I still have
> 14GB of Swap left in the system. My system was dead.
> A real life case of killing Init.

That's not the OOM killer however, but init dying because it
couldn't get the memory it needed to satisfy a page fault or
somesuch...

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 16:29 ` Rik van Riel
@ 2001-03-22 18:32   ` Christian Bodmer
  2001-03-23 15:08     ` Horst von Brand
  0 siblings, 1 reply; 162+ messages in thread
From: Christian Bodmer @ 2001-03-22 18:32 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-mm, linux-kernel

I can't say I understand the whole MM system, however the random killing of 
processes seems like a rather unfortunate solution to the problem. If someone 
has a spare minute, maybe they could explain to me why running out of free 
memory in kswapd results in a deadlock situation.

That aside, would it be an improvement to define another process flag 
(PF_OOMPRESERVE) that would declare a process as undesirable to be killed in an 
OOM situation, so that the user has at least some control over what gets killed 
first or last respectively. Only when select_bad_process() runs out of 
unflagged processes will it then proceed to kill the processes with this new 
flag.

Just an idea, I am pretty sure there's tons of reasons why not to introduce a 
new per process flag.

/Cheers
Chris

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 18:32   ` Christian Bodmer
@ 2001-03-23 15:08     ` Horst von Brand
  2001-03-24  7:48       ` Doug Ledford
  0 siblings, 1 reply; 162+ messages in thread
From: Horst von Brand @ 2001-03-23 15:08 UTC (permalink / raw)
  To: Christian Bodmer; +Cc: linux-kernel

"Christian Bodmer" <cbinsec01@freesurf.ch> said:

> I can't say I understand the whole MM system, however the random killing
> of processes seems like a rather unfortunate solution to the problem. If
> someone has a spare minute, maybe they could explain to me why running
> out of free memory in kswapd results in a deadlock situation.

OOM is not "normal operations", it is a machine under very extreme stress,
and should *never* happen. To complicate (or even worse, slow down or
otherwise use up resources like memory) normal operations for "better
handling of OOM" is total nonsense.
-- 
Dr. Horst H. von Brand                       mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 15:08     ` Horst von Brand
@ 2001-03-24  7:48       ` Doug Ledford
  2001-03-24 10:21         ` Mike Galbraith
                           ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Doug Ledford @ 2001-03-24  7:48 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Christian Bodmer, linux-kernel

Horst von Brand wrote:
> 
> "Christian Bodmer" <cbinsec01@freesurf.ch> said:
> 
> > I can't say I understand the whole MM system, however the random killing
> > of processes seems like a rather unfortunate solution to the problem. If
> > someone has a spare minute, maybe they could explain to me why running
> > out of free memory in kswapd results in a deadlock situation.
> 
> OOM is not "normal operations", it is a machine under very extreme stress,
> and should *never* happen. To complicate (or even worse, slow down or
> otherwise use up resources like memory) normal operations for "better
> handling of OOM" is total nonsense.

Puh-Leeze.   Let's inject some reality into this conversation:

[dledford@aic-cvs dledford]$ more kill-list 
Mar 10 22:02:34 monster kernel: Out of Memory: Killed process 475 (identd).
Mar 10 22:03:25 monster kernel: Out of Memory: Killed process 660 (xfs).
Mar 10 23:02:43 monster kernel: Out of Memory: Killed process 415 (rpc.statd).
Mar 11 01:20:31 monster kernel: Out of Memory: Killed process 397 (portmap).
Mar 11 01:37:09 monster kernel: Out of Memory: Killed process 474 (identd).
Mar 11 02:56:54 monster kernel: Out of Memory: Killed process 659 (xfs).
Mar 11 03:01:43 monster kernel: Out of Memory: Killed process 414 (rpc.statd).
Mar 11 03:09:30 monster kernel: Out of Memory: Killed process 396 (portmap).
Mar 11 03:37:30 monster kernel: Out of Memory: Killed process 538 (lpd).
Mar 11 03:49:46 monster kernel: Out of Memory: Killed process 493 (atd).
Mar 11 04:02:15 monster kernel: Out of Memory: Killed process 517 (sshd).
Mar 11 04:05:05 monster kernel: Out of Memory: Killed process 724 (bash).
Mar 11 05:02:40 monster kernel: Out of Memory: Killed process 717 (login).
Mar 11 05:54:04 monster kernel: Out of Memory: Killed process 718 (login).
Mar 11 13:34:25 monster kernel: Out of Memory: Killed process 20357 (bash).
Mar 11 16:04:12 monster kernel: Out of Memory: Killed process 5879 (diff).
Mar 11 16:52:41 monster kernel: Out of Memory: Killed process 7948 (tar).
Mar 11 17:37:09 monster kernel: Out of Memory: Killed process 10072 (tar).
Mar 11 17:42:26 monster kernel: Out of Memory: Killed process 10358 (tar).
Mar 11 18:24:30 monster kernel: Out of Memory: Killed process 11300
(run-parts).
Mar 11 19:23:56 monster kernel: Out of Memory: Killed process 11301
(set-time).
Mar 11 20:28:54 monster kernel: Out of Memory: Killed process 18165 (tar).
Mar 11 20:28:55 monster kernel: Out of Memory: Killed process 18167 (gzip).
Mar 11 21:30:51 monster kernel: Out of Memory: Killed process 21205 (tar).
Mar 11 21:33:09 monster kernel: Out of Memory: Killed process 11303 (rdate).
Mar 11 21:50:36 monster kernel: Out of Memory: Killed process 22195 (tar).
Mar 11 22:07:57 monster kernel: Out of Memory: Killed process 23049 (tar).
Mar 11 22:10:01 monster kernel: Out of Memory: Killed process 22987 (diff).
Mar 11 22:12:28 monster kernel: Out of Memory: Killed process 23233 (diff).
Mar 12 00:25:38 monster kernel: Out of Memory: Killed process 29692 (diff).
Mar 12 00:35:34 monster kernel: Out of Memory: Killed process 30229 (tar).
Mar 12 00:57:42 monster kernel: Out of Memory: Killed process 30796 (diff).
Mar 12 01:49:33 monster kernel: Out of Memory: Killed process 1153 (diff).
Mar 12 02:41:31 monster kernel: Out of Memory: Killed process 3488 (tar).
Mar 12 03:06:00 monster kernel: Out of Memory: Killed process 4257 (diff).
Mar 12 04:55:27 monster kernel: Out of Memory: Killed process 8845 (diff).
Mar 12 05:20:07 monster kernel: Out of Memory: Killed process 9712 (sh).
Mar 12 05:50:47 monster kernel: Out of Memory: Killed process 10475 (diff).
Mar 12 05:51:46 monster kernel: Out of Memory: Killed process 10838 (tar).
Mar 12 05:59:07 monster kernel: Out of Memory: Killed process 11162 (tar).
Mar 12 07:45:19 monster kernel: Out of Memory: Killed process 15489 (diff).
Mar 12 08:08:01 monster kernel: Out of Memory: Killed process 16340 (diff).
Mar 12 09:19:18 monster kernel: Out of Memory: Killed process 20182 (diff).
Mar 12 09:29:41 monster kernel: Out of Memory: Killed process 20237 (diff).
Mar 12 11:17:54 monster kernel: Out of Memory: Killed process 25611 (diff).
Mar 12 11:20:05 monster kernel: Out of Memory: Killed process 26133 (diff).
Mar 12 12:34:51 monster kernel: Out of Memory: Killed process 29826 (tar).
Mar 12 13:24:21 monster kernel: Out of Memory: Killed process 32281 (tar).
Mar 12 13:44:20 monster kernel: Out of Memory: Killed process 819 (tar).
Mar 12 13:49:37 monster kernel: Out of Memory: Killed process 1108 (tar).
Mar 12 14:03:46 monster kernel: Out of Memory: Killed process 1304 (diff).
Mar 12 14:26:29 monster kernel: Out of Memory: Killed process 2933 (tar).
Mar 12 14:29:08 monster kernel: Out of Memory: Killed process 3035 (diff).
Mar 12 14:45:53 monster kernel: Out of Memory: Killed process 3828 (diff).
Mar 12 15:06:05 monster kernel: Out of Memory: Killed process 4832 (tar).
Mar 12 16:03:42 monster kernel: Out of Memory: Killed process 7552 (tar).
Mar 12 17:10:35 monster kernel: Out of Memory: Killed process 10554 (diff).
Mar 12 17:27:39 monster kernel: Out of Memory: Killed process 11285 (diff).
Mar 12 17:52:07 monster kernel: Out of Memory: Killed process 12135 (diff).
Mar 12 18:29:39 monster kernel: Out of Memory: Killed process 14483 (tar).
Mar 12 19:58:20 monster kernel: Out of Memory: Killed process 18489 (diff).
Mar 12 20:11:46 monster kernel: Out of Memory: Killed process 19362 (tar).
Mar 12 20:31:07 monster kernel: Out of Memory: Killed process 20146 (tar).
Mar 12 21:20:00 monster kernel: Out of Memory: Killed process 22132 (diff).
Mar 12 21:37:42 monster kernel: Out of Memory: Killed process 23400 (tar).
Mar 12 22:24:48 monster kernel: Out of Memory: Killed process 25488 (diff).
Mar 12 22:44:35 monster kernel: Out of Memory: Killed process 26597 (tar).
Mar 12 23:49:01 monster kernel: Out of Memory: Killed process 29112 (diff).
Mar 12 23:51:34 monster kernel: Out of Memory: Killed process 29574 (tar).
Mar 13 00:50:36 monster kernel: Out of Memory: Killed process 32244 (diff).
Mar 13 01:05:21 monster kernel: Out of Memory: Killed process 513 (diff).
Mar 13 02:34:52 monster kernel: Out of Memory: Killed process 4948 (bash).
Mar 13 03:06:48 monster kernel: Out of Memory: Killed process 6511 (tar).
Mar 13 04:54:37 monster kernel: Out of Memory: Killed process 11753 (tar).
Mar 13 05:02:02 monster kernel: Out of Memory: Killed process 12137 (tar).
Mar 13 05:09:32 monster kernel: Out of Memory: Killed process 12521 (tar).
Mar 13 05:27:05 monster kernel: Out of Memory: Killed process 13383 (tar).
Mar 13 05:29:19 monster kernel: Out of Memory: Killed process 13490 (tar).
Mar 13 06:06:27 monster kernel: Out of Memory: Killed process 15063 (diff).
Mar 13 06:18:50 monster kernel: Out of Memory: Killed process 15704 (diff).
Mar 13 06:48:27 monster kernel: Out of Memory: Killed process 16703 (diff).
Mar 13 08:07:19 monster kernel: Out of Memory: Killed process 20995 (tar).
Mar 13 08:32:07 monster kernel: Out of Memory: Killed process 21933 (diff).
Mar 13 10:19:18 monster kernel: Out of Memory: Killed process 26764 (diff).
Mar 13 13:21:41 monster kernel: Out of Memory: Killed process 3452 (tar).
Mar 13 14:28:41 monster kernel: Out of Memory: Killed process 6654 (diff).
Mar 13 15:33:14 monster kernel: Out of Memory: Killed process 9434 (diff).
Mar 13 15:46:12 monster kernel: Out of Memory: Killed process 10469 (tar).
Mar 13 16:07:51 monster kernel: Out of Memory: Killed process 11518 (diff).
Mar 13 16:17:53 monster kernel: Out of Memory: Killed process 11588 (diff).
Mar 13 17:20:05 monster kernel: Out of Memory: Killed process 15139 (crond).
Mar 13 18:27:08 monster kernel: Out of Memory: Killed process 17909 (diff).
Mar 13 19:12:00 monster kernel: Out of Memory: Killed process 20059 (diff).
Mar 13 19:12:03 monster kernel: Out of Memory: Killed process 20278 (diff).
Mar 13 20:11:27 monster kernel: Out of Memory: Killed process 23113 (tar).
Mar 13 21:03:20 monster kernel: Out of Memory: Killed process 25638 (tar).
Mar 13 21:49:55 monster kernel: Out of Memory: Killed process 27811 (diff).
Mar 13 21:57:22 monster kernel: Out of Memory: Killed process 28037 (diff).
Mar 13 21:57:57 monster kernel: Out of Memory: Killed process 28383 (tar).
Mar 13 22:05:23 monster kernel: Out of Memory: Killed process 28759 (tar).
Mar 13 23:24:26 monster kernel: Out of Memory: Killed process 32225 (diff).
Mar 14 01:13:23 monster kernel: Out of Memory: Killed process 5235 (diff).
Mar 14 01:20:44 monster kernel: Out of Memory: Killed process 5525 (tar).
Mar 14 01:38:26 monster kernel: Out of Memory: Killed process 6326 (tar).
Mar 14 01:46:03 monster kernel: Out of Memory: Killed process 6713 (tar).
Mar 14 02:03:31 monster kernel: Out of Memory: Killed process 7527 (tar).
Mar 14 04:23:05 monster kernel: Out of Memory: Killed process 11806
(run-parts).
Mar 14 05:17:32 monster kernel: Out of Memory: Killed process 15152 (tar).
Mar 14 05:35:00 monster kernel: Out of Memory: Killed process 15995 (tar).
Mar 14 06:17:07 monster kernel: Out of Memory: Killed process 17282 (diff).
Mar 14 06:17:30 monster kernel: Out of Memory: Killed process 17439 (diff).
Mar 14 08:13:15 monster kernel: Out of Memory: Killed process 22491 (diff).
Mar 14 09:15:08 monster kernel: Out of Memory: Killed process 25782 (tar).
Mar 14 09:49:48 monster kernel: Out of Memory: Killed process 27088 (diff).
Mar 14 10:00:16 monster kernel: Out of Memory: Killed process 28020 (tar).
Mar 14 10:35:05 monster kernel: Out of Memory: Killed process 29703 (tar).
Mar 14 10:47:14 monster kernel: Out of Memory: Killed process 30142 (diff).
Mar 14 12:14:40 monster kernel: Out of Memory: Killed process 2126 (tar).
Mar 14 12:21:57 monster kernel: Out of Memory: Killed process 2135 (diff).
Mar 14 12:39:08 monster kernel: Out of Memory: Killed process 3201 (diff).
Mar 14 13:18:32 monster kernel: Out of Memory: Killed process 5259 (diff).
Mar 14 13:28:50 monster kernel: Out of Memory: Killed process 5385 (diff).
Mar 14 13:55:50 monster kernel: Out of Memory: Killed process 7159 (tar).
Mar 14 14:40:13 monster kernel: Out of Memory: Killed process 8946 (diff).
Mar 14 14:52:21 monster kernel: Out of Memory: Killed process 9932 (diff).
Mar 14 15:02:52 monster kernel: Out of Memory: Killed process 10494 (tar).
Mar 14 15:37:01 monster kernel: Out of Memory: Killed process 11776 (diff).
Mar 14 15:39:53 monster kernel: Out of Memory: Killed process 12268 (tar).
Mar 14 15:46:53 monster kernel: Out of Memory: Killed process 12228 (diff).
Mar 14 16:01:48 monster kernel: Out of Memory: Killed process 13205 (diff).
Mar 14 17:01:31 monster kernel: Out of Memory: Killed process 16291 (tar).
Mar 14 17:15:54 monster kernel: Out of Memory: Killed process 16843 (diff).
Mar 14 17:30:55 monster kernel: Out of Memory: Killed process 17549 (diff).
Mar 14 17:57:54 monster kernel: Out of Memory: Killed process 18798 (diff).
Mar 14 17:58:31 monster kernel: Out of Memory: Killed process 19129 (tar).
Mar 14 18:53:02 monster kernel: Out of Memory: Killed process 21348 (diff).
Mar 14 19:22:52 monster kernel: Out of Memory: Killed process 23256 (tar).
Mar 14 21:01:25 monster kernel: Out of Memory: Killed process 27361 (diff).
Mar 14 21:02:01 monster kernel: Out of Memory: Killed process 27461 (diff).
Mar 14 21:48:57 monster kernel: Out of Memory: Killed process 30069 (tar).
Mar 14 22:36:17 monster kernel: Out of Memory: Killed process 32220 (tar).
Mar 14 23:15:29 monster kernel: Out of Memory: Killed process 1333 (tar).
Mar 14 23:52:04 monster kernel: Out of Memory: Killed process 3022 (diff).
Mar 22 11:49:28 monster kernel: Out of Memory: Killed process 504 (identd).
Mar 22 11:53:18 monster kernel: Out of Memory: Killed process 506 (identd).
Mar 22 11:53:18 monster kernel: Out of Memory: Killed process 507 (identd).
Mar 22 11:53:18 monster kernel: Out of Memory: Killed process 508 (identd).
Mar 22 11:53:19 monster kernel: Out of Memory: Killed process 21534 (bash).
Mar 22 11:53:19 monster kernel: Out of Memory: Killed process 21559 (bash).
Mar 22 14:52:31 monster kernel: Out of Memory: Killed process 490 (identd).
Mar 22 15:19:07 monster kernel: Out of Memory: Killed process 633 (xfs).
Mar 22 15:19:09 monster kernel: Out of Memory: Killed process 436 (rpc.statd).
Mar 22 15:19:13 monster kernel: Out of Memory: Killed process 423 (portmap).
Mar 22 15:45:48 monster kernel: Out of Memory: Killed process 543 (lpd).
Mar 22 15:45:54 monster kernel: Out of Memory: Killed process 504 (atd).
Mar 22 16:12:13 monster kernel: Out of Memory: Killed process 524 (sshd).
[dledford@aic-cvs dledford]$ 

What was that you were saying about "should *never* happen"?  Oh, and let's
not overlook the fact that it killed off mostly system daemons to start off
with while leaving the real culprits alone.  Once it did get around to the
real culprits (diff and tar), it wasn't even killing them because they were
overly large, it was killing them because it wasn't reclaiming space from the
buffer cache and page cache.  All of the programs running on this machine were
never more than roughly 256MB of program code, and this is a 1GB machine. 
This behavior is totally unacceptable and, as Alan put it, is a bug in the
code.  It should never trigger the oom killer with 750+MB of cache sitting
around, but it does.  If you want people to buy into the value of the oom
killer, you've at least got to get it to quit killing shit when it absolutely
doesn't need to.

To those people that would suggest I send in code I only have this to say. 
Fine, I'll send in a patch to fix this bug.  It will make the oom killer call
the cache reclaim functions and never kill anything.  That would at least fix
the bug you see above.

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  7:48       ` Doug Ledford
@ 2001-03-24 10:21         ` Mike Galbraith
  2001-03-24 18:19           ` Doug Ledford
                             ` (3 more replies)
  2001-03-24 12:42         ` Jonathan Morton
  2001-03-25 14:10         ` Martin Dalecki
  2 siblings, 4 replies; 162+ messages in thread
From: Mike Galbraith @ 2001-03-24 10:21 UTC (permalink / raw)
  To: linux-kernel

On Sat, 24 Mar 2001, Doug Ledford wrote:

[snip list of naughty behavior]

> What was that you were saying about "should *never* happen"?  Oh, and let's
> not overlook the fact that it killed off mostly system daemons to start off
> with while leaving the real culprits alone.  Once it did get around to the
> real culprits (diff and tar), it wasn't even killing them because they were
> overly large, it was killing them because it wasn't reclaiming space from the
> buffer cache and page cache.  All of the programs running on this machine were
> never more than roughly 256MB of program code, and this is a 1GB machine.
> This behavior is totally unacceptable and, as Alan put it, is a bug in the
> code.  It should never trigger the oom killer with 750+MB of cache sitting
> around, but it does.  If you want people to buy into the value of the oom
> killer, you've at least got to get it to quit killing shit when it absolutely
> doesn't need to.
>
> To those people that would suggest I send in code I only have this to say.
> Fine, I'll send in a patch to fix this bug.  It will make the oom killer call
> the cache reclaim functions and never kill anything.  That would at least fix
> the bug you see above.

That won't fix the problem, but merely paper it over.  The problem is
in the balancing code that lets swap be exausted while at the same time
allowing cache to become obscenely obese in the first place.  I can't
trigger that behavior here, but it obviously exists for some workloads.

General thread comment:
To those who are griping, and obviously rightfully so, Rik has twice
stated on this list that he could use some help with VM auto-balancing.
The responses (visible on this list at least) was rather underwhelming.
I noted no public exchange of ideas.. nada in fact.

Get off your lazy butts and do something about it.  Don't work on the
oom-killer though.. that's only a symptom.  Work on the problem instead.

	-Mike  (who doesn't give a rats ass if he gets flamed;-)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 10:21         ` Mike Galbraith
@ 2001-03-24 18:19           ` Doug Ledford
  2001-03-24 22:47             ` Mike Galbraith
  2001-03-24 23:35             ` Jonathan Morton
  2001-03-24 20:04           ` Jonathan Morton
                             ` (2 subsequent siblings)
  3 siblings, 2 replies; 162+ messages in thread
From: Doug Ledford @ 2001-03-24 18:19 UTC (permalink / raw)
  To: linux-kernel

Mike Galbraith wrote:
> 
> On Sat, 24 Mar 2001, Doug Ledford wrote:
> > To those people that would suggest I send in code I only have this to say.
> > Fine, I'll send in a patch to fix this bug.  It will make the oom killer call
> > the cache reclaim functions and never kill anything.  That would at least fix
> > the bug you see above.
> 
> That won't fix the problem, but merely paper it over.  The problem is
> in the balancing code that lets swap be exausted while at the same time
> allowing cache to become obscenely obese in the first place.  I can't
> trigger that behavior here, but it obviously exists for some workloads.

I would be more than happy to fix the problem properly if I knew the first
thing about the vm subsystem, but I don't.

> General thread comment:
> To those who are griping, and obviously rightfully so, Rik has twice
> stated on this list that he could use some help with VM auto-balancing.
> The responses (visible on this list at least) was rather underwhelming.
> I noted no public exchange of ideas.. nada in fact.

While my post didn't give an exact formula, I was quite clear on the fact that
the system is allowing the caches to overrun memory and cause oom problems. 
I'm more than happy to test patches, and I would even be willing to suggest
some algorithms that might help, but I don't know where to stick them in the
code.  Most of the people who have been griping are in a similar position.

> Get off your lazy butts and do something about it.  Don't work on the
> oom-killer though.. that's only a symptom.  Work on the problem instead.
> 
>         -Mike  (who doesn't give a rats ass if he gets flamed;-)
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

-- 

 Doug Ledford <dledford@redhat.com>  http://people.redhat.com/dledford
      Please check my web site for aic7xxx updates/answers before
                      e-mailing me about problems

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 18:19           ` Doug Ledford
@ 2001-03-24 22:47             ` Mike Galbraith
  2001-03-24 23:35             ` Jonathan Morton
  1 sibling, 0 replies; 162+ messages in thread
From: Mike Galbraith @ 2001-03-24 22:47 UTC (permalink / raw)
  To: linux-kernel

On Sat, 24 Mar 2001, Doug Ledford wrote:

> Mike Galbraith wrote:
> >
> > General thread comment:
> > To those who are griping, and obviously rightfully so, Rik has twice
> > stated on this list that he could use some help with VM auto-balancing.
> > The responses (visible on this list at least) was rather underwhelming.
> > I noted no public exchange of ideas.. nada in fact.
>
> While my post didn't give an exact formula, I was quite clear on the fact that
> the system is allowing the caches to overrun memory and cause oom problems.

Yes.  A testcase would be good.  It's not happening to everybody nor is
it happening under all loads.  (if it were, it'd be long dead)

> I'm more than happy to test patches, and I would even be willing to suggest
> some algorithms that might help, but I don't know where to stick them in the
> code.  Most of the people who have been griping are in a similar position.

First step toward killing the critter is to lure him onto open ground.
Once there.. well, I've seen some pretty fancy shooting on this list.

	-Mike


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 18:19           ` Doug Ledford
  2001-03-24 22:47             ` Mike Galbraith
@ 2001-03-24 23:35             ` Jonathan Morton
  2001-03-25 18:35               ` Jonathan Morton
  2001-03-25 19:07               ` Mike Galbraith
  1 sibling, 2 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 23:35 UTC (permalink / raw)
  To: Mike Galbraith, linux-kernel

>> While my post didn't give an exact formula, I was quite clear on the
>>fact that
>> the system is allowing the caches to overrun memory and cause oom problems.
>
>Yes.  A testcase would be good.  It's not happening to everybody nor is
>it happening under all loads.  (if it were, it'd be long dead)
>
>> I'm more than happy to test patches, and I would even be willing to suggest
>> some algorithms that might help, but I don't know where to stick them in the
>> code.  Most of the people who have been griping are in a similar position.
>
>First step toward killing the critter is to lure him onto open ground.
>Once there.. well, I've seen some pretty fancy shooting on this list.

My patch already fixes OOM problems caused by overgrown caches/buffers, by
making sure OOM is not triggered until these buffers have been cannibalised
down to freepages.high.  If balancing problems still exist, then they
should be retuned with my patch (or something very like it) in hand, to
separate one problem from the other.  AFAIK, balancing should now be a
performance issue rather than a stability issue.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 23:35             ` Jonathan Morton
@ 2001-03-25 18:35               ` Jonathan Morton
  2001-03-26  4:40                 ` Horst von Brand
                                   ` (2 more replies)
  2001-03-25 19:07               ` Mike Galbraith
  1 sibling, 3 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-25 18:35 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

>> My patch already fixes OOM problems caused by overgrown caches/buffers, by
>> making sure OOM is not triggered until these buffers have been cannibalised
>> down to freepages.high.  If balancing problems still exist, then they
>> should be retuned with my patch (or something very like it) in hand, to
>> separate one problem from the other.  AFAIK, balancing should now be a
>> performance issue rather than a stability issue.
>
>Great.  I haven't seen your patch yet as my gateway ate it's very last
>disk.  I look forward to reading it.

I'm currently investigating the old non-overcommit patch, which (apart from
needing manual applying to recent kernels) appears to be rather broken in a
trivial way.  It prevents allocation if total reserved memory is greater
than the total unallocated memory.  Let me say that again, a different way
- it prevents memory usage from exceeding 50%...

Is there a fast way of getting total VM size?  Eg. equivalent to the
following code:

si_meminfo(&i);
si_swapinfo(&i);
free = i.totalram + i.totalswap;

If not, I have to do some jiggery to keep good performance along with true
non-overcommittance.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25 18:35               ` Jonathan Morton
@ 2001-03-26  4:40                 ` Horst von Brand
  2001-03-26  8:36                 ` Mike Galbraith
  2001-03-26 10:01                 ` Jonathan Morton
  2 siblings, 0 replies; 162+ messages in thread
From: Horst von Brand @ 2001-03-26  4:40 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Mike Galbraith, linux-kernel

Jonathan Morton <chromi@cyberspace.org> said:
> I'm currently investigating the old non-overcommit patch, which (apart from
> needing manual applying to recent kernels) appears to be rather broken in a
> trivial way.  It prevents allocation if total reserved memory is greater
> than the total unallocated memory.  Let me say that again, a different way
> - it prevents memory usage from exceeding 50%...

Think fork(2).
-- 
Horst von Brand                             vonbrand@sleipnir.valparaiso.cl
Casilla 9G, Vin~a del Mar, Chile                               +56 32 672616

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25 18:35               ` Jonathan Morton
  2001-03-26  4:40                 ` Horst von Brand
@ 2001-03-26  8:36                 ` Mike Galbraith
  2001-03-26 10:01                 ` Jonathan Morton
  2 siblings, 0 replies; 162+ messages in thread
From: Mike Galbraith @ 2001-03-26  8:36 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: linux-kernel

On Sun, 25 Mar 2001, Jonathan Morton wrote:

> >> My patch already fixes OOM problems caused by overgrown caches/buffers, by
> >> making sure OOM is not triggered until these buffers have been cannibalised
> >> down to freepages.high.  If balancing problems still exist, then they
> >> should be retuned with my patch (or something very like it) in hand, to
> >> separate one problem from the other.  AFAIK, balancing should now be a
> >> performance issue rather than a stability issue.
> >
> >Great.  I haven't seen your patch yet as my gateway ate it's very last
> >disk.  I look forward to reading it.
>
> I'm currently investigating the old non-overcommit patch, which (apart from
> needing manual applying to recent kernels) appears to be rather broken in a
> trivial way.  It prevents allocation if total reserved memory is greater
> than the total unallocated memory.  Let me say that again, a different way
> - it prevents memory usage from exceeding 50%...
>
> Is there a fast way of getting total VM size?  Eg. equivalent to the
> following code:
>
> si_meminfo(&i);
> si_swapinfo(&i);
> free = i.totalram + i.totalswap;

Other than using their components?.. don't know.

> If not, I have to do some jiggery to keep good performance along with true
> non-overcommittance.

(thinking about mlock and what that could do to any saved state.. and
how long allocations can block and where.. egad.  then there's zones)

I'm no VM expert, but I wonder if the overhead of obtaining this info
will be the worst you have to deal with.

	-Mike


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25 18:35               ` Jonathan Morton
  2001-03-26  4:40                 ` Horst von Brand
  2001-03-26  8:36                 ` Mike Galbraith
@ 2001-03-26 10:01                 ` Jonathan Morton
  2001-03-26 14:48                   ` Rik van Riel
  2 siblings, 1 reply; 162+ messages in thread
From: Jonathan Morton @ 2001-03-26 10:01 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Mike Galbraith, linux-kernel

>> I'm currently investigating the old non-overcommit patch, which (apart from
>> needing manual applying to recent kernels) appears to be rather broken in a
>> trivial way.  It prevents allocation if total reserved memory is greater
>> than the total unallocated memory.  Let me say that again, a different way
>> - it prevents memory usage from exceeding 50%...
>
>Think fork(2).

fork() is allowed to return a failure value, and it already does so if
there isn't enough memory (at least with the limited tests I've come up
with).  Guess again.

I have, however, found a bug in the non-overcommit patch - it seems to be
capable of double-freeing (and then some) - starting 4 Java VMs and then
closing them causes VMReserved to go negative on my system.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-26 10:01                 ` Jonathan Morton
@ 2001-03-26 14:48                   ` Rik van Riel
  0 siblings, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-26 14:48 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Horst von Brand, Mike Galbraith, linux-kernel

On Mon, 26 Mar 2001, Jonathan Morton wrote:

> I have, however, found a bug in the non-overcommit patch - it seems to
> be capable of double-freeing (and then some) - starting 4 Java VMs and
> then closing them causes VMReserved to go negative on my system.

*grin*

It's nice to see the non-overcommit code being tested and
fixed like this. If there turns out to be a demand for this
patch, I guess we'll even want to integrate this into the
kernel ... possibly even the 2.4 kernel, if the code changes
are small/managable enough.

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 23:35             ` Jonathan Morton
  2001-03-25 18:35               ` Jonathan Morton
@ 2001-03-25 19:07               ` Mike Galbraith
  1 sibling, 0 replies; 162+ messages in thread
From: Mike Galbraith @ 2001-03-25 19:07 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: linux-kernel

On Sat, 24 Mar 2001, Jonathan Morton wrote:

> >> While my post didn't give an exact formula, I was quite clear on the
> >>fact that
> >> the system is allowing the caches to overrun memory and cause oom problems.
> >
> >Yes.  A testcase would be good.  It's not happening to everybody nor is
> >it happening under all loads.  (if it were, it'd be long dead)
> >
> >> I'm more than happy to test patches, and I would even be willing to suggest
> >> some algorithms that might help, but I don't know where to stick them in the
> >> code.  Most of the people who have been griping are in a similar position.
> >
> >First step toward killing the critter is to lure him onto open ground.
> >Once there.. well, I've seen some pretty fancy shooting on this list.
>
> My patch already fixes OOM problems caused by overgrown caches/buffers, by
> making sure OOM is not triggered until these buffers have been cannibalised
> down to freepages.high.  If balancing problems still exist, then they
> should be retuned with my patch (or something very like it) in hand, to
> separate one problem from the other.  AFAIK, balancing should now be a
> performance issue rather than a stability issue.

Great.  I haven't seen your patch yet as my gateway ate it's very last
disk.  I look forward to reading it.

	-Mike


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 10:21         ` Mike Galbraith
  2001-03-24 18:19           ` Doug Ledford
@ 2001-03-24 20:04           ` Jonathan Morton
  2001-03-24 20:59           ` Jonathan Morton
  2001-03-25 14:13           ` Martin Dalecki
  3 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 20:04 UTC (permalink / raw)
  To: Doug Ledford, linux-kernel

>While my post didn't give an exact formula, I was quite clear on the fact that
>the system is allowing the caches to overrun memory and cause oom problems.
>I'm more than happy to test patches, and I would even be willing to suggest
>some algorithms that might help, but I don't know where to stick them in the
>code.  Most of the people who have been griping are in a similar position.

Meanwhile, I'm looking *very* hard at the VM system and trying to figure
out how it works.  So far I've got an "improved" system under test which
requires a little stress to cause an OOM-before-malloc-failure.  Right now
I'm working on making the OOM happen only when it *really* needs to -
previously, as some pointed out, it could trigger far too early, for
example when there was lots of buffer and cache memory that could
potentially be cannibalised.

Right now my best approximation is to make the OOM test be as optimistic as
it is safe to be, and the vm_enough_memory() test as pessimistic as
sensible.  Expect a test patch to appear on this list soon.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 10:21         ` Mike Galbraith
  2001-03-24 18:19           ` Doug Ledford
  2001-03-24 20:04           ` Jonathan Morton
@ 2001-03-24 20:59           ` Jonathan Morton
  2001-03-24 22:11             ` Rik van Riel
                               ` (2 more replies)
  2001-03-25 14:13           ` Martin Dalecki
  3 siblings, 3 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 20:59 UTC (permalink / raw)
  To: Doug Ledford, linux-kernel

>Right now my best approximation is to make the OOM test be as optimistic as
>it is safe to be, and the vm_enough_memory() test as pessimistic as
>sensible.  Expect a test patch to appear on this list soon.

...and here it is!

This fixes a number of small but linked problems:

- malloc() never returned 0 when the system ran out of memory, instead the OOM killer was triggered.  Now, malloc() will return 0 if the calling process is more than 4 times the size of the amount of free memory.  As a speedup, available swap space is not considered unless physical memory is not sufficient to contain the process.  Note that if overcommit_memory is switched on, malloc() will never return 0 anyway.

- OOM killer was triggered too early - now takes account of buffer and cache memory, which can be cannibalised before the system has completely run out.

- OOM killer badness() factors readjusted in favour of Oracle-like processes (consuming 10's of MB of RAM but up for 3 days or so and with a low-order UID?  Now less likely to be killed...)

--- begin oom-patch.diff ---
diff -u linux-2.4.1.orig/mm/mmap.c linux/mm/mmap.c
--- linux-2.4.1.orig/mm/mmap.c  Mon Jan 29 16:10:41 2001
+++ linux/mm/mmap.c     Sat Mar 24 19:29:51 2001
@@ -54,6 +54,7 @@
         */

        long free;
+       struct sysinfo swp_info;

         /* Sometimes we want to use more memory than we have. */
        if (sysctl_overcommit_memory)
@@ -62,8 +63,32 @@
        free = atomic_read(&buffermem_pages);
        free += atomic_read(&page_cache_size);
        free += nr_free_pages();
-       free += nr_swap_pages;
-       return free > pages;
+
+       /* Attempt to curtail memory allocations before hard OOM occurs.
+        * Based on current process size, which is hopefully a good and fast heuristic.
+        * Also fix bug where the real OOM limit of (free == freepages.min) is not taken into account.
+        * In fact, we use freepages.high as the threshold to make sure there's still room for buffers+cache.
+        *
+        * -- Jonathan "Chromatix" Morton, 24th March 2001
+        */
+
+       if(current && current->mm)
+         free -= (current->mm->total_vm / 4);
+
+       free -= freepages.high;
+
+       /* Since getting swap info is expensive, see if our allocation can happen in physical RAM */
+       if(free > pages)
+         return 1;
+
+       /* Use the number of FREE swap pages, not the total */
+       si_swapinfo(&swp_info);
+       free += swp_info.freeswap;
+
+       if(free > pages)
+         return 1;
+
+       return 0;
 }

 /* Remove one vm structure from the inode's i_mapping address space. */
Only in linux/mm/: mmap.c~
diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
--- linux-2.4.1.orig/mm/oom_kill.c      Tue Nov 14 18:56:46 2000
+++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
@@ -76,7 +76,9 @@
        run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);

        points /= int_sqrt(cpu_time);
-       points /= int_sqrt(int_sqrt(run_time));
+
+       /* Long-running processes are *very* important, so don't take the 4th root */
+       points /= run_time;

        /*
         * Niced processes are most likely less important, so double
@@ -93,6 +95,10 @@
                                p->uid == 0 || p->euid == 0)
                points /= 4;

+       /* Much the same goes for processes with low UIDs */
+       if(p->uid < 100 || p->euid < 100)
+         points /= 2;
+
        /*
         * We don't want to kill a process with direct hardware access.
         * Not only could that mess up the hardware, but usually users
@@ -192,12 +198,20 @@
 int out_of_memory(void)
 {
        struct sysinfo swp_info;
+       long free;

        /* Enough free memory?  Not OOM. */
-       if (nr_free_pages() > freepages.min)
+       free = nr_free_pages();
+       if (free > freepages.min)
+               return 0;
+
+       if (free + nr_inactive_clean_pages() > freepages.low)
                return 0;

-       if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
+       /* Buffers and caches can be freed up (Jonathan "Chromatix" Morton) */
+       free += atomic_read(&buffermem_pages);
+       free += atomic_read(&page_cache_size);
+       if (free > freepages.low)
                return 0;

        /* Enough swap space left?  Not OOM. */
Only in linux/mm/: oom_kill.c~
--- end oom-patch.diff ---

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 20:59           ` Jonathan Morton
@ 2001-03-24 22:11             ` Rik van Riel
  2001-03-24 23:36             ` Jonathan Morton
  2001-03-25 14:30             ` Martin Dalecki
  2 siblings, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-24 22:11 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Doug Ledford, linux-kernel

On Sat, 24 Mar 2001, Jonathan Morton wrote:

>         free = atomic_read(&buffermem_pages);
>         free += atomic_read(&page_cache_size);
>         free += nr_free_pages();
> -       free += nr_swap_pages;

> +       /* Since getting swap info is expensive, see if our allocation can happen in physical RAM */

Actually, getting swap info is as cheap as reading the variable
nr_swap_pages. I should fix this in the OOM killer ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 20:59           ` Jonathan Morton
  2001-03-24 22:11             ` Rik van Riel
@ 2001-03-24 23:36             ` Jonathan Morton
  2001-03-25 14:30             ` Martin Dalecki
  2 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 23:36 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Doug Ledford, linux-kernel

>>         free = atomic_read(&buffermem_pages);
>>         free += atomic_read(&page_cache_size);
>>         free += nr_free_pages();
>> -       free += nr_swap_pages;
>
>> +       /* Since getting swap info is expensive, see if our allocation
>>can happen in physical RAM */
>
>Actually, getting swap info is as cheap as reading the variable
>nr_swap_pages. I should fix this in the OOM killer ;)

Just fixed that for myself (in both places) and about to test.  I'm almost
sure I actually encountered an error related to this, but I'll retest and
make sure...

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 20:59           ` Jonathan Morton
  2001-03-24 22:11             ` Rik van Riel
  2001-03-24 23:36             ` Jonathan Morton
@ 2001-03-25 14:30             ` Martin Dalecki
  2 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 14:30 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Doug Ledford, linux-kernel

Jonathan Morton wrote:
> 
> >Right now my best approximation is to make the OOM test be as optimistic as
> >it is safe to be, and the vm_enough_memory() test as pessimistic as
> >sensible.  Expect a test patch to appear on this list soon.
> 
> ...and here it is!
> 
> This fixes a number of small but linked problems:
> 
> - malloc() never returned 0 when the system ran out of memory, instead the OOM killer was triggered.  Now, malloc() will return 0 if the calling process is more than 4 times the size of the amount of free memory.  As a speedup, available swap space is not considered unless physical memory is not sufficient to contain the process.  Note that if overcommit_memory is switched on, malloc() will never return 0 anyway.
> 
> - OOM killer was triggered too early - now takes account of buffer and cache memory, which can be cannibalised before the system has completely run out.
> 
> - OOM killer badness() factors readjusted in favour of Oracle-like processes (consuming 10's of MB of RAM but up for 3 days or so and with a low-order UID?  Now less likely to be killed...)
> 
> --- begin oom-patch.diff ---
> diff -u linux-2.4.1.orig/mm/mmap.c linux/mm/mmap.c
> --- linux-2.4.1.orig/mm/mmap.c  Mon Jan 29 16:10:41 2001
> +++ linux/mm/mmap.c     Sat Mar 24 19:29:51 2001
> @@ -54,6 +54,7 @@
>          */
> 
>         long free;
> +       struct sysinfo swp_info;
> 
>          /* Sometimes we want to use more memory than we have. */
>         if (sysctl_overcommit_memory)
> @@ -62,8 +63,32 @@
>         free = atomic_read(&buffermem_pages);
>         free += atomic_read(&page_cache_size);
>         free += nr_free_pages();
> -       free += nr_swap_pages;
> -       return free > pages;
> +
> +       /* Attempt to curtail memory allocations before hard OOM occurs.
> +        * Based on current process size, which is hopefully a good and fast heuristic.
> +        * Also fix bug where the real OOM limit of (free == freepages.min) is not taken into account.
> +        * In fact, we use freepages.high as the threshold to make sure there's still room for buffers+cache.
> +        *
> +        * -- Jonathan "Chromatix" Morton, 24th March 2001
> +        */
> +
> +       if(current && current->mm)
> +         free -= (current->mm->total_vm / 4);
> +
> +       free -= freepages.high;
> +
> +       /* Since getting swap info is expensive, see if our allocation can happen in physical RAM */
> +       if(free > pages)
> +         return 1;
> +
> +       /* Use the number of FREE swap pages, not the total */
> +       si_swapinfo(&swp_info);
> +       free += swp_info.freeswap;
> +
> +       if(free > pages)
> +         return 1;
> +
> +       return 0;
>  }
> 
>  /* Remove one vm structure from the inode's i_mapping address space. */
> Only in linux/mm/: mmap.c~
> diff -u linux-2.4.1.orig/mm/oom_kill.c linux/mm/oom_kill.c
> --- linux-2.4.1.orig/mm/oom_kill.c      Tue Nov 14 18:56:46 2000
> +++ linux/mm/oom_kill.c Sat Mar 24 20:35:20 2001
> @@ -76,7 +76,9 @@
>         run_time = (jiffies - p->start_time) >> (SHIFT_HZ + 10);
> 
>         points /= int_sqrt(cpu_time);
> -       points /= int_sqrt(int_sqrt(run_time));
> +
> +       /* Long-running processes are *very* important, so don't take the 4th root */
> +       points /= run_time;
> 
>         /*
>          * Niced processes are most likely less important, so double
> @@ -93,6 +95,10 @@
>                                 p->uid == 0 || p->euid == 0)
>                 points /= 4;
> 
> +       /* Much the same goes for processes with low UIDs */
> +       if(p->uid < 100 || p->euid < 100)
> +         points /= 2;
> +
>         /*
>          * We don't want to kill a process with direct hardware access.
>          * Not only could that mess up the hardware, but usually users
> @@ -192,12 +198,20 @@
>  int out_of_memory(void)
>  {
>         struct sysinfo swp_info;
> +       long free;
> 
>         /* Enough free memory?  Not OOM. */
> -       if (nr_free_pages() > freepages.min)
> +       free = nr_free_pages();
> +       if (free > freepages.min)
> +               return 0;
> +
> +       if (free + nr_inactive_clean_pages() > freepages.low)
>                 return 0;
> 
> -       if (nr_free_pages() + nr_inactive_clean_pages() > freepages.low)
> +       /* Buffers and caches can be freed up (Jonathan "Chromatix" Morton) */
> +       free += atomic_read(&buffermem_pages);
> +       free += atomic_read(&page_cache_size);
> +       if (free > freepages.low)
>                 return 0;

Ahh this will make the oom killer robust against misbalanced
MM. I will assimiliate this idea.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 10:21         ` Mike Galbraith
                             ` (2 preceding siblings ...)
  2001-03-24 20:59           ` Jonathan Morton
@ 2001-03-25 14:13           ` Martin Dalecki
  3 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 14:13 UTC (permalink / raw)
  To: Mike Galbraith; +Cc: linux-kernel

Mike Galbraith wrote:
> 
> On Sat, 24 Mar 2001, Doug Ledford wrote:
> 
> [snip list of naughty behavior]
> 
> > What was that you were saying about "should *never* happen"?  Oh, and let's
> Get off your lazy butts and do something about it.  Don't work on the
> oom-killer though.. that's only a symptom.  Work on the problem instead.

You are absolutely right about the fact that there are serious
memmory balancing problems out there as well. But ther oom_killer.c
needs to be changed as well - becouse in it's current state it's
buggy as hell as well. You propably know that you earn stability
in SW systems by making them survive the borderline conditions...

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  7:48       ` Doug Ledford
  2001-03-24 10:21         ` Mike Galbraith
@ 2001-03-24 12:42         ` Jonathan Morton
  2001-03-24 15:06           ` Mike Galbraith
  2001-03-25 14:10         ` Martin Dalecki
  2 siblings, 1 reply; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 12:42 UTC (permalink / raw)
  To: Mike Galbraith, linux-kernel

>General thread comment:
>To those who are griping, and obviously rightfully so, Rik has twice
>stated on this list that he could use some help with VM auto-balancing.
>The responses (visible on this list at least) was rather underwhelming.
>I noted no public exchange of ideas.. nada in fact.
>
>Get off your lazy butts and do something about it.  Don't work on the
>oom-killer though.. that's only a symptom.  Work on the problem instead.

Since I'm hacking around in this area anyway (warning: kernel newbie
alert!), I'd be happy to help examine the balancing code from a fresh
perspective.  Where should I be looking?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 12:42         ` Jonathan Morton
@ 2001-03-24 15:06           ` Mike Galbraith
  0 siblings, 0 replies; 162+ messages in thread
From: Mike Galbraith @ 2001-03-24 15:06 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: linux-kernel

On Sat, 24 Mar 2001, Jonathan Morton wrote:

> >General thread comment:
> >To those who are griping, and obviously rightfully so, Rik has twice
> >stated on this list that he could use some help with VM auto-balancing.
> >The responses (visible on this list at least) was rather underwhelming.
> >I noted no public exchange of ideas.. nada in fact.
> >
> >Get off your lazy butts and do something about it.  Don't work on the
> >oom-killer though.. that's only a symptom.  Work on the problem instead.
>
> Since I'm hacking around in this area anyway (warning: kernel newbie
> alert!), I'd be happy to help examine the balancing code from a fresh
> perspective.  Where should I be looking?

Everything in mm plus fs/buffer.c at least. (plus includes)  A good
place to start is with __alloc_pages().. that will drag you through
a lot of the balancing code.  Following entry points (sys_brk, sys_mmap
etc) is highly recommended.  Be prepared for dizzy spells if you've
never toured mm-land before :)

	-Mike


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  7:48       ` Doug Ledford
  2001-03-24 10:21         ` Mike Galbraith
  2001-03-24 12:42         ` Jonathan Morton
@ 2001-03-25 14:10         ` Martin Dalecki
  2 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 14:10 UTC (permalink / raw)
  To: Doug Ledford; +Cc: Horst von Brand, Christian Bodmer, linux-kernel

Doug Ledford wrote:
> 
> Horst von Brand wrote:
> >
> > "Christian Bodmer" <cbinsec01@freesurf.ch> said:
> >
> > > I can't say I understand the whole MM system, however the random killing
> > > of processes seems like a rather unfortunate solution to the problem. If
> > > someone has a spare minute, maybe they could explain to me why running
> > > out of free memory in kswapd results in a deadlock situation.
> >
> > OOM is not "normal operations", it is a machine under very extreme stress,
> > and should *never* happen. To complicate (or even worse, slow down or
> > otherwise use up resources like memory) normal operations for "better
> > handling of OOM" is total nonsense.
> 
> Puh-Leeze.   Let's inject some reality into this conversation:
> 
> [dledford@aic-cvs dledford]$ more kill-list
> Mar 10 22:02:34 monster kernel: Out of Memory: Killed process 475 (identd).
> Mar 10 22:03:25 monster kernel: Out of Memory: Killed process 660 (xfs).
...
> Mar 22 15:45:54 monster kernel: Out of Memory: Killed process 504 (atd).
> Mar 22 16:12:13 monster kernel: Out of Memory: Killed process 524 (sshd).
> [dledford@aic-cvs dledford]$
> 
> What was that you were saying about "should *never* happen"?  Oh, and let's
> not overlook the fact that it killed off mostly system daemons to start off
> with while leaving the real culprits alone.  Once it did get around to the
> real culprits (diff and tar), it wasn't even killing them because they were
> overly large, it was killing them because it wasn't reclaiming space from the
> buffer cache and page cache.  All of the programs running on this machine were
> never more than roughly 256MB of program code, and this is a 1GB machine.

This is due to the fact that Riks killer doesn't normalize the
resource units it's using for measure. Basically the current
penatly calculations are a good random number generator.

> This behavior is totally unacceptable and, as Alan put it, is a bug in the
> code.  It should never trigger the oom killer with 750+MB of cache sitting
> around, but it does.  If you want people to buy into the value of the oom
> killer, you've at least got to get it to quit killing shit when it absolutely
> doesn't need to.
> 
> To those people that would suggest I send in code I only have this to say.
> Fine, I'll send in a patch to fix this bug.  It will make the oom killer call
> the cache reclaim functions and never kill anything.  That would at least fix
> the bug you see above.

Please just apply it to the patch I have recently send... It will help
more :-).

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-22 23:35 Mikael Pettersson
  2001-03-22 23:43 ` Alan Cox
  0 siblings, 1 reply; 162+ messages in thread
From: Mikael Pettersson @ 2001-03-22 23:35 UTC (permalink / raw)
  To: alan; +Cc: linux-kernel

On Thu, 22 Mar 2001 21:23:54 +0000 (GMT), Alan Cox wrote:

>> Really the whole oom_kill process seems bass-ackwards to me.  I can't in my mind
>> logically justify annihilating large-VM processes that have been running for 
>> days or weeks instead of just returning ENOMEM to a process that just started 
>> up.
>
>How do you return an out of memory error to a C program that is out of memory
>due to a stack growth fault. There is actually not a language construct for it

SIGSEGV.
Stack overflow for a language like C using standard implementation techniques
is the same as a page fault while accessing a page for which there is no backing
store. SIGSEGV is the logical choice, and the one I'd expect on other Unices.

oom_kill should simply fail the current allocation which cannot be satisfied,
either by having {s,}brk/mmap return error or by posting a SIGSEGV. This would
actually also be the correct answer, if Linux didn't overcommit memory ...

Remove the overcommit crap and oom_kill can go away; this entails ensuring
that mmap() honors MAP_RESERVE/MAP_NORESERVE.

/Mikael

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:35 Mikael Pettersson
@ 2001-03-22 23:43 ` Alan Cox
  2001-03-27  7:58   ` Helge Hafting
  0 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-22 23:43 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: alan, linux-kernel

> >How do you return an out of memory error to a C program that is out of memory
> >due to a stack growth fault. There is actually not a language construct for it
> SIGSEGV.
> Stack overflow for a language like C using standard implementation techniques
> is the same as a page fault while accessing a page for which there is no backing
> store. SIGSEGV is the logical choice, and the one I'd expect on other Unices.

Guess again. You are expanding the stack because you have no room left on it.
You take a fault. You want to report a SIGSEGV. Now where are you
going to put the stack frame ?

SIGSEGV in combination with a preallocated alternate stack maybe, but then you
still need to recover. C++ you can maybe do it with exception handling but
C doesnt really have the structure and longjmp just doesnt cut it.

Alan

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-22 23:43 ` Alan Cox
@ 2001-03-27  7:58   ` Helge Hafting
  0 siblings, 0 replies; 162+ messages in thread
From: Helge Hafting @ 2001-03-27  7:58 UTC (permalink / raw)
  To: Alan Cox, linux-kernel

Alan Cox wrote:
> 
> > >How do you return an out of memory error to a C program that is out of memory
> > >due to a stack growth fault. There is actually not a language construct for it
> > SIGSEGV.
> > Stack overflow for a language like C using standard implementation techniques
> > is the same as a page fault while accessing a page for which there is no backing
> > store. SIGSEGV is the logical choice, and the one I'd expect on other Unices.
> 
> Guess again. You are expanding the stack because you have no room left on it.
> You take a fault. You want to report a SIGSEGV. Now where are you
> going to put the stack frame ?
> 
> SIGSEGV in combination with a preallocated alternate stack maybe, but then you
> still need to recover. C++ you can maybe do it with exception handling but
> C doesnt really have the structure and longjmp just doesnt cut it.

Seems to me a guard page would do the trick.  Make the last page of the
stack
non-overcommitable and marked not present.  Maybe non-swappable too in
case
nothing else can be swapped out for some reason.
(Yes, that wastes a page per process)
Whenever we hit the guard page, try expanding the stack.  
If it works - fine.  If not - make the guard page present _and_ deliver
the SIGSEGV using this last page of stack.  No complicated alternate
stack construct, just report OOM one page in advance.

OOM is still possible if the program don't handle SIGSEGV well.
But a smart program now have the option of doing emergency deallocations
and/or dump its precious intermediate results to file.

Helge Hafting

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-23  0:09 Mikael Pettersson
  2001-03-23  0:27 ` Andrew Morton
  2001-03-23 16:24 ` Horst von Brand
  0 siblings, 2 replies; 162+ messages in thread
From: Mikael Pettersson @ 2001-03-23  0:09 UTC (permalink / raw)
  To: alan; +Cc: linux-kernel

On Thu, 22 Mar 2001 23:43:57 +0000 (GMT), Alan Cox wrote:

> > >How do you return an out of memory error to a C program that is out of memory
> > >due to a stack growth fault. There is actually not a language construct for it
> > SIGSEGV.
> > Stack overflow for a language like C using standard implementation techniques
> > is the same as a page fault while accessing a page for which there is no backing
> > store. SIGSEGV is the logical choice, and the one I'd expect on other Unices.
> 
> Guess again. You are expanding the stack because you have no room left on it.
> You take a fault. You want to report a SIGSEGV. Now where are you
> going to put the stack frame ?
> 
> SIGSEGV in combination with a preallocated alternate stack maybe

Oh I know 99% of the processes getting this will die. The behaviour I'd
expect from vanilla code in this particular case (stack overflow) is:
- page fault in stack "segment"
- no backing store available
- post SIGSEGV to current
  * push sighandler frame on current stack (or altstack, if registered) [+]
  * no room? SIG_DFL, i.e kill

My point is that with overcommit removed, there's no question as to
which process is actually out of memory. No need for the kernel to guess;
since it doesn't guess, it cannot guess wrong.

Concerning the stack: sure, oom makes it problematic to report the
error in a useful way. So use sigaltstack() and SA_ONSTACK. [+]
Processes that don't do this get killed, but not because oom_kill
did some fancy guesswork.

[+] Speaking as a hacker on a runtime system for a concurrent
programming language (Erlang), I consider the current Unix/POSIX/Linux
default of having the kernel throw up[*] at the user's current stack
pointer to be unbelievably broken. sigaltstack() and SA_ONSTACK should
not be options but required behaviour.

[*] Signal & trap frames used to be called "stack puke" in old 68k days.

/Mikael

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  0:09 Mikael Pettersson
@ 2001-03-23  0:27 ` Andrew Morton
  2001-03-23 12:29   ` Mikael Pettersson
  2001-03-23 16:24 ` Horst von Brand
  1 sibling, 1 reply; 162+ messages in thread
From: Andrew Morton @ 2001-03-23  0:27 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel

Mikael Pettersson wrote:
> 
> [+] Speaking as a hacker on a runtime system for a concurrent
> programming language (Erlang), I consider the current Unix/POSIX/Linux
> default of having the kernel throw up[*] at the user's current stack
> pointer to be unbelievably broken. sigaltstack() and SA_ONSTACK should
> not be options but required behaviour.
> 

Why?  What problem does stack puke cause?

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  0:27 ` Andrew Morton
@ 2001-03-23 12:29   ` Mikael Pettersson
  0 siblings, 0 replies; 162+ messages in thread
From: Mikael Pettersson @ 2001-03-23 12:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Andrew Morton writes:
 > Mikael Pettersson wrote:
 > > 
 > > [+] Speaking as a hacker on a runtime system for a concurrent
 > > programming language (Erlang), I consider the current Unix/POSIX/Linux
 > > default of having the kernel throw up[*] at the user's current stack
 > > pointer to be unbelievably broken. sigaltstack() and SA_ONSTACK should
 > > not be options but required behaviour.
 > > 
 > 
 > Why?  What problem does stack puke cause?

It makes user-space stack management difficult or more costly.
You either have to over-estimate the size of each coroutine's [*]
stack, or you have to run with all signals blocked, or you have
to give up on using the machine's native stack.

The first leads to memory wastage (we're talking thousands of coroutines
here, each usually having a quite small stack), the second causes overheads
when resuming or suspending a coroutine (sigprocmask), and the third
loses performance badly on x86 (you lose one g.p. register to point to
your simulated stack, and you lose return-stack branch prediction since
you can't use call/ret instructions any more).

I currently work around this on Linux/x86 by overriding sigaction() et al
to always assert SA_ONSTACK. Unfortunately, this hack doesn't work on
all Unices we'd like to support. (I override sigaction since I also
need to trap signal setup calls from libraries linked with our code.)

[*] I use the term "coroutine" here to avoid the connotations associated
with term like "thread" and "process".

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23  0:09 Mikael Pettersson
  2001-03-23  0:27 ` Andrew Morton
@ 2001-03-23 16:24 ` Horst von Brand
  2001-03-23 16:49   ` Guest section DW
  1 sibling, 1 reply; 162+ messages in thread
From: Horst von Brand @ 2001-03-23 16:24 UTC (permalink / raw)
  To: Mikael Pettersson; +Cc: linux-kernel

Mikael Pettersson <mikpe@csd.uu.se> said:
> Oh I know 99% of the processes getting this will die. The behaviour I'd
> expect from vanilla code in this particular case (stack overflow) is:
> - page fault in stack "segment"
> - no backing store available
> - post SIGSEGV to current
>   * push sighandler frame on current stack (or altstack, if registered) [+]
>   * no room? SIG_DFL, i.e kill

I.e., kill innocent processes which try to increase their memory usage just
at the wrong moment. This is exactly what happened before the OOM-killer...

> My point is that with overcommit removed, there's no question as to
> which process is actually out of memory. No need for the kernel to guess;
> since it doesn't guess, it cannot guess wrong.

Just too bad there is no complete accounting for memory usage in the kernel
right now (a lot of complex data structures in kernel do consume varying
amounts of memory, not always in the name of a specific process; much of
the extra flexibility in later kernels is exactly because some structures
aren't static anymore). Say good-bye to modules, you could as well have
everything under the sun built in (as the memory for each possible module
will have to be assumed in use, just in case).

> Concerning the stack: sure, oom makes it problematic to report the
> error in a useful way. So use sigaltstack() and SA_ONSTACK. [+]
> Processes that don't do this get killed, but not because oom_kill
> did some fancy guesswork.

They just get killed for requesting memory at the wrong moment, which is a
lot worse.

> [+] Speaking as a hacker on a runtime system for a concurrent
> programming language (Erlang), I consider the current Unix/POSIX/Linux
> default of having the kernel throw up[*] at the user's current stack
> pointer to be unbelievably broken. sigaltstack() and SA_ONSTACK should
> not be options but required behaviour.
> 
> [*] Signal & trap frames used to be called "stack puke" in old 68k days.

Can we please remember that OOM is *not* in any way a "normal system state"
that has to be handled in a civilized, orderly way? This is just an escape
route in case everything else has failed.
-- 
Dr. Horst H. von Brand                       mailto:vonbrand@inf.utfsm.cl
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 16:24 ` Horst von Brand
@ 2001-03-23 16:49   ` Guest section DW
  2001-03-23 17:04     ` Alan Cox
  0 siblings, 1 reply; 162+ messages in thread
From: Guest section DW @ 2001-03-23 16:49 UTC (permalink / raw)
  To: Horst von Brand, Mikael Pettersson; +Cc: linux-kernel

On Fri, Mar 23, 2001 at 12:24:03PM -0400, Horst von Brand wrote:

> Can we please remember that OOM is *not* in any way a "normal system state"
> that has to be handled in a civilized, orderly way? This is just an escape
> route in case everything else has failed.

Can we please remember that a Blue Screen Of Death is *not* in any way a
"normal system state" that has to be handled in a civilized, orderly way?
This is just an escape route in case everything else has failed.

Linux is unreliable.
That is bad.



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 16:49   ` Guest section DW
@ 2001-03-23 17:04     ` Alan Cox
  0 siblings, 0 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-23 17:04 UTC (permalink / raw)
  To: Guest section DW; +Cc: Horst von Brand, Mikael Pettersson, linux-kernel

> This is just an escape route in case everything else has failed.
> 
> Linux is unreliable.
> That is bad.

Since your definition of reliability is a mathematical abstraction requiring
infinite storage why don't you start by inventing infinitely large SDRAM
chips, then get back to us ?


^ permalink raw reply	[flat|nested] 162+ messages in thread

* RE: [PATCH] Prevent OOM from killing init
@ 2001-03-23  9:28 Heusden, Folkert van
  0 siblings, 0 replies; 162+ messages in thread
From: Heusden, Folkert van @ 2001-03-23  9:28 UTC (permalink / raw)
  To: Rik van Riel, Tom Kondilis; +Cc: linux-mm, linux-kernel

> That's not the OOM killer however, but init dying because it
> couldn't get the memory it needed to satisfy a page fault or
> somesuch...

Ehrm, I would like to re-state that it still would be nice if
some mechanism got introduced which enables one to set certain
processes to "cannot be killed".
For example: I would hate it it the UPS monitoring daemon got
killed for obvious reasons :o)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-23 18:29 Andries.Brouwer
  2001-03-23 18:38 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Andries.Brouwer @ 2001-03-23 18:29 UTC (permalink / raw)
  To: alan, linux-kernel

On Fri, Mar 23, 2001 at 05:04:07PM +0000, Alan Cox wrote:
> > This is just an escape route in case everything else has failed.
> >
> > Linux is unreliable.
> > That is bad.
>
> Since your definition of reliability is a mathematical abstraction requiring
> infinite storage why don't you start by inventing infinitely large SDRAM
> chips, then get back to us ?

Ah, Alan,
I can see that you dislike seeing me say bad things about Linux.
I dislike having to say them.

On the other hand, my definition of reliability does not require
infinite storage. After all, earlier Unix flavours did not need
an OOM killer either, and my editor was not killed under Unix V6
on 64k when I started some other process.

Linux is unreliable because a program can be killed at random,
without warning, because of bugs in some other program.
The old Unix guarantee that a program only crashes because of
its own behaviour is lost. That is very sad.

What can one do? I need not tell you - you know better than I do.
The main point is letting malloc fail when the memory cannot be
guaranteed. There are various solutions for stack space, none of
them very elegant, but all have in common that when we run out of
stack space the program doing that gets SIGSEGV, and not some
random other program. (And a well-written program could catch this
SIGSEGV and do cleanup, preserving the integrity of its data base.
Clearly one would want to guarantee a certain minimum stack space
at fork time.)

Will this setup be very inefficient? I don't know. Perhaps.
If my programs actually use 10 MB but have a guarantee for
200 MB then the rest of that memory is not wasted. But it can
only be used for things that can be freed when needed, like
inode and buffer cache.

But inefficient or not, I much prefer a system with guarantees,
something that is reliable by default, above something that
works well if you are lucky and fails at unpredictable moments.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:29 Andries.Brouwer
@ 2001-03-23 18:38 ` Alan Cox
  2001-03-24  0:46   ` Tim Wright
  2001-03-24 16:48   ` Jesse Pollard
  2001-03-23 18:43 ` nick
  2001-03-23 21:14 ` Jonathan Morton
  2 siblings, 2 replies; 162+ messages in thread
From: Alan Cox @ 2001-03-23 18:38 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel

> infinite storage. After all, earlier Unix flavours did not need
> an OOM killer either, and my editor was not killed under Unix V6
> on 64k when I started some other process.

You were lucky. Its quite possible for V6 to kill processes when you run out
of swap

> The old Unix guarantee that a program only crashes because of
> its own behaviour is lost. That is very sad.

No such guarantee ever existed. There are systems that had stuff like per
user memory quotas but those were mostly much more mainframe oriented

> 200 MB then the rest of that memory is not wasted. But it can
> only be used for things that can be freed when needed, like
> inode and buffer cache.

No. You cannot free the inode and buffer cache arbitarily. You only have a
probability - that puts you back at square 1.

> But inefficient or not, I much prefer a system with guarantees,
> something that is reliable by default, above something that
> works well if you are lucky and fails at unpredictable moments.

malloc is merely an accounting exercise (actually its mostly mmap 
accounting). ptrace is the only quirk. Nobody feels its very important because
nobody has implemented it.

Alan

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:38 ` Alan Cox
@ 2001-03-24  0:46   ` Tim Wright
  2001-03-24 16:48   ` Jesse Pollard
  1 sibling, 0 replies; 162+ messages in thread
From: Tim Wright @ 2001-03-24  0:46 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andries.Brouwer, linux-kernel

On Fri, Mar 23, 2001 at 06:38:37PM +0000, Alan Cox wrote:
> > infinite storage. After all, earlier Unix flavours did not need
> > an OOM killer either, and my editor was not killed under Unix V6
> > on 64k when I started some other process.
> 
> You were lucky. Its quite possible for V6 to kill processes when you run out
> of swap
> 

It was actually worse than that. Grab your copy of "Lions", and check lines
4375-4377 in function xswap(). A failure to allocate space in the swapmap
caused a panic. Same problem in xalloc().

Tim

-- 
Tim Wright - timw@splhi.com or timw@aracnet.com or twright@us.ibm.com
IBM Linux Technology Center, Beaverton, Oregon
Interested in Linux scalability ? Look at http://lse.sourceforge.net/
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:38 ` Alan Cox
  2001-03-24  0:46   ` Tim Wright
@ 2001-03-24 16:48   ` Jesse Pollard
  2001-03-25 16:12     ` Szabolcs Szakacsits
  2001-03-25 16:39     ` Jonathan Morton
  1 sibling, 2 replies; 162+ messages in thread
From: Jesse Pollard @ 2001-03-24 16:48 UTC (permalink / raw)
  To: Alan Cox, Andries.Brouwer; +Cc: alan, linux-kernel

On Fri, 23 Mar 2001, Alan Cox wrote:
>> infinite storage. After all, earlier Unix flavours did not need
>> an OOM killer either, and my editor was not killed under Unix V6
>> on 64k when I started some other process.
>
>You were lucky. Its quite possible for V6 to kill processes when you run out
>of swap

Not lucky. I've used V6 - It would not start a process if the resources
werent available (no overcommit). It was also a swap based system and
not a page based system (PDP-11/45 1123+... supported both, but UNIX
only used swapping because it was easy to swap a 64Kbyte process).

>> The old Unix guarantee that a program only crashes because of
>> its own behaviour is lost. That is very sad.
>
>No such guarantee ever existed. There are systems that had stuff like per
>user memory quotas but those were mostly much more mainframe oriented

Only the swapping based systems gave this guarantee. Even AT&T System V
release 2 was swap based (M68020 systems).

>> 200 MB then the rest of that memory is not wasted. But it can
>> only be used for things that can be freed when needed, like
>> inode and buffer cache.
>
>No. You cannot free the inode and buffer cache arbitarily. You only have a
>probability - that puts you back at square 1.
>
>> But inefficient or not, I much prefer a system with guarantees,
>> something that is reliable by default, above something that
>> works well if you are lucky and fails at unpredictable moments.
>
>malloc is merely an accounting exercise (actually its mostly mmap 
>accounting). ptrace is the only quirk. Nobody feels its very important because
>nobody has implemented it.

Small correction - It was implemented, just not included in the standard
kernel.

Check mailing lists around March-April of 2000. The patch was generated
by Eduardo Horvath <eeh@turbolinux.com> for  2.3.99-pre3 and allowed the
administrator to:

"Available virtual memory is calculated as the sum of all swap space as
 well as free and reclaimable RAM, essentially the same value as used
 before.  The kernel will now operate in 4 different modes depending on the
 value of sysctl_overcommit_memory:

 1       Do accounting but do not prevent any allocations (old behavior)

 0       Do accounting but only prevent individual allocations that exceed
                total VM (old behavior)

 -1      Do accounting and prevent a user from making the amount of
                reserved memory exceed the total virtual memory.

 -2      Same as above but also for root.

 The default is set to -1 to allow root to essentially do whatever it
 wants.  But then if someone's broken root you're in trouble anyway.

 If the kernel itself requires memory it can allocate as much as it wants
 and can bring the system into an unsafe state (reserved > total).

 Memory segments that are not COW, ZFOD or otherwise swap backed do not
 require reservation."

It was a limited implementation, but worked quite well in testing.

-- 
-------------------------------------------------------------------------
Jesse I Pollard, II
Email: jesse@cats-chateau.net

Any opinions expressed are solely my own.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 16:48   ` Jesse Pollard
@ 2001-03-25 16:12     ` Szabolcs Szakacsits
  2001-03-25 16:39     ` Jonathan Morton
  1 sibling, 0 replies; 162+ messages in thread
From: Szabolcs Szakacsits @ 2001-03-25 16:12 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: Alan Cox, Andries.Brouwer, linux-kernel

On Sat, 24 Mar 2001, Jesse Pollard wrote:
> On Fri, 23 Mar 2001, Alan Cox wrote:
[ .... about non-overcommit .... ]
> > Nobody feels its very important because nobody has implemented it.

Enterprises use other systems because they have much better resource
management than Linux -- adding non-overcommit wouldn't help them much.
Desktop users, Linux newbies don't understand what's
eager/early/non-overcommit vs lazy/late/overcommit memory management
[just see these threads here if you aren't bored already enough ;)] and
even if they do at last they don't have the ability to implement it. And
between them, people are mostly fine with ulimit.

> Small correction - It was implemented, just not included in the standard
> kernel.

Please note, adding optional non-overcommit also wouldn't help much
without guaranteed/reserved resources [e.g. you are OOM -> appps, users
complain, admin login in and BANG OOM killer just killed one of the
jobs]. This was one of the reasons I made the reserved root memory
patch [this is also the way other OS'es do]. Now just the different
patches should be merged and write an OOM FAQ for users how to avoid,
control, etc it].

	Szaka

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 16:48   ` Jesse Pollard
  2001-03-25 16:12     ` Szabolcs Szakacsits
@ 2001-03-25 16:39     ` Jonathan Morton
  1 sibling, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-25 16:39 UTC (permalink / raw)
  To: Szabolcs Szakacsits, Jesse Pollard
  Cc: Alan Cox, Andries.Brouwer, linux-kernel

>[ .... about non-overcommit .... ]
>> > Nobody feels its very important because nobody has implemented it.
>
>Enterprises use other systems because they have much better resource
>management than Linux -- adding non-overcommit wouldn't help them much.
>Desktop users, Linux newbies don't understand what's
>eager/early/non-overcommit vs lazy/late/overcommit memory management
>[just see these threads here if you aren't bored already enough ;)] and
>even if they do at last they don't have the ability to implement it. And
>between them, people are mostly fine with ulimit.
>
>> Small correction - It was implemented, just not included in the standard
>> kernel.
>
>Please note, adding optional non-overcommit also wouldn't help much
>without guaranteed/reserved resources [e.g. you are OOM -> appps, users
>complain, admin login in and BANG OOM killer just killed one of the
>jobs]. This was one of the reasons I made the reserved root memory
>patch [this is also the way other OS'es do]. Now just the different
>patches should be merged and write an OOM FAQ for users how to avoid,
>control, etc it].

I'm currently trying to apply the 2.3.99.whatever non-overcommit patch to
2.4.1 - decidedly nontrivial, lots of failed hunks, parts of the kernel
have changed significantly even in this (fairly short) time.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:29 Andries.Brouwer
  2001-03-23 18:38 ` Alan Cox
@ 2001-03-23 18:43 ` nick
  2001-03-23 19:01   ` Martin Dalecki
  2001-03-23 21:14 ` Jonathan Morton
  2 siblings, 1 reply; 162+ messages in thread
From: nick @ 2001-03-23 18:43 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel

Please point me to an Operating System that runs on any commonly available
platform and fits your requirements.
	Nick

On Fri, 23 Mar 2001 Andries.Brouwer@cwi.nl wrote:

> On Fri, Mar 23, 2001 at 05:04:07PM +0000, Alan Cox wrote:
> > > This is just an escape route in case everything else has failed.
> > >
> > > Linux is unreliable.
> > > That is bad.
> >
> > Since your definition of reliability is a mathematical abstraction requiring
> > infinite storage why don't you start by inventing infinitely large SDRAM
> > chips, then get back to us ?
> 
> Ah, Alan,
> I can see that you dislike seeing me say bad things about Linux.
> I dislike having to say them.
> 
> On the other hand, my definition of reliability does not require
> infinite storage. After all, earlier Unix flavours did not need
> an OOM killer either, and my editor was not killed under Unix V6
> on 64k when I started some other process.
> 
> Linux is unreliable because a program can be killed at random,
> without warning, because of bugs in some other program.
> The old Unix guarantee that a program only crashes because of
> its own behaviour is lost. That is very sad.
> 
> What can one do? I need not tell you - you know better than I do.
> The main point is letting malloc fail when the memory cannot be
> guaranteed. There are various solutions for stack space, none of
> them very elegant, but all have in common that when we run out of
> stack space the program doing that gets SIGSEGV, and not some
> random other program. (And a well-written program could catch this
> SIGSEGV and do cleanup, preserving the integrity of its data base.
> Clearly one would want to guarantee a certain minimum stack space
> at fork time.)
> 
> Will this setup be very inefficient? I don't know. Perhaps.
> If my programs actually use 10 MB but have a guarantee for
> 200 MB then the rest of that memory is not wasted. But it can
> only be used for things that can be freed when needed, like
> inode and buffer cache.
> 
> But inefficient or not, I much prefer a system with guarantees,
> something that is reliable by default, above something that
> works well if you are lucky and fails at unpredictable moments.
> 
> Andries
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:43 ` nick
@ 2001-03-23 19:01   ` Martin Dalecki
  2001-03-23 19:23     ` nick
  2001-03-23 22:12     ` Alan Cox
  0 siblings, 2 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 19:01 UTC (permalink / raw)
  To: nick; +Cc: Andries.Brouwer, alan, linux-kernel

nick@snowman.net wrote:
> 
> Please point me to an Operating System that runs on any commonly available
> platform and fits your requirements.
>         Nick

You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
Machine)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 19:01   ` Martin Dalecki
@ 2001-03-23 19:23     ` nick
  2001-03-23 22:12     ` Alan Cox
  1 sibling, 0 replies; 162+ messages in thread
From: nick @ 2001-03-23 19:23 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: Andries.Brouwer, alan, linux-kernel

The only thing out of that I don't belive is that it's a useable Operating
System.  I like your solution though.  Thanks for actually comeing up with
a useable solution instead of mindlessly ranting.
	Nick

On Fri, 23 Mar 2001, Martin Dalecki wrote:

> nick@snowman.net wrote:
> > 
> > Please point me to an Operating System that runs on any commonly available
> > platform and fits your requirements.
> >         Nick
> 
> You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
> Machine)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 19:01   ` Martin Dalecki
  2001-03-23 19:23     ` nick
@ 2001-03-23 22:12     ` Alan Cox
  2001-03-23 23:23       ` Stephen E. Clark
  1 sibling, 1 reply; 162+ messages in thread
From: Alan Cox @ 2001-03-23 22:12 UTC (permalink / raw)
  To: Martin Dalecki; +Cc: nick, Andries.Brouwer, alan, linux-kernel

> You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
> Machine)

The JVM doesnt actually. The JVM will itself spontaenously explode in real
life when out of memory. Maybe the JVM on a DOS extender 8)


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 22:12     ` Alan Cox
@ 2001-03-23 23:23       ` Stephen E. Clark
  2001-03-24 10:40         ` Gérard Roudier
  0 siblings, 1 reply; 162+ messages in thread
From: Stephen E. Clark @ 2001-03-23 23:23 UTC (permalink / raw)
  To: Alan Cox; +Cc: Martin Dalecki, nick, Andries.Brouwer, linux-kernel

Alan Cox wrote:
> 
> > You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
> > Machine)
> 
> The JVM doesnt actually. The JVM will itself spontaenously explode in real
> life when out of memory. Maybe the JVM on a DOS extender 8)
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Back in the early nineties I was working with 18 developers on a Data
General Aviion running DGUX. The system had only 16mb of memory and
600mb of disk. We were all continuously going thru the edit, compile,
debug steps developing as large Computer Aided Dispatch System. Never
did this system with its limited resources crash, or randomly start
killing user or system processes.

My $.02.
Steve

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 23:23       ` Stephen E. Clark
@ 2001-03-24 10:40         ` Gérard Roudier
  0 siblings, 0 replies; 162+ messages in thread
From: Gérard Roudier @ 2001-03-24 10:40 UTC (permalink / raw)
  To: Stephen E. Clark
  Cc: Alan Cox, Martin Dalecki, nick, Andries.Brouwer, linux-kernel

On Fri, 23 Mar 2001, Stephen E. Clark wrote:

> Alan Cox wrote:
> > 
> > > You don't beleve me if I tell you: DOS extender and JVM (Java Virtual
> > > Machine)
> > 
> > The JVM doesnt actually. The JVM will itself spontaenously explode in real
> > life when out of memory. Maybe the JVM on a DOS extender 8)
> > 
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> Back in the early nineties I was working with 18 developers on a Data
> General Aviion running DGUX. The system had only 16mb of memory and
> 600mb of disk. We were all continuously going thru the edit, compile,
> debug steps developing as large Computer Aided Dispatch System. Never
> did this system with its limited resources crash, or randomly start
> killing user or system processes.

What about the following (it is an estimate):

early nineties  -->  early eighties
18 developers   -->  18 developers
16mb of memory  -->   1 mb of memory
600 mb of disk  -->  70 mb of disk

Most current applications are so huge BLOATAGE that they should not 
deserve to be run just once. :-)
The kernel must try to cope with that and also with its own BLOATAGE.

Human nature is to eat what can be eaten, regardless if it is useful or
not.

> My $.02.

What about 'My M$.02' in some decades. :)

Btw, 'decade' comes from Latin 'deca'=10 and dies=days (not sure for
dies). As a result, it should have meant a period of 10 days instead of 10
years. It means a period of 10 days in French.

May-be, a knowledgeable person at this list has an explanation for this
misinterpretation. Could it be due to the word 'decadent' that has a 
very different ethymology.

10 days is too short for getting decadent, but 10 years should be enough,
no ? :-)

> Steve

  Gérard.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 18:29 Andries.Brouwer
  2001-03-23 18:38 ` Alan Cox
  2001-03-23 18:43 ` nick
@ 2001-03-23 21:14 ` Jonathan Morton
  2001-03-25 14:56   ` Marco Colombo
  2 siblings, 1 reply; 162+ messages in thread
From: Jonathan Morton @ 2001-03-23 21:14 UTC (permalink / raw)
  To: Andries.Brouwer, alan, linux-kernel

>The main point is letting malloc fail when the memory cannot be
>guaranteed.

If I read various things correctly, malloc() is supposed to fail as you
would expect if /proc/sys/vm/overcommit_memory is 0.  This is the case on
my RH 6.2 box, dunno about yours.  I can write a simple test program which
simply allocates tons of memory if you like...

...and I did.  It filled up my physical and swap memory, and got killed by
the OOM handler before malloc() failed, even though overcommit_memory was
set to 0.

*****BAD!*****

Here's my test program and output (on a Duron with 256M physical and 250M
swap):

[chromi@beryllium compsci]$ cat make_mem.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  /* Allocate tons of RAM, print out how far, we get, and exit when we
malloc() fails.
   * We also access each page we allocate, to ensure we really are getting
the memory we reserve.
   * If we are killed by SIGSEGV or by OOM instead of malloc() failing, the
VM system is broken.
   */

  char *p;
  unsigned long pages = 0;

  while(1) {
    p = malloc(1024);
    if(!p)
      break;
    *p = 1;
    pages++;
    printf("%lu K\r", pages);
  }

  printf("\n*** malloc() failed!\n");

  return 0;
}
[chromi@beryllium compsci]$ gcc -O -Wall -o make_mem make_mem.c
[chromi@beryllium compsci]$ ./make_mem
493625 KKilled
[chromi@beryllium compsci]$

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 21:14 ` Jonathan Morton
@ 2001-03-25 14:56   ` Marco Colombo
  0 siblings, 0 replies; 162+ messages in thread
From: Marco Colombo @ 2001-03-25 14:56 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: linux-kernel

On Fri, 23 Mar 2001, Jonathan Morton wrote:

> >The main point is letting malloc fail when the memory cannot be
> >guaranteed.
> 
> If I read various things correctly, malloc() is supposed to fail as you
> would expect if /proc/sys/vm/overcommit_memory is 0.  This is the case on
> my RH 6.2 box, dunno about yours.  I can write a simple test program which
> simply allocates tons of memory if you like...
> 
> ...and I did.  It filled up my physical and swap memory, and got killed by
> the OOM handler before malloc() failed, even though overcommit_memory was
> set to 0.
> 
> *****BAD!*****

Please search list archives, there are plenty of threads about
overcommitment.

Have a look at the sources, that part is easy to read and you'll
realize that /proc/sys/vm/overcommit_memory does not really enable
/ disable memory overcommitment: its closer to a sanity check to
disallow absurdly sized requests, IIRC.

.TM.
-- 
      ____/  ____/   /
     /      /       /			Marco Colombo
    ___/  ___  /   /		      Technical Manager
   /          /   /			 ESI s.r.l.
 _____/ _____/  _/		       Colombo@ESI.it


^ permalink raw reply	[flat|nested] 162+ messages in thread

* RE: [PATCH] Prevent OOM from killing init
@ 2001-03-23 19:33 Stephen Satchell
  0 siblings, 0 replies; 162+ messages in thread
From: Stephen Satchell @ 2001-03-23 19:33 UTC (permalink / raw)
  To: linux-kernel

At 10:28 AM 3/23/01 +0100, you wrote:
>Ehrm, I would like to re-state that it still would be nice if
>some mechanism got introduced which enables one to set certain
>processes to "cannot be killed".
>For example: I would hate it it the UPS monitoring daemon got
>killed for obvious reasons :o)

Hey, my new flame-proof suit arrived today, so let me give it a try-out...

1)  If you have a daemon that absolutely positively has to be there, why 
not put the damn thing in "inittab" with the RESPAWN attribute?  OOM kills 
it, init notices it, init respawns it, you have your UPS monitoring daemon 
back.

2)  Why is task #1 (init) considered at all by the OOM task-killer 
code?  Sounds like a possible off-by-one bug to me.

3)  If random task-killing is such a problem, one solution is to add yet 
another word to the process table entry, something on the order of 
"oom_importance".  Off the top of my head, this 16-bit value would be 
0x4000 for "normal" processes, and would be the value at start-up.  A value 
of 0xFFFF would be the "never-kill" value, while the value of 0x0000 would 
be the equivalent of the guy who ALWAYS gives up his airplane seat.  The 
process could set this value between 0x0000 and 0xBFFF for processes 
running without root privs, the full range for root processes.  The big 
advantage here is that a daemon or major system can set the value to zero 
during start-up (to ensure being killed if there aren't enough system 
resources) and then boost the immunity once it is going strong.  I can see 
this being of particular value in windows desktops where an attempt to 
start a widget causes an out-of memory condition and THAT WIDGET is the one 
that then dies.  That would be the expected behavior.

 From a debug perspective, it means that the programmer can avoid killing 
something on his development system "by accident" by attracting all the 
task-killing lightning during initial debug.  This would be a sure-fire 
improvement over accidentally killing your debugger, for example.

I call it "nice for memory".

Satch

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-23 23:15 Andries.Brouwer
  2001-03-23 23:17 ` Martin Dalecki
                   ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Andries.Brouwer @ 2001-03-23 23:15 UTC (permalink / raw)
  To: Andries.Brouwer, alan; +Cc: linux-kernel

[to various people]

No, ulimit does not work. (But it helps a little.)
No, /proc/sys/vm/overcommit_memory does not work.

[to Alan]

> Nobody feels its very important because nobody has implemented it.

Yes, that is the right response.
What can one say? One can only do.

Andries


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 23:15 Andries.Brouwer
@ 2001-03-23 23:17 ` Martin Dalecki
  2001-03-24  0:13 ` Jonathan Morton
  2001-03-24  1:59 ` Paul Jakma
  2 siblings, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-23 23:17 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: alan, linux-kernel

Andries.Brouwer@cwi.nl wrote:
> 
> [to various people]
> 
> No, ulimit does not work. (But it helps a little.)
> No, /proc/sys/vm/overcommit_memory does not work.
> 
> [to Alan]
> 
> > Nobody feels its very important because nobody has implemented it.
> 
> Yes, that is the right response.
> What can one say? One can only do.

Please just expect a patch for tomorrow ;-).

The only thing I have currently to do is testing.
I will be using the installation process of the ORACLE iAS 9i for
linux on my notebook, becouse it used to trigger oom for me VERY
frequently. So far all things BEHAVE...

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 23:15 Andries.Brouwer
  2001-03-23 23:17 ` Martin Dalecki
@ 2001-03-24  0:13 ` Jonathan Morton
  2001-03-24  6:58   ` Rik van Riel
                     ` (2 more replies)
  2001-03-24  1:59 ` Paul Jakma
  2 siblings, 3 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24  0:13 UTC (permalink / raw)
  To: Andries.Brouwer, alan; +Cc: linux-kernel

>[to various people]
>
>No, ulimit does not work. (But it helps a little.)
>No, /proc/sys/vm/overcommit_memory does not work.

Entirely correct.  ulimit certainly makes it much harder for a single
runaway process to take down important parts of the system - now why
doesn't $(MAJOR_DISTRO_VENDOR) set it up by default?  NetBSD does.  It's
not an infallible solution by any means, but it sure does help.

I just asked a friend to run my test program on his NetBSD box - it ran
into ulimit and malloc() returned 0.  Setting ulimit on my RH 6.2 box -
which defaults to unlimited - also caused it to fail gracefully.

>[to Alan]
>
>> Nobody feels its very important because nobody has implemented it.
>
>Yes, that is the right response.
>What can one say? One can only do.

Ah, but what does one do?  Badger major distro vendors to set ulimit
properly by default?  Improve the OOM-killer so it gives less "badness" to
low-UID processes?  Implement an early-failure mechanism for malloc(), so
hard OOM is not hit except by an extremely determined process (or set of
processes)?

Personally, I think all of the above.  Your views may differ.

Hmm...  "if ( freemem < (size_of_mallocing_process / 20) ) fail_to_allocate;"

Seems like a reasonable soft limit - processes which have already got lots
of RAM can probably stand not to have that little bit more and can be
curbed more quickly.  Processes with less probably don't deserve to die and
furthermore are less likely to be engineered to handle malloc() failure, so
failure only occurs closer to the mark.  In this scenario OOM killing
(which is, after all, a last resort) should trigger rarely and simple
malloc() failure (which userspace apps can cope with more easily) is an
early-warning and prevention system.

Comments?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  0:13 ` Jonathan Morton
@ 2001-03-24  6:58   ` Rik van Riel
  2001-03-24 12:38   ` Jonathan Morton
  2001-03-24 13:12   ` Jonathan Morton
  2 siblings, 0 replies; 162+ messages in thread
From: Rik van Riel @ 2001-03-24  6:58 UTC (permalink / raw)
  To: Jonathan Morton; +Cc: Andries.Brouwer, alan, linux-kernel

On Sat, 24 Mar 2001, Jonathan Morton wrote:

> Hmm...  "if ( freemem < (size_of_mallocing_process / 20) ) fail_to_allocate;"
> 
> Seems like a reasonable soft limit - processes which have already got
> lots of RAM can probably stand not to have that little bit more and
> can be curbed more quickly.

This looks like it could nicely in preventing a single process
from getting out of hand and gobbling up all memory.

It won't prevent the system from a mongolian horde of processes,
but nobody should expect your one-liner to fix world piece ;)

I like it, now lets test it ;)

regards,

Rik
--
Virtual memory is like a game you can't win;
However, without VM there's truly nothing to lose...

		http://www.surriel.com/
http://www.conectiva.com/	http://distro.conectiva.com.br/


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  0:13 ` Jonathan Morton
  2001-03-24  6:58   ` Rik van Riel
@ 2001-03-24 12:38   ` Jonathan Morton
  2001-03-24 13:12   ` Jonathan Morton
  2 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 12:38 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andries.Brouwer, alan, linux-kernel

At 6:58 am +0000 24/3/2001, Rik van Riel wrote:
>On Sat, 24 Mar 2001, Jonathan Morton wrote:
>
>> Hmm...  "if ( freemem < (size_of_mallocing_process / 20) )
>>fail_to_allocate;"
>>
>> Seems like a reasonable soft limit - processes which have already got
>> lots of RAM can probably stand not to have that little bit more and
>> can be curbed more quickly.
>
>This looks like it could nicely in preventing a single process
>from getting out of hand and gobbling up all memory.
>
>It won't prevent the system from a mongolian horde of processes,
>but nobody should expect your one-liner to fix world piece ;)
>
>I like it, now lets test it ;)

I thought of some things which could break it, which I want to try and deal
with before releasing a patch.  Specifically, I want to make freepages.min
sacrosanct, so that malloc() *never* tries to use it.  This should be
fairly easy to implement - simply subtract freepages.min from the freemem
part.  An even nicer way would be to subtract freepages.low (or some
similar value) instead of freepages.min for non-root or non-privileged
processes.

BTW, is the 'current' pointer always valid when vm_enough_memory() is
called?  If so, I can remove one redundant check.

My NetBSD friend appears to have found code in the BSD kernel which sets up
ulimit values sensibly by default - eg. it's not handled by the boot
scripts.  Presumably a root process is capable of changing the limits, but
I'm guessing that sensible defaults in the kernel have to be a Good Thing.
Comments?

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24  0:13 ` Jonathan Morton
  2001-03-24  6:58   ` Rik van Riel
  2001-03-24 12:38   ` Jonathan Morton
@ 2001-03-24 13:12   ` Jonathan Morton
  2 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24 13:12 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andries.Brouwer, alan, linux-kernel

>I thought of some things which could break it, which I want to try and deal
>with before releasing a patch.  Specifically, I want to make freepages.min
>sacrosanct, so that malloc() *never* tries to use it.  This should be
>fairly easy to implement - simply subtract freepages.min from the freemem
>part.  An even nicer way would be to subtract freepages.low (or some
>similar value) instead of freepages.min for non-root or non-privileged
>processes.

Hmm, interesting.  Even with my modification - which means that
vm_enough_memory() will always return false if the allocation would clobber
freepages.min - I can still trigger OOM quite easily.  Even with no swap on
my box, there's a lot of disk activity, probably due to there being
virtually no disk cache left - could the generation of disk buffer and
cache pages be bypassing vm_enough_memory()?  If so, would using
freepages.low as the threshold rather than freepages.min help at all? (or
have I got everything muddled...)

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-23 23:15 Andries.Brouwer
  2001-03-23 23:17 ` Martin Dalecki
  2001-03-24  0:13 ` Jonathan Morton
@ 2001-03-24  1:59 ` Paul Jakma
  2 siblings, 0 replies; 162+ messages in thread
From: Paul Jakma @ 2001-03-24  1:59 UTC (permalink / raw)
  To: Andries.Brouwer; +Cc: Linux Kernel

On Sat, 24 Mar 2001 Andries.Brouwer@cwi.nl wrote:

> No, ulimit does not work. (But it helps a little.)

no, not perfect, i very much agree. but in daily usage it reduces
chance of OOM to close to 0.

> No, /proc/sys/vm/overcommit_memory does not work.

that's because it disables the very rough resource checking that
linux has. it makes OOM even easier to achieve:

mm/mmap.c::vm_enough_memory():

	/* Sometimes we want to use more memory than we have. */
        if (sysctl_overcommit_memory)
            return 1;

it doesn't make linux go into a 'non-overcommit' mode, cause linux
does not have the accounting to cover it...

solution according to more knowledgable folks than i, sysadmin, is
better accounting so that vm_enough_memory can be more accurate
rather than developing an all-seeing oom_killer().

> Andries

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org
PGP5 key: http://www.clubi.ie/jakma/publickey.txt
-------------------------------------------
Fortune:
"We are on the verge: Today our program proved Fermat's next-to-last theorem."
		-- Epigrams in Programming, ACM SIGPLAN Sept. 1982

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-24  1:11 Andries.Brouwer
  0 siblings, 0 replies; 162+ messages in thread
From: Andries.Brouwer @ 2001-03-24  1:11 UTC (permalink / raw)
  To: timw; +Cc: alan, linux-kernel


> It was actually worse than that. Grab your copy of "Lions", and check lines
> 4375-4377 in function xswap(). A failure to allocate space in the swapmap
> caused a panic. Same problem in xalloc().

[no Lions nearby; somewhere I still have the printout but am
too lazy to search; I also have the tape but nothing to read it with]

yes, you may well be right if you say that my picture
of the distant past is too rosy - maybe I forgot all
this trouble
still - yesterday I lost three edit sessions -
I do not recall any such occurrence in the 25 years before


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-24  1:38 Jonathan Morton
  0 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-24  1:38 UTC (permalink / raw)
  To: linux-kernel

>Hmm...  "if ( freemem < (size_of_mallocing_process / 20) ) fail_to_allocate;"
>
>Seems like a reasonable soft limit - processes which have already got lots
>of RAM can probably stand not to have that little bit more and can be
>curbed more quickly.  Processes with less probably don't deserve to die and
>furthermore are less likely to be engineered to handle malloc() failure, so
>failure only occurs closer to the mark.  In this scenario OOM killing
>(which is, after all, a last resort) should trigger rarely and simple
>malloc() failure (which userspace apps can cope with more easily) is an
>early-warning and prevention system.

Following up my own post with some action, I hacked 2.4.1's
mm/mmap.c::vm_enough_pages() to include something similar to the above
algorithm.  In fact, it triggers malloc() failure when 1/16th of
current->mm->total_vm would be greater than the sum of the free space and
the potentially-allocated area.

My very quick tests show that my test program (the rogue allocator) now in
fact does encounter a failed malloc() at approx. 475M, instead of being
killed by the OOM handler at approx. 490M.  This is pretty much the desired
behaviour.

If someone would like me to post a patch and have it tested, I'd be happy
to do so.

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-24  2:30 Andreas Franck
  0 siblings, 0 replies; 162+ messages in thread
From: Andreas Franck @ 2001-03-24  2:30 UTC (permalink / raw)
  To: linux-kernel

Hi together,

seems like a hot discussion going on, but I couldn't resist and would like to
throw in my $0.02.

Besides misunderstandings and general displeasure, some very interesting
facts have shown up in the discussion (oh, yeah), which I'd like to know more
about, and just extend them with a bit of my latest experience regarding
memory usage.

First one is about buffer/inode cache. What I expect as a medium-skilled
system hacker would be: Before giving up with an OOM-whatever,

a) all non-dirty buffers should be freed, possibly giving tons of memory
b) all dirty buffers should be flushed and freed, alas

I'm not sure if both is tried ATM, but I think enough experts are here to
answer my questions :)

What I saw lately was some general system sluggishness after copying very big
files (ripping a CD image to disk) - it seems the system has paged out most
of its processes (including the calling bash shell) in favor of the copying
task, just for buffers! Up to which degree is this reasonable? It seems to
slow down the system when using swap, so for this task I better had
deactivated it. Not what one "intuitively" expects.

So, what is the second important point? The current system cannot properly
distinguish between memory an application "really" needs and memory an
application "eventually" needs (as internal caches, ...).

A possible solution could be the implementation of something like SIGDANGER,
which would be sent to an application in case of memory overload, so
it should try to free a bit memory if it can. Surely applications would have
to be modified to use that information. How about the C library, does it
maintain any big buffers, for I/O or so? I don't know, changes there could
surely be passed on transparently. Ok, ok, it's the MacOS way of thinking, so
the other possibility. This problems are intimately related to memory
overcommitting, or not doing so, so what might be fatal in overcommitting?

One problem arises if an application gets a huge part of overcommitted memory
and then tries to use it, which spontaneously fails - just because the memory
was committed somewhere else, to the 999 other apps which are already
 running.

The flaw there is that at some time, you can guarantee that the overcommit
would fail, if the memory was really used. At this point, the application
could be halted (so that it does not get the chance to make use of the
overcommit promise), until some more memory is available again - either by
paging, or by waiting for other jobs to terminate. This could lead to
starvation, but it potentially could let the system survive.

A further idea would be to use overcommitted memory only for buffers and
caches, this was already mentioned before. In any situation "near" an OOM,
further memory pressure should be avoided - for example, by letting malloc()
fail. This might also hurt existing processes, so some heuristics could
decide - a malloc() from a freshly started process should fail regardlessly
of its size, while older processes might get some more tolerance, because the
system might trust their behaviour a bit more.

So far from me, this was just a collection of some more or less unrelated
thoughts, which I'd like to know a bit more about, or hear from experts why
all of this is b*llshit (or: already done(TM)!)

Greetings,
Andreas

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-24 10:18 Andries.Brouwer
  0 siblings, 0 replies; 162+ messages in thread
From: Andries.Brouwer @ 2001-03-24 10:18 UTC (permalink / raw)
  To: Andries.Brouwer, paul; +Cc: linux-kernel

    From paul@jakma.org Sat Mar 24 03:00:17 2001

    > No, ulimit does not work. (But it helps a little.)

    no, not perfect, i very much agree. but in daily usage it reduces
    chance of OOM to close to 0.

No. How would you use it? Compute individual limits for
each process? One typically has a few very large processes
that may easily take most of memory, and lots of small processes.
With a low ulimit these large processes do not run.
With a large ulimit it does not help against OOM.
The job of accounting what is available belongs to the system,
not the user.

Note that ulimit does not limit the sum of your processes,
it limits each individual process.

Andries

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
@ 2001-03-24 23:41 Benoit Garnier
  2001-03-25  5:45 ` Stephen Satchell
  2001-03-25 14:32 ` Martin Dalecki
  0 siblings, 2 replies; 162+ messages in thread
From: Benoit Garnier @ 2001-03-24 23:41 UTC (permalink / raw)
  To: linux-kernel

Szabolcs Szakacsits wrote :

> But if you start
> to think you get the conclusion that process killing can't be avoided if
> you want the system keep running.

What's the point in keeping the OS running if the applications are silently
killed?

If your box is running for example a mail server, and it appears that
another process is juste eating the free memory, do you really want to kill
the mail server, just because it's the main process and consuming more
memory and CPU than others?

Well, fine, your OS is up, but your application is not here anymore.

I just think there's no general solution, users must have the chance to
choose processes not to be killed, or malloc() returning errors.

----
Benoît GARNIER

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 23:41 Benoit Garnier
@ 2001-03-25  5:45 ` Stephen Satchell
  2001-03-25  6:58   ` Stephen Clouse
  2001-03-25 14:37   ` Martin Dalecki
  2001-03-25 14:32 ` Martin Dalecki
  1 sibling, 2 replies; 162+ messages in thread
From: Stephen Satchell @ 2001-03-25  5:45 UTC (permalink / raw)
  To: linux-kernel

At 12:41 AM 3/25/01 +0100, you wrote:
>If your box is running for example a mail server, and it appears that
>another process is juste eating the free memory, do you really want to kill
>the mail server, just because it's the main process and consuming more
>memory and CPU than others?
>
>Well, fine, your OS is up, but your application is not here anymore.

If you have a mission-critical application running on your box, add it to 
the inittab file with the RESPAWN attribute.  That way, OOM killer kills 
it, init notices it, and init restarts your server.

By the way, are the people working on the OOM-killer also working to avoid 
killing task 1?

Satch


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25  5:45 ` Stephen Satchell
@ 2001-03-25  6:58   ` Stephen Clouse
  2001-03-25 14:37   ` Martin Dalecki
  1 sibling, 0 replies; 162+ messages in thread
From: Stephen Clouse @ 2001-03-25  6:58 UTC (permalink / raw)
  To: Stephen Satchell; +Cc: linux-kernel

[-- Attachment #1: msg.pgp --]
[-- Type: text/plain, Size: 1869 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, Mar 24, 2001 at 09:45:01PM -0800, Stephen Satchell wrote:
> If you have a mission-critical application running on your box, add it to 
> the inittab file with the RESPAWN attribute.  That way, OOM killer kills 
> it, init notices it, and init restarts your server.

Ah, that's great for simple daemons.  Now tell me how to help an app like this 
(Oracle exampled here):

oracle      89  0.0  0.4 41076 1776 ?        S    Mar22   0:00 ora_pmon_slash
oracle      91  0.0  0.6 40676 2620 ?        S    Mar22   0:00 ora_dbw0_slash
oracle      93  0.0  0.4 40544 1788 ?        S    Mar22   0:00 ora_lgwr_slash
oracle      95  0.0  0.4 40544 1744 ?        S    Mar22   0:00 ora_ckpt_slash
oracle      97  0.0  1.1 40556 4404 ?        S    Mar22   0:00 ora_smon_slash
oracle      99  0.0  0.5 40536 2188 ?        S    Mar22   0:00 ora_reco_slash
oracle     101  0.0  0.4 40656 1756 ?        S    Mar22   0:00 ora_arc0_slash

In this example, when oom_kill reaps one of these autonomous threads, Oracle 
opts to crash and burn.  Database corruption is almost guaranteed.

In all reality, I'm sure any daemon (threads or no) that works heavily with disk
files is likely to screw itself and its data if it gets sigkilled for no
reason.  And in our environment, there is no reason for it to get sigkilled.

I'm going to severely hurt the first person that says such a program should be
*expecting* random untrappable annihilation of its threads.  (And what happens
when the master process *is* the target?)

- -- 
Stephen Clouse <stephenc@theiqgroup.com>
Senior Programmer, IQ Coordinator Project Lead
The IQ Group, Inc. <http://www.theiqgroup.com/>

-----BEGIN PGP SIGNATURE-----
Version: PGP 6.5.8

iQA/AwUBOr2XDgOGqGs0PadnEQK0rACfQELDid11+m90bS/DrGyrsHW45ZEAn19G
mL3fSCdi2TeHDxGLA8uXT8l5
=oQPV
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-25  5:45 ` Stephen Satchell
  2001-03-25  6:58   ` Stephen Clouse
@ 2001-03-25 14:37   ` Martin Dalecki
  1 sibling, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 14:37 UTC (permalink / raw)
  To: Stephen Satchell; +Cc: linux-kernel

Stephen Satchell wrote:
> 
> At 12:41 AM 3/25/01 +0100, you wrote:
> >If your box is running for example a mail server, and it appears that
> >another process is juste eating the free memory, do you really want to kill
> >the mail server, just because it's the main process and consuming more
> >memory and CPU than others?
> >
> >Well, fine, your OS is up, but your application is not here anymore.
> 
> If you have a mission-critical application running on your box, add it to
> the inittab file with the RESPAWN attribute.  That way, OOM killer kills
> it, init notices it, and init restarts your server.

That makes me actually rolling on by back... Just try to add oracle to
inittab
crash it and watch it grabefully restarting by repawn!!!!!!!!!

> By the way, are the people working on the OOM-killer also working to avoid
> killing task 1?

Already done.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] Prevent OOM from killing init
  2001-03-24 23:41 Benoit Garnier
  2001-03-25  5:45 ` Stephen Satchell
@ 2001-03-25 14:32 ` Martin Dalecki
  1 sibling, 0 replies; 162+ messages in thread
From: Martin Dalecki @ 2001-03-25 14:32 UTC (permalink / raw)
  To: Benoit Garnier; +Cc: linux-kernel

Benoit Garnier wrote:
> 
> Szabolcs Szakacsits wrote :
> 
> > But if you start
> > to think you get the conclusion that process killing can't be avoided if
> > you want the system keep running.
> 
> What's the point in keeping the OS running if the applications are silently
> killed?
> 
> If your box is running for example a mail server, and it appears that
> another process is juste eating the free memory, do you really want to kill
> the mail server, just because it's the main process and consuming more
> memory and CPU than others?

Yes bloody dumn, becouse I can then go no to the box, blacklist
the smapper causing this with ipchains (or whatever it's called)
and restart sendmail - WITHOUT DRIVING 1900km.

^ permalink raw reply	[flat|nested] 162+ messages in thread

[parent not found: <Pine.LNX.4.30.0103251549100.13864-100000@fs131-224.f-secure.com>]

[parent not found: <l03130315b6e242006a4b@[192.168.239.101]>]

* Re: [PATCH] Prevent OOM from killing init
       [not found] ` <l03130315b6e242006a4b@[192.168.239.101]>
@ 2001-03-25 15:47   ` Jonathan Morton
  0 siblings, 0 replies; 162+ messages in thread
From: Jonathan Morton @ 2001-03-25 15:47 UTC (permalink / raw)
  To: Szabolcs Szakacsits; +Cc: linux-kernel

>> >start your app, wait for malloc to fail, hit enter for the other app and
>> >watch you app to be OOM killed ;)
>>
>> That would only happen if memory_overcommit was turned on, in which case my
>> modification would have zero effect anyway (the overcommit test happens
>> before my code).
>
>Thanks for listening and trying out the above trivial code instead of
>wrong theoretical arguments ;)
>
>So again, Linux *always* overcommit memory, the
>/proc/sys/vm/overcommit_memory controls total overcommit or
>quasi-overcommit [ehen you make your check in vm_enough() the memory is
>already overcommitted].

OK, looks like I got mixed up between *reservation* (malloc) and
*allocation* (access), and we're checking allocated memory when we should
really be checking reserved.  Be patient - I haven't done much of this type
of thing...  but your argument turns out to be correct, and I eventually
figured it out for myself.  I certainly agree that the default should be to
assume that all reserved memory will be used.  Maybe even do little nasty
things like printk(KERN_WARN "root is overcommitting memory!\n"); in
appropriate places, to discourage overcommitting.

>The solution is something like,
>add optional non-overcommit support,
>	http://lwn.net/2000/0406/a/no-overcommit.html

This sounds like a good solution.  Saw the size of the patch, it's big and
touches lots of bits of VM code, but it looks as though parts of my ideas
will also fit in there and be helpful.

Hmm... so we get an adjusted or replaced OOM-kill-selection algorithm, my
out_of_memory() fix and runaway process clamp, and this big(ish)
memory-accounting patch.  Sounds like a good combination to me, fixing all
the problems I've heard about recently.

There are some unrelated performance problems I've encountered during my
testing (eg. kswapd gets incredibly inefficient when swap usage grows
beyond about 500Mb on my 256Mb physical machine, causing swap bandwidth to
fall way below the HDs' capabilities), which I'm going to ignore for now.
Probably whoever takes on the VM balancing problem can look into that, as
it's probably related to that rather than this...

--------------------------------------------------------------
from:     Jonathan "Chromatix" Morton
mail:     chromi@cyberspace.org  (not for attachments)
big-mail: chromatix@penguinpowered.com
uni-mail: j.d.morton@lancaster.ac.uk

The key to knowledge is not to rely on people to teach you it.

Get VNC Server for Macintosh from http://www.chromatix.uklinux.net/vnc/

-----BEGIN GEEK CODE BLOCK-----
Version 3.12
GCS$/E/S dpu(!) s:- a20 C+++ UL++ P L+++ E W+ N- o? K? w--- O-- M++$ V? PS
PE- Y+ PGP++ t- 5- X- R !tv b++ DI+++ D G e+ h+ r++ y+(*)
-----END GEEK CODE BLOCK-----

^ permalink raw reply	[flat|nested] 162+ messages in thread

end of thread, other threads:[~2001-03-27 15:06 UTC | newest]

Thread overview: 162+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-03-21  3:48 Only 10 MB/sec with via 82c686b chipset? SodaPop
2001-03-21 13:48 ` egger
2001-03-22  2:14   ` TimO
2001-03-23  2:38   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
2001-03-23  9:48     ` Alan Cox
2001-03-23 16:21       ` SodaPop
2001-03-23 17:00       ` [PATCH] Prevent OOM from killing init SodaPop
2001-03-23 18:42         ` Martin Dalecki
2001-03-23 20:25           ` SodaPop
2001-03-23 20:33             ` Martin Dalecki
2001-03-23 19:19         ` Jonathan Morton
2001-03-23  2:59   ` Only 10 MB/sec with via 82c686b - FIXED SodaPop
2001-03-21 14:18 ` Only 10 MB/sec with via 82c686b chipset? Jonathan Morton
2001-03-21 17:34   ` egger
  -- strict thread matches above, loose matches on Subject: below --
2001-03-21 22:54 [PATCH] Prevent OOM from killing init Patrick O'Rourke
2001-03-21 23:11 ` Eli Carter
2001-03-21 23:40   ` Patrick O'Rourke
2001-03-21 23:48 ` Rik van Riel
2001-03-22  8:14   ` Eric W. Biederman
2001-03-22  9:24     ` Rik van Riel
2001-03-22 19:29     ` Philipp Rumpf
2001-03-22 11:47   ` Guest section DW
2001-03-22 15:01     ` Rik van Riel
2001-03-22 19:04       ` Guest section DW
2001-03-22 16:41     ` Eric W. Biederman
2001-03-22 20:28     ` Stephen Clouse
2001-03-22 21:01       ` Ingo Oeser
2001-03-22 21:23       ` Alan Cox
2001-03-22 22:00         ` Guest section DW
2001-03-22 22:12           ` Ed Tomlinson
2001-03-22 22:52           ` Alan Cox
2001-03-22 23:27             ` Guest section DW
2001-03-22 23:37               ` Rik van Riel
2001-03-26 19:04                 ` James Antill
2001-03-26 20:05                   ` Rik van Riel
2001-03-22 23:40               ` Alan Cox
2001-03-23 20:09                 ` Szabolcs Szakacsits
2001-03-23 22:21                   ` Alan Cox
2001-03-23 22:37                     ` Szabolcs Szakacsits
2001-03-23 19:57           ` Szabolcs Szakacsits
2001-03-22 22:10         ` Doug Ledford
2001-03-22 22:53           ` Alan Cox
2001-03-22 23:30             ` Doug Ledford
2001-03-22 23:40               ` Alan Cox
2001-03-22 23:43         ` Stephen Clouse
2001-03-23 19:26         ` Szabolcs Szakacsits
2001-03-23 20:41           ` Paul Jakma
2001-03-23 21:58             ` george anzinger
2001-03-24  5:55               ` Rik van Riel
2001-03-23 22:18             ` Szabolcs Szakacsits
2001-03-24  2:08               ` Paul Jakma
2001-03-23  1:31       ` Michael Peddemors
2001-03-23  7:04         ` Rik van Riel
2001-03-23 11:28           ` Guest section DW
2001-03-23 14:50             ` Eric W. Biederman
2001-03-23 17:21               ` Guest section DW
2001-03-23 20:18                 ` Paul Jakma
2001-03-24 20:19                   ` Jesse Pollard
2001-03-23 23:48                 ` Eric W. Biederman
2001-03-23 21:11             ` José Luis Domingo López
2001-03-27 15:05       ` Anthony de Boer - USEnet
2002-03-23  0:33       ` Martin Dalecki
2001-03-22 23:53         ` Rik van Riel
2002-03-23  1:21           ` Martin Dalecki
2001-03-23  0:20         ` Stephen Clouse
2002-03-23  1:30           ` Martin Dalecki
2001-03-23  1:37             ` Rik van Riel
2001-03-23 10:48               ` Martin Dalecki
2001-03-23 14:56                 ` Rik van Riel
2001-03-23 16:43                   ` Guest section DW
2001-03-24  5:57                     ` Rik van Riel
2001-03-25 16:35                       ` Guest section DW
2001-03-23 20:20                   ` Tom Diehl
2001-03-23 23:56                     ` Tim Wright
2001-03-24  0:21                       ` Tom Diehl
2001-03-23 17:26     ` James A. Sutherland
2001-03-23 17:32       ` Alan Cox
2001-03-23 18:58         ` Martin Dalecki
2001-03-23 19:45         ` Jonathan Morton
2001-03-23 23:26           ` Eric W. Biederman
2001-03-25 15:30         ` Martin Dalecki
2001-03-25 20:47         ` Stephen Satchell
2001-03-24  0:03       ` Guest section DW
2001-03-24  7:52       ` Doug Ledford
2001-03-25  0:32       ` Kurt Garloff
2001-03-25 15:02         ` Sandy Harris
2001-03-25 18:07         ` Guest section DW
2001-03-22 14:53   ` Patrick O'Rourke
2001-03-22 19:24   ` Philipp Rumpf
2001-03-22 22:20   ` James A. Sutherland
2001-03-23 17:31   ` Szabolcs Szakacsits
2001-03-24  5:54     ` Rik van Riel
2001-03-24  6:55       ` Juha Saarinen
2001-03-27  8:31       ` Roger Gammans
2001-03-21 23:41 Leif Sawyer
2001-03-22  0:32 ` Kevin Buhr
2001-03-22 11:08 Heusden, Folkert van
     [not found] <4605B269DB001E4299157DD1569079D2809930@EXCHANGE03.plaza.ds.adp.com>
2001-03-22 16:29 ` Rik van Riel
2001-03-22 18:32   ` Christian Bodmer
2001-03-23 15:08     ` Horst von Brand
2001-03-24  7:48       ` Doug Ledford
2001-03-24 10:21         ` Mike Galbraith
2001-03-24 18:19           ` Doug Ledford
2001-03-24 22:47             ` Mike Galbraith
2001-03-24 23:35             ` Jonathan Morton
2001-03-25 18:35               ` Jonathan Morton
2001-03-26  4:40                 ` Horst von Brand
2001-03-26  8:36                 ` Mike Galbraith
2001-03-26 10:01                 ` Jonathan Morton
2001-03-26 14:48                   ` Rik van Riel
2001-03-25 19:07               ` Mike Galbraith
2001-03-24 20:04           ` Jonathan Morton
2001-03-24 20:59           ` Jonathan Morton
2001-03-24 22:11             ` Rik van Riel
2001-03-24 23:36             ` Jonathan Morton
2001-03-25 14:30             ` Martin Dalecki
2001-03-25 14:13           ` Martin Dalecki
2001-03-24 12:42         ` Jonathan Morton
2001-03-24 15:06           ` Mike Galbraith
2001-03-25 14:10         ` Martin Dalecki
2001-03-22 23:35 Mikael Pettersson
2001-03-22 23:43 ` Alan Cox
2001-03-27  7:58   ` Helge Hafting
2001-03-23  0:09 Mikael Pettersson
2001-03-23  0:27 ` Andrew Morton
2001-03-23 12:29   ` Mikael Pettersson
2001-03-23 16:24 ` Horst von Brand
2001-03-23 16:49   ` Guest section DW
2001-03-23 17:04     ` Alan Cox
2001-03-23  9:28 Heusden, Folkert van
2001-03-23 18:29 Andries.Brouwer
2001-03-23 18:38 ` Alan Cox
2001-03-24  0:46   ` Tim Wright
2001-03-24 16:48   ` Jesse Pollard
2001-03-25 16:12     ` Szabolcs Szakacsits
2001-03-25 16:39     ` Jonathan Morton
2001-03-23 18:43 ` nick
2001-03-23 19:01   ` Martin Dalecki
2001-03-23 19:23     ` nick
2001-03-23 22:12     ` Alan Cox
2001-03-23 23:23       ` Stephen E. Clark
2001-03-24 10:40         ` Gérard Roudier
2001-03-23 21:14 ` Jonathan Morton
2001-03-25 14:56   ` Marco Colombo
2001-03-23 19:33 Stephen Satchell
2001-03-23 23:15 Andries.Brouwer
2001-03-23 23:17 ` Martin Dalecki
2001-03-24  0:13 ` Jonathan Morton
2001-03-24  6:58   ` Rik van Riel
2001-03-24 12:38   ` Jonathan Morton
2001-03-24 13:12   ` Jonathan Morton
2001-03-24  1:59 ` Paul Jakma
2001-03-24  1:11 Andries.Brouwer
2001-03-24  1:38 Jonathan Morton
2001-03-24  2:30 Andreas Franck
2001-03-24 10:18 Andries.Brouwer
2001-03-24 23:41 Benoit Garnier
2001-03-25  5:45 ` Stephen Satchell
2001-03-25  6:58   ` Stephen Clouse
2001-03-25 14:37   ` Martin Dalecki
2001-03-25 14:32 ` Martin Dalecki
     [not found] <Pine.LNX.4.30.0103251549100.13864-100000@fs131-224.f-secure.com>
     [not found] ` <l03130315b6e242006a4b@[192.168.239.101]>
2001-03-25 15:47   ` Jonathan Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox