* 2.4.23-pre VM regression?
@ 2003-10-16 11:52 Marcelo Tosatti
  2003-10-16 13:15 ` Tvrtko A. Uršulin
  2003-10-16 13:24 ` Andrea Arcangeli
  0 siblings, 2 replies; 10+ messages in thread
From: Marcelo Tosatti @ 2003-10-16 11:52 UTC
  To: andrea, riel; +Cc: linux-kernel


Andrea, 

Martin first reported problems with "gzip -dc file | less" (280MB file).
less was getting killed. He had no swap... I asked him to add some swap
and it works now. Fine. 

The thing is that with 2.4.22 less was being killed, but with 2.4.23-pre
he gets:

>> And yes, the app was killed:
> >
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process named
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process gpm
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process sendmail
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process less

So a lot of processes which should not get killed are dying. This is
really bad. I was afraid it could happen and it did.

What now? Resurrect OOM-killer? 

> > Hi,
> >   I haven't seen these messages for a long time, but I just hit them on my
> > ASUS L3880C laptop (1GB RAM). The message shows up on
> > 2.4.23-pre5+acpi20030918 and 2.4.23-pre7. The application gets killed on
> > 2.4.22-acpi20030918 too, just without the "0-order allocation" message.
> > I enabled the VM allocation debug option when configuring the kernel, but
> > apparently I have to turn it on somewhere else as well. *Documentation* is
> > missing: the help in "make config/menuconfig" etc. doesn't say anything,
> > the Documentation subdirectory doesn't say anything except "debug" as a
> > kernel boot option on the command line (I tried that too, but no change),
> > and the linux kernel-FAQ doesn't say anything either. :(
> >
> > How did I test?
> > `gzip -dc file | less' and pressed `G' to jump to the very end of the file.
> > The file size is only 280MB. After a while, the mouse stops moving for a
> > while, then the system sometimes gets unloaded, the fan RPM goes up
> > and down from time to time, the mouse cursor eventually moves, and then
> > the less command gets killed. In dmesg I found:
> >
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process less

With 2.4.22:

> 2.4.22-acpi-20030918 with HIGHMEM gives only in dmesg:
>
> Out of Memory: Killed process 1904 (less).





[parent not found: <fa.jkt135h.1l0s0t@ifi.uio.no>]
* Re: 2.4.23-pre VM regression?
@ 2003-10-18 22:09 A.D.F.
  0 siblings, 0 replies; 10+ messages in thread
From: A.D.F. @ 2003-10-18 22:09 UTC
  To: linux-kernel


> On Thu, Oct 16, 2003 at 10:29:16AM -0200, Marcelo Tosatti wrote:
> > 
> > 
> > On Thu, 16 Oct 2003, Andrea Arcangeli wrote:
> > 
> > > On Thu, Oct 16, 2003 at 09:52:30AM -0200, Marcelo Tosatti wrote:
> > > > 
> > > > Andrea, 
> > > > 
> > > > Martin first reported problems with "gzip -dc file | less" (280MB file).
> > > > less was getting killed. He had no swap... I asked him to add some swap
> > > > and it works now. Fine. 
> > > > 
> > > > The thing is that with 2.4.22 less was being killed, but with 2.4.23-pre
> > > > he gets:
> > > 
> > > note, that's a true oom, less needs to allocate 280MB and it doesn't fit
> > > in ram. there's no bug as far as I can tell.
> > >
> > > a `vmstat 1` could confirm that.
> > > 
> > > > >> And yes, the app was killed:
> > > > > >
> > > > > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > > > > VM: killing process named
> > > > > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > > > > VM: killing process gpm
> > > > > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > > > > VM: killing process sendmail
> > > > > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > > > > VM: killing process less
> > > 
> > > here the vm keeps killing until 'less' - the real offender - is nuked.
> > > 
> > > > So a lot of processes which should not get killed are dying. This is
> > > > really bad. I was afraid it could happen and it did.
> > > > 
> > > > What now? Resurrect OOM-killer? 
> > > 
> > > the oom killer has the problem I outlined some email ago, with shared
> > > memory it gets fooled badly etc.., though in a desktop with all tiny
> > > tasks except the memory-hog (`less` in this case) it works well.
> > 
> > Andrea,
> > 
> > There is no memory. Right. Some task has to be killed. But not small
> > programs like sendmail/named/etc. What should be killed is "less". That is
> > clear, right?
> 
> sure. I think I already explained there are downsides in disabling the
> oom killer for desktops where the offender task is normally the biggest
> one too, but those downsides aren't something I care about given the
> cases it gets right w/o it (i.e. huge-shm-SGA/mlock/oomdeadlocks). the
> oom killer can make the wrong decision too sometimes, and more
> systematically as well.

No, the whole point of view on this matter is wrong!

The kernel should try hard not to kill any process unless it behaves like
a time bomb (and in any case such an OOM killer should be configurable:
the sysadmin should be able to enable or disable it).

memory allocation
-----------------
Instead, the kernel should preallocate some memory
for its reserved internal tasks and should fail system calls that request
too much RAM.

Many applications don't know how much RAM the system has,
but they know well what to do when malloc(), calloc(), etc. return NULL
or when mmap() fails with ENOMEM: they usually try to shrink
their own caches and/or free unused allocated objects
(well, many apps simply exit, but that is their choice).
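
A minimal sketch of the pattern described above, assuming a hypothetical
application-level cache. The names cache_add, cache_shrink, cache_count and
alloc_or_shrink are illustrative only and do not come from any real
application. When malloc() returns NULL, the helper releases cache entries
one at a time and retries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical application cache: a linked list of droppable buffers. */
struct cache_entry {
    struct cache_entry *next;
    void *data;
};

static struct cache_entry *cache_head = NULL;

/* Add a buffer of `size` bytes to the cache; 0 on success, -1 on failure. */
int cache_add(size_t size)
{
    struct cache_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->data = malloc(size);
    if (e->data == NULL) {
        free(e);
        return -1;
    }
    e->next = cache_head;
    cache_head = e;
    return 0;
}

/* Release one cache entry; returns 1 if one was freed, 0 if the cache is empty. */
int cache_shrink(void)
{
    struct cache_entry *e = cache_head;
    if (e == NULL)
        return 0;
    cache_head = e->next;
    free(e->data);
    free(e);
    return 1;
}

/* Number of entries currently held in the cache. */
size_t cache_count(void)
{
    size_t n = 0;
    for (const struct cache_entry *e = cache_head; e != NULL; e = e->next)
        n++;
    return n;
}

/*
 * Allocate `size` bytes. If malloc() fails, drop one cache entry and retry.
 * Returns NULL only when the cache is empty and malloc() still fails.
 * size must be non-zero, because malloc(0) may legitimately return NULL.
 */
void *alloc_or_shrink(size_t size)
{
    assert(size > 0);
    for (;;) {
        void *p = malloc(size);
        if (p != NULL)
            return p;
        if (!cache_shrink())
            return NULL;
    }
}
```

This only helps if the allocation call actually fails. When the kernel
overcommits memory, malloc() can succeed and the process is killed later,
on first touch of the pages, which is the behaviour this thread is about.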

mmap
----
In the good old days of kernels 2.0.x and 2.2.x,
you could mmap() a file bigger than RAM + swap size and nothing
too bad happened, because the kernel tried hard to keep only the
used "window" of the mmapped area in RAM (obviously, accesses to the
mmapped area were slowed down a bit to accomplish this).

Now, in 2.4.x, with swap disabled,
what happens when an application mmaps a file bigger than RAM size?

	1) mmap() succeeds;

	2) the application starts to read the file sequentially and,
	   once it has read as many bytes as the RAM size minus n
	   (where n is 1 to 32 MB), it is brutally killed (kill -9)
	   without any warning.

This is _bad_ !!!
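
The access pattern in the two steps above can be sketched as follows. This
is only an illustration, assuming a read-only private mapping and a
hypothetical helper name, sum_mapped_file. It is not the test program used
by anyone in this thread. It checks mmap() for MAP_FAILED (where ENOMEM
would be reported) and then touches every byte in order.

```c
#define _DEFAULT_SOURCE   /* for madvise() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `path` read-only and sum its bytes sequentially.
 * Returns 0 and stores the result in *sum on success, -1 on failure
 * (including an empty file, which cannot be mapped).
 */
int sum_mapped_file(const char *path, unsigned long *sum)
{
    assert(path != NULL && sum != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    void *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (map == MAP_FAILED)
        return -1;                  /* e.g. errno == ENOMEM */

    /* Advisory hint that pages will be read in order; failure is ignored. */
    (void)madvise(map, len, MADV_SEQUENTIAL);

    const unsigned char *p = map;
    unsigned long total = 0;
    for (size_t i = 0; i < len; i++)
        total += p[i];              /* each new page is faulted in here */

    munmap(map, len);
    *sum = total;
    return 0;
}
```

Pages of a read-only file mapping are clean and backed by the file itself,
so in principle they can be dropped and re-read without any swap, which is
why the kill described above is surprising.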

Before answering that no, this might be good, etc.,
think of common systems with 64, 128 or 256 MB RAM
and also of embedded systems, which usually have swap disabled.

Conclusion: 2.2.x kernels are rock stable, 2.4.x kernels aren't yet.
In other words: don't kill, return ENOMEM instead, and
use a safe overcommit value of 0.
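
For reference, a small sketch of how a program might check the current
overcommit setting, assuming the policy is exposed as a single integer in
/proc/sys/vm/overcommit_memory. The function names are hypothetical, and
the path is passed in as a parameter so the helper can be pointed at any
file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the overcommit policy number from `path`
 * (normally "/proc/sys/vm/overcommit_memory").
 * Returns the non-negative value read, or -1 on any failure.
 */
int read_overcommit_mode(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    int fields = fscanf(f, "%d", &mode);
    fclose(f);

    return (fields == 1 && mode >= 0) ? mode : -1;
}

/* Short label for the policy numbers documented for Linux. */
const char *overcommit_mode_name(int mode)
{
    switch (mode) {
    case 0:  return "heuristic";
    case 1:  return "always";
    case 2:  return "strict";   /* only on kernels that implement it */
    default: return "unknown";
    }
}
```

Note that a value of 0 selects heuristic overcommit handling rather than a
strict refusal of every oversized request. Whether a stricter mode was
available on the 2.4 kernels discussed here is not stated in this thread.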

(Please CC me.)

-- 
Nick Name:      A.D.F.
E-Mail:         adefacc@tin.it
Content-Type:   text/plain
--


end of thread, other threads:[~2003-10-27 21:11 UTC | newest]

Thread overview: 10+ messages
2003-10-16 11:52 2.4.23-pre VM regression? Marcelo Tosatti
2003-10-16 13:15 ` Tvrtko A. Uršulin
2003-10-16 13:24 ` Andrea Arcangeli
2003-10-16 12:29   ` Marcelo Tosatti
2003-10-16 13:35     ` Andrea Arcangeli
2003-10-19 23:21       ` Ken Moffat
2003-10-21  9:25         ` Stephan von Krawczynski
2003-10-27 21:11           ` Mike Fedyk
     [not found] <fa.jkt135h.1l0s0t@ifi.uio.no>
     [not found] ` <fa.j3l9liv.1djudhj@ifi.uio.no>
2003-10-16 23:37   ` Andreas Hartmann
  -- strict thread matches above, loose matches on Subject: below --
2003-10-18 22:09 A.D.F.
