public inbox for linux-kernel@vger.kernel.org
* lot of VM problem with 2.4.23
@ 2003-12-21  0:14 Octave
  2003-12-21  9:30 ` Peter Zaitsev
  2003-12-21 10:43 ` bert hubert
  0 siblings, 2 replies; 27+ messages in thread
From: Octave @ 2003-12-21  0:14 UTC (permalink / raw)
  To: linux-kernel

Hi,
Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.
All I can report is this kind of syslog message:
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process rateup

Mysql doesn't like 2.4.23 either.
SQL Error : 1 Can't create/write to file '/tmp/#sql2ec_1acd2_2.MYI' (Errcode: 30)

It happens on servers that need swap.

No problem with 2.4.22.

Thanks
Octave

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-21  0:14 lot of VM problem with 2.4.23 Octave
@ 2003-12-21  9:30 ` Peter Zaitsev
  2003-12-21 14:37   ` Marcelo Tosatti
  2003-12-21 10:43 ` bert hubert
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zaitsev @ 2003-12-21  9:30 UTC (permalink / raw)
  To: Octave; +Cc: linux-kernel

On Sun, 2003-12-21 at 03:14, Octave wrote:
> Hi,
> Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.
> All I can report is this kind of syslog message:
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process rateup
> 
> Mysql doesn't like 2.4.23 either.
> SQL Error : 1 Can't create/write to file '/tmp/#sql2ec_1acd2_2.MYI' (Errcode: 30)

Octave,

It looks as if your /tmp file system is read-only for some reason (Errcode 30
is EROFS), or the kernel is returning this error code for some other reason.

Don't you have any other error messages in dmesg or the logs?

-- 
Peter Zaitsev, Full-Time Developer
MySQL AB, www.mysql.com

Are you MySQL certified?  www.mysql.com/certification



* Re: lot of VM problem with 2.4.23
  2003-12-21  0:14 lot of VM problem with 2.4.23 Octave
  2003-12-21  9:30 ` Peter Zaitsev
@ 2003-12-21 10:43 ` bert hubert
  1 sibling, 0 replies; 27+ messages in thread
From: bert hubert @ 2003-12-21 10:43 UTC (permalink / raw)
  To: Octave; +Cc: linux-kernel

On Sun, Dec 21, 2003 at 01:14:22AM +0100, Octave wrote:
> Hi,
> Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.

Now might be a great time to try 2.6!

-- 
http://www.PowerDNS.com      Open source, database driven DNS Software 
http://lartc.org           Linux Advanced Routing & Traffic Control HOWTO


* Re: lot of VM problem with 2.4.23
       [not found] <14ZDV-1H1-1@gated-at.bofh.it>
@ 2003-12-21 13:53 ` Kristian
  0 siblings, 0 replies; 27+ messages in thread
From: Kristian @ 2003-12-21 13:53 UTC (permalink / raw)
  To: Octave; +Cc: linux-kernel

Octave <oles@ovh.net> wrote:
> Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.
> All I can report is this kind of syslog message:
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process rateup

That's the new aa VM. Andrea has removed the out_of_memory killer in 2.4.23. Search the archives, Marcelo has sent a patch to the list that re-enables the oom_killer.

*Kristian


* Re: lot of VM problem with 2.4.23
  2003-12-21  9:30 ` Peter Zaitsev
@ 2003-12-21 14:37   ` Marcelo Tosatti
  2003-12-21 15:03     ` Octave
  2003-12-21 18:47     ` Octave
  0 siblings, 2 replies; 27+ messages in thread
From: Marcelo Tosatti @ 2003-12-21 14:37 UTC (permalink / raw)
  To: Peter Zaitsev; +Cc: Octave, linux-kernel



On Sun, 21 Dec 2003, Peter Zaitsev wrote:

> On Sun, 2003-12-21 at 03:14, Octave wrote:
> > Hi,
> > Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.
> > All I can report is this kind of syslog message:
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process rateup

How much swap do you have in your system?

This is happening because the system is unable to free memory (you
probably ran out of swap for some reason).



* Re: lot of VM problem with 2.4.23
  2003-12-21 14:37   ` Marcelo Tosatti
@ 2003-12-21 15:03     ` Octave
  2003-12-21 15:42       ` Willy Tarreau
  2003-12-21 18:47     ` Octave
  1 sibling, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 15:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Peter Zaitsev, linux-kernel

On Sun, Dec 21, 2003 at 12:37:44PM -0200, Marcelo Tosatti wrote:
> 
> 
> On Sun, 21 Dec 2003, Peter Zaitsev wrote:
> 
> > On Sun, 2003-12-21 at 03:14, Octave wrote:
> > > Hi,
> > > Since we switched to 2.4.23 we have had a lot of crashes. There is no kernel panic.
> > > All I can report is this kind of syslog message:
> > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > VM: killing process rateup
> 
> How much swap do you have in your system?
> 
> This is happening because the system is unable to free memory (you
> probably ran out of swap for some reason).

Marcelo,

For example, this server: PIV 2.4GHz with 1GB RAM.
I think it happens when all memory is used (RAM + swap).

# free
             total       used       free     shared    buffers     cached
Mem:       1032592    1016492      16100          0      36184     759668
-/+ buffers/cache:     220640     811952
Swap:       265064     122784     142280

# ps auxw | grep -v "0  0.0"
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
mysql    26958  0.0 14.0 275000 144972 ?     S    Dec19   2:09 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
mysql    27004  0.0 14.0 275000 144972 ?     S    Dec19   2:28 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
mysql    27005  0.0 14.0 275000 144972 ?     S    Dec19   2:03 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
root      9621  0.0  0.1  6316 1860 ?        S    01:02   0:00 /usr/sbin/sshd
root      9631  0.0  0.1  2500 1352 pts/0    S    01:02   0:00 -bash
root      9683  0.0  0.1  6276 1844 ?        S    13:47   0:00 /usr/sbin/sshd
root      9707  0.0  0.1  2504 1356 pts/2    S    13:47   0:00 -bash
postfix  29728  0.0  0.1  3508 1184 ?        S    15:45   0:00 pickup -l -t fifo -u
mysql     7341  0.0 14.0 275000 144972 ?     S    15:59   0:00 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
root      7347  0.0  0.1  2504 1356 pts/2    R    15:59   0:00 -bash

Nothing here is using 1GB of RAM.

I switched some servers back to 2.4.22 and they no longer have the problem. So I think
it is a 2.4.23-only problem.

Octave
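
The `-/+ buffers/cache` row in the `free` output above can be reproduced by hand. A minimal sketch, assuming the usual procps convention that buffers and page cache count as reclaimable; the helper name `without_cache` is made up for illustration, and the numbers are copied from the "Mem:" row (in kB):

```python
# Values in kB, copied from the "Mem:" row of the `free` output above.
mem = {"total": 1032592, "used": 1016492, "free": 16100,
       "buffers": 36184, "cached": 759668}


def without_cache(m):
    """Return (used, free) with buffers and page cache treated as reclaimable."""
    reclaimable = m["buffers"] + m["cached"]
    return m["used"] - reclaimable, m["free"] + reclaimable


used, free = without_cache(mem)
print(used, free)  # 220640 811952, matching the "-/+ buffers/cache" row
```

On this reading roughly 220 MB is held by processes and about 790 MB is cache that the kernel should be able to reclaim, which is consistent with the observation that nothing is using 1GB of RAM.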


* Re: lot of VM problem with 2.4.23
  2003-12-21 15:03     ` Octave
@ 2003-12-21 15:42       ` Willy Tarreau
  2003-12-21 16:13         ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Willy Tarreau @ 2003-12-21 15:42 UTC (permalink / raw)
  To: Octave; +Cc: Marcelo Tosatti, Peter Zaitsev, linux-kernel

On Sun, Dec 21, 2003 at 04:03:12PM +0100, Octave wrote:
 
> # ps auxw | grep -v "0  0.0"
> USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
> mysql    26958  0.0 14.0 275000 144972 ?     S    Dec19   2:09 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
> mysql    27004  0.0 14.0 275000 144972 ?     S    Dec19   2:28 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
> mysql    27005  0.0 14.0 275000 144972 ?     S    Dec19   2:03 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
> root      9621  0.0  0.1  6316 1860 ?        S    01:02   0:00 /usr/sbin/sshd
> root      9631  0.0  0.1  2500 1352 pts/0    S    01:02   0:00 -bash
> root      9683  0.0  0.1  6276 1844 ?        S    13:47   0:00 /usr/sbin/sshd
> root      9707  0.0  0.1  2504 1356 pts/2    S    13:47   0:00 -bash
> postfix  29728  0.0  0.1  3508 1184 ?        S    15:45   0:00 pickup -l -t fifo -u
> mysql     7341  0.0 14.0 275000 144972 ?     S    15:59   0:00 /usr/sbin/mysqld --basedir=/ --datadir=/var/lib/mysql --user=mysql --
> root      7347  0.0  0.1  2504 1356 pts/2    R    15:59   0:00 -bash
> 
> Nothing here is using 1GB of RAM.

Octave,

one of my colleagues had a server which occasionally crashed at night with
mysql taking all the memory. I think it was with an old 2.4.18 kernel. He
finally reinstalled the whole machine and it never happened again. So
even though it works for you with 2.4.22, perhaps 2.4.23 triggers a mysql
bug which is fixed in more recent releases?

Cheers,
Willy



* Re: lot of VM problem with 2.4.23
  2003-12-21 15:42       ` Willy Tarreau
@ 2003-12-21 16:13         ` Octave
  2003-12-21 18:45           ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 16:13 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: Marcelo Tosatti, Peter Zaitsev, linux-kernel

> one of my colleagues had a server which occasionally crashed at night with
> mysql taking all the memory. I think it was with an old 2.4.18 kernel. He
> finally reinstalled the whole machine and it never happened again. So
> even though it works for you with 2.4.22, perhaps 2.4.23 triggers a mysql
> bug which is fixed in more recent releases?

Willy,
Hmm ... could be, but I don't think so. I use the latest mysql3 version and
I have this problem without mysql too.

For example, there is no mysql on this box (PIV 2.4GHz/512MB):
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process watchdog
__alloc_pages: 1-order allocation failed (gfp=0x1f0/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process watchdog

# free
             total       used       free     shared    buffers     cached
Mem:        514468     508416       6052          0      11608     205464
-/+ buffers/cache:     291344     223124
Swap:       265032      77524     187508

When I switched more than 700 servers to 2.4.23 10-15 days ago, I saw that the
servers swapped less than with 2.4.22. So I believe the VM was modified. Cool.
Great job. Now the servers are beginning to crash :/

Octave
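
The `gfp=0x1d2` and `gfp=0x1f0` values in the messages above are bit masks describing the kind of allocation that failed. A minimal sketch of decoding them, assuming the 2.4-series `__GFP_*` bit values listed below (they should be checked against include/linux/mm.h in the tree being run); `decode_gfp` is a hypothetical helper, not a kernel or library function:

```python
# Assumed 2.4-series __GFP_* bit values; verify against include/linux/mm.h.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_HIGHIO",
    0x100: "__GFP_FS",
}


def decode_gfp(mask):
    """Return the names of the flag bits set in a gfp mask."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    unknown = mask & ~sum(GFP_BITS)  # bits not covered by the table above
    if unknown:
        names.append(hex(unknown))
    return names


for mask in (0x1d2, 0x1f0):
    print(hex(mask), " | ".join(decode_gfp(mask)))
```

If those bit values are right, 0x1d2 is the combination normally used for user-space pages (it includes `__GFP_HIGHMEM` and allows waiting and I/O), while 0x1f0 is an ordinary kernel allocation. In other words, the 0-order failures are plain single-page allocations on behalf of processes, not unusual large requests.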



* Re: lot of VM problem with 2.4.23
  2003-12-21 16:13         ` Octave
@ 2003-12-21 18:45           ` Marcelo Tosatti
  2003-12-21 19:14             ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2003-12-21 18:45 UTC (permalink / raw)
  To: Octave; +Cc: Willy Tarreau, Marcelo Tosatti, Peter Zaitsev, linux-kernel



On Sun, 21 Dec 2003, Octave wrote:

> > one of my colleagues had a server which occasionally crashed at night with
> > mysql taking all the memory. I think it was with an old 2.4.18 kernel. He
> > finally reinstalled the whole machine and it never happened again. So
> > even though it works for you with 2.4.22, perhaps 2.4.23 triggers a mysql
> > bug which is fixed in more recent releases?
>
> Willy,
> Hmm ... could be, but I don't think so. I use the latest mysql3 version and
> I have this problem without mysql too.
>
> For example, there is no mysql on this box (PIV 2.4GHz/512MB):
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process watchdog
> __alloc_pages: 1-order allocation failed (gfp=0x1f0/0)
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process watchdog
>
> # free
>              total       used       free     shared    buffers     cached
> Mem:        514468     508416       6052          0      11608     205464
> -/+ buffers/cache:     291344     223124
> Swap:       265032      77524     187508
>
> When I switched more than 700 servers to 2.4.23 10-15 days ago, I saw that the
> servers swapped less than with 2.4.22. So I believe the VM was modified. Cool.
> Great job. Now the servers are beginning to crash :/

Octave,

Can you please "echo 1 > /proc/sys/vm_gfp_debug" (to turn VM debugging on)
and rerun the tests which crash the box.

Also run "vmstat 5" in the background and save that to a file.




* Re: lot of VM problem with 2.4.23
  2003-12-21 14:37   ` Marcelo Tosatti
  2003-12-21 15:03     ` Octave
@ 2003-12-21 18:47     ` Octave
  2003-12-21 18:59       ` Tomas Szepe
  1 sibling, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 18:47 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

> How much swap do you have in your system?
> 
> This is happening because the system is unable to free memory (you
> probably ran out of swap for some reason).

Marcelo,
You can run this easy script. 2.4.19 takes about 30 minutes
to kill all the processes. 2.4.23 takes about 60 minutes.

I think the server crashes when the VM kills a process like watchdog (why?).

Octave

# head -n 100000 /dev/urandom > file
# cat > full.pl
#!/usr/bin/perl

open (F,"file");@F=<F>;close(F);
for (;;) { push @F,@F; }
# chmod 755 full.pl
# for i in `seq 1 100`; do ./full.pl &  done
[1] 767
[2] 768
[3] 769
[4] 770
[5] 771
[...]


# tail -f /var/log/messages
Dec 21 18:55:32 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 21 18:55:32 stock kernel: VM: killing process full.pl
Dec 21 18:55:37 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 21 18:55:37 stock kernel: VM: killing process full.pl




* Re: lot of VM problem with 2.4.23
  2003-12-21 18:47     ` Octave
@ 2003-12-21 18:59       ` Tomas Szepe
  2003-12-21 23:43         ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Tomas Szepe @ 2003-12-21 18:59 UTC (permalink / raw)
  To: Octave; +Cc: Marcelo Tosatti, linux-kernel

On Dec-21 2003, Sun, 19:47 +0100
Octave <oles@ovh.net> wrote:

> You can run this easy script. 2.4.19 takes about 30 minutes
> to kill all the processes. 2.4.23 takes about 60 minutes.

Can you also try 2.4.24-pre1 with the OOM killer enabled?

-- 
Tomas Szepe <szepe@pinerecords.com>


* Re: lot of VM problem with 2.4.23
  2003-12-21 18:45           ` Marcelo Tosatti
@ 2003-12-21 19:14             ` Octave
  2003-12-21 20:45               ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 19:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel

> Octave,
> 
> Can you please "echo 1 > /proc/sys/vm_gfp_debug" (to turn VM debugging on)
> and rerun the tests which crash the box.
> 
> Also run "vmstat 5" in the background and save that to a file.
> 

Marcelo,

I got this kernel panic:

Dec 21 20:04:34 stock kernel: e9f01e50 c012e1d8 000001d2 08451094 00000001 e7493ea0 0000040e 00000000 
Dec 21 20:04:34 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:04:34 stock kernel:        00104025 c01225a3 e7493ea0 08451094 00000001 e9efec60 f7ec5380 c01226bf 
Dec 21 20:04:34 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
Dec 21 20:04:34 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
Dec 21 20:04:52 stock kernel: f7313e18 c012e1d8 000001d2 f770c184 00000020 f7f0ffe0 0000040e 00000000 
Dec 21 20:04:52 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:04:52 stock kernel:        00000012 c0124e80 00000007 00000020 0000004a f717e920 f717e920 c0124efe 
Dec 21 20:04:52 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c0124e80>] [<c0124efe>] [<c0126793>]
Dec 21 20:04:52 stock kernel:   [<c01226e5>] [<c01228e8>] [<c0111707>] [<c011157c>] [<c019047c>] [<c0135190>]
Dec 21 20:04:52 stock kernel:   [<c0106fa0>]
Dec 21 20:05:01 stock kernel: dba23e50 c012e1d8 000001d2 094de03c 00000001 e6461500 0000040e 00000000 
Dec 21 20:05:01 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:05:01 stock kernel:        00104025 c01225a3 e6461500 094de03c 00000001 c23eea80 c10e6f90 c01226bf 
Dec 21 20:05:01 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
Dec 21 20:05:01 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
Dec 21 20:05:42 stock kernel: c8525e50 c012e1d8 000001d2 08cb90f4 00000001 e14a2f20 0000040e 00000000 
Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:05:42 stock kernel:        00104025 c01225a3 e14a2f20 08cb90f4 00000001 df9fc420 c138dbc0 c01226bf 
Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
Dec 21 20:05:42 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
Dec 21 20:05:42 stock kernel: d70ebedc c012e1d8 000001d2 00000000 00000000 f71aa544 00000414 00000000 
Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:05:42 stock kernel:        00000000 c012595a 00000000 ffffffea f71b0ba0 00000000 f7fb29cc 00001000 
Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c012595a>] [<c0125f1a>] [<c0125d9c>]
Dec 21 20:05:42 stock kernel:   [<c0135126>] [<c0106eb7>]
Dec 21 20:05:42 stock kernel: e3e97edc c012e1d8 000001d2 00000000 00000000 f71aa544 00000414 00000000 
Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 21 20:05:42 stock kernel:        00000000 c012595a 00000000 ffffffea f6ee63c0 00000000 f7fb31f4 00001000 
Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c012595a>] [<c0125f1a>] [<c0125d9c>]
Dec 21 20:05:42 stock kernel:   [<c0135126>] [<c0106eb7>]
Warning (Oops_read): Code line not seen, dumping what data is available

Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c01225a3 <do_anonymous_page+3b/124>
Trace; c01226bf <do_no_page+33/200>
Trace; c01228e8 <handle_mm_fault+5c/b8>
Trace; c0111707 <do_page_fault+18b/4cd>
Trace; c011157c <do_page_fault+0/4cd>
Trace; c0123db0 <do_brk+130/230>
Trace; c0122d85 <sys_brk+c1/f0>
Trace; c0106fa0 <error_code+34/3c>
Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c0124e80 <page_cache_read+7c/d0>
Trace; c0124efe <read_cluster_nonblocking+2a/40>
Trace; c0126793 <filemap_nopage+13f/23c>
Trace; c01226e5 <do_no_page+59/200>
Trace; c01228e8 <handle_mm_fault+5c/b8>
Trace; c0111707 <do_page_fault+18b/4cd>
Trace; c011157c <do_page_fault+0/4cd>
Trace; c019047c <tty_read+dc/124>
Trace; c0135190 <sys_read+100/108>
Trace; c0106fa0 <error_code+34/3c>
Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c01225a3 <do_anonymous_page+3b/124>
Trace; c01226bf <do_no_page+33/200>
Trace; c01228e8 <handle_mm_fault+5c/b8>
Trace; c0111707 <do_page_fault+18b/4cd>
Trace; c011157c <do_page_fault+0/4cd>
Trace; c0123db0 <do_brk+130/230>
Trace; c0122d85 <sys_brk+c1/f0>
Trace; c0106fa0 <error_code+34/3c>
Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c01225a3 <do_anonymous_page+3b/124>
Trace; c01226bf <do_no_page+33/200>
Trace; c01228e8 <handle_mm_fault+5c/b8>
Trace; c0111707 <do_page_fault+18b/4cd>
Trace; c011157c <do_page_fault+0/4cd>
Trace; c0123db0 <do_brk+130/230>
Trace; c0122d85 <sys_brk+c1/f0>
Trace; c0106fa0 <error_code+34/3c>
Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c012595a <do_generic_file_read+356/488>
Trace; c0125f1a <generic_file_read+96/198>
Trace; c0125d9c <file_read_actor+0/e8>
Trace; c0135126 <sys_read+96/108>
Trace; c0106eb7 <system_call+2f/34>
Trace; c012e1d8 <__alloc_pages+2a4/2b0>
Trace; c012dd80 <_alloc_pages+18/1c>
Trace; c012595a <do_generic_file_read+356/488>
Trace; c0125f1a <generic_file_read+96/198>
Trace; c0125d9c <file_read_actor+0/e8>
Trace; c0135126 <sys_read+96/108>
Trace; c0106eb7 <system_call+2f/34>
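
The decoded output above is several stack dumps run together, one per failed allocation. A minimal sketch of splitting it back into separate dumps, assuming each one starts at an `__alloc_pages` entry as it does here; `split_chains` and the sample list are illustrative only. The 2.4 trace lists every stack word that looks like a kernel text address, so not every entry is necessarily a live caller.

```python
import re

# Matches ksymoops lines such as "Trace; c012e1d8 <__alloc_pages+2a4/2b0>".
TRACE_RE = re.compile(r"^Trace; ([0-9a-f]{8}) <(\w+)\+[0-9a-f]+/[0-9a-f]+>$")


def split_chains(lines, head="__alloc_pages"):
    """Group decoded trace lines into one symbol list per stack dump."""
    chains = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a decoded trace line
        sym = m.group(2)
        if sym == head or not chains:
            chains.append([])  # a new dump starts at each allocator entry
        chains[-1].append(sym)
    return chains


# A shortened sample taken from the output above.
SAMPLE = [
    "Trace; c012e1d8 <__alloc_pages+2a4/2b0>",
    "Trace; c012dd80 <_alloc_pages+18/1c>",
    "Trace; c01225a3 <do_anonymous_page+3b/124>",
    "Trace; c0122d85 <sys_brk+c1/f0>",
    "Trace; c012e1d8 <__alloc_pages+2a4/2b0>",
    "Trace; c012dd80 <_alloc_pages+18/1c>",
    "Trace; c0124e80 <page_cache_read+7c/d0>",
    "Trace; c0135190 <sys_read+100/108>",
]

for chain in split_chains(SAMPLE):
    print(" <- ".join(chain))
```

Grouped this way, the dumps fall into a few patterns: anonymous page faults under `sys_brk`, page-cache reads under `sys_read`, and a file-mapped fault via `filemap_nopage`.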





* Re: lot of VM problem with 2.4.23
  2003-12-21 19:14             ` Octave
@ 2003-12-21 20:45               ` Marcelo Tosatti
  2003-12-21 21:09                 ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2003-12-21 20:45 UTC (permalink / raw)
  To: Octave; +Cc: Marcelo Tosatti, linux-kernel, andrea

[-- Attachment #1: Type: TEXT/PLAIN, Size: 6103 bytes --]



On Sun, 21 Dec 2003, Octave wrote:

> > Octave,
> >
> > Can you please "echo 1 > /proc/sys/vm_gfp_debug" (to turn VM debugging on)
> > and rerun the tests which crash the box.
> >
> > Also run "vmstat 5" in the background and save that to a file.
> >
>
> Marcelo,
>
> I got this kernel panic:
>
> Dec 21 20:04:34 stock kernel: e9f01e50 c012e1d8 000001d2 08451094 00000001 e7493ea0 0000040e 00000000
> Dec 21 20:04:34 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:04:34 stock kernel:        00104025 c01225a3 e7493ea0 08451094 00000001 e9efec60 f7ec5380 c01226bf
> Dec 21 20:04:34 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
> Dec 21 20:04:34 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
> Dec 21 20:04:52 stock kernel: f7313e18 c012e1d8 000001d2 f770c184 00000020 f7f0ffe0 0000040e 00000000
> Dec 21 20:04:52 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:04:52 stock kernel:        00000012 c0124e80 00000007 00000020 0000004a f717e920 f717e920 c0124efe
> Dec 21 20:04:52 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c0124e80>] [<c0124efe>] [<c0126793>]
> Dec 21 20:04:52 stock kernel:   [<c01226e5>] [<c01228e8>] [<c0111707>] [<c011157c>] [<c019047c>] [<c0135190>]
> Dec 21 20:04:52 stock kernel:   [<c0106fa0>]
> Dec 21 20:05:01 stock kernel: dba23e50 c012e1d8 000001d2 094de03c 00000001 e6461500 0000040e 00000000
> Dec 21 20:05:01 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:05:01 stock kernel:        00104025 c01225a3 e6461500 094de03c 00000001 c23eea80 c10e6f90 c01226bf
> Dec 21 20:05:01 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
> Dec 21 20:05:01 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
> Dec 21 20:05:42 stock kernel: c8525e50 c012e1d8 000001d2 08cb90f4 00000001 e14a2f20 0000040e 00000000
> Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:05:42 stock kernel:        00104025 c01225a3 e14a2f20 08cb90f4 00000001 df9fc420 c138dbc0 c01226bf
> Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
> Dec 21 20:05:42 stock kernel:   [<c0111707>] [<c011157c>] [<c0123db0>] [<c0122d85>] [<c0106fa0>]
> Dec 21 20:05:42 stock kernel: d70ebedc c012e1d8 000001d2 00000000 00000000 f71aa544 00000414 00000000
> Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:05:42 stock kernel:        00000000 c012595a 00000000 ffffffea f71b0ba0 00000000 f7fb29cc 00001000
> Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c012595a>] [<c0125f1a>] [<c0125d9c>]
> Dec 21 20:05:42 stock kernel:   [<c0135126>] [<c0106eb7>]
> Dec 21 20:05:42 stock kernel: e3e97edc c012e1d8 000001d2 00000000 00000000 f71aa544 00000414 00000000
> Dec 21 20:05:42 stock kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80
> Dec 21 20:05:42 stock kernel:        00000000 c012595a 00000000 ffffffea f6ee63c0 00000000 f7fb31f4 00001000
> Dec 21 20:05:42 stock kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c012595a>] [<c0125f1a>] [<c0125d9c>]
> Dec 21 20:05:42 stock kernel:   [<c0135126>] [<c0106eb7>]
> Warning (Oops_read): Code line not seen, dumping what data is available

Octave,

This is not a kernel panic, it's the VM debugging output.

Can you please apply the attached patch on top of 2.4.23 and rerun the
test with "echo 1 > /proc/sys/vm_gfp_debug" ?

It will print out the number of available swap pages when processes get
killed.

Thanks

[-- Attachment #2: Type: TEXT/PLAIN, Size: 491 bytes --]

--- linux-2.4.23/mm/page_alloc.c.orig	2003-12-21 17:54:37.000000000 -0200
+++ linux-2.4.23/mm/page_alloc.c	2003-12-21 17:53:59.000000000 -0200
@@ -436,8 +436,10 @@
  out:
 	printk(KERN_NOTICE "__alloc_pages: %u-order allocation failed (gfp=0x%x/%i)\n",
 	       order, gfp_mask, !!(current->flags & PF_MEMALLOC));
-	if (unlikely(vm_gfp_debug))
+	if (unlikely(vm_gfp_debug)) {
+		printk(KERN_ERR "OOM: nr_swap_pages=%d", nr_swap_pages);
 		dump_stack();
+	}
 	return NULL;
 }
 


* Re: lot of VM problem with 2.4.23
  2003-12-21 20:45               ` Marcelo Tosatti
@ 2003-12-21 21:09                 ` Octave
  2003-12-21 22:23                   ` Willy Tarreau
  0 siblings, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 21:09 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, andrea

> This is not a kernel panic, it's the VM debugging output.
> 
> Can you please apply the attached patch on top of 2.4.23 and rerun the
> test with "echo 1 > /proc/sys/vm_gfp_debug" ?
> 
> It will print out the number of available swap pages when processes get
> killed.

Marcelo,

How about this ?

Dec 21 22:08:44 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0cd865e6c c012e1e8 c0262e3c 00000000 000001d2 00000000 00000001 cd863c00 
Dec 21 22:08:44 stock kernel:        0000040e 00000000 00000018 00000002 c029e9d8 c029ead4 00000000 000001d2 
Dec 21 22:08:44 stock kernel:        00000000 c012dd80 c11d3db0 c0121fe8 e0b12220 bffffb20 00000001 cd863c00 
Dec 21 22:08:44 stock kernel: Call Trace:    [__get_free_pages+4/24] [_alloc_pages+24/28] [do_wp_page+168/736] [handle_mm_fault+135/184] [do_page_fault+395/1229]
Dec 21 22:08:44 stock kernel: Call Trace:    [<c012e1e8>] [<c012dd80>] [<c0121fe8>] [<c0122913>] [<c0111707>]
Dec 21 22:08:44 stock kernel:   [do_page_fault+0/1229] [error_code+52/60]
Dec 21 22:08:44 stock kernel:   [<c011157c>] [<c0106fa0>]
Dec 21 22:08:44 stock kernel: VM: killing process watchdog
Dec 21 22:08:44 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0f7767e6c c012e1e8 c0262e3c 00000000 000001d2 00000000 00000001 f79bf6e0 
Dec 21 22:08:44 stock kernel:        0000040e 00000000 00000018 00000002 c029e9d8 c029ead4 00000000 000001d2 
Dec 21 22:08:44 stock kernel:        00000000 c012dd80 c11d3db0 c0121fe8 f7be36c0 bffffb20 00000001 f79bf6e0 
Dec 21 22:08:44 stock kernel: Call Trace:    [__get_free_pages+4/24] [_alloc_pages+24/28] [do_wp_page+168/736] [_alloc_pages+24/28] [handle_mm_fault+135/184]
Dec 21 22:08:44 stock kernel: Call Trace:    [<c012e1e8>] [<c012dd80>] [<c0121fe8>] [<c012dd80>] [<c0122913>]
Dec 21 22:08:45 stock kernel:   [do_page_fault+395/1229] [do_page_fault+0/1229] [do_fork+1719/2028] [sys_fork+20/28] [error_code+52/60]
Dec 21 22:08:45 stock kernel:   [<c0111707>] [<c011157c>] [<c01150eb>] [<c0105a44>] [<c0106fa0>]
Dec 21 22:08:45 stock kernel: VM: killing process watchdog

Octave


* Re: lot of VM problem with 2.4.23
  2003-12-21 21:09                 ` Octave
@ 2003-12-21 22:23                   ` Willy Tarreau
  2003-12-22  7:03                     ` Ville Herva
  0 siblings, 1 reply; 27+ messages in thread
From: Willy Tarreau @ 2003-12-21 22:23 UTC (permalink / raw)
  To: Octave; +Cc: Marcelo Tosatti, linux-kernel, andrea

On Sun, Dec 21, 2003 at 10:09:17PM +0100, Octave wrote:
> > This is not a kernel panic, it's the VM debugging output.
> > 
> > Can you please apply the attached patch on top of 2.4.23 and rerun the
> > test with "echo 1 > /proc/sys/vm_gfp_debug" ?
> > 
> > It will print out the number of available swap pages when processes get
> > killed.
> 
> Marcelo,
> 
> How about this ?
> 
> Dec 21 22:08:44 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0cd865e6c c012e1e8 c0262e3c 00000000 000001d2 00000000 00000001 cd863c00 

OK, so there's no available swap anymore (nr_swap_pages=0, Marcelo forgot the
'\n' in the patch). I simply think that with other kernels, you're very short
of memory, but it runs, while with this one, all memory gets consumed, and
since there's no smart oom killer, one process has to get killed.

Cheers,
Willy
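
Because the debug `printk` has no trailing newline, the swap counter in the log is fused with the first word of the stack dump ("nr_swap_pages=0cd865e6c"). A minimal sketch of recovering the counter from such lines, assuming the dump starts with an 8-hex-digit stack word as it does on this 32-bit box; `swap_pages` and both patterns are hypothetical helpers for log post-processing, not part of the patch:

```python
import re

# Counter printed on its own (what the output would look like with the "\n").
CLEAN_RE = re.compile(r"OOM: nr_swap_pages=(-?\d+)\s*$")
# Counter fused with the first 8-hex-digit stack word (assumed 32-bit layout).
FUSED_RE = re.compile(r"OOM: nr_swap_pages=(-?\d+?)([0-9a-f]{8})(?:\s|$)")


def swap_pages(line):
    """Return the nr_swap_pages value from a log line, or None if absent."""
    m = CLEAN_RE.search(line) or FUSED_RE.search(line)
    return int(m.group(1)) if m else None


line = ("Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0cd865e6c c012e1e8 "
        "c0262e3c 00000000 000001d2 00000000 00000001 cd863c00 ")
print(swap_pages(line))  # 0
```

The fused token is split by peeling the last eight characters off as the stack word, so the remaining leading digits are the counter. For both kills logged above this gives 0, i.e. swap was exhausted at the moment of the failure.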



* Re: lot of VM problem with 2.4.23
  2003-12-21 18:59       ` Tomas Szepe
@ 2003-12-21 23:43         ` Octave
  2003-12-22 11:27           ` Marcelo Tosatti
  0 siblings, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-21 23:43 UTC (permalink / raw)
  To: Tomas Szepe; +Cc: linux-kernel

On Sun, Dec 21, 2003 at 07:59:59PM +0100, Tomas Szepe wrote:
> On Dec-21 2003, Sun, 19:47 +0100
> Octave <oles@ovh.net> wrote:
> 
> > You can run this easy script. 2.4.19 takes about 30 minutes
> > to kill all the processes. 2.4.23 takes about 60 minutes.
> 
> Can you also try 2.4.24-pre1 with the OOM killer enabled?

I compiled 2.4.24-pre1 with the OOM killer. After 2 minutes of
testing, the server went down.

Octave



* Re: lot of VM problem with 2.4.23
  2003-12-21 22:23                   ` Willy Tarreau
@ 2003-12-22  7:03                     ` Ville Herva
  2003-12-22 18:35                       ` Mike Fedyk
  2003-12-22 19:09                       ` Andrea Arcangeli
  0 siblings, 2 replies; 27+ messages in thread
From: Ville Herva @ 2003-12-22  7:03 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel, andrea

On Sun, Dec 21, 2003 at 11:23:38PM +0100, you [Willy Tarreau] wrote:
> >
> > Dec 21 22:08:44 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0cd865e6c c012e1e8 c0262e3c 00000000 000001d2 00000000 00000001 cd863c00 
> 
> OK, so there's no available swap anymore (nr_swap_pages=0, Marcelo forgot the
> '\n' in the patch). I simply think that with other kernels, you're very short
> of memory, but it runs, while with this one, all memory gets consumed, and
> since there's no smart oom killer, one process has to get killed.

BTW, I have a box with 128MB ram and 512MB swap running 2.4.21-jam1 (it has
the -aa vm). I can't shut it down cleanly, because the attempt goes into an
endless loop trying to free memory when turning off swap. Nothing but
alt-sysrq-b seems to work.

I don't know if there is a kernel memory leak, since all user level
processes should be killed at that point, right? Unfortunately I didn't have
time to dig deeper, as the box is in (sort of) production.


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-21 23:43         ` Octave
@ 2003-12-22 11:27           ` Marcelo Tosatti
  2003-12-22 12:30             ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Marcelo Tosatti @ 2003-12-22 11:27 UTC (permalink / raw)
  To: Octave; +Cc: Tomas Szepe, linux-kernel, andrea



On Mon, 22 Dec 2003, Octave wrote:

> On Sun, Dec 21, 2003 at 07:59:59PM +0100, Tomas Szepe wrote:
> > On Dec-21 2003, Sun, 19:47 +0100
> > Octave <oles@ovh.net> wrote:
> >
> > > You can run this easy script. 2.4.19 takes about 30 minutes
> > > to kill all processes. 2.4.23 takes about 60 minutes.
> >
> > Can you also try 2.4.24-pre1 with the OOM killer enabled?
>
> I compiled 2.4.24-pre1 with the OOM killer. After 2 minutes of
> testing, the server is down.

Hi Octave,

What do you mean by "server is down"? Did the OOM killer kill an
application? What were the messages?

Under out of memory, 2.4.22 should also kill a process, but you say it
doesn't.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 11:27           ` Marcelo Tosatti
@ 2003-12-22 12:30             ` Octave
  2003-12-22 15:17               ` Andrea Arcangeli
  0 siblings, 1 reply; 27+ messages in thread
From: Octave @ 2003-12-22 12:30 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: linux-kernel, andrea

> Hi Octave,
> 
> What do you mean by "server is down"? Did the OOM killer kill an
> application? What were the messages?
> 
> Under out of memory, 2.4.22 should also kill a process, but you say it
> doesn't.

Marcelo,

All I have with 
- 2.4.24-pre1 is
# echo 1 > /proc/sys/vm/vm_gfp_debug        
# for i in `seq 1 100`; do ./full.pl &  done
[1] 849
[2] 850
[...]                                                                                                 
# tail -f /var/log/messages                                                                                                                    
[...]

SOFTDOG: Initiating system reboot.

LILO:

- 2.4.23 
Dec 22 13:16:30 u8668 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)                                                                            
Dec 22 13:16:30 u8668 kernel: f7465e74 c012e1d8 000001d2 00000000 00000001 c493e120 000003ef 00000000                                                           
Dec 22 13:16:30 u8668 kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80                                                    
Dec 22 13:16:30 u8668 kernel:        c1ab6bf0 c0121fe8 f5759d40 00125eac 00000001 c493e120 00000000 0003923f                                                    
Dec 22 13:16:30 u8668 kernel: Call Trace:    [raw_devices+7128/8192] [raw_devices+6016/8192] [buf.0+40/1024] [log_buf+275/16384] [device_list+2031/2032]        
Dec 22 13:16:30 u8668 kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c0121fe8>] [<c0122913>] [<c0111707>]                                                   
Dec 22 13:16:30 u8668 kernel:   [device_list+1636/2032] [dfont_unitable+352/608]                                                                                
Dec 22 13:16:30 u8668 kernel:   [<c011157c>] [<c0106fa0>]                                                                                                       
Dec 22 13:16:30 u8668 kernel: VM: killing process full.pl
Dec 22 13:16:30 u8668 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)                                                                            
Dec 22 13:16:30 u8668 kernel: f7465d40 c012e1d8 000001d2 00000000 00000000 f79d79a4 000003ef 00000000                                                           
Dec 22 13:16:30 u8668 kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80                                                    
Dec 22 13:16:34 u8668 kernel:        00000000 c012595a f7464000 c0000000 f7465eec 00000000 f7ddb788 00001000                                                    
Dec 22 13:16:34 u8668 kernel: Call Trace:    [raw_devices+7128/8192] [raw_devices+6016/8192] [log_buf+12634/16384] [log_buf+14106/16384] [log_buf+13724/16384]  
Dec 22 13:16:34 u8668 kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c012595a>] [<c0125f1a>] [<c0125d9c>]                                                   
Dec 22 13:16:34 u8668 kernel:   [read_ahead+669/1020] [blk_dev+29672/35712] [blk_dev+31175/35712] [blk_dev+32524/35712] [devpts_root_inode_operations+67/80] [dfont_unitable+119/608]
Dec 22 13:16:35 u8668 kernel:   [<c0135dbd>] [<c013d328>] [<c013d907>] [<c013de4c>] [<c0105ac3>] [<c0106eb7>]
Dec 22 13:16:35 u8668 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 22 13:16:35 u8668 kernel: f729fe14 c012e1d8 000001d2 00128000 00000001 f77ef8e0 000003ef 00000000 
Dec 22 13:16:35 u8668 kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 22 13:16:35 u8668 kernel:        00104025 c01225a3 f77ef8e0 00128000 00000001 f7317aa0 f7ce4800 c01226bf 
Dec 22 13:16:36 u8668 kernel: Call Trace:    [raw_devices+7128/8192] [raw_devices+6016/8192] [printk_buf.1+451/1024] [printk_buf.1+735/1024] [log_buf+232/16384]
Dec 22 13:16:36 u8668 kernel: Call Trace:    [<c012e1d8>] [<c012dd80>] [<c01225a3>] [<c01226bf>] [<c01228e8>]
Dec 22 13:16:36 u8668 kernel:   [pidhash+2049/4096] [log_buf+704/16384] [log_buf+3230/16384] [DAC960_MessageLevelMap+30/32] [dfont_unitable+119/608]
Dec 22 13:16:36 u8668 kernel:   [<c0121781>] [<c0122ac0>] [<c012349e>] [<c010c6be>] [<c0106eb7>]
Dec 22 13:16:36 u8668 kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Dec 22 13:16:36 u8668 kernel: f729fe14 c012e1d8 000001d2 00128000 00000001 f77ef8e0 000003ef 00000000 
Dec 22 13:16:36 u8668 kernel:        00000018 00000002 c029e998 c029ea94 00000000 000001d2 00000000 c012dd80 
Dec 22 13:16:36 u8668 kernel:        00104025 c01225a3 f77ef8e0 00128000 00000001 f7317aa0 f7ce4800 c01226bf 
Dec 22 13:16:36 u8668 kernel: Call Trace:    [raw_devices+7128/8192] [raw_devices+6016/8192] [printk_buf.1+451/1024] [printk_buf.1+735/1024] [log_buf+232/16384]

[...]

then nothing more.

- 2.4.22

# echo 1 > /proc/sys/vm/vm_gfp_debug
bash: /proc/sys/vm/vm_gfp_debug: No such file or directory
# echo 1 > /proc/sys/vm/            
bdflush            max-readahead      min-readahead      page-cluster
kswapd             max_map_count      overcommit_memory  pagetable_cache
# tail -f /var/log/messages
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 441 (named).
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 443 (named).
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 444 (named).
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 445 (named).
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 446 (named).
Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 447 (named).
Dec 22 13:28:42 u8668 kernel: Out of Memory: Killed process 750 (mysqld).
Dec 22 13:28:42 u8668 kernel: Out of Memory: Killed process 760 (mysqld).
Dec 22 13:28:42 u8668 kernel: Out of Memory: Killed process 761 (mysqld).
Dec 22 13:28:48 u8668 kernel: Out of Memory: Killed process 636 (httpd).
Dec 22 13:28:57 u8668 kernel: Out of Memory: Killed process 637 (httpd).
Dec 22 13:29:03 u8668 kernel: Out of Memory: Killed process 638 (httpd).
Dec 22 13:29:14 u8668 kernel: Out of Memory: Killed process 639 (httpd).
SOFTDOG: Initiating system reboot.

Thanks for help

Octave
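
To compare what each kernel killed, the two message formats shown above can be tallied per process name. A minimal Python sketch, assuming only the two formats that appear in these logs ("Out of Memory: Killed process PID (name)." and "VM: killing process name"); the function name `tally_kills` is hypothetical.

```python
import re
from collections import Counter

# The two kill-message formats seen in the logs above.
OOM_KILLER = re.compile(r"Out of Memory: Killed process \d+ \((.+?)\)")
VM_KILL = re.compile(r"VM: killing process (\S+)")

def tally_kills(lines):
    """Count killed processes by name across both message formats."""
    counts = Counter()
    for line in lines:
        match = OOM_KILLER.search(line) or VM_KILL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 441 (named).",
    "Dec 22 13:28:34 u8668 kernel: Out of Memory: Killed process 443 (named).",
    "Dec 22 13:28:42 u8668 kernel: Out of Memory: Killed process 750 (mysqld).",
    "Dec 22 13:16:30 u8668 kernel: VM: killing process full.pl",
    "SOFTDOG: Initiating system reboot.",
]
for name, count in sorted(tally_kills(sample).items()):
    print(name, count)
```

Run against the full 2.4.22 log, a tally like this makes it easy to see that the services (named, mysqld, httpd) are killed while the test script is not.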


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 12:30             ` Octave
@ 2003-12-22 15:17               ` Andrea Arcangeli
  2003-12-23 22:59                 ` Octave
  0 siblings, 1 reply; 27+ messages in thread
From: Andrea Arcangeli @ 2003-12-22 15:17 UTC (permalink / raw)
  To: Octave; +Cc: Marcelo Tosatti, linux-kernel

On Mon, Dec 22, 2003 at 01:30:36PM +0100, Octave wrote:
> > Hi Octave,
> > 
> > What do you mean by "server is down"? Did the OOM killer kill an
> > application? What were the messages?
> > 
> > Under out of memory, 2.4.22 should also kill a process, but you say it
> > doesn't.
> 
> Marcelo,
> 
> All I have with 
> - 2.4.24-pre1 is
> # echo 1 > /proc/sys/vm/vm_gfp_debug        
> # for i in `seq 1 100`; do ./full.pl &  done
> [1] 849
> [2] 850
> [...]                                                                                                 
> # tail -f /var/log/messages                                                                                                                    
> [...]
> 
> SOFTDOG: Initiating system reboot.

Your softdog is too strict for the workload you're running. You can't
expect low-latency scheduling behaviour with hundreds of OOMs. What you
see is perfectly normal.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22  7:03                     ` Ville Herva
@ 2003-12-22 18:35                       ` Mike Fedyk
  2003-12-22 21:12                         ` Ville Herva
  2003-12-22 19:09                       ` Andrea Arcangeli
  1 sibling, 1 reply; 27+ messages in thread
From: Mike Fedyk @ 2003-12-22 18:35 UTC (permalink / raw)
  To: Ville Herva, Marcelo Tosatti, linux-kernel, andrea

On Mon, Dec 22, 2003 at 09:03:44AM +0200, Ville Herva wrote:
> BTW, I have a box with 128MB ram and 512MB swap running 2.4.21-jam1 (it has
> the -aa vm). I can't shut it down cleanly, because the attempt goes into an
> endless loop trying to free memory when turning off swap. Nothing but
> alt-sysrq-b seems to work.
> 
> I don't know if there is a kernel memory leak, since all user level
> processes should be killed at that point, right? Unfortunately I didn't have
> time to dig deeper, as the box is in (sort of) production.

Maybe, it depends on your init scripts.  Does your distribution do a kill -9
of all processes before turning off swap?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22  7:03                     ` Ville Herva
  2003-12-22 18:35                       ` Mike Fedyk
@ 2003-12-22 19:09                       ` Andrea Arcangeli
  2003-12-22 21:18                         ` Ville Herva
  1 sibling, 1 reply; 27+ messages in thread
From: Andrea Arcangeli @ 2003-12-22 19:09 UTC (permalink / raw)
  To: Ville Herva, Marcelo Tosatti, linux-kernel

On Mon, Dec 22, 2003 at 09:03:44AM +0200, Ville Herva wrote:
> On Sun, Dec 21, 2003 at 11:23:38PM +0100, you [Willy Tarreau] wrote:
> > >
> > > Dec 21 22:08:44 stock kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > Dec 21 22:08:44 stock kernel: OOM: nr_swap_pages=0cd865e6c c012e1e8 c0262e3c 00000000 000001d2 00000000 00000001 cd863c00 
> > 
> > OK, so there's no available swap anymore (nr_swap_pages=0, Marcelo forgot the
> > '\n' in the patch). I simply think that with other kernels, you're very short
> > of memory, but it runs, while with this one, all memory gets consumed, and
> > since there's no smart oom killer, one process has to get killed.
> 
> BTW, I have a box with 128MB ram and 512MB swap running 2.4.21-jam1 (it has
> > the -aa vm). I can't shut it down cleanly, because the attempt goes into an
> > endless loop trying to free memory when turning off swap. Nothing but
> alt-sysrq-b seems to work.
> 
> I don't know if there is a kernel memory leak, since all user level
> processes should be killed at that point, right? Unfortunately I didn't have
> time to dig deeper, as the box is in (sort of) production.

If this is a leak, I doubt it was introduced recently: the only
swap-accounting-related change was in the shm layer, and it was supposed
to be a race fix. You may want to check whether you have some shm allocated
(in /dev/shm, or as shown by ipcs) while the machine reboots.
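
One way to do the /dev/shm half of that check is to total up what is sitting in the tmpfs mount. A minimal Python sketch, assuming tmpfs is mounted at /dev/shm; the function name `shm_file_usage` is hypothetical. It reports apparent file sizes (`st_size`), which can differ from the pages actually held for sparse files, and it does not cover System V segments, which `ipcs -m` lists.

```python
import os

def shm_file_usage(root="/dev/shm"):
    """Return (total_bytes, [(path, bytes), ...]) for entries under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.lstat(path).st_size
            except OSError:
                continue  # entry vanished while we were scanning
            entries.append((path, size))
    # Largest first, so the main consumers are at the top.
    entries.sort(key=lambda item: item[1], reverse=True)
    return sum(size for _path, size in entries), entries

if __name__ == "__main__":
    total, entries = shm_file_usage()
    print("total bytes in /dev/shm:", total)
    for path, size in entries[:10]:
        print(size, path)
```

If the directory does not exist, `os.walk` yields nothing and the function returns a total of 0 with an empty list.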

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 18:35                       ` Mike Fedyk
@ 2003-12-22 21:12                         ` Ville Herva
  2003-12-22 22:52                           ` Gene Heskett
  0 siblings, 1 reply; 27+ messages in thread
From: Ville Herva @ 2003-12-22 21:12 UTC (permalink / raw)
  To: Marcelo Tosatti, linux-kernel, andrea

On Mon, Dec 22, 2003 at 10:35:54AM -0800, you [Mike Fedyk] wrote:
> > 
> > I don't know if there is a kernel memory leak, since all user level
> > processes should be killed at that point, right? Unfortunately I didn't have
> > time to dig deeper, as the box is in (sort of) production.
> 
> Maybe, it depends on your init scripts.  Does your distribution do a kill -9
> of all processes before turning off swap?

(It's a 7.0 Red Hat).

It does 
   runcmd "Sending all processes the KILL signal..."  /sbin/killall5 -9
before
   [ -n "$SWAPS" ] && runcmd "Turning off swap: " swapoff $SWAPS
in /etc/rc6.d/S01reboot and I've seen the "Sending all processes the KILL
signal..." message appear before the memory freeing loop starts rolling.

 
-- v --

v@iki.fi
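
swapoff has to bring every page still held in swap back into RAM, so it cannot finish while more is swapped out than memory can absorb. A rough Python sketch of a pre-check, assuming the "Name: value kB" lines of /proc/meminfo; the helper names are hypothetical, MemFree + Buffers + Cached is only a crude estimate of what can be freed, and the sample numbers are made up for illustration (they are not measurements from the box discussed here).

```python
import re

def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of kB values."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\w+):\s+(\d+)\s+kB\s*$", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def swapoff_headroom_kb(fields):
    """Estimated free-able RAM minus swap in use; negative means trouble."""
    swap_used = fields["SwapTotal"] - fields["SwapFree"]
    reclaimable = (fields["MemFree"] + fields.get("Buffers", 0)
                   + fields.get("Cached", 0))
    return reclaimable - swap_used

# Made-up sample in /proc/meminfo format, for illustration only.
SAMPLE = """\
MemTotal:       126000 kB
MemFree:          4000 kB
Buffers:          2000 kB
Cached:          10000 kB
SwapTotal:      524000 kB
SwapFree:       400000 kB
"""
print(swapoff_headroom_kb(parse_meminfo(SAMPLE)))  # -108000
```

On a live system the text would come from reading /proc/meminfo just before the swapoff step; a negative result means something still holds more swapped-out memory than the estimate says RAM can take back.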

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 19:09                       ` Andrea Arcangeli
@ 2003-12-22 21:18                         ` Ville Herva
  0 siblings, 0 replies; 27+ messages in thread
From: Ville Herva @ 2003-12-22 21:18 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: Marcelo Tosatti, linux-kernel

On Mon, Dec 22, 2003 at 08:09:43PM +0100, you [Andrea Arcangeli] wrote:
> On Mon, Dec 22, 2003 at 09:03:44AM +0200, Ville Herva wrote:
> > 
> > BTW, I have a box with 128MB ram and 512MB swap running 2.4.21-jam1 (it has
> > the -aa vm). I can't shut it down cleanly, because the attempt goes into an
> > endless loop trying to free memory when turning off swap. Nothing but
> > alt-sysrq-b seems to work.
> > 
> > I don't know if there is a kernel memory leak, since all user level
> > processes should be killed at that point, right? Unfortunately I didn't have
> > time to dig deeper, as the box is in (sort of) production.
> 
> If this is a leak, I doubt it was introduced recently: the only
> swap-accounting-related change was in the shm layer, and it was supposed
> to be a race fix. You may want to check whether you have some shm allocated
> (in /dev/shm, or as shown by ipcs) while the machine reboots.

I'll check that the next time I reboot. 

Unfortunately the box is in (semi-)production so I can't reboot it at will.

 
-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 21:12                         ` Ville Herva
@ 2003-12-22 22:52                           ` Gene Heskett
  2003-12-23  8:51                             ` Ville Herva
  0 siblings, 1 reply; 27+ messages in thread
From: Gene Heskett @ 2003-12-22 22:52 UTC (permalink / raw)
  To: Ville Herva, Marcelo Tosatti, linux-kernel, andrea

On Monday 22 December 2003 16:12, Ville Herva wrote:
>On Mon, Dec 22, 2003 at 10:35:54AM -0800, you [Mike Fedyk] wrote:
>> > I don't know if there is a kernel memory leak, since all user
>> > level processes should be killed at that point, right?
>> > Unfortunately I didn't have time to dig deeper, as the box is in
>> > (sort of) production.
>>
>> Maybe, it depends on your init scripts.  Does your distribution do
>> a kill -9 of all processes before turning off swap?
>
>(It's a 7.0 Red Hat).
>
>It does
>   runcmd "Sending all processes the KILL signal..."  /sbin/killall5
> -9 before
>   [ -n "$SWAPS" ] && runcmd "Turning off swap: " swapoff $SWAPS
>in /etc/rc6.d/S01reboot and I've seen the "Sending all processes the
> KILL signal..." message appear before the memory freeing loop
> starts rolling.

If it's a pristine rh7.0 install, that version of bind has a notorious
rootkit hole.  So I wonder if the machine has been kitted by some
script kiddie who's good at covering his tracks but not the rest of
the housekeeping.  Do a google search for "chkrootkit", and install 
it to get a better view of that possibility.

An OS upgrade does seem to be in order, lots has happened since 7.0.  
7.3 with an updated kernel is my firewall, uptime is about 95 days 
now.  It was shut down while I was out of state for a couple of 
months last fall.  RH8.0 on this machine, with the real KDE-3.1.1a, 
but since RH is gonna force a switch, debian may be the next thing 
installed here.  Or maybe Mepis since he's only 50 miles up the 
interstate from me. :)


>-- v --
>
>v@iki.fi

-- 
Cheers, Gene
AMD K6-III@500mhz 320M
Athlon1600XP@1400mhz  512M
99.22% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attornies please note, additions to this message
by Gene Heskett are:
Copyright 2003 by Maurice Eugene Heskett, all rights reserved.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 22:52                           ` Gene Heskett
@ 2003-12-23  8:51                             ` Ville Herva
  0 siblings, 0 replies; 27+ messages in thread
From: Ville Herva @ 2003-12-23  8:51 UTC (permalink / raw)
  To: Gene Heskett; +Cc: Marcelo Tosatti, linux-kernel, andrea

On Mon, Dec 22, 2003 at 05:52:01PM -0500, you [Gene Heskett] wrote:
> >
> >(It's a 7.0 Red Hat).
> >
> >It does
> >   runcmd "Sending all processes the KILL signal..."  /sbin/killall5
> > -9 before
> >   [ -n "$SWAPS" ] && runcmd "Turning off swap: " swapoff $SWAPS
> >in /etc/rc6.d/S01reboot and I've seen the "Sending all processes the
> > KILL signal..." message appear before the memory freeing loop
> > starts rolling.
> 
> If it's a pristine rh7.0 install, that version of bind has a notorious
> rootkit hole.  

Pristine - hell no. None of my installs are pristine :).

Bind - hell no. I may be^W^Wam stupid, but does *anyone* put boxes in
production without disabling services first?

> So I wonder if the machine has been kitted by some 
> script kiddie who's good at covering his tracks but not the rest of
> the housekeeping.

Uh, it may be rooted all right, but I seriously doubt that is the cause of
this symptom.

> An OS upgrade does seem to be in order, lots has happened since 7.0.  

I know that, but I'm not so short of interesting admin work that I would
run around upgrading every box lying around in a corner each time a distro
upgrade comes out. The box does what it is supposed to do (a miracle, that,
considering it's a 200MHz/64MB screamer), it's behind a fw, and most services
are disabled. And it's actually being phased out of production anyway.

Sure, if I had endless time on my hands, I'd upgrade the distro, but right
now, I'll have to settle for just keeping the crucial services up-to-date.

> 7.3 with an updated kernel is my firewall, uptime is about 95 days 
> now. 

No problem getting good uptimes with this kernel (2.4.21-jam1). I only
had to reboot it to install the do_brk patch.

> It was shut down while I was out of state for a couple of 
> months last fall.

The longest uptimes I've got so far are with 2.0 and 2.2. And I still have a
number of RH 6.2 boxes running. I see no fundamental problem with an old
distro, as long as you know the downsides (keep the services up-to-date, not
every application supports files over 2GB, etc.).


-- v --

v@iki.fi

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: lot of VM problem with 2.4.23
  2003-12-22 15:17               ` Andrea Arcangeli
@ 2003-12-23 22:59                 ` Octave
  0 siblings, 0 replies; 27+ messages in thread
From: Octave @ 2003-12-23 22:59 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

> Your softdog is too strict for the workload you're running. You can't
> expect low-latency scheduling behaviour with hundreds of OOMs. What you
> see is perfectly normal.

You are right. Without the watchdog I can see the OOM killer at work. It takes
1 hour before all the bad processes are killed, since the OOM killer first kills
the processes with shared memory (named, httpd, mysql etc.), then the bad
processes. I think it's the right approach for general use.

Thanks
Octave


^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2003-12-23 23:01 UTC | newest]

Thread overview: 27+ messages
2003-12-21  0:14 lot of VM problem with 2.4.23 Octave
2003-12-21  9:30 ` Peter Zaitsev
2003-12-21 14:37   ` Marcelo Tosatti
2003-12-21 15:03     ` Octave
2003-12-21 15:42       ` Willy Tarreau
2003-12-21 16:13         ` Octave
2003-12-21 18:45           ` Marcelo Tosatti
2003-12-21 19:14             ` Octave
2003-12-21 20:45               ` Marcelo Tosatti
2003-12-21 21:09                 ` Octave
2003-12-21 22:23                   ` Willy Tarreau
2003-12-22  7:03                     ` Ville Herva
2003-12-22 18:35                       ` Mike Fedyk
2003-12-22 21:12                         ` Ville Herva
2003-12-22 22:52                           ` Gene Heskett
2003-12-23  8:51                             ` Ville Herva
2003-12-22 19:09                       ` Andrea Arcangeli
2003-12-22 21:18                         ` Ville Herva
2003-12-21 18:47     ` Octave
2003-12-21 18:59       ` Tomas Szepe
2003-12-21 23:43         ` Octave
2003-12-22 11:27           ` Marcelo Tosatti
2003-12-22 12:30             ` Octave
2003-12-22 15:17               ` Andrea Arcangeli
2003-12-23 22:59                 ` Octave
2003-12-21 10:43 ` bert hubert
     [not found] <14ZDV-1H1-1@gated-at.bofh.it>
2003-12-21 13:53 ` Kristian
