public inbox for linux-kernel@vger.kernel.org
* OOM in 2.6.19-rc*
@ 2006-11-11 16:40 Christian Kujau
  2006-11-11 17:23 ` Benoit Boissinot
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Christian Kujau @ 2006-11-11 16:40 UTC (permalink / raw)
  To: linux-kernel

Hello,

a few days ago I upgraded my desktop machine (x86_64) to ubuntu/edgy, 
thus completely changing the userland. Since I'm using kernel.org 
kernels, I upgraded to a current kernel as well (2.6.19-rc4-git from Nov 
4 and 2.6.19-rc4-mm2). Now, while working under X11, probably reading 
email, all of a sudden the machine was not responsive any more and the 
disk was spinning like wild. The desktop applet showed all swap being 
used up, then the display froze too, and ~5 min later the machine came 
back with the gnome-login screen: it had not rebooted, but it had run OOM 
and several apps got killed.

OK, must be some application leaking memory, I thought; that's what 
happens with new userland versions. Looking at the syslog, "nautilus" 
(gnome filemanager) invoked the oom killer. OK, but the scenario 
repeated the next day, early in the morning when I was not even on the
box, and again the log said it was nautilus.
In the last few days other applications seem to have invoked the OOM 
killer as well, and I wonder whether each one of them is really to blame 
for leaking memory or whether something else is responsible for the 
killings. Here's log output, listing for each log file the first 
application that triggered the OOM killer:

# for i in /var/log/messages*; do (zgrep "invoked" "$i" | head -1 ); done
Nov 11 08:04:16 prinz64 kernel: [104237.902269] firefox-bin invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Nov 10 07:59:34 prinz64 kernel: [64627.382818] Xorg invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Nov  9 07:59:22 prinz64 kernel: [25047.487534] rpc.idmapd invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=-17
Nov  8 17:33:59 prinz64 kernel: [  919.954547] beep-media-play invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Nov  7 18:55:23 prinz64 kernel: [  842.590646] firefox-bin invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Nov  5 07:55:34 prinz64 kernel: [18128.545690] nautilus invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
Nov  4 17:31:23 prinz64 kernel: [  688.904652] nautilus invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0
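The gfp_mask in these lines describes the allocation that could not be satisfied, and order=0 means a single page. A minimal sketch of a decoder follows; the flag table is an assumption copied from 2.6.19-era include/linux/gfp.h (later kernels renumbered these bits), and `decode_gfp` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Decode a gfp_mask value from an "invoked oom-killer" log line into flag names.
# ASSUMPTION: bit values as in 2.6.19-era include/linux/gfp.h; check the
# gfp.h of the kernel you actually run before trusting the output.
decode_gfp() {
    local mask=$(( $1 ))   # accepts hex such as 0x201d2
    local out="" pair name bit
    # zone modifiers (low bits) first, then action modifiers
    for pair in DMA:0x01 HIGHMEM:0x02 DMA32:0x04 \
                WAIT:0x10 HIGH:0x20 IO:0x40 FS:0x80 COLD:0x100 \
                NOWARN:0x200 REPEAT:0x400 NOFAIL:0x800 NORETRY:0x1000 \
                NO_GROW:0x2000 COMP:0x4000 ZERO:0x8000 \
                NOMEMALLOC:0x10000 HARDWALL:0x20000 THISNODE:0x40000; do
        name=${pair%%:*}
        bit=$(( ${pair#*:} ))
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out|}__GFP_$name"
            mask=$(( mask & ~bit ))   # clear the bit we just named
        fi
    done
    # any bits not covered by the table are printed as raw hex
    if [ "$mask" -ne 0 ]; then
        out="${out:+$out|}$(printf '0x%x' "$mask")"
    fi
    echo "$out"
}
```

For example, `decode_gfp 0x201d2` prints `__GFP_HIGHMEM|__GFP_WAIT|__GFP_IO|__GFP_FS|__GFP_COLD|__GFP_HARDWALL`. If the assumed table is right, that is GFP_HIGHUSER plus __GFP_COLD, an ordinary single-page user or page-cache allocation, which suggests the "invoking" task may simply have been the one that asked for a page when none was left.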

The kernels running when these were happening:
Nov  4 - 2.6.19-rc2
Nov  5 - 2.6.19-rc2
Nov  7 - 2.6.19-rc4-mm2
Nov  8 - 2.6.19-rc4
Nov  9 - 2.6.19-rc4
Nov 10 - 2.6.19-rc4
Nov 11 - 2.6.19-rc4

Because killing these applications does not seem to free up memory, 
plenty of other applications got killed shortly afterwards. Full logs
and .config can be found here: http://nerdbynature.de/bits/2.6.19-rc4/
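The loop above keeps only the first invoker per log file. A small sketch that tallies every "invoked oom-killer" line instead might help show whether one program dominates; `count_oom_invokers` is a hypothetical helper name, and it assumes each line carries the bracketed printk timestamp exactly as in the lines quoted above.

```shell
#!/usr/bin/env bash
# Read syslog text on stdin and count how often each process name
# appears as the task that invoked the OOM killer, most frequent first.
# ASSUMPTION: lines look like
#   ... kernel: [104237.902269] firefox-bin invoked oom-killer: gfp_mask=...
count_oom_invokers() {
    grep "invoked oom-killer" \
        | sed -n 's/.*\] \(.*\) invoked oom-killer:.*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Usage, including rotated and compressed logs: `zcat -f /var/log/messages* | count_oom_invokers`. Reading from stdin keeps the function independent of how the logs are stored.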

I did notice anacron running just before the killings. But even *if* 
anacron runs a memory-leaking program, shouldn't the OOM killer kill just 
that app, rather than the (probably) innocent ones?

Thanks for your thoughts,
Christian.
-- 
BOFH excuse #194:

We only support a 1200 bps connection.

* Re: OOM in 2.6.19-rc*
  2006-11-11 16:40 OOM in 2.6.19-rc* Christian Kujau
@ 2006-11-11 17:23 ` Benoit Boissinot
  2006-11-11 18:31   ` Christian Kujau
  2006-11-11 18:19 ` Adrian Bunk
  2006-11-12  9:15 ` Arjan van de Ven
  2 siblings, 1 reply; 10+ messages in thread
From: Benoit Boissinot @ 2006-11-11 17:23 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On 11/11/06, Christian Kujau <evil@g-house.de> wrote:
> Hello,
>
> a few days ago I upgraded my desktop machine (x86_64) to ubuntu/edgy
> thus completely changing the userland. Since I'm using kernel.org
> kernels I upgraded to a current kernel as well (2.6.19-rc4-git from Nov
> 4 and 2.6.19-rc4-mm2). Now, while working under X11, probably reading
> email, all of a sudden the machine was not responsive any more and the
> disk was spinning like wild. The desktop applet showed all swap being
> used up then the display froze too and ~5 min later the machine came
> back with the gnome-login screen: it had not rebooted but ran OOM and
> several apps got killed.
>
Just a thought: do you have swap activated? (There is a bug in edgy
where the swap isn't mounted.)

regards,

Benoit

* Re: OOM in 2.6.19-rc*
  2006-11-11 16:40 OOM in 2.6.19-rc* Christian Kujau
  2006-11-11 17:23 ` Benoit Boissinot
@ 2006-11-11 18:19 ` Adrian Bunk
  2006-11-11 18:38   ` Christian Kujau
  2006-11-12  9:15 ` Arjan van de Ven
  2 siblings, 1 reply; 10+ messages in thread
From: Adrian Bunk @ 2006-11-11 18:19 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On Sat, Nov 11, 2006 at 04:40:17PM +0000, Christian Kujau wrote:
> Hello,
> 
> a few days ago I upgraded my desktop machine (x86_64) to ubuntu/edgy 
> thus completely changing the userland. Since I'm using kernel.org 
> kernels I upgraded to a current kernel as well (2.6.19-rc4-git from Nov 
> 4 and 2.6.19-rc4-mm2). Now, while working under X11, probably reading 
> email, all of a sudden the machine was not responsive any more and the 
> disk was spinning like wild. The desktop applet showed all swap being 
> used up then the display froze too and ~5 min later the machine came 
> back with the gnome-login screen: it had not rebooted but ran OOM and 
> several apps got killed.
>...

Can you test whether an older kernel (preferably the one that worked 
before) shows the same problem?

This way you might know whether it's a kernel problem or a distribution 
problem.

> Thanks for your thoughts,
> Christian.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


* Re: OOM in 2.6.19-rc*
  2006-11-11 17:23 ` Benoit Boissinot
@ 2006-11-11 18:31   ` Christian Kujau
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2006-11-11 18:31 UTC (permalink / raw)
  To: Benoit Boissinot; +Cc: linux-kernel

On Sat, 11 Nov 2006, Benoit Boissinot wrote:
> On 11/11/06, Christian Kujau <evil@g-house.de> wrote:
> Just a thought: do you have swap activated? (There is a bug in edgy
> where the swap isn't mounted.)

ah, forgot to mention this: yes, swap was activated (~300 MB swapfile, 
box has 1GB RAM), but I disabled it after the 2nd incident because I 
thought the machine would recover faster from OOM with no swap to fill 
up... didn't help much though :(
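Whether swap is actually active (Benoit's question) can be read from the SwapTotal and SwapFree fields of /proc/meminfo, both in kB. A minimal sketch; `swap_usage` is a hypothetical helper, and the optional file argument only exists so it can be tried on a saved copy.

```shell
#!/usr/bin/env bash
# Report configured and used swap from a meminfo-style file
# (defaults to the live /proc/meminfo; values there are in kB).
swap_usage() {
    local f=${1:-/proc/meminfo}
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free  = $2 }
         END {
             # a missing or zero SwapTotal means no swap area is enabled
             if (total == 0) { print "no swap active"; exit }
             printf "swap: %d kB total, %d kB used\n", total, total - free
         }' "$f"
}
```

`cat /proc/swaps` lists the individual active swap areas (such as the swapfile) if more detail is needed.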

Thanks,
Christian.
-- 
BOFH excuse #142:

new guy cross-connected phone lines with ac power bus.

* Re: OOM in 2.6.19-rc*
  2006-11-11 18:19 ` Adrian Bunk
@ 2006-11-11 18:38   ` Christian Kujau
  2006-11-11 18:53     ` Adrian Bunk
  2006-11-11 20:38     ` Tim Schmielau
  0 siblings, 2 replies; 10+ messages in thread
From: Christian Kujau @ 2006-11-11 18:38 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: linux-kernel

On Sat, 11 Nov 2006, Adrian Bunk wrote:
> Can you test whether an older kernel (preferably the one that worked
> before) shows the same problem?

I could try 2.6.17...but currently I don't know how to reproduce the OOM 
condition - so I'd have to wait 24h until *something* happens and the 
OOM killer kicks in.

> This way you might know whether it's a kernel problem or a distribution
> problem.

I think I'm more interested as to why the OOM killer seems to kill 
innocent apps at random. I can imagine that it's not easy for the kernel 
to tell which userland-application is using up too much memory. Hm, 
egrep -r "OOM|ut of memory" Documentation/    does not reveal much :(

Thanks,
Christian.
-- 
BOFH excuse #362:

Plasma conduit breach

* Re: OOM in 2.6.19-rc*
  2006-11-11 18:38   ` Christian Kujau
@ 2006-11-11 18:53     ` Adrian Bunk
  2006-11-11 19:12       ` Christian Kujau
  2006-11-11 20:38     ` Tim Schmielau
  1 sibling, 1 reply; 10+ messages in thread
From: Adrian Bunk @ 2006-11-11 18:53 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On Sat, Nov 11, 2006 at 06:38:05PM +0000, Christian Kujau wrote:
> On Sat, 11 Nov 2006, Adrian Bunk wrote:
> >Can you test whether an older kernel (preferably the one that worked
> >before) shows the same problem?
> 
> I could try 2.6.17...but currently I don't know how to reproduce the OOM 
> condition - so I'd have to wait 24h until *something* happens and the 
> OOM killer kicks in.

If you want to know what caused your problem, this is the logical first 
step.

> >This way you might know whether it's a kernel problem or a distribution
> >problem.
> 
> I think I'm more interested as to why the OOM killer seems to kill 
> innocent apps at random. I can imagine that it's not easy for the kernel 
> to tell which userland-application is using up too much memory. Hm, 
> egrep -r "OOM|ut of memory" Documentation/    does not reveal much :(

mm/oom_kill.c is well documented.
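For readers following that pointer, the scoring heuristic described in the comments of the 2.6.19-era badness() function can be paraphrased roughly as below. This is a simplified, unofficial sketch, not the kernel's code: it omits the PF_SWAPOFF, CAP_SYS_RAWIO and cpuset adjustments, takes CPU and run time in seconds (the kernel shifts jiffies and uptime instead), and `badness_sketch`/`isqrt` are hypothetical names. It does show two things that can make victims look random: the base is the virtual size (total_vm, in pages), not resident memory, and half of each child's size is charged to its parent.

```shell
#!/usr/bin/env bash
# Integer square root by simple search (fine for the small values used here).
isqrt() {
    local n=$1 x=0
    while [ $(( (x + 1) * (x + 1) )) -le "$n" ]; do
        x=$(( x + 1 ))
    done
    echo "$x"
}

# usage: badness_sketch VM_PAGES CPU_SECS RUN_SECS NICE IS_ROOT OOMKILLADJ [CHILD_VM_PAGES...]
badness_sketch() {
    local points=$1 cpu=$(( $2 >> 3 )) run=$(( $3 >> 10 ))
    local nice=$4 root=$5 adj=$6 s child
    shift 6
    # ASSUMPTION: OOM_DISABLE (-17) tasks are skipped by the victim
    # selection entirely; this sketch reports them as 0
    if [ "$adj" -eq -17 ]; then echo 0; return; fi
    # half the VM size of each child with its own mm is charged to the parent
    for child in "$@"; do
        points=$(( points + child / 2 + 1 ))
    done
    # long CPU time and long run time both lower the score
    s=$(isqrt "$cpu")
    if [ "$s" -gt 0 ]; then points=$(( points / s )); fi
    s=$(isqrt "$(isqrt "$run")")
    if [ "$s" -gt 0 ]; then points=$(( points / s )); fi
    # niced tasks count double; root / CAP_SYS_ADMIN tasks a quarter
    if [ "$nice" -gt 0 ]; then points=$(( points * 2 )); fi
    if [ "$root" -ne 0 ]; then points=$(( points / 4 )); fi
    # oomkilladj shifts the score left (positive) or right (negative)
    if [ "$adj" -gt 0 ]; then
        points=$(( points << adj ))
    elif [ "$adj" -lt 0 ]; then
        points=$(( points >> -adj ))
    fi
    echo "$points"
}
```

As a worked example, `badness_sketch 100000 800 16384 0 0 0 50000` gives 100000 + 25001 = 125001, divided by isqrt(100) = 10 and then by isqrt(isqrt(16)) = 2, i.e. 6250.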

> Thanks,
> Christian.

cu
Adrian

* Re: OOM in 2.6.19-rc*
  2006-11-11 18:53     ` Adrian Bunk
@ 2006-11-11 19:12       ` Christian Kujau
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2006-11-11 19:12 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: linux-kernel

On Sat, 11 Nov 2006, Adrian Bunk wrote:
> mm/oom_kill.c is well documented.

Thanks, I'll take a look.

Christian.
-- 
BOFH excuse #353:

Second-system effect.

* Re: OOM in 2.6.19-rc*
  2006-11-11 18:38   ` Christian Kujau
  2006-11-11 18:53     ` Adrian Bunk
@ 2006-11-11 20:38     ` Tim Schmielau
  1 sibling, 0 replies; 10+ messages in thread
From: Tim Schmielau @ 2006-11-11 20:38 UTC (permalink / raw)
  To: Christian Kujau; +Cc: Adrian Bunk, linux-kernel

On Sat, 11 Nov 2006, Christian Kujau wrote:

> I think I'm more interested as to why the OOM killer seems to kill innocent
> apps at random. I can imagine that it's not easy for the kernel to tell which
> userland-application is using up too much memory. Hm, egrep -r "OOM|ut of
> memory" Documentation/    does not reveal much :(

A look at /proc/*/oom_score might shed some light on the "at random" part.
E.g., after running
  for job in /proc/[0-9]* ; do
    echo -e "$(cat $job/oom_score 2>/dev/null) \t $job \t $(head -c50 $job/cmdline 2>/dev/null | tr '\0' ' ')"
  done | sort -n
the last process listed is the one the kernel currently considers the 
biggest memory hog (of course, this still does not tell _why_).
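A variant of that loop can also print each process's oom_adj, the per-process value that shows up as "oomkilladj" in the log lines above (e.g. -17 for rpc.idmapd). This sketch assumes the 2.6-era /proc/<pid>/oom_adj file (later kernels deprecated it in favour of oom_score_adj); `oom_table` and its optional root-directory argument are hypothetical, the latter only so it can be tried on a fake tree.

```shell
#!/usr/bin/env bash
# Print "oom_score <TAB> oom_adj <TAB> pid <TAB> cmdline" for every process,
# sorted so that the most likely OOM victim comes last.
oom_table() {
    local root=${1:-/proc} dir score adj cmd
    for dir in "$root"/[0-9]*; do
        # a process may exit between the glob and the reads; skip it then
        score=$(cat "$dir/oom_score" 2>/dev/null) || continue
        adj=$(cat "$dir/oom_adj" 2>/dev/null) || adj="?"
        # cmdline is NUL-separated; turn the NULs into spaces
        cmd=$(head -c 50 "$dir/cmdline" 2>/dev/null | tr '\0' ' ')
        printf '%s\t%s\t%s\t%s\n' "$score" "$adj" "${dir##*/}" "$cmd"
    done | sort -n
}
```

`oom_table | tail -5` then shows the five most likely victims at that moment, together with any oom_adj that was set for them.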

Tim

* Re: OOM in 2.6.19-rc*
  2006-11-11 16:40 OOM in 2.6.19-rc* Christian Kujau
  2006-11-11 17:23 ` Benoit Boissinot
  2006-11-11 18:19 ` Adrian Bunk
@ 2006-11-12  9:15 ` Arjan van de Ven
  2006-11-13  0:56   ` Christian Kujau
  2 siblings, 1 reply; 10+ messages in thread
From: Arjan van de Ven @ 2006-11-12  9:15 UTC (permalink / raw)
  To: Christian Kujau; +Cc: linux-kernel

On Sat, 2006-11-11 at 16:40 +0000, Christian Kujau wrote:
> Hello,
> 
> a few days ago I upgraded my desktop machine (x86_64) to ubuntu/edgy 
> thus completely changing the userland. Since I'm using kernel.org 
> kernels I upgraded to a current kernel as well (2.6.19-rc4-git from Nov 
> 4 and 2.6.19-rc4-mm2).

Which modules/drivers do you use? Maybe there's a less commonly used one
in there that we could look at.
(The assumption is that all commonly used ones would have shown up
en masse on lkml if there was a big leak in them; rarer ones less so.)
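If a driver were leaking, the memory would most likely be held by the kernel rather than by any process, so it would not show up in oom_score and killing applications could not free it. One way to look for that, assuming the leak is in slab allocations, is to rank the caches in /proc/slabinfo by approximate size. A minimal sketch; `top_slabs` is a hypothetical helper, and the "version 2.1" column layout (name, active_objs, num_objs, objsize, ...) is an assumption to verify against your kernel.

```shell
#!/usr/bin/env bash
# Print the N largest slab caches as "bytes name", largest first.
# Approximate bytes = num_objs * objsize (columns 3 and 4).
# Reading the live /proc/slabinfo usually requires root.
top_slabs() {
    local file=${1:-/proc/slabinfo} n=${2:-10}
    # skip the "slabinfo - version" line and the "# name ..." column header
    awk '/^slabinfo/ || /^#/ { next }
         { printf "%.0f %s\n", $3 * $4, $1 }' "$file" \
        | sort -rn | head -n "$n"
}
```

Running `top_slabs` once now and again a few hours later and comparing the two lists may point at a cache that only ever grows; its name could hint at the subsystem or driver involved.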


* Re: OOM in 2.6.19-rc*
  2006-11-12  9:15 ` Arjan van de Ven
@ 2006-11-13  0:56   ` Christian Kujau
  0 siblings, 0 replies; 10+ messages in thread
From: Christian Kujau @ 2006-11-13  0:56 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: linux-kernel

Oh dear, Murphy strikes again... or was it Heisenberg? Since I posted to 
lkml, the daily OOM killings have gone away. I'm running 2.6.19-rc5-mm1 
right now and there was no OOM situation today... phew.

On Sun, 12 Nov 2006, Arjan van de Ven wrote:
> Which modules/drivers do you use? Maybe there's a less commonly used one
> in there that we could look at.

Thanks for your reply (all your replies!), FWIW:

# lsmod
Module                  Size  Used by
dm_crypt               12304  0
dm_mod                 55280  1 dm_crypt
powernow_k8            10584  0
freq_table              4168  1 powernow_k8
w83627hf               28944  0
hwmon_vid               3648  1 w83627hf
eeprom                  6992  0
i2c_dev                 7368  0
i2c_isa                 5184  1 w83627hf
ide_cd                 39520  0
cdrom                  37160  1 ide_cd
ide_disk               14272  0
ata_generic             6468  0
libata                106920  1 ata_generic
qla2xxx               154668  0
firmware_class          9216  1 qla2xxx
snd_intel8x0           32872  2
snd_ac97_codec        108440  1 snd_intel8x0
snd_ac97_bus            2816  1 snd_ac97_codec
ohci1394               33032  0
ieee1394               93168  1 ohci1394
snd_pcm_oss            41440  0
snd_mixer_oss          16512  1 snd_pcm_oss
snd_pcm                74828  3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              22536  1 snd_pcm
k8temp                  5440  0
scsi_transport_fc      39492  1 qla2xxx
i2c_nforce2             5696  0
i2c_core               20056  5 w83627hf,eeprom,i2c_dev,i2c_isa,i2c_nforce2
amd74xx                15344  0 [permanent]
ide_core              130300  3 ide_cd,ide_disk,amd74xx
snd                    56680  10 snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
soundcore               7648  1 snd
hwmon                   3168  2 w83627hf,k8temp
snd_page_alloc          8464  2 snd_intel8x0,snd_pcm

# uname -a
Linux prinz64 2.6.19-rc5-mm1 #4 PREEMPT Sat Nov 11 16:02:25 GMT 2006 x86_64 GNU/Linux


Christian.
-- 
BOFH excuse #21:

POSIX compliance problem

end of thread

Thread overview: 10+ messages
2006-11-11 16:40 OOM in 2.6.19-rc* Christian Kujau
2006-11-11 17:23 ` Benoit Boissinot
2006-11-11 18:31   ` Christian Kujau
2006-11-11 18:19 ` Adrian Bunk
2006-11-11 18:38   ` Christian Kujau
2006-11-11 18:53     ` Adrian Bunk
2006-11-11 19:12       ` Christian Kujau
2006-11-11 20:38     ` Tim Schmielau
2006-11-12  9:15 ` Arjan van de Ven
2006-11-13  0:56   ` Christian Kujau
