From: Mel Gorman <mel@csn.ul.ie>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>,
linux-mm@kvack.org, Nick Piggin <npiggin@suse.de>,
Martin Schwidefsky <schwidefsky@de.ibm.com>,
Andreas Krebbel <krebbel@linux.vnet.ibm.com>
Subject: Re: oom-killer killing even if memory is available?
Date: Fri, 20 Mar 2009 15:27:00 +0000 [thread overview]
Message-ID: <20090320152700.GM24586@csn.ul.ie> (raw)
In-Reply-To: <20090317024605.846420e1.akpm@linux-foundation.org>
On Tue, Mar 17, 2009 at 02:46:05AM -0700, Andrew Morton wrote:
> On Tue, 17 Mar 2009 10:00:49 +0100 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
>
> > Hi all,
> >
> > the below looks like there is some bug in the memory management code.
> > Even though there seems to be plenty of memory available, the oom-killer
> > kills processes.
> >
> > The below happened after 27 days of uptime. Memory seems to be heavily
> > fragmented, but there are still large portions of memory free that
> > could satisfy an order-2 allocation. Any idea why this fails?
> >
In all likelihood you are hitting the watermark checks for the order-2
allocation. This looks like a GFP_KERNEL allocation, so ordinarily that
would be a bit surprising.
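
For reference, the order-aware part of the check looks roughly like this
(paraphrased from memory from zone_watermark_ok() in mm/page_alloc.c circa
2.6.28; the ALLOC_HIGH/ALLOC_HARDER adjustments are omitted, so treat it as
a sketch rather than the exact code):

int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
                      int classzone_idx)
{
        /* free_pages may go negative, that is ok */
        long min = mark;
        long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
        int o;

        if (free_pages <= min + z->lowmem_reserve[classzone_idx])
                return 0;

        for (o = 0; o < order; o++) {
                /* Free pages of this order are useless to a larger request */
                free_pages -= z->free_area[o].nr_free << o;

                /* But proportionally less slack is required at each order */
                min >>= 1;

                if (free_pages <= min)
                        return 0;
        }
        return 1;
}

Note the loop. For an order-2 request, everything free in order-0 and
order-1 blocks is subtracted before the final check. Normal has min:4064kB
(1016 pages), so after two halvings the order-2 requirement is only 254
pages. But if nearly all of the 117664kB free sits in order-0 and order-1
blocks (the per-order buddy breakdown is not in the log, so this is a
guess), the check fails anyway and you go OOM with ~115MB "free" in the
zone.
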
> > [root@t6360003 ~]# uptime
> > 09:33:41 up 27 days, 22:55, 1 user, load average: 0.00, 0.00, 0.00
> >
> > Mar 16 21:40:40 t6360003 kernel: basename invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
> > Mar 16 21:40:40 t6360003 kernel: CPU: 0 Not tainted 2.6.28 #1
> > Mar 16 21:40:40 t6360003 kernel: Process basename (pid: 30555, task: 000000007baa6838, ksp: 0000000063867968)
> > Mar 16 21:40:40 t6360003 kernel: 0700000084a8c238 0000000063867a90 0000000000000002 0000000000000000
> > Mar 16 21:40:40 t6360003 kernel: 0000000063867b30 0000000063867aa8 0000000063867aa8 000000000010534e
> > Mar 16 21:40:40 t6360003 kernel: 0000000000000000 0000000063867968 0000000000000000 000000000000000a
> > Mar 16 21:40:40 t6360003 kernel: 000000000000000d 0000000000000000 0000000063867a90 0000000063867b08
> > Mar 16 21:40:40 t6360003 kernel: 00000000004a5ab0 000000000010534e 0000000063867a90 0000000063867ae0
> > Mar 16 21:40:40 t6360003 kernel: Call Trace:
> > Mar 16 21:40:40 t6360003 kernel: ([<0000000000105248>] show_trace+0xf4/0x144)
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000105300>] show_stack+0x68/0xf4
> > Mar 16 21:40:40 t6360003 kernel: [<000000000049c84c>] dump_stack+0xb0/0xc0
> > Mar 16 21:40:40 t6360003 kernel: [<000000000019235e>] oom_kill_process+0x9e/0x220
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000192c30>] out_of_memory+0x17c/0x264
> > Mar 16 21:40:40 t6360003 kernel: [<000000000019714e>] __alloc_pages_internal+0x4f6/0x534
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000104058>] crst_table_alloc+0x48/0x108
> > Mar 16 21:40:40 t6360003 kernel: [<00000000001a3f60>] __pmd_alloc+0x3c/0x1a8
> > Mar 16 21:40:40 t6360003 kernel: [<00000000001a802e>] handle_mm_fault+0x262/0x9cc
> > Mar 16 21:40:40 t6360003 kernel: [<00000000004a1a7a>] do_dat_exception+0x30a/0x41c
> > Mar 16 21:40:40 t6360003 kernel: [<0000000000115e5c>] sysc_return+0x0/0x8
> > Mar 16 21:40:40 t6360003 kernel: [<0000004d193bfae0>] 0x4d193bfae0
> > Mar 16 21:40:40 t6360003 kernel: Mem-Info:
> > Mar 16 21:40:40 t6360003 kernel: DMA per-cpu:
> > Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: Normal per-cpu:
> > Mar 16 21:40:40 t6360003 kernel: CPU 0: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 1: hi: 186, btch: 31 usd: 30
> > Mar 16 21:40:40 t6360003 kernel: CPU 2: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 3: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 4: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: CPU 5: hi: 186, btch: 31 usd: 0
> > Mar 16 21:40:40 t6360003 kernel: Active_anon:372 active_file:45 inactive_anon:154
> > Mar 16 21:40:40 t6360003 kernel: inactive_file:152 unevictable:987 dirty:0 writeback:188 unstable:0
> > Mar 16 21:40:40 t6360003 kernel: free:146348 slab:875833 mapped:805 pagetables:378 bounce:0
> > Mar 16 21:40:40 t6360003 kernel: DMA free:467728kB min:4064kB low:5080kB high:6096kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:116kB unevictable:0kB present:2068480kB pages_scanned:0 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 2020 2020
> > Mar 16 21:40:40 t6360003 kernel: Normal free:117664kB min:4064kB low:5080kB high:6096kB active_anon:1488kB inactive_anon:616kB active_file:188kB inactive_file:492kB unevictable:3948kB present:2068480kB pages_scanned:128 all_unreclaimable? no
> > Mar 16 21:40:40 t6360003 kernel: lowmem_reserve[]: 0 0 0
>
> The scanner has wrung pretty much all it can out of the reclaimable pages -
> the LRUs are nearly empty. There's a few hundred MB free and apparently we
> don't have four physically contiguous free pages anywhere. It's
> believable.
>
> The question is: where the heck did all your memory go? You have 2GB of
> ZONE_NORMAL memory in that machine, but only a tenth of it is visible to
> the page reclaim code.
>
> Something must have allocated (and possibly leaked) it.
>
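
For what it's worth, a quick back-of-the-envelope from the Mem-Info dump
above (all figures are 4kB pages, so these are estimates) points straight
at slab:

        slab:   875833 pages * 4kB ~= 3.3GB of the ~4GB present in both zones
        free:   146348 pages * 4kB ~= 0.6GB
        LRU lists + unevictable + pagetables: a few MB between them
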
This looks like a memory leak all right. There used to be a patch that
recorded a stack trace for every page allocation, but it was dropped from
-mm ages ago because of a merge conflict. I didn't revive it at the time
because it wasn't of immediate concern.
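
The idea was roughly the following (a from-memory sketch of the approach,
not the dropped patch itself; the struct and function names here are
illustrative):

#include <linux/gfp.h>
#include <linux/stacktrace.h>

#define PAGE_OWNER_DEPTH 8

/* One of these is kept per page, alongside struct page. */
struct page_owner {
        unsigned int order;                     /* allocation order */
        gfp_t gfp_mask;                         /* flags the caller used */
        unsigned long trace[PAGE_OWNER_DEPTH];  /* caller's return addresses */
        unsigned int nr_entries;
};

/* Called from the success path of the page allocator. */
static void record_page_owner(struct page_owner *po, unsigned int order,
                              gfp_t gfp_mask)
{
        struct stack_trace trace = {
                .nr_entries  = 0,
                .max_entries = PAGE_OWNER_DEPTH,
                .entries     = po->trace,
                .skip        = 2,       /* skip the allocator's own frames */
        };

        po->order = order;
        po->gfp_mask = gfp_mask;
        save_stack_trace(&trace);
        po->nr_entries = trace.nr_entries;
}

Dump those records for every page still allocated, sort by trace, and the
leaking call site tends to stand out immediately.
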
Should I revive the patch or do we have preferred ways of tracking down
memory leaks these days?
--
Mel Gorman
Part-time PhD Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
Thread overview: 13+ messages
2009-03-17 9:00 oom-killer killing even if memory is available? Heiko Carstens
2009-03-17 9:46 ` Andrew Morton
2009-03-17 10:17 ` Heiko Carstens
2009-03-17 10:28 ` Heiko Carstens
2009-03-17 10:49 ` Nick Piggin
2009-03-17 11:39 ` Heiko Carstens
2009-03-20 5:08 ` Wu Fengguang
2009-03-20 15:27 ` Mel Gorman [this message]
2009-03-20 21:02 ` Andrew Morton
2009-03-23 11:55 ` Mel Gorman
2009-03-23 14:58 ` Mel Gorman
2009-03-17 9:51 ` Nick Piggin
2009-03-17 10:11 ` Heiko Carstens