Re: oomkillers gone wild.


From: Dave Jones <davej@redhat.com>
To: David Rientjes <rientjes@google.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: oomkillers gone wild.
Date: Tue, 5 Jun 2012 14:52:39 -0400
Message-ID: <20120605185239.GA28172@redhat.com>
In-Reply-To: <20120605174454.GA23867@redhat.com>

On Tue, Jun 05, 2012 at 01:44:54PM -0400, Dave Jones wrote:
 > On Mon, Jun 04, 2012 at 04:30:57PM -0700, David Rientjes wrote:
 >  > On Mon, 4 Jun 2012, Dave Jones wrote:
 >  > 
 >  > > we picked this..
 >  > > 
 >  > > [21623.066911] [  588]     0   588    22206        1   2       0             0 dhclient
 >  > > 
 >  > > over say..
 >  > > 
 >  > > [21623.116597] [ 7092]  1000  7092  1051124    31660   3       0             0 trinity-child3
 >  > > 
 >  > > What went wrong here ?
 >  > > 
 >  > > And why does that score look so.. weird.
 >  > > 
 >  > 
 >  > It sounds like it's because pid 588 has uid=0 and the adjustment for root 
 >  > processes is causing an overflow.  I assume this fixes it?
 > 
 > Still doesn't seem right..
 > 
 > eg..
 > 
 > [42309.542776] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
 > ..
 > [42309.553933] [  500]    81   500     5435        1   4     -13          -900 dbus-daemon
 > ..
 > [42309.597531] [ 9054]  1000  9054   528677    14540   3       0             0 trinity-child3
 > ..
 > 
 > [42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
 > [42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB
 > 
 > and a slew of similar 'wrong process' death spiral kills follows..

After I manually killed all the greedy processes and got the box to stop oom-killing
random things, it settled down. But I noticed something odd, which I think I also saw a few
weeks ago:

# free
             total       used       free     shared    buffers     cached
Mem:       3886296    3666924     219372          0       2904      20008
-/+ buffers/cache:    3644012     242284
Swap:      6029308      14488    6014820
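
As a quick cross-check of the `free` output above, the "-/+ buffers/cache" row can be re-derived from the "Mem:" row. This is only a sketch with the figures copied by hand from the listing; the variable names are illustrative and not part of any tool.

```python
# Figures copied from the `free` output above, in KiB.
total, used, free_kb = 3886296, 3666924, 219372
buffers, cached = 2904, 20008

# The "-/+ buffers/cache" row discounts reclaimable buffers and page cache:
# what is really in use, and what could be made available.
used_minus_cache = used - buffers - cached
free_plus_cache = free_kb + buffers + cached

print(used_minus_cache, free_plus_cache)
print(f"{used_minus_cache / total:.1%} of RAM is in use outside buffers/cache")
```

With buffers and cache this small, almost all of the used memory has to be accounted for elsewhere, which is what motivates looking at the slab caches.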

What's using up that memory?

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
142524 142420  99%    9.67K  47510        3   1520320K task_struct
142560 142417  99%    1.75K   7920       18    253440K signal_cache
142428 142302  99%    1.19K   5478       26    175296K task_xstate
306064 289292  94%    0.36K   6956       44    111296K debug_objects_cache
143488 143306  99%    0.50K   4484       32     71744K cred_jar
142560 142421  99%    0.50K   4455       32     71280K task_delay_info
150753 145021  96%    0.45K   4308       35     68928K kmalloc-128

Why so many task_structs? There are only 128 processes running, and most of them
are kernel threads.
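
To put numbers on the mismatch, here is a rough tally of the slab listing above. It is a sketch only: the rows are transcribed by hand from the slabtop output, and `RUNNING_TASKS` is just the 128 processes mentioned in the text.

```python
# (cache name, active objects, slabs, cache size in KiB), copied from the listing above.
rows = [
    ("task_struct",         142420, 47510, 1520320),
    ("signal_cache",        142417,  7920,  253440),
    ("task_xstate",         142302,  5478,  175296),
    ("debug_objects_cache", 289292,  6956,  111296),
    ("cred_jar",            143306,  4484,   71744),
    ("task_delay_info",     142421,  4455,   71280),
    ("kmalloc-128",         145021,  4308,   68928),
]
RUNNING_TASKS = 128  # processes actually running, per the text

# Total memory pinned by the listed caches.
total_kib = sum(size for _, _, _, size in rows)

# Size of one task_struct slab: cache size divided by slab count.
_, ts_active, ts_slabs, ts_kib = rows[0]
slab_kib = ts_kib / ts_slabs

# task_structs that have no corresponding running process.
excess_tasks = ts_active - RUNNING_TASKS

print(f"listed caches: {total_kib} KiB ({total_kib / 1024 / 1024:.2f} GiB)")
print(f"task_struct slab size: {slab_kib:.0f} KiB")
print(f"task_structs without a running process: {excess_tasks}")
```

The listed caches alone account for more than 2 GiB of the roughly 3.6 GB in use, and nearly all of it is in per-task allocations.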

/sys/kernel/slab/task_struct/alloc_calls shows..

 142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
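
For anyone wanting to compare call sites across several caches, a line like the one above can be split into fields with a small parser. This is a sketch under assumptions: the regular expression only covers the fields visible in this one line, and reading the three `age=` values as minimum/average/maximum age in jiffies is my understanding of the SLUB output, not something stated in this message.

```python
import re

# The alloc_calls line quoted above.
line = ("142421 copy_process.part.21+0xbb/0x1790 "
        "age=8/19929576/48173720 pid=0-16867 cpus=0-7")

# count, call site, then age=min/avg/max (assumed to be in jiffies).
pattern = re.compile(
    r"^\s*(?P<count>\d+)\s+(?P<site>\S+)\s+"
    r"age=(?P<min>\d+)/(?P<avg>\d+)/(?P<max>\d+)"
)

m = pattern.match(line)
record = {
    "count": int(m["count"]),
    "site": m["site"],
    "age_min": int(m["min"]),
    "age_avg": int(m["avg"]),
    "age_max": int(m["max"]),
}
print(record)
```

Here every outstanding task_struct comes from a single call site, copy_process, which is the fork path.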

I get the impression that the oom-killer hasn't cleaned up properly after killing some of
those forked processes.

Any thoughts?

	Dave


