From: Dave Jones <davej@redhat.com>
To: David Rientjes <rientjes@google.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
linux-mm@kvack.org
Subject: Re: oomkillers gone wild.
Date: Tue, 5 Jun 2012 14:52:39 -0400
Message-ID: <20120605185239.GA28172@redhat.com>
In-Reply-To: <20120605174454.GA23867@redhat.com>
On Tue, Jun 05, 2012 at 01:44:54PM -0400, Dave Jones wrote:
> On Mon, Jun 04, 2012 at 04:30:57PM -0700, David Rientjes wrote:
> > On Mon, 4 Jun 2012, Dave Jones wrote:
> >
> > > we picked this..
> > >
> > > [21623.066911] [ 588] 0 588 22206 1 2 0 0 dhclient
> > >
> > > over say..
> > >
> > > [21623.116597] [ 7092] 1000 7092 1051124 31660 3 0 0 trinity-child3
> > >
> > > What went wrong here ?
> > >
> > > And why does that score look so.. weird.
> > >
> >
> > It sounds like it's because pid 588 has uid=0 and the adjustment for root
> > processes is causing an overflow. I assume this fixes it?
>
> Still doesn't seem right..
>
> eg..
>
> [42309.542776] [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
> ..
> [42309.553933] [  500]    81   500     5435        1   4     -13          -900 dbus-daemon
> ..
> [42309.597531] [ 9054]  1000  9054   528677    14540   3       0             0 trinity-child3
> ..
>
> [42309.643057] Out of memory: Kill process 500 (dbus-daemon) score 511952 or sacrifice child
> [42309.643620] Killed process 500 (dbus-daemon) total-vm:21740kB, anon-rss:0kB, file-rss:4kB
>
> and a slew of similar 'wrong process' death spiral kills follows..
So after I manually killed all the greedy processes and got the box to stop oom-killing
random things, it settled down. But I noticed something odd that I think I also saw a few
weeks ago..
# free
             total       used       free     shared    buffers     cached
Mem:       3886296    3666924     219372          0       2904      20008
-/+ buffers/cache:    3644012     242284
Swap:      6029308      14488    6014820
What's using up that memory?
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
142524 142420  99%    9.67K  47510        3   1520320K task_struct
142560 142417  99%    1.75K   7920       18    253440K signal_cache
142428 142302  99%    1.19K   5478       26    175296K task_xstate
306064 289292  94%    0.36K   6956       44    111296K debug_objects_cache
143488 143306  99%    0.50K   4484       32     71744K cred_jar
142560 142421  99%    0.50K   4455       32     71280K task_delay_info
150753 145021  96%    0.45K   4308       35     68928K kmalloc-128
Why so many task_structs? There are only 128 processes running, and most of them
are kernel threads.
/sys/kernel/slab/task_struct/alloc_calls shows..
142421 copy_process.part.21+0xbb/0x1790 age=8/19929576/48173720 pid=0-16867 cpus=0-7
I get the impression that the oom-killer hasn't cleaned up properly after killing some of
those forked processes.
Any thoughts?
Dave