All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [3.2-rc3] OOM killer doesn't kill the obvious memory hog
Date: Fri, 2 Dec 2011 14:31:48 +1100	[thread overview]
Message-ID: <20111202033148.GA7046@dastard> (raw)
In-Reply-To: <20111202015921.GZ7046@dastard>

On Fri, Dec 02, 2011 at 12:59:21PM +1100, Dave Chinner wrote:
> On Thu, Dec 01, 2011 at 02:35:31PM -0800, David Rientjes wrote:
> > On Thu, 1 Dec 2011, Dave Chinner wrote:
> > 
> > > > /*
> > > >  * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
> > > >  * pid.
> > > >  */
> > > > #define OOM_SCORE_ADJ_MIN       (-1000)
> > > > 
> > > >  
> > > > IIUC, this task cannot be killed by oom-killer because of oom_score_adj settings.
> > > 
> > > It's not me or the test suite that setting this, so it's something
> > > the kernel must be doing automagically.
> > > 
> > 
> > The kernel does not set oom_score_adj to ever disable oom killing for a 
> > thread.  The only time the kernel touches oom_score_adj is when setting it 
> > to "1000" in ksm and swap to actually prefer a memory allocator for oom 
> > killing.
> > 
> > It's also possible to change this value via the deprecated 
> > /proc/pid/oom_adj interface until it is removed next year.  Check your 
> > dmesg for warnings about using the deprecated oom_adj interface or change 
> > the printk_once() in oom_adjust_write() to a normal printk() to catch it.
> 
> No warnings at all, as I've already said. If it is userspace,
> whatever is doing it is using the oom_score_adj interface correctly.

.....

> <sigh>
> 
> The reports all cycle around this loop:
> 
> 	linux-mm says userspace/distro problem
> 	distro says openssh problem
> 	openssh says kernel problem
> 
> And there doesn't appear to be any resolution in any of the reports,
> just circular finger pointing and frustrated users.
> 
> I can't find anything in the distro startup or udev scripts that
> modify the oom parameters, and the openssh guys say they only
> pass on the value inhereted from ssh's parent process, so it clearly
> not obvious where the bug lies at this point. It's been around for
> some time, though...
> 
> More digging to do...

A working sshd startup and login:

Dec  2 13:16:32 test-2 sshd[2119]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 13:16:32 test-2 sshd[2119]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:16:32 test-2 sshd[2119]: Server listening on 0.0.0.0 port 22.
Dec  2 13:16:32 test-2 sshd[2119]: socket: Address family not supported by protocol
Dec  2 13:16:36 test-2 sshd[2119]: debug1: Forked child 2576.
Dec  2 13:16:36 test-2 sshd[2576]: Set /proc/self/oom_score_adj to 0

The child process sets itself back to 0 correctly. Now, a non-working
startup and login:

Dec  2 13:19:56 test-2 sshd[2126]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 13:19:56 test-2 sshd[2126]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:19:56 test-2 sshd[2126]: Server listening on 0.0.0.0 port 22.
Dec  2 13:19:56 test-2 sshd[2126]: socket: Address family not supported by protocol
Dec  2 13:19:57 test-2 sshd[2126]: Received signal 15; terminating.
Dec  2 13:19:57 test-2 sshd[2317]: Set /proc/self/oom_score_adj from -1000 to -1000
Dec  2 13:19:57 test-2 sshd[2317]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:19:57 test-2 sshd[2317]: Server listening on 0.0.0.0 port 22.
Dec  2 13:19:57 test-2 sshd[2317]: socket: Address family not supported by protocol
Dec  2 13:20:01 test-2 sshd[2317]: debug1: Forked child 2322.
Dec  2 13:20:01 test-2 sshd[2322]: Set /proc/self/oom_score_adj to -1000

Somewhere in the statup process, a sshd process is getting a SIGTERM
and dying. It is then restarted immediately form a context that has
a /proc/self/oom_score_adj value of -1000, which is where the
problem lies. So, how does this occur? Looks like a distro problem -
I added 'echo "sshd restart" >> /var/log/auth.log' to the
/etc/init.d/ssh restart command, and this pops out now:

Dec  2 14:00:08 test-2 sshd[2037]: debug3: oom_adjust_setup
Dec  2 14:00:08 test-2 sshd[2037]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 14:00:08 test-2 sshd[2037]: debug2: fd 3 setting O_NONBLOCK
Dec  2 14:00:08 test-2 sshd[2037]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 14:00:08 test-2 sshd[2037]: Server listening on 0.0.0.0 port 22.
Dec  2 14:00:08 test-2 sshd[2037]: socket: Address family not supported by protocol
sshd restart
Dec  2 14:00:11 test-2 sshd[2037]: Received signal 15; terminating.
Dec  2 14:00:11 test-2 sshd[2330]: debug3: oom_adjust_setup
Dec  2 14:00:11 test-2 sshd[2330]: Set /proc/self/oom_score_adj from -1000 to -1000

So, something in a startup script is causing an sshd restart from a
context where the oom_score_adj = -1000. There's only two
possibilities here - the dhcp client bringing the network interface
up or a udev event after the sshd has been started.

Bingo:

$ sudo ifup --verbose eth0
Configuring interface eth0=eth0 (inet)
run-parts --verbose /etc/network/if-pre-up.d
run-parts: executing /etc/network/if-pre-up.d/bridge
run-parts: executing /etc/network/if-pre-up.d/uml-utilities

dhclient -v -pf /var/run/dhclient.eth0.pid -lf
/var/lib/dhcp/dhclient.eth0.leases eth0
Internet Systems Consortium DHCP Client 4.1.1-P1
Copyright 2004-2010 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eth0/00:e4:b6:63:63:6e
Sending on   LPF/eth0/00:e4:b6:63:63:6e
Sending on   Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
DHCPOFFER from 192.168.1.254
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.254
bound to 192.168.1.61 -- renewal in 17198 seconds.
run-parts --verbose /etc/network/if-up.d
run-parts: executing /etc/network/if-up.d/mountnfs
run-parts: executing /etc/network/if-up.d/ntpdate
run-parts: executing /etc/network/if-up.d/openssh-server
run-parts: executing /etc/network/if-up.d/uml-utilities
$

and /etc/network/if-up.d/openssh-server does a restart on the
ssh server.

And this is set in /etc/network/interfaces:

allow-hotplug eth0

which means udev can execute the ifup command whenteh device
appears, asynchronously to the startup scripts that are running.

So, it's a distro bug - sshd should never be started from from udev
context because of this inherited oom_score_adj thing.
Interestingly, the ifup ssh restart script says this:

# We'd like to use 'reload' here, but it has some problems; see #502444.
if [ -x /usr/sbin/invoke-rc.d ]; then
        invoke-rc.d ssh restart >/dev/null 2>&1 || true
else
        /etc/init.d/ssh restart >/dev/null 2>&1 || true
fi

Bug 502444 describes the exact startup race condition that I've just
found. It does a ssh server restart because reload causes the sshd
server to fail to start if a start is currently in progress.  So,
rather than solving the start vs reload race condition, it got a
bandaid (use restart to restart sshd from the reload context) and
left it as a landmine.....

<sigh>

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Dave Chinner <david@fromorbit.com>
To: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [3.2-rc3] OOM killer doesn't kill the obvious memory hog
Date: Fri, 2 Dec 2011 14:31:48 +1100	[thread overview]
Message-ID: <20111202033148.GA7046@dastard> (raw)
In-Reply-To: <20111202015921.GZ7046@dastard>

On Fri, Dec 02, 2011 at 12:59:21PM +1100, Dave Chinner wrote:
> On Thu, Dec 01, 2011 at 02:35:31PM -0800, David Rientjes wrote:
> > On Thu, 1 Dec 2011, Dave Chinner wrote:
> > 
> > > > /*
> > > >  * /proc/<pid>/oom_score_adj set to OOM_SCORE_ADJ_MIN disables oom killing for
> > > >  * pid.
> > > >  */
> > > > #define OOM_SCORE_ADJ_MIN       (-1000)
> > > > 
> > > >  
> > > > IIUC, this task cannot be killed by oom-killer because of oom_score_adj settings.
> > > 
> > > It's not me or the test suite that setting this, so it's something
> > > the kernel must be doing automagically.
> > > 
> > 
> > The kernel does not set oom_score_adj to ever disable oom killing for a 
> > thread.  The only time the kernel touches oom_score_adj is when setting it 
> > to "1000" in ksm and swap to actually prefer a memory allocator for oom 
> > killing.
> > 
> > It's also possible to change this value via the deprecated 
> > /proc/pid/oom_adj interface until it is removed next year.  Check your 
> > dmesg for warnings about using the deprecated oom_adj interface or change 
> > the printk_once() in oom_adjust_write() to a normal printk() to catch it.
> 
> No warnings at all, as I've already said. If it is userspace,
> whatever is doing it is using the oom_score_adj interface correctly.

.....

> <sigh>
> 
> The reports all cycle around this loop:
> 
> 	linux-mm says userspace/distro problem
> 	distro says openssh problem
> 	openssh says kernel problem
> 
> And there doesn't appear to be any resolution in any of the reports,
> just circular finger pointing and frustrated users.
> 
> I can't find anything in the distro startup or udev scripts that
> modify the oom parameters, and the openssh guys say they only
> pass on the value inhereted from ssh's parent process, so it clearly
> not obvious where the bug lies at this point. It's been around for
> some time, though...
> 
> More digging to do...

A working sshd startup and login:

Dec  2 13:16:32 test-2 sshd[2119]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 13:16:32 test-2 sshd[2119]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:16:32 test-2 sshd[2119]: Server listening on 0.0.0.0 port 22.
Dec  2 13:16:32 test-2 sshd[2119]: socket: Address family not supported by protocol
Dec  2 13:16:36 test-2 sshd[2119]: debug1: Forked child 2576.
Dec  2 13:16:36 test-2 sshd[2576]: Set /proc/self/oom_score_adj to 0

The child process sets itself back to 0 correctly. Now, a non-working
startup and login:

Dec  2 13:19:56 test-2 sshd[2126]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 13:19:56 test-2 sshd[2126]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:19:56 test-2 sshd[2126]: Server listening on 0.0.0.0 port 22.
Dec  2 13:19:56 test-2 sshd[2126]: socket: Address family not supported by protocol
Dec  2 13:19:57 test-2 sshd[2126]: Received signal 15; terminating.
Dec  2 13:19:57 test-2 sshd[2317]: Set /proc/self/oom_score_adj from -1000 to -1000
Dec  2 13:19:57 test-2 sshd[2317]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 13:19:57 test-2 sshd[2317]: Server listening on 0.0.0.0 port 22.
Dec  2 13:19:57 test-2 sshd[2317]: socket: Address family not supported by protocol
Dec  2 13:20:01 test-2 sshd[2317]: debug1: Forked child 2322.
Dec  2 13:20:01 test-2 sshd[2322]: Set /proc/self/oom_score_adj to -1000

Somewhere in the statup process, a sshd process is getting a SIGTERM
and dying. It is then restarted immediately form a context that has
a /proc/self/oom_score_adj value of -1000, which is where the
problem lies. So, how does this occur? Looks like a distro problem -
I added 'echo "sshd restart" >> /var/log/auth.log' to the
/etc/init.d/ssh restart command, and this pops out now:

Dec  2 14:00:08 test-2 sshd[2037]: debug3: oom_adjust_setup
Dec  2 14:00:08 test-2 sshd[2037]: Set /proc/self/oom_score_adj from 0 to -1000
Dec  2 14:00:08 test-2 sshd[2037]: debug2: fd 3 setting O_NONBLOCK
Dec  2 14:00:08 test-2 sshd[2037]: debug1: Bind to port 22 on 0.0.0.0.
Dec  2 14:00:08 test-2 sshd[2037]: Server listening on 0.0.0.0 port 22.
Dec  2 14:00:08 test-2 sshd[2037]: socket: Address family not supported by protocol
sshd restart
Dec  2 14:00:11 test-2 sshd[2037]: Received signal 15; terminating.
Dec  2 14:00:11 test-2 sshd[2330]: debug3: oom_adjust_setup
Dec  2 14:00:11 test-2 sshd[2330]: Set /proc/self/oom_score_adj from -1000 to -1000

So, something in a startup script is causing an sshd restart from a
context where the oom_score_adj = -1000. There's only two
possibilities here - the dhcp client bringing the network interface
up or a udev event after the sshd has been started.

Bingo:

$ sudo ifup --verbose eth0
Configuring interface eth0=eth0 (inet)
run-parts --verbose /etc/network/if-pre-up.d
run-parts: executing /etc/network/if-pre-up.d/bridge
run-parts: executing /etc/network/if-pre-up.d/uml-utilities

dhclient -v -pf /var/run/dhclient.eth0.pid -lf
/var/lib/dhcp/dhclient.eth0.leases eth0
Internet Systems Consortium DHCP Client 4.1.1-P1
Copyright 2004-2010 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on LPF/eth0/00:e4:b6:63:63:6e
Sending on   LPF/eth0/00:e4:b6:63:63:6e
Sending on   Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 6
DHCPOFFER from 192.168.1.254
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 192.168.1.254
bound to 192.168.1.61 -- renewal in 17198 seconds.
run-parts --verbose /etc/network/if-up.d
run-parts: executing /etc/network/if-up.d/mountnfs
run-parts: executing /etc/network/if-up.d/ntpdate
run-parts: executing /etc/network/if-up.d/openssh-server
run-parts: executing /etc/network/if-up.d/uml-utilities
$

and /etc/network/if-up.d/openssh-server does a restart on the
ssh server.

And this is set in /etc/network/interfaces:

allow-hotplug eth0

which means udev can execute the ifup command whenteh device
appears, asynchronously to the startup scripts that are running.

So, it's a distro bug - sshd should never be started from from udev
context because of this inherited oom_score_adj thing.
Interestingly, the ifup ssh restart script says this:

# We'd like to use 'reload' here, but it has some problems; see #502444.
if [ -x /usr/sbin/invoke-rc.d ]; then
        invoke-rc.d ssh restart >/dev/null 2>&1 || true
else
        /etc/init.d/ssh restart >/dev/null 2>&1 || true
fi

Bug 502444 describes the exact startup race condition that I've just
found. It does a ssh server restart because reload causes the sshd
server to fail to start if a start is currently in progress.  So,
rather than solving the start vs reload race condition, it got a
bandaid (use restart to restart sshd from the reload context) and
left it as a landmine.....

<sigh>

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

  reply	other threads:[~2011-12-02  3:32 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-01  9:36 [3.2-rc3] OOM killer doesn't kill the obvious memory hog Dave Chinner
2011-12-01  9:36 ` Dave Chinner
2011-12-01  9:50 ` KAMEZAWA Hiroyuki
2011-12-01  9:50   ` KAMEZAWA Hiroyuki
2011-12-01 12:46   ` Dave Chinner
2011-12-01 12:46     ` Dave Chinner
2011-12-01 22:35     ` David Rientjes
2011-12-01 22:35       ` David Rientjes
2011-12-02  1:59       ` Dave Chinner
2011-12-02  1:59         ` Dave Chinner
2011-12-02  3:31         ` Dave Chinner [this message]
2011-12-02  3:31           ` Dave Chinner
2011-12-02  5:44           ` KAMEZAWA Hiroyuki
2011-12-02  5:44             ` KAMEZAWA Hiroyuki
2011-12-04 22:04             ` Dave Chinner
2011-12-04 22:04               ` Dave Chinner
2011-12-06 20:31             ` David Rientjes
2011-12-06 20:31               ` David Rientjes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111202033148.GA7046@dastard \
    --to=david@fromorbit.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=rientjes@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.