From: ebiederm@xmission.com (Eric W. Biederman)
To: Josh Boyer <jwboyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Mel Gorman <mgorman@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Odd ENOMEM being returned in 3.8-rcX
Date: Fri, 08 Feb 2013 12:13:09 -0800
Message-ID: <87k3qiwomi.fsf@xmission.com>
In-Reply-To: <20130208181949.GD31684@hansolo.jdub.homelinux.org> (Josh Boyer's message of "Fri, 8 Feb 2013 13:19:49 -0500")

Josh Boyer <jwboyer@redhat.com> writes:

> On Thu, Feb 07, 2013 at 07:35:01PM -0500, Josh Boyer wrote:
>> On Thu, Feb 07, 2013 at 02:15:02PM -0800, Andrew Morton wrote:
>> > On Thu, 7 Feb 2013 16:57:42 -0500
>> > Josh Boyer <jwboyer@redhat.com> wrote:
>> > 
>> > > Hi All,
>> > > 
>> > > We've hit a weird error in Fedora using the 3.8-rcX kernels.  It seems
>> > > the mock tool is getting back ENOMEM when doing very simple things that
>> > > normally just work.  The 3.7 kernels on the same userspace work just
>> > > fine.  It seems just running 'mock init -v' is enough to cause the
>> > > failure.
>> > 
>> > I assume you're not seeing the "page allocation failure" message and
>> > backtrace.  This means that either
>> 
>> Right.  If I disable our debug options, I see no backtraces at all and
>> the python app still gets ENOMEM returned.  (See below for those
>> interested).
>> 
>> > a) it's a __GFP_NOWARN callsite.  This is rare.  Or
>> > 
>> > b) it's actually a different error but someone went and overwrote a
>> >    callee's return value with -ENOMEM.  We do this a lot and it sucks.
>> 
>> We do it in copy_io :\.
>> 
>> > > At first glance it seems copy_io is failing (possibly because
>> > > get_task_io_context fails), and then the above fallout is printed.  The
>> > > warning seems fairly valid, but I don't think that is the root of the
>> > > problem.
>> > 
>> > yes, get_task_io_context() might be the place.  Tried adding a few
>> > error-path printks in there to see what's happening?
>> 
>> Yeah, that's my next step.  I guess I know what I'll be doing tomorrow.
>> 
>> > I can't see anything around there which leaves interrupts disabled
>> > though.  It's quite likely that there's some code which is forgetting to
>> > re-enable interrupts on a rarely-tested error path, and that ENOMEM is
>> > tickling the bug.
>> 
>> Right, agreed.  As I said, I think that is mostly a secondary issue.
>> Hopefully it will be easy to fix once we figure out why we're getting
>> the ENOMEM error.
>> 
>> Python backtrace below.  Seems to be failing on forking a umount command
>> after init'ing the chroot.  I can put the full output somewhere if
>> people are interested.
>
> OK.  I've bisected this down to:
>
> 50804fe3737ca6a5942fdc2057a18a8141d00141 is the first bad commit
> commit 50804fe3737ca6a5942fdc2057a18a8141d00141
> Author: Eric W. Biederman <ebiederm@xmission.com>
> Date:   Tue Mar 2 15:41:50 2010 -0800
>
>     pidns: Support unsharing the pid namespace.
>     
>
> I haven't really gotten much farther than that yet, but the bisect was
> pretty straightforward.  Eric, is there anything specific I can gather
> or do to help figure out why that is causing mock to get such a weird
> error?  I can provide the bisect log if you'd like.

My best guess is that some dark corner of mock has untested code to
unshare a pid namespace, and that corner started doing something now that
unsharing of the pid namespace actually works.

If mock has called unshare(CLONE_NEWPID), then forked a process that
exited, and then forked another process, that second fork and all
subsequent fork calls will fail with -ENOMEM (because init has exited in
the pid namespace).  The -ENOMEM is generated by a failure of
alloc_pid.
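
For anyone who wants to reproduce that sequence outside of mock, here is a
minimal userspace sketch.  It is not taken from mock.  The names
try_second_fork(), second_fork_errno() and COULD_NOT_RUN are made up for
illustration, and it assumes a kernel where unshare(CLONE_NEWPID) is
supported and a caller with CAP_SYS_ADMIN.  The experiment runs in a helper
process so the caller's own ability to fork is not affected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>	/* CLONE_NEWPID */
#include <sys/syscall.h>	/* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker: the experiment could not be set up. */
#define COULD_NOT_RUN 255

/*
 * Runs in a helper process.  Returns 0 if the second fork succeeded,
 * the errno of the second fork if it failed, or COULD_NOT_RUN.
 */
static int try_second_fork(void)
{
	pid_t pid;

	/* Raw syscall, equivalent to unshare(CLONE_NEWPID). */
	if (syscall(SYS_unshare, CLONE_NEWPID) != 0)
		return COULD_NOT_RUN;	/* e.g. EPERM without CAP_SYS_ADMIN */

	/* The first child becomes pid 1 (init) of the new pid namespace. */
	pid = fork();
	if (pid < 0)
		return COULD_NOT_RUN;
	if (pid == 0)
		_exit(0);
	if (waitpid(pid, NULL, 0) < 0)
		return COULD_NOT_RUN;

	/* init of the namespace has exited; this fork is the one under test. */
	pid = fork();
	if (pid < 0)
		return errno;
	if (pid == 0)
		_exit(0);
	waitpid(pid, NULL, 0);
	return 0;
}

/*
 * Returns the errno seen by the second fork, 0 if it succeeded, or -1 if
 * the experiment could not be run in this environment.
 */
int second_fork_errno(void)
{
	pid_t helper;
	int status;

	helper = fork();
	if (helper < 0)
		return -1;
	if (helper == 0)
		_exit(try_second_fork());
	if (waitpid(helper, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	if (WEXITSTATUS(status) == COULD_NOT_RUN)
		return -1;
	return WEXITSTATUS(status);
}
```

Calling second_fork_errno() from a small main() as root and printing
strerror() of the result should show whether the second fork hits ENOMEM
on a given kernel.  A return of -1 only means the namespace could not be
unshared there.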

Looking at that code path a little closer, that just about has to be it,
because I goofed and the error path drops the lock but does not re-enable
irqs.  The patch below should fix the nasty warning and confirm where the
code is failing in copy_process.

An strace to see which syscalls mock is making and with which flags
would be very interesting.  I am almost certain that there is an
unshare(CLONE_NEWPID) somewhere in there.  But in a remote corner of
possibility it could be weird clone flags, or something else.

Beyond that, I suspect we want to work with the mock folks so that their
pid namespace code works the way they intended.

Eric

From: "Eric W. Biederman" <ebiederm@xmission.com>
Date: Fri, 8 Feb 2013 12:05:54 -0800
Subject: [PATCH] pid: unlock_irq when alloc_pid fails because init has
 exited.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
---
 kernel/pid.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/pid.c b/kernel/pid.c
index de9af60..f2c6a68 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -331,7 +331,7 @@ out:
 	return pid;
 
 out_unlock:
-	spin_unlock(&pidmap_lock);
+	spin_unlock_irq(&pidmap_lock);
 out_free:
 	while (++i <= ns->level)
 		free_pidmap(pid->numbers + i);
-- 
1.7.5.4
