From: ebiederm@xmission.com (Eric W. Biederman)
To: Josh Boyer <jwboyer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Al Viro <viro@zeniv.linux.org.uk>, Mel Gorman <mgorman@suse.de>,
	linux-kernel@vger.kernel.org
Subject: Re: Odd ENOMEM being returned in 3.8-rcX
Date: Fri, 08 Feb 2013 12:36:08 -0800
Message-ID: <87fw16v8zr.fsf@xmission.com>
In-Reply-To: <20130208201826.GE31684@hansolo.jdub.homelinux.org> (Josh Boyer's message of "Fri, 8 Feb 2013 15:18:27 -0500")

Josh Boyer <jwboyer@redhat.com> writes:

> On Fri, Feb 08, 2013 at 01:19:49PM -0500, Josh Boyer wrote:
>> On Thu, Feb 07, 2013 at 07:35:01PM -0500, Josh Boyer wrote:
>> > On Thu, Feb 07, 2013 at 02:15:02PM -0800, Andrew Morton wrote:
>> > > On Thu, 7 Feb 2013 16:57:42 -0500
>> > > Josh Boyer <jwboyer@redhat.com> wrote:
>> > > 
>> > > > Hi All,
>> > > > 
>> > > > We've hit a weird error in Fedora using the 3.8-rcX kernels.  It seems
>> > > > the mock tool is getting back ENOMEM when doing very simple things that
>> > > > normally just work.  The 3.7 kernels on the same userspace work just
>> > > > fine.  It seems just running 'mock init -v' is enough to cause the
>> > > > failure.
>> > > 
>> > > I assume you're not seeing the "page allocation failure" message and
>> > > backtrace.  This means that either
>> > 
>> > Right.  If I disable our debug options, I see no backtraces at all and
>> > the python app still gets ENOMEM returned.  (See below for those
>> > interested).
>> > 
>> > > a) it's a __GFP_NOWARN callsite.  This is rare.  Or
>> > > 
>> > > b) it's actually a different error but someone went and overwrote a
>> > >    callee's return value with -ENOMEM.  We do this a lot and it sucks.
>> > 
>> > We do it in copy_io :\.
>> > 
>> > > > At first glance it seems copy_io is failing (possibly because
>> > > > get_task_io_context fails), and then the above fallout is printed.  The
>> > > > warning seems fairly valid, but I don't think that is the root of the
>> > > > problem.
>> > > 
>> > > yes, get_task_io_context() might be the place.  Tried adding a few
>> > > error-path printks in there to see what's happening?
>> > 
>> > Yeah, that's my next step.  I guess I know what I'll be doing tomorrow.
>> > 
>> > > I can't see anything around there which leaves interrupts disabled
>> > > though.  It's quite likely that there's some code which is forgetting to
>> > > reenable interrupts on a rarely-tested error path, and that ENOMEM is
>> > > tickling the bug.
>> > 
>> > Right, agreed.  As I said, I think that is mostly a secondary issue.
>> > Hopefully it will be easy to fix once we figure out why we're getting
>> > the ENOMEM error.
>> > 
>> > Python backtrace below.  Seems to be failing on forking a umount command
>> > after init'ing the chroot.  I can put the full output somewhere if
>> > people are interested.
>> 
>> OK.  I've bisected this down to:
>> 
>> 50804fe3737ca6a5942fdc2057a18a8141d00141 is the first bad commit
>> commit 50804fe3737ca6a5942fdc2057a18a8141d00141
>> Author: Eric W. Biederman <ebiederm@xmission.com>
>> Date:   Tue Mar 2 15:41:50 2010 -0800
>> 
>>     pidns: Support unsharing the pid namespace.
>>     
>> 
>> I haven't really gotten much farther than that yet, but the bisect was
>> pretty straightforward.  Eric, is there anything specific I can gather
>> or do to help figure out why that is causing mock to get such a weird
>> error?  I can provide the bisect log if you'd like.
>
> I took a look at what mock was doing and it was mostly very simple
> stuff.  The two exceptions were that it was calling unshare, then doing
> some file checks and I/O, and then calling fork to exec off some helper
> things.  Up until the point it fails, the forks work and the children go
> do whatever it is they were supposed to do.  I've CC'd Clark Williams
> just in case people have questions on mock itself, but I'm not sure that
> will be needed.

Our emails crossed paths.  You have just confirmed my suspicion about
what was going wrong.

The practical question is why mock is calling unshare(CLONE_NEWPID),
because it clearly does not seem to understand how an unshared pid
namespace has to be used.

Except for forgetting to reenable irqs in the failure path of alloc_pid,
the behavior is exactly correct and is how the pid namespace is designed
to behave in the case of unshare.


> which is consistent with what mock is seeing.  If I comment out the call
> to unshare, it seems to always work.  It seems to consistently fail with
> ENOMEM after the first 3-5 forked children, but it varies within that
> range.

If you add a waitpid or space out your forks you will see that it always
fails after your first child in the pid namespace has exited.

We don't allow new children in a pid namespace after its init (the first
process forked into it) has exited.
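
For anyone who wants to see this outside of mock, here is a minimal sketch
of the sequence described above: unshare(CLONE_NEWPID), fork one child and
wait for it, then fork again.  It is not mock's code.  The function name
probe_unshared_pidns and the report keys are made up for illustration, the
CLONE_NEWPID value is taken from <linux/sched.h>, and it assumes Linux with
glibc reachable through ctypes.  Without CAP_SYS_ADMIN the unshare call
fails and the sketch only records that errno.

```python
import ctypes
import errno
import json
import os

CLONE_NEWPID = 0x20000000  # value from <linux/sched.h>


def probe_unshared_pidns():
    """Try unshare + fork + fork in a helper process and report the result.

    Hypothetical helper for illustration.  Returns a dict with:
      unshare_errno     0 on success, otherwise the errno from unshare()
      first_fork_ok     True/False, or None if never attempted
      second_fork_errno 0 on success, an errno on failure, None if not attempted
    """
    read_fd, write_fd = os.pipe()
    helper = os.fork()
    if helper == 0:
        # Everything happens in this helper, so the caller's own ability
        # to fork is not affected by the unshare.
        os.close(read_fd)
        report = {"unshare_errno": 0, "first_fork_ok": None,
                  "second_fork_errno": None}
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWPID) != 0:
                report["unshare_errno"] = ctypes.get_errno()
            else:
                try:
                    first = os.fork()  # this child becomes pid 1 of the new ns
                    if first == 0:
                        os._exit(0)
                    os.waitpid(first, 0)  # the namespace's init is now gone
                    report["first_fork_ok"] = True
                except OSError:
                    report["first_fork_ok"] = False
                if report["first_fork_ok"]:
                    try:
                        second = os.fork()
                        if second == 0:
                            os._exit(0)
                        os.waitpid(second, 0)
                        report["second_fork_errno"] = 0
                    except OSError as exc:
                        report["second_fork_errno"] = exc.errno
            os.write(write_fd, json.dumps(report).encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    os.waitpid(helper, 0)
    if not data:
        # The helper died before it could report (e.g. no unshare symbol).
        return {"unshare_errno": None, "first_fork_ok": None,
                "second_fork_errno": None}
    return json.loads(data)


if __name__ == "__main__":
    result = probe_unshared_pidns()
    print(result)
    if result["second_fork_errno"] == errno.ENOMEM:
        print("second fork failed with ENOMEM, as described above")
```

Run as root on a kernel with pid namespace unshare support, the second
fork is expected to fail with ENOMEM, which is also how fork(2) documents
the case of a pid namespace whose init has terminated.  Keeping the first
child alive for as long as more processes are needed in the namespace
avoids the failure.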

Eric
